Big Data Analytics A.A. 2018/19

Instructors:

Notice: the list of papers to read is available at this link: http://bit.ly/bda_papers. Send an email to Luca Pappalardo with your three preferred papers. We will then assign you one of them.

Learning goals

In our digital society, every human activity is mediated by information technologies and therefore leaves digital traces behind. These massive traces are stored in public or private repositories: phone call records, movement trajectories, soccer logs and social media records are all examples of “Big Data”, a novel and powerful “social microscope” for understanding the complexity of our societies. Analyzing big data sources is a complex task that requires knowledge of several technological and methodological tools. This course has three objectives:

  • to introduce the emerging field of big data analytics and social mining;
  • to introduce the technological landscape of big data, such as programming tools to analyze big data, query NoSQL databases, and perform predictive modeling;
  • to guide students in the development of an open-source and reproducible big data analytics project, based on the analysis of real-world datasets.

Module 1: Big Data Analytics and Social Mining

In this module, analytical methods and processes are presented through exemplary case studies in challenging domains, organized according to the following topics:

  • The Big Data Scenario and the new questions to be answered
  • Sport Analytics
    1. Soccer data landscape and injury prediction
    2. Analysis and evolution of sports performance
  • Mobility Analytics
    1. Mobility data landscape and mobility data mining methods
    2. Understanding Human Mobility with vehicular sensors (GPS)
    3. Mobility Analytics: Novel Demography with mobile-phone data
  • Social Media Mining
    1. The social media data landscape: Facebook, LinkedIn, Twitter, Last.fm
    2. Sentiment analysis: examples from human migration studies
    3. Discussion on ethical issues of Big Data Analytics
  • Well-being and Nowcasting
    1. Nowcasting influenza with retail market data
    2. Predicting well-being from human mobility patterns
  • Paper presentations by students

Module 2: Big Data Analytics Technologies

This module will provide students with the technologies to collect, manipulate and process big data. In particular, the following tools will be presented:

  • Python for Data Science
  • The Jupyter Notebook: developing open-source and reproducible data science
  • MongoDB: fast querying and aggregation in NoSQL databases
  • GeoPandas: analyze geo-spatial data with Python
  • Scikit-learn: programming tools for data mining and analysis
  • M-Atlas: a toolkit for mobility data mining

Module 3: Laboratory for Interactive Project Development

During the course, teams of students will be guided in the development of a big data analytics project. The projects will be based on real-world datasets covering several thematic areas. Class discussions and presentations will take place at the following stages of the project:

  • Data Understanding and Project Formulation
  • Mid-Term Project Results
  • Final Project Results

Calendar

17/09 (Mod. 1) Introduction to the course, The Big Data scenario (slides: mod1.introduction_bigdatalandscape_newquestions_.pdf)

21/09 (Mod. 1) Big Data Analytics: new questions to be solved + Presentation of datasets

24/09 (Mod. 2) Python for Data Science and the Jupyter Notebook: developing open-source and reproducible data science

28/09 (Mod. 1) Soccer data landscape and players’ injury prediction

01/10 (Mod. 2) Scikit-learn: programming tools for data mining and analysis.

05/10 (Mod. 1) Analysis and evolution of sports performance

08/10 (Mod. 1) The mobility data landscape and mobility data mining methods

12/10 (Mod. 1) Soccer Data Challenge

15/10 (Mod. 1) Understanding Human Mobility with GPS

19/10 (Mod. 3) Data Understanding and Project Formulation

22/10 (Mod. 2) MongoDB: fast querying and aggregation in NoSQL databases

05/11 (Mod. 2) GeoPandas: analyze geo-spatial data with Python

09/11 (Mod. 1) Predicting well-being from human mobility patterns

12/11 (Mod. 1) Nowcasting influenza with retail market data

16/11 (Mod. 1) Paper presentations

19/11 (Mod. 1) Paper presentations

23/11 (Mod. 3) Mid Term Project Results

26/11 (Mod. 1) The social media data landscape and social media mining methods

30/11 No lessons

03/12 (Mod. 1) Sentiment analysis: examples from Human Migration studies

07/12 (Mod. 1) Discussion on Ethical issues in Big Data Analytics

10/12 (Mod. 3) Final Project results

14/12 (Mod. 3) Final Project results

12/01, 14:00 @ CNR (Entrance 20, Room C36b) - Exam

Exam

The two mid-terms account for 40% of the final grade; the remaining 60% comes from the evaluation of the project and its discussion (prepare some slides to present your project). Students whose mid-term results are not sufficient may take a final test on the technologies.
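As a rough illustration of the weighting above, the arithmetic can be sketched in Python. This is only a sketch under assumptions that the course page does not state: the two mid-terms are assumed to count equally within their 40% share, all scores are assumed to be on the same scale, and the function name `final_grade` and the sample scores are hypothetical.

```python
def final_grade(midterm_1: float, midterm_2: float, project: float) -> float:
    """Combine the grade components using the 40% / 60% weights.

    Assumptions (not stated in the course page): the two mid-terms count
    equally within their 40% share, and all scores share the same scale.
    """
    midterm_average = (midterm_1 + midterm_2) / 2
    return 0.4 * midterm_average + 0.6 * project


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    print(round(final_grade(26, 28, 29), 2))
```

With these weights, the project and its discussion influence the final result more than the two mid-terms combined.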

The following table describes the expected content of a project:

Previous Big Data Analytics websites

bigdataanalytics/bda/start.txt · Last modified: 08/10/2018 at 09:02 by Luca Pappalardo