Big Data Analytics A.A. 2015/16

Instructors:

Learning goals

In our digital society, every human activity is mediated by information technologies. Therefore, every activity leaves digital traces behind that can be stored in some repository: phone call records, transaction records, web search logs, movement trajectories, social media texts, and tweets. Every minute, humans produce an avalanche of “big data”, consciously or not, that represents a novel, accurate digital proxy of social activities at a global scale. Big data provide an unprecedented “social microscope”, a novel opportunity to understand the complexity of our societies, and a paradigm shift for the social sciences. This course is an introduction to the emerging field of big data analytics and social mining, which aims at acquiring and analyzing big data from multiple sources in order to discover the patterns and models of human behavior that explain social phenomena. The focus is on what can be learnt from big data in different domains (mobility and transportation, urban planning, demographics, economics, social relationships, opinion and sentiment, etc.) and on the analytical and mining methods that can be used. An introduction to scalable analytics is also given, using the “map-reduce” paradigm.

Module 1: Big Data Analytics and Social Mining

In this module, complex analytical methods and processes are presented through exemplary case studies in challenging domains, organized according to the following topics:

  • Big Data scenario
  • Opportunities, risks, Big data sources, social sensing challenges
  • Big data analytics and social mining methods
  • Mobility data analytics: understanding human mobility with GPS traces
  • Mobility data analytics: understanding City dynamics with GSM traces
  • Novel Demographic Indicators with GSM traces
  • Social Media Mining
  • Sport Analytics
  • Ethical issues in Big Data Analytics

Module 2: Scalable Data Analytics Technologies

This module introduces students to the technologies used to collect, manipulate, and process big data. In particular, the following tools will be presented:

  • R
  • Web Scraping
  • Hadoop
  • Spark and MLlib
  • Hive: schema and data storage
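
As a rough illustration of the “map-reduce” paradigm that underlies tools such as Hadoop, the sketch below counts words in plain Python, with no cluster involved. It is not taken from the course material: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the input records are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(records):
    """Run the three phases in sequence on an in-memory list of records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical input records, invented for this example.
records = ["big data analytics", "big data mining", "social mining"]
counts = word_count(records)
print(counts)
```

In a real framework the map and reduce calls would run in parallel on different machines and the shuffle step would move data across the network; here everything runs sequentially in one process, which is enough to show how the phases fit together.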

Module 3: Student projects

Students will carry out analytical projects in teams. Class discussions and presentations are planned at different stages of project execution.

Hours and rooms

Classes

Day | Time | Room
Monday | 11:00 - 13:00 | Aula Fib N1
Friday | 09:00 - 11:00 | Aula Fib L1

Office hours:

  • Roberto Trasarti: Friday, 11:30 - 12:30, CNR Room C36b (Entrance n. 20)

Learning material

Two alternative VMs:

Class calendar

Day | Room | Topic | Materials/Notes | Student Presentation | Instructor
1. 21.09, 11-13 | N1 | MOD1 Course Presentation | https://goo.gl/4HKzkf | | Giannotti/Trasarti
2. 25.09, 09-11 | L1 | MOD1 Big data scenario | https://goo.gl/0xY0a7 | | Giannotti
3. 02.10, 09-11 | L1 | MOD2 Introduction to Hadoop | https://goo.gl/0UiFg8 | | Trasarti
4. 05.10, 11-13 | N1 | MOD2 Hadoop Ecosystem & Design Patterns | https://goo.gl/rmRZ5C https://goo.gl/9GSlgZ | | Trasarti
5. 09.10, 09-11 | L1 | MOD2 Hadoop ground level [LAB] | https://goo.gl/aOn0rm https://goo.gl/luhYzB | | Trasarti
6. 12.10, 11-13 | N1 | MOD2 Analyzing data with Python [LAB] | https://goo.gl/7NgdVE | | Trasarti*
7. 19.10, 11-13 | N1 | MOD2 Web Scraping [LAB] | https://goo.gl/8SojKJ | (P1) | Trasarti*
8. 23.10, 09-11 | L1 | MOD1 Understanding Human Mobility with GPS traces | https://goo.gl/km9199 https://goo.gl/6tjhyJ https://goo.gl/k5XRLj https://goo.gl/u6Q04b | (P2) | Giannotti
9. 26.10, 11-13 | N1 | MOD1 City dynamics with Mobile Phone Traces | | | Giannotti
10. 30.10, 09-11 | L1 | MOD1 Novel Demography with Phone Traces | Project Assignment https://goo.gl/11ZfYm | | Giannotti
11. 09.11, 11-13 | N1 | MOD2 Hive [LAB] | Student Groups definition https://goo.gl/CLFgJV | (P3) | Trasarti
12. 13.11, 09-11 | L1 | MOD2 Spark [LAB] | https://goo.gl/niOL5z | (P4) | Trasarti
13. 16.11, 11-13 | N1 | MOD1 Sport analytics | https://goo.gl/ntt1S8 | (P5) | Giannotti
14. 20.11, 09-11 | L1 | Project proposal presentations | | | Giannotti/Trasarti
15. 23.11, 11-13 | N1 | MOD2 Data Mining with Spark [LAB] | https://goo.gl/Xoz6Hl | (P6) | Trasarti
16. 27.11, 09-11 | L1 | MOD2 Introducing R [LAB] | https://goo.gl/98dF4x | | Trasarti
17. 30.11, 11-13 | N1 | Project alignment | | (P7,8) | Trasarti
18. 04.12, 09-11 | L1 | MOD1 Sentiment analysis | https://goo.gl/Sf8KDL | | Giannotti*
19. 11.12, 09-11 | L1 | MOD3 Open Lab/Discussion [LAB] | Project preliminary results (Taxi Group) | (P9,10,11) | Giannotti/Trasarti
20. 14.12, 11-13 | N1 | MOD3 Open Lab/Discussion [LAB] | Project preliminary results (Reddit and Crime Groups) | | Giannotti/Trasarti
  • * - A guest will attend the lesson to share their expertise on the topic.
  • [LAB] - Bring your laptop to class; practical examples will be shown.

Exam

The exam consists of two parts:

  • A project, either chosen among those proposed during the classes or proposed by the students themselves. In the latter case, students are invited to submit a short project proposal (max. 2 pages) describing the data to be used and the analysis objectives, and to prepare a 15-minute presentation. The work done should be summarized in a report (max. 10 pages), to be sent to the teachers at least a week before the oral exam (project discussion) with a 30-minute presentation. https://goo.gl/ike9vi
  • An oral exam that includes: (1) a discussion of the project report with a group presentation (15 minutes for the whole group); (2) a short presentation describing the analytical process of a research paper (10 minutes for each student).

Papers assignment

Paper | Link | Student | Discussion day
P1. Twitter as an indicator for whereabouts of people? | https://goo.gl/Vk7Sox | Florencio Paucar Sedano | 19/10
P2. Explaining International Migration with Skype data | https://goo.gl/IlJSmm | Pierluca Serra | 23/10
P3. Big Data System for Analyzing Risky Procurement Entities | https://goo.gl/N2u3Lx | Marco Vicariucci | 9/11
P4. Detecting and understanding big events in big cities (GSM data) | https://goo.gl/sGDeZ3 | Tommaso Inghirami | 13/11
P5. CoMobile – Human Mobility with Mobile Phone | https://goo.gl/ZqVKB8 | William Tisdall | 16/11
P6. Estimating Potential Customers Anywhere and Anytime Based on Location-Based Social Networks | https://goo.gl/rJKVqQ | Martina Vasapollo | 23/11
P7. Product Assortment and Customer mobility | https://goo.gl/gwUbwy | Victoria Kotova | 30/11
P8. Analyzing traffic with GPS using R | https://goo.gl/RHVOZR | Andrea Buccarella | 30/11
P9. Tweet Sentiment: From Classification to Quantification | http://goo.gl/tWe3xm | Alessandro Marrella | 11/12
P10. Small Area Model-Based Estimators Using Big Data | https://goo.gl/ZIRzLU | Raffaele Grezzi | 11/12
P11. The purpose of motion (GPS data) | https://goo.gl/clw8vd | Filippo Todeschini | 11/12

The presentation must consist of 5 slides:

  1. Data description
  2. Problem statement
  3. Data manipulation
  4. The analytical process and the results
  5. Validation

Please send the slides to both of us by e-mail (or a link if the size is over 5 MB).

bigdataanalytics/bda/bda2015.txt · Last modified: 12/09/2016 at 13:00 (3 years ago) by Roberto Trasarti