Instructors:
In our digital society, every human activity is mediated by information technologies. Every activity therefore leaves digital traces that can be stored in some repository: phone call records, transaction records, web search logs, movement trajectories, social media texts and tweets. Every minute, humans produce an avalanche of “big data”, consciously or not, that represents a novel, accurate digital proxy of social activities at global scale. Big data provide an unprecedented “social microscope”, a novel opportunity to understand the complexity of our societies, and a paradigm shift for the social sciences. This course is an introduction to the emergent field of big data analytics and social mining, which aims at acquiring and analyzing big data from multiple sources in order to discover the patterns and models of human behavior that explain social phenomena. The focus is on what can be learnt from big data in different domains (mobility and transportation, urban planning, demographics, economics, social relationships, opinion and sentiment, etc.) and on the analytical and mining methods that can be used. An introduction to scalable analytics is also given, using the “map-reduce” paradigm.
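As a rough illustration of the “map-reduce” paradigm mentioned above, the following single-machine Python sketch counts words in a handful of text records. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the toy input are illustrative assumptions, not course material; a real job would run on a framework such as Hadoop or Spark, which distributes the same three steps across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(records):
    """Run the three phases in sequence on an iterable of text records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Hypothetical toy input standing in for a large log file.
    logs = ["big data big picture", "data mining"]
    print(word_count(logs))
```

The map and reduce functions only ever see one record or one key at a time, which is what lets a framework run them in parallel on separate partitions of the data.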
In this module, complex analytical methods and processes are presented through exemplary case studies in challenging domains, organized according to the following topics:
This module gives students the technologies to collect, manipulate, and process big data. In particular, the following tools will be presented:
Students will carry out analytical projects in teams. In-class discussions and presentations are planned at different stages of the project.
Classes
Day | Time | Room |
---|---|---|
Monday | 11:00 - 13:00 | Aula Fib N1 |
Friday | 09:00 - 11:00 | Aula Fib L1 |
Office hours:
Two alternative VMs:
No. | Day | Room | Topic | Materials/Notes | Student Presentation | Instructor |
---|---|---|---|---|---|---|
1. | 21.09, 11-13 | N1 | MOD1 Course Presentation | https://goo.gl/4HKzkf | | Giannotti/Trasarti |
2. | 25.09, 09-11 | L1 | MOD1 Big data scenario | https://goo.gl/0xY0a7 | | Giannotti |
3. | 02.10, 09-11 | L1 | MOD2 Introduction to Hadoop | https://goo.gl/0UiFg8 | | Trasarti |
4. | 05.10, 11-13 | N1 | MOD2 Hadoop Ecosystem & Design Patterns | https://goo.gl/rmRZ5C https://goo.gl/9GSlgZ | | Trasarti |
5. | 09.10, 09-11 | L1 | MOD2 Hadoop ground level [LAB] | https://goo.gl/aOn0rm https://goo.gl/luhYzB | | Trasarti |
6. | 12.10, 11-13 | N1 | MOD2 Analyzing data with Python [LAB] | https://goo.gl/7NgdVE | | Trasarti* |
7. | 19.10, 11-13 | N1 | MOD2 Web Scraping [LAB] | https://goo.gl/8SojKJ | (P1) | Trasarti* |
8. | 23.10, 09-11 | L1 | MOD1 Understanding Human Mobility with GPS traces | https://goo.gl/km9199 https://goo.gl/6tjhyJ https://goo.gl/k5XRLj https://goo.gl/u6Q04b | (P2) | Giannotti |
9. | 26.10, 11-13 | N1 | MOD1 City dynamics with Mobile Phone Traces | | | Giannotti |
10. | 30.10, 09-11 | L1 | MOD1 Novel Demography with Phone Traces | Project Assignment https://goo.gl/11ZfYm | | Giannotti |
11. | 09.11, 11-13 | N1 | MOD2 Hive [LAB] | Student Groups definition https://goo.gl/CLFgJV | (P3) | Trasarti |
12. | 13.11, 09-11 | L1 | MOD2 Spark [LAB] | https://goo.gl/niOL5z | (P4) | Trasarti |
13. | 16.11, 11-13 | N1 | MOD1 Sport analytics | https://goo.gl/ntt1S8 | (P5) | Giannotti |
14. | 20.11, 09-11 | L1 | Project proposal presentations | | | Giannotti/Trasarti |
15. | 23.11, 11-13 | N1 | MOD2 Data Mining with Spark [LAB] | https://goo.gl/Xoz6Hl | (P6) | Trasarti |
16. | 27.11, 09-11 | L1 | MOD2 Introducing R [LAB] | https://goo.gl/98dF4x | | Trasarti |
17. | 30.11, 11-13 | N1 | Project alignment | | (P7,8) | Trasarti |
18. | 04.12, 09-11 | L1 | MOD1 Sentiment analysis | https://goo.gl/Sf8KDL | | Giannotti* |
19. | 11.12, 09-11 | L1 | MOD3 Open Lab/Discussion [LAB] | Project preliminary results (Taxi Group) | (P9,10,11) | Giannotti/Trasarti |
20. | 14.12, 11-13 | N1 | MOD3 Open Lab/Discussion [LAB] | Project preliminary results (Reddit and Crime Groups) | | Giannotti/Trasarti |
The exam consists of two parts:
ID | Paper | Link | Student | Discussion day |
---|---|---|---|---|
P1. | Twitter as an indicator for whereabouts of people? | https://goo.gl/Vk7Sox | Florencio Paucar Sedano | 19/10 |
P2. | Explaining International Migration with Skype data | https://goo.gl/IlJSmm | Pierluca Serra | 23/10 |
P3. | Big Data System for Analyzing Risky Procurement Entities | https://goo.gl/N2u3Lx | Marco Vicariucci | 9/11 |
P4. | Detecting and understanding big events in big cities (GSM data) | https://goo.gl/sGDeZ3 | Tommaso Inghirami | 13/11 |
P5. | CoMobile – Human Mobility with Mobile Phone | https://goo.gl/ZqVKB8 | William Tisdall | 16/11 |
P6. | Estimating Potential Customers Anywhere and Anytime Based on Location-Based Social Networks | https://goo.gl/rJKVqQ | Martina Vasapollo | 23/11 |
P7. | Product Assortment and Customer mobility | https://goo.gl/gwUbwy | Victoria Kotova | 30/11 |
P8. | Analyzing traffic with GPS using R | https://goo.gl/RHVOZR | Andrea Buccarella | 30/11 |
P9. | Tweet Sentiment: From Classification to Quantification | http://goo.gl/tWe3xm | Alessandro Marrella | 11/12 |
P10. | Small Area Model-Based Estimators Using Big Data | https://goo.gl/ZIRzLU | Raffaele Grezzi | 11/12 |
P11. | The purpose of motion (GPS data) | https://goo.gl/clw8vd | Filippo Todeschini | 11/12 |
The presentation must be 5 slides:
Please send the slides to both of us by e-mail (or a link if the size is over 5 MB).