
Big Data Analytics A.A. 2017/18

Instructors:

Learning goals

In our digital society, every human activity is mediated by information technologies, so every activity leaves digital traces that can be stored in some repository: phone call records, transaction records, web search logs, movement trajectories, social media texts and tweets. Every minute, humans produce an avalanche of “big data”, consciously or not, that represents a novel, accurate digital proxy of social activities at global scale. Big data provide an unprecedented “social microscope”, a novel opportunity to understand the complexity of our societies, and a paradigm shift for the social sciences. The objective of the course is twofold: first, an introduction to the emergent field of big data analytics and social mining, aimed at acquiring and analyzing big data from multiple sources in order to discover the patterns and models of human behavior that explain social phenomena; second, an introduction to the technological scenario of scalable analytics.

Intro lectures

Lecture 1: Course presentation, course organization, Big Data landscape: opportunities, risks, big data sources, challenges. Slides: https://goo.gl/WztPDg

Lecture 2-3: Big Data Analytics scenario, new questions to be answered, panel: “What is Big Data?”, the 4Vs of Big Data, dataset presentation: GPS traces, call data records, LastFM log, sport data, scientific collaborations, store-product data

Technologies lectures:

Lecture 1-2: Introduction to parallel computing and Hadoop: overview/recall of parallel computing; introduction to Hadoop; MapReduce patterns

Lecture 2-3: HDFS and Spark: managing HDFS; actions and transformations in Spark

Lecture 4-5-6: Data Mining with Spark and MLlib

Lecture 7: NoSQL systems: Pig vs Hive; other NoSQL systems; case study implementation
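As a rough illustration of the MapReduce patterns named under Lecture 1-2, the sketch below runs the classic word-count pattern in plain Python on a single machine. It is not Hadoop code and is not taken from the course material: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the in-memory shuffle only stands in for the distributed grouping step that a real framework performs across nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield (word, 1)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key (done in memory here)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain map, shuffle and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big analytics"]))
```

The three steps are kept as separate functions because that separation is what lets a framework distribute the map and reduce work independently; only the shuffle needs to see all keys together.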

Methodological scenarios lectures:

Lecture 1-2: What is possible to observe with Mobile Phone Data? Formulation of novel questions to be answered: estimating population, understanding city dynamics, estimating unemployment or gender distribution, wellbeing; the complexity of feature construction; model construction; new mining algorithms; validation strategies.

Lecture 3-4: What is possible to observe with GPS data? Formulation of novel questions to be answered: understanding human mobility; the complexity of feature construction; new model construction; new mining algorithms; validation strategies.

Lecture 5-6: What is possible to observe with Social Media Data? Formulation of novel questions to be answered: understanding sentiment, wellbeing, happiness; the complexity of feature construction; new model construction; new mining algorithms; validation strategies.

Lecture 7: What is possible to observe with IoT Data? Formulation of novel questions to be answered: understanding performance in sport; the complexity of feature construction; new model construction; new mining algorithms; validation strategies.

Calendar

18/09 - (Intro.Part1) Course Presentation, Big Data Landscape

22/09 - (Tech.Part1) Overview/Recall parallel computing

25/09 - (Intro.Part2) New questions to be answered and the Big Data Analytics technological scenario

29/09 - (Method.Part1) What is possible to observe with Mobile Phone Data?

02/10 - (Tech.Part1) Introduction to Hadoop and Design Patterns. Datasets presentation

06/10 - (Method.Part1) What is possible to observe with Mobile Phone Data?

09/10 - (Tech.Part2) Managing HDFS and Introduction to Spark (Lab)

13/10 - (Tech.Part2) Data Analytics with Spark (Lab)

16/10 - (Tech.Part2) Data Analytics with Spark (Lab)

20-23-27/10 - No Class (Time to practice!)

30/10 - Mid-term Tech I

6/11 - (Tech.Part3) Data Mining with Spark and MLlib I (Lab)

10/11 - (Method.Part2) What is possible to observe with GPS data?

13/11 - (Tech.Part3) Data Mining with Spark and MLlib II (Lab)

17/11 - (Method.Part2) What is possible to observe with GPS data?

20/11 - Discussing the final project proposal - Collective discussion (not evaluated)

24/11 - (Tech.Part3) Data Mining with Spark and MLlib III (Lab)

27/11 - (Tech.Part3) NoSQL Systems

01/12 - (Method.Part3) What is possible to observe with Social Media Data?

4/12 - (Tech.Part4) Implementing a big data analytical process

11/12 - (Method.Part3) What is possible to observe with Social Media Data?

15/12 - (Method.Part3) What is possible to observe with IoT data: examples from sport?

18/12 - Mid-term Tech II and Final Project proposal

15/01 - 16/02 Final Project and Discussion

Exam

The two mid-terms count for 40% of the final grade; the remaining 60% comes from the evaluation of the Project and the Discussion. Students whose mid-term results are not sufficient can take a final test on the technologies instead.
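The weighting above can be sketched as a small calculation. The function name `final_grade` is hypothetical, and two points are assumptions not stated on this page: that the two mid-terms contribute equally to the 40% share, and that all marks are on the same scale (the example uses the Italian 30-point scale).

```python
def final_grade(midterm_1: float, midterm_2: float, project: float) -> float:
    """Combine marks using the 40% mid-terms / 60% project split.

    Assumption: the two mid-terms are averaged with equal weight;
    the page only gives their combined 40% share.
    """
    midterm_average = (midterm_1 + midterm_2) / 2
    return 0.4 * midterm_average + 0.6 * project


if __name__ == "__main__":
    # Mid-terms of 28 and 26 with a project mark of 30.
    print(round(final_grade(28, 26, 30), 2))
```

With these inputs the mid-term average is 27, so the result is 0.4 × 27 + 0.6 × 30 = 28.8.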

Laboratories

Students should bring their own laptops (especially for technology lectures).

Software & Links

Virtual Machines:

Previous Big Data Analytics websites

bigdataanalytics/bda/start.txt · Last modified: 21/09/2017 at 17:21 by Roberto Trasarti