Data Mining A.A. 2017/18

DM 1: Foundations of Data Mining (6 CFU)

Instructors - Docenti:

Teaching assistant - Assistente:

DM 2: Advanced topics on Data Mining and case studies (6 CFU)

Instructors:

DM: Data Mining (9 CFU)

Instructors:

Teaching assistant - Assistente:

News

  • Mid-term Exam question time: Wed 15th November 16.00-17.00 in Anna Monreale's office.
  • The results of the first mid-term are online: 2017-10-30-first-midterm.pdf. For any problem or question about the written exam, please contact Riccardo Guidotti and Anna Monreale.
  • Please fill in the Doodle poll about the project groups. In the Participant field, enter the surnames of all group members. The Doodle link was sent by email. If you do not have access to the DM group, please send me an email following the instructions below.
  • Please send an email to anna [dot] monreale [at] unipi [dot] it with:
    1. subject: DATA MINING
    2. content: your name, your surname, your studentID, the credits of your exam (12CFU, 6CFU, 9CFU)

Learning goals -- Obiettivi del corso

… a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data. Hal Varian, Google’s chief economist, predicts that the job of statistician will become the “sexiest” around. Data, he explains, are widely available; what is scarce is the ability to extract wisdom from them.

Data, data everywhere. The Economist, Special Report on Big Data, Feb. 2010.

The wide availability of data from relational databases, the web, and other sources motivates the study of data analysis techniques that allow a better understanding and easier use of the results in decision-making processes. The course aims to provide an introduction to the basic concepts of the knowledge extraction process, to the main data mining techniques, and to the related algorithms. Particular emphasis is placed on methodological aspects, presented through a few classes of paradigmatic applications such as Market Basket Analysis, market segmentation, and fraud detection. Finally, the course introduces the privacy and ethical issues inherent in the use of inference techniques on data, of which the analyst must be aware. The course consists of the following parts:

  1. the basic concepts of the knowledge extraction process: data understanding and preparation, data types, data measures and similarity;
  2. the main data mining techniques (association rules, classification, and clustering), studied in both their formal and implementation aspects;
  3. some case studies in marketing and customer management support, fraud detection, and epidemiological studies;
  4. an introduction to the privacy and ethical issues inherent in the use of inference techniques on data, of which the analyst must be aware.
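
As a rough illustration of the association-rule part of the course, the sketch below computes support and confidence, the two usual measures for scoring a rule such as {diapers} → {beer} in Market Basket Analysis. The baskets and the function names (`count`, `support`, `confidence`, `frequent_itemsets`) are invented for this example and are not taken from the course material. The enumeration is brute force, without the candidate pruning that algorithms such as Apriori apply.

```python
from itertools import combinations

# Hypothetical toy baskets, invented for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of the transactions containing `antecedent` that also contain `consequent`."""
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force search for itemsets of up to `max_size` items (no pruning)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
    return result


if __name__ == "__main__":
    print(support({"diapers", "beer"}, TRANSACTIONS))
    print(confidence({"diapers"}, {"beer"}, TRANSACTIONS))
    print(frequent_itemsets(TRANSACTIONS, min_support=0.6))
```

On these toy baskets, `confidence({"diapers"}, {"beer"}, TRANSACTIONS)` divides the three baskets that hold both items by the four that hold diapers. Counting before dividing keeps the ratio free of intermediate rounding.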

Reading about the "data scientist" job

  • Data, data everywhere. The Economist, Feb. 2010 download
  • Data scientist: The hot new gig in tech, CNN & Fortune, Sept. 2011 link
  • Welcome to the yotta world. The Economist, Sept. 2011 download
  • Data Scientist: The Sexiest Job of the 21st Century. Harvard Business Review, Sept 2012 link
  • Il futuro è già scritto in Big Data. Il Sole 24 Ore, Sept 2012 link
  • Special issue of Crossroads - The ACM Magazine for Students - on Big Data Analytics download
  • Peter Sondergaard, Gartner, Says Big Data Creates Big Jobs. Oct 22, 2012: YouTube video
  • Towards Effective Decision-Making Through Data Visualization: Six World-Class Enterprises Show The Way. White paper at FusionCharts.com. download

Hours - Orario e Aule

DM1 & DM

Classes - Lezioni

Day of Week | Hour | Room
Mercoledì/Wednesday | 14:00 - 16:00 | Aula C1
Giovedì/Thursday | 16:00 - 18:00 | Aula C1
Venerdì/Friday | 11:00 - 13:00 | Aula A1

Office hours - Ricevimento:

  • Prof. Pedreschi: Lunedì/Monday h 14:00 - 16:00, Dipartimento di Informatica
  • Prof. Monreale: Giovedì/Thursday h 14:00 - 16:00, Dipartimento di Informatica
  • Dr. Guidotti: Mercoledì/Wednesday h 16:00 - 18:00, Dipartimento di Informatica

DM 2

Classes - Lezioni

Day of week Hour Room

Office hours - Ricevimento:

  • Nanni: appointment by email, c/o ISTI-CNR

Learning Material -- Materiale didattico

Textbook -- Libro di Testo

  • Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Introduction to Data Mining. Addison Wesley, ISBN 0-321-32136-7, 2006
  • Berthold, M.R., Borgelt, C., Höppner, F., Klawonn, F. Guide to Intelligent Data Analysis. Springer Verlag, 1st Edition, 2010. ISBN 978-1-84882-259-7
  • Laura Igual et al. Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications. 1st ed., 2017.

Slides of the classes -- Slides del corso

The slides used during the course will be added to the calendar after each lecture. Most of them are taken from those provided by the textbook authors: Slides for "Introduction to Data Mining"

Past Exams

Data mining software

Class calendar - Calendario delle lezioni (2017-2018)

First part of course, first semester (DM1 - Data mining: foundations & DM - Data Mining)

Day | Room | Topic | Learning material | Instructor
1. 20.09.2017 14:00-16:00 | C1 | Introduction | | Pedreschi
2. 21.09.2017 16:00-18:00 | C1 | Introduction | Introduction | Pedreschi
3. 22.09.2017 11:00-13:00 | A1 | Lecture canceled | Data Understanding | Pedreschi
4. 27.09.2017 14:00-16:00 | C1 | Data Understanding | Data Understanding. For this topic we suggest: “Guide to Intelligent Data Analysis” | Monreale
5. 28.09.2017 16:00-18:00 | C1 | Introduction to Python, Knime | python_tutorial knime_tutorial | Monreale/Guidotti
6. 29.09.2017 11:00-13:00 | A1 | Data Understanding | | Monreale
7. 04.10.2017 14:00-16:00 | C1 | Data Preparation | 4.data_preparation.pdf | Pedreschi
8. 05.10.2017 16:00-18:00 | C1 | Data Preparation | | Pedreschi
9. 06.10.2017 11:00-13:00 | A1 | Canceled | |
10. 11.10.2017 14:00-16:00 | C1 | Knime - Python: Data Understanding | Pandas knime_data_understanding python_data_understanding | Pedreschi/Guidotti
11. 12.10.2017 16:00-18:00 | C1 | Clustering analysis: Centroid-based methods | dm2014_clustering_intro.pdf dm2014_clustering_kmeans.pdf | Pedreschi
12. 13.10.2017 11:00-13:00 | A1 | Hierarchical methods | dm2014_clustering_hierarchical.pdf | Monreale
13. 18.10.2017 14:00-16:00 | C1 | Clustering analysis: Density-based methods. Exercises on Data Understanding | dm2014_clustering_dbscan.pdf exercises-dm1.pdf | Monreale/Guidotti
14. 19.10.2017 16:00-18:00 | C1 | Exercises on Clustering | Online Didactic Data Mining | Monreale/Guidotti
15. 20.10.2017 11:00-13:00 | A1 | Knime - Python: Clustering | knime_clustering python_clustering | Monreale/Guidotti
16. 25.10.2017 14:00-16:00 | C1 | Clustering Validation | dm2014_clustering_validation.pdf | Monreale
17. 26.10.2017 16:00-18:00 | C1 | Exercises on Clustering | 2016-01-18-dm1-prima.pdf dm-clustering.pdf | Monreale
18. 27.10.2017 11:00-13:00 | A1 | Canceled | |
30.10.2017 14:00-18:00 | A1,C1 | First Mid-term test | |
19. 08.11.2017 14:00-16:00 | C1 | Frequent Patterns & Association Rules | restructured_assoc.pdf Chapter 6 of textbook (avoid sections 6.4.2, 6.5, 6.6, 6.7.2, 6.8) | Pedreschi
20. 09.11.2017 16:00-18:00 | C1 | Frequent Patterns & Association Rules | | Pedreschi
21. 10.11.2017 11:00-13:00 | A1 | Knime - Frequent Patterns & Association Rules | knime_pattern_mining python_pattern_mining Borgelt Web Page | Guidotti/Pedreschi
22. 15.11.2017 14:00-16:00 | C1 | Classification/1 | 11.chap4_basic_classification.pdf | Pedreschi
23. 16.11.2017 16:00-18:00 | C1 | Classification/2 | | Monreale
24. 17.11.2017 11:00-13:00 | A1 | Knime - Python: Classification | knime_classification python_classification | Guidotti/Pedreschi
25. 22.11.2017 14:00-16:00 | C1 | Classification/3 | | Pedreschi
26. 23.11.2017 16:00-18:00 | C1 | Exercises on Classification & Frequent Patterns | | Guidotti/Pedreschi
24.11.2017 11:00-13:00 | A1 | Canceled. The next lectures are dedicated to the DM of 9 credits | |
27. 29.11.2017 14:00-16:00 | C1 | Alternative methods for classification/1 | | Pedreschi
28. 30.11.2017 16:00-18:00 | C1 | Alternative methods for classification/2 | | Pedreschi
29. 01.12.2017 11:00-13:00 | A1 | Alternative methods for classification/3 | | Pedreschi
30. 06.12.2017 14:00-16:00 | C1 | Exercises on alternative methods for classification | | Monreale
31. 07.12.2017 16:00-18:00 | C1 | Alternative methods for clustering | | Monreale
32. 13.12.2017 14:00-16:00 | C1 | Transactional Clustering | | Monreale
33. 14.12.2017 16:00-18:00 | C1 | Alternative methods for frequent patterns and AR | | Monreale
34. 15.12.2017 11:00-13:00 | A1 | Exercises on the second part of the course | | Guidotti/Pedreschi
35. 20.12.2017 11:00-13:00 | A1,C1 | Second Mid-term test: See Mid-term section for details | |

Second part of course, second semester (DMA - Data mining: advanced topics and case studies)

  • To Be Defined

Exams

Exam DM part I (DMF)

The exam is composed of three parts:

  • A written exam, with exercises and questions about methods and algorithms presented during the classes. It can be replaced by the first and second mid-term tests of November and December.
  • An oral exam, that includes: (1) discussing the project report with a group presentation; (2) discussing topics presented during the classes, including the theory of the parts already covered by the written exam.
  • A project, consisting of exercises that require the use of data mining tools for data analysis. Exercises include: data understanding, clustering analysis, frequent pattern mining, and classification. The project must be carried out by a group of at most 4 people, using Knime, Python or a combination of them. The results of the different tasks must be reported in a single paper. The total length of this paper must be at most 20 pages of text including figures. The project must be delivered at least 2 days before the oral exam. The paper must be emailed to datamining [dot] unipi [at] gmail [dot] com. Please use “[DM 2017-2018] Project” in the subject. Tasks of the project:
    1. Data Understanding (Assigned on: 03/10/2017): Explore the dataset with the analytical tools studied and write a concise “data understanding” report describing data semantics, assessing data quality, the distribution of the variables and the pairwise correlations.
    2. Clustering analysis (Assigned on: 14/11/2017): Explore the dataset using various clustering techniques. Carefully describe your decisions for each algorithm and the advantages provided by the different approaches. (see Guidelines for details)
    3. Association Rules (Assigned on: 21/11/2017): Explore the dataset using frequent pattern mining and association rule extraction. Then use the rules to predict a variable, either to replace missing values or to predict whether an employee will leave prematurely. (see Guidelines for details)
    4. Classification (Assigned on: 12/12/2017): Explore the dataset using classification trees and random forests. Use them to predict whether an employee will leave prematurely. (see Guidelines for details)
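
To make the data understanding task more concrete, here is a minimal sketch of two of its steps: summarizing the distribution of a variable (including a count of missing values as a basic data quality check) and computing pairwise Pearson correlations. The records, the column names (`satisfaction`, `monthly_hours`, `years_at_company`) and the helper functions are hypothetical; the actual project dataset is not described here and may look quite different. The sketch uses only the Python standard library, although in practice a library such as Pandas would likely be used.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, median

# Hypothetical employee records, invented for illustration.
# None marks a missing value.
RECORDS = [
    {"satisfaction": 0.82, "monthly_hours": 160, "years_at_company": 3},
    {"satisfaction": 0.35, "monthly_hours": 250, "years_at_company": 5},
    {"satisfaction": 0.64, "monthly_hours": 190, "years_at_company": 2},
    {"satisfaction": None, "monthly_hours": 210, "years_at_company": 4},
    {"satisfaction": 0.48, "monthly_hours": 230, "years_at_company": 6},
]


def summarize(records, name):
    """Basic distribution summary of one column, ignoring missing values."""
    values = [r[name] for r in records]
    present = [v for v in values if v is not None]
    return {
        "missing": len(values) - len(present),
        "min": min(present),
        "median": median(present),
        "mean": mean(present),
        "max": max(present),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient; undefined if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(records, names):
    """Correlation for every pair of columns, using rows where both values are present."""
    result = {}
    for a, b in combinations(names, 2):
        pairs = [(r[a], r[b]) for r in records
                 if r[a] is not None and r[b] is not None]
        xs, ys = zip(*pairs)
        result[(a, b)] = pearson(xs, ys)
    return result


if __name__ == "__main__":
    columns = ["satisfaction", "monthly_hours", "years_at_company"]
    for name in columns:
        print(name, summarize(RECORDS, name))
    for pair, r in pairwise_correlations(RECORDS, columns).items():
        print(pair, round(r, 3))
```

Dropping rows with a missing value pair by pair is only one possible choice; a report would normally state how missing values were handled and why.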

Guidelines for the project are here.

Exam DM part II (DMA)

The exam is composed of three parts:

  • A written exam, with exercises and questions about methods and algorithms presented during the classes. It can be replaced by the first and second mid-term tests of April and June.
  • An oral exam, that includes: (1) discussing the project report with a group presentation; (2) discussing topics presented during the classes, including the theory of the parts already covered by the written exam.
  • A project, consisting of exercises that require the use of data mining tools for data analysis. Exercises include: sequential patterns, time series, classification (alternative methods and validation), outlier detection. The project must be carried out by a group of at most 3 people, using Knime, Python, other software or a combination of them. The results of the different tasks must be reported in a single paper. The total length of this paper must be at most 20 pages of text including figures. The project must be delivered at least 2 days before the oral exam.

Exam dates - Appelli di esame

Mid-term exams

Test | Date | Hour | Place | Notes | Marks
First Mid-term 2017 | 30.10.2017 | 14:00 - 17:00 | Room A1, C1 | Please use the system for registration: https://esami.unipi.it/ |
Second Mid-term 2017 | 20.12.2017 | 14:00 - 17:00 | Room A1, C1 | Please use the system for registration: https://esami.unipi.it/ |

Appelli regolari / Exam sessions

Session | Date | Time | Room | Notes | Marks
1. | 10 Jan 2018 | 09:00 | C1 | Oral exam for students who passed the mid-term exam and delivered the project work. https://esami.unipi.it/ |
2. | 17 Jan 2018 | 09:00 | A1 | Written exam. On the same date we will define the dates for the next oral exams. https://esami.unipi.it/ |
3. | 06 Feb 2018 | 09:00 | C | Written exam. On the same date we will define the dates for the next oral exams. https://esami.unipi.it/ |

Appelli straordinari A.A. 2016/17 / Extra sessions A.A. 2016/17

Date | Time | Room | Notes | Results
30.10.2017 | 14:00 - 18:00 | Room A1, C1 | |
20.12.2017 | | Room A1, C1 | |

Previous years

dm/start.txt · Last modified: 20/11/2017 at 09:11 by Anna Monreale