Data Mining (309AA) - 9 CFU A.Y. 2025/2026

Instructors:

Teaching Assistant:

News

Learning Goals

The Data Mining course tackles the analysis of large collections of data, and the extraction of information and patterns. It aims to explore core components of the Knowledge Discovery from Data (KDD) process, and focuses on:

Schedule

Classes

Day of Week | Hour | Room
Tuesday | 11:00 - 13:00 | Room C
Wednesday | 14:00 - 16:00 | Room C
Thursday | 14:00 - 16:00 | Room A1

Office hours:

A Teams channel will be used ONLY to post news, Q&A, and other course-related material. Lectures will be held in person only and will NOT be live-streamed.

Teaching Material

Books

Title | Authors | Edition
Introduction to Data Mining | Pang-Ning Tan, Michael Steinbach, Vipin Kumar | 2nd
Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications | Laura Igual, Santi Seguí | 2nd
Python Data Science Handbook: Essential Tools for Working with Data | Jake VanderPlas | 1st
Deep Learning | Ian Goodfellow, Yoshua Bengio, Aaron Courville |
Introduction to Linear Algebra | Gilbert Strang | 5th

Online tutorials

Slides

The slides used in the course will be added to the calendar after each class. Some are taken from the slides provided by the textbook's authors: Slides for "Introduction to Data Mining".

Past exercises and past exams of similar courses

Class Calendar (2025/2026)

First Semester

Day Topic Teaching material References Teacher
1. 18.09 Course Overview. Introduction to Data Mining Introduction to DM Chap. 1 Kumar Book Setzu
23.09 Canceled due to the teacher's health issues
2. 24.09 Data Understanding + Data Preparation data_understanding.pdf Data Preparation Chap. 2 Kumar Book and additional resource of the Kumar Book: Data Exploration Chap. (in the first ed. of the Kumar Book, this is Chap. 3) Setzu
3. 25.09 Data representation data_representation.pdf References: Introduction to linear algebra (Sections 1, 3.1, 4.2, 6.1, 6.4, 6.5, 7.3), t-SNE paper, UMAP paper (Section 3) Setzu
4. 30.09 Data Cleaning + Transformations. PyLab: Data Understanding Data Cleaning & Transformations Monreale, Mannocci
5. 01.10 PyLab: Data Understanding + Preparation 1_basics_and_understanding.ipynb.zip 2_feature_engineering_and_data_representation.ipynb.zip data_notebook.zip Monreale, Mannocci
6. 02.10 Similarities + Introduction to Clustering and Centroid-based clustering 6-data_similarity.pdf 6-basic_cluster_analysis-intro.pdf 8-basic_cluster_analysis-kmeans.pdf Monreale
7. 07.10 K-means 8-basic_cluster_analysis-kmeans.pdf Monreale
8. 08.10 Hierarchical Clustering + Density Based Clustering + Validity 9-basic_cluster_analysis-hierarchical.pdf 8.basic_cluster_analysis-dbscan-validity.pdf Monreale
9. 14.10 Clustering evaluation and Python notebooks Clustering validation 3_clustering.ipynb.zip Setzu, Mannocci
10. 15.10 Anomaly detection Slides Setzu
11. 16.10 Anomaly detection Slides, Notebook, Rule extraction from isolation forests Setzu
12. 21.10 Variants of K-means + Association Rule Mining 11-basic_cluster_analysis-kmeans-variants.pdf 17_association_analysis2023.pdf Monreale
13. 22.10 Association Rule Mining: Apriori 17_association_analysis2023.pdf Monreale
14. 23.10 Association Rule Mining: CORELS Slides, Online tool Setzu
15. 28.10 Visual Analytics Slides Code for data visualization with Altair Monreale, Rinzivillo
16. 29.10 Association Rule Mining: FP-Growth + Sequential Pattern Mining FP-Growth SPM Monreale
30.10 Lecture is canceled
17. 04.11 Sequential Pattern Mining with time constraints + Python Lab: FPM + SPM. For SPM, the same set of slides as in the previous lecture 5_patternmining.ipynb.zip Monreale
18. 05.11 Supervised learning and classification Slides Setzu
19. 06.11 Classification: Decision Trees Decision Trees Video Monreale
20. 07.11 Classification: Decision Trees Monreale
21. 11.11 Classification: Decision Trees & evaluation + Decision Rules Evaluation Decision Rules Monreale
22. 12.11 Classification: Decision Rules + Instance based methods + Q&A for Project work 10-knn.pdf Monreale
23. 13.11 Exercises: DT simulation, Clustering, sequences dt-learning-simulation.pdf learnedtree.pdf 2025-ex-clustering.pdf ex-sequences.pdf Monreale
24. 18.11 Advanced Decision Trees, GAMs, and ensemble models Slides Setzu
25. 25.11 Neural networks Slides Setzu
26. 26.11 Time series, Python Supervised Learning & Imbalanced Scenarios Slides supervised_learning.zip data_notebook.zip Setzu, Mannocci
27. 27.11 Time series, Python Supervised Learning & Imbalanced Scenarios Slides, Slides in HTML (w/ working animation) Setzu
28. 02.12 Shapelet-based Classification, Motif discovery Slides shaplet.pdf matrixprofile.pdf Papers and resources on motifs Monreale
29. 03.12 Py: Time Series timeseries.zip Monreale, Mannocci
30. 04.12 Responsible AI: introduction and EU Regulations Slides Monreale
31. 09.12 Responsible AI: privacy. Same slides as the previous lecture chap-anonymity.pdf MIA attack against ML Monreale
32. 10.12 Responsible AI: Explainable AI XAI Digital book where students can find some basic XAI models and notions XAI Survey describing the taxonomy and dimensions of XAI LORE approach, ABELE approach LASTS SHAP LIME Monreale
33. 11.12 XAI Python Notebook + Private and explainable FL, Assessing privacy in XAI XAI Notebook Slides GLOR-FLEX FASTSHAP++ REVEAL Naretto
34. 16.12 Project Presentations - second check - ONLINE - MANDATORY
35. 17.12 Project Presentations - second check - ONLINE - MANDATORY
36. 18.12 Project Presentations - second check - ONLINE - MANDATORY

Exam

The exam can be taken in one of two ways:

Project track:

During the course, there will be some “Project presentation” sessions in which you’ll briefly (~3 minutes) present your work and receive feedback from the lecturers. These sessions do not contribute to your grade.

Written test track:

Note that a passing grade for the project/written exam is required to be admitted to the oral exam.

Project Guidelines: A project consists of data analyses carried out with data mining tools. The project must be carried out by a team of 3 students, using Python. The guidelines require addressing specific tasks. Results must be reported in a single paper, whose total length must not exceed 25 pages of text, including figures. Students must deliver both the paper (single column) and well-commented Python notebooks.

Specifically, if any of these tasks appear in the project track, make sure to focus on the following:

Data understanding

Clustering Analysis

Anomaly detection

Time series analysis

Supervised learning

Explainability

Project and Deadlines

Information about the dataset to be analyzed and the project description:

Previous years

Data Mining (309AA) - 9 CFU A.Y. 2024/2025

Data Mining (309AA) - 9 CFU A.Y. 2023/2024

DM-INF 2022-2023

Data Mining (309AA) - 9 CFU A.Y. 2021/2022

Data Mining (309AA) - 9 CFU A.Y. 2020/2021

DM-2019/20