Data Mining A.A. 2025/26

DM1 - Data Mining: Foundations (6 CFU)

Instructors:

Dino Pedreschi
- KDDLab, Università di Pisa
- http://www-kdd.isti.cnr.it
- dino [dot] pedreschi [at] unipi [dot] it

Riccardo Guidotti
- KDDLab, Università di Pisa
- https://kdd.isti.cnr.it/people/guidotti-riccardo
- riccardo [dot] guidotti [at] di [dot] unipi [dot] it

Teaching Assistant

Alessio Cascione
- KDDLab, Università di Pisa
- https://www.linkedin.com/in/alessio-cascione-a77224159/?originalSubdomain=it
- alessio [dot] cascione [at] phd [dot] unipi [dot] it

DM2 - Data Mining: Advanced Topics and Applications (6 CFU)

Instructors:

Riccardo Guidotti
- KDDLab, Università di Pisa
- https://kdd.isti.cnr.it/people/guidotti-riccardo
- riccardo [dot] guidotti [at] di [dot] unipi [dot] it

Teaching Assistant

Alessio Cascione
- KDDLab, Università di Pisa
- https://www.linkedin.com/in/alessio-cascione-a77224159/?originalSubdomain=it
- alessio [dot] cascione [at] phd [dot] unipi [dot] it

News

[04.05.2026] Despite the stop, we can keep the planned lectures for these days with the following schedule: (a) tomorrow, Tuesday 5/5, we start at 14.00 sharp and end at 15.30 without a break; (b) Thursday 7/5 we start at 11.00 sharp and end at 12.30 without a break (as usual); (c) next Monday 11/5 we can have the regular lecture 9.15-11.00.
[10.03.2026] The lecture of 09.03.2026 will be recovered on Wed 18.03.2026, 11-13, in Room H. The lecture of 23.03.2026 will be recovered on Wed 25.03.2026, 11-13, in Room D3.
[17.12.2025] DM exam registration instructions are available in the Exams section.
[01.12.2025] The lecture of Thursday 04/12/2025 is moved to Friday 05/12/2025, 9-11, in room C (the project presentation of Prof.ssa Pierotti will start at 11, after the DM lecture). The last lecture will be held on Tuesday 09/12/2025, 9-11, in room M1 (as Monday 08/12/2025 is a holiday), while the P4DS lecture is moved to 09/12/2025, 16-18, in room C1.
[19.11.2025] The lecture of Thursday 20/11/2025 will be held in room N1 because room E is unavailable.
[07.10.2025] The lecture of Thursday 09/10/2025 is canceled due to the UniPi Orienta event. The recovery lecture is on Tuesday 14/10/2025, 9-11, in room M1.
[06.10.2025] Link to Project Groups Registration DM1 [25/26] (max 3 students per group; access with your University of Pisa account; deadline 17/10/2025): Link
[28.07.2025] Lectures will start on Monday 29 September 2025 at 09.00 in room E. Lectures will be held in person only. Recordings of past years' lectures can be found at the bottom of this web page.

Learning Goals

DM1
- Fundamental concepts of knowledge discovery from data
- Data understanding
- Data preparation
- Clustering
- Classification
- Pattern Mining and Association Rules
- Sequential Pattern Mining

DM2
- Outlier Detection
- Dimensionality Reduction
- Regression
- Advanced Classification and Regression
- Time Series Analysis
- Transactional Clustering
- Explainability

Hours and Rooms

DM1

Classes

Day of Week	Hour	Room
Monday	09:00 - 11:00	E
Thursday	09:00 - 11:00	E

Office hours:

Prof. Pedreschi
- Monday 15:00-17:00 or Appointment by email
- Room 318 Dept. of Computer Science or MS Teams

Prof. Guidotti
- Thursday 16:00 - 18:00 or Appointment by email
- Room 363 Dept. of Computer Science or MS Teams

Alessio Cascione
- Appointment by email

DM2

Classes

Day of Week	Hour	Room
Monday	09:00 - 11:00	C
Thursday	11:00 - 13:00	C

Office hours:

Tuesday 15.00-17.00 or Appointment by email
Room 363 Dept. of Computer Science or MS Teams

Learning Material

Textbooks

Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Introduction to Data Mining. Addison Wesley, ISBN 0-321-32136-7, 2006
- http://www-users.cs.umn.edu/~kumar/dmbook/index.php
- Chapters 3, 5 and 7 are also available on the publisher's website.
Berthold, M.R., Borgelt, C., Höppner, F., Klawonn, F. Guide to Intelligent Data Analysis. Springer Verlag, 1st Edition, 2010. ISBN 978-1-84882-259-7
Laura Igual et al. Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications. 1st Edition, 2017.
Jake VanderPlas. Python Data Science Handbook: Essential Tools for Working with Data. 1st Edition.

Slides

The slides used in the course will be added to the calendar after each class. Most of them are taken from the slides provided by the textbook's authors: Slides for "Introduction to Data Mining".

FAQ

For the academic year 2025/2026, we make available a document containing frequently asked questions (FAQs) about the project at the end of the lecture. Please consult this document first, as your question may already be answered there. The FAQ will be updated regularly after each lecture with new relevant questions from students.

Check the document: https://docs.google.com/document/d/1OLa02xofxRPj1zUJ7zm_boxL_ZeAFR1HWCB4lgozgz8/edit?usp=sharing

Recordings of Past Years

Link to past years' recordings (incrementally updated to follow the current lectures of the course):

https://unipiit-my.sharepoint.com/:f:/g/personal/a_cascione_studenti_unipi_it/IgCdnqZe6wTKQJR_4yVrXE3gAcmqWHBSxvxW0HtsA596LWQ?e=OCa34K

Software

Python - Anaconda (>3.7): Anaconda is the leading open data science platform powered by Python. Download page (the following libraries are already included)
Scikit-learn: Python library with tools for data mining and data analysis. Documentation page
Pandas: pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Documentation page

Other Software for Data Mining

KNIME: The Konstanz Information Miner. Download page
WEKA: Data Mining Software in Java. University of Waikato, New Zealand. Download page
Didactic Data Mining: DDMv1, DDMv2

Class Calendar (2025/2026)

First Semester (DM1 - Data Mining: Foundations)

	Day	Time	Room	Topic	Material	Lecturer
	15.09.2025			No Lecture
	18.09.2025			No Lecture
	22.09.2025			No Lecture
	25.09.2025			No Lecture
01.	29.09.2025	09-11	E	Overview, Introduction	Intro	Pedreschi
02.	02.10.2025	09-11	E	The KDD process	Intro	Pedreschi
03.	06.10.2025	09-11	E	Introduction to Python	06.10.25_python_basic_2025_lecture_in_class.zip	Pedreschi, Cascione
	09.10.2025			No Lecture (UNIPI Orienta)
04.	13.10.2025	09-11	E	Data Understanding	Data Understanding	Pedreschi
05.	14.10.2025	09-11	C1	Data Preparation	Data Preparation, Data Similarity	Guidotti
06.	16.10.2025	09-11	E	Data Understanding Lab	16.10.25_data_understanding_2025_lecture_in_class.zip	Guidotti, Cascione
07.	20.10.2025	09-11	E	Data Similarity and Introduction to Clustering	Data Similarity, Introduction to Clustering	Guidotti
08.	23.10.2025	09-11	E	Centroid-based Clustering Algorithm	Centroid-based Clustering	Guidotti
09.	27.10.2025	09-11	E	Hierarchical Clustering Algorithm	Hierarchical Clustering	Guidotti
10.	27.10.2025	09-11	E	Density-based Clustering Algorithm	Density-based Clustering	Guidotti
11.	03.11.2025	09-11	E	Clustering Lab	03.11.25_clustering_2025_lecture_in_class.zip	Pedreschi, Cascione
12.	04.11.2025	09-11	C1	Classification: Overview and K-Nearest Neighbours	Classification Overview KNN Classifier	Pedreschi
13.	06.11.2025	09-11	E	Classification: Naive Bayes Classifier and Exercises	Naive Bayes	Pedreschi
14.	10.11.2025	09-11	E	Classification: Evaluation	Model evaluation	Pedreschi
15.	13.11.2025	09-11	E	Classification: Decision Trees (1)	Decision trees	Pedreschi
16.	17.11.2025	09-11	D5	Classification: Decision Trees (2)		Pedreschi
17.	18.11.2025	09-11	C1	Classification: Decision Trees (3)		Pedreschi
18.	20.11.2025	09-11	N1	Classification Lab	20.11.25_classification_2025_lecture_in_class.zip	Guidotti, Cascione
19.	24.11.2025	09-11	E	Pattern Mining: Apriori	Pattern mining & association rules	Pedreschi
20.	25.11.2025	09-11	C	Pattern Mining: Lift, Interest, Multiattribute		Pedreschi
21.	27.11.2025	09-11	E	Regression: Problem, Linear, KNN, Decision Tree	Regression	Pedreschi
22.	01.12.2025	09-11	E	Lab on Regression and Pattern Mining; FPGROWTH	01.12.25_regression_2025_lecture_in_class.zip, 01.12.25_pattern_mining_2025_lecture_in_class.zip, FPGROWTH	Guidotti, Cascione
23.	04.12.2025	09-11	C	Exercises Pattern Mining & Decision Trees		Guidotti
24.	09.12.2025	09-11	M1	Rule-based Classifiers	Rule-Based Classifier	Guidotti

Second Semester (DM2 - Data Mining: Advanced Topics and Applications)

	Day	Time	Room	Topic	Material	Lecturer
01.	16.02.2026	9-11	C	Overview, Imbalanced Learning	Introduction, Imbalanced Learning, ImbLearLab	Guidotti
02.	19.02.2026	11-13	C	Imbalanced Learning, Dimensionality Reduction	Imbalanced Learning, ImbLearLab, Dimensionality Reduction	Guidotti
03.	23.02.2026	9-11	C	Imbalanced Learning	Dimensionality Reduction, DimRedLab	Guidotti
04.	26.02.2026	11-13	C	Outlier Detection	Outlier Detection, OutDetLab	Guidotti
05.	02.03.2026	9-11	C	Outlier Detection	Outlier Detection, OutDetLab	Guidotti
06.	05.03.2026	11-13	C	Outlier Detection	Outlier Detection, OutDetLab	Guidotti
	09.03.2026	9-11	C	Canceled.		Guidotti
07.	12.03.2026	11-13	C	Gradient Descent, Maximum Likelihood Estimation, Odds, Log Odds, Logistic Regression Intro	GD, MLE, Odds	Guidotti
08.	16.03.2026	9-11	C	Logistic Regression, (Linear)SVM	LogReg, SVM LabLogReg, LabSVM	Guidotti
09.	18.03.2026	11-13	H	(NonLinear)SVM, MulticlassSVM	SVM, LabSVM, Video	Guidotti
10.	19.03.2026	11-13	C	Perceptron, Deep Neural Networks	Perceptron, DeepNN, LabNN	Guidotti
	25.03.2026	11-13	D3	Canceled.		Guidotti
11.	25.03.2026	11-13	C	Deep Neural Networks	DeepNN, LabNN, Video	Guidotti
12.	26.03.2026	11-13	C	Ensemble Models: Bagging and Random Forest	EnsembleModels, LabEnsemble	Guidotti
13.	30.03.2026	9-11	C	Ensemble Models: Boosting	EnsembleModels, GradientBoosting, LabEnsemble	Guidotti
14.	09.04.2026	11-13	C	Ensemble Models: Boosting, Introduction to XAI	GradientBoosting, LabEnsemble, XAI	Guidotti
15.	13.04.2026	9-11	C	Explainable Artificial Intelligence	XAI, LabXAI	Guidotti
16.	16.04.2026	11-13	C	Transactional Clustering	Transactional Clustering	Guidotti
17.	20.04.2026	9-11	C	Sequential Pattern Mining	SeqPatternMining	Guidotti
18.	23.04.2026	11-13	C	Sequential Pattern Mining	SeqPatternMining, LabGSP	Guidotti
19.	27.04.2026	9-11	C	Time Series Analytics Introduction	TimeSeriesPreprocessing, LabTSPrep	Guidotti
	30.04.2026	11-13	C	No Lecture		Guidotti
20.	04.05.2026	9-11	C	Time Series Distances	TimeSeriesDistances, LabTSDist	Guidotti
21.	05.05.2026	14-16	C	Time Series Approximation and Clustering	TimeSeriesApproxClustering, LabTSAppClus	Guidotti
22.	07.05.2026	11-13	C	Time Series Motif and Discords	Matrix Profile, LabMatrixProfile	Guidotti
23.	11.05.2026	9-11	C	Time Series Classification (part 1)	TimeSeriesClassification, LabTSClf	Guidotti
24.	14.05.2026	11-13	C	Time Series Classification (part 2)	TimeSeriesClassification, LabTSClf	Guidotti

Exams

How and Where: The exam is oral only and takes place in the teacher's office or in a previously designated classroom. It will be held online, on the 420AA Data Mining course channel, only at the student's request and in accordance with current legislation.

When: The start dates of the three exams are/will be published on the online platform https://esami.unipi.it/. Within each session, we will identify dates and slots in order to distribute the oral exams. The dates and slots for taking the exam will be published on the course page by the end of May. Each student must also register on https://esami.unipi.it/. The exam can only be taken after the project has been delivered. The project must be delivered one week before the date on which you want to take the exam. Oral exams taken together by the members of a project group are preferred, so that the discussion of the project can be carried out in parallel. It is not mandatory to take the oral exam together with the other members of the group. If the oral exam is not passed, it cannot be retaken until the next exam session. If the project is not considered sufficient, it must be carried out again on a new dataset or on a substantially updated version of the current one.

What: The oral test will evaluate the practical understanding of the algorithms. The exam will evaluate three aspects:

1. Understanding of the theoretical aspects of the topics addressed during the course. The student may be required to write formulas or pseudocode. During the explanations, the student can use pen and paper.
2. Understanding of the algorithms illustrated during the course and their practical implementation. The student will be asked to solve one or more simple exercises, using pen and paper to show how each exercise is solved.
3. Discussion of the project, with questions from the teacher regarding unclear aspects and questionable steps or choices.

Final Mark: for the 12-credit exam, the final mark is the average of the DM1 and DM2 marks.

Exam Enrollment Instructions

If you are a first-year Data Science student, register here: here
Otherwise (not a first-year Data Science student, or enrolled in another degree such as Digital Humanities), register here
Deadline: 01/02/2026
Oral exams will start on 05/02/2026
Between 01/02/2026 and 05/02/2026, all registered students will receive an email with a link to an agenda for selecting the exam day and time slot.

Exam DM1

The exam is composed of two parts:

An oral exam, which includes: (1) discussing the project report; (2) discussing the topics presented during the classes, including theory and practical exercises.

A project, which consists of exercises requiring the use of data mining tools for data analysis. Exercises include: data understanding, clustering analysis, pattern mining, and classification (see the guidelines for more details). The project has to be carried out by groups of min 2, max 3 people, using Python or any other data mining software. The results of the different tasks must be reported in a single paper. The total length of this paper must be max 20 pages of text, including figures. The paper must be emailed to alessio [dot] cascione [at] phd [dot] unipi [dot] it and riccardo [dot] guidotti [at] unipi [dot] it. Please use “[DM1 2025-2026] Project” in the subject.

Dataset
1. Assigned: 15/10/2025
2. MidTerm Submission: 15/11/2025 (+0.5) (half project required, i.e., Data Understanding & Preparation and Clustering)
3. Final Submission: 31/12/2025 (+0.5) one week before the oral exam (complete project required).
4. Dataset: Download here dm1_25_26_dataset.zip

DM1 Project Guidelines: see dm1_project_guidelines_25_26.pdf

Exam DM2