Data Analytics for Digital Health (DAD) - 9 CFU A.Y. 2025/2026
Instructors:
- Anna Monreale (KDDLab, Università di Pisa)
- Francesca Naretto (KDDLab, Università di Pisa)
News
- [08.09.2025] - Lectures of the first week are canceled; classes will start on 22 September 2025.
- [30.09.2025] - All students must fill in this document with their exam information.
Learning Goals
- Fundamental concepts of knowledge discovery from data
- Data types in healthcare data and public databases
- Data understanding
- Data preparation
- Clustering
- Classification
- Rule-based methods
- Outlier Detection
- Time Series Analysis
- Sequential Pattern Mining
Hours and Rooms
Classes
Day of Week | Hour | Room |
---|---|---|
Monday | 09:00 - 11:00 | Room FIB PS4 |
Tuesday | 14:00 - 16:00 | Room C1 |
Friday | 11:00 - 13:00 | Room FIB PS4 |
Office hours:
- Anna Monreale: TBD, online via Teams or in her office (appointment by email).
- Francesca Naretto: TBD, online via Teams or in her office (appointment by email).
A Teams channel will be used ONLY for news, Q&A, and other course-related communication. Lectures are held in person only and will NOT be live-streamed.
Learning Material
Textbook
- Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Introduction to Data Mining. Addison-Wesley, ISBN 0-321-32136-7, 2006
- Chapters 4, 6, and 8 are also available on the publisher's website.
- Jake VanderPlas. Python Data Science Handbook: Essential Tools for Working with Data. 1st Edition.
- For Python basics: Very basic notions on Python
Slides
- The slides used in the course will be added to the calendar after each class. Most of them are taken from the slides provided by the textbook's authors: Slides for "Introduction to Data Mining".
Software
- Python - Anaconda (Python 3.7 or later is required): Anaconda is an open data science platform powered by Python. Download page (the libraries below are already included).
- Scikit-learn: Python library with tools for data mining and data analysis. Documentation page
- Pandas: an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Documentation page
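Before the first lab, a quick check like the one below can confirm that the installed Python meets the minimum version and that scikit-learn and pandas are available. This is a minimal sketch, not official course material: it assumes the "3.7" requirement refers to the Python version, the function name `check_environment` is hypothetical, and it only checks whether the packages can be found (via `importlib.util.find_spec`) without importing them.

```python
import importlib.util
import sys

# Assumption: the 3.7 minimum on this page refers to the Python version.
MIN_PYTHON = (3, 7)
# Import names of the libraries listed above (scikit-learn is imported as "sklearn").
REQUIRED_PACKAGES = ["sklearn", "pandas"]


def check_environment(packages=REQUIRED_PACKAGES, min_python=MIN_PYTHON):
    """Hypothetical helper: return (python_ok, {package: found}).

    Packages are located with importlib.util.find_spec, so nothing is imported.
    """
    python_ok = sys.version_info[:2] >= tuple(min_python)
    found = {name: importlib.util.find_spec(name) is not None for name in packages}
    return python_ok, found


if __name__ == "__main__":
    python_ok, found = check_environment()
    print(f"Python {sys.version.split()[0]}: {'OK' if python_ok else 'too old'}")
    for name, ok in found.items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

The script can be run from the Anaconda Prompt or pasted into a notebook cell; in a notebook, `__name__` is `"__main__"`, so the report is printed as well.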
Class Calendar (2025/2026)
First Semester
| No. | Day | Topic | Learning material | References | Teacher |
|---|---|---|---|---|---|
| | 22.09 | Strike | | | |
| | 23.09 | Canceled due to the teacher's health issues | | | |
| 1. | 26.09 | Overview. Introduction to Data Analytics for DH + Data Types | Overview 1-intro-da-dm-tecs.pdf | Chap. 1 Kumar Book | Monreale |
| 2. | 29.09 | Data Understanding TD | Data Understanding | Chap. 2 Kumar Book and additional resource of the Kumar Book: Data Exploration chapter (Chap. 3 in the first edition) | Naretto |
| 3. | 30.09 | Data Preparation TD | Data Preparation | Chap. 2 Kumar Book and additional resource of the Kumar Book: Data Exploration chapter (Chap. 3 in the first edition) | Monreale |
| 4. | 01.10 (Room I) | Python Lab: Data Understanding & Preparation TD | | | Naretto |
| 5. | 03.10 | Project Presentation + Data Understanding and Preparation for Images | | | Naretto |
| 6. | 06.10 | Data Understanding and Preparation for TS | | | Naretto |
| 7. | 07.10 | Data Understanding and Preparation for TS | | | Naretto |
| 8. | 08.10 (Room I) | Python Lab: Data Understanding and Preparation for Images and Time Series | | | Naretto |
| | 10.10 | Suspension of teaching activities | | | |
| 9. | 13.10 | Intro to Clustering. Centroid-based clustering | | | Monreale |
| 10. | 14.10 | Density-based clustering + Clustering Validity | | | Monreale |
| 11. | 17.10 | Python Lab: K-means + DBSCAN | | | Monreale |
| | 20.10 | Canceled: this lecture will be recovered on 22.10 | | | |
| | 21.10 | Canceled: this lecture will be recovered in December | | | |
| 12. | 22.10 (Room TBD) | Hierarchical Clustering + k-means variants + Python Lab | | | |
| 13. | 24.10 | Lab for Project Work | | | |
| 14. | 27.10 | Clustering and similarity for Images | | | |
| 15. | 28.10 | Clustering and similarity for Time Series | | | |
| 16. | 31.10 | Python Lab: Clustering Images, Time Series | | | |
| 17. | 03.11 | | | | |
| 18. | 04.11 | | | | |
| 19. | 07.11 | | | | |
| 20. | 10.11 | | | | |
| 21. | 11.11 | | | | |
| 22. | 14.11 | | | | |
| 23. | 17.11 | | | | |
| 24. | 18.11 | | | | |
| 25. | 21.11 | | | | |
| 26. | 24.11 | | | | |
| 27. | 25.11 | | | | |
| 28. | 28.11 | | | | |
| 29. | 01.12 | | | | |
| 30. | 02.12 | | | | |
| 31. | 05.12 | | | | |
| 32. | 09.12 | | | | |
| 33. | 12.12 | | | | |
| 34. | 15.12 | | | | |
| 35. | 16.12 | | | | |
| 36. | 19.12 | | | | |
Exams
For students who complete the project during the course and meet all intermediate and final deadlines set by the instructors, the exam consists of a group project (in teams of two or three) and an oral exam, which includes a discussion of the project and an assessment of the theoretical knowledge acquired.
Alternatively, students who do not complete or submit the project within the established deadlines will be required to take a written exam and an oral exam covering all course topics.
PROJECT
The project consists of data analyses carried out with data mining tools. It must be done by a team of two or three students, using Python. The guidelines specify the tasks to be addressed. Results must be reported in a single paper (single column) of at most 25 pages, including figures. Students must deliver both the paper and well-commented Python notebooks.