magistraleinformatica:dmi:start
Differences
These are the differences between the selected revision and the current version of the page.
| magistraleinformatica:dmi:start [18/09/2024 at 09:33 (13 months ago)] – Update teaching material, and office hours Mattia Setzu | magistraleinformatica:dmi:start [22/10/2025 at 16:54 (33 hours ago)] (current version) – added corels lecture Mattia Setzu | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | ====== Data Mining (309AA) - 9 CFU A.Y. 2024/2025 ====== | + | ====== Data Mining (309AA) - 9 CFU A.Y. 2025/2026 ====== |
| **Instructors: | **Instructors: | ||
| Line 12: | Line 12: | ||
| * * **Lorenzo Mannocci** | * * **Lorenzo Mannocci** | ||
| * University of Pisa | * University of Pisa | ||
| - | * [[lorenzo.mannocci@phd.unipi.it]] | + | * [[lorenzo.mannocci@di.unipi.it]] |
| ====== News ====== | ====== News ====== | ||
| - | * [14.09.2024] ** The lectures will start on 19th September 2024** | + | [23-09-2025]: Please register yourself and your group for the project |
| ====== Learning Goals ====== | ====== Learning Goals ====== | ||
| - | * Fundamental concepts | + | The Data Mining course tackles the analysis of large collections |
| - | | + | * Data understanding |
| - | | + | * Data cleaning, |
| - | | + | * Data analysis: outlier detection and data representation |
| - | | + | * Data clustering |
| - | | + | * Pattern |
| - | | + | * Inference models: trees, and ensemble models |
| - | * Time Series Analysis | + | * Responsible data use: privacy and interpretability |
| - | | + | |
| - | * Ethical Issues | + | |
| ====== Schedule ====== | ====== Schedule ====== | ||
| Line 34: | Line 32: | ||
| ^ Day of Week ^ Hour ^ Room ^ | ^ Day of Week ^ Hour ^ Room ^ | ||
| - | | Tuesday | + | | Tuesday |
| - | | | + | | |
| - | | | + | | |
| **Office hours - Ricevimento: | **Office hours - Ricevimento: | ||
| - | * Anna Monreale: TBD | + | * Anna Monreale: |
| * Mattia Setzu: Infos on [[https:// | * Mattia Setzu: Infos on [[https:// | ||
| - | A [[https:// | + | A [[ https:// |
| ====== Teaching Material ====== | ====== Teaching Material ====== | ||
| Line 68: | Line 67: | ||
| The slides used in the course will be inserted in the calendar after each class. Some are part of the slides provided by the textbook' | The slides used in the course will be inserted in the calendar after each class. Some are part of the slides provided by the textbook' | ||
| + | ====== Class Calendar (2025/2026) ====== | ||
| + | |||
| + | ===== First Semester | ||
| + | |||
| + | ^ ^ Day ^ Topic ^ Teaching material ^ References ^ Teacher ^ | ||
| + | |1. | 18.09 | Course Overview. Introduction to Data Mining | {{ : | ||
| + | | | 23.09 | Canceled for Teacher' | ||
| + | |2. | 24.09 | Data Understanding + Data Preparation | ||
| + | |3. | 25.09 | Data representation | ||
| + | |4. | 30.09 | Data Cleaning + Transformations. PyLab: Data Understanding | ||
| + | |5. | 01.10 | PyLab: Data Understanding + Preparation | ||
| + | |6. | 02.10 | Similarities + Introduction to Clustering and Centroid-based clustering | ||
| + | |7. | 07.10 | K-means | ||
| + | |8. | 08.10 | Hierarchical Clustering + Density Based Clustering + Validity | ||
| + | | 9. | 14.10 | Clustering evaluation and Python notebooks | {{ https:// | ||
| + | | 10. | 15.10 | Anomaly detection | {{ https:// | ||
| + | | 11. | 16.10 | Anomaly detection | {{ https:// | ||
| + | |12. | 21.10 | Variants of K-means + Association Rule Mining | {{ : | ||
| + | |14. | 23.10 | Association Rule Mining: CORELS | {{ https:// | ||
| + | |||
| + | |||
| + | |||
| - | | ||
| - | **Software** | ||
| - | Software material available in the [[https:// | ||
| - | ====== Class Calendar (2024/2025) ====== | ||
| - | ===== First Semester | + | ====== |
| + | |||
| + | The exam can be taken in one of two ways: | ||
| + | |||
| + | **Project track**: | ||
| + | * Project (70% of the final score) to be delivered after the end of the course | ||
| + | * Oral exam (30% of the final score) | ||
| + | During the course, you will have some “Project presentation” sessions wherein you’ll briefly (~3 minutes) present your work, and receive feedback from the lecturers. These sessions do not contribute to your grade. | ||
| + | |||
| + | **Written test track** | ||
| + | * Written exam (70% of the final score): taken during the exam sessions after the end of the course; it can include both theoretical questions and exercises. | ||
| + | * Oral exam (30% of the final score) | ||
| + | Note that a passing grade for the project/ | ||
| + | |||
| + | **Project Guidelines: | ||
| + | A project consists of data analyses carried out with data mining tools. | ||
| + | The project must be carried out by a team of 3 students, using Python. The guidelines require addressing specific tasks. Results must be reported in a single paper of at most 25 pages of text, including figures. Students must deliver both the paper (single column) and well-commented Python notebooks. | ||
| + | |||
| + | Specifically, | ||
| + | |||
| + | **Data understanding** | ||
| + | * An analysis of all variables, their relations, distributions, | ||
| + | * Feature imputation and/or selection, where needed | ||
| + | * The engineering of additional features, including the aforementioned analyses | ||
| + | |||
| + | **Clustering Analysis** | ||
| + | * A properly justified feature selection phase | ||
| + | * Tackling all clustering families, exploring their respective hyperparameters | ||
| + | * An analysis of the best clusterings per family, including cluster description | ||
| + | * A comparison of the best clusterings per family | ||
| + | |||
| + | **Anomaly detection** | ||
| + | * A selection of outliers through appropriate algorithms | ||
| + | * An interpretation of such outliers | ||
| + | * An analysis of the impact of the outliers on the previously performed data understanding | ||
| + | |||
| + | **Time series analysis** | ||
| + | * Appropriate representation choice for the task at hand | ||
| + | |||
| + | **Supervised learning** | ||
| + | * Feature selection | ||
| + | * Test different families of models | ||
| + | * Proper model validation, including both model performance and model complexity | ||
| + | * Comparison of the best models of each family | ||
| + | |||
| + | **Explainability** | ||
| + | |||
| + | * Justified selection of instances to explain | ||
| + | * Analysis of the explanations | ||
| + | |||
| + | **Project and Deadlines** | ||
| + | Information about the dataset to be analyzed and project description: | ||
| + | * **Dataset.** https:// | ||
| + | * **Project description.** {{ : | ||
| + | * **Project Question & Answers.** https:// | ||
| + | * **Deadline.** | ||
| + | * **Delivery instructions.** | ||
| - | ^ ^ Day ^ Topic ^ Learning material ^ References ^ Video Lectures ^ Teacher ^ | ||
| - | | | 17.09 | Canceled | ||
| - | |1. | 19.09 | Overview. Introduction to KDD | ||
| - | | ||
| - | ====== Exams ====== | ||
| - | TBD | ||
| ====== Previous years ===== | ====== Previous years ===== | ||
| + | [[DM-INF 2024-2025]] | ||
| + | |||
| [[DM-INF 2023-2024]] | [[DM-INF 2023-2024]] | ||
magistraleinformatica/dmi/start.1726651990.txt.gz · Last modified: 18/09/2024 at 09:33 (13 months ago) by Mattia Setzu
