dm:start

Differences

These are the differences between the selected revision and the current version of the page.

dm:start [30/11/2021 at 08:25 (4 years ago)] – [Exam DM1] Fosca Giannotti dm:start [19/05/2025 at 09:04 (8 weeks ago)] (current version) – [Second Semester (DM2 - Data Mining: Advanced Topics and Applications)] Riccardo Guidotti
Line 1: Line 1:
-<html> +====== Data Mining A.A. 2024/25 ======
-<!-- Google Analytics --> +
-<script type="text/javascript" charset="utf-8"> +
-(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ +
-(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), +
-m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) +
-})(window,document,'script','//www.google-analytics.com/analytics.js','ga'); +
- +
-ga('create', 'UA-34685760-1', 'auto', 'personalTracker', {'allowLinker': true}); +
-ga('personalTracker.require', 'linker'); +
-ga('personalTracker.linker:autoLink', ['pages.di.unipi.it', 'enforce.di.unipi.it', 'didawiki.di.unipi.it'] ); +
-   +
-ga('personalTracker.require', 'displayfeatures'); +
-ga('personalTracker.send', 'pageview', 'ruggieri/teaching/dm/'); +
-setTimeout("ga('send','event','adjusted bounce rate','30 seconds')",30000);  +
-</script> +
-<!-- End Google Analytics --> +
-<!-- Global site tag (gtag.js) - Google Analytics --> +
-<script async src="https://www.googletagmanager.com/gtag/js?id=G-LPWY0VLB5W"></script> +
-<script> +
-  window.dataLayer = window.dataLayer || []; +
-  function gtag(){dataLayer.push(arguments);} +
-  gtag('js', new Date()); +
- +
-  gtag('config', 'G-LPWY0VLB5W'); +
-</script> +
-<!-- Capture clicks --> +
-<script> +
-jQuery(document).ready(function(){ +
-  jQuery('a[href$=".pdf"]').click(function() { +
-    var fname = this.href.split('/').pop(); +
-    ga('personalTracker.send', 'event',  'DM', 'PDFs', fname); +
-  }); +
-  jQuery('a[href$=".r"]').click(function() { +
-    var fname = this.href.split('/').pop(); +
-    ga('personalTracker.send', 'event',  'DM', 'Rs', fname); +
-  }); +
-  jQuery('a[href$=".zip"]').click(function() { +
-    var fname = this.href.split('/').pop(); +
-    ga('personalTracker.send', 'event',  'DM', 'ZIPs', fname); +
-  }); +
-  jQuery('a[href$=".mp4"]').click(function() { +
-    var fname = this.href.split('/').pop(); +
-    ga('personalTracker.send', 'event',  'DM', 'Videos', fname); +
-  }); +
-  jQuery('a[href$=".flv"]').click(function() { +
-    var fname = this.href.split('/').pop(); +
-    ga('personalTracker.send', 'event',  'DM', 'Videos', fname); +
-  }); +
-}); +
-</script> +
-</html> +
-====== Data Mining A.A. 2021/22 ======+
  
 ===== DM1 - Data Mining: Foundations (6 CFU) ===== ===== DM1 - Data Mining: Foundations (6 CFU) =====
Line 61: Line 9:
     * [[dino.pedreschi@unipi.it]]       * [[dino.pedreschi@unipi.it]]  
  
-  * **Mirco Nanni** +  * **Riccardo Guidotti** 
-    * KDDLab, ISTI - CNR, Pisa +    * KDDLab, Università di Pisa 
-    * [[http://www-kdd.isti.cnr.it]] +    * [[https://kdd.isti.cnr.it/people/guidotti-riccardo]]    
-    * [[mirco.nanni@isti.cnr.it]]  +    * [[riccardo.guidotti@di.unipi.it]]
  
 Teaching Assistant Teaching Assistant
-  * **Salvatore Citraro**+  * **Andrea Fedele**
     * KDDLab, Università di Pisa     * KDDLab, Università di Pisa
-    * [[http://www-kdd.isti.cnr.it]] +    * [[https://www.linkedin.com/in/andrea-fedele/?originalSubdomain=it]] 
-    * [[salvatore.citraro@phd.unipi.it]]  +    * [[andrea.fedele@phd.unipi.it]]  
 ===== DM2 - Data Mining: Advanced Topics and Applications (6 CFU) ===== ===== DM2 - Data Mining: Advanced Topics and Applications (6 CFU) =====
  
Line 79: Line 27:
     * [[riccardo.guidotti@di.unipi.it]]     * [[riccardo.guidotti@di.unipi.it]]
  
 +Teaching Assistant 
 +  * **Andrea Fedele** 
 +    * KDDLab, Università di Pisa 
 +    * [[https://www.linkedin.com/in/andrea-fedele/?originalSubdomain=it]] 
 +    * [[andrea.fedele@phd.unipi.it]]   
 +    * Meeting: https://calendly.com/andreafedele/
 ====== News ====== ====== News ======
-     * **[14.10.2021] The class planned for Monday 18.10.2021 is cancelled due to a public ceremony.** +     * **[11.03.2025]** The lecture of DM2 planned for the 14/03/2025 will be held in Room C instead of in Room E.
-     * [06.09.2021] The first lesson will be held on 16/09/2021. +     * [04.03.2025] The sixth lecture of DM2 planned for the 04/03/2025 will be in Room C instead of in Room E.
 +     * [27.01.2025] The first lecture of DM2 will be held on 18.02.2025 in Room A1, swapping with S4DS, which will be held on 17.02.2025 in Room E.
 +     * [07.01.2025] Exams Registration Instructions for DM1 (second term):  
 +        - Use the Google registration form: [[https://forms.gle/NuAvCa3YK2h8MgrX7|here]] before the 23/01/2025.  
 +        - When the registration closes you will receive a link to the Agenda 
 +        - Register on the Agenda selecting day and time (do not change your choice or cancel: if you book, you are committing to take the exam) 
 +        - Submit the project at least 1 week before the day you selected in the Agenda. 
 +     * [03.12.2024] This year's lectures are available at [[https://unipiit-my.sharepoint.com/:f:/g/personal/a_fedele7_studenti_unipi_it/Er7vET5iUWtGhScjXe7XzHUBDd3aYv8j87VYil6moFVyzw|link]] 
 +     * [07.09.2024] Past years' lectures are available at [[https://unipiit-my.sharepoint.com/:f:/g/personal/a_fedele7_studenti_unipi_it/EkecHQpnojVLqX0OqTlfrbMBBRMFbIJfNCw_RdFPN2276g?e=Y2uIcu|link]] 
 +     * [02.09.2024] Lectures will start on Monday 30 September 2024 at 11.00 in room C1. 
 +     * [02.09.2024] Lectures will be in presence only. Recordings of the lectures of past years can be found at the bottom of this web page. 
 +     * [02.09.2024] Project Groups [[https://docs.google.com/spreadsheets/d/1RFWIwKM5Myaehh4tHceaf3olMYm_CktGvoNOFX2Oovc/edit?usp=sharing|link]] 
 +     * [11.09.2023] MS Teams [[https://teams.microsoft.com/l/team/19%3AMMVIsw09XAOGOcd8-D8dKmNUO2hKXsFKpgkOoiFnwJM1%40thread.tacv2/conversations?groupId=3f7fd5a7-5c84-4930-92e4-0704013877f2&tenantId=c7456b31-a220-47f5-be52-473828670aa1|link]] 
 ====== Learning Goals ====== ====== Learning Goals ======
   * DM1   * DM1
Line 91: Line 56:
      * Classification      * Classification
      * Pattern Mining and Association Rules      * Pattern Mining and Association Rules
-     * Clustering +     * Sequential Pattern Mining
  
   * DM2   * DM2
      * Outlier Detection      * Outlier Detection
-     * Regression and Forecasting +     * Dimensionality Reduction 
-     * Advanced Classification+     * Regression  
 +     * Advanced Classification and Regression
      * Time Series Analysis      * Time Series Analysis
-     * Sequential Pattern Mining 
-     * Advanced Clustering 
      * Transactional Clustering      * Transactional Clustering
-     * Ethical Issues +     * Explainability
  
 ====== Hours and Rooms ====== ====== Hours and Rooms ======
Line 110: Line 74:
  
 ^  Day of Week  ^  Hour  ^  Room  ^  ^  Day of Week  ^  Hour  ^  Room  ^ 
-|  Monday  |  11:00 - 13:00  |  Aula C / [[https://teams.microsoft.com/l/team/19%3aRQK4eHK7Z7ogIuZu84k30riyA7YW6fCTF7f54PblHzc1%40thread.tacv2/conversations?groupId=c101108b-7634-4982-9d61-b38deee14681&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  |  +|  Monday  |  11:00 - 13:00  |  C1   |  
-|  Thursday  |  11:00 - 13:00  |  Aula A1 / [[https://teams.microsoft.com/l/team/19%3aRQK4eHK7Z7ogIuZu84k30riyA7YW6fCTF7f54PblHzc1%40thread.tacv2/conversations?groupId=c101108b-7634-4982-9d61-b38deee14681&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  | +|  Tuesday  |  14:00 - 16:00  |  C1  |
  
 **Office hours - Ricevimento:** **Office hours - Ricevimento:**
  
-  * Prof. Pedreschi: Monday 16:00 - 18:00, Online +  * Prof. Pedreschi 
-  * Prof. Nanni: appointment by email, Online +      * TBD 
 +      * Online 
 +  * Prof. Guidotti 
 +      * Thursday 16:00 - 18:00 or Appointment by email 
 +      * Room 363 Dept. of Computer Science or MS Teams
  
      
Line 125: Line 93:
  
 ^  Day of Week  ^  Hour  ^  Room  ^  ^  Day of Week  ^  Hour  ^  Room  ^ 
-|  Monday  |  14:00 - 16:00  |  [[https://teams.microsoft.com/l/team/19%3aRQK4eHK7Z7ogIuZu84k30riyA7YW6fCTF7f54PblHzc1%40thread.tacv2/conversations?groupId=c101108b-7634-4982-9d61-b38deee14681&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  |  +|  Monday   |  11:00 - 13:00  |  E   |  
-|  Wednesday  |  16:00 - 18:00  |  [[https://teams.microsoft.com/l/team/19%3aRQK4eHK7Z7ogIuZu84k30riyA7YW6fCTF7f54PblHzc1%40thread.tacv2/conversations?groupId=c101108b-7634-4982-9d61-b38deee14681&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  |  +|  Wednesday  |  09:00 - 11:00  |   |  
  
 **Office Hours - Ricevimento:** **Office Hours - Ricevimento:**
  
-  * Room 268 Dept. of Computer Science +  * Tuesday 15.00-17.00 or Appointment by email 
-  * Tuesday: 15-17, Room: MS Teams +  * Room 363 Dept. of Computer Science or MS Teams
-  * Appointment by email+
  
 ====== Learning Material -- Materiale didattico ====== ====== Learning Material -- Materiale didattico ======
Line 140: Line 107:
   * Pang-Ning Tan, Michael Steinbach, Vipin Kumar. **Introduction to Data Mining**. Addison Wesley, ISBN 0-321-32136-7, 2006   * Pang-Ning Tan, Michael Steinbach, Vipin Kumar. **Introduction to Data Mining**. Addison Wesley, ISBN 0-321-32136-7, 2006
     * [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php]]     * [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php]]
-    * I capitoli 46sono disponibili sul sito del publisher. -- Chapters 4,and are also available at the publisher's Web site.+    * I capitoli 35sono disponibili sul sito del publisher. -- Chapters 3,and are also available at the publisher's Web site.
   * Berthold, M.R., Borgelt, C., Höppner, F., Klawonn, F. **GUIDE TO INTELLIGENT DATA ANALYSIS.** Springer Verlag, 1st Edition., 2010. ISBN 978-1-84882-259-7   * Berthold, M.R., Borgelt, C., Höppner, F., Klawonn, F. **GUIDE TO INTELLIGENT DATA ANALYSIS.** Springer Verlag, 1st Edition., 2010. ISBN 978-1-84882-259-7
   * Laura Igual et al.** Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications**. 1st ed. 2017 Edition.   * Laura Igual et al.** Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications**. 1st ed. 2017 Edition.
Line 154: Line 121:
 ===== Software===== ===== Software=====
  
-  * Python - Anaconda (3.7 version!!!): Anaconda is the leading open data science platform powered by Python. [[https://www.anaconda.com/distribution/| Download page]] (the following libraries are already included)+  * Python - Anaconda (>3.7): Anaconda is the leading open data science platform powered by Python. [[https://www.anaconda.com/distribution/| Download page]] (the following libraries are already included)
   * Scikit-learn: python library with tools for data mining and data analysis [[http://scikit-learn.org/stable/ | Documentation page]]   * Scikit-learn: python library with tools for data mining and data analysis [[http://scikit-learn.org/stable/ | Documentation page]]
   * Pandas: pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. [[http://pandas.pydata.org/ | Documentation page]]   * Pandas: pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. [[http://pandas.pydata.org/ | Documentation page]]
 +
 +Other software for Data Mining
   * [[http://www.knime.org | KNIME ]] The Konstanz Information Miner. [[http://www.knime.org/download-desktop| Download page ]]   * [[http://www.knime.org | KNIME ]] The Konstanz Information Miner. [[http://www.knime.org/download-desktop| Download page ]]
   * [[http://www.cs.waikato.ac.nz/ml/weka/ | WEKA ]] Data Mining Software in JAVA. University of Waikato, New Zealand [[http://www.cs.waikato.ac.nz/ml/weka/ | Download page ]]   * [[http://www.cs.waikato.ac.nz/ml/weka/ | WEKA ]] Data Mining Software in JAVA. University of Waikato, New Zealand [[http://www.cs.waikato.ac.nz/ml/weka/ | Download page ]]
-  * Didactic Data Mining [[http://matlaspisa.isti.cnr.it:5055/DDM]]+  * Didactic Data Mining [[http://matlaspisa.isti.cnr.it:5055/HelpDDMv1]], [[https://kdd.isti.cnr.it/ddm/#/| DDMv2]] 
    
-====== Class Calendar (2021/2022) ======+====== Class Calendar (2024/2025) ======
  
 ===== First Semester (DM1 - Data Mining: Foundations) ===== ===== First Semester (DM1 - Data Mining: Foundations) =====
  
-^ ^ Day ^ Room ^ Topic ^ Learning material ^ Recording ^ Instructor ^ +^ ^ Day ^ Time ^ Room ^ Topic ^ Material ^ Lecturer ^
-|1.|  16.09.2021  11:00-12:45 | Aula Fib A1 | Introduction. | {{ :dm:1.dm_2021-22.overview-corso.pptx_1_.pdf | Introducing DM1 }} {{ dm1_project_guidelines.pptx.pdf | Project-work guidelines (updated 22.11.2021) }} | [[https://unipiit.sharepoint.com/sites/a__td_52415/Shared%20Documents/General/Recordings/Meeting%20in%20_General_-20210916_111002-Meeting%20Recording.mp4?web=1|Lecture 1]] | Pedreschi | +|   | 16.09.2024 |  |  | No Lecture |  |  |
-|2.|  20.09.2021  11:00-12:45 | Aula Fib C | Course overview | {{ :dm:2.introduction-short.pdf | Overview of contents }} |[[https://unipiit.sharepoint.com/sites/a__td_52415/Shared%20Documents/General/Recordings/Meeting%20in%20_General_-20210920_111429-Meeting%20Recording.mp4?web=1|Lecture 2]] | Pedreschi | +|   | 17.09.2024 |  |  | No Lecture |  |  |
-|3.|  23.09.2021  11:00-12:45 | Aula Fib A1 | Data Understanding | {{ :dm:3.dataunderstanding-2019.pdf | Slides }} | [[https://unipiit.sharepoint.com/sites/a__td_52415/Shared%20Documents/General/Recordings/Data%20Mining%20Lecture%203-20210923_111708-Meeting%20Recording.mp4?web=1|Lecture 3]] | Pedreschi | +|   | 23.09.2024 |  |  | No Lecture |  |  |
-|4.|  27.09.2021  11:00-12:45 | Aula Fib C | Data  Preparation  | {{ :dm:3.dm_ml_data_preparation.pdf | Slides }} | [[https://unipiit.sharepoint.com/sites/a__td_52415/Shared%20Documents/General/Recordings/General-20210927_111919-Meeting%20Recording.mp4?web=1|Lecture 4]] |Pedreschi | +|   | 24.09.2024 |  |  | No Lecture |  |  |
-|5.|  30.09.2021  11:00-12:45 | Aula Fib A1 | Lab: Data Understanding & Preparation -- Python  | {{ :dm:python_basics.ipynb.zip |Python Introduction}} Dataset: {{ :dm:iris.csv.zip | Iris}} {{ :dm:hands_on_dm1_pt1.zip |Hands-On Python (Iris)}} | [[https://unipiit.sharepoint.com/sites/a__td_52415/Shared%20Documents/General/Recordings/General-20210930_110913-Registrazione%20della%20riunione.mp4?web=1|Lecture 5]] |Citraro | +|01.| 30.09.2024 | 11-13 |C1| Overview, Introduction | {{ :dm:00_dm1_introduction_2024_25.pdf | Intro}} | Pedreschi|
-|6.|  04.10.2021  11:00-12:45 | Aula Fib C | Lab: Data Understanding & Preparation -- Python (cont.) KNIME | Dataset: {{ :dm:titanic.csv.zip | Titanic}} {{ :dm:hands_on_dm1_pt2.zip |Hands-On Python (Titanic)}}, {{ :dm:titanic_data_understanding2.ipynb.zip | Titanic DU+DP (complete)}}  KNIME: {{ :dm:00_start_with_knime.zip | Intro}}, {{ :dm:01_data_understanding.zip | KNIME DU+DP}} | [[https://unipiit.sharepoint.com/sites/a__td_52415/Shared%20Documents/General/Recordings/Lecture%206%20-%20Lab_%20Data%20Understanding%20%26%20Preparation%20(cont.)-20211004_110939-Registrazione%20della%20riunione.mp4?web=1|Lecture 6]] | Citraro | +|02.| 01.10.2024 | 14-16 |C1| Lab. Introduction to Python | {{ :dm:dm1_lab01_python_basics_2024_25.zip | Python Basics}} | Pedreschi|
-|7. |  07.10.2021  11:00-12:45 | Aula Fib A1 | Clustering: Intro & K-means | {{ :dm:clustering_1_intro-kmeans_v2.pdf |Clustering intro and k-means}} [revised version] | [[https://unipiit.sharepoint.com/:v:/r/sites/a__td_52415/Shared%20Documents/General/Recordings/Lecture%207%20-%20Clustering_1-20211007_110727-Meeting%20Recording.mp4?csf=1&web=1&e=N0MiPM|Lecture 7]] | Nanni | +|03.| 07.10.2024 | 11-13 |C1| Data Understanding | {{ :dm:01_dm1_data_understanding_2024_25.pdf | Data Understanding}} | Pedreschi|
-| |  <del>11.10.2021  11:00-12:45</del> | <del>Aula Fib C</del> |  |  |  |  | +|04.| 08.10.2024 | 14-16 |C1| Data Understanding & Preparation | {{ :dm:01_dm1_data_understanding_2024_25.pdf | Data Understanding}}, {{ :dm:02_dm1_data_preparation_2024_25.pdf | Data Preparation}} | Pedreschi|
-|8. |  14.10.2021  11:00-12:45 | Aula Fib A1 | Clustering: k-means |  | [[https://unipiit.sharepoint.com/:v:/r/sites/a__td_52415/Shared%20Documents/General/Recordings/Lecture%208%20%20-%20Clustering_2-20211014_110548-Meeting%20Recording.mp4?csf=1&web=1&e=3NTFy6|Lecture 8]] | Nanni | +|05.| 14.10.2024 | 11-13 |C1| Data Preparation & Similarity | {{ :dm:02_dm1_data_preparation_2024_25.pdf | Data Preparation}}, {{ :dm:03_dm1_data_similarity_2024_25.pdf | Data Similarity}} | Pedreschi|
-| |  <del>18.10.2021  11:00-12:45</del> | <del>Aula Fib C</del> |  |  |  |  | +|06.| 15.10.2024 | 14-16 |C1| Lab. Data Understanding | {{ :dm:dm1_lab02_data_understanding.zip | Data Understanding}}| Pedreschi|
-|9. |  21.10.2021  11:00-12:45 | Aula Fib A1 | Clustering: Hierarchical methods |{{ :dm:clustering_2_hierarchical.pdf |Clustering: Hierarchical Methods}} | [[https://unipiit.sharepoint.com/:v:/s/a__td_52415/EeNn5yX1cgVJoPKX-6lhndkB0XWKk9bfeYyau6MFJto5Hg?email=a010703%40unipi.it&e=dGXqcS|Lecture 9]] | Nanni | +|07.| 21.10.2024 | 11-13 |C1| Introduction to Clustering, K-Means | {{ :dm:04_dm1_clustering_intro_2024_25.pdf | Intro Clustering}}, {{:dm:05_dm1_kmeans_2024_25.pdf | K-Means }} | Pedreschi|
-|10. |  25.10.2021  11:00-12:45 | Aula Fib C | Clustering: density-based methods, exercises| {{ :dm:clustering_3_dbscan.pdf |Clustering: Density-based methods}}  | [[https://unipiit.sharepoint.com/:v:/r/sites/a__td_52415/Shared%20Documents/General/Recordings/Lecture%2010%20-%20Clustering_4-20211025_111320-Meeting%20Recording.mp4?csf=1&web=1&e=nLgpyy|Lecture 10]] | Nanni | +|08.| 22.10.2024 | 14-16 |C1| Centroid-based Clustering | {{:dm:05_dm1_kmeans_2024_25.pdf | K-Means }} | Pedreschi|
-|11. |  28.10.2021  11:00-12:45 | Aula Fib A1 | Lab: Clustering | {{ :dm:hands_on_dm1_clustering.zip | Python Hands-On Clust. (Iris)}} {{ :dm:titanic_clustering.ipynb.zip | Python Titanic}} {{ :dm:knime_clustering.zip | Knime }} | [[https://unipiit.sharepoint.com/:v:/r/sites/a__td_52415/Shared%20Documents/General/Recordings/General-20211028_110713-Registrazione%20della%20riunione.mp4?csf=1&web=1&e=QWV4UW|Lecture 11]] | Citraro | +|09.| 28.10.2024 | 11-13 |C1| Hierarchical Clustering & Density-based Clustering | {{ :dm:06_dm1_hierarchical_clustering_2024_25.pdf | Hierarchical Clustering}}, {{ :dm:07_dm1_density_based_2024_25.pdf | Density-based Clustering}} | Pedreschi|
-|12. |  04.11.2021  11:00-12:45 | Aula Fib A1 | Classification: intro and decision trees | {{ :dm:classification_1_decision_trees_v3.pdf |Classification and decision trees (updated 11.11.2021)}} | [[https://unipiit.sharepoint.com/:v:/r/sites/a__td_52415/Shared%20Documents/General/Recordings/Lecture%2012%20-%20Classification_1-20211104_110838-Meeting%20Recording.mp4?csf=1&web=1&e=3Z3FhD|Lecture 12]] | Nanni | +|10.| 29.10.2024 | 14-16 |C1| Lab. Clustering | {{ :dm:dm1_lab03_clustering.zip | Clustering}}| Pedreschi|
-|13. |  08.11.2021  11:00-12:45 | Aula Fib C | Classification: decision trees/2 |  | [[https://unipiit.sharepoint.com/:v:/r/sites/a__td_52415/Shared%20Documents/General/Recordings/Lecture%2013%20-%20Classification_2-20211108_111102-Meeting%20Recording.mp4?csf=1&web=1&e=V4JgxX|Lecture 13]] | Nanni | +|11.| 04.11.2024 | 11-13 |C1| Ex. Clustering | {{ :dm:ex1_dm1_clustering_2023_24.pdf | Ex. Clustering}}| Guidotti|
-|14. |  11.11.2021  11:00-12:45 | Aula Fib A1 | Classification: decision trees/ |  | [[https://unipiit.sharepoint.com/:v:/r/sites/a__td_52415/Shared%20Documents/General/Recordings/Lecture%2014%20-%20Classification_3-20211111_111114-Meeting%20Recording.mp4?csf=1&web=1&e=jXOes5|Lecture 14]] | Nanni | +|12.| 05.11.2024 | 14-16 |C1| Intro Classification, kNN | {{ :dm:08_dm1_classification_intro_2024_25.pdf | Intro Classification}}, {{ :dm:09_dm1_knn_2024_25.pdf | kNN}} | Guidotti|
-|15. |  15.11.2021  11:00-12:45 | Aula Fib C | Classification: decision trees/ |  | [[https://unipiit.sharepoint.com/sites/a__td_52415/Shared%20Documents/General/Recordings/Lecture%2015%20-%20Classification%20_%204-20211115_111659-Meeting%20Recording.mp4?web=1|Lecture 15]] | Nanni | +|13.| 11.11.2024 | 11-13 |C1| Naive Bayes, Exercises | {{ :dm:10_dm1_naive_bayes_2024_25.pdf | Naive Bayes}} | Guidotti|
-|16. |  18.11.2021  11:00-12:45 | Aula Fib A1 | Classification: decision trees exercises | {{ :dm:exercise_classification_18.11.2021.pdf |Exercise}} | [[https://unipiit.sharepoint.com/sites/a__td_52415/Shared%20Documents/General/Recordings/Lecture%2016%20-%20Classification%20_%205-20211118_111344-Meeting%20Recording.mp4?web=1|Lecture 16]] | Nanni | +|14.| 12.11.2024 | 14-16 |C1| Model Evaluation, Lab. Classification (kNN, NB) | {{ :dm:11_dm1_classification_eval_2024_25.pdf | Model Evaluation}}, {{ :dm:dm1_lab04_classification.zip | Classification}} | Guidotti|
-|17. |  22.11.2021  11:00-12:45 | Aula Fib C | Lab: Classification | {{ :dm:knime_classification.zip | knime_classification}} {{ hands_on_dm1_classification_titanic.zip | Hands_on_Python_Titanic}} {{ classification_iris_python.zip | Python_Iris}} |  | Citraro | +|15.| 14.11.2024 | 9-11 |C1| Decision Tree Classifier | {{ :dm:12_dm1_decision_trees_2024_25.pdf | Decision Tree}} | Guidotti|
- +|16.| 18.11.2024 | 11-13 |C1| Decision Tree Classifier | {{ :dm:12_dm1_decision_trees_2024_25.pdf | Decision Tree}} | Guidotti|
 +|17.| 19.11.2024 | 14-16 |C1| Decision Tree Classifier | {{ :dm:12_dm1_decision_trees_2024_25.pdf | Decision Tree}} | Guidotti|
 +|18.| 21.11.2024 | 9-11 |C1| Decision Tree Classifier Exercises and Lab | {{ :dm:12_dm1_decision_trees_2024_25.pdf | Decision Tree}}, {{ :dm:dm1_lab04_classification.zip | Classification}} | Guidotti|
 +|19.| 25.11.2024 | 11-13 |C1| Regression & Lab. Regression | {{ :dm:13_dm1_linear_regression_2024_25.pdf | Regression}}, {{ :dm:dm1_lab05_regression.zip | Regression}}, {{ :dm:dm1_2425_imdb_rating.zip | IMDb Rating}} | Guidotti|
 +|20.| 26.11.2024 | 14-16 |C1| Intro Pattern Mining and Apriori | {{ :dm:14_dm1_pattern_mining_2024_25.pdf | Pattern Mining}} | Pedreschi|
 +|21.| 28.11.2024 | 9-11 |C1| Apriori & FP-Growth | {{ :dm:14_dm1_pattern_mining_2024_25.pdf | Pattern Mining}} | Guidotti|
 +|22.| 02.12.2024 | 11-13 |C1| Lab. Pattern Mining & Exercises | {{ :dm:14_dm1_pattern_mining_2024_25.pdf | Pattern Mining}}, {{ :dm:dm1_lab06_pattern_mining.zip | Pattern Mining}}  | Guidotti|
 +|23.| 03.12.2024 | 14-16 |C1| Rule-based Classifiers | {{ :dm:15_dm1_rule_based_classifier_2024_25.pdf | Rule-based Classifiers}}  | Guidotti|
 +|24.| 05.12.2024 | 9-11 |C1| FP-Growth Exercises & Project Discussion |  | Guidotti|
 ===== Second Semester (DM2 - Data Mining: Advanced Topics and Applications) ===== ===== Second Semester (DM2 - Data Mining: Advanced Topics and Applications) =====
  
-^ ^ Day ^ Room ^ Topic ^ Learning material ^ Instructor ^ Recordings ^ +^ ^ Day ^ Time ^ Room ^ Topic ^ Material ^ Lecturer ^
-|1.| ??.02.2022 ??:00-??:00 | Teams link | Introduction, CRISP, KNN | {{ :dm:00_dm2_intro_2021.pdf | Intro}}, {{ :dm:01_dm2_crispdm_2021.pdf | CRISP}}, {{ :dm:02_dm2_knn_2021.pdf | KNN}} | Guidotti | recording link | +|01.| 18.02.2025 | 14-16 |A1| Overview, Imbalanced Learning | {{ :dm:16_dm2_intro_2024_25.pdf | Introduction}}, {{ :dm:dm2_project_guidelines_24_25.pdf | Guidelines}}, {{ :dm:17_dm2_imbalanced_learning_2024_25.pdf | Imbalanced Learning}}, [[https://unipiit.sharepoint.com/:v:/s/a__td_64992/EWrX2F6xAS9JtNXh1l5JIgMByAU0eMWBFr5sbGIYL3jakA|Link]] | Guidotti| 
 +|02.| 19.02.2025 | 09-11 |E| Dimensionality Reduction (Overview, Random, PCA) | {{ :dm:18_dm2_dimred_2024_25.pdf | Dimensionality Reduction}}, {{ :dm:dm2_lab01_imbalance.zip | LabImbLearn}}, {{ :dm:dm2_lab02_dimred.zip | LabDimRed}}, [[https://unipiit.sharepoint.com/:v:/s/a__td_64992/EXtiUFDI075FsJ5JZxc1AjkBZ06S7MJhVKhVUD8plrEMrg?e=RbH3To|Link]] | Guidotti| 
 +|03.| 24.02.2025 | 14-16 |E| Dimensionality Reduction (MDS, tSNE), Outlier Detection (Overview) | {{ :dm:19_dm2_anomaly_detection_2024_25.pdf | Outlier Detection}}, [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/EWeMgbhOR0lGjrIRnidGv_QBltzRtyriUeXqMHlOZXq1bA?e=8GN0eC |Link]] | Guidotti| 
 +|04.| 26.02.2025 | 09-11 |E| Outlier Detection (Methods) | {{ :dm:19_dm2_anomaly_detection_2024_25.pdf | Outlier Detection}}, {{ :dm:dm2_lab03_outlier_det.zip |LabOutDet}}, [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/EWrBINmEU_9En7HBfFl9RV8Bsfh0gv2_1scH5P0BqDkdXQ |Link]], [[https://unipiit.sharepoint.com/:v:/s/a__td_64992/EW-ssQNGV8JGvCMhhUD-xNUBFrJFU5MKNwBMy8i6u5g5IA?e=Mc3HHN | Link2]]  | Guidotti| 
 +|05.| 04.03.2025 | 11-13 |D3| Outlier Detection (Methods) | {{ :dm:19_dm2_anomaly_detection_2024_25.pdf | Outlier Detection}}, {{ :dm:dm2_lab03_outlier_det.zip |LabOutDet}}, [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/EY6wbzFDXkRHndtIx90Ayb0B77VTX8mcXQXI_Z3nIOYo9g?e=BJ3CNf |Link]], [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/Ect99g3RSAlDrdTYJgZ2Fm0B2BbVTtr-WqUE3hYjkJuQEA?e=2LnXZG |Link2]]  | Guidotti| 
 +|06.| 05.03.2025 | 09-11 |C| Outlier Detection (Methods), Gradient Descent | {{ :dm:19_dm2_anomaly_detection_2024_25.pdf | Outlier Detection}}, {{ :dm:dm2_lab03_outlier_det.zip |LabOutDet}}, {{ :dm:20_dm2_gradient_descent_2024_25.pdf | GD}}, [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/ETgyScDQOW5Dgaoiu_RmQKsBvu7i5AIXn-hItudrFzvg4g?e=yzD6WN |Link]]  | Guidotti| 
 +|07.| 10.03.2025 | 11-13 |E| Maximum Likelihood Estimation, Odds, Log Odds, Logistic Regression | {{ :dm:21_dm2_maximum_likelihood_estimation_2024_25.pdf | MLE}}, {{ :dm:22_dm2_odds_2024_25.pdf | Odds}}, {{ :dm:23_dm2_logistic_regression_2024_25.pdf | LogReg}}, {{ :dm:dm2_lab04_logistic_reg.zip | LabLogReg}}, [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/ETgyScDQOW5Dgaoiu_RmQKsBvu7i5AIXn-hItudrFzvg4g?e=yzD6WN |Link]], [[ [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/ETgyScDQOW5Dgaoiu_RmQKsBvu7i5AIXn-hItudrFzvg4g?e=yzD6WN |Link]] |Link2]] | Guidotti| 
 +|08.| 12.03.2025 | 09-11 |E| Support Vector Machines | {{ :dm:24_dm2_svm_2024_25.pdf | SVM}}, {{ :dm:dm2_lab05_svm.zip | LabSVM}}, [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/EU10T1Ogn7pHkUMRB0t3iGgBAJE_TX_FnXtmH2_w3w95Pw?e=wBg4Kn |Link]], [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/EVn8waCZketMmS5jE3m-u5sBZHZYTbvAx87DGPgynFMv0g?e=0ohaLq |Link2]]  | Guidotti| 
 +|09.| 17.03.2025 | 11-13 |E| Neural Networks, Linear Perceptron | {{ :dm:25_dm2_perceptron_2024_25.pdf | Neural Network}}, {{ :dm:dm2_lab06_neural_networks.zip | LabNN}}, [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/EYpWxyzOb2BOriv-qzdF1LgBULySoNUU5qS18_Kj3D1pTg?e=lpdcyq |Link]]  | Guidotti| 
 +|10.| 19.03.2025 | 09-11 |E| Deep Neural Networks | {{ :dm:26_dm2_neural_network_2024_25.pdf | Deep Neural Network}}, {{ :dm:dm2_lab06_neural_networks.zip | LabNN}}, [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/EXn8ZmDM0q5FuqqGSe8VabYBS2jmjjhlS_Z5J8gqNaW-QQ?e=vg5QWG |Link]]  | Guidotti| 
 +|11.| 24.03.2025 | 11-13 |E| Ensemble Methods | {{ :dm:27_dm2_ensemble_2024_25.pdf | Ensemble Methods}}, {{ :dm:dm2_lab07_ensemble.zip |LabEnsemble}}, [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/Efwq4uqQptVGtUFMhzy1_-UBhGUu6dii-4ZextmLvHscug?e=me0tex |Link]]  | Guidotti| 
 +|12.| 26.03.2025 | 09-11 |E| Ensemble Methods | {{ :dm:27_dm2_ensemble_2024_25.pdf | Ensemble Methods}}, {{ :dm:28_dm2_gradient_boost_2024_25.pdf | Gradient Boosting}}, {{ :dm:dm2_lab07_ensemble.zip |LabEnsemble}}, [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/EXvE6wTpN2BInKpz7FjWjXQBnBGDO64a6yv73rQ7QOCwmg?e=bQYffK |Link]]  | Guidotti| 
 +|13.| 31.03.2025 | 11-13 |E| Ensemble Methods | {{ :dm:27_dm2_ensemble_2024_25.pdf | Ensemble Methods}}, {{ :dm:28_dm2_gradient_boost_2024_25.pdf | Gradient Boosting}}, {{ :dm:dm2_lab07_ensemble.zip |LabEnsemble}}, [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/EQkGUk8URIdDjeaCv-1f4CYBtRSZkUqtdGAKu8C3Lz8QAg?e=oNkk1u |Link]], [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/ETNVlrEbWQFLtcXIh6yPTRoBzjfiWxA_4uNuUr3K7rglhA?e=7wkTSL |Link]]  | Guidotti| 
 +|14.| 02.04.2025 | 09-11 |E| Explainable Artificial Intelligence | {{ :dm:29_dm2_explainability_2024_25.pdf | XAI}}, [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/EZm67TidbZtIs5iscelNLNkBoL5Ou1PZ7O-rO14H0GATfw?e=R97gfd |Link]]  | Guidotti| 
 +|15.| 07.04.2025 | 11-13 |E| Explainable Artificial Intelligence | {{ :dm:29_dm2_explainability_2024_25.pdf | XAI}}, {{ :dm:dm2_lab08_xai.zip | LabXAI}}, [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/ERnvQmqb59NCkgrLpyd85w4BZYFyxX0XnsOLk0GwqTEgyQ?e=qrmnjs |Link]]  | Guidotti| 
 +|16.| 09.04.2025 | 09-11 |E| Transactional Clustering | {{ :dm:30_dm2_transactional_clustering_2024_25.pdf | Transactional Clustering}}, [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/EeRyorJxnYtLh83YVPZ66xwBkMrr14kmDGhrscX5NuTXTw?e=hOb1zx |Link]]  | Guidotti| 
 +|17.| 14.04.2025 | 11-13 |C| Sequential Pattern Mining | {{ :dm:31_dm2_sequential_pattern_mining_2024_25.pdf | GSP}}, [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/Efl8bH67zQ1GnSoOhGLLKgcBzF14ZdF3opb8mi-WV0KXJg?e=akHpP4 |Link]]  | Guidotti| 
 +|18.| 16.04.2025 | 9-11 |C| Sequential Pattern Mining | {{ :dm:31_dm2_sequential_pattern_mining_2024_25.pdf | GSP}}, [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/ERycsmADkv5Nhq8xbRadgSwBa3EWHdtBIZJhYlj1grEKCQ?e=hKYcni |Link]]  | Guidotti| 
 +|19.| 28.04.2025 | 11-13 |E| Time Series - Intro & Preprocessing |{{ :dm:32_dm2_time_series_preprocessing_2024_25.pdf | TS_Preprocessing}}, {{ :dm:dm2_lab09_ts_preprocessing.zip | LabTS_Prep}}, Video Missing | Guidotti| 
 +|20.| 30.04.2025 | 09-11 |E| Time Series - Similarities & Distances | {{ :dm:33_dm2_time_series_similarity_2024_25.pdf | TS_Similarity}}, {{ :dm:dm2_lab10_ts_dist.zip | LabTS_Sim}}, [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/ER3DhpHdMqBArNpEA1NKPgcBZysjQTGU3W9HTc6bzaaCCQ?e=Aeh7CD |Link]] | Guidotti| 
 +|21.| 05.05.2025 | 09-11 |E| Time Series - Approximation & Clustering | {{ :dm:34_dm2_time_series_approximation_clustering_2024_25.pdf | TS_ApproxClustering}}, {{ :dm:dm2_lab11_ts_approx_clustering.zip  | LabTS_ApproxClustering}}, [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/ER3DhpHdMqBArNpEA1NKPgcBZysjQTGU3W9HTc6bzaaCCQ?e=Aeh7CD |Link]] | Guidotti| 
 +|22.| 07.05.2025 | 11-13 |E| Time Series - Matrix Profile |{{ :dm:35_dm2_time_series_matrix_profile_2024_25.pdf | TS_MatrixProfile}}, {{ :dm:dm2_lab12_ts_matrixprofile.zip | LabTS_MatrixProfile}}, [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/EdtlOrd_oFRPjiOmAx-eKd0BvpEh1j9xriGhEgN8Vv0bkw?e=FpcIGt |Link]] | Guidotti| 
 +|23.| 12.05.2025 | 09-11 |E| Time Series - Classification | {{ :dm:36_dm2_time_series_classification_2024_25.pdf | TS_Classification}}, {{ :dm:dm2_lab13_ts_classification.zip | LabTS_Classification}}, [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/EV_0MjXUb1NEkCijo7f18fgB2NUoSzKUSO6hZZn6sxKJ0w?e=WMGGL3 |Link]] | Guidotti| 
 +|24.| 14.05.2025 | 11-13 |E| Time Series - Classification | {{ :dm:36_dm2_time_series_classification_2024_25.pdf | TS_Classification}}, {{ :dm:dm2_lab13_ts_classification.zip | LabTS_Classification}}, [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/EQ6U-VR5E-5AvIxRMtT5jo0B9pLAc6khNdAQIeXr40xByg?e=jvGVph |Link]], [[ https://unipiit.sharepoint.com/:v:/s/a__td_64992/EeumwM-NkM5Jtr3Xt8BeCP4BDWGlbVJSNsAnKS4ogh_Uxg?e=naKzjZ |Link]] | Guidotti|
 ====== Exams ====== ====== Exams ======
  
-===== Exam DM1 ======+** How and Where: ** 
 +The exam is oral only and takes place in the teacher's office or in a previously designated classroom. 
 +The exam will be held online, on the 420AA Data Mining course channel, only at the student's request 
 +and in accordance with current legislation.
  
-The exam is composed of two parts:+** When: ** 
 +The start dates of the three exam sessions are/will be published on the online platform 
 +https://esami.unipi.it/. Within each session, we will identify dates and slots to distribute the 
 +oral exams. The dates and slots for taking the exam will be published on the course page by the end of 
 +May. Each student must also register on https://esami.unipi.it/. The exam can only be taken after the project has been delivered. The project must be delivered one week before the date on which you want to take the exam. Group oral discussions, following the project groups, are preferred so that the project can be discussed in parallel. It is not mandatory to take the oral exam together with the other members of the group.  
 +If the oral exam is not passed, it cannot be retaken for 20 days. If the project is not considered sufficient, it must be redone on a new dataset or on a substantially updated version of the current one.
  
-  * An **oral exam**, that includes: (1) discussing the project report; (2) discussing topics presented during the classes, including the theory and practical exercises +** What: ** 
- +The oral test evaluates the practical understanding of the algorithms. It covers three aspects. 
-  * A **project**, that consists in exercises requiring the use of data mining tools for analysis of data. Exercises include: data understanding, clustering analysis, frequent pattern mining, and classification (guidelines will be provided for more details). The project has to be performed by min 3, max 4 people. It has to be performed by using Knime, Python or a combination of them. The results of the different tasks must be reported in a single paper. The total length of this paper must be max 20 pages of text including figures. The paper must be emailed to [[datamining.unipi@gmail.com]]. Please, use “[DM1 2021-2022] Project” in the subject.  +  - Understanding of the theoretical aspects of the topics addressed during the course. The student may be required to write formulas or pseudocode. During the explanations, the student can use pen and paper.
-=== Project 1 === +  - Understanding of the algorithms illustrated during the course and their practical implementation. You will be asked to perform one or more simple exercises. The text will be shown on the teacher's screen and/or copied to Miro. The student will have to use pen and paper (or, if online, Miro: https://miro.com) to show how the exercise is solved.
-  - Assigned: 30/09/2021 +  - Discussion of the project with questions from the teacher regarding unclear aspects, 
questionable steps or choices.
-  - Final Deadline: **14/01/2022** (complete project required) +
-  - Data: choose between {{ :dm:glasgow_norms.zip | Glasgow Norms}}, {{ :dm:Seismic_Bumps.zip | Seismic Bumps}}+
  
 +** Final Mark: ** for the 12-credit exam, the final mark is the
 +average of the DM1 and DM2 marks.
  
 +*** Exams Registration Instructions for DM1***
 +- Use the Google registration form: [[https://forms.gle/JFULK3nNsHBU6Tqa8|here]] if you cannot register on Esami for Data Mining for the year 2024/2025. 
 +- When the registration closes you will receive a link to the Agenda.
 +- Register on the Agenda selecting day and time (do not change your choice or cancel: if you book, you are expected to take the exam).
 +- Submit the project at least 1 week before the day you selected (or by 31/12 to get +0.5 extra mark).
  
 +===== Exam Booking Periods =====
 +  * Exam portal link: [[https://esami.unipi.it/|here]]
 +  * Registration Form: [[https://forms.gle/NuAvCa3YK2h8MgrX7|here]]
 +  * 1st Appello: from 08/01/2025 to 16/01/2025
 +  * 2nd Appello: from 30/01/2025 to 05/02/2025
 +  * 3rd Appello: from TBD to TBD
 +  * 4th Appello: from TBD to TBD
 +  * 5th Appello: from TBD to TBD
 +  * 6th Appello: from TBD to TBD
    
-===== Exam DM part II (DMA) ====== 
  
-** Exam Rules** +===== Exam DM1 ======
-  * Rules for DM2 exam available {{ :dm:dm2_exam_rules.pdf | here}}. +
- +
-**Exam Booking Periods** +
-  * 3rd Appello: ??/??/2022 00:00 - ??/??/2022 23:59 +
-  * 4th Appello: ??/??/2022 00:00 - ??/??/2022 23:59 +
-  * 5th Appello: ??/??/2022 00:00 - ??/??/2022 23:59 +
- +
-**Exam Booking Agenda** +
-  * Agenda Link: ??? +
-  * 3rd Appello: starts ??/??/2022 +
-  * 4th Appello: starts ??/??/2022 +
-  * 5th Appello: starts ??/??/2022 +
-  * Important! If you book a slot in the agenda between ??/??/2022 and ??/??/2022 you MUST be registered for the 3rd appello; if you book a slot between ??/??/2022 and ??/??/2022 you must be registered for the 4th appello; if you book a slot after ??/??/2022 you must be registered for the 5th appello. +
- +
-The link to the agenda for booking a slot for the exam is displayed at the end of the registration. +
-During the exam the camera must remain on and you must be able to share your screen. The exam may require the use of the Miro platform (https://miro.com/app/dashboard/).+
  
 The exam is composed of two parts: The exam is composed of two parts:
  
-  * **project**, that consists in employing the methods and algorithms presented during the classes for solving exercises on a given dataset. The project has to be realized by max 3 people. The results of the different tasks must be reported in a unique paper. The total length of this paper must be max 30 pages (suggested 25) of text including figures + 1 cover page (minimum font 11, minimum interline 1). The project must be delivered at least 7 days before the oral exam. The project must be delivered to [[riccardo.guidotti@unipi.it]] AND [[francesco.spinnato@sns.it]] with subject "[DM2 Project]"+  * An **oral exam**, that includes: (1) discussing the project report; (2) discussing topics presented during the classes, including the theory and practical exercises
  
-  * An **oral exam**, that includes: (1) discussing topics presented during the classes, including the theory of the parts already covered by the written exam; (2) resolving simple exercises using the Miro platform; (3) discussing the project report with a group presentation;  +  * A **project**, that consists in exercises requiring the use of data mining tools for analysis of data. Exercises include: data understanding, clustering analysis, pattern mining, and classification (guidelines will be provided for more details). The project has to be performed by min 2, max 3 people. It has to be performed by using Python or any other data mining software. The results of the different tasks must be reported in a single paper. The total length of this paper must be max 20 pages of text including figures. The paper must be emailed to [[andrea.fedele@phd.unipi.it]] and [[riccardo.guidotti@unipi.it]]. Please, use “[DM1 2024-2025] Project” in the subject. 
 +  
 +  * **Dataset** 
 +    - Assigned: 15/10/2024 
 +    - MidTerm Submission: <del>15/11/2024</del> **22/11/2024** (+0.5) (half project required, i.e., Data Understanding & Preparation and Clustering) 
 +    - Final Submission: 31/12/2024 (+0.5) or one week before the oral exam (complete project required). 
 +    - Dataset: {{ :dm:dm1_dataset_2425_imdb.zip | IMDb}}
  
-  * **Dataset**: the data is about ??? and can be downloaded here: ??? +** DM1 Project Guidelines ** 
-     * Data can be downloaded here ??? +See {{ :dm:dm1_project_guidelines_24_25.pdf | Project Guidelines}}.
-     * Submission Draft 1: ??/??/2022 23:59 Italian Time (we expect Module 1 and Module 2) +
-     * Submission Draft 2: ??/??/2022 23:59 Italian Time (we expect Module 3) +
-     * Final Submission: one week before the oral exam.+
  
-** Project Guidelines ** 
  
-  * **Module 1 - Introduction, Imbalanced Learning and Anomaly Detection** +===== Exam DM2 ======
-      - Explore and prepare the dataset. You are allowed to take inspiration from the associated GitHub repository and figure out your personal research perspective (from choosing a subset of variables to the class to predict…). You are welcome in creating new variables and performing all the pre-processing steps the dataset needs. +
-      - Define one or more (simple) classification tasks and solve it with Decision Tree and KNN. You decide the target variable. +
-      - Identify the top 1% outliers: adopt at least three different methods from different families (e.g., density-based, angle-based... ) and compare the results. Deal with the outliers by removing them from the dataset or by treating the anomalous variables as missing values and employing replacement techniques. In this second case, you should check that the outliers are not outliers anymore. Justify your choices in every step. +
-      - Analyze the value distribution of the class to predict with respect to point 2; if it is unbalanced leave it as it is, otherwise turn the dataset into an imbalanced version (e.g., 96% - 4%, for binary classification). Then solve the classification task using the Decision Tree or the KNN by adopting various techniques of imbalanced learning. +
-      - Draw your conclusions about the techniques adopted in this analysis.+
  
-  * **Module 2 - Advanced Classification Methods** +The exam is composed of two parts:
-      - Solve the classification task defined in Module 1 (or define new ones) with the other classification methods analyzed during the course: Naive Bayes Classifier, Logistic Regression, Rule-based Classifiers, Support Vector Machines, Neural Networks, Ensemble Methods and evaluate each classifier with the techniques presented in Module 1 (accuracy, precision, recall, F1-score, ROC curve). Perform hyper-parameter tuning phases and justify your choices. +
-      - Besides the numerical evaluation draw your conclusions about the various classifiers, e.g. for Neural Networks: what are the parameter sets or the convergence criteria which avoid overfitting? For Ensemble classifiers how the number of base models impacts the classification performance? For any classifier which is the minimum amount of data required to guarantee an acceptable level of performance? Is this level the same for any classifier? What is revealing the feature importance of Random Forests? +
-      - Select two continuous attributes, define a regression problem and try to solve it using different techniques reporting various evaluation measures. Plot the two-dimensional dataset. Then generalize to multiple linear regression and observe how the performance varies.+
  
-  * **Module 3 - Time Series Analysis** +  * An **oral exam**, that includes: (1) discussing the project report; (2) discussing topics presented during the classes, including the theory and practical exercises
-      - Select the feature(s) you prefer and use it (them) as a time series. You can use the temporal information provided by the authors’ datasets, but you are also welcome to explore the .mp3 files to build your own dataset of time series according to your purposes. You should prepare a dataset on which you can run time series clustering, motif/anomaly discovery and classification.  +
-      - On the dataset created, compute clustering based on Euclidean/Manhattan and DTW distances and compare the results. To perform the clustering you can choose among different distance functions and clustering algorithms. Remember that you can reduce the dimensionality through approximation. Analyze the clusters and highlight similarities and differences. +
-      - Analyze the dataset for finding motifs and/or anomalies. Visualize and discuss them and their relationship with other features. +
-      - Solve the classification task on the time series dataset(s) and evaluate each result. In particular, you should use shapelet-based classifiers. Analyze the shapelets retrieved and discuss if there are any similarities/differences with motifs and/or shapelets+
  
-  * **Module 4 - Sequential Patterns and Advanced Clustering**  +  * A **project**, that consists in exercises requiring the use of data mining tools for analysis of data. Exercises include: imbalanced learning, dimensionality reduction, outlier detection, advanced classification/regression methods, time series analysis/clustering/classification (guidelines will be provided for more details). The project has to be performed by min 1, max 3 people. It has to be performed by using Python or any other data mining software. The results of the different tasks must be reported in a single paper. The total length of this paper must be max 30 pages of text including figures. The paper must be emailed to [[andrea.fedele@phd.unipi.it]] and [[riccardo.guidotti@unipi.it]]. Please, use “[DM2 2024-2025] Project” in the subject.
-      - Sequential Pattern Mining: Convert the time series into a discrete format (e.g., by using SAX) and extract the most frequent sequential patterns (of at least length 3/4) using different values of support, then discuss the most interesting sequences. +  
-      - Advanced Clustering: On a dataset already prepared for one of the previous tasks in Module 1 or Module 2, run at least one clustering algorithm presented in the advanced clustering lectures (e.g., X-Means, Bisecting K-Means, OPTICS). Discuss the results that you find analyzing the clusters and reporting external validation measures (e.g., SSE, silhouette). +  * **Dataset** 
-      - Transactional Clustering: By using categorical features, or by turning a dataset with continuous variables into a dataset with categorical variables (e.g. by using binning), run at least one clustering algorithm presented in the transactional clustering lectures (e.g. K-Modes, ROCK). Discuss the results that you find analyzing the clusters and reporting external validation measures (e.g., SSE, silhouette). +    - Assigned: 18/02/2025 
 +    - MidTerm Submission: 07/05/2025 
 +    - Final Submission: one week before the oral exam (complete project required). 
 +    - Dataset: {{ :dm:dm2_dataset_2425_imdb.zip | IMDb Extended & IMDb Time Series}}
  
-  * **Module 5 - Explainability (optional)**  +** DM2 Project Guidelines ** 
-      - Try to use one or more explanation methods (e.g., LIME, LORE, SHAP, etc.) to illustrate the reasons for the classification in one of the steps of the previous tasks.+See {{ :dm:dm2_project_guidelines_24_25.pdf | Project Guidelines}}.
  
  
  
  
-N.B. When "solving the classification task", remember, (i) to test, when needed, different criteria for the parameter estimation of the algorithms, and (ii) to evaluate the classifiers (e.g., Accuracy, F1, Lift Chart) in order to compare the results obtained with an imbalanced technique against those obtained from using the "original" dataset.  
- 
- 
- 
-====== Exam Dates ====== 
- 
-===== Exam Sessions ===== 
-^ Session ^ Date            ^ Time        ^ Room   ^ Notes ^ Marks ^ 
-|1.|16.01.2019| 14:00 - 18:00| [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Please, use the system for registration: https://esami.unipi.it/ | | 
  
 ===== Past Exams ===== ===== Past Exams =====
Line 299: Line 286:
   * Special issue of Crossroads - The ACM Magazine for Students - on Big Data Analytics {{:dm:crossroadsxrds2012fall-dl.pdf|download}}   * Special issue of Crossroads - The ACM Magazine for Students - on Big Data Analytics {{:dm:crossroadsxrds2012fall-dl.pdf|download}}
   * Peter Sondergaard, Gartner, Says Big Data Creates Big Jobs. Oct 22, 2012: [[https://www.youtube.com/watch?v=mXLy3nkXQVM|YouTube video]]   * Peter Sondergaard, Gartner, Says Big Data Creates Big Jobs. Oct 22, 2012: [[https://www.youtube.com/watch?v=mXLy3nkXQVM|YouTube video]]
- 
   * Towards Effective Decision-Making Through Data Visualization: Six World-Class Enterprises Show The Way. White paper at FusionCharts.com. [[http://www.fusioncharts.com/whitepapers/downloads/Towards-Effective-Decision-Making-Through-Data-Visualization-Six-World-Class-Enterprises-Show-The-Way.pdf|download]]   * Towards Effective Decision-Making Through Data Visualization: Six World-Class Enterprises Show The Way. White paper at FusionCharts.com. [[http://www.fusioncharts.com/whitepapers/downloads/Towards-Effective-Decision-Making-Through-Data-Visualization-Six-World-Class-Enterprises-Show-The-Way.pdf|download]]
  
 ====== Previous years ===== ====== Previous years =====
 +  * [[dm_ds2023-24]]
 +  * [[dm.2022-23ds]]
 +  * [[dm.2021-22ds]]
   * [[dm.2020-21]]   * [[dm.2020-21]]
-   * [[dm.2019-20]] +  * [[dm.2019-20]] 
-   * [[dm.2018-19]] +  * [[dm.2018-19]] 
-   * [[dm.2017-18]]+  * [[dm.2017-18]]
   * [[dm.2016-17]]   * [[dm.2016-17]]
   * [[dm.2015-16]]   * [[dm.2015-16]]
Line 313: Line 302:
   * [[dm.2012-13]]   * [[dm.2012-13]]
   * [[dm.2011-12]]   * [[dm.2011-12]]
-  * [[dm.2010-11]] 
-  * [[dm.2009-10]] 
-  * [[dm.2008-09]] 
-  * [[dm.2007-08]] 
-  * [[dm.2006-07]] 
-  * [[PhDWorkshop2011]] 
-  * [[SNA.Ingegneria2011]] 
-  * [[SNA.IMT.2011]] 
-  * [[MAINS.SANTANNA.2011-12]] 
-  * [[MAINS.SANTANNA.DM4CRM.2012]] 
-  * [[MAINS.SANTANNA.DM4CRM.2016]] 
-  * [[MAINS.SANTANNA.DM4CRM.2017 | Data Mining for Customer Relationship Management 2017]] 
-  * [[MAINS.SANTANNA.DM4CRM.2018]] 
-  * [[MAINS.SANTANNA.DM4CRM.2019]] 
-  * [[SDM2018 | Instructions for camera ready and copyright transfer]] 
-  * [[DM-SAM | Storie dell'Altro Mondo]] 
-  * [[DM-I40 | Master Industry 4.0]] 
  
dm/start.1638260751.txt.gz · Last modified: 30/11/2021 at 08:25 (4 years ago) by Fosca Giannotti
