dm:start

Differences

These are the differences between the selected revision and the current version of the page.

dm:start [15/10/2019 at 17:21 (5 years ago)]
Anna Monreale [First part of course, first semester (DM1 - Data mining: foundations & DM - Data Mining)]
dm:start [26/03/2024 at 17:16 (2 days ago)] (current version)
Riccardo Guidotti [Second Semester (DM2 - Data Mining: Advanced Topics and Applications)]
Line 9: Line 9:
 ga('create', 'UA-34685760-1', 'auto', 'personalTracker', {'allowLinker': true}); ga('create', 'UA-34685760-1', 'auto', 'personalTracker', {'allowLinker': true});
 ga('personalTracker.require', 'linker'); ga('personalTracker.require', 'linker');
-ga('personalTracker.linker:autoLink', ['pages.di.unipi.it', 'enforce.di.unipi.it', 'didawiki.di.unipi.it'] ); +ga('personalTracker.linker:autoLink', ['pages.di.unipi.it', 'enforce.di.unipi.it', 'didawiki.di.unipi.it', 'luciacpassaro.github.io'] );    
-  +
 ga('personalTracker.require', 'displayfeatures'); ga('personalTracker.require', 'displayfeatures');
-ga('personalTracker.send', 'pageview', 'ruggieri/teaching/dm/');+ga('personalTracker.send', 'pageview', 'courses/dm/');
 setTimeout("ga('send','event','adjusted bounce rate','30 seconds')",30000);  setTimeout("ga('send','event','adjusted bounce rate','30 seconds')",30000); 
 </script> </script>
 <!-- End Google Analytics --> <!-- End Google Analytics -->
 +<!-- Global site tag (gtag.js) - Google Analytics -->
 +<script async src="https://www.googletagmanager.com/gtag/js?id=G-LPWY0VLB5W"></script>
 +<script>
 +  window.dataLayer = window.dataLayer || [];
 +  function gtag(){dataLayer.push(arguments);}
 +  gtag('js', new Date());
 +
 +  gtag('config', 'G-LPWY0VLB5W');
 +</script>
 <!-- Capture clicks --> <!-- Capture clicks -->
 <script> <script>
Line 42: Line 50:
 </script> </script>
 </html> </html>
-====== Data Mining A.A. 2019/20 ======+====== Data Mining A.A. 2023/24 ======
  
-===== DM 1: Foundations of Data Mining (6 CFU) =====+===== DM1 - Data Mining: Foundations (6 CFU) =====
  
-Instructors - Docenti:+Instructors:
   * **Dino Pedreschi**   * **Dino Pedreschi**
-    * KDD Laboratory, Università di Pisa ed ISTI - CNR, Pisa+    * KDDLab, Università di Pisa
     * [[http://www-kdd.isti.cnr.it]]     * [[http://www-kdd.isti.cnr.it]]
     * [[dino.pedreschi@unipi.it]]       * [[dino.pedreschi@unipi.it]]  
  
-  +  * **Riccardo Guidotti** 
-    +    * KDDLab, Università di Pisa 
 +    * [[https://kdd.isti.cnr.it/people/guidotti-riccardo]]    
 +    * [[riccardo.guidotti@di.unipi.it]]
  
-===== DM 2: Advanced topics on Data Mining and case studies (6 CFU) =====+Teaching Assistant 
 +  * **Andrea Fedele** 
 +    * KDDLab, Università di Pisa 
 +    * [[https://www.linkedin.com/in/andrea-fedele/?originalSubdomain=it]] 
 +    * [[andrea.fedele@phd.unipi.it]]   
 +===== DM2 - Data Mining: Advanced Topics and Applications (6 CFU) =====
  
 Instructors: Instructors:
-  * **Mirco Nanni, Dino Pedreschi** +  * **Riccardo Guidotti** 
-    * KDD Laboratory, Università di Pisa and ISTI - CNR, Pisa +    * KDDLab, Università di Pisa 
-    * [[http://www-kdd.isti.cnr.it]]    +    * [[https://kdd.isti.cnr.it/people/guidotti-riccardo]]    
-    * [[mirco.nanni@isti.cnr.it]] +    * [[riccardo.guidotti@di.unipi.it]]
-    *  [[dino.pedreschi@unipi.it]] +
  
-===== DM: Data Mining (9 CFU) =====+Teaching Assistant 
 +  * **Andrea Fedele** 
 +    * KDDLab, Università di Pisa 
 +    * [[https://www.linkedin.com/in/andrea-fedele/?originalSubdomain=it]] 
 +    * [[andrea.fedele@phd.unipi.it]]   
 +    * Meeting: https://calendly.com/andreafedele/ 
 +====== News ======
  
-Instructors: +     * **[19.01.2024]** DM2 lectures will start on Mon 19/02; for that lecture only, the time will be 14-16 instead of 9-11. 
-  * **Dino Pedreschi, Anna Monreale** +     * [13.10.2023] To schedule a meeting with the Teaching Assistant you can use: https://calendly.com/andreafedele/ 
-    * KDD Laboratory, Università di Pisa and ISTI - CNR, Pisa +     * [20.09.2023] Recordings of the lectures can be found on the web pages of the course for the years 2020/2021 and 2021/2022 (see links at the bottom of this page) 
-    * [[http://www-kdd.isti.cnr.it]]    +     * [20.09.2023] Thursday 21 September there will be no lecture. 
-    * [[mirco.nanni@isti.cnr.it]] +     * [11.09.2023] Lectures will start on Monday 18 September 2023 at 11.00 in room C1. 
-    * [[dino.pedreschi@unipi.it]]  +     * [11.09.2023] Lectures will be in person only. Recordings of the lectures of past years can be found at the bottom of this web page. 
-    * [[anna.monreale@unipi.it]]   +     * [11.09.2023] Project Groups [[https://docs.google.com/spreadsheets/d/10R5AcqdlXsqTAxSys6zyqArvdytq4HH6Ik8Uy-NHkQ4/edit?usp=sharing|link]] 
 +     * [11.09.2023] MS Teams [[https://teams.microsoft.com/l/team/19%3a7uEgK_aekrBFuOsbREccAa-tfqeSwvfBemfK_lG6HA01%40thread.tacv2/conversations?groupId=84cc4fec-41fc-4208-a9d4-a02675216d22&tenantId=c7456b31-a220-47f5-be52-473828670aa1|link]]  
 +====== Learning Goals ====== 
 +  * DM1 
 +     * Fundamental concepts of data knowledge and discovery. 
 +     * Data understanding 
 +     * Data preparation 
 +     * Clustering 
 +     * Classification 
 +     * Pattern Mining and Association Rules 
 +     * Sequential Pattern Mining
  
 +  * DM2
 +     * Outlier Detection
 +     * Dimensionality Reduction
 +     * Regression 
 +     * Advanced Classification and Regression
 +     * Time Series Analysis
 +     * Transactional Clustering
 +     * Explainability
  
-====== News ===== +====== Hours and Rooms ======
-  * **[03.10.2019] Please, fill the [[https://docs.google.com/spreadsheets/d/1oBz19UhXHcXox7QPfJ1g1jWTsNRW27cK7tc6MZrxvZE/edit?usp=sharing| spreadsheet]] with name of the group (Group1, Group2, ...), the list of students composing the group. ** +
-  * [26.09.2019] Global Climate Strike: teachers of DM course tomorrow Friday September 27 will join the  Global Climate strike, so tomorrow the lecture is suppressed. +
-  * [18.09.2019] Event: "Privacy: limite o opportunità? Gli esempi delle Nuove Tecnologie e dei Dati Sanitari" {{ :dm:locandina_seminario_18_settembre_def_2.pdf | Information here}}.  +
-  +
-====== Learning goals -- Obiettivi del corso ======+
  
-** ... a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data. Hal Varian, Google’s chief economist, predicts that the job of statistician will become the "sexiest" around. Data, he explains, are widely available; what is scarce is the ability to extract wisdom from them. ** +===== DM1 =====
- +
-//Data, data everywhere. The Economist, Special Report on Big Data, Feb. 2010.//+
  
-The wide availability of data from relational databases, the web, or other sources motivates the study of data analysis techniques that enable a better understanding and easier use of the results in decision-making processes. The goal of the course is to provide an introduction to the basic concepts of the knowledge extraction process, to the main data mining techniques, and to the related algorithms. Particular emphasis is placed on methodological aspects, presented through several paradigmatic classes of applications such as Market Basket Analysis, market segmentation, and fraud detection. Finally, the course introduces the privacy and ethical aspects inherent in the use of inference techniques on data, of which the analyst must be aware.  The course consists of the following parts:  +**Classes**
-  - the basic concepts of the knowledge extraction process: data exploration and preparation, forms of data, data measures and similarity; +
-  - the main data mining techniques (association rules, classification, and clustering). The formal and implementation aspects of these techniques will be studied; +
-  - some case studies in marketing and customer management support, fraud detection, and epidemiological studies.  +
-  - the last part of the course aims to introduce the privacy and ethical aspects inherent in the use of inference techniques on data, of which the analyst must be aware +
- +
-===== Reading about the "data scientist" job ===== +
- +
-  * Data, data everywhere. The Economist, Feb. 2010 {{:dm:economist--010.pdf|download}} +
-  * Data scientist: The hot new gig in tech, CNN & Fortune, Sept. 2011 [[http://tech.fortune.cnn.com/2011/09/06/data-scientist-the-hot-new-gig-in-tech/|link]] +
-  * Welcome to the yotta world. The Economist, Sept. 2011 {{:dm:economist-2012-dm.pdf|download}} +
-  * Data Scientist: The Sexiest Job of the 21st Century. Harvard Business Review, Sept 2012 [[http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1|link]] +
-  * Il futuro è già scritto in Big Data. Il Sole 24 Ore, Sept 2012 [[http://www.ilsole24ore.com/art/tecnologie/2012-09-21/futuro-scritto-data-155044.shtml?uuid=AbOQCOhG|link]] +
-  * Special issue of Crossroads - The ACM Magazine for Students - on Big Data Analytics {{:dm:crossroadsxrds2012fall-dl.pdf|download}} +
-  * Peter Sondergaard, Gartner, Says Big Data Creates Big Jobs. Oct 22, 2012: [[https://www.youtube.com/watch?v=mXLy3nkXQVM|YouTube video]] +
- +
-  * Towards Effective Decision-Making Through Data Visualization: Six World-Class Enterprises Show The Way. White paper at FusionCharts.com. [[http://www.fusioncharts.com/whitepapers/downloads/Towards-Effective-Decision-Making-Through-Data-Visualization-Six-World-Class-Enterprises-Show-The-Way.pdf|download]] +
-====== Hours - Orario e Aule ====== +
- +
-===== DM1 & DM ===== +
- +
-**Classes - Lezioni**+
  
 ^  Day of Week  ^  Hour  ^  Room  ^  ^  Day of Week  ^  Hour  ^  Room  ^ 
-|  Lunedì/Monday |  14:00 - 16:00  |  Aula E1  |  +|  Monday  |  11:00 - 13:00  |  C1   |  
-|  Mercoledì/Wednesday |  16:00 - 18:00  |  Aula A1  |  +|  Wednesday  |  11:00 - 13:00  |  C1  | 
-|  Venerdì/Friday |  11:00 - 13:00  |  Aula C1  |  +
  
 **Office hours - Ricevimento:** **Office hours - Ricevimento:**
  
-  * Prof. Pedreschi: Lunedì/Monday  h 14:00 - 16:00, Dipartimento di Informatica +  * Prof. Pedreschi 
-  * Prof. Monreale:  Lunedì/Monday  h 09:00 - 11:00, Dipartimento di Informatica+      * Monday 16:00 - 18:00 
 +      * Online 
 +  * Prof. Guidotti 
 +      * Tuesday 16:00 - 18:00 or Appointment by email 
 +      * Room 363 Dept. of Computer Science or MS Teams
  
      
Line 123: Line 136:
  
  
-**Classes - Lezioni**+**Classes**
  
-^  Day of week  ^  Hour  ^  Room  ^  +^  Day of Week  ^  Hour  ^  Room  ^  
-|  Thursday  |  14 - 16  |  A1  |  +|  Monday  |  09:00 - 11:00  |    |  
-|  Friday  |  16 - 18  |  C1  |  +|  Wednesday  |  11:00 - 13:00  |  C  |  
  
-**Office hours - Ricevimento:**+**Office Hours - Ricevimento:** 
 + 
 +  * Tuesday 15.00-17.00 or Appointment by email 
 +  * Room 363 Dept. of Computer Science or MS Teams
  
-  * Nanni : appointment by email, c/o ISTI-CNR 
 ====== Learning Material -- Materiale didattico ====== ====== Learning Material -- Materiale didattico ======
  
Line 138: Line 153:
   * Pang-Ning Tan, Michael Steinbach, Vipin Kumar. **Introduction to Data Mining**. Addison Wesley, ISBN 0-321-32136-7, 2006   * Pang-Ning Tan, Michael Steinbach, Vipin Kumar. **Introduction to Data Mining**. Addison Wesley, ISBN 0-321-32136-7, 2006
     * [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php]]     * [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php]]
-    * I capitoli 46sono disponibili sul sito del publisher. -- Chapters 4,and are also available at the publisher's Web site.+    * I capitoli 35sono disponibili sul sito del publisher. -- Chapters 3,and are also available at the publisher's Web site.
   * Berthold, M.R., Borgelt, C., Höppner, F., Klawonn, F. **GUIDE TO INTELLIGENT DATA ANALYSIS.** Springer Verlag, 1st Edition., 2010. ISBN 978-1-84882-259-7   * Berthold, M.R., Borgelt, C., Höppner, F., Klawonn, F. **GUIDE TO INTELLIGENT DATA ANALYSIS.** Springer Verlag, 1st Edition., 2010. ISBN 978-1-84882-259-7
   * Laura Igual et al.** Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications**. 1st ed. 2017 Edition.   * Laura Igual et al.** Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications**. 1st ed. 2017 Edition.
Line 144: Line 159:
  
  
-===== Slides of the classes -- Slides del corso =====+===== Slides =====
  
-  * The slides used in the course will be inserted in the calendar after each class. Most of them are part of the the slides provided by the textbook's authors [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php#item4|Slides per "Introduction to Data Mining"]].+  * The slides used in the course will be inserted in the calendar after each class. Most of them are part of the slides provided by the textbook's authors [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php#item4|Slides per "Introduction to Data Mining"]].
        
-===== Past Exams ===== 
  
-* Some text of past exams on **DM1 (6CFU)**:+   
 +===== Software=====
  
-  * {{ :dm:2017-1-19.pdf |}}, {{ :dm:2017-9-6.pdf |}}, {{ :dm:2016-05-30-dm1-seconda.pdf |}}+  * Python - Anaconda (>3.7): Anaconda is the leading open data science platform powered by Python. [[https://www.anaconda.com/distribution/| Download page]] (the following libraries are already included) 
 +  * Scikit-learn: python library with tools for data mining and data analysis [[http://scikit-learn.org/stable/ | Documentation page]] 
 +  * Pandas: pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. [[http://pandas.pydata.org/ | Documentation page]]
  
-* Some solutions of past exams containing exercises on KNN and Naive Bayes classifiers  **DM1 (9CFU)**+Other software for Data Mining 
-  * {{ :dm:dm2_exam.2017.06.13_solutions.pdf |}}, {{ :dm:dm2_exam.2017.07.04_solutions.pdf |}}, {{ :dm:dm2_mid-term_exam.2017.06.06_solutions.pdf |}}+  * [[http://www.knime.org | KNIME ]] The Konstanz Information Miner. [[http://www.knime.org/download-desktop| Download page ]] 
 +  * [[http://www.cs.waikato.ac.nz/ml/weka/ | WEKA ]] Data Mining Software in JAVA. University of Waikato, New Zealand [[http://www.cs.waikato.ac.nz/ml/weka/ | Download page ]] 
 +  * Didactic Data Mining [[http://matlaspisa.isti.cnr.it:5055/Help| DDMv1]], [[https://kdd.isti.cnr.it/ddm/#/| DDMv2]]  
 +  
 +====== Class Calendar (2023/2024) ======
  
-* Some exercises (partially with solutions) on **sequential patterns** and **time series** can be found in the following texts of exams from the last years: +===== First Semester (DM1 - Data Mining: Foundations) =====
-    * {{ :dm:dm2_exam.2015.04.13.results.pdf|}}, {{ :dm:dm2_exam.2016.04.4_sol.pdf |}}, {{ :dm:dm2_exam.2016.04.5_sol.pdf |}}, {{ :dm:dm2_exam.2016.06.20_sol.pdf |}}, {{ :dm:dm2_exam.2016.07.08_sol.pdf |}}+
  
 +^ ^ Day ^ Time ^ Room ^ Topic ^ Material ^ Lecturer ^
 +|01.| 18.09.2023 | 11-13 |C1| Overview, Introduction | {{ :dm:00_dm1_introduction_2023_24.pdf | Intro}} | Pedreschi|
 +|   | 20.09.2023 | 11-13 |  | No Lecture |  |  |
 +|02.| 25.09.2023 | 11-13 |C1| Lab. Introduction to Python | {{ :dm:dm1_lab01_python_basics.zip | Python Basic}} | Guidotti|
 +|03.| 27.09.2023 | 11-13 |C1| Lab. Data Understanding | {{ :dm:dm1_lab02_data_understanding.zip | Data Understanding}} | Guidotti|
 +|04.| 02.10.2023 | 11-13 |C1| Data Understanding | {{ :dm:01_dm1_data_understanding_2023_24.pdf | Data Understanding}} | Guidotti|
 +|05.| 04.10.2023 | 11-13 |C1| Data Understanding & Preparation | {{ :dm:01_dm1_data_understanding_2023_24.pdf | Data Understanding}}, {{ :dm:02_dm1_data_preparation_2023_24.pdf | Data Preparation}} | Pedreschi|
 +|06.| 09.10.2023 | 11-13 |C1| Data Preparation & Data Similarity | {{ :dm:02_dm1_data_preparation_2023_24.pdf | Data Preparation}}, {{ :dm:03_dm1_data_similarity_2023_24.pdf | Data Similarity}} | Pedreschi|
 +|07.| 11.10.2023 | 11-13 |C1| Data Similarity & Lab. Data Understanding | {{ :dm:03_dm1_data_similarity_2023_24.pdf | Data Similarity}}, {{ :dm:dm1_lab02_data_understanding.zip | Data Understanding}} | Pedreschi|
 +|08.| 16.10.2023 | 11-13 |C1| Introduction to Clustering, K-Means | {{ :dm:04_dm1_clustering_intro_2023_24.pdf | Intro_Clustering}}, {{:dm:05_dm1_kmeans_2023_24.pdf | K-Means }} | Pedreschi|
 +|09.| 18.10.2023 | 11-13 |C1| Clustering Validation, Hierarchical Clustering | {{ :dm:04_dm1_clustering_intro_2023_24.pdf | Intro_Clustering}}, {{ :dm:06_dm1_hierarchical_clustering_2023_24.pdf | Hierarchical}} | Pedreschi|
 +|10.| 23.10.2023 | 11-13 |C1| Density-based Clustering | {{ :dm:07_dm1_density_based_2023_24.pdf | Density-based Clustering}} | Pedreschi|
 +|11.| 25.10.2023 | 11-13 |C1| Lab. Clustering | {{ :dm:dm1_lab03_clustering.zip | Clustering}}| Guidotti|
 +|12.| 30.10.2023 | 11-13 |C1| Ex. Clustering | {{ :dm:ex1_dm1_clustering_2023_24.pdf | ExClustering}}| Guidotti|
 +|   | 01.11.2023 | 11-13 |  | No Lecture |  |  |
 +|13.| 06.11.2023 | 11-13 |C1| Intro Classification, kNN[[https://unipiit.sharepoint.com/sites/a__td_61280/Shared%20Documents/General/Recordings/Lecture%2006_11_2023-20231106_110052-Registrazione%20della%20riunione.mp4?web=1|(video)]] | {{ :dm:08_dm1_classification_intro_2023_24.pdf | Intro_Classification}}, {{ :dm:09_dm1_knn_2023_24.pdf | kNN}}| Guidotti|
 +|14.| 08.11.2023 | 11-13 |C1| Naive Bayes, Exercises | {{ :dm:10_dm1_naive_bayes_2023_24.pdf | Naive Bayes}} | Guidotti|
 +|15.| 13.11.2023 | 11-13 |C1| Model Evaluation | {{ :dm:11_dm1_classification_eval_2023_24.pdf | Model Evaluation}} | Guidotti|
 +|16.| 15.11.2023 | 11-13 |C1| Model Evaluation Exercises & Lab | {{ :dm:dm1_lab04_classification_regression.zip | Classification}} | Guidotti|
 +|   | 20.11.2023 | 11-13 |  | No Lecture |  |  |
 +|17.| 22.11.2023 | 11-13 |C1| Decision Tree Classifier | {{ :dm:12_dm1_decision_trees_2023_24.pdf | Decision Tree}} | Pedreschi|
 +|18.| 27.11.2023 | 11-13 |C1| Decision Tree Classifier | {{ :dm:12_dm1_decision_trees_2023_24.pdf | Decision Tree}} | Pedreschi|
 +|19.| 29.11.2023 | 11-13 |C1| Exercises and Lab. Decision Tree Classifier | {{ :dm:dm1_lab04_classification.zip | Decision Tree}} | Guidotti|
 +|20.| 04.12.2023 | 11-13 |C1| Decision Tree Classifier, Exercises and Lab | {{ :dm:12_dm1_decision_trees_2023_24.pdf | Decision Tree}} | Pedreschi|
 +|21.| 06.12.2023 | 11-13 |C1| Intro Regression & Lab. Regression | {{ :dm:12_dm1_linear_regression_2023_24.pdf | Regression}}, {{ :dm:dm1_lab05_regression.zip | Regression}} | Guidotti|
 +|22.| 11.12.2023 | 11-13 |C1| Intro Pattern Mining and Apriori | {{ :dm:13_dm1_pattern_mining_2023_24.pdf | Pattern Mining}} | Pedreschi|
 +|23.| 13.12.2023 | 16-18 |C1| Apriori & Lab. Pattern Mining | {{ :dm:13_dm1_pattern_mining_2023_24.pdf | Pattern Mining}}, {{ :dm:dm1_lab06_pattern_mining.zip | Pattern Mining}}  | Pedreschi|
 +|24.| 18.12.2023 | 11-13 |C| FP-Growth and Exercises | {{ :dm:13_dm1_pattern_mining_2023_24.pdf | Pattern Mining}} | Guidotti|
 +===== Second Semester (DM2 - Data Mining: Advanced Topics and Applications) =====
  
-  * Some very old exercises (part of them with solutions) are available here, most of them in Italian, not all of them on topics covered in this year program: +^ ^ Day ^ Time ^ Room ^ Topic ^ Material ^ Lecturer ^ 
-    * {{tdm:verifica2006.pdf|Verifica 2006}}, {{tdm:verifica2005.pdf|Verifica 2005 (con soluzioni)}}, {{tdm:verifica2004.pdf|Verifica 2004}} +|01.| 19.02.2024 | 14-16 |C| Overview, Rule-based Models | {{ :dm:14_dm2_intro_2023_24.pdf | Introduction}}, {{ :dm:dm2_project_guidelines_23_24.pdf | Guidelines}}, {{ :dm:15_dm2_rule_based_classifier_2023_24.pdf | Rule-based Models }} | Guidotti| 
-    * {{dm:verifica.05.06.2007.pdf|Verifica 5 giugno 2007}}, {{dm:verifica.26.06.2007.pdf|Verifica 26 giugno 2007}}, {{dm:verifica.24.07.2007_corretto.pdf|Verifica 24 luglio 2007}} (e {{dm:verifica.24.07.2007_soluzioni.pdf|Soluzioni}}) +|   | 21.02.2024 |  | | No Lecture |  |  | 
-    * {{:dm:verifica.2008.04.03.pdf|Verifica 3 aprile 2008}} (e {{:dm:soluzioni.2008.04.03.pdf|Soluzioni}}), {{:dm:dm-tdm.appello_2008_07_18_parte1.pdf|Verifica 18 luglio 2008 parte 1}}, {{:dm:dm-tdm.appello_2008_07_18_parte2.pdf|Verifica 18 luglio 2008 - parte 2}} +|   | 26.02.2024 |  | | No Lecture |  |  | 
-    * {{:dm:appello.2010.06.01_soluzioni.pdf|Exam with solution 2010-06-01}} {{:dm:appello.2010.06.22_soluzioni.pdf|Exam with solution 2010-06-22}} {{:dm:appello.2010.09.09_soluzioni.pdf|Exam with solution 2010-09-09}} {{:dm:appello.2010.07.13_soluzioni.pdf|Exam with solution 2010-07-13}}+|02.| 19.02.2024 | 11-13 |C| Sequential Pattern Mining | {{ :dm:16_dm2_sequential_pattern_mining_2023_24.pdf | Sequential Pattern Mining}}, {{ :dm:GSP.zip | GSP}} | Guidotti| 
 +|03.| 04.03.2024 | 9-11 |C| Sequential Pattern Mining | {{ :dm:16_dm2_sequential_pattern_mining_2023_24.pdf | Sequential Pattern Mining}}, {{ :dm:GSP.zip | GSP}} | Guidotti| 
 +|04.| 06.03.2024 | 11-13 |C| Transactional Clustering | {{ :dm:17_dm2_transactional_clustering_2023_24.pdf | Transactional Clustering}} | Guidotti| 
 +|05.| 11.03.2024 | 9-11 |C| Time Series Similarity | {{ :dm:18_dm2_time_series_similarity_2023_24.pdf | Time Series Similarity}}, {{ :dm:dm2_lab00_spotify.zip | TS_Load}}, {{ :dm:dm2_lab01_dist_transf.zip | TS_Similarity}} | Guidotti| 
 +|06.| 13.03.2024 | 11-13 |C| Time Series Approximation | {{ :dm:19_dm2_time_series_clustering_approximation_2023_24.pdf | Time Series Clustering}}, {{ :dm:dm2_lab02_approx_clust.zip | TS_Approx_Clustering}} | Guidotti| 
 +|07.| 18.03.2024 | 9-11 |C| Time Series Clustering & Motifs| {{ :dm:20_dm2_time_series_matrix_profile_2023_24.pdf | Time Series Motifs}}, {{ :dm:dm2_lab03_motifs.zip | TS_Motifs}} | Guidotti| 
 +|08.| 20.03.2024 | 11-13 |C| Time Series Classification | {{ :dm:21_dm2_time_series_classification_2023_24.pdf | Time Series Classification}}, {{ :dm:dm2_lab04_classification.zip | TS_Classification}} | Guidotti| 
 +|09.| 25.03.2024 | 9-11 |C| Imbalanced Learning | {{ :dm:22_dm2_imbalanced_learning_2023_24.pdf | Imbalanced Learning}}, {{ :dm:dm2_lab05_imbalance.zip |ImbLearn}} | Guidotti|  
 +|10.| 27.03.2024 | 11-13 |C| Dimensionality Reduction | {{ :dm:23_dm2_dimred_2023_24.pdf | Dimensionality Reduction}}, {{ :dm:dm2_lab06_dimred.zip |DimRed}} | Guidotti| 
 +====== Exams ======
  
-===== Data mining software=====+** How and Where: ** 
 +The exam is oral only and takes place at the teacher's office or in a previously designated classroom. 
 +The exam will be held online, on the 420AA Data Mining course channel, only at the request of the 
 +student and in accordance with current legislation.
  
-  * [[http://www.knime.org | KNIME ]] The Konstanz Information Miner. [[http://www.knime.org/download-desktop| Download page ]] +** When** 
-  * Python - Anaconda (3.7 version!!!): Anaconda is the leading open data science platform powered by Python. [[https://www.anaconda.com/distribution/| Download page]] (the following libraries are already included) +The dates relating to the start of the three exams are/will be published on the online platform 
-  * Scikit-learn: python library with tools for data mining and data analysis [[http://scikit-learn.org/stable/ | Documentation page]] +https://esami.unipi.it/. Within each session, we will identify dates and slots in order to distribute the 
-  * Pandas: pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. [[http://pandas.pydata.org| Documentation page]] +various orals. The dates and slots to take the exam will be published on the course page by the end of 
-  * [[http://www.cs.waikato.ac.nz/ml/weka/ | WEKA ]] Data Mining Software in JAVA. University of Waikato, New Zealand [[http://www.cs.waikato.ac.nz/ml/weka/ | Download page ]] +May. Each student must also register on https://esami.unipi.it/. The exam can only be taken after the project has been delivered. The project must be delivered one week before the date on which you want to take the exam. Oral discussions by project group are preferred, so that the project can be discussed in parallel. It is not mandatory to take the oral exam together with the other members of the group.  
-  +If the oral exam is not passed, it will not be possible to retake it for 20 days. If the project is not considered sufficient, it must be carried out again on a new dataset or on a substantially updated version of the current one.
-====== Class calendar - Calendario delle lezioni (2019/2020) ======+
  
-===== First part of course, first semester (DM1 - Data mining: foundations & DM - Data Mining) =====+** What: **  
 +The oral test will evaluate the practical understanding of the algorithms. The exam will evaluate three aspects. 
 +  - Understanding of the theoretical aspects of the topics addressed during the course. The student may be required to write formulas or pseudocode. During the explanations, the student can use pen and paper. 
 +  - Understanding of the algorithms illustrated during the course and their practical implementation. You will be asked to perform one or more simple exercises. The text will be shown on the teacher's screen and/or copied to Miro. The student will have to use pen and paper (or, if online, Miro https://miro.com/) to show how the exercise is solved. 
 +  - Discussion of the project with questions from the teacher regarding unclear aspects, 
 +questionable steps or choices.
  
-^ ^ Day ^ Topic ^ Learning material ^ Instructor ^ +** Final Mark:** for the 12-credit exam, the final mark will be obtained as the 
-|1.|  16.09  14:00-16:00 | Overview. Introduction to KDD        | {{ :dm:1.dm-overview-corso.pdf | Course Overview}} {{ :dm:2.introduction-short.pdf | Introduction DM}} | Pedreschi | +average mark of DM1 and DM2.
-|  |  18.09  16:00-18:00 | Lecture canceled  (Event at Scuola S. Anna - Information in News Section of this page)                  |  | Pedreschi | +
-|2.|  20.09  11:00-13:00 | Introduction to KDD: technologies, Application and Data  |  | Pedreschi  | +
-|3.|  23.09  14:00-16:00 | Data Understanding (from Berthold book!)  | {{ :dm:3.dataunderstanding-2019.pdf |Slides DU}} {{ :dm:2-statistica_descrittiva.pdf |Slides on Descriptive Statistics}} useful for clarifying some notions of statistics. Unfortunately this material is only in Italian. | Monreale | +
-|4.|  25.09  16:00-18:00 | Data Preparation  |{{ :dm:3.dm_ml_data_preparation.pdf | Slides DP}} | Monreale | +
-|  |  27.09  11:00-13:00 | Climate Strike  |  |  | +
-|5.|  30.09  14:00-16:00 | Introduction to Python.       | {{ :dm:python_basics.ipynb.zip |Python Introduction}} | Monreale | +
-|6.|  02.10  16:00-18:00 | Clustering: Introduction + Centroid-based clustering, K-means | {{ :dm:4.basic_cluster_analysis-intro-kmeans.pdf |Clustering: Intro and K-means}} | Pedreschi | +
-|7.|  04.10  11:00-13:00 | Lab: Data Understanding & Preparation in Knime  | Knime: {{ :dm:01_data_understanding.zip |}}  Data: {{ :dm:titanic.csv.zip | Titanic File}}  | Monreale | +
-|8.|  07.10  14:00-16:00 | Lab: DU Python + Project presentation  | Python: {{ :dm:titanic_data_understanding2.ipynb.zip |}}| Monreale | +
-|9.|  09.10  16:00-18:00 | Clustering: K-means + Hierarchical   |{{ :dm:5.basic_cluster_analysis-hierarchical.pdf |}} | Monreale | +
-|10.| 11.10  11:00-13:00 |  Suppressed for Internet festival |  | Pedreschi | +
-|11.| 14.10  14:00-16:00 | Clustering: DBSCAN & VALIDITY    | {{ :dm:6.basic_cluster_analysis-dbscan-validity.pdf |}}| Pedreschi | +
-|12.| 16.10  16:00-18:00 |  Exercises on Clustering|  | Monreale | +
-|13.| 18.10  11:00-13:00 |  Lab: Clustering  |  | Monreale | +
-|14.| 21.10  14:00-16:00 | Classification  |  | Pedreschi | +
-|15.| 23.10  16:00-18:00 | Classification  |  | Pedreschi | +
-|16.| 25.10  11:00-13:00 | Classification  |  | Pedreschi/ Milli | +
-|17.| 28.10  14:00-16:00 | LAB: Classification  |  | Monreale | +
-|18.| 30.10  16:00-18:00 | Exercises Classification + Discussion Clustering   | | Monreale| +
-|19.| 04.11  11:00-13:00 | Pattern Mining  |  | Pedreschi | +
-|20.| 06.11  16:00-18:00 | Pattern Mining  |  | Pedreschi | +
-|   | 08-14.11           | Project work    |  | | +
-|21.| 15.11  11:00-13:00 | Exercises and Lab on Pattern Mining    |  | Monreale | +
-|   | 18.11  14:00-16:00 | Suppressed  |  | | +
-|   | 20.11  16:00-18:00 | Suppressed    |  | | +
-|22.| 22.11  11:00-13:00 | Exercises Classification|  | Monreale | +
-|    | **Next Classes are dedicated to DM of 9 CFU **    | |+
  
- ===== Second part of course, second semester (DMA - Data mining: advanced topics and case studies) =====+===== Exam Booking Periods ===== 
 +  * Exam portal link: [[https://esami.unipi.it/|here]] 
 +  * 1st Appello: from 09/01/2024 to 31/12/2024 
 +  * 2nd Appello: from 01/02/2024 to 17/02/2024 
 +  * 3rd Appello:  
 +  * 4th Appello:  
 +  * 5th Appello:  
 +  * 6th Appello:  
 +  
 +===== Exam Booking Agenda ===== 
 +  * 1st Appello - DM1: https://agende.unipi.it/yra-ief-dmo, DM2: https://agende.unipi.it/rnm-urj-wsu 
 +  * 2nd Appello - DM1: https://agende.unipi.it/yra-ief-dmo, DM2: https://agende.unipi.it/rnm-urj-wsu 
 +  * 3rd Appello:  
 +  * 4th Appello:  
 +  * 5th Appello:  
 +  * 6th Appello: 
  
-^ ^ Day ^ Room (Aula) ^ Topic ^ Learning material ^ Instructor (default: Nanni)^ +**Do not forget to fill in the course evaluation!** 
-|1.| 21.02.2019 14:00-16:00 | A1 | Introduction + Sequential patterns/1 | {{ :dm:dm2_2019_intro.pdf |Introduction}}, {{ :dm:sequential_patterns_2019.pdf |Sequential patterns}} |  | +===== Exam DM1 =====
-|2.| 22.02.2019 16:00-18:00 | C1 | Sequential patterns/   |  |  | +
-|3.| 01.03.2019 16:00-18:00 | C1 | Sequential patterns/3 | {{ :dm:exercises_2019.03.01_fixed.zip |Sample exercises (fixed)}} |  | +
-|4.| 07.03.2019 14:00-16:00 | A1 | Sequential patterns/4 | Sequential pattern tools: Link to [[http://www.philippe-fournier-viger.com/spmf/|SPMF]] + {{ :dm:spmf_datasets.zip | Sample datasets}}, {{ :dm:gsp_py_2019.zip |Python2 GSP educational implementation}}([[http://sequenceanalysis.github.io/|source]]), [[https://github.com/chuanconggao/PrefixSpan-py|PrefixSpan-py]] (requires Python3) |  | +
-|5.| 08.03.2019 16:00-18:00 | C1 | Time series/   | {{ :dm:time_series_2019.pdf |Time series}} |  | +
-|6.| 14.03.2019 14:00-16:00 | A1 | Time series/2 | [[https://cs.gmu.edu/~jessica/BookChapterTSMining.pdf|Overview on DM for time series]], [[https://pdfs.semanticscholar.org/18f3/55d7ef4aa9f82bf5c00f84e46714efa5fd77.pdf|DTW paper by Sakoe and Chiba, 1978]] |  | +
-|7.| 15.03.2019 16:00-18:00 | C1 | Time series/3 | |  | +
-|8.| 21.03.2019 14:00-16:00 | A1 | Time series/4 | {{ :dm:timeseries_1_preprocess_2019.zip |Preprocessing in Python}} {{ :dm:timeseries_2_dtw_2019.zip |DTW in Python}} |  | +
-|9.| 22.03.2019 16:00-18:00 | C1 | Time series/5 | |  | +
-|10.| 28.03.2019 14:00-16:00 | A1 | Exercises for mid-term exam | {{ :dm:0.dm2_mid-term_exam.2018.04.10.pdf |Exercises from past exams}} |  | +
-|11.| 29.03.2019 16:00-18:00 | C1 | Exercises for mid-term exam | {{ :dm:exercises_2019.03.29.zip |Exercises from past exams (with some solutions)}} |  | +
-|   | 04.04.2019 16:00-18:00 | A1 + E | **mid-term exam** | | |  +
-|11.| 11.04.2019 14:00-16:00 | A1 | Classification: alternative methods/ | {{ :dm:lezioneadvancedclassificationmethods1-knn_nb.pdf |kNN and Bayes classifier}}  |   | +
-|12.| 12.04.2019 16:00-18:00 | C1 | Classification: alternative methods/ | {{ :dm:classification_nnandsvm.pdf |NN and SVM}}, {{ :dm:exercises_classification_2.pdf |Exercises}}  |   | +
-|  | <del>02.05.2019 14:00-16:00</del> | <del>A1</del> | Cancelled  |  |   | +
-|13.| 03.05.2019 16:00-18:00 | C1 | Classification: alternative methods/ |     | +
-|14.| 09.05.2019 14:00-16:00 | A1 | Classification: alternative methods/ | {{ :dm:neural_networks_svm_validation.pdf |Ex. on NNs and SVM}}, {{ :dm:dm2_exam.2018.07.03.pdf |Ex. on KNN and Naive Bayes}} |   | +
-|15.| 10.05.2019 16:00-18:00 | C1 | Classification: Model Evaluation  | {{ :dm:model_performance_analytics.pdf |Model performances}}  |   | +
-|16.| 16.05.2019 14:00-16:00 | A1 | Classification: Model Evaluation  | {{ :dm:2019.05.16_1_unbalanced_data_2019.pdf |Unbalanced data}}, {{ :dm:2019.05.16_3_classification_weights.pdf |Classification weights}} |   | +
-|17.| 17.05.2019 16:00-18:00 | C1 | Classification: alternative methods/ | {{ :dm:ensemblemethod_wisdomofthecrowd.pdf |Ensembles}}, {{ :dm:2019.05.16_4_homeworks.pdf |Homeworks!}} |   | +
-|18.| 23.05.2019 14:00-16:00 | A1 | Exercises + Outlier detection/ | {{ :dm:14.lift_chart.pdf |Ex. on Lift chart}}, {{ :dm:2018.05.18.validation_ensembles.pdf |Ex. on Ensembles}}, {{ :dm:17.outlier_detection_tutorial.pdf |Outlier detection}} |   | +
-|19.| 24.05.2019 16:00-18:00 | C1 | Outlier detection/ | {{ :dm:2019.05.24_ex_outliers.pdf |Ex. on outliers}}, {{ :dm:2019.05.24_optional_ex_dm2_exam.2017.06.13_solutions.pdf |Ex. from past exams}} |   | +
-|<del>20.</del>| <del>31.05.2019 16:00-18:00</del> | <del>C1</del> | Due to a strike, the lesson will not take place. For your convenience, here is some material you can use: {{ :dm:titanic_classification_dm2.ipynb.zip|Examples of classification and validation in Python}}, {{ :dm:outliers_2019.zip |Examples of outlier detection in Python}}, {{ :dm:16.dm2_2015_crispdm_mains.pdf |CRISP-DM guidelines}}. Feel free to contact me if you need clarifications. **Remark:** the CRISP-DM model will not be part of the exam program.|  |   | +
-|   | 06.06.2019 16:00-18:00 | E (+A1) | **mid-term exam** | {{ :dm:dm2_mid-term_exam.2018.01.06.pdf |2nd mid-term of last year}} and its {{ :dm:dm2_mid-term_exam.2018.01.06_solutions.pdf |solutions}} (careful: they were not double-checked). | |  +
-====== Exams ======+
  
-===== Exam DM part I (DMF) ======+The exam is composed of two parts:
  
-The exam is composed of three parts:+  * An **oral exam**, that includes: (1) discussing the project report; (2) discussing topics presented during the classes, including the theory and practical exercises. 
  
-  * A **written exam**, with exercises and questions about methods and algorithms presented during the classes. It can be substituted with the mid-term test of December.+  * A **project**, consisting of exercises that require the use of data mining tools for data analysis. Exercises include: data understanding, clustering analysis, pattern mining, and classification (guidelines with more details will be provided). The project must be carried out by groups of min 2, max 3 people, using Python or any other data mining software. The results of the different tasks must be reported in a single paper of at most 20 pages of text, including figures. The paper must be emailed to [[andrea.fedele@phd.unipi.it]] and [[riccardo.guidotti@unipi.it]]. Please use “[DM1 2023-2024] Project” as the subject. 
 +  
 +  * **Dataset** 
 +    - Assigned: 25/09/2023 
 +    - MidTerm Submission: 15/11/2023 (+0.5) (half project required, i.e., Data Understanding & Preparation and Clustering) 
 +    - Final Submission: 31/12/2023 (+0.5) one week before the oral exam (complete project required). 
 +    - Dataset: {{ :dm:std.zip | STD}}
  
-  * An **oral exam (optional) **, that includes: (1) discussing the project report with a group presentation; (2) discussing topics presented during the classes, including the theory of the parts already covered by the written exam. It is optional for students passing the written part by ONLY the mid-term test.+** DM1 Project Guidelines ** 
 +See {{ :dm:dm1_project_guidelines_23_24.pdf | Project Guidelines}}.
  
-  * A **project** consists in exercises that require the use of data mining tools for analysis of data. Exercises include: data understanding, clustering analysis, frequent pattern mining, and classification. The project has to be performed by min 3, max 4 people. It has to be performed by using Knime, Python or a combination of them. The results of the different tasks must be reported in a single paper. The total length of this paper must be max 20 pages of text including figures. The paper must be emailed to [[datamining.unipi@gmail.com]]. Please, use “[DM 2018-2019] Project 2” in the subject.  
-Tasks of the project: 
-      - ** Data Understanding: ** Explore the dataset with the analytical tools studied and write a concise “data understanding” report describing data semantics, assessing data quality, the distribution of the variables and the pairwise correlations. (see Guidelines for details) 
-      - ** Clustering analysis: ** Explore the dataset using various clustering techniques. Carefully describe your decisions for each algorithm and the advantages provided by the different approaches. (see Guidelines for details) 
-      - ** Classification: ** Explore the dataset using classification trees and random forest. Use them to predict the target variable. (see Guidelines for details) 
-      -  ** Association Rules: ** Explore the dataset using frequent pattern mining and association rules extraction. Then use them to predict a variable, either to replace missing values or to predict the target variable. (see Guidelines for details) 
  
  
-  * Project 1 
-      - Dataset: **Carvana Data** 
-      - Assigned: 07/10/2019 
-      - Deadline: 05/01/2020  
-      - Link: https://www.kaggle.com/t/712fc5e264e748afb0e0616f56f3c102 
  
 + 
 +===== Exam DM2 ======
 +
 +The exam is composed of two parts:
  
 +  * An **oral exam**, that includes: (1) discussing the project report; (2) discussing topics presented during the classes, including the theory and practical exercises. 
  
- **Guidelines for the project are [[dm:start:guidelines|here]].**+  * A **project**, consisting of exercises that require the use of data mining tools for data analysis. Exercises include: imbalanced learning, dimensionality reduction, outlier detection, advanced classification/regression methods, time series analysis/clustering/classification (guidelines with more details will be provided). The project must be carried out by groups of min 1, max 3 people, using Python or any other data mining software. The results of the different tasks must be reported in a single paper of at most 30 pages of text, including figures. The paper must be emailed to [[andrea.fedele@phd.unipi.it]] and [[riccardo.guidotti@unipi.it]]. Please use “[DM2 2023-2024] Project” as the subject.
    
-===== Exam DM part II (DMA) ======+  * **Dataset** 
 +    - Assigned: 19/02/2024 
 +    - MidTerm Submission: 30/04/2024 (Modules 1 and 2; for TS classification, only non-DL-based models) 
 +    - Final Submission: one week before the oral exam (complete project required, also with DL-based models for TS classification). 
 +    - Dataset: [[https://unipiit-my.sharepoint.com/:u:/g/personal/a_fedele7_studenti_unipi_it/EUSyNv8ahD9FrBZ6fiF3gvABcYVLpbo1biIyOGy8AmcO5g?e=ziQtEc|STD]]
  
-The exam is composed of three parts:+** DM2 Project Guidelines ** 
 +See {{ :dm:dm2_project_guidelines_23_24.pdf | Project Guidelines}}.
  
-  * A **written exam**, with exercises and questions about methods and algorithms presented during the classes. It can be substituted with the first and second mid-term tests of April and June. 
  
-  *<del> A small **online test** for the data ethics part. The test can be taken at the following link: [[https://thinfi.com/2etq|Link to "First Aid for Data Scientist" web site]] (pwd: datamining_2018). Register, and enroll to the "First Aid for Data Scientist" course. Take the quizzes of the 3 units. Then, download your certificate and send it to [[mirco.nanni@isti.cnr.it]] before the oral exam.</del> 
  
-  * An **oral exam**, that includes: (1) discussing the project report with a group presentation; (2) discussing topics presented during the classes, including the theory of the parts already covered by the written exam. 
  
-  * A **project**, that consists in exercises that require the use of data mining tools for analysis of data. Exercises include: sequential patterns, time series, classification (alternative methods and validation), outlier detection. The project has to be performed by max 3 people. It has to be performed by using Knime, Python, other software or a combination of them. The results of the different tasks must be reported in a single paper. The total length of this paper must be max 20 pages of text including figures.  The project must be delivered at least 2 days before the oral exam. 
-     * **Dataset**: the data is a time series dataset on air quality, which can be downloaded here: [[https://data.world/uci/air-quality|Dataset]]. 
-     * **Task 1: Time series**: Consider only attribute "PT08.S1(CO)" and split the corresponding time series into daily series, deleting those with too many missing values (value = -200) and fixing the others in some way. Make also sure that all time series have 24 values. Compute clustering (with an algorithm of your choice) based on DTW and Euclidean distances and compare the results. 
-     * **Task 2: Sequential patterns**: discover contiguous sequential patterns of at least length 4. Before that, time series should be discretized in some way. 
-     * **Task 3: Classification methods**: define a target variable "WE" for the time series data, set to "true" for weekend days and "false" for the others. Test the K-NN classification method using DTW as distance measure, and at least one other classification method using the 24 values as separate variables. 
-     * **Task 4: Outlier detection**: from the original dataset (i.e. the raw records with all attributes, not the time series built only on the "PT08.S1(CO)" attribute), identify the top 1% outliers. Adopt at least two different methods belonging to different families (i.e. model-based, distance-based, density-based, angle-based, ...) to identify the 1% of input records with the highest likelihood of being outliers, and compare the results. Before doing the analysis, the records containing missing values should be deleted to avoid trivial results. 
  
-====== Exam sessions ======+===== Past Exams ===== 
 +  * Past exam texts can be found in the pages of previous editions of the course. Please do not treat these exercises as the only way of testing your knowledge. Exercises can be changed and updated every year and will be published together with the slides of the lectures.
  
-===== Mid-term exams =====+===== Reading About the "Data Scientist" Job =====
  
-^ ^ Date ^ Hour ^ Place ^ Notes ^ Marks ^ +** ... a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data. Hal Varian, Google’s chief economist, predicts that the job of statistician will become the "sexiest" around. Data, he explains, are widely available; what is scarce is the ability to extract wisdom from them. **
-| DM1: First Mid-term 2018 | 30.10.2018 | 11-13 | Room C1L1N1 | Please, use the system for registration: https://esami.unipi.it/| {{ :dm:20181030-midterm-test.pdf | results }} | +
-| DM1: Second Mid-term 2018 | 18.12.2018| 11-13 | Room C1L1N1 | Please, use the system for registration: https://esami.unipi.it/| | +
-| DM2: First Mid-term 2019 | 04.04.2019 | 16-18 | Room A1E | Please, use the system for registration: https://esami.unipi.it/ \\ {{ :dm:dm2_mid-term_exam.2019.04.04_solutions.pdf |Text + Solutions}}| {{ :dm:results.2019.04.04.pdf |Results}} | +
-| DM2: Second Mid-term 2019 | 06.06.2019 | 16-18 | Room E \\ (+ A1 if needed) | Please, use the system for registration: https://esami.unipi.it/ \\ {{ :dm:dm2_mid-term_exam.2019.06.06.pdf |Text}} | {{ :dm:results.2019.06.06.pdf |Results}} | +
-===== Regular exam sessions ===== +
-^ Session ^ Date            ^ Time        ^ Room   ^ Notes ^ Marks ^ +
-|1.|16.01.2019| 14:00 - 18:00| Room E | | | +
-|2.|06.02.2019| 14:00 - 18:00| Room E | | | +
-|3.|19.06.2019| 09:00 - 13:00| Room A1 | Oral exam on DM1 by 15 July. If you cannot take it by that date, you can take the oral exam in September.| {{ :dm:results.2019.06.19.pdf |Results}} | +
-|4.|10.07.2019| 09:00 - 13:00| Room A1 |Oral exam on DM1 by 15 July. If you cannot take it by that date, you can take the oral exam in September. | {{ :dm:results.2019.07.10.pdf |Results}} | +
-===== Extra sessions A.Y. 2017/18 =====+
  
-^ Date            ^ Time        ^ Room   ^ Notes ^ Results ^+//Data, data everywhere. The Economist, Special Report on Big Data, Feb. 2010.// 
 + 
 +  * Data, data everywhere. The Economist, Feb. 2010 {{:dm:economist--010.pdf|download}} 
 +  * Data scientist: The hot new gig in tech, CNN & Fortune, Sept. 2011 [[http://tech.fortune.cnn.com/2011/09/06/data-scientist-the-hot-new-gig-in-tech/|link]] 
 +  * Welcome to the yotta world. The Economist, Sept. 2011 {{:dm:economist-2012-dm.pdf|download}} 
 +  * Data Scientist: The Sexiest Job of the 21st Century. Harvard Business Review, Sept 2012 [[http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1|link]] 
 +  * Il futuro è già scritto in Big Data. Il Sole 24 Ore, Sept 2012 [[http://www.ilsole24ore.com/art/tecnologie/2012-09-21/futuro-scritto-data-155044.shtml?uuid=AbOQCOhG|link]] 
 +  * Special issue of Crossroads - The ACM Magazine for Students - on Big Data Analytics {{:dm:crossroadsxrds2012fall-dl.pdf|download}} 
 +  * Peter Sondergaard, Gartner, Says Big Data Creates Big Jobs. Oct 22, 2012: [[https://www.youtube.com/watch?v=mXLy3nkXQVM|YouTube video]] 
 +  * Towards Effective Decision-Making Through Data Visualization: Six World-Class Enterprises Show The Way. White paper at FusionCharts.com. [[http://www.fusioncharts.com/whitepapers/downloads/Towards-Effective-Decision-Making-Through-Data-Visualization-Six-World-Class-Enterprises-Show-The-Way.pdf|download]]
  
 ====== Previous years ===== ====== Previous years =====
-   * [[dm.2018-19]] +  * [[dm.2022-23ds]] 
-   *  [[dm.2017-18]]+  * [[dm.2021-22ds]] 
 +  * [[dm.2020-21]] 
 +  * [[dm.2019-20]] 
 +  * [[dm.2018-19]] 
 +  * [[dm.2017-18]]
   * [[dm.2016-17]]   * [[dm.2016-17]]
   * [[dm.2015-16]]   * [[dm.2015-16]]
Line 309: Line 338:
   * [[dm.2012-13]]   * [[dm.2012-13]]
   * [[dm.2011-12]]   * [[dm.2011-12]]
-  * [[dm.2010-11]] +
-  * [[dm.2009-10]] +
-  * [[dm.2008-09]] +
-  * [[dm.2007-08]] +
-  * [[dm.2006-07]] +
-  * [[PhDWorkshop2011]] +
-  * [[SNA.Ingegneria2011]] +
-  * [[SNA.IMT.2011]] +
-  * [[MAINS.SANTANNA.2011-12]] +
-  * [[MAINS.SANTANNA.DM4CRM.2012]] +
-  * [[MAINS.SANTANNA.DM4CRM.2016]] +
-  * [[MAINS.SANTANNA.DM4CRM.2017 | Data Mining for Customer Relationship Management 2017]] +
-  * [[MAINS.SANTANNA.DM4CRM.2018]] +
-  * [[MAINS.SANTANNA.DM4CRM.2019]] +
-  * [[SDM2018 | Instructions for camera ready and copyright transfer]] +
-  * [[DM-SAM | Storie dell'Altro Mondo]]+
dm/start.1571160117.txt.gz · Last modified: 15/10/2019 at 17:21 (5 years ago) by Anna Monreale