
This is an old version of the document!


Data Mining (309AA) - 9 CFU A.Y. 2023/2024

Instructor:

Teaching Assistant:

News

  • [05.09.2023] The lectures will start on 27th September 2023

Learning Goals

  • Fundamental concepts of knowledge discovery in data (KDD)
  • Data understanding
  • Data preparation
  • Clustering
  • Classification
  • Pattern Mining and Association Rules
  • Outlier Detection
  • Time Series Analysis
  • Sequential Pattern Mining
  • Ethical Issues

Hours and Rooms

Classes

Day of Week | Hour | Room
Wednesday | 09:00 - 11:00 | Room C1
Thursday | 09:00 - 11:00 | Room C1
Friday | 09:00 - 11:00 | Room C

Office hours: Anna Monreale: Tuesday 11:00-13:00, online via Teams or at the Department of Computer Science, room 374/E (please request an appointment by email). Lorenzo Mannocci: TBD

A Teams channel will be used ONLY to post news, Q&A, and other material related to the course. Lectures will be held in person only and will NOT be live-streamed, but recordings of the lectures, or of those from previous years, will be made available here for non-attending students.

Learning Material

Textbook

Slides

Software

  • Python - Anaconda (Python version 3.7 or later): Anaconda is the leading open data science platform powered by Python. Download page (the following libraries are already included)
  • Scikit-learn: Python library with tools for data mining and data analysis. Documentation page
  • Pandas: pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Documentation page
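As a quick sanity check of the software setup listed above, the following minimal sketch verifies that the interpreter is at least Python 3.7 and that the listed libraries can be imported. The helper name `check_environment` and the `MIN_PYTHON` constant are hypothetical and used only for illustration; they are not part of the course material.

```python
import importlib
import sys

# Minimum Python version stated in the software list (assumption: 3.7 or later).
MIN_PYTHON = (3, 7)


def check_environment(packages=("sklearn", "pandas")):
    """Return a dict mapping each requirement to True (available) or False.

    `packages` holds import names; scikit-learn is imported as `sklearn`.
    """
    report = {"python": sys.version_info >= MIN_PYTHON}
    for name in packages:
        try:
            importlib.import_module(name)
            report[name] = True
        except ImportError:
            # Raised when the package is not installed in this environment.
            report[name] = False
    return report


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{requirement}: {'OK' if ok else 'MISSING'}")
```

If a library is reported as missing, installing the Anaconda distribution mentioned above should normally provide it.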

Class Calendar (2023/2024)

First Semester

No. | Day | Topic | Learning material | References | Video Lectures
1 | 27.09 | Overview. Introduction to KDD | 1-overview-2023.pdf, 1-intro-dm.pdf | Chap. 1 Kumar Book | Introduction DM - Video1, Introduction DM - Video2
2 | 28.09 | Data Understanding | 2-data_understanding.pdf | Chap. 2 Kumar Book and additional resource of Kumar Book: Exploring Data. If you have the first ed. of Kumar, this is Chap. 3 |
3 | 29.09 | Data Understanding & Data Preparation | 3-data_preparation.pdf | Chap. 2 Kumar Book and additional resource of Kumar Book: Exploring Data. If you have the first ed. of Kumar, this is Chap. 3 |
4 | 04.10 | Data Preparation & Data Similarities | 4-data_similarity.pdf | Data Similarity is in Chap. 2 | DP+Similarities (the last minutes of the lecture were not recorded because of a connection problem)
5 | 05.10 | Python Lab: Data Understanding | DU notebooks and data | | Python Lab on DU
 | 06.10 | Lecture cancelled | | |
6 | 11.10 | Introduction to Clustering. Centroid-based Clustering: K-means algorithm | 5-basic_cluster_analysis-intro.pdf, 6.1-basic_cluster_analysis-kmeans.pdf | Chap. 7 Kumar Book | Video 1: Introduction to Clustering + K-means - Part 1 (video of previous years)
7 | 12.10 | Centroid-based Clustering: K-means variants | 6.2-basic_cluster_analysis-kmeans-variants.pdf | Chap. 7 Kumar Book, clusteringmixturemodels.pdf, xmeans.pdf | Video 2: Introduction to Clustering + K-means - Part 2; Video 1: Center-based clustering - Bisecting K-means, Xmeans, EM (videos of previous years)
 | 13.10 | Suspension of teaching | | |
8 | 18.10 | Hierarchical and Density-based Clustering | 7.basic_cluster_analysis-hierarchical.pdf, 8.basic_cluster_analysis-dbscan-validity.pdf | Chap. 7 Kumar Book |
9 | 19.10 | Clustering Validity & Python Lab: K-means Clustering | 8.basic_cluster_analysis-dbscan-validity.pdf | Chap. 7 Kumar Book |
10 | 20.10 | Python Lab: Density-based and Hierarchical Clustering + Introduction to Classification | Notebook on Clustering, 9.chap3_basic_classification-2023.pdf | Chap. 3 Kumar Book |
11 | 25.10 | Decision Trees & Classifier Evaluation | Same slides as previous lecture | Chap. 3 Kumar Book |
12 | 26.10 | Classifier Evaluation | Same slides as previous lecture | Chap. 3 Kumar Book |
13 | 27.10 | Rule-based Classifiers | 10-rule-based-classifiers.pdf | Chap. 4 Kumar Book |
14 | 02.11 | Rule-based Classifiers + Instance-based Classifiers | 10-knn.pdf | Chap. 4 Kumar Book |
15 | 03.11 | Naive Bayesian Classifier. SVM. Ensemble Classifiers | 11_2023-naive_bayes.pdf, 14_svm_2023.pdf, 13_ensemble_2023.pdf | Chap. 4 Kumar Book |
16 | 08.11 | Python Lab: Classification | classification.zip | |
17 | 09.11 | NN Classifiers | 15_neural_networks_2023.pdf | Chap. 4 Kumar Book |
18 | 10.11 | Python Lab: NN & Imbalanced Classification | imbalanced_classification.zip | |
19 | 15.11 | Association Rule Mining: Apriori | 17_association_analysis.pdf | Chap. 5 Kumar Book |
20 | 16.11 | Association Rule Mining: Evaluation and FP-Growth | 17_2023-fp-growth.pdf | Chap. 5 Kumar Book |
21 | 17.11 | Sequential Pattern Mining | 18_sequential_patterns_2023.pdf | Chap. 6 Kumar Book |
22 | 22.11 | Sequential Pattern Mining: timing constraints. Time Series Analysis: Similarities, Distances and Transformations | 22_time_series_similarity_2023.pdf | Overview on Time Series |
23 | 23.11 | Time Series Analysis: Shapelet & Motif | | |
24 | 24.11 | Time Series Analysis: Shapelet & Motif | matrixprofile.pdf | |

Exams

Project

The project consists of data analyses carried out with data mining tools. It must be done in Python by a team of 3 students. The guidelines specify the tasks to address. Results must be reported in a single paper of at most 25 pages, including figures. Students must deliver both the paper (single column) and well-commented Python notebooks.

  1. Dataset: Dataset Files
  2. Deadline: the first part must be delivered by November 26th, 2023 (previously November 19th, 2023). Send an email to: anna.monreale@unipi.it, lorenzo.mannocci@phd.unipi.it
  • The third part of the project consists of the assignment described here:
  1. Deadline: Jan 8, 2024

Students who do not deliver the above project by Jan 8, 2024 must email the teachers to request a new project. The assigned project will require about 2 weeks of work and, once delivered, will be discussed during the oral exam.

Paper Presentation (OPTIONAL)

Students may present a research paper (made available by the teacher) during the last week of the course. This presentation is OPTIONAL: students who choose to give the paper presentation can avoid the oral exam with open questions. They only need to present the project (see next point). The paper presentation can be given by the group or by a single person.

Oral Exam

  • Project presentation (with slides), 10-15 minutes: mandatory for all students, with questions to check understanding of the details of any part of the project.
  • Open questions on the entire program: for students who do not opt for the paper presentation.
  • Open questions on the topics not covered by the project: only for students opting for the paper presentation.

Previous years

magistraleinformatica/dmi/start.1700851179.txt.gz · Last modified: 24/11/2023 at 18:39 (5 months ago) by Anna Monreale