dm:mains.santanna.2011-12

Differences

These are the differences between the selected revision and the current version of the page.

dm:mains.santanna.2011-12 [06/03/2013 at 06:55 (11 years ago)]
Fosca Giannotti [Exams]
dm:mains.santanna.2011-12 [14/03/2013 at 15:03 (11 years ago)] (current version)
Fosca Giannotti [Exercise] Deadline extended
Line 5: Line 5:
 ===== News =====
  
-  * Exercises 1 and 2 are online. The deadline for both assignments is **December 13, 2011**. Send both reports in .pdf format by email to [[pedre@di.unipi.it]] with the tag [DM-MAINS] in the subject line.
+  * The data mining software Weka can be downloaded from [[http://www.cs.waikato.ac.nz/ml/weka/|here]].
  
 ====== Goals ======
Line 41: Line 41:
  
 ^ ^ Date ^ Topic ^ Learning material ^
-|1.   |22.11.2011 - 11:00-13:00 and 16:00-18:00 | Introduction to Data Mining and the Knowledge Discovery Process | {{:dm:introductiondm.pdf|slides}} - Textbook: chapt. 1 |
-|2.   |23.11.2011 - 09:00-11:00  | Data understanding. Introduction to Weka | {{:dm:chap2_data.pdf|slides}} - Textbook: chapt. 2 (2.1, 2.2) and chapt. 3 (3.1, 3.2, 3.3) |
-|3.   |28.11.2011 11:00-13:00 and 14:00-16:00 | Clustering Analysis | {{:dm:clustering.pdf|slides}} - Textbook: chapt. 8 (8.1, 8.2, 8.5) |
-|4.   |29.11.2011 11:00-13:00 and 16:00-18:00 | Classification and predictive analysis | {{:dm:dm.classification.pdf|slides}} - Textbook: chapt. 4 (4.1, 4.2, 4.3, 4.4, 4.5) |
-|5.   |30.11.2011 - 16:00-18:00 | Pattern discovery and association rule mining | {{:dm:dm.association.pdf|slides}}  -  Textbook: chapt. 6 (6.1, 6.2) |
-|6.   |05.12.2011 - 09:00-13:00 | CRM applications. Big data and social network analysis. Data mining and privacy |
+|1.   |05.03.2013 - 11:00-13:00 | Introduction to Data Mining and the Knowledge Discovery Process | {{:dm:introductiondm.pdf|slides}} - Textbook: chapt. 1 |
+|2.   |06.03.2013 - 09:00-13:00  | Data understanding. Introduction to Weka | {{:dm:chap2_data.pdf|slides}} - Textbook: chapt. 2 (2.1, 2.2) and chapt. 3 (3.1, 3.2, 3.3) |
+|3.   |06.03.2013 - 14:00-18:00  | Clustering Analysis | {{:dm:clustering.pdf|slides}} - Textbook: chapt. 8 (8.1, 8.2, 8.5) |
+|4.   |07.03.2013 09:00-13:00 and 14:00-18:00 | Classification and predictive analysis | {{:dm:dm.classification.pdf|slides}} - Textbook: chapt. 4 (4.1, 4.2, 4.3, 4.4, 4.5) |
  
  
-===== Exercises =====
 
-  - **Clustering: Russian Companies dataset.** Download the zipped .arff dataset at {{:dm:russiancompanies.zip|}}, describing 1438 Russian companies. The following properties of each company are provided for the years 1996 and 1997: number of employees (emp), total amount of wages (wage), total revenues (output), the logarithms of these three variables (resp., ln: ln(emp), lw: ln(wage/emp), ly: ln(output)), the production sector (sector: 1 industry, 2 constructions, trade), and the type of ownership (owntype: 1 public, 2 private, 3 mixed). Provide a clustering analysis of the dataset with respect to a selected subset of variables, and explain the obtained clusters, also taking into account the nominal variables sector and owntype. Describe your findings in a short report (up to three pages of text, excluding figures, in either English or Italian) illustrating the key features of the dataset, how you conducted the clustering analysis, and the interpretation of the obtained clusters.
-  - **Classification: Adult Census dataset.** Download the zipped .arff dataset at {{:dm:adult.census.zip|}}, describing demographic information about 32561 persons extracted from US census data. The available attributes are: age, workclass, education, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, native-country, and a binary class income attribute (> $50K, < $50K). Provide a concise, accurate and readable decision tree for the classification problem of predicting the income class variable given (all or some of) the other variables. Describe your findings in a short report (up to three pages of text, excluding figures and charts, in either English or Italian) illustrating the key features of the dataset, how you conducted the classification analysis, and the interpretation of the obtained tree.
+===== Exercise =====
 
+  * **Breast Cancer Wisconsin (Diagnostic) Data Set. Assigned on: 07.03.2013. To be completed by: 22.03.2013. Send papers (3 pages max of text, figures excluded) by email to [[pedre@di.unipi.it]] cc: Fosca Giannotti [[fosca.giannotti@gmail.com]]. Use "[DM-MAINS]" in the subject. Group work allowed, max 3 people per group, inter-disciplinary competence required in each group!**
+  * **Instructions:** Download the {{http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29|Wisconsin Diagnostic Breast Cancer (WDBC) dataset}} from the UCI archive. The dataset contains 569 observations on samples of breast tissue, together with their classification as benign or malignant, as performed by histologists. You are expected to perform the following tasks: 1) data understanding and exploratory analysis; 2) clustering analysis (disregarding the class information), including a description of the discovered (best) clusters; 3) classification analysis using decision trees for the task of diagnosing a sample as benign or malignant. Describe the process adopted to select the proposed clustering and tree, together with an evaluation of their quality.
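The clustering task in the instructions above (task 2, which disregards the class information) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the course's method: the course uses Weka, whereas the code below swaps in plain Python with a hand-written k-means and a deterministic farthest-first initialisation. The rows in `TOY_CSV` are invented toy values laid out like the UCI `wdbc.data` file (ID, diagnosis, then numeric features), and the helper names `load`, `standardise`, `sq_dist` and `kmeans` are hypothetical.

```python
import csv
import io
import math
from collections import Counter

# Invented toy rows in the layout of the UCI wdbc.data file:
# id, diagnosis (M = malignant, B = benign), then numeric features.
# The real file has 30 features per row; two are used here for brevity.
TOY_CSV = """\
1,M,20.0,25.0
2,M,21.0,24.0
3,M,19.5,26.0
4,B,11.0,15.0
5,B,12.0,14.0
6,B,11.5,16.0
"""


def load(text):
    """Split each record into its diagnosis label and its numeric features."""
    labels, rows = [], []
    for rec in csv.reader(io.StringIO(text)):
        labels.append(rec[1])
        rows.append([float(v) for v in rec[2:]])
    return labels, rows


def standardise(rows):
    """Z-score every column so that no feature dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iters=20):
    """Plain k-means; returns the cluster index assigned to each row."""
    # Farthest-first initialisation keeps the sketch deterministic.
    centroids = [list(rows[0])]
    while len(centroids) < k:
        farthest = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(farthest))
    assign = []
    for _ in range(iters):
        # Assignment step: each row goes to its nearest centroid.
        assign = [min(range(k), key=lambda j: sq_dist(r, centroids[j]))
                  for r in rows]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [r for r, a in zip(rows, assign) if a == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return assign


labels, rows = load(TOY_CSV)
clusters = kmeans(standardise(rows), k=2)  # the class label is not used here
table = Counter(zip(clusters, labels))     # cluster vs. diagnosis, after the fact
for (cluster, label), count in sorted(table.items()):
    print(f"cluster {cluster}  diagnosis {label}  count {count}")
```

The cross-tabulation is computed only after clustering, so the diagnosis never influences the clusters; it only helps describe them, which is what the assignment asks for.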
 ====== Exams ======
  
-For MAINS master students (one-year degree) the exam of the Data Mining module consists of the evaluation of the two reports for exercises 1 and 2 above. For students of the two-year LM-MAINS degree the exam consists of the evaluation of the two reports for exercises 1 and 2 above, and an individual oral exam devoted to the discussion of aspects emerging from the exercises. The evaluation of the reports is the same for all members of the group (max 3 students per group). The date of the first oral exam session for the LM-MAINS students will be set by appointment, within January 2012.
+The exam of the Data Mining module consists of the evaluation of the report on the assigned exercises. For students of the two-year LM-MAINS degree the exam consists of the evaluation of the report on the exercises, and an individual oral exam devoted to the discussion of aspects emerging from the exercises. The evaluation of the reports is the same for all members of the group (max 3 students per group). The date of the first oral exam session for the LM-MAINS students will be set by appointment.
  
 ====== 2012 Edition ======
dm/mains.santanna.2011-12.1362552940.txt.gz · Last modified: 06/03/2013 at 06:55 (11 years ago) by Fosca Giannotti