ICT for BI & CRM - Part III: Data Mining


  • The data mining software Weka can be downloaded from here.


Organizations and businesses are overwhelmed by the flood of data continuously collected into their data warehouses and arriving from external sources, the Web above all. Traditional exploratory techniques may fail to make sense of the data because of its inherent complexity and size. Data mining and knowledge discovery techniques emerged as an alternative approach, aimed at revealing patterns, rules and models hidden in the data, and at supporting analysts in developing descriptive and predictive models for a number of business problems, notably in the CRM domain.


  • Basic concepts of data mining and the knowledge discovery process.
  • Data and data sources.
  • Exploratory data analysis.
  • Fundamental data mining tasks and methods: clustering, classification and prediction, patterns and association rules.
  • Hints on descriptive and predictive analytics for CRM tasks: customer segmentation, churn analysis, promo redemption, product recommendation, market basket analysis.
  • Discussion of industrial data mining projects for CRM in retail, both traditional and online.
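
As a small illustration of the market basket analysis topic listed above, the following Python sketch computes the support and confidence of an association rule. The transactions are invented toy data and the function names are hypothetical; the course itself uses Weka, so this is only an assumed, minimal example of the two measures.

```python
# Toy shopping baskets, invented for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "coffee"},
    {"bread", "milk", "coffee"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


if __name__ == "__main__":
    print("support({bread, milk}) =", support({"bread", "milk"}, TRANSACTIONS))
    print("confidence(bread -> milk) =", confidence({"bread"}, {"milk"}, TRANSACTIONS))
```

A rule is usually considered interesting only when both measures exceed user-chosen thresholds; the thresholds themselves depend on the application.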


  • Slides (see Calendar).
  • Gordon S. Linoff and Michael J. Berry. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management. Wiley, 2011.

Reading about the "data analyst" job

  • Data, data everywhere. The Economist, Feb. 2010 (download)
  • Data scientist: The hot new gig in tech. CNN & Fortune, Sept. 2011 (link)
  • Welcome to the yotta world. The Economist, Sept. 2011 (download)


# | Date | Topic | Learning material
1 | 05.03.2013, 11:00-13:00 | Introduction to Data Mining and the Knowledge Discovery Process | slides - Textbook: chapt. 1
2 | 06.03.2013, 09:00-13:00 | Data understanding. Introduction to Weka | slides - Textbook: chapt. 2 (2.1, 2.2) and chapt. 3 (3.1, 3.2, 3.3)
3 | 06.03.2013, 14:00-18:00 | Clustering Analysis | slides - Textbook: chapt. 8 (8.1, 8.2, 8.5)
4 | 07.03.2013, 09:00-13:00 and 14:00-18:00 | Classification and predictive analysis | slides - Textbook: chapt. 4 (4.1, 4.2, 4.3, 4.4, 4.5)


  • Breast Cancer Wisconsin (Diagnostic) Data Set. Assigned on: 07.03.2013. To be completed by: 22.03.2013. Send papers (3 pages max of text, figures excluded) by email to pedre [at] di [dot] unipi [dot] it, cc: Fosca Giannotti, fosca [dot] giannotti [at] gmail [dot] com. Use “[DM-MAINS] ” in the subject. Group work is allowed, max 3 people per group; interdisciplinary competence is required in each group!
  • Instructions: Download the Wisconsin Diagnostic Breast Cancer (WDBC) dataset from the UCI archive. The dataset contains 569 observations on samples of breast tissue, together with their classification as benign or malignant, as performed by histologists. You are expected to perform the following tasks: 1) data understanding and exploratory analysis; 2) clustering analysis (disregarding the class information), including a description of the discovered (best) clusters; 3) classification analysis using decision trees for the task of diagnosing a sample as benign or malignant. Describe the process adopted to select the proposed clustering and tree, together with an evaluation of their quality.
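
Tasks 2 and 3 above can be sketched in a few lines of plain Python. This is only an illustrative outline under stated assumptions, not the expected solution: the course tool is Weka, the sample values below are invented (they are not taken from the WDBC file), all function names are hypothetical, and the "tree" is reduced to a single Gini-based split (a depth-1 decision tree).

```python
import random
from collections import Counter

# Toy stand-in for WDBC rows: (feature vector, class label), "B" = benign,
# "M" = malignant. Values are invented, NOT real WDBC measurements.
SAMPLES = [
    ((12.0, 15.0), "B"), ((11.5, 14.2), "B"), ((13.1, 16.0), "B"),
    ((12.4, 15.5), "B"), ((20.1, 25.0), "M"), ((19.4, 24.1), "M"),
    ((21.0, 26.3), "M"), ((18.9, 23.7), "M"),
]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iters=100, seed=0):
    """Plain k-means; the class labels are never used (task 2)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iters):
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:  # converged
            break
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


def purity(assignment, labels):
    """External check: share of samples in their cluster's majority class."""
    by_cluster = {}
    for a, y in zip(assignment, labels):
        by_cluster.setdefault(a, []).append(y)
    majority = sum(Counter(ys).most_common(1)[0][1] for ys in by_cluster.values())
    return majority / len(labels)


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(samples):
    """Best single split (task 3): (weighted Gini, feature index, threshold)."""
    best = None
    n = len(samples)
    for f in range(len(samples[0][0])):
        values = sorted({x[f] for x, _ in samples})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [y for x, y in samples if x[f] <= t]
            right = [y for x, y in samples if x[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


if __name__ == "__main__":
    points = [x for x, _ in SAMPLES]
    labels = [y for _, y in SAMPLES]
    assignment, centroids = kmeans(points, k=2)
    print("cluster purity:", purity(assignment, labels))
    print("best split (gini, feature, threshold):", best_split(SAMPLES))
```

On the real dataset the features would normally be rescaled before clustering, several values of k compared, and the tree evaluated on held-out data (for example with cross-validation) rather than on the training samples.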


The exam for the Data Mining module consists of the evaluation of the report on the assigned exercises. For students of the two-year LM-MAINS degree, the exam consists of the evaluation of the report on the exercises and an individual oral exam devoted to discussing aspects that emerge from the exercises. The evaluation of the report is the same for all members of the group (max 3 students per group). The date of the first oral exam session for the LM-MAINS students will be set by appointment.

2012 Edition

dm/mains.santanna.2011-12.txt · Last modified: 14/03/2013 at 16:03 (5 years ago) by Fosca Giannotti