This is an old revision of the document!


ICT for BI & CRM - Part III: Data Mining

News

  • Exercises 1 and 2 are online. The deadline for both assignments is December 13, 2011. Send both reports in .pdf format by email to pedre [at] di [dot] unipi [dot] it with the tag [DM-MAINS] in the subject line.

Goals

Organizations and businesses are overwhelmed by the flood of data continuously collected into their data warehouses and arriving from external sources, the Web above all. Traditional exploratory techniques may fail to make sense of the data because of its inherent complexity and size. Data mining and knowledge discovery techniques emerged as an alternative approach, aimed at revealing patterns, rules and models hidden in the data, and at supporting analysts in developing descriptive and predictive models for a number of business problems, notably in the CRM domain.

Syllabus

  • Basic concepts of data mining and the knowledge discovery process.
  • Data and data sources.
  • Exploratory data analysis.
  • Fundamental data mining tasks and methods: clustering, classification and prediction, patterns and association rules.
  • Hints on descriptive and predictive analytics for CRM tasks: customer segmentation, churn analysis, promo redemption, product recommendation, market basket analysis.
  • Discussion of industrial data mining projects for CRM in retail, both traditional and online.
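As a small illustration of the "patterns and association rules" and "market basket analysis" items above, the following sketch computes the three standard measures of an association rule (support, confidence and lift). The baskets are made-up data, and the function names `support`, `confidence` and `lift` are illustrative choices, not part of the course material or of any particular tool.

```python
# Hypothetical market baskets (made-up data, for illustration only).
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence relative to the consequent's baseline frequency (above 1 means positive association)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


rule = ({"diapers"}, {"beer"})
rule_support = support(rule[0] | rule[1], baskets)
rule_confidence = confidence(*rule, baskets)
rule_lift = lift(*rule, baskets)
print(rule_support, rule_confidence, rule_lift)
```

A rule is usually considered interesting only if its support and confidence exceed user-chosen thresholds; the lift value helps to discard rules whose consequent is simply frequent on its own.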

Textbooks

  • Slides (see Calendar).
  • Gordon S. Linoff and Michael J. Berry. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management. Wiley, 2011.

Reading about the "data analyst" job

  • Data scientist: The hot new gig in tech, CNN & Fortune, Sept. 2011

Calendar

Date | Topic | Learning material
1. 22.11.2011 - 11:00-13:00 and 16:00-18:00 | Introduction to Data Mining and the Knowledge Discovery Process | introductiondm.pdf - Textbook: chapt. 1
2. 23.11.2011 - 09:00-11:00 | Data understanding. Introduction to Weka | chap2_data.pdf - Textbook: chapt. 2 and 3
3. 28.11.2011 - 11:00-13:00 and 14:00-16:00 | Clustering Analysis | clustering.pdf - Textbook: chapt. 8
4. 29.11.2011 - 11:00-13:00 and 16:00-18:00 | Classification and predictive analysis | dm.classification.pdf - Textbook: chapt. 4
5. 30.11.2011 - 16:00-18:00 | Pattern discovery and association rule mining | Textbook: chapt. 6
6. 05.12.2011 - 09:00-13:00 | CRM applications. Big data and social network analysis. Data mining and privacy |

Exercises

  1. Clustering: Russian Companies dataset. Download the zipped .arff dataset russiancompanies.zip, which describes 1438 Russian companies. For each company, the following properties are provided for the years 1996 and 1997: number of employees (emp), total amount of wages (wage), total revenues (output), the logarithms of these three variables (ln, lw and ly, respectively), the production sector (sector: 1 = industry, 2 = construction, 3 = trade), and the kind of ownership (owntype: 1 = public, 2 = private, 3 = mixed). Provide a clustering analysis of the dataset with respect to a selected subset of variables, and explain the resulting clusters, also taking into account the nominal variables sector and owntype. Describe your findings in a short report (up to three pages) illustrating the key features of the dataset, how you conducted the clustering analysis, and your interpretation of the resulting clusters.
  2. Classification: Adult Census dataset. Download the zipped .arff dataset adult.census.zip, which describes demographic information about 32561 persons extracted from US census data. The available attributes are: age, workclass, education, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, native-country, and a binary class attribute income (> $50K, <= $50K). Provide a concise, accurate and readable decision tree for the classification problem of predicting the income class variable given (all or some of) the other variables. Describe your findings in a short report (up to three pages) illustrating the key features of the dataset, how you conducted the classification analysis, and your interpretation of the resulting tree.
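To give an intuition for how a decision tree learner chooses which attribute to test first in a task like exercise 2, here is a minimal sketch that scores candidate attributes by information gain. It is not the procedure required by the exercise, nor the exact criterion of any specific tool. The eight records are made up and are not taken from adult.census.zip; only the attribute names education, sex and income are borrowed from the exercise text, and the helper names `entropy` and `information_gain` are illustrative.

```python
from collections import Counter
from math import log2

# Hypothetical toy records (NOT taken from the Adult Census dataset).
records = [
    {"education": "Bachelors", "sex": "Male", "income": ">50K"},
    {"education": "Bachelors", "sex": "Male", "income": ">50K"},
    {"education": "Bachelors", "sex": "Female", "income": ">50K"},
    {"education": "Bachelors", "sex": "Female", "income": "<=50K"},
    {"education": "HS-grad", "sex": "Male", "income": "<=50K"},
    {"education": "HS-grad", "sex": "Male", "income": "<=50K"},
    {"education": "HS-grad", "sex": "Female", "income": "<=50K"},
    {"education": "HS-grad", "sex": "Female", "income": "<=50K"},
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target="income"):
    """Reduction in class entropy obtained by splitting `rows` on `attribute`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    # Weighted average entropy of the partitions induced by the attribute.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


gains = {a: information_gain(records, a) for a in ("education", "sex")}
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```

On this toy sample, education separates the two income classes far better than sex, so it would be tested at the root. A full learner repeats the same scoring recursively on each partition and then prunes the tree, which is what keeps the result concise and readable.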
dm/mains.santanna.2011-12.1322657402.txt.gz · Last modified: 30/11/2011 at 12:50 (13 years ago) by Fosca Giannotti