====== Data Mining 2018 ======

  * **Anna Monreale**\\ Università di Pisa, Knowledge Discovery and Data Mining Lab\\ [[annam@di.unipi.it]]

===== News =====

====== Goals ======

Data mining and knowledge discovery techniques emerged as an alternative approach, aimed at revealing patterns, rules and models hidden in the data, and at supporting the analytical user in developing descriptive and predictive models for a number of business problems. This short course focuses on the main application scenarios of data mining for challenging problems in the broad domain of Customer Relationship Management (CRM).

====== Syllabus ======

  * Clustering models. Discussion of real cases.
  * Patterns and association rule mining for market basket analysis.
  * Prediction models. Discussion of real cases.

====== Textbooks ======

  * Slides (see Calendar).
  * Berthold, M.R., Borgelt, C., Höppner, F., Klawonn, F. //Guide to Intelligent Data Analysis//. Springer Verlag, 1st edition, 2010. ISBN 978-1-84882-259-7
  * Pang-Ning Tan, Michael Steinbach, Vipin Kumar. //Introduction to Data Mining//. Addison Wesley, 2006. ISBN 0-321-32136-7
    * [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php]]

====== Reading about the "data analyst" job ======

  * Data, data everywhere. The Economist, Feb. 2010 {{:dm:economist--010.pdf|download}}
  * Data scientist: The hot new gig in tech. CNN & Fortune, Sept. 2011 [[http://tech.fortune.cnn.com/2011/09/06/data-scientist-the-hot-new-gig-in-tech/|link]]
  * Welcome to the yotta world. The Economist, Sept. 2011 {{:dm:economist-2012-dm.pdf|download}}

====== Calendar ======

^ ^ Date ^ Topic ^ Learning material ^
|01. | 18.09.2018 | Introduction to data mining and big data analytics. Data Understanding & Preparation | {{ :dm:1-introduction-sa.pdf |}} {{ :dm:2-dataunderstanding-sa.pdf |}} {{ :dm:3-data_preparation-sa.pdf |}}|
|02. | 19.09.2018 | KNIME: Data Understanding & Preparation. Clustering | {{ :dm:4-clusteringintroduction-sa.pdf |}} {{ :dm:5-kmeans-sa.pdf |}} {{ :dm:6-dbscan-sa.pdf |}} {{ :dm:01_titanic_data_understanding.zip | 01_titanic_data_understanding}}|
|03. | 20.09.2018 | KNIME: Clustering. Classification | {{ :dm:02_clustering.zip | knime_clustering}} {{ :dm:7-classification-sa.pdf |}}|
|04. | 21.09.2018 | KNIME: Classification. Case Studies | {{ :dm:04_classification.zip | knime_classification}} {{ :dm:calcio_infortuni.pdf |}} {{ :dm:musicpref.pdf |}} {{ :dm:mensa.pdf |}}|

===== Datasets =====

0. {{ :dm:data.txt.zip | Iris}} (for details see [[https://archive.ics.uci.edu/ml/datasets/iris]])

1. {{ :dm:human_resources.csv.zip | Human Resources}} (for details see [[https://www.kaggle.com/ludobenistant/hr-analytics]])

2. {{ :dm:telco_churn.csv.zip | Telco Churn}} (for details see [[http://didawiki.di.unipi.it/doku.php/dm/mains.santanna.dm4crm.2016]])

3. {{ :dm:adult.csv.zip | Adult}} (for details see [[https://archive.ics.uci.edu/ml/datasets/Adult]])

4. {{ :dm:titanic_train.csv.zip | Titanic}} (for details see [[https://www.kaggle.com/c/titanic]])