====== Text of the second exercise, DM2 ======

  * **Sequential pattern analysis: WarLogs Dataset. Assigned on: 02.04.2014. To be completed by: 21.04.2014. Send papers (3 pages max of text, figures excluded) by email to datamining [dot] unipi [at] gmail [dot] com. Use "[DM] exercise 6" in the subject.**

Download the dataset here in CSV format: {{:dm:warlogs.csv.zip| warlogs.csv.zip}}. A description of the variables is [[dm:warlogs2013-14|here]].

**Problem**: Build a dataset of sequences that describe, **for each day** and **for each geographical area**, the sequence of **events** that happened there. The **geographical areas** can be the ones indicated in the "region" attribute already in the dataset, or they can be obtained by partitioning the territory in some other way, for instance to obtain more balanced areas. The **events** can be represented, for instance, by the "category" or "type" attributes in the dataset, or they can be computed from other information (kind of casualties, number of wounded or killed victims, etc.). Use this dataset to extract a set of frequent sequential patterns.

**Tools for sequential patterns.** Among the possible alternatives, we suggest adopting one of the following:

  * **Weka**: use the GeneralizedSequentialPatterns associator. The input dataset should contain one pair per line, and the lines should be temporally ordered (there is no explicit timestamp in the data). Here is an example: {{:dm:sequence_data.csv.zip|}}.
  * **Spam**: a command-line tool that can be downloaded {{:dm:spam_bin.zip|here}} (binaries for Windows and Linux, including a sample input file). Note that the input should contain only numeric (integer) values, therefore some coding is needed. Also, input sequences longer than 64 transactions are not allowed, therefore they should be truncated.
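The preprocessing described above (one sequence per day and area, integer coding of events, truncation to 64 transactions) could be sketched as follows. This is only a minimal illustration under stated assumptions, not a prescribed solution: ''region'' and ''category'' are the attributes mentioned in the problem text, while the ''date'' column name, its timestamp format, and the toy rows are hypothetical. The exact file layout expected by Spam is not reproduced here; check the sample input file shipped with the binaries.

```python
import csv
import io
from collections import defaultdict

MAX_TRANSACTIONS = 64  # length limit for Spam input stated in the assignment

# Made-up toy rows for illustration only; the column name "date" is an assumption.
SAMPLE = """date,region,category
2009-01-01 10:00:00,RC EAST,Direct Fire
2009-01-01 08:00:00,RC EAST,IED Explosion
2009-01-01 09:00:00,RC SOUTH,Direct Fire
2009-01-02 07:30:00,RC EAST,Direct Fire
"""


def build_sequences(rows, day_key="date", area_key="region", event_key="category"):
    """Group events by (day, area), ordered by timestamp."""
    sequences = defaultdict(list)
    # Assumes "YYYY-MM-DD hh:mm:ss" timestamps, which sort correctly as strings.
    for row in sorted(rows, key=lambda r: r[day_key]):
        day = row[day_key][:10]  # keep only the YYYY-MM-DD part
        sequences[(day, row[area_key])].append(row[event_key])
    return dict(sequences)


def encode_events(sequences):
    """Replace each distinct event label with a positive integer code."""
    codes = {}
    encoded = {}
    for key, events in sequences.items():
        # A new label gets the next unused code; a known label keeps its code.
        encoded[key] = [codes.setdefault(e, len(codes) + 1) for e in events]
    return encoded, codes


def truncate(encoded, limit=MAX_TRANSACTIONS):
    """Keep at most `limit` events per sequence."""
    return {key: seq[:limit] for key, seq in encoded.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
sequences = build_sequences(rows)
encoded, codes = encode_events(sequences)
encoded = truncate(encoded)

for (day, area), seq in encoded.items():
    print(day, area, seq)
```

Here every event is treated as its own transaction. Keeping the ''codes'' dictionary makes it possible to translate the integer patterns returned by the tool back into event labels.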
====== Ignore what is below ======

{{ :dm:c641dad03aacb3d94ad6f575d6a43ac4.jpg?nolink&300 |}}

^ ^ Day ^ Room ^ Topic ^ Learning material ^ Instructor ^
|1.| 17.02.2014 9:00-11:00 | N1 | Introduction | | Giannotti |
|2.| 19.02.2014 9:00-11:00 | L1 | Frequent patterns and association rules / 1 | | Giannotti |
|3.| 24.02.2014 9:00-11:00 | N1 | Frequent patterns and association rules / 2 | | Giannotti |
|4.| 26.02.2014 9:00-11:00 | L1 | Frequent patterns and association rules / 3 | | Giannotti |
|5.| 3.03.2014 9:00-11:00 | N1 | Association rules on DM tools | | Giannotti |
|6.| 5.03.2014 9:00-11:00 | L1 | Sequential patterns / 1 | | Nanni |
|7.| 10.03.2014 9:00-11:00 | N1 | Sequential patterns / 2 | | Nanni |
|8.| 12.03.2014 9:00-11:00 | L1 | Time series / 1 + Data exploration: assignments | | Nanni |
|9.| 17.03.2014 9:00-11:00 | N1 | Time series / 2 | | Nanni |
|10.| 19.03.2014 9:00-11:00 | L1 | Classification: evaluation methods + Case study: Fraud detection | | Giannotti |
|11.| 24.03.2014 9:00-11:00 | N1 | Network diffusion and Virality Marketing | | Giannotti |
|12.| 26.03.2014 9:00-11:00 | L1 | Mobility Data Mining / 1 | | Nanni |
|13.| 7.04.2014 9:00-11:00 | N1 | Mobility Data Mining / 2 | | Nanni |
|14.| 9.04.2014 9:00-11:00 | L1 | Case study: Mobility Data Mining | | Nanni |
|15.| 14.04.2014 9:00-11:00 | N1 | Case study: Mobility Data Mining / 2 | | Giannotti - Nanni |
|16.| 16.04.2014 9:00-11:00 | L1 | Data exploration: results of assignments + Presentation of projects | | Nanni |
|17.| 28.04.2014 9:00-11:00 | N1 | Data Mining and Privacy / 1 | | Giannotti |
|18.| 30.04.2014 9:00-11:00 | L1 | Case study: Mining official data and health data | | Nanni |
|19.| 5.05.2014 9:00-11:00 | N1 | Data Mining and Privacy / 2 | | Giannotti |
|20.| 7.05.2014 9:00-11:00 | L1 | | | |
|21.| 12.05.2014 9:00-11:00 | N1 | | | |
|22.| 14.05.2014 9:00-11:00 | L1 | | | |
|23.| 19.05.2014 9:00-11:00 | N1 | | | |
|24.| 21.05.2014 9:00-11:00 | L1 | | | |
|25.| 27.05.2014 9:00-11:00 | N1 | | | |