dm:temp

Differences

This page shows the differences between the selected revision and the current version of the page.

dm:temp [14/02/2014 at 12:11 (10 years ago)]
Mirco Nanni
dm:temp [04/04/2014 at 07:30 (10 years ago)] (current version)
Mirco Nanni
Line 1: Line 1:
 +====== Second exercise assignment, DM2 ======
 +
 +  * ** Sequential pattern analysis: WarLogs Dataset.  Assigned on: 02.04.2014. To be completed by: 21.04.2014. Send papers (3 pages max of text, figures excluded) by email to datamining [dot] unipi [at] gmail [dot] com. Use "[DM] exercise 6" in the subject.** Download the dataset here in CSV format: {{:dm:warlogs.csv.zip| warlogs.csv.zip}}. A description of the variables is [[dm:warlogs2013-14|here]]. **Problem**: Build a dataset of sequences that describe, **for each day** and **for each geographical area**, the sequence of **events** that happened there. The **geographical areas** can be those given by the "region" attribute already in the dataset, or they can be obtained by partitioning the territory in some other way, for instance to obtain more balanced areas. The **events** can be taken, for instance, from the "category" or "type" attributes in the dataset, or they can be derived from other information (kind of casualties, number of wounded or killed victims, etc.). Use this dataset to extract a set of frequent sequential patterns. **Tools for sequential patterns.** Among the possible alternatives, we suggest adopting one of the following:
 +    * **Weka**: use the GeneralizedSequentialPatterns associator. The input dataset should contain, on each line, a pair <sequence ID><Event ID>, and the lines should be temporally ordered (there is no explicit timestamp in the data). Here is an example: {{:dm:sequence_data.csv.zip|}}.
 +    * **Spam**: a command-line tool that can be downloaded {{:dm:spam_bin.zip|here}} (binaries for Windows and Linux, including a sample input file). Note that the input must contain only numeric (integer) values, so the event labels need to be encoded as integers. Also, input sequences longer than 64 transactions are not allowed, so longer sequences must be truncated.
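
The construction of one sequence per day and per geographical area, and its flattening into (sequence ID, event ID) pairs in temporal order, could be sketched as follows. This is only an illustration under stated assumptions: the field names ''date'', ''region'' and ''category'' and the timestamp layout are guesses and should be checked against the variable description; the function names are hypothetical.

```python
from collections import defaultdict


def build_sequences(rows, time_key="date", area_key="region", event_key="category"):
    """Group event records into one time-ordered sequence per (day, area).

    Assumes each row is a dict and that row[time_key] is an ISO-like string
    ("YYYY-MM-DD hh:mm"), so that string order equals temporal order.
    """
    sequences = defaultdict(list)
    # Sorting by the full timestamp keeps events ordered inside each day.
    for row in sorted(rows, key=lambda r: r[time_key]):
        day = row[time_key][:10]  # keep only the "YYYY-MM-DD" part
        sequences[(day, row[area_key])].append(row[event_key])
    return dict(sequences)


def to_pairs(sequences):
    """Flatten the sequences into (sequence ID, event) pairs.

    Sequence IDs are consecutive integers assigned in sorted (day, area)
    order; the events of each sequence stay in temporal order.
    """
    pairs = []
    for seq_id, key in enumerate(sorted(sequences), start=1):
        for event in sequences[key]:
            pairs.append((seq_id, event))
    return pairs
```

The rows can be read with ''csv.DictReader'' and the pairs written out with ''csv.writer'', one pair per line. Using a different definition of area or event only requires changing the values stored under ''area_key'' and ''event_key''.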
 +
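
For tools that accept only integer values and at most 64 transactions per sequence, the coding and truncation steps might look like the sketch below. It treats every event as one transaction, which is an assumption, and it does not write any file: the exact input layout should be taken from the sample input file bundled with the tool. The names ''encode_sequences'' and ''MAX_TRANSACTIONS'' are hypothetical.

```python
MAX_TRANSACTIONS = 64  # limit stated in the assignment text


def encode_sequences(sequences, max_len=MAX_TRANSACTIONS):
    """Map event labels to integer codes and truncate over-long sequences.

    Returns the encoded sequences and the label -> code dictionary, so the
    mined patterns can be translated back to readable labels afterwards.
    """
    codes = {}
    encoded = []
    for seq in sequences:
        ints = []
        for event in seq[:max_len]:  # drop everything past the limit
            if event not in codes:
                codes[event] = len(codes) + 1  # codes start at 1
            ints.append(codes[event])
        encoded.append(ints)
    return encoded, codes
```

Inverting the dictionary, for example with ''{v: k for k, v in codes.items()}'', gives the mapping needed to decode the patterns found by the tool.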
 +
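
As a reminder of what "frequent" means for a sequential pattern, the support of a candidate pattern can be computed as the fraction of sequences that contain it as a (not necessarily contiguous) subsequence. The sketch below is a naive check for single-event elements, useful only to verify a few patterns returned by a tool; it is not the algorithm used by Weka or Spam, and the function names are hypothetical.

```python
def contains(sequence, pattern):
    """Return True if pattern occurs in sequence as an ordered subsequence."""
    it = iter(sequence)
    # "item in it" advances the iterator, so matches must appear in order.
    return all(item in it for item in pattern)


def support(sequences, pattern):
    """Fraction of sequences that contain the pattern."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)
```

A pattern is then frequent when its support reaches the minimum support threshold chosen for the analysis.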
 +====== Ignore everything below ======
 +
 +
 {{ :dm:c641dad03aacb3d94ad6f575d6a43ac4.jpg?nolink&300 |}}
  
dm/temp.1392379903.txt.gz · Last modified: 14/02/2014 at 12:11 (10 years ago) by Mirco Nanni