Entrambe le parti precedenti la revisioneRevisione precedenteProssima revisione | Revisione precedente |
magistraleinformatica:ir:ir14:start [08/12/2014 alle 17:24 (10 anni fa)] – Paolo Ferragina | magistraleinformatica:ir:ir14:start [15/09/2015 alle 10:59 (9 anni fa)] (versione attuale) – [General Information] Paolo Ferragina |
---|
* **Lectures Schedule:** tuesday 14-16 (N1), thursday 14-16 (C) | * **Lectures Schedule:** tuesday 14-16 (N1), thursday 14-16 (C) |
* **Question time:** by appointment. | * **Question time:** by appointment. |
* **Official Lecture's Log:** Here it is the [[http://unimap.unipi.it/registri/dettregistriNEW.php?re=154452::::&ri=9142|registro]]. | * **Official Lecture's Log:** Here it is the [[http://unimap.unipi.it/registri/registri.php?ri=9142&tmplt=principale.tpl&aa=2015|registro]]. |
* News about this course will be distributed via a [[http://twitter.com/FerraginaTeach | Tweeter-channel]] | * News about this course will be distributed via a [[http://twitter.com/FerraginaTeach | Tweeter-channel]] |
| |
| |
^ Date ^ Room ^ Text ^ | ^ Date ^ Room ^ Text ^ |
| 16-01-15, 09:00 | C1 | text | | | 16-01-2015 | C1 | {{:magistraleinformatica:ir:ir14:ir150116.docx|text}} | |
| 09-02-15, 09:00 | A1 | text | | | 09-02-2015 | A1 | {{:magistraleinformatica:ir:ir14:ir150209.docx|text}} | |
| | 05-06-2015 | L1 | {{:magistraleinformatica:ir:ir14:ir150605.docx|text}} | |
| | 29-06-2015 | L1 | {{:magistraleinformatica:ir:ir14:ir150629.docx|text}} | |
| | 20-07-2015 | L1 | {{:magistraleinformatica:ir:ir14:ir150720.docx|text}} | |
| | 10-09-2015 | L1 | {{:magistraleinformatica:ir:ir14:ir150910.docx|text}} | |
| |
===== Books ===== | ===== Books ===== |
| |
^ Date ^ Argument ^ Refs ^ | ^ Date ^ Argument ^ Refs ^ |
| 23-09-2014 | Introduction to the course: modern IR, not just search engines! Boolean retrieval model. Matrix document-term. Inverted list: dictionary + postings. How to implement an AND, OR and NOT queries, and their time complexities. | [[https://dl.dropboxusercontent.com/u/7999075/00-Ratti-video.ppt|Slides]] and Chapt 1 of [MRS] | | | 23-09-2014 | Introduction to the course: modern IR, not just search engines! Boolean retrieval model. Matrix document-term. Inverted list: dictionary + postings. How to implement an AND, OR and NOT queries, and their time complexities. | [[https://dl.dropboxusercontent.com/u/7999075/00-Ratti-video.ppt|Slides]] and Chapt 1 of [MRS] | |
| 25-09-2014 | The issue of hierarchical memories: I/O-model. Parsing: tokenization, normalization, lemmatization, stemming, thesauri. Statistical properties of texts: Zipf law: classical and generalized, Heaps law, Luhn's consideration. | Sect. 2.1, 2.2 and 5.1 of [MRS]. {{:magistraleinformatica:ir:ir13:01-intro.ppt|Slides of this week}}. | | | | 25-09-2014 | The issue of hierarchical memories: I/O-model. Parsing: tokenization, normalization, lemmatization, stemming, thesauri. Statistical properties of texts: Zipf law: classical and generalized, Heaps law, Luhn's consideration. | Sect. 2.1, 2.2 and 5.1 of [MRS]. {{:magistraleinformatica:ir:ir13:01-intro.ppt|Slides of this week}}. | | |
| 30-09-2014 | Index construction: multi-way mergesort, BSBI and SPIMI. Distributed and dynamic indexing. | Chapter 4 of [MRS]. {{:magistraleinformatica:ir:ir13:02-construction.ppt|Slides.}} | | | 30-09-2014 | Index construction: multi-way mergesort, BSBI and SPIMI. Distributed and dynamic indexing. | Chapter 4 of [MRS]. {{:magistraleinformatica:ir:ir13:02-construction.ppt|Slides.}} | |
| 02-10-2014 | Query processing: skip pointers (with solution based on dynamic programming), caching, phrase queries. | {{:magistraleinformatica:ir:ir14:03-query-part_a.ppt|Slides}} | | | 02-10-2014 | Query processing: skip pointers (with solution based on dynamic programming), caching, phrase queries. | {{:magistraleinformatica:ir:ir14:03-query-part_a.ppt|Slides}} | |
| 07-10-2014 | Wild-card queries (permuterm, k-gram). Edit distance: kgram index, dynamic programming. Case of 1-error. Zone indexes. | Chapter 4 and Sect. 3.2 and 3.3 of [MRS]. {{:magistraleinformatica:ir:ir14:03-query-part_b.zip|Slides}} | | | 07-10-2014 | Wild-card queries (permuterm, k-gram). Edit distance: kgram index, dynamic programming. Case of 1-error. Zone indexes. | Chapter 4 and Sect. 3.2 and 3.3 of [MRS]. {{:magistraleinformatica:ir:ir14:03-query-part_b.zip|Slides}} | |
| 14-10-2014 | Exact search: hashing with chaining, univeral hashing, cuckoo hashing. Prefix search: compacted trie, front coding, 2-level indexing. Exact searching with errors: Bloom filter. | {{:magistraleinformatica:ir:ir12:reading-cuckoo.pdf|Paper}} on cuckoo, just description (no proof). Paper on {{:magistraleinformatica:ir:ir12:reading-bloomfilter.pdf|bloom filter}} (just things looked in class). {{:magistraleinformatica:ir:ir14:04a-dictsearch.ppt|Slides}}.| | | 14-10-2014 | Exact search: hashing with chaining, univeral hashing, cuckoo hashing. Prefix search: compacted trie, front coding, 2-level indexing. Exact searching with errors: Bloom filter. | {{:magistraleinformatica:ir:ir12:reading-cuckoo.pdf|Paper}} on cuckoo, just description (no proof). Paper on {{:magistraleinformatica:ir:ir12:reading-bloomfilter.pdf|bloom filter}} (just things looked in class). {{:magistraleinformatica:ir:ir14:04a-dictsearch.ppt|Slides}}.| |
| 16-10-2014 | Auto-completion search: definition, trie, top-k, cartesian tree, RMQ. | {{:magistraleinformatica:ir:ir14:4b-dictsearch_-_autocompletion_-_part_1.pdf|Slides}}. You should read [[http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/ae/ae2012/chap8.pdf|Sect 8.4.1]] and Sect 2 and 3 of this [[http://www.cs.sunysb.edu/~bender/newpub/BenderFa00-lca.pdf|paper]]. | | | 16-10-2014 | Auto-completion search: definition, trie, top-k, cartesian tree, RMQ. | {{:magistraleinformatica:ir:ir14:4b-dictsearch_-_autocompletion_-_part_1.pdf|Slides}}. You should read [[http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/ae/ae2012/chap8.pdf|Sect 8.4.1]] and Sect 2 and 3 of this [[http://www.cs.sunysb.edu/~bender/newpub/BenderFa00-lca.pdf|paper]]. | |
| 21-10-2014 | Notion of Entropy and prefix-free codes. Huffman and Arithmetic coding, with comparisons with Entropy and comment on their performance. | Sect. 6.1 of [MRS]. Read pag. 21-36, 52-56 of [MG]. {{:magistraleinformatica:ir:ir14:05a-compression_text.ppt|Slides}}. | | | 21-10-2014 | Notion of Entropy and prefix-free codes. Huffman and Arithmetic coding, with comparisons with Entropy and comment on their performance. | Sect. 6.1 of [MRS]. Read pag. 21-36, 52-56 of [MG]. {{:magistraleinformatica:ir:ir14:05a-compression_text.ppt|Slides}}. | |
| 28-10-2014 | Arithmetic coding: bound on the code length, relation with entropy. LZ77, LZSS, gzip and snappy. Posting list compression, codes: gamma, delta, variable bytes. | Sez 5.3 of [MRS] and pag. 74-79 of [MG]. | | | 28-10-2014 | Arithmetic coding: bound on the code length, relation with entropy. LZ77, LZSS, gzip and snappy. Posting list compression, codes: gamma, delta, variable bytes. | Sez 5.3 of [MRS] and pag. 74-79 of [MG]. | |
| 30-10-2014 | PForDelta. Rank and Select primitives: definition and their use. Elias-Fano code and its use for postings compression. | {{:magistraleinformatica:ir:ir14:05b-compression_integers_vers1.ppt|Slides}}.| | | 30-10-2014 | PForDelta. Rank and Select primitives: definition and their use. Elias-Fano code and its use for postings compression. | {{:magistraleinformatica:ir:ir14:05b-compression_integers_vers1.ppt|Slides}}.| |
| 11-11-2014 | More on Rank and Select on binary arrays. Rank and Select on general arrays: the Wavelet Tree. Binary tree encoding and navigation. | {{:magistraleinformatica:ir:ir14:05c-succinct-tree_and_wavelet.ppt|Slides}} | | | 11-11-2014 | More on Rank and Select on binary arrays. Rank and Select on general arrays: the Wavelet Tree. Binary tree encoding and navigation. | {{:magistraleinformatica:ir:ir14:05c-succinct-tree_and_wavelet.ppt|Slides}} | |
| 13-11-2014 | Suffix arrays: data structure and search operations. Text mining over suffix arrays. Move-to-Front and Run-length-encoding and Burrows-Wheeler Transform: bzip, how to construct, hot to decompress entirely or just a substring. | {{:magistraleinformatica:ir:ir14:05d-bwt_e_fmi.pptx|Slides}} | | | 13-11-2014 | Suffix arrays: data structure and search operations. Text mining over suffix arrays. Move-to-Front and Run-length-encoding and Burrows-Wheeler Transform: bzip, how to construct, how to decompress entirely or just a substring. | {{:magistraleinformatica:ir:ir14:05d-bwt_e_fmi.pptx|Slides}} | |
| 18-11-2014 | Text-based ranking: dice, jaccard, tf-idf. Vector space model. Storage of tf-idf and use for computing document-query similarity. Fast top-k retrieval: high idf, champion lists, many query-terms, fancy hits, clustering. Relevance feedback, Rocchio, pseudo-relevance feedback, query expansion.| Sect 6.2, 6.3 from [MRS]. Chap 7 and 9 from [MRS]. {{:magistraleinformatica:ir:ir14:06-ranking.ppt|Slides}} | | | 18-11-2014 | Text-based ranking: dice, jaccard, tf-idf. Vector space model. Storage of tf-idf and use for computing document-query similarity. Fast top-k retrieval: high idf, champion lists, many query-terms, fancy hits, clustering. Relevance feedback, Rocchio, pseudo-relevance feedback, query expansion.| Sect 6.2, 6.3 from [MRS]. Chap 7 and 9 from [MRS]. {{:magistraleinformatica:ir:ir14:06-ranking.ppt|Slides}} | |
| 25-11-2014 | Link-based ranking: pagerank and HITS and weighted variants. | Chap 21 from [MRS]. {{:magistraleinformatica:ir:ir14:06-ranking.ppt|Slides}}. | | | 25-11-2014 | Link-based ranking: pagerank and HITS and weighted variants. | Chap 21 from [MRS]. {{:magistraleinformatica:ir:ir13:08b-web-ranking.ppt|Slides}}. | |
| 27-11-2014 | Web search engines: difficulties for their design, the web-graph and its properties, how to compute the SCC I/O-efficiently, the size of the web and the estimation of the relative sizes of SE. | Chap. 8, 19.1 and 19.2, 19.5 from [MRS]. {{:magistraleinformatica:ir:ir14:07-theweba.ppt|Slides}} | | | 27-11-2014 | Web search engines: difficulties for their design, the web-graph and its properties, how to compute the SCC I/O-efficiently, the size of the web and the estimation of the relative sizes of SE. | Chap. 8, 19.1 and 19.2, 19.5 from [MRS]. {{:magistraleinformatica:ir:ir14:07-theweba.ppt|Slides}} | |
| 28-11-2014 | The storage of the web graph. Latent Semantic Indexing. | Chap 18 and 20.4 from [MRS]. {{:magistraleinformatica:ir:ir14:09-lsi.ppt|Slides}}. | | | 28-11-2014 | The storage of the web graph. Latent Semantic Indexing. | Chap 18 and 20.4 from [MRS]. {{:magistraleinformatica:ir:ir14:09-lsi.ppt|Slides}}. | |
| 02-12-2014 | Random projections. Crawling and consistent hashing. | Chap 20.1, 20.2 and 20.3. {{:magistraleinformatica:ir:ir14:08a-web-crawling.ppt|Slides}}. | | | 02-12-2014 | Random projections. Crawling and consistent hashing. | Chap 20.1, 20.2 and 20.3. {{:magistraleinformatica:ir:ir14:08a-web-crawling.ppt|Slides}}. | |
| 03-12-2014 | Recommendation systems and Web advertising. | {{:magistraleinformatica:ir:ir14:11-applications.ppt|Slides}}. | | | 03-12-2014 | Recommendation systems and Web advertising. | {{:magistraleinformatica:ir:ir14:11-applications.ppt|Slides}}. | |
| 04-12-2014 | Semantic-annotation tools: basics and TAGME. | | | | 04-12-2014 | Semantic-annotation tools: basics and TAGME. | | |
| 09-12-2014 | Semantic-annotation tools: advanced and some applications. | {{:magistraleinformatica:ir:ir14:12-topic_annotators.pptx|Slides}} | | | 09-12-2014 | Semantic-annotation tools: advanced and some applications. | {{:magistraleinformatica:ir:ir14:12-topic_annotators.pptx|Slides}} | |
| 10-12-2014 | Extra lecture: 9-11, M1: Clustering: flat, hierarchical, soft, hard. K-means, optimal bisect, hierarchical - max, min, avg, centroid. | Chap 16 and 17 of [MRS], slides. | | | 10-12-2014 | Extra lecture: 9-11, M1: Clustering: flat, hierarchical, soft, hard. K-means, optimal bisect, hierarchical - max, min, avg, centroid. | Chap 16 and 17 of [MRS], {{:magistraleinformatica:ir:ir14:10-clustering.ppt|slides}}. | |
| 11-12-2014 | room C: exercises | | | | 11-12-2014 | Exercises | | |
| 16-12-2014 | room N1: exercises | | | | 12-12-2014 | Q&A time on theory and exercises | | |
| | 16-12-2014 | Exercises | | |
| | 18-12-2014 | Q&A time on theory and exercises (meeting requested by students) | | |