====== Year 2012-2013====== Prof. Roberto Grossi ==== Announcements ==== * Avviso: l'esame scritto del 12 luglio sarĂ  gestito dalla prof.ssa Pagli. * Calendario orali (contattare il docente per fissare una data specifica nel periodo indicato): * dal 17.06.2013 al 4.07.2013; * dal 22.07.2013 al 31.07.2013. * Examination dates: Dec. 20 at 9:30 ("compitino", aula C1), Jan. 15 at 9:30 (aula A), Feb. 5 at 9:30 (aula A), Jun. 12 at 9:00 (Aula A), Jul. 12 at 9:00 (Aula A). * The class lectures will be (mainly) given in English. ==== Overview ==== We will study, design and analyze advanced algorithms and data structures for the efficient solution of combinatorial problems involving all basic data types, such as integers, strings, (geometric) points, trees and graphs. This course deepens and extends the algorithmic notions of students. The syllabus is structured to highlight the applicative scenarios in which the studied algorithms and data structures can be successfully applied. The level of detail with which each argument will be dealt with can change year-by-year, and will be decided according to requests coming from other courses and/or specific issues arising in, possibly novel, applicative scenarios. The advanced nature of this course focuses on developing algorithmic design skills. The students are exposed to complex problems that require a significant effort in problem solving. One "brainstorming" lecture per week is devoted to a full immersion in this activity, and it is highly recommend to attend it. The purpose is not that of teaching/learning further N algorithms, but to refine students' skills. The final written exam will be based on the topics discussed during the "brainstorming" lectures. We like students that make mistakes in a constructive way, since this is the starting point of our brainstorming to drive their reasoning paths, thus learning by mistakes. We are interested in the route that leads to the algorithmic solution, rather than the solution itself. Regarding mistakes and intelligence, Alan Turing wrote in 1947 that "... if a machine is expected to be infallible, it cannot also be intelligent". This is a motto for this class, and an invitation to express your own ideas even if they could be apparently wrong. ==== Topics ==== === Algorithms for massive data sets and hierarchical memories === ^ Date ^ Topics ^ References and notes ^ | Oct. 2| Introduction to the course. Problem solving: ideas and examples. Toy problem: maximal interval of positive integers with bounded pairwise difference.| [[http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformatica/alg2/algo2_12/pilots.pdf|[Notes]]] | | Oct. 3| External memory model. Case study for sequential access versus random access. Description of the model. CPU complexity and I/O complexity. Cost of scanning and sorting. K-way merge sort.| [[http://www.ittc.ku.edu/~jsv/Papers/Vit.IO_book.pdf|Introduction, chapt.2 and 3 (set D=1 disks)]]]| | Oct. 4| Lower bounds for sorting and permuting in the external memory model. K-way distribution sort.| [[http://www.ittc.ku.edu/~jsv/Papers/Vit.IO_book.pdf|[Introduction, chapt.2 and 3 (set D=1 disks), chapt.5 (set D=1 disks, no randomization, till Sect.5.1.1 included)]]] [[http://people.cs.aau.dk/~simas/aalg04/esort.pdf|Notes]][[http://www.daimi.au.dk/%7Elarge/ioS06/Alower.pdf|[Notes]]]| | Oct. 9| Problem solving. K-way merging, permuting, distributing, median, selection and order statistics.| [[http://people.cs.aau.dk/~simas/aalg04/esort.pdf|Notes]] [[http://en.wikipedia.org/wiki/Selection_algorithm|Notes]]| | Oct. 10| k-way distributing. Scan+sort paradigm: case study of map-reduce.| [[http://www.daimi.au.dk/~large/ioS06/AL.pdf|Notes]] [[http://code.google.com/edu/parallel/mapreduce-tutorial.html|[Notes]]]| | Oct. 11| Cache-oblivious (CO) model. Motivations and simple examples: multiple scanning, inverting an array, median and order statistics, binary search (lower bound).| [[http://erikdemaine.org/papers/BRICS2002/paper.pdf|[Sect.4.1]]]| | Oct. 12| Cache-oblivious model. k-way search search and van Emde Boas layout. Paging of arbitrary binary trees.| [[http://erikdemaine.org/papers/BRICS2002/paper.pdf|[Sect.4.1]]] {{:magistraleinformatica:alg2:algo2_11:paginazione.pdf|[Sect.3.1]}} [[http://www.cs.sunysb.edu/~bender/pub/treelayout-full.ps|[Sect.5.1]]] {{:magistraleinformatica:alg2:algo2_11:cotrees_grossi.pdf|[Notes]}}| | Oct. 16| Problem solving. Implicit van Emde Boas layout of binary complete trees. | {{:magistraleinformatica:alg2:algo2_12:tree_layout_1.pdf|Notes}} {{:magistraleinformatica:alg2:algo2_12:tree_layout_2.pdf|Notes}} [[https://bitbucket.org/coriolan/van-emde-boas-layout/|Code]] [[https://github.com/bcopeland/em_misc/blob/master/vebtree.c|Code]]| | Oct. 17| Problem solving. Map-reduce computations. Cache-oblivious layout of arbitrary trees. | [[http://pastebin.com/FCczrtHU|Code]] [[http://wiki.apache.org/hadoop/WordCount|Code]] [[http://www.google.com/about/datacenters/gallery/|Gallery]] {{:magistraleinformatica:alg2:algo2_12:tree_layout_3.pdf|Notes}}| | Oct. 18| Suffix sorting. DC-3 algorithm for RAM model, EM model, CO model. | [[http://www.cs.helsinki.fi/u/tpkarkka/publications/jacm05-revised.pdf|DC3 algorithm (up to Section3)]]| | Oct. 19| Suffix tree. How to make it cache-oblivious. | {{:magistraleinformatica:alg2:algo2_12:tree_layout_4.pdf|Notes}}| === Randomization, hashing and data streaming === ^ Date ^ Topics ^ References and notes ^ | Oct. 23| Data Streaming Model. Motivations and examples. Basic technique: Count-Min Sketches. K-wise independence.| [[https://sites.google.com/site/countminsketch/|Site]] {{:magistraleinformatica:alg2:algo2_12:count-min-sketch.pdf|Notes}}| | Oct. 24| Problem solving. Filling the details of the DC3 suffix sorting. Proving k-wise independence for a family of hash functions.| Notes.| | Oct. 25| Count-Min Sketches. Full proof and analysis. Indicator variables, Markov's inequality.| [[http://www.research.att.com/people/Cormode_Graham/library/publications/CormodeMuthukrishnan04CMJalg.pdf|Paper]] [[https://sites.google.com/site/countminsketch/|Site]]| | Oct. 30| Problem solving. Space lower bound for Count-Min Sketches.| {{:magistraleinformatica:alg2:algo2_12:lower-bound-count-min-sketch.pdf|Notes}} {{:magistraleinformatica:alg2:algo2_12:appendixc_clrs.pdf|App.C CLRS on Probability}}| | Oct. 31| Problem solving. Chernoff bounds, indicator variables, and applications of count-min sketches. Designing a data streaming algorithm to estimate the number of distinct elements.| {{:magistraleinformatica:alg2:algo2_12:chernoff-bounds.pdf|Sect.4.1, Motwani-Raghavan book}} [[https://wiki.engr.illinois.edu/display/cs591fsn/Distinct+Elements+in+Data+Streams|Notes]] [[http://www.cs.columbia.edu/~coms699812/lecture2.pdf|Notes]]| | Nov. 9| Using a family of random hash functions: Cuckoo hashing.| {{:magistraleinformatica:alg2:algo2_10:cuckoo-undergrad.pdf|Notes}}| | Nov. 13| Problem solving. Count-Min Sketch. Cuckoo hashing.| {{:magistraleinformatica:alg2:algo2_12:count-min-sketch-median.pdf|Notes}}| | Nov. 14| Class canceled.| [[http://www.mit.gov.it/mit/site.php?p=scioperi|Transportation strike]]| | Nov. 16| Bloom filters. Design parameters and probabilistic analysis, with applications.| {{:magistraleinformatica:alg2:bloomfiltersurvey.pdf|Survey: except par.2.5-2.6 (optional: par.2.2)}}| | Nov. 20| Problem solving. Count-Min Sketch for interval queries and inner product.| [[http://www.research.att.com/people/Cormode_Graham/library/publications/CormodeMuthukrishnan04CMJalg.pdf|Paper]]| | Nov. 21| Randomized quicksort. Skiplists. Randomized binary search trees.| {{:magistraleinformatica:alg2:algo2_10:randqs.pdf|[sez.2.5.4]}} {{:magistraleinformatica:alg2:algo2_11:skip.pdf|[sez.3.3]}} {{:magistraleinformatica:alg2:algo2_12:rbst.pdf|Paper (pages 288-302}}| == Data compression and string algorithms == | Nov. 23| Class canceled.| [[http://www.unipi.it/studenti/offerta/tfa/index.htm|Examination committee]]| | Nov. 27| Shannon's entropy and Kolmogorov's complexity. Elias' coding: gamma- and delta-codes. Huffman coding. Lempel-Ziv dictionary-based parsing.| {{:magistraleinformatica:alg2:algo2_11:07_mg-partchap2.pdf|[cap.2:da inizio fino a pag.36; sez.2.6]}} {{:magistraleinformatica:alg2:algo2_11:07_mg-partchap3.pdf|[cap 3: fino pag.119]}}| == Hard problems == | Dec. 7| NP-Hard problems. Approximation algorithms. Traveling Salesman Problem (TSP): hardness of approximation and 2-approximation for metric instances.| {{:magistraleinformatica:alg2:algo2_10:dispensa_0.pdf|[chapt.9: par. 9.3-9.4 (Italian)]}}| | Dec. 11| Further examples: approximation for Maximum Independent Set (MIS) and Vertex Cover (VC).| [[http://www.dis.uniroma1.it/~ausiello/InfoTeoIIRM/book/chapter02.pdf| [chapt.2: pp.39-40, par. 2.1.2]]] [[http://en.wikipedia.org/wiki/Vertex_cover| [Approximate evaluation of VC]]]| | Dec. 14| Knapsack problem: bad example for greedy and its refinement for 2-approximation. Approximation for Min Bin Packing using Next Fit and First Fit Decreasing strategies.| [[http://www.dis.uniroma1.it/~ausiello/InfoTeoIIRM/book/chapter02.pdf| [chapt.2: par. 2.1.1, par. 2.2.2]]] | | Dec. 18| Problem solving. Distinct elements in a stream and general discussion of the exercises presented during the semester.| | == Excercises discussed in class == {{:magistraleinformatica:alg2:algo2_12:esercitazioni2012.pdf| List of problems}} discussed during the problem solving session (in Italian). == Official class log (registro) == [[http://unimap.unipi.it/registri/printregistriNEW.php?re=89100:::&ri=9172| Registro su unimap]] == Exams == * No more accessible