====== Year 2017-2018 ====== [[http://pages.di.unipi.it/grossi/|Prof. Roberto Grossi]]\\ Tutor: [[http://webhost.services.iit.cnr.it/staff/filippo.geraci/|Dr. Filippo Geraci]] ==== Announcements ==== Please register at http://piazza.com/unipi.it/fall2017/642aa for electronic discussion and other activities. ==== Schedule ==== * Regarding your exam planning: I'll be on leave during 19.09.18-15.10.18 and 01.11.2018-28.02.19. * Timetable: [[https://www.di.unipi.it/en/education/mcs/timetable-wif/myschedule-wif?courses%5B%5D=58926_u&cds=WIF-LM&dip=Informatica&test=10&sem=1&anno=2017#|weekly timetable]] with changes: (Oct.5 -> Oct.4, 14-16, I-Lab); (Oct.9 -> Oct.2, 14-16, room A1); (Oct.10 -> Oct.23, 14-16, room A1). * Office hours: Thu. 14-17 or by appointment. ==== Overview ==== The advanced nature of this course focuses on developing algorithmic design skills, exposing the students to complex problems that cannot be directly handled by standard libraries (being aware that several basic algorithms and data structures are already covered by the libraries of modern programming languages), thus requiring a significant effort in problem solving. These problems involve all basic data types, such as integers, strings, (geometric) points, trees and graphs as a starting point. The syllabus is structured to highlight the applicative situations in which the corresponding algorithms can be successfully employed, making references to software applications and libraries. The level of detail in each argument can change year-by-year, and will be decided according to requests coming from other courses in the curriculum and/or specific issues arising in, possibly novel, applicative scenarios. ==== Exams ==== //Written exam:// - choose one of the topics discussed in class - write a very short to-do list and ask the instructor for approval - if the instructor suggests some mods, modify the to-do list according to the instructor's comments and repeat step 2 - if the chosen topic and the to-do list are approved, expand the to-do list into a more detailed to-do list and repeat step 3 - make a written report in English and submit it to the instructor (recall to add at least 20% new content, when compared to what seen in class) - meet the instructor to read together the report and get some comments on it - make the necessary mods Suggested reading: [[http://www.ittc.ku.edu/~jsv/Papers/Vitwritingnotes.pdf|some useful tips for scientific writing in English]] (first two sections) by J.S. Vitter. Example of interaction: [[.interaction:|student and instructor]] discussing the report's content and structure. //Oral exam:// topics discussed in class, please read the references in the notes. ==== Topics ==== Caveat: Several topics are the outcome of recent advancements in the field, and thus the course material mostly consists in research papers or book chapters. === Glimpse on the class topics === ^ Date ^ Topics ^ References and notes ^ |18.09.2017| Distributed server and load balancing through hashing.| [[https://jeremykun.com/2015/12/28/load-balancing-and-the-power-of-hashing/|blog]]| |19.09.2017| Download file manager and the knapsack problem: hardness, approximation and heuristics.| [[https://www.dropbox.com/s/w4wludnqb94fdq0/20170919.pdf?dl=0|snapshot]]| |21.09.2017| Large-scale analysis of social networks through diameter computation.| [[https://www.dropbox.com/s/yz7rtj5io9emspw/20170921.pdf?dl=0|snapshot]]| === Randomization, hashing and data streaming === Randomization is a powerful tool to solve large-scale problems. After introducing the concept of randomized algorithms and hashing, we consider some applications, such as data streaming algorithms, a field emerged in the last decade. Here data flow as a stream and one-pass algorithms with limited memory can process it. We focus on the count-min sketch paradigm and its applications. [Note: to refresh the basic notions on counting and probability, please refer to Appendix C in Cormen-Leiserson-Rivest-Stein's book "Introduction to Algorithms", 3rd ed., MIT Press.] ^ Date ^ Topics ^ References and notes ^ |25.09.2017| Playing with probability. Random indicator variables: secretary problem and random permuting (suggested reading: birthday paradox). Randomized quick sort. | [CLRS 5.1-5.3 (optional 5.4.1), {{:magistraleinformatica:alg2:algo2_13:randqs.pdf|par. 7.3}}] [[https://repl.it/L9KD/1|code]] | |26.09.2017| Virus scan and stream analysis with Karp-Rabin fingerprints: randomized checking and pattern matching. Montecarlo and Las Vegas algorithms. | [RM {{:magistraleinformatica:alg2:algo2_15:karp-rabin-1.pdf| par.7.4-7.6}}] [[https://repl.it/LbWC/3|code]]| |28.09.2017| Dictionary of keywords. Quick review of classical hashing. Universal hashing. Markov's inequality. Perfect hashing.| [CLRS 11.2, 11.3.3, CLRS 11.5 ] [[https://repl.it/Ljuh/8|code]]| |02.10.2017| Case study on hashing: rsync and file synchronization using hash functions | {{ :magistraleinformatica:ad:ad_17:rsync.pdf |slides}} [[https://en.wikipedia.org/wiki/Rsync|wikipedia]] | |02.10.2017| Data Streaming algorithms. Motivations and examples. Count-Min Sketches | {{https://7797b024-a-62cb3a1a-s-sites.googlegroups.com/site/countminsketch/cm-latin.pdf| sects.1-3}} [[https://sites.google.com/site/countminsketch/|Site]] {{:magistraleinformatica:alg2:algo2_12:count-min-sketch.pdf|Notes}} [[https://repl.it/Lvob/3|code]]| |03.10.2017| Queries with Count-Min Sketches: implementation and analysis. | {{https://7797b024-a-62cb3a1a-s-sites.googlegroups.com/site/countminsketch/cm-latin.pdf| sects.3-4}} {{:magistraleinformatica:ad:ad_17:20171003.pdf|Notes}} {{:magistraleinformatica:alg2:algo2_12:count-min-sketch-median.pdf|Notes (optional)}} | |04.10.2017| Document resemblance with MinHash, k-sketches and the Jaccard similarity index. Azuma-Hoeffding bound. | [[http://gatekeeper.dec.com/ftp/pub/dec/SRC/publications/broder/positano-final-wpnums.pdf|paper]] [[http://cs.brown.edu/courses/cs253/papers/nearduplicate.pdf|paper]] [[http://homes.cs.washington.edu/~jrl/cs525/scribes08/lec10.pdf|Azuma-Hoeffding]] [[https://repl.it/MDNO/3|code]]| |19.10.2017| Case study on hashing: document tagging and perfect hashing.| {{ :magistraleinformatica:ad:ad_17:tagger.tgz | code }} | |23.10.2017| Networked data and randomized min-cut algorithm for graphs. | {{:magistraleinformatica:alg2:algo2_15:mincut1.pdf| par.1.1}} | |23.10.2017| Proxy caches and dictionaries with errors: Bloom filters. | {{:magistraleinformatica:alg2:bloomfiltersurvey.pdf|Survey: except par.2.5-2.6 (optional: par.2.2)}} | |24.10.2017| Randomness in data. Kolmogorov complexity. Generating random sequences and permutations (exercise: subsets). | [[http://www.eecs.berkeley.edu/~luca/cs172/notek.pdf|Notes]] [[http://www.1stworks.com/ref/ruskeycombgen.pdf|Sect. 5.4]] | |26.10.2017| Generating random trees. Models for generating random graphs: Erdős-Rényi-Gilbert G(n,p), Barabási–Albert preferential attachment, Watts–Strogatz small world, configuration model. | [[http://www.1stworks.com/ref/ruskeycombgen.pdf|Sect. 5.5]] [[http://www.win.tue.nl/~rhofstad/NotesRGCN.pdf| Sect. 1.1-1.3 (optional 1.5-1.6), 4,1 (till pag.126), 7.1-7.2, 8.1]] | |06.11.2017| Case study on data stream: statistics and analytics (part 1). | {{ :magistraleinformatica:ad:ad_17:data-stream-stats.pdf |notes}} | |07.11.2017| Distributed server and load balancing through hashing: revisited | [[https://www.cs.cmu.edu/afs/cs/project/pscico-guyb/realworld/www/slidesF15/hashing.pdf|Sect.7 and 8.1]] | |09.11.2017| The power of two random choices: Cuckoo hashing.| {{:magistraleinformatica:alg2:algo2_10:cuckoo-undergrad.pdf|Notes}} {{:magistraleinformatica:alg2:algo2_16:cuckoohashinsertion.pdf|Notes}} [[https://repl.it/NzGN/3|code]]| |13.11.2017| Case study on data stream: statistics and analytics (part 2). | {{ :magistraleinformatica:ad:ad_17:data-stream-stats2.pdf |notes}} [[https://repl.it/@geraci/Min-Hash-estimate|code]]| === Enumeration, hardness and approximation of some combinatorial problems === NP-hard problems are important but difficult to solve and no deterministic polynomial algorithms are currently known for them. Even basic problems, such as finding a simple path of vertices in a graph, are NP-hard. We discuss how to attack these problems (a) by counting and listing their solutions where the cost is proportional to the output (it can be exponential in the worst case but we only pay for what we get) and (b) by approximating their solutions with polynomial-time algorithms when the problem requires to minimize a cost or maximize a benefit. Most of the addressed problems are on graphs, which are popular representations for modern networked data. ^ Date ^ Topics ^ References and notes ^ |12.10.2017| Introduction to the P vs NP problem. Polynomial reducibility among problems.| [CLRS, Chapt.34 till first part of Sect.34.3] | |16.10.2017| NP-complete problems. Cook-Levin Theorem: an intuition. Reduction examples: from 3SAT to Clique (substitution); from 3SAT to vertex cover VC (gadget). | [CLRS, Sect. 34.3, Th. 34.11 and 34.12, Sect. 34.5.2] {{ :magistraleinformatica:ad:ad_17:cook-levin.pdf | Cook-Levin}} | |17.10.2017| NP-hard problems. Reduction from Partition to Knapsack (restriction). Dynamic programming algorithms for Knapsack: Case 1: integer weights, complexity O(nW). Case 2: integer values, complexity O(n2vmax). Examples. | {{ :magistraleinformatica:ad:ad_17:partition-knapsack.pdf | PDF}} [[https://repl.it/@grossiroberto/knapsack|code]] | |14.11.2017| NP-hard problems: heuristics based on dynamic programming; approximation algorithms. Case study: knapsack problem. | [[http://www.dis.uniroma1.it/~ausiello/InfoTeoIIRM/book/chapter02.pdf| chapt.2: par. 2.1.1]] [[https://repl.it/@grossiroberto/knapsack|code]] | |16.11.2017| NP-hard problems: branch and bound algorithms; fully polynomial-time approximation schemes (FPTASs). Case study: knapsack problem. | {{ :magistraleinformatica:ad:ad_17:vazirani_knapsack.pdf |ch.8}} {{ :magistraleinformatica:ad:ad_17:NotesKnapsack1.pdf | notes}} [[https://repl.it/@grossiroberto/approxKnapsack|code]] | |20.11.2017| Case study on bottom-k sketches: approximate similarity searching of large collections of images | [[http://image.diku.dk/igel/paper/NNCUBkS.pdf|paper]]| |21.11.2017| NP-hard problems: counting version (#P) based on dynamic programming, uniform random sampling of the feasible solutions; fully polynomial-time randomized approximation schemes (FPRASs). Case study: #knapsack problem. | {{ :magistraleinformatica:ad:ad_17:notesknapsack2.pdf |notes}} [[https://repl.it/@grossiroberto/ApproxKnapsack|code]] | |23.11.2017| General inapproximability results. Case study: travel salesman problem (TSP). 2-approximation algorithm . | [CLRS 35.2] | |27.11.2017| Case study on approximation for metric k-center: Clustering and video summarization. | [[https://www.dropbox.com/s/mvomtclqs97vx26/27-11.pdf?dl=0|slides]] [[https://www.dropbox.com/s/aeowxkt6p8iximx/chapter2.pdf?dl=0|notes]] | |28.11.2017| Non-existence of PTAS. Local search. Greedy. Case study: max cut for graphs. | {{:magistraleinformatica:alg2:algo2_14:lec02.pdf|Notes}} | |30.11.2017| Randomized approximation. Derandomization: universal hash functions; conditional expectations. Case study: max-cut for graphs. | [[http://pages.cs.wisc.edu/~jyc/02-810notes/lecture19.pdf|sect. 3-4]] [[http://web.cs.iastate.edu/~pavan/633/lec14.pdf|sect. 1.1]] | |04.12.2017| Case study on approximation for graphs (max cut): single individual haplotypes reconstruction problem (hapcut) |[[https://tinyurl.com/yak7p2w7|paper]] | |05.12.2017| Fixed-parameter tractable (FPT) algorithms. Kernelization. Bounded search tree. Case study: min-vertex cover in graphs. | [[https://www.mimuw.edu.pl/~malcin/book/parameterized-algorithms.pdf|sect. 2.2.1, 3.1]] | |07.12.2017| Randomized FPT algorithms: color coding and randomized separation. Case study: longest path in graphs and subgraph isomorphism. | [[https://www.mimuw.edu.pl/~malcin/book/parameterized-algorithms.pdf|sect. 5.2, 5.3]] | |11.12.2017| Class canceled for weather alert. | - | |12.12.2017| Fine-grained algorithms. SETH conjecture and conditional lower bounds. Guaranteed heuristics. Case study: diameter in undirected unweighted graphs. | [[https://www.dropbox.com/s/zq0dklabkjyd302/20171212.pdf?dl=0|notes]] [[https://people.csail.mit.edu/virgi/ipec-survey.pdf|sect. 2.3, 2.4, 3, 4]]| |14.12.2017| Approximation in fine-grained algorithms and limitations. Case study: diameter in undirected unweighted graphs. Case study: communities detection in large graphs.| {{ :magistraleinformatica:ad:ad_17:diameterapprox.pdf | notes }} [[https://www.nature.com/articles/nature03607.pdf|paper]] [[https://images.nature.com/original/nature-assets/nature/journal/v435/n7043/extref/nature03607-s1.pdf|supplement]] | == Activity in class == * Some of the [[https://www.dropbox.com/sh/qoi3td0pgn5efe1/AADOZkEZOeTO_OHcqh2Wx7sPa?dl=0|screen snapshots]] shown during the classes. == Official documents for the course == * Access to [[http://unimap.unipi.it/registri/printregistriNEW.php?re=2088158::::&ri=9172|unimap log (registro delle lezioni)]]. * Access to the [[https://esami.unipi.it/esami/|course evaluation form (questionario studenti)]]. == Spot yourself in the classroom == {{empty|}}