Strumenti Utente

Strumenti Sito


magistraleinformatica:alg2:algo2_16:start

Year 2016-2017

Prof. Roberto Grossi
Tutor: Dr. Filippo Geraci

Announcements

  • Exams – written part: results here.
  • The final list of problems discussed in class is available.
  • First meeting for oral exam calendar: Jan. 16, 2016 at 9:00 in office (room 342 DN).
  • Second meeting for oral exam calendar: Feb. 13, 2016 at 9:00 in office (room 342 DN).
  • Office hours: Thu. 14-18 (in my room at the Dipartimento di Informatica) or by appointment.

Overview

We will study, design and analyze advanced algorithms and data structures for the efficient solution of combinatorial problems involving all basic data types, such as integers, strings, (geometric) points, trees and graphs. This course deepens and extends the algorithmic notions of students.

The syllabus is structured to highlight the applicative scenarios in which the studied algorithms and data structures can be successfully applied. The level of detail with which each argument will be dealt with can change year-by-year, and will be decided according to requests coming from other courses and/or specific issues arising in, possibly novel, applicative scenarios.

The advanced nature of this course focuses on developing algorithmic design skills. The students are exposed to complex problems that require a significant effort in problem solving. One “brainstorming” lecture per week is devoted to a full immersion in this activity, and it is highly recommend to attend it. The purpose is not that of teaching/learning further N algorithms, but to refine students' skills. The final written exam will be based on the topics discussed during the “brainstorming” lectures.

We like students that make mistakes in a constructive way, since this is the starting point of our brainstorming to drive their reasoning paths, thus learning by mistakes. We are interested in the route that leads to the algorithmic solution, rather than the solution itself. Regarding mistakes and intelligence, Alan Turing wrote in 1947 that “… if a machine is expected to be infallible, it cannot also be intelligent”. This is a motto for this class, with a kind invitation to express your own ideas even if they could be apparently wrong.

Topics

Please note that several topics are the outcome of recent advancement in algorithms and data structures, and thus most of the course material consists in research papers or book chapters.

Warming up

Quick review of some basic notions and algorithms, with problem solving and grader for solutions.

  • URL for the grader: http://cms.di.unipi.it (please register to submit your solutions).
  • Reference textbook: [CLRS] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein: Introduction to Algorithms, 3rd Edition (MIT Press), 2009.
Date Topics References and notes
Sept. 21 Introduction to the course. Entrance exam. -
Sept. 22 Graphs representations. BFS. Dijkstra algorithm for SSSP. Problem “hogwarts”. [CLRS 22.1, 22.2, 24.3] hogwarts
Sept. 27 Problem solving: “hogwarts”. hogwarts
Sept. 28 Mergesort and inversions in sorting. Problem “paletta”. Another solution for “hogwarts”. [CLRS 2.3.1, 2.3.2, def. inversions in exercise 2-4] hogwarts solution paletta.pdf
Sept. 29 DFS and cycle detection. k-coloring of graphs. Problem “nemesi”. [CLRS 22.3, def. graph coloring in problem B-1] nemesi
Oct. 4 Problem solving: “nemesi” and “paletta”. nemesi.pdf e paletta.pdf
Oct. 5 Augment search trees: ranges, ranks, and intervals. [CLRS 14 (only static, no red-black tree)] Segment tree classifica.pdf [solve paletta.pdf using these data structures]
Oct. 6 Playing with probability. Random indicator variables: secretary problem and random permuting. Problem 1 [CLRS 5.1-5.3]
Oct. 11 Problem solving: “classifica”; range updates. classifica.pdf Problem 1

Randomization, hashing and data streaming

Randomization is now a common powerful tool to solve large-scale problems. After introducing the concept of randomized algorithms and hashing, we apply them to solve data streaming problems, a field emerged in the last decade. Here data flow as a stream and one-pass algorithms with limited memory can process it. We focus on the count-min sketch paradigm and its applications. [Note: to refresh the basic notions on counting and probability, please refer to Appendix C in Cormen-Leiserson-Rivest-Stein's book “Introduction to Algorithms”, 3rd ed., MIT Press.]

Date Topics References and notes
Oct. 12 Random indicator variables: birthday paradox. Las Vegas and Montecarlo algorithms. Indicator variables and randomised quicksort (Las Vegas). [CLRS 5.4.1, 7.3] par. 7.3
Oct. 13 Karp-Rabin fingerprints: randomized checking and pattern matching (Montecarlo and Las Vegas) par.7.4-7.6
Oct. 18 Problem solving: expected depth of nodes in random search trees. Karp-Rabin fingerprinting and randomized data structure for longest common extension. Problems 2 and 3
Oct. 19 Hashing. Quick review of the chaining method. Universal hashing. [CLRS 11.2, 11.3.3 ]
Oct. 20 Markov's inequality. Perfect hashing. [CLRS 11.5 ]
Oct. 25 Problem solving: Hashing sets. Pairwise independence of hash functions. Problems 4 and 5
Oct. 26 Data Streaming Model. Motivations and examples. Count-Min Sketches (part I) sects.1-3 Site Notes
Oct. 27 Count-Min Sketches (part II). Cuckoo Hashing (part I) sects.1-3 Notes Notes
Nov. 7 Question time -
Nov. 8 Problem solving: Data streaming and count-min sketches. Problems 6-9
Nov. 9 Cuckoo Hashing (part II). Notes Notes
Nov. 10 Classical Bloom Filters. Survey: except par.2.5-2.6 (optional: par.2.2)
Nov. 15 Problem solving: space-efficient perfect hash and classical Bloom filters. Problems 10 and 11
Nov. 16 MinHash, and sketches for the Jaccard similarity index. paper Azuma-Hoeffding
Nov. 17 Randomized min-cut algorithm in unweighted graphs (Montecarlo type) par.1.1
Nov. 22 Problem solving: MinHash sketches for the union. Extensions of the randomized min-cut. Problems 12 and 13

Algorithms for massive data sets and hierarchical memories

Massive data sets are one of the key elements in our so-called big data society. In this scenario the design and analysis of algorithms and data structures adopt the external and cache-oblivious models, which take into account the memory hierarchy that provides performance to modern computers. We study basic problems on sorting and searching in these models, comparing the results with the classical RAM model of computation.

Date Topics References and notes
Nov. 23 External memory model (EMM) a.k.a. cache-aware model vs cache-oblivious model: I/O complexity, sequential access versus random access. Searching (part I). Introduction, chapt.2 and 3 (set D=1 disks) Sect.1-3
Nov. 24 External memory model and cache-oblivious model: searching (part II) and van EmdeBoas layout. [Sect.4.1]
Nov. 29 Problem solving: search and range queries in implicit search tree and van Embe Boas layout. Problems 14, 15 and 16
Nov. 30 External memory k-way merge sort. DC3 suffix sort (part I). notes DC3 algorithm (up to Section3)
Dec. 1 DC3 suffix sort (part II). Review of P and NP classes. DC3 algorithm (up to Section3) example [CLRS 34.2]
Dec. 6 Problem solving: external memory k-way merge, permuting and suffix sorting. Problems 17, 18 and 19

Enumeration, hardness and approximation of some combinatorial problems

NP-hard problems are important but difficult to solve and no deterministic polynomial algorithms are currently known for them. Even basic problems, such as finding a simple path of vertices in a graph, are NP-hard. We discuss how to attack these problems (a) by counting and listing their solutions where the cost is proportional to the output (it can be exponential in the worst case but we only pay for what we get) and (b) by approximating their solutions with polynomial-time algorithms when the problem requires to minimize a cost or maximize a benefit. Most of the addressed problems are on graphs, which are popular representations for modern networked data.

Date Topics References and notes
Dec. 7 Approximations algorithms. Inapproximability of general TSP. 2-approximation for metric TSP. [CLRS preamble, 35.2]
Dec. 13 2-approximation for min-vertex cover and max cut. [CLRS 35.1] Notes
Dec. 14 Problem solving: greedy strategy for min vertex cover and weighted max cut. Problems 20 and 21
Dec. 15 Approximation algorithms for bin-packing and knapsack problems. chapt.2: par. 2.1.1, 2.2.2
Dec. 16 Question time -
Activity in class
Official documents for the course
Spot yourself in the classroom

magistraleinformatica/alg2/algo2_16/start.txt · Ultima modifica: 14/01/2017 alle 23:07 (7 anni fa) da Roberto Grossi