Algorithm Engineering, Academic Year 2017-2018

Teachers: Paolo Ferragina and Linda Pagli

CFU (credits): 9 (first semester).

Course ID: 531AA.

Language: English.

Degree: Master's Degree in Computer Science and Master's Degree in Computer Science and Networking.

Question time: After lectures and by appointment.

News: announcements about this course will be distributed via a Twitter channel.

October 23: the lecture is cancelled. A make-up lecture will be held on October 17, 16:00-18:00, in room A1.

Official lecture schedule: the schedule and content of the lectures are available below and in the official lecture register (registro).

Goals

In this course we will study, design and analyze advanced algorithms and data structures for the efficient solution of combinatorial problems involving all basic data types, such as integers, strings, (geometric) points, trees and graphs. The design and analysis will involve several models of computation (such as RAM, 2-level memory, cache-oblivious and streaming) in order to take into account the architectural features and the memory hierarchy of modern PCs, as well as the Big Data on which those algorithms may operate. We will complement this theoretical analysis with several engineering considerations arising from the implementation of the proposed algorithms and from experiments published in the literature.

Every lecture will follow a problem-driven approach that starts from a real software-design problem, abstracts it in a combinatorial way (suitable for an algorithmic investigation), and then introduces algorithms aimed at minimizing the use of some computational resources, such as time, space, communication, I/O and energy. Some of these solutions will also be discussed at an experimental level, in order to introduce proper engineering and tuning tools for algorithmic development.

Week Schedule of the Lectures

Week Schedule
Day	Time Slot	Room
Monday	9:00 - 11:00	A1
Tuesday	11:00 - 13:00	A1
Wednesday	11:00 - 13:00	L1

Exam

Dates	Room	Text
30/10/2017, hr 9-11	C1	Midterm exam: text
18/12/2017, hr 9-11	C1	Final-term exam: text (exam review: December 19-21, 2017, 10:00-12:00; grade registration on the dates of the subsequent written exams), solution
15/01/2018, hr 14	C1	Exam: text
15/02/2018, hr 14	C1	Exam: text
15/06/2018, hr 09-11	L1	Exam: text
16/07/2018, hr 09-11	L1	Exam: text
04/09/2018, hr 09-13	E	Exam: text

Background

I strongly suggest refreshing your knowledge of basic algorithms and data structures by looking at the well-known book Introduction to Algorithms by Cormen, Leiserson, Rivest and Stein (third edition). Specifically, I suggest looking at chapters 2, 3, 4, 6, 7, 8, 10, 11 (no perfect hashing), 12 (no randomly built BSTs), 15 (no optimal BSTs), 18 and 22 (no strongly connected components). You could also look at the video lectures by Erik Demaine and Charles Leiserson, specifically Lectures 1-7, 9-10 and 15-17.

Books, Notes, etc.

I'll use just the old-fashioned blackboard. Most of the course content is covered by notes I have written over the years; for some topics I'll use parts of papers and books.

Lectures

Date	Lecture	Biblio	Slides
18/09/2017	Introduction to the course. Models of computation: RAM, 2-level memory. An example of algorithm analysis: the sum of n numbers. The role of the Virtual Memory system.	Chap. 1 of the notes.	Lecture0
19/09/2017	Maximum sub-array sum in 1D and its variations.	Chap. 2 of the notes. For Sect. 2.5, only the first page and a half; for the algorithmic solution of the density-based problem, only hints of the basic ideas.	Lecture1
20/09/2017	Random sampling: disk model, known length, the streaming model m=1.	Chap. 3 of the notes.	Lecture2
25/09/2017	Random sampling on the streaming model, known and unknown length. Reservoir sampling. Algorithm and proofs.
26/09/2017	List Ranking: difficulties on disk, pointer-jumping technique, I/O-efficient simulation.	Chap. 4 of the notes.	Lecture3-4
	Students are warmly invited to refresh their know-how about: the divide-and-conquer technique for algorithm design and the Master Theorem for solving recurrence relations; binary search trees.	Lectures 2, 9 and 10 of Demaine-Leiserson's course at MIT
27/09/2017	Divide-and-conquer technique for algorithm design and Master Theorem for solving recurrence relations. List Ranking: divide-and-conquer approach with its analysis. Selecting an independent set with randomization and deterministic coin tossing.	CLRS Sect. 2.3.1 and 4.5, and Chap. 4 of the notes.
2/10/2017	Sorting atomic items: sorting vs permuting, comments on the time and I/O bounds, binary merge-sort and its bounds. Snow Plow and compression. Multi-way mergesort.	Chap. 5 of the notes	lec.5
3/10/2017	Algorithm for permuting. Lower bounds for sorting. The case of D>1 disks: non-optimality of multi-way MergeSort, the disk-striping technique. Quicksort: recap of best case and worst case.	The lower bound for permuting is optional (Sect. 5.2.2).	lec.6
4/10/2017	Quicksort: Average-case with analysis. Selection of k-th ranked item in linear average time (with proof). 3-way partition for better in-memory quicksort. RandSelect.		lec.7
9/10/2017	Bounded Quicksort; Multiway Quicksort. Selection of k-1 “good pivots” via oversampling. Proof of the average time complexity. Dual-Pivot Quicksort.		lec.8
10/10/2017	Recap: BFS and DFS visits. Minimum Spanning Tree problem: Kruskal and Prim algorithms and their analysis. Algorithms for external and semi-external computation of MST.	CLRS Chap. 23. For external MST, see Sect. 11.5 of the Mehlhorn-Sanders book.	lec.9
11/10/2017	Exercises on Random Sampling, List Ranking and Sorting.
16/10/2017	Fast set intersection, various solutions: scan, sorted merge, binary search, mutual partition, binary search with exponential jumps. Fast set intersection: two-level scan, random shuffling.	Chap. 6 of the notes.
17/10/2017	Interpolation search. String sorting: comments on the difficulty of the problem on disk, lower bound. LSD-radix sort with proof of time complexity and correctness.	Chap. 7 of the notes. For “Interpolation Search” please read the corresponding section in this preliminary note.
17/10/2017	Exercises on Graphs and Sorting.
18/10/2017	MSD-radix sort and the trie data structure. Multi-key Quicksort. Ternary search tree.
19/10/2017	Exercises
24/10/2017	Exercises
	Students are warmly invited to refresh their know-how about: hash functions and their properties; hashing with chaining.	Lecture 7 of Demaine-Leiserson's course at MIT
06/11/2017	Hashing and the dictionary problem: direct addressing, simple hash functions, hashing with chaining, uniform hashing and its computing/storage cost, universal hashing (definition and properties). Two examples of universal hash functions: one with correctness proof, the other without.	Chap. 8 of the notes. Theorems 8.3 and 8.5: only the statements, without proof.
07/11/2017	Perfect hash table (with proof). Minimal ordered perfect hashing: definition, properties, construction, space and time complexity.
08/11/2017	Cuckoo hashing (with proof).
09/11/2017	Bloom Filter: properties, construction, query operation (with proofs). Lower bound on BF-like data structures. Randomized data structures: Treaps (with proofs).	Notes by others. Study also Theorems and Lemmas.
13/11/2017	Randomized data structures: Skip lists (with proofs and comments on I/Os).	Study also Theorems and Lemmas. See Demaine's Lecture 12 on skip lists.
	Please find here the full set of notes
14/11/2017	Prefix search: definition of the problem, solution based on arrays, Front-coding, two-level indexing. Locality Preserving front coding and its use with arrays.	Chap. 9 of the notes: 9.1, 9.3.
15/11/2017	Compacted tries. Analysis of space, I/Os and time of the prefix search for all data structures seen in class. More on two-level indexing of strings: Solution based on Patricia trie, with analysis of space, I/Os and time of the prefix search. Locality Preserving front coding and its use with Patricia trie.	Chap. 9 of the notes: 9.4 and 9.5.
16/11/2017	Exercises
20/11/2017	Substring search: definition, properties, reduction to prefix search. The Suffix Array. Binary searching the Suffix Array in O(p log n) time. Searching in Suffix Arrays in O(p + log n) time. Suffix Array construction via qsort and its asymptotic analysis. LCP array construction in linear time.	Chap. 10 of the notes: 10.1, 10.2.1 and 10.2.2, 10.2.3 (no “The skew algorithm”, “The Scan-based algorithm”).
21/11/2017	Suffix Trees: properties, structure, pattern search, space occupancy. Construction of Suffix Trees from Suffix Arrays and LCP arrays, and vice versa. Text mining use of suffix arrays.	Sect 10.3, 10.3.1, 10.3.2, 10.4.3
22/11/2017	The k-mismatch problem with SA+LCP or ST and RMQ data structure. Auto-completion search. RMQ and LCA queries, equivalence and reductions, their algorithmic solutions and a few applications.	Sect 10.4.1
23/11/2017	Exercises
28/11/2017	Prefix-free codes, notion of entropy, optimal codes. Integer coding: the problem and some considerations. The Gamma and Delta codes, their space/time performance and considerations on optimal distributions. Rice, PForDelta.	Chap. 11 of the notes
29/11/2017	Coders: (s,c)-codes, variable-byte, Interpolative. Elias-Fano. With examples.
30/11/2017	Huffman coding, with proof of optimality.	Chap. 12 of the notes (no sect 12.1.2).
04/12/2017	Canonical Huffman: construction, properties, decompression algorithm. Arithmetic coding: properties, algorithm and proofs; with examples.	No PPM and Range coding.
05/12/2017	No lecture
06/12/2017	Dictionary-based compressors: properties and algorithmic structure. LZ77, LZSS and gzip. LZ78 and LZW. LZ parsing with suffix trees and LCA queries.	Slides; Chap. 13 of the notes (no Sect. 13.4) and Sect. 10.4.2 of Chap. 10.
06/12/2017	Burrows-Wheeler Transform (forward and backward), Move-To-Front (with proof of entropy bound), Run-Length-Encoding, bzip2.	Chap. 14 of the notes. Do not study sections 14.4 and 14.5.
13/12/2017	Exercises on data compression.
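As a companion to the 19/09/2017 lecture on the maximum sub-array sum, here is a minimal Python sketch of the classic linear-time scan (often attributed to Kadane). It is not taken from the course notes, and the function name `max_subarray_sum` is hypothetical. The scan keeps the sum of the current candidate sub-array and restarts it whenever that sum becomes non-positive.

```python
def max_subarray_sum(a):
    """Return (best_sum, start, end) such that a[start:end + 1] has maximum sum.

    One left-to-right scan in O(n) time and O(1) extra space; a must be non-empty.
    """
    best, best_start, best_end = a[0], 0, 0
    cur, cur_start = 0, 0
    for i, x in enumerate(a):
        if cur <= 0:
            # a non-positive running sum cannot help any later sub-array: restart at i
            cur, cur_start = x, i
        else:
            cur += x
        if cur > best:
            best, best_start, best_end = cur, cur_start, i
    return best, best_start, best_end


print(max_subarray_sum([2, -5, 6, 1, -2, 4, 3, -13, 9, -6, 7]))  # (12, 2, 6)
```

Returning the boundaries together with the sum makes it easy to check the answer by hand: here the best sub-array is 6, 1, -2, 4, 3.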
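For the 25/09/2017 lecture on random sampling over a stream of unknown length, the following is a minimal Python sketch of reservoir sampling (the textbook "Algorithm R" formulation; the course notes may present it differently). The function name `reservoir_sample` and its `rng` parameter are hypothetical; the random source is Python's standard `random` module.

```python
import random


def reservoir_sample(stream, m, rng=random):
    """Return m items chosen uniformly at random, without replacement, from an
    iterable of unknown length, keeping only O(m) items in memory."""
    reservoir = []
    for t, item in enumerate(stream, start=1):
        if t <= m:
            reservoir.append(item)        # the first m items fill the reservoir
        else:
            j = rng.randint(1, t)         # uniform integer in [1, t]
            if j <= m:                    # true with probability m / t
                reservoir[j - 1] = item   # replace a uniformly chosen slot
    return reservoir


print(reservoir_sample(range(100_000), 5, random.Random(42)))
```

Passing a seeded `random.Random` instance makes runs reproducible, which helps when checking the uniformity of the sample experimentally.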
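The pointer-jumping technique from the 26/09/2017 List Ranking lecture can be simulated in memory with the Python sketch below. It assumes (as a convention of this sketch only) that the tail of the list points to itself, and it does not model the I/O-efficient simulation discussed in class; the name `list_rank` is hypothetical.

```python
def list_rank(succ):
    """Pointer jumping: succ[i] is the successor of node i, and succ[t] == t marks
    the tail t. Returns rank[i] = number of links from node i to the tail."""
    n = len(succ)
    nxt = list(succ)
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    # each synchronous round doubles the distance covered by every pointer,
    # so O(log n) rounds suffice
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]  # reads the old nxt
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank


# list 3 -> 0 -> 4 -> 1 -> 2 (tail)
print(list_rank([4, 2, 2, 0, 1]))  # [3, 1, 0, 4, 2]
```

Each round rebuilds `rank` and `nxt` from the previous round's values, mimicking the synchronous parallel step; the total work is O(n log n), versus O(n) for a sequential scan.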
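RandSelect and the 3-way partition from the 04/10/2017 lecture can be combined into the short Python sketch below. For readability it partitions into new lists rather than in place, so it is not the in-memory quicksort partition itself; the name `rand_select` is hypothetical.

```python
import random


def rand_select(a, k, rng=random):
    """Return the k-th smallest item of a (k is 1-based) in expected O(n) time,
    partitioning around a random pivot into <, == and > parts (3-way partition)."""
    if not 1 <= k <= len(a):
        raise ValueError("k must be in [1, len(a)]")
    items = list(a)
    while True:
        pivot = rng.choice(items)
        less = [x for x in items if x < pivot]
        greater = [x for x in items if x > pivot]
        n_equal = len(items) - len(less) - len(greater)
        if k <= len(less):
            items = less                      # answer lies among the smaller items
        elif k <= len(less) + n_equal:
            return pivot                      # answer equals the pivot
        else:
            k -= len(less) + n_equal          # skip the smaller and equal items
            items = greater


data = [7, 2, 9, 2, 5, 1, 8]
print(rand_select(data, 3))  # 2
```

Keeping the items equal to the pivot in a separate group guarantees progress even when the input has many duplicates.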
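The "binary search with exponential jumps" approach to fast set intersection (16/10/2017) can be sketched in Python as follows, assuming both inputs are sorted lists of distinct items. The name `gallop_intersect` is hypothetical; `bisect_left` is from Python's standard `bisect` module.

```python
from bisect import bisect_left


def gallop_intersect(small, large):
    """Intersect two sorted lists of distinct items: for each x in `small`,
    jump forward in `large` with exponentially growing steps, then binary
    search inside the last window."""
    result = []
    n = len(large)
    start = 0                      # invariant: every item in large[:start] is < x
    for x in small:
        step, hi = 1, start
        while hi < n and large[hi] < x:
            start = hi + 1         # large[hi] < x, so the invariant still holds
            hi = start + step      # jump 1, 2, 4, 8, ... positions ahead
            step *= 2
        # x, if present, lies in large[start:min(hi + 1, n)]
        pos = bisect_left(large, x, start, min(hi + 1, n))
        if pos < n and large[pos] == x:
            result.append(x)
        start = pos                # later items of `small` are > x
    return result


print(gallop_intersect([3, 7, 20, 50], [1, 2, 3, 4, 5, 7, 9, 11, 20, 30, 40]))
# [3, 7, 20]
```

Because the search for each element resumes where the previous one stopped, the cost adapts to how far apart consecutive matches are in the longer list.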
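Interpolation search (17/10/2017) probes the position where the key "should" be if values grew linearly. A minimal Python sketch, assuming a sorted list of integer keys (the name `interpolation_search` is hypothetical), is:

```python
def interpolation_search(a, x):
    """Return an index i with a[i] == x in the sorted integer list a, or -1."""
    lo, hi = 0, len(a) - 1
    while lo <= hi and a[lo] <= x <= a[hi]:
        if a[hi] == a[lo]:          # flat range: here x == a[lo], and we avoid dividing by zero
            return lo
        # probe where x would be if values grew linearly from a[lo] to a[hi]
        pos = lo + (x - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        if a[pos] == x:
            return pos
        if a[pos] < x:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1


keys = [1, 3, 4, 7, 9, 15, 20, 21]
print(interpolation_search(keys, 15))  # 5
```

On uniformly distributed keys the number of probes is expected to be O(log log n), but on skewed inputs it can degrade to linear, which is why the distribution of the keys matters.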
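LSD radix sort (17/10/2017) can be sketched in Python for the simplest case, assuming all strings have the same length and every character has a code point below `alphabet_size` (a hypothetical parameter, defaulting to 256). Each pass is a stable counting sort on one character position, from the last to the first.

```python
def lsd_radix_sort(strings, alphabet_size=256):
    """Sort equal-length strings with one stable counting-sort pass per
    character position, from the last position to the first."""
    if not strings:
        return []
    length = len(strings[0])
    if any(len(s) != length for s in strings):
        raise ValueError("all strings must have the same length")
    out = list(strings)
    for pos in range(length - 1, -1, -1):
        count = [0] * (alphabet_size + 1)
        for s in out:
            count[ord(s[pos]) + 1] += 1
        for c in range(alphabet_size):       # count[c] = number of characters < c
            count[c + 1] += count[c]
        buf = [None] * len(out)
        for s in out:                        # stable: equal characters keep their order
            c = ord(s[pos])
            buf[count[c]] = s
            count[c] += 1
        out = buf
    return out


words = ["dab", "cab", "fad", "bad", "dad", "ebb", "ace", "add", "fee", "bee"]
print(lsd_radix_sort(words))
# ['ace', 'add', 'bad', 'bee', 'cab', 'dab', 'dad', 'ebb', 'fad', 'fee']
```

Correctness relies on the stability of each pass: after processing positions i..L-1, the strings are sorted by their suffixes starting at position i.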
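A Bloom filter (09/11/2017) can be sketched in Python as below. The class name, the restriction to string keys, and the derivation of the k positions by double hashing from one SHA-256 digest are all choices of this sketch, not of the course; `estimated_fp_rate` evaluates the standard estimate (1 - e^{-kn/m})^k of the false-positive probability after n insertions into m bits.

```python
import hashlib
import math


class BloomFilter:
    """m-bit Bloom filter with k hash positions per item (string items only)."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item):
        # double hashing: position_i = (h1 + i * h2) mod m, from one SHA-256 digest
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1   # odd, so never zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # no false negatives; a false positive occurs when all k bits were set by others
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(item))


def estimated_fp_rate(m, n, k):
    """Standard estimate (1 - e^{-kn/m})^k after inserting n items."""
    return (1 - math.exp(-k * n / m)) ** k


bf = BloomFilter(m=10_000, k=7)
for i in range(1_000):
    bf.add(f"key{i}")
print("key42" in bf, estimated_fp_rate(10_000, 1_000, 7))
```

With m/n = 10 bits per item, k = 7 is close to the optimal choice (m/n) ln 2, which gives an estimated false-positive rate of roughly 0.8%.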
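For the 20/11/2017 lecture, the sketch below builds a suffix array by plain comparison sorting (in the spirit of the qsort-based construction, O(n^2 log n) time in the worst case and, because it materializes every suffix, quadratic memory), computes the LCP array with the linear-time method of Kasai et al., and finds all occurrences of a pattern of length p with two O(p log n) binary searches. All function names are hypothetical.

```python
def suffix_array(text):
    """Starting positions of the suffixes of text, in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """Kasai et al.: lcp[r] = LCP of suffixes sa[r - 1] and sa[r]; lcp[0] = 0. O(n) time."""
    n = len(text)
    rank = [0] * n
    for r, s in enumerate(sa):
        rank[s] = r
    lcp, h = [0] * n, 0
    for i in range(n):                   # visit suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]          # suffix preceding text[i:] in sorted order
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1                   # the next LCP is at least h - 1
        else:
            h = 0
    return lcp


def sa_occurrences(text, sa, p):
    """All starting positions of p in text, via two binary searches on sa."""
    def boundary(strict):
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = text[sa[mid]:sa[mid] + len(p)]   # compare at most p characters
            if prefix < p or (strict and prefix == p):
                lo = mid + 1
            else:
                hi = mid
        return lo
    return sorted(sa[boundary(False):boundary(True)])


sa = suffix_array("banana")
print(sa, lcp_array("banana", sa))       # [5, 3, 1, 0, 4, 2] [0, 1, 3, 0, 0, 2]
print(sa_occurrences("banana", sa, "ana"))  # [1, 3]
```

The occurrences of p form a contiguous range of the suffix array, which is why two binary searches (for the first suffix with prefix >= p and the first with prefix > p) are enough.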
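The Gamma and Delta codes (28/11/2017) are easy to prototype in Python if codewords are kept as strings of '0'/'1' for readability; a real coder would pack bits into machine words. The function names are hypothetical. Gamma writes the bit length of x in unary (as zeros) followed by the binary representation of x, for 2⌊log2 x⌋ + 1 bits in total; Delta gamma-codes that length instead and drops the leading 1 of x.

```python
def gamma_encode(x):
    """Elias gamma code of an integer x >= 1."""
    if x < 1:
        raise ValueError("gamma codes are defined for x >= 1")
    b = bin(x)[2:]                       # binary of x, always starting with '1'
    return "0" * (len(b) - 1) + b        # length in unary (as zeros), then the bits


def delta_encode(x):
    """Elias delta code: gamma-code the bit length of x, then x without its leading 1."""
    if x < 1:
        raise ValueError("delta codes are defined for x >= 1")
    b = bin(x)[2:]
    return gamma_encode(len(b)) + b[1:]


def _read_gamma(bits, i):
    """Decode one gamma codeword starting at bits[i]; return (value, next index)."""
    zeros = 0
    while bits[i + zeros] == "0":
        zeros += 1
    end = i + 2 * zeros + 1
    return int(bits[i + zeros:end], 2), end


def decode_all(bits, delta=False):
    """Decode a concatenation of gamma (or, if delta=True, delta) codewords."""
    out, i = [], 0
    while i < len(bits):
        value, i = _read_gamma(bits, i)
        if delta:                        # value is the bit length of the encoded integer
            value, i = int("1" + bits[i:i + value - 1], 2), i + value - 1
        out.append(value)
    return out


print(gamma_encode(9), delta_encode(9))  # 0001001 00100001
```

Both codes are prefix-free, so a concatenation of codewords can be decoded left to right without separators.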
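Huffman's greedy construction (30/11/2017) can be sketched with Python's standard `heapq` module. The name `huffman_code` is hypothetical, and representing each partial tree as a dictionary of partial codewords is a simplification of this sketch; a counter breaks ties so that the heap never compares dictionaries.

```python
import heapq
from itertools import count


def huffman_code(freq):
    """Return a prefix-free code {symbol: bitstring} built by Huffman's greedy merging."""
    if len(freq) == 1:                    # degenerate alphabet: a single 1-bit codeword
        return {s: "0" for s in freq}
    tiebreak = count()
    heap = [(w, next(tiebreak), {s: ""}) for s, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)   # the two least-frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
code = huffman_code(freq)
print(code, sum(freq[s] * len(code[s]) for s in freq))
```

On this frequency table (the classic CLRS example) the weighted codeword length is 224; different tie-breaking may yield different codewords but the same optimal total.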
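For the lecture on the Burrows-Wheeler Transform and Move-To-Front, the Python sketch below computes the forward BWT by sorting all rotations (O(n^2 log n), for illustration only), inverts it with the LF-mapping, and applies MTF to the result. It assumes a sentinel (here `$`, a choice of this sketch) that is smaller than every character of the text; the function names are hypothetical.

```python
def bwt(text, sentinel="$"):
    """Forward BWT: last column of the sorted rotations of text + sentinel."""
    if any(c <= sentinel for c in text):
        raise ValueError("sentinel must be smaller than every character of text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def inverse_bwt(last):
    """Backward BWT via the LF-mapping, rebuilding the text from right to left."""
    counts, ranks = {}, []
    for c in last:
        ranks.append(counts.get(c, 0))   # occurrences of c before this position
        counts[c] = counts.get(c, 0) + 1
    first, total = {}, 0
    for c in sorted(counts):             # first[c] = first row whose rotation starts with c
        first[c] = total
        total += counts[c]
    row, out = 0, []                     # row 0 is the rotation starting with the sentinel
    for _ in range(len(last) - 1):
        c = last[row]                    # character preceding the current rotation
        out.append(c)
        row = first[c] + ranks[row]      # LF-mapping: row of the rotation starting with c
    return "".join(reversed(out))


def move_to_front(s, alphabet):
    """Replace each character by its current index in a list, then move it to the front."""
    table, out = list(alphabet), []
    for c in s:
        i = table.index(c)
        out.append(i)
        table.insert(0, table.pop(i))
    return out


last = bwt("banana")
print(last, inverse_bwt(last))               # annb$aa banana
print(move_to_front(last, sorted(set(last))))  # [1, 3, 0, 3, 3, 3, 0]
```

The BWT tends to group equal characters together, so MTF turns such runs into small integers (mostly zeros) that RLE and a statistical coder can then compress well, as in bzip2.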