### Indice

# Algorithm Engineering A.A. 2016-2017

**Teachers**: Paolo Ferragina and Linda Pagli

**CFU:** 9 (second semester).

**Course ID:** 531AA.

**Language:** English.

**Degree:** Master degree in CS&Networking in collaboration with the Scuola Normale Superiore “S. Anna”.

**Question time:** After lectures and by appointment.

**News:** about this course will be distributed via a Tweeter-channel.

**Official lectures schedule:** The schedule and content of the lectures is available below and with the official registro.

# Goals

In this course we will study, design and analyze advanced algorithms and data structures for the efficient solution of combinatorial problems involving all basic data types, such as integers, strings, (geometric) points, trees and graphs. The design and analysis will involve several models of computation— such as RAM, 2-level memory, cache-oblivious, streaming— in order to take into account the architectural features and the memory hierarchy of modern PCs and the availability of Big Data upon which those algorithms could work on. We will add to such theoretical analysis several other engineering considerations spurring from the implementation of the proposed algorithms and from experiments published in the literature.

Every lecture will follow a problem-driven approach that starts from a real software-design problem, abstracts it in a combinatorial way (suitable for an algorithmic investigation), and then introduces algorithms aimed at minimizing the use of some computational resources like time, space, communication, I/O, energy, etc. Some of these solutions will be discussed also at an experimental level, in order to introduce proper engineering and tuning tools for algorithmic development.

## Week Schedule of the Lectures

Week Schedule | ||
---|---|---|

Day | Time Slot | Room |

Tuesday | 9:00 - 11:00 | A1 |

Wednesday | 9:00 - 11:00 | C1 |

Friday | 11:00 - 13:00 | B |

# Exam

# Background

I strongly suggest to refresh your knowledge about basic Algorithms and Data Structures by looking at the well-known book Introduction to Algorithms, Cormen-Leiserson-Rivest-Stein (third edition). Specifically, I suggest you to look at the chapters 2, 3, 4, 6, 7, 8, 10, 11 (no perfect hash), 12 (no randomly built), 15 (no optimal BST), 18, 22 (no strongly connected components). Also, you could look at the Video Lectures by Erik Demaine and Charles Leiserson, specifically Lectures 1-7, 9-10 and 15-17.

# Books, Notes, etc.

I'll use *just* the *old-fashioned* blackboard. Most of the content of the course will be covered by some notes I wrote in these years; for some topics I'll use parts of papers/books.

# Lectures (preliminary schedule and links)

Date | Lecture | Biblio | Slides |
---|---|---|---|

21/02/2017 | Introduction to the course. Models of computation: RAM, 2-level memory. An example of algorithm analysis: the sum of n numbers. The role of the Virtual Memory system. | Chap. 1 of the notes. | Lecture0 |

22/02/2017 | Maximum sub-array sum in 1d and its variations. | Chap. 2 of the notes, about sect 2.5: first page and half. About the algorithmic solution of the density-based problem only hints of the basic ideas. | Lecture1 |

24/02/2017 | Random sampling: disk model, known length, definition of the streaming model. | Chap. 3 of the notes. | Lecture2 |

28/02/2017 | Random sampling on the streaming model, known and unknown length. Reservoir sampling. List Ranking: definition of the problem, RAM solution. | ||

01/03/2017 | List Ranking: difficulties on disk, pointer-jumping technique, I/O-efficient simulation. Divide-and-conquer technique for algorithm design and Master Theorem for solving recurrent relations. | Chap. 4 of the notes. CLRS cap. 2.3.1, 4.5 | Lecture3-4 |

Students are warmly invited to refresh their know-how about: Divide-and-conquer technique for algorithm design and Master Theorem for solving recurrent relations; and Binary Search Trees | Lecture 2, 9 and 10 of Demaine-Leiserson's course at MIT | ||

03/03/2017 | List Ranking: divide-and-conquer approach with its analysis. Selecting an independent set with randomization and deterministic coin tossing. Sorting atomic items: sorting vs permuting, comments on the time and I/O bounds, binary merge-sort and its bounds. | lec.5 | |

07/03/2017 | Snow Plow and compression. Multi-way mergesort. Lower bounds for sorting. | Chap. 5 of the notes | lec.6 |

08/03/2017 | The case of D>1 disks: non-optimality of multi-way MergeSort, the disk-striping technique. Quicksort: recap on best-case, worst-case. Average-case with analysis. Selection of k-th ranked item in linear average time (with proof). 3-way partition for better in-memory quicksort. | Lower bound of Permuting is optional (sect 5.2.2). | lec.7 |

10/03/2017 | Quicksort: Bounded-space quicksort. Multi-way Quicksort for disk, multi-pivot selection, oversampling (with proof). | lec.8 | |

14/03/2017 | Recap: graphs and their representations, BFS and DFS visits, computing shortest-path over unary-length edges. | Chap 22.1-22.3 of CLR. Do not study Lemma 22.5. | lec.9 |

15/03/2017 | Recap: Topological sort. Minimum Spanning Tree problem: definition, Greedy approach. Kruskal and Prim algorithms and analysis. | Chap 22.4 and 23 of CLR. | lec.10 |

17/03/2017 | Shortest Path problem: Dijkstra's algorithm. Algorithms for external and semi-external computation of MST. | For Shortest Path look at this note. For external MST, look at Sect 11.5 of the Mehlhorn-Sander's book. | lec.11 |

21/03/2017 | Fast set intersection, various solutions: scan, sorted merge, binary search, mutual partition, binary search with exponential jumps. | Chap. 8 of the notes. | |

22/03/2017 | Fast set intersection: two-level scan, random shuffling. String sorting: comments on the difficulty of the problem on disk, lower bound. LSD-radix sort with proof of time complexity and correctness. | Chap. 6 of the notes. | |

24/03/2017 | MSD-radix sort and the trie data structure. Multi-key Quicksort. Ternary search tree. Interpolation search. | ||

28/03/2017 | Randomized data structures: Treaps and Skip lists (with proofs and comments on string items). | Notes by others. Study also Theorems and Lemmas. see Demaine's lecture num. 12 on skip lists. | |

Students are warmly invited to refresh their know-how about: hash functions and their properties; hashing with chaining. | Lectures 7 of Demaine-Leiserson's course at MIT | ||

29/03/2017 | Hashing and dictionary problem: direct addessing, simple hash functions, hashing with chaining, uniform hashing and its computing/storage cost, universal hashing (definition and properties). Two examples of Universal Hash functions: one with correctness proof, the other without. | Chap. 7 of the notes. Theorem 7.3 without proof, Theo 7.5 without proof (only the statements). | |

31/03/2017 | The lecture of today has been canceled because of a strike by personnel of the Polo Fibonacci (rooms closed) | ||

04/04/2017 | Perfect hash table (with proof). Minimal ordered perfect hashing: definition, properties, construction, space and time complexity. | ||

05/04/2017 | Cuckoo hashing (with proof). Bloom Filter: properties, construction, query operation (with proofs). | Do not read subsections 7.7.1-7.7.3. | |

21/04/2017 | Auto-completion search. RMQ and LCA queries, equivalence and reductions, their algorithmic solutions and few applications. | Slides (no solution with linear space - 4 russian trick: page 10-29 starting from “The second data structure is built to efficiently answer RMQ-queries in which i and j are in the same block Dk.”) | |

To synchro the notes, here you can find the latest versions of the chapters of this first part of the course. | Updated notes | ||

28/04/2017 | Prefix search: definition of the problem, solution based on arrays, Front-coding, two-level indexing. Compacted tries. Analysis of space, I/Os and time of the prefix search for all data structures seen in class. More on two-level indexing of strings: Solution based on Patricia trie, with analysis of space, I/Os and time of the prefix search. | Chap. 9 of the notes: 9.1, 9.4 and 9.5. | |

02/05/2017 | Substring search: definition, properties, reduction to prefix search. The Suffix Array. Binary searching the Suffix Array: p log n. Searching in Suffix Arrays with p + log n. Suffix Array construction via qsort and its asymptotic analysis. | Chap. 10 of the notes: 10.1, 10.2.1 and 10.2.2, 10.2.3 (no “The skew algorithm”, “The Scan-based algorithm”). | |

03/05/2017 | LCP array construction in linear time. Suffix Trees: properties, structure, pattern search, space occupancy. Construction of Suffix Trees from Suffix Arrays and LCP arrays, and vice versa. | Sect 10.3, 10.3.1, 10.3.2 | |

05/05/2017 | Text mining use of suffix arrays. The k-mismatch problem with SA+LCP or ST and RMQ data structure. | Sect 10.4.1 and 10.4.3 | |

09/05/2017 | Esercitazione | ||

10/05/2017 | Esercitazione | ||

11/05/2017 | Esercitazione | ||

12/05/2017 | Prefix-free codes, notion of entropy, optimal codes. Integer coding: the problem and some considerations. The codes Gamma and Delta, space/time performance and consideration on optimal distributions. Rice, PForDelta. | Chap. 11 of the notes | |

16/05/2017 | Coders: (s,c)-codes, variable-byte, Interpolative. With examples. | ||

17/05/2017 | Example on Interpolative and Elias-Fano. | ||

18/05/2017 | Huffman, with optimality (proof). Canonical Huffman: construction, properties, decompression algorithm. | Chap. 12 of the notes (no sect 12.1.2). | |

19/05/2017 | Arithmetic coding: properties, algorithm and proofs; with examples. | No PPM and Range coding. | |

22/05/2017 | Dictionary-based compressors: properties and algorithmic structure. LZ77, LZSS and gzip. LZ8 and LZW. LZ parsing with suffix trees and LCA queries. | Chap 13, no par 13.4, and Chap 10 at sect 10.4.2 | |

23/05/2017 | Burrows-Wheeler Transform (forward and backward), Move-To-Front, Run-Length-Encoding, bzip2. | Chap. 14 of the notes. Do not study Theorem 14.5 and sections 14.4 and 14.5. | |

24/05/2017 | Exercises on data compression. | ||

29/05/2017 | Recap on the arguments of the course. |