# Algorithm Engineering A.A. 2015-2016

**Teacher**: Paolo Ferragina

**CFU:** 9 (second semester).

**Course ID:** 531AA.

**Language:** English.

**Degree:** Master degreein CS&Networking in collaboration with the Scuola Normale Superiore “S. Anna”.

**Lectures schedule:** 09:00-11:00 at Monday (room C), Thursday (room L1), Tuesday (room N1).

**Question time:** After lectures and by appointment.

**News:** about this course will be distributed via a Tweeter-channel.

**Official lectures schedule:** The schedule and content of the lectures is available below and with the official registro.

**Student meetings June-July - Ferragina's office**: June 3 (hr 9-11), June 13 (hr 9-11) and June 21 (hr 9-11). In July it will be by appointment (send email to me).

# Goals

In this course we will study, design and analyze advanced algorithms and data structures for the efficient solution of combinatorial problems involving all basic data types, such as integers, strings, (geometric) points, trees and graphs. The design and analysis will involve several models of computation— such as RAM, 2-level memory, cache-oblivious, streaming— in order to take into account the architectural features and the memory hierarchy of modern PCs and the availability of Big Data upon which those algorithms could work on. We will add to such theoretical analysis several other engineering considerations spurring from the implementation of the proposed algorithms and from experiments published in the literature.

Every lecture will follow a problem-driven approach that starts from a real software-design problem, abstracts it in a combinatorial way (suitable for an algorithmic investigation), and then introduces algorithms aimed at minimizing the use of some computational resources like time, space, communication, I/O, energy, etc. Some of these solutions will be discussed also at an experimental level, in order to introduce proper engineering and tuning tools for algorithmic development.

# Exam

# Background

I strongly suggest to refresh your knowledge about basic Algorithms and Data Structures by looking at the well-known book Introduction to Algorithms, Cormen-Leiserson-Rivest-Stein (third edition). Specifically, I suggest you to look at the chapters 2, 3, 4, 6, 7, 8, 10, 11 (no perfect hash), 12 (no randomly built), 15 (no optimal BST), 18, 22 (no strongly connected components). Also, you could look at the Video Lectures by Erik Demaine and Charles Leiserson, specifically Lectures 1-7, 9-10 and 15-17.

# Books, Notes, etc.

I'll use *just* the *old-fashioned* blackboard.

Most of the content of the course will be covered by some notes I wrote in these years; for some topics I'll use parts of papers/books.

# Lectures

Date | Lecture | Biblio |
---|---|---|

22/02/16 | Introduction to the course. Models of computation: RAM, 2-level memory. An example of algorithm analysis: the sum of n numbers. The role of the Virtual Memory system. | Chap. 1 of the notes. |

23/02/16 | Maximum sub-array sum in 1d and its variations. | Chap. 2 of the notes, about sect 2.5 read only the first page and half up to the algorithmic solution of the density-based problem. |

25/02/16 | Random sampling: disk model, streaming model and known length, streaming model and unknown length. | Chap. 3 of the notes. |

29/02/16 | List Ranking: definition of the problem, RAM solution, difficulties on disk, pointer-jumping technique, I/O-efficient simulation. | Chap. 4 of the notes. |

Students are warmly invited to refresh their know-how about: Divide-and-conquer technique for algorithm design and Master Theorem for solving recurrent relations; and Binary Search Trees | Lecture 2, 9 and 10 of Demaine-Leiserson's course at MIT | |

01/03/16 | List Ranking: divide-and-conquer approach with its analysis, Master Theorem (recap), randomised-algorithm for generating an independent set. Sorting atomic items: sorting vs permuting, comments on the time and I/O bounds, binary merge-sort and its bounds. | |

03/03/16 | Snow Plow and compression. Multi-way mergesort. | Chap. 5 of the notes |

07/03/16 | Lower bounds for sorting. The case of D>1 disks: non-optimality of multi-way mergesort (with comments and analysis), the disk-striping technique. Quicksort: recap on best-case, worst-case. Various partitioning strategies: two-way or three-way. | Lower bound of Permuting is optional (sect 5.2.2). |

08/03/16 | Quicksort: Average-case with analysis. Selection of k-th ranked item in linear average time (with proof). Bounded-space quicksort. 3-way partition for better in-memory quicksort. | |

10/03/16 | Multi-way Quicksort for disk, multi-pivot selection (with proof). Fast set intersection, various solutions: scan, sorted merge, binary search, two-level scan. | Preliminary draft of the chapter on Intersection (vers 3). |

14/03/16 | Fast set intersection: binary search with exponential jumps, random shuffling. String sorting: comments on the difficulty of the problem on disk, lower bound. MSD-radix sort and the trie data structure. LSD-radix sort with proof of time complexity and correctness. Various exercises. | Chap. 6 of the notes. |

15/03/16 | Multi-key Quicksort. Ternary search tree. Interpolation search. | |

17/03/16 | Randomised data structures: Treaps and Skip lists (with proofs and comments on string items). | Notes by others. Study also Theorems and Lemmas. see Demaine's lecture num. 12 on skip lists. |

Students are warmly invited to refresh their know-how about: hash functions and their properties; hashing with chaining. | Lectures 7 of Demaine-Leiserson's course at MIT | |

21/03/16 | Hashing and dictionary problem: direct addessing, simple hash functions, hashing with chaining, uniform hashing and its computing/storage cost, universal hashing (definition and properties). Two examples of Universal Hash functions: one with correctness proof, the other without. Perfect hash table (with proof). | Chap. 13 of the notes. Theorem 13.3 without proof, Theo 13.5 without proof (only the statements). |

22/03/16 | Minimal ordered perfect hashing: definition, properties, construction, space and time complexity. With exercises on previous and today topics. Cuckoo hashing (with proof). Bloom Filter: properties, construction, query operation (with proofs). | Do not read 13.7.1-13.7.3. |

24/03/16 | The lecture of today is canceled | |

06/04/16 | Exercises on the first part of the course | |

07/04/16 | Exercises on the first part of the course | |

11/04/16 | Prefix-free codes, notion of entropy, optimal codes. Integer coding: the problem and some considerations. The codes Gamma and Delta, space/time performance and consideration on optimal distributions. | Chap 9 of the notes (new version) |

12/04/16 | Coders: Rice, PForDelta, (s,c)-codes, variable-byte, Interpolative, Elias-Fano. | |

13/04/16 | Exercises on Integer coders. | |

14/04/16 | Huffman, with optimality (proof). Canonical Huffman: construction, properties, decompression algorithm. The case of long codewords and rescaling. | Chap. 10 of the notes. |

18/04/16 | Arithmetic coding: properties, algorithm and proofs; with examples. | No PPM and Range coding in Chap 10 of the notes. |

19/04/16 | Dictionary-based compressors: properties and algorithmic structure. LZ77, LZSS and gzip. LZ8 and LZW | Chap 11, no par 11.4. |

20/04/16 | Burrows-Wheeler Transform (forward and backward), Move-To-Front, Run-Length-Encoding, bzip2. | Chap. 12 of the notes. Do not study Theorem 12.5 and sections 12.4 and 12.5. |

21/04/16 | Exercises on data compression. | Some additional slides on data compression |

25/04/16 | holiday | |

26/04/16 | Prefix search: definition of the problem, solution based on arrays, Front-coding, two-level indexing. Compacted tries. Analysis of space, I/Os and time of the prefix search for all data structures seen in class. | Chap. 7 of the notes. Do not study sections 7.2, 7.3 and 7.6 |

27/04/16 | More on two-level indexing of strings: Solution based on Patricia trie, with analysis of space, I/Os and time of the prefix search. Substring search: definition, properties, reduction to prefix search. The Suffix Array. Binary searching the Suffix Array: p log n (no pages 8.4-8.5). Text mining use of suffix arrays. Suffix Trees: properties, structure, pattern search, space occupancy. Construction of Suffix Trees from Suffix Arrays and LCP arrays, and vice versa. | Chap. 8 of the notes. Do not study sections “The skew algorithm”, “The Scan-based algorithm”, and sect 8.3.3 |

28/04/16 | Searching in Suffix Arrays with p + log n. Suffix Array construction via qsort and its asymptotic analysis. LCP array construction in linear time. | |

02/05/16 | The k-mismatch problem. RMQ and LCA queries, equivalence and reductions, their algorithmic solutions. | Few slides as summary, but you MUST study the notes. |

03/05/16 | LZ parsing with suffix trees and LCA queries. Exercises on tries, suffix trees and suffix arrays. | |

05/05/16 | Recap: graphs and their representations, BFS and DFS visits, computing shortest-path over unary-length edges. | Chap 23.1-23.3 of CLR. Do not study Lemma 23.5. |

09/05/16 | Topological sort. Minimum Spanning Tree problem: definition, Greedy approach. | Other chap on MST of CLR |

10/05/16 | Kruskal's and Prim's algorithms, proof of correctness, semi-external version, analysis of time/IO complexity. A running example. | |

12/05/16 | Shortest Path problem: Dijkstra's algorithm. Algorithms for external and semi-external computation of MST. | For Shortest Path look at this note. For external MST, look at Sect 11.5 of the Mehlhorn-Sander's book. |

19/05/16 | Use of MST for clustering and for the bottleneck shortest path problem (no proof). The case of negative edges in Shortest Path problem: Bellman-Ford's algorithm. How to detect a negative cycle in a directed graph. | Few slides on BF-algorithm |

23/05/16 | Exercises on graphs |