# Algorithm Engineering A.A. 2014-2015

**Teacher**: Paolo Ferragina

**CFU:** 9 (second semester).

**Course ID:** 531AA.

**Language:** English.

**Degree:** Master degreein CS&Networking in collaboration with the Scuola Normale Superiore “S. Anna”.

**Lectures schedule:** 11:00-13:00 at Monday (N1), Tuesday (C), Wednesday (N1).

**Question time:** After lectures and by appointment.

**News:** about this course will be distributed via a Tweeter-channel.

**Official lectures schedule:** The schedule and content of the lectures is available below and with the official registro.

# Goals

In this course we will study, design and analyze advanced algorithms and data structures for the efficient solution of combinatorial problems involving all basic data types, such as integers, strings, (geometric) points, trees and graphs. These algorithmic tools will be designed and analyzed in several models of computation— such as RAM, 2-level memory, cache-oblivious, streaming— in order to take into account the architectural features and the memory hierarchy of modern PCs. Moreover, we will add to such theoretical analysis several other engineering considerations spurring from the implementation of the proposed algorithms and from experiments published in the literature.

Every lecture will follow a problem-driven approach that starts from a real software-design problem, abstracts it in a combinatorial way (suitable for an algorithmic investigation), and then introduces algorithmic solutions aimed at minimizing the use of some computational resources like time, space, communication, I/O, energy, etc. Some of these solutions will be discussed also at an experimental level, in order to introduce proper engineering and tuning tools for algorithmic development.

# Exam

# Background

If you wish to refresh your mind on basic Algorithms and Data Structures, I suggest you to look at the well-known book Introduction to Algorithms by Cormen-Leiserson-Rivest-Stein, third edition. There you'd look at the chapters 2, 3, 4, 6, 7, 8, 10, 11 (no perfect hash), 12 (no randomly built), 15 (no optimal BST), 18, 22 (no strongly connected components). Also, you could look at the Video Lectures by Erik Demaine and Charles Leiserson, specifically Lectures 1-7, 9-10 and 15-17.

# Books, Notes, etc.

I'll use *just* the *old-fashioned* blackboard.

Most of the content of the course will be covered by some notes I wrote in these years; for some topics I'll use parts of papers/books.

# Lectures

Date | Lecture | Biblio |
---|---|---|

23/02/15 | Introduction to the course. Models of computation: RAM, 2-level memory. The role of the Virtual Memory system. An example of algorithm analysis: the sum of n numbers. | Chap. 1 of the notes. |

24/02/15 | Maximum sub-array sum in 1d and its variations. | Chap. 2 of the notes, about sect 2.5 read only the first page and half up to the algorithmic solution of the density-based problem. |

25/02/15 | Random sampling: disk model, streaming model and known length, streaming model and unknown length. | Chap. 3 of the notes. |

Students are warmly invited to refresh their know-how about: Divide-and-conquer technique for algorithm design and Master Theorem for solving recurrent relations; and Binary Search Trees | Lecture 2, 9 and 10 of Demaine-Leiserson's course at MIT | |

02/03/15 | List Ranking: definition of the problem, RAM solution, difficulties on disk, pointer-jumping technique, I/O-efficient simulation. | |

03/03/15 | List Ranking: divide-and-conquer approach with its analysis, Master Theorem (recap), randomised-algorithm for generating an independent set. Sorting atomic items: sorting vs permuting, comments on the time and I/O bounds, binary merge-sort and its bounds. | Chap. 4 of the notes. |

04/03/15 | Snow Plow and compression. Multi-way mergesort. | Chap. 5 of the notes |

09/03/15 | Lower bounds for sorting and permuting on disk. The case of D>1 disks: non-optimality of multi-way mergesort (with comments and analysis), the disk-striping technique. | |

12/03/15 | Quicksort: recap on best-case, worst-case, average-case with analysis. Various partitioning strategies: two-way or three-way. Selection of k-tk ranked item in linear average time (with proof). | |

16/03/2015 | Quicksort: Multi-way for disk, multi-pivot selection (with proof). Bounded-space quicksort. 3-way partition for better in-memory quicksort. | |

17/03/2015 | String sorting: comments on the difficulty of the problem on disk, lower bound. LSD-radix sort with proof of time complexity and correctness. | Chap. 6 of the notes (updated 20/3/15). |

18/03/2015 | MSD-radix sort, trie data structure, Multi-key Quicksort. | |

19/03/2015 | Ternary search tree. Interpolation search. | |

23/03/2015 | Prefix search: definition of the problem, solution based on arrays, Front-coding, two-level indexing. | Chap. 7 of the notes |

24/03/2015 | More on two-level indexing of strings: review of in-memory level based on array of string pointers and uncompacted tries. Solution based on compacted trie and Patricia trie, with analysis of space, I/Os and time of the prefix search. | This year we don't study locality preserving FC. |

25/03/2015 | Randomised data structures: Treaps and Skip lists (with proofs and comments on string items). | Notes by others. Study also Theorems and Lemmas. see Demaine's lecture num. 12 on skip lists. |

Students are warmly invited to refresh their know-how about: hash functions and their properties; hashing with chaining. | Lectures 7 of Demaine-Leiserson's course at MIT | |

26/03/2015 | Hashing and dictionary problem: direct addessing, simple hash functions, hashing with chaining, uniform hashing and its computing/storage cost, universal hashing (definition and properties). | Chap. 13 of the notes. Do not study the proof of Theorem 13.3 (study the statement), and the whole theorem 13.4. |

30/03/2015 | Two examples of Universal Hash functions: one with correctness proof, the other without. Minimal ordered perfect hashing: definition, properties, construction, space and time complexity. With exercises on previous and today topics. | Theo 13.5 without proof, but with statement. |

01/04/2015 | As a consequence of the email of Prof. Danelutto of yesterday, students are invited to read the attached exercises, with my solutions, and contact me via skype from 11 to 13 [nick: paolo.ferragina]. | Exercises and their solutions. |

15/04/2015 | Perfect hash table (with proof) and Cuckoo hashing (with proof). | Note that Chap. 13 has been changed. |

16/04/2015 | Bloom Filter: properties, construction, query operation (with proofs). Spectral Bloom Filter. Lower bound and comparison with BF. | No Compressed BF. |

17/04/2015 | Exercises | |

20/04/2015 | Integer coding: the problem and some considerations. Gamma, delta, golomb/rice, PForDelta. | Chap. 9 of the notes |

21/04/2015 | More on integer encoders: (s,c)-codes, variable-byte, Interpolative, Elias-Fano. | |

22/04/2015 | Prefix-free codes, notion of entropy, optimal codes. Huffman, with optimality (proof). Canonical Huffman: construction, properties, decompression algorithm. | Chap. 10 of the notes. |

23/04/2015 | Exercises | |

27/04/2015 | Arithmetic coding: properties, algorithm and proofs; with examples. | No PPM and Range coding in Chap 10 of the notes |

28/04/2015 | Dictionary-based compressors: properties and algorithmic structure. LZ77, LZSS and gzip. LZ8 and LZW | Chap 11, no par 11.4. |

29/04/2015 | Burrows-Wheeler Transform (forward and backward), Move-To-Front, Run-Length-Encoding, bzip2. | Chap. 12 of the notes. Study also Theorem 12.5, but no sections 12.4 and 12.5. |

29/04/2015 | Exercises | |

06/05/2015 | Substring search: definition, properties, reduction to prefix search. The Suffix Array. Binary searching the Suffix Array: p log n (no pages 8.4-8.5). How to compute the lcp-array. Suffix array construction (theo 8.1, but no Skew algorithm, and no Scan-based algorithm). Text mining use of suffix arrays. | Chap. 8 of the notes. |

07/05/2015 | Suffix Trees: properties, structure, pattern search, space occupancy. Construction of Suffix Trees from Suffix Arrays and LCP arrays, and vice versa. LZ77-parsing via suffix trees. | NO McCreight's construction |

08/05/2015 | Exercise | |

11/05/2015 | Range-minimum-query over arrays and Lowest common ancestor over trees, reductions and various solutions. Application to k-mismatch problem. | Read 8.4 of the notes and these slides |

12/05/2015 | Recap: graphs and their representations, BFS and DFS visits, computing shortest-path over unary-length edges, topological sort. | Chap 23.1-23.4 of CLR |

18/05/2015 | Minimum Spanning Tree problem: definition, Greedy approach, Kruskal's and Prim's algorithms. | Chap 23 of CLR |

19/05/2015 | Algorithms for external and semi-external computation of MST. Also use of MST for clustering and for the bottleneck shortest path problem (no proof). | A part of the Mehlhorn-Sander's book, look at Sect 11.5. |

25/05/2015 | Shortest Path problem: Dijkstra's algorithm. Steiner Tree problem: definition and a 2-approximate solution. Traveling Salesman Tour problem: definition and a 2-approximate solution. | Part of the chap on Shortest Path of CLR. Also sect 11.5 and 11.6 of Sanders-Mehlhorn's book. |

26/05/2015 | Exercises on Graph Problems |