# Algorithm Engineering A.A. 2013-2014

**Teacher**: Paolo Ferragina

**CFU:** 9 (second semester).

**Course ID:** 531AA.

**Language:** English.

**Degree:** Master degreein CS&Networking in collaboration with the Scuola Normale Superiore “S. Anna”.

**Lectures schedule:** Monday 11-13 (room N1), Tuesday 9-11 (room C1), Thursday 9-11 (room N1).

**Question time:** After lectures and Monday 9-11 in Ferragina's office.

**News:** about this course will be distributed via a Tweeter-channel.

**Official lectures schedule:** The schedule and content of the lectures is available below and with the official registro.

# Goals

In this course we will study, design and analyze (theoretically and experimentally) advanced algorithms and data structures for the efficient solution of combinatorial problems involving all basic data types, such as integers, strings, (geometric) points, trees and graphs. These algorithmic tools will be designed and analyzed in several models of computation— such as RAM, 2-level memory, cache-oblivious, streaming— in order to take into account the architectural features and the memory hierarchy of modern PCs.

Every lecture will follow a problem-driven approach that starts from a real software-design problem, abstracts it in a combinatorial way (suitable for an algorithmic investigation), and then introduces algorithmic solutions aimed at minimizing the use of some computational resources like time, space, communication, I/O, energy, etc. Some of these solutions will be discussed at an experimental level, in order to introduce proper engineering and tuning tools for algorithmic development.

# Exam

# Background

If you wish to refresh your mind on basic Algorithms and Data Structures, I suggest you to look at the well-known book Introduction to Algorithms by Cormen-Leiserson-Rivest-Stein, third edition. There you'd look at the chapters 2, 3, 4, 6, 7, 8, 10, 11 (no perfect hash), 12 (no randomly built), 15 (no optimal BST), 18, 22 (no strongly connected components). Also, you could look at the Video Lectures by Erik Demaine and Charles Leiserson, specifically Lectures 1-7, 9-10 and 15-17.

# Books, Notes, etc.

I'll project no slides for teaching, and use *just* the *old-fashioned* blackboard.

Most of the content of the course will be covered by some notes I wrote in these years; for some topics I'll use parts of papers/books.

# Lectures

Date | Lecture | Biblio |
---|---|---|

17/02/14 | Introduction to the course. Models of computation: RAM, 2-level memory. The role of the Virtual Memory system. An example of algorithm analysis: the sum of n numbers. | Chap. 1 of the notes. |

18/02/14 | Maximum sub-array sum in 1d and 2d. | Chap. 2 of the notes. |

20/02/14 | Random sampling: disk model, streaming model and known length, streaming model and unknown length. | Chap. 3 of the notes. |

Students are warmly invited to refresh their know-how about: hash functions and their properties; hashing with chaining; binary Search Trees as dictionary data structures | Lectures 7, 9 and 10 of Demaine-Leiserson's course at MIT | |

24/02/14 | List Ranking: definition of the problem, RAM solution, difficulties on disk, pointer-jumping technique, I/O-efficient simulation. | |

25/02/14 | List Ranking: divide-and-conquer approach, randomised-algorithm for generating an independent set. | Chap. 4 of the notes. |

Students are warmly invited to refresh their know-how about: Divide-and-conquer and Master Theorem for recurrent relations. | Lecture 2 of Demaine-Leiserson's course at MIT | |

27/02/14 | Sorting atomic items: sorting vs permuting, comments on the time and I/O bounds, binary merge-sort and its bounds, Snow Plow and compression. Multi-way mergesort. | Chap 5 of the notes |

03/03/14 | Sorting and Permuting: upper bounds and lower bounds (with proofs). Sorting in D-disks: limitations and disk striping technique. | |

04/03/14 | Quicksort: recap on best-case, worst-case, average-case with analysis. Various partitioning strategies with one or two pivots. Bounded-space Quicksort and qsort. | |

06/03/14 | Quicksort: Multi-way for disk, multi-pivot selection (with proof). Selecting the k-th ranked item from an unordered sequence. | |

10/03/14 | Exercises on disk-sorting. Then lecture on String sorting: comments on the difficulty of the problem on disk, lower bound, MSD-radix sort. | Chap 6 of the notes |

11/03/14 | String sorting: LSD-radix sort with proof. Exercises on List ranking. | |

13/03/14 | String sorting: Multi-key quicksort and Ternary search trees with proof. Exercises. | |

17/03/14 | Prefix search: definition of the problem, solution based on arrays, Front-coding, two-level indexing. Interpolative search with proof of time complexity. | Chap 7 of the notes |

19/03/14 | Locality-preserving Front-coding (algorithm and theorem, no proof). Uncompacted and Compacted trie. Patricia trie (theorem, no proof). | |

20/03/14 | Exercises. | |

24/03/14 | Integer coding: gamma, delta, variable-byte, golomb/rice, PForDelta, (s,c)-codes. | Chap 9 of the notes |

25/03/14 | Interpolative. Prefix-free codes, notion of entropy, optimal codes. Huffman, with optimality (proof). | Chap 10 of the notes |

07/04/14 | Canonical Huffman: construction, properties, decompression algorithm. Bounded Huffman codes, and pathological cases. Arithmetic coding with proof of compression ratio. | No PPM in Chap 10 of the notes |

08/04/14 | Dictionary-based compressors: LZ77, LZ78, LZW, LZSS. Description of gzip. | Chap 11 of the notes |

10/04/14 | Exercises on various compressors | |

14/04/14 | Substring search: definition, properties, reduction to prefix search. The Suffix Array. Binary searching the Suffix Array: p log n. How to compute the lcp-array. Text mining use of suffix arrays. | Chap 8 of the notes. Not read pages 8-4 and 8-5 and 8-6 about searching SA using LCP. |

15/04/14 | The skew algorithm for suffix array construction. The suffix tree data structure, properties and its relation with suffix arrays. | No section “The scan-based algorithm”. |

16/04/14 | Q&A with students | |

16/04/14 | Construction of Suffix Trees from Suffix Arrays and LCP arrays, and vice versa. LZ77-parsing via suffix trees. | NO McCreight's construction |

28/04/14 | Range-minimum-query over arrays and Lowest common ancestor over trees, reductions and various solutions. Application to k-mismatch problem. | |

29/04/14 | Burrows-Wheeler Transform (forward and backward), Move-To-Front, Run-Length-Encoding, bzip2. | Chap 12 of the notes. Study also Theorem 12.5, but no sections 12.4 and 12.5. Here it is an updated proof of Theorem 12.5. |

30/04/14 | Randomised data structures: Treaps and Skip lists (with proofs). | Notes by others. Study also Theorems and Lemmas. see Demaine's lecture num. 12 on skip lists. |

05/05/14 | Hashing and dictionary problem: direct addessing, simple hash functions, hashing with chaining, uniform hashing and its computing/storage cost, universal hashing (properties, an example with proof). | Study Chap 13 of the notes. Do not study the proof of Theorem 13.3 (study the statement), and the whole theorem 13.5. |

06/05/14 | Perfect hashing: properties, construction, query operation (with proofs). | |

07/05/14 | The power of two choices and d-left hashing (only results, no proofs). Bloom Filter: properties, construction, query operation (with proofs). | No 13.7.1 and 13.7.2. |

08/05/14 | Exercises. | |

12/05/14 | Recap: graphs, BFS and DFS visits, computing shortest-path over unary-length edges. | Chap 22.1, 22.2 and 22.3 of CLR |

13/05/14 | Cuckoo hashing: definition, properties, operations with proofs, some algorithm-engineering considerations. | |

14/05/14 | Minimal ordered perfect hashing: definition, properties, construction, space and time complexity. With exercises on previous and today topics. | |

15/05/14 | Minimum Spanning Tree problem: definition, Greedy approach, Kruskal's and Prim's algorithms. Shortest Path problem: Dijkstra's algorithm. Some considerations on external and semi-external approaches. | Chap 23 of CLR and part of the chap on Shortest Path of CLR |

20/05/14 | (Fully) external MST computation. Steiner Tree problem: definition and a 2-approximate solution. Traveling Salesman Tour problem: definition and a 2-approximate solution. | sect 11.5 and 11.6 of Sanders-Mehlhorn's book |

22/05/14 | Exercises on MST, Shortest Paths, Steiner Tree problems |