# Algorithm Engineering A.A. 2012-2013

**Teacher**: Paolo Ferragina

**CFU:** 9 (second semester).

**Course ID:** 531AA.

**Language:** English.

**Degree:** Master degreein CS&Networking in collaboration with the Scuola Normale Superiore “S. Anna”.

**Lectures schedule:** Tuesday 9-11 (C1), Wednesday 9-11 (C1), Thursday 9-11 (F).

**Question time:** After lectures.

**News:** about this course will be distributed via a Tweeter-channel.

**Official lectures schedule:** The schedule and content of the lectures is available with the official registro.

# Goals

In this course we will study, design and analyze (theoretically and experimentally) advanced algorithms and data structures for the efficient solution of combinatorial problems involving all basic data types, such as integers, strings, (geometric) points, trees and graphs. These algorithmic tools will be designed and analyzed in several models of computation— such as RAM, 2-level memory, cache-oblivious, streaming— in order to take into account the architectural features and the memory hierarchy of modern PCs.

Every lecture will follow a problem-driven approach that starts from a real software-design problem, abstracts it in a combinatorial way (suitable for an algorithmic investigation), and then introduces algorithmic solutions aimed at minimizing the use of some computational resources like time, space, communication, I/O, energy, etc. Some of these solutions will be discussed at an experimental level, in order to introduce proper engineering and tuning tools for algorithmic development.

# Exam

# Background

If you wish to refresh your mind on Algorithms and Data Structures, I suggest you to follow the Video Lectures by Erik Demaine and Charles Leiserson, specifically Lectures 1-5, 7 and 10. There it is missing the part on basic graph problems (representation, DFS, BFS, topological sort) which you may browse in any book, such as Introduction to Algorithms by Cormen-Leiserson-Rivest-Stein, third edition.

# Books, Notes, etc.

I'll project no slides for teaching, and use *just* the *old-fashioned* blackboard.

Since the course covers many different algorithmic themes and will deal with advanced results and issues which are not yet part of books, I'll distribute notes and copies of papers/books which will constitute the reading material of the course.

# Lectures

Date | Lecture | Biblio |
---|---|---|

Refreshing: Hashing: hash functions and their properties. Hashing with chaining. Binary Search Trees: recursive visit, pre/in/post visits. Red-Black Trees. | Lectures 7 and 10 of Demaine-Leiserson's course at MIT | |

Refreshing: Graphs: how to represent it in memory. DFS and BFS visits, properties and applications. | part of Lecture 17 of Demaine-Leiserson's course at MIT. Look at the book CLR. | |

19/02/2013 | Introduction to the course. Models of computation: RAM, 2-level memory. The role of the Virtual Memory system. An example of algorithm analysis: the sum of n numbers. | Chap. 1-2 of the notes. Study also chapter 2. |

20/02/2013 | List-ranking problem: RAM solution, pointer-jumping, the simulation of a parallel algorithm on disk, an I/O-efficient randomized solution based on divide-and-conquer. | Chap. 4 of the notes (draft needs corrections, suggestions or typos are appreciated). |

21/02/2013 | Exercise on the simulation of a parallel algorithm on disk. Random sampling: the case of known sequence length. | Chap. 3 of the notes. |

Refreshing: Analysis of algorithms, (asymptotic) time/space complexity, Big-Oh notation, Divide-and-Conquer, Binary search, Recurrences, and few other examples. | Lectures 1, 2 and 3 of Demaine-Leiserson's course at MIT | |

Refreshing: Mergesort and Quicksort. Heap: properties and operations. | Lectures 4 of Demaine-Leiserson's course at MIT | |

27/02/2013 | Q&A on the MIT-lectures. Random sampling in the streaming model. | |

28/02/2013 | Sorting atomic items: sorting vs permuting, comments on the bounds, binary merge-sort and its bounds, Snow Plow and compression. | Chap 5 of the notes |

05/03/2013 | Multi-way mergesort. Lower-bounds for permuting and sorting, for one and more disks. Disk striping. | |

06/03/2013 | Quicksort: recap on best-case, worst-case, average-case with analysis. Various partitioning strategies with one or two pivots. Selection of the k-th ranked item. | |

07/03/2013 | Proof for the time complexity of the selection of the k-th ranked item. Bounded-space Quicksort. Multi-way quicksort with proof. | |

12/03/2013 | String sorting: definition of the problem, lower bound, MSD-radix sort, LSD-radix sort, trie-based sort, Multi-key quicksort. | Chap. 6 of the notes |

13/03/2013 | String searching by prefix. Array-based solution, front-coding, two-level indexing, uncompacted tries, compacted tries. And time, I/O, space analisys for all of them. | |

14/03/2013 | Locality-preserving Front Coding: algorithm, time and I/O-bound but not proof of space bound. Interpolation search. Patricia trie and its time, space, I/O-complexities. | Chap. 7 of the notes. No proof of Th 7.5, no section 7.6 |

19/03/2013 | Exercises on FC, LPFC, Tries, etc. Recap about binary search trees and their visits. | |

20/03/2013 | Exercises on recursive tree visits. Substring search: definition, properties, reduction to prefix search. The Suffix Array. | |

21/03/2013 | Binary searching the Suffix Array: p log n, p + log n with the use of lcp-array. The Skew algorithm. | |

26/03/2013 | The suffix tree: properties, substring search, implementation of LZ77 | |

27/03/2013 | Few reductions between LCA in trees and LCP in strings | |

28/03/2013 | Approximate pattern matching via RMQ/LCA, Text mining over suffix arrays. | Chap. 8 of the notes. Do not read subsect 8.2.2 and 8.3.3 and the part referring to the “scan-based algorithm” in pages 8-13 thorugh 8-16. |

09/04/2013 | Integer coding: gamma, delta, variable-byte, golomb/rice, PForDelta, Interpolative, (s,c)-codes. | Chap. 9 of the notes. |

10/04/2013 | Prefix-free codes, notion of entropy, optimal codes. Huffman. | |

16/04/2013 | Huffman's optimality (proof). Canical Huffman with examples. | |

17/04/2013 | Auto-evaluation: exercises on Huffman and integer encoding taken from exams: ex 1, ex 4, ex 2, ex 3. Also, encode with gamma/delta/rice with k=2/PForDelta b=2 the sequence S = 1, 2, 4, 6, 1, 7, 8; and with Interpolative the sequence S=1,3,4,10,11,13,21. | |

18/04/2013 | Arithmetic coding: compressor, decompressor, efficacy. | Chap. 10 of the notes. Do not study Range coding and PPM. |

23/04/2013 | Lempel Ziv 1997, 1998, LZW, gzip. RLE, MTF, Burrows-Wheeler Transform and bzip. | Chap. 11 of the notes. Do not study the section on “optimality”. BWT on the slides. |

24/04/2013 | Exercises on dictionary-based compressors | |

30/04/2013 | Exercises on arithmetic coding and BWT. | |

02/05/2013 | Video-lecture on skip-lists and solve the following exercise: Insert the sequence of keys (5, 2, 10, 7, 8) by assuming that the number of coin tosses for each of them is, respectively, equal to (2, 3, 1, 2, 3); then show the path followed by a search for the key 7 (solution). | see Demaine's lecture num. 12 and these notes. |

07/05/2013 | Treaps with examples. | |

08/05/2013 | Dictionary problem: direct-address tables, hashing with chaining, universal hashing with properties and proof. | Chapter of CLR |

09/05/2013 | Perfect hashing (with proofs). d-left hashing. Cuckoo hashing (with proofs). | Paper on cuckoo hashing. and slides. |

14/05/2013 | Minimal ordered perfect hashing. Bloom Filter and Spectral Bloom Filter (with proofs). | Part of a book, slides paper. |

15/05/2013 | Minimum Spanning Tree: properties, greedy approach, Kruskal algorithm in RAM and disk. | clr and mehlhorn |

16/05/2013 | Minimum Spanning Tree: Prim, semi-external algorithm, Sybein's algorithm with proof. MST's applications: clustering and Hamiltonian cycle. | |

21/05/2013 | Self-evaluation over the following exercises. | |

22/05/2013 | Steiner Tree problem: an approximation algorithm. Matching in dense/generic graphs. | part of a book |

23/05/2013 | Q&A time. |