### Indice

# Algorithm Engineering A.A. 2021-2022

**Teacher**: Paolo Ferragina

**CFU:** 9 (first semester).

**Course ID:** 531AA.

**Language:** English.

**Question time:** Monday: 15-17, or by appointment (given COVID-19 situation, it will occur in video-conferencing within the virtual room of the course)

**News:** about this course will be distributed via a Telegram channel.

**Official lectures schedule:** The schedule and content of the lectures is available below and with the official registro.

# Goals

In this course we will study, design and analyze advanced algorithms and data structures for the efficient solution of combinatorial problems involving all basic data types, such as integers, strings, (geometric) points, trees and graphs. The design and analysis will involve several models of computation— such as RAM, 2-level memory, cache-oblivious, streaming— in order to take into account the architectural features and the memory hierarchy of modern PCs and the availability of Big Data upon which those algorithms could work on. We will add to such theoretical analysis several other engineering considerations spurring from the implementation of the proposed algorithms and from experiments published in the literature.

Every lecture will follow a problem-driven approach that starts from a real software-design problem, abstracts it in a combinatorial way (suitable for an algorithmic investigation), and then introduces algorithms aimed at minimizing the use of some computational resources like time, space, communication, I/O, energy, etc. Some of these solutions will be discussed also at an experimental level, in order to introduce proper engineering and tuning tools for algorithmic development.

## Week Schedule of the Lectures

Week Schedule | ||
---|---|---|

Day | Time Slot | Room |

Monday | 9:00 - 11:00 | Room Fib A1, and Virtual Room of the course |

Tuesday | 11:00 - 13:00 | Room Fib L1, and Virtual Room of the course |

Wednesday | 9:00 - 11:00 | Room Fib A1, and Virtual Room of the course |

# Exam

The exam will consist of a written test plus an oral discussion.

As last year, I'm planning to have one midterm exam in November and one in December. The two midterms (if >= 18 as vote) will be combined with an oral exam to be given in January, which can **increase or decrease** the final vote (by few points, given that they will somewhat “average” with the vote reported in the written exam).

Dates | Room | Text | Notes |
---|---|---|---|

08/11/2021, start at 09:00 | room A1 and virtual | text, results, solution (Rice-code should use rest from 0) | The midterm exam will consist of a set of exercises, and will last for 45mins. The part of the program for the exercises will be detailed in the list of lectures below. |

15/12/2021, start at 09:00 | room C and virtual | text, results, solution | The FinalTerm exam will have the same structure as the other one, but can participate only the students who got >=18 rank in the first MidTerm. Students have to register at the following link by December 7th, 2021. |

14/01/2022, start at 14:00 | room C1 | text, results, solution | Oral will start at 16:00, of 14.01. Students can attend the oral exam in the next exam dates too (starting from Feb). |

02/02/2022, start at 09:00 | room C1 | text, results, solution | Oral will start at 14:30 in the virtual room of the course, the same day of the written exam |

13/06/2022, start at 09:00 | room L1 | text, solution | |

04/07/2022 | text | ||

25/07/2022 | text |

# Background and Notes of the Course

I strongly suggest refreshing your knowledge about basic Algorithms and Data Structures by looking at the well-known book Introduction to Algorithms, Cormen-Leiserson-Rivest-Stein (third edition). Specifically, I suggest you look at the chapters 2, 3, 4, 6, 7, 8, 10, 11 (no perfect hash), 12 (no randomly built), 15 (no optimal BST), 18, 22 (no strongly connected components). Also, you could look at the Video Lectures by Erik Demaine and Charles Leiserson, specifically Lectures 1-7, 9-10, and 15-17.

Most of the content of the course will be covered by some notes I wrote in these years; for some topics, parts of papers/books will be used. You can download the latest version of these notes from this link. I state that **this material** will be published by *Cambridge University Press* as *Pearls of Algorithm Engineering* by me. This prepublication version is free to view and download for personal use only. Not for redistribution, resale or use in derivative works. © Paolo Ferragina 2020.

# Lectures

Video-lectures of this year are available below, the ones of the last academic year are available too (click on “Videos”, and then sort them by name).

The lectures below include also some **EXTRA** material, which is suggested to be read for students aiming to high rankings.

Date | Lecture | Biblio |
---|---|---|

13/09/2021 | Introduction to the course. Models of computation: RAM, 2-level memory. An example of algorithm analysis: the sum of n numbers, and binary search. The B-tree (or B+-tree) data structure: searching and updating a big-set of sorted keys. Exercise: evaluating the I/O-cost of binary search. | Chap. 1 of the notes. Read note 1 and note 2 on B-trees. video part 1 and part 2 |

14/09/2021 | The role of the Virtual Memory system. Algorithm for Permuting. Sorting atomic items: sorting vs permuting, comments on the time and I/O bounds. Binary merge-sort, and its I/O-bounds. | Chapter 2 (no sect 2.5-) video part 1 and part 2 |

15/09/2021 | Snow Plow, with complexity proof and an example. Multi-way mergesort. The case of D>1 disks: non-optimality of multi-way MergeSort, the disk-striping technique. Finding the maximum-sum subsequence (study from my notes!). | Chap. 5 of the notes. video part 1 and part 2 |

Students are warmly invited to refresh their know-how about: Divide-and-conquer technique for algorithm design and Master Theorem for solving recurrent relations; and Binary Search Trees | Lecture 2, 9 and 10 of Demaine-Leiserson's course at MIT | |

20/09/2021 | Lower bound for sorting: comparisons and I/Os. Quicksort: recap on best-case, worst-case. Quicksort: Average-case with analysis. Selection of kth ranked item in linear average time (with proof). 3-way partition for better in-memory quicksort. RandSelect. EXTRA: Lower bound for permuting. | Chap. 5 of the notes video part 1 and part 2 |

21/09/2021 | The dual pivot. Bounded Quicksort. Multi-way Quicksort: definition, design and I/O-complexity. Selection of k-1 “good pivot” via Oversampling. Proof of the average time complexity of Multi-way Quicksort. | Chap. 5 of the notes video part 1 and part 2 |

22/09/2021 | Random sampling: disk model, known length (algorithms and proofs). Random sampling on the streaming model, known and unknown length. Reservoir sampling. Algorithm and proofs. | Chap. 3 of the notes. |

27/09/2021 | Exercises on Random Sampling and sorting. Randomized data structures: Treaps (with time complexity and proofs). | Notes by others. Study also Theorems and Lemmas. video part 1 and part 2 |

28/09/2021 | Randomized data structures: Skip lists (with proofs and comments on time-space complexity and I/Os). | See Demaine's lecture num. 12 on skip lists. video part 1 and part 2 |

29/09/2021 | Fast set intersection, various solutions: scan, sorted merge, binary search, mutual partition, binary search with exponential jumps. Fast set intersection: two-level scan. Interpolation Search. EXTRA: random shuffling (sect. 6.4). | Chap. 6 of the notes, and Sect 9.2. video part 1 and part 2 |

06/10/2021 | Prefix-free codes, the notion of entropy. Integer coding: the problem and some considerations. | Chap. 11 of the notes. From last year: Video part 1part 2 |

11/10/2021 | Shannon Theorem and optimal codes. The codes Gamma and Delta, space/time performance, and consideration on optimal distributions. The codes Rice, PForDelta, variable-byte. | Video part 1 and part 2 |

12/10/2021 | (s,c)-codes, with examples. | Video part 1 and part 2 |

13/10/2021 | Interpolative code. Elias-Fano code. | Video part 1 and part 2 |

18/10/2021 | More on Elias-Fano code. Exercises | Video part 1 and part 2 |

19/10/2021 | Exercises on integer coders. Python code for Interpolative coding. | Video and notes of today. |

20/10/2021 | Exercises on integer encoders. Lab activity (optional): A gentle introduction to the Succinct Data Structure Library: integer sequences. (Please, prepare in advance the environment for experimenting with the SDSL library, as indicated at the repo.) | video and slides SDSL |

25/10/2021 | String sorting: comments on the difficulty of the problem on disk, lower bound. LSD-radix sort with proof of time complexity and correctness. MSD-radix sort and the trie data structure. | Chap. 7 of the notes. Video |

26/10/2021 | Multi-key Quicksort. Ternary search tree. Exercises. | Video |

27/10/2021 | Exercises. | Video |

02/11/2021 | Exercises. | video |

03/11/2021 | Exercises. | |

08/11/2021 | MidTerm Exam (see above for details, and topics are the ones up to this point in the Agenda). | |

09/11/2021 | Rank and Select: definition and succinct solution for Rank. Compressed solution based on Elias-Fano coding.EXTRA: Solution for Select. | Chapter 15 of the notes. Video |

10/11/2021 | Succinctly encoding binary trees, with examples. | Sect. 9.2. Video |

15/11/2021 | Data Compression. Static, semistatic and dynamic models for frequency estimation. Huffman, with optimality (proof) and relation to Entropy | Chap. 12 of the notes. No PPM. video |

16/11/2021 | Canonical Huffman: construction, properties. Huffman decompression. Arithmetic coding: properties, algorithm and proofs. | Video |

17/11/2021 | Exercises on Statistical coders | notes, Video |

22/11/2021 | More on Arithmetic coding: decoding, encoding the output, and entropy bounds. Dictionary-based compressors: properties and algorithmic structure. LZ77, LZSS, LZ78. | Chap 13.1 and 13.2. No LZW. Slides and video. |

23/11/2021 | Substring search: definition, properties, reduction to prefix search. The Suffix Array. Binary searching the Suffix Array: p log n. Suffix Array construction via qsort and its asymptotic analysis. Text mining use of suffix arrays. Compressor bzip: MTF, RLE, Burrows-Wheeler Transform. | Chap. 10 of the notes: 10.1, 10.2.1 (but no page 10-4 and 10-5 and thus no Lemma 10.2), 10.2.3 (but no “The skew algorithm”, no “The Scan-based algorithm”), 10.4.3 and 14.1-14.3. Video and slides |

23/11/2021 | Exercises | Video and notes |

24/11/2021 | How to construct the BWT via qsort. How to invert the BWT. Exercises. | Video part 1 and part 2. Notes on exercises. |

Students are warmly invited to refresh their know-how about: hash functions and their properties; hashing with chaining. | Lectures 7 of Demaine-Leiserson's course at MIT | |

29/11/2021 | Hashing and dictionary problem: direct addressing, simple hash functions, hashing with chaining. Uniform hashing and its computing/storage cost, universal hashing (definition and properties). Two examples of Universal Hash functions: one with correctness proof, the other without. | Chap. 8 of the notes. All theorems with proof, except Theo 8.3 and 8.5 without a proof (only the statement). Video |

30/11/2021 | Cuckoo hashing (with all proofs). Minimal ordered perfect hashing: definition, properties, construction, space and time complexity. | No Perfect Hash (8.5). Video |

01/12/2021 | Bloom Filter: properties, construction, query and insertion operations, error estimation (with proofs). Spectral Bloom Filter. Prefix search: definition of the problem, solution based on arrays, Front-coding, two-level indexing based on compacted tries or Patricia Trees. Analysis of space, I/Os and time of all described solutions. EXTRA: Lower bound on BF space | No 8.7.2 (compressed BF). Chap. 9 of the notes: 9.1, 9.4, 9.5. No Locality Preserving front coding (9.3). Video |

06/12/2021 | Exercises | video |

07/12/2021 | Exercises | Video |

13/12/2021 | Student meeting - exercises | Video |

15/12/2021 | FinalTerm exam. Topics will be the ones that we have dealt with after the MidTerm exam. Students have to register at the following link by December 7th, 2021. |