### Indice

# Algorithm Engineering A.A. 2018-2019

**Teachers**: Paolo Ferragina and Linda Pagli

**CFU:** 9 (first semester).

**Course ID:** 531AA.

**Language:** English.

**Degree:** Master degree in Computer Science and Master degree in CS&Networking.

**Question time:** After lectures it will be announced via Twitter

**News:** about this course will be distributed via a Tweeter-channel.

**Official lectures schedule:** The schedule and content of the lectures is available below and with the official registro.

# Goals

In this course we will study, design and analyze advanced algorithms and data structures for the efficient solution of combinatorial problems involving all basic data types, such as integers, strings, (geometric) points, trees and graphs. The design and analysis will involve several models of computation— such as RAM, 2-level memory, cache-oblivious, streaming— in order to take into account the architectural features and the memory hierarchy of modern PCs and the availability of Big Data upon which those algorithms could work on. We will add to such theoretical analysis several other engineering considerations spurring from the implementation of the proposed algorithms and from experiments published in the literature.

Every lecture will follow a problem-driven approach that starts from a real software-design problem, abstracts it in a combinatorial way (suitable for an algorithmic investigation), and then introduces algorithms aimed at minimizing the use of some computational resources like time, space, communication, I/O, energy, etc. Some of these solutions will be discussed also at an experimental level, in order to introduce proper engineering and tuning tools for algorithmic development.

## Week Schedule of the Lectures

Week Schedule | ||
---|---|---|

Day | Time Slot | Room |

Monday | 9:00 - 11:00 | L |

Tuesday | 11:00 - 13:00 | L1 |

Wednesday | 11:00 - 13:00 | F1 |

# Exam

Dates | Room | Text | Notes |
---|---|---|---|

29 oct 2018, 14:00-16:00 | C1, L1 | First Midterm exam: text, results, solution part 1, solution part 2. | Students that got a rank >= 16 can participate to the second midterm exam. Friday 9th November, hr 17:00, room L1, correction of the written exam. |

19 dec 2018, 14:00-16:00 | C1, L1 | Second Midterm exam: text, results, solution | Correction and registration will occur in Sala Seminari Ovest, Dip. Informatica, Wednesday 9th January, hr 17:00. Score “30 e lode” is assigned only to the students who got in both exams the score 30. The score is lost if the student participates to one of the next exams (just sitting is enough !). The score can be registered in any of the following exam dates (even in the summer), but PLEASE do not write your name in the ESAMI platform if you want to register your exam score, just show yourself in one of those dates. |

18 jan 2018, 11:00-13:00 | A1 | text, solution | |

15 feb 2018, 11:00-13:00 | A1 | text, results. | Correction and registration will occur in Sala Seminari EST, Dip. Informatica, Monday 18th February, hr 16:00. The score can be registered in any of the following exam dates (even in the summer). The score is lost if the student participates to one of the next exams (just sitting is enough !). |

4 apr 2019, 11:00-13:00 | text | Results: Lorenzi (24), Seitanidis (ins). Correction and registration will occur in Ferragina's office: Monday 15th april at 14:30 (sharp), otherwise in any other following exam's date. | |

18 jun 2019, 09:00-13:00 | L1 | text | If a student wishes to take also the Information Retrieval exam, s/he has to write to Prof. Ferragina for scheduling it in the same morning and avoid overlapping. |

16 jul 2019, 09:00-13:00 | L1 | text, solution | If a student wishes to take also the Information Retrieval exam, s/he has to write to Prof. Ferragina for scheduling it in the same morning and avoid overlapping. |

11 sep 2019, 09:00-13:00 | L1 | text | If a student wishes to take also the Information Retrieval exam, s/he has to write to Prof. Ferragina for scheduling it in the same morning and avoid overlapping. |

# Background

I strongly suggest to refresh your knowledge about basic Algorithms and Data Structures by looking at the well-known book Introduction to Algorithms, Cormen-Leiserson-Rivest-Stein (third edition). Specifically, I suggest you to look at the chapters 2, 3, 4, 6, 7, 8, 10, 11 (no perfect hash), 12 (no randomly built), 15 (no optimal BST), 18, 22 (no strongly connected components). Also, you could look at the Video Lectures by Erik Demaine and Charles Leiserson, specifically Lectures 1-7, 9-10 and 15-17.

# Books, Notes, etc.

We'll use both the *old-fashioned* blackboard and slides. Most of the content of the course will be covered by some notes prof. Ferragina wrote in these years; for some topics parts of papers/books will be used.

# Lectures

Date | Lecture | Biblio | Slides |
---|---|---|---|

17/09/2018 | Introduction to the course. Models of computation: RAM, 2-level memory. An example of algorithm analysis: the sum of n numbers. The role of the Virtual Memory system. | Chap. 1 of the notes. | Lecture0 |

18/09/2018 | Maximum sub-array sum in 1d and its variations. | Chap. 2 of the notes. Until 2.5: first page and half. | Lecture1 |

19/09/2018 | Variants of Maximum sub-array sum. Random Sampling in the 2-level model. Random sampling: disk model, known length, the streaming model m=1. | Sec 2.5 of Chapter 2 (no proofs); Chap. 3 of the notes. Sec.3.1 | Lecture2 |

24/09/2018 | Random sampling on the streaming model, known and unknown length. Reservoir sampling. Algorithm and proofs. The List Ranking problem: parallel solution | ||

25/09/2018 | List Ranking: difficulties on disk, pointer-jumping technique, I/O-efficient simulation. Divide and Conquer for List Ranking. Randomized coin tossing to determine the independent set. | Chap. 4 of the notes. | Lecture3-4 |

Students are warmly invited to refresh their know-how about: Divide-and-conquer technique for algorithm design and Master Theorem for solving recurrent relations; and Binary Search Trees | Lecture 2, 9 and 10 of Demaine-Leiserson's course at MIT | ||

26/09/2018 | Sorting atomic items: sorting vs permuting, comments on the time and I/O bounds, binary merge-sort and its bounds. Snow Plow and compression. Multi-way mergesort. Algorithm for Permuting. | Chap. 5 of the notes | |

01/10/2018 | Divide and Conquer Algorithm for List Ranking: Example. Deterministic Coin Tossing. | ||

02/10/2018 | Lower bounds for sorting. The case of D>1 disks: non-optimality of multi-way MergeSort, the disk-striping technique. Quicksort: recap on best-case, worst-case. | Lower bound of Permuting is optional (sect 5.2.2). | lec.6 |

03/10/2018 | Quicksort: Average-case with analysis. Selection of k-th ranked item in linear average time (with proof). 3-way partition for better in-memory quicksort. RandSelect. | lec.7 analisi caso medio (Note di F.Luccio) | |

8/10/2018 | Bounded Quicksort; Multiway Quicksort. Selection of k-1 “good pivot” via Oversampling. Proof of the average time complexity | lec 8. | |

9/10/2017 | Dual Pivot QuickSort. Recap: BFS and DFS visits, Minimum Spanning Tree problem: Kruskal and Prim algorithms and analysis. | CLR cap.23 | lec.9 |

10/10/2018 | Algorithms for external and semi-external computation of MST, Sybein algorithm. | Sect 11.5 of the Mehlhorn-Sander's book. | |

15/10/2017 | Fast set intersection, various solutions: scan, sorted merge, binary search, mutual partition, binary search with exponential jumps. | Chap. 6 of the notes. | |

16/10/2017 | Fast set intersection: two-level scan, random shuffling. String sorting: comments on the difficulty of the problem on disk, lower bound. LSD-radix sort with proof of time complexity and correctness. | Chap. 7 of the notes. For “Interpolation Search” please read the corresponding section in this preliminary note. | |

17/10/2017 | MSD-radix sort and the trie data structure. Multi-key Quicksort. Ternary search tree. | ||

22/10/2017 | Exercises. | ||

23/10/2017 | Exercises. | ||

24/10/2017 | Exercises. | simulazione alg. Sybein | |

A copy of the full notes is available here (updated 19-12-2018, Thanks to Gemma Martini, Francesco Tosoni, Daniele Gadler, and all other students that contributed to improve them this year). | |||

05/11/2018 | Prefix search: definition of the problem, solution based on arrays, Front-coding, two-level indexing. Locality Preserving front coding and its use with arrays. | Chap. 9 of the notes: 9.1, 9.3. | |

06/11/2017 | Interpolation search. Compacted tries. Analysis of space, I/Os and time of the prefix search for all data structures seen in class. More on two-level indexing of strings: Solution based on Patricia trie, with analysis of space, I/Os and time of the prefix search. Locality Preserving front coding and its use with Patricia trie. | Chap. 9 of the notes: 9.4 and 9.5. | |

12/11/2018 | Substring search: definition, properties, reduction to prefix search. The Suffix Array. Binary searching the Suffix Array: p log n. Searching in Suffix Arrays with p + log n. Suffix Array construction via qsort and its asymptotic analysis. LCP array construction in linear time. | Chap. 10 of the notes: 10.1, 10.2.1 and 10.2.2, 10.2.3 (no “The skew algorithm”, “The Scan-based algorithm”). | |

13/11/2018 | Suffix Trees: properties, structure, pattern search, space occupancy. Construction of Suffix Trees from Suffix Arrays and LCP arrays, and vice versa. Text mining use of suffix arrays. | Sect 10.3, 10.3.1, 10.3.2, 10.4.3 | |

14/11/2018 | The k-mismatch problem with SA+LCP or ST and RMQ data structure. Auto-completion search. RMQ and LCA queries, equivalence and reductions, their algorithmic solutions and few applications. | Sect 10.4.1 | |

19/11/2017 | Prefix-free codes, notion of entropy, optimal codes. Integer coding: the problem and some considerations. The codes Gamma and Delta, space/time performance and consideration on optimal distributions. Rice, PForDelta. | Chap. 11 of the notes | |

20/11/2017 | Coders: (s,c)-codes, variable-byte, Interpolative. Elias-Fano. With examples. | ||

21/12/2018 | Exercises. | ||

26/11/2017 | Huffman, with optimality (proof). Canonical Huffman: construction, properties, decompression algorithm. | Chap. 12 of the notes (no sect 12.1.2). | |

27/11/2017 | Arithmetic coding: properties, algorithm and proofs. Dictionary-based compressors: properties and algorithmic structure. LZ77, LZSS and LZ8. | Chap. 12 sect 12.2. No PPM and Range coding. Chap 13, no from par 13.3 and following ones. | |

28/11/2017 | No Lecture. | ||

Students are warmly invited to refresh their know-how about: hash functions and their properties; hashing with chaining. | Lectures 7 of Demaine-Leiserson's course at MIT | ||

3/12/2018 | LZ compression via Suffix Tree. Hashing and dictionary problem: direct addressing, simple hash functions, hashing with chaining, uniform hashing and its computing/storage cost, universal hashing (definition and properties). | Chap 10 at sect 10.4.2. Chap. 8 of the notes. Theorem 8.3 without proof, Theo 8.5 without proof (only the statements). | |

4/12/2018 | Two examples of Universal Hash functions: one with correctness proof, the other without. Perfect hash table (with proof). Minimal ordered perfect hashing: definition, properties, construction, space and time complexity. | ||

5/12/2018 | Cuckoo hashing (with proof). Bloom Filter: properties, construction, query and insertion operations, error estimation (with proofs). | No proof of lower bound. | |

6/12/2018 Room C1 9:00 - 11:00 | Exercises. | ||

7/12/2018 Room C1 9:00 - 11:00 | Exercises. | Scribe Gemma Martini | |

10/12/2017 | No lecture | ||

11/12/2017 | Randomized data structures: Skip lists (with proofs and comments on I/Os), and Treaps (with proofs). | Notes by others. Study also Theorems and Lemmas. See Demaine's lecture num. 12 on skip lists. | |

12/12/2017 | Exercises | ||

14/12/2017 | Exercises |