### Indice

# Algorithm Engineering A.A. 2010-2011

**Teacher**: Paolo Ferragina

**CFU:** 6 (first semester).

**Course ID:** 283AA.

**Language:** English.

**Degree:** All master degrees in Computer Science, and the one in CS&Networking in collaboration with the Scuola Normale Superiore “S. Anna”.

**Lectures schedule:** Monday, 9-11 (Room C1); Thursday, 14-16 (Room C).

**Question time:** Monday or Thursday at 11-12.30, Room 295 (Ferragina's office), Dept of Computer Science.

**News:** about this course will be distributed via a Tweeter-channel with hashtag `#ae2010`

**Official lectures schedule:** The schedule and content of the lectures is available with the official registro.

# News

# Goals

In this course we will study, design and analyze (theoretically and experimentally) advanced algorithms and data structures for the efficient solution of combinatorial problems involving all basic data types, such as integers, strings, (geometric) points, trees and graphs. These algorithmic tools will be designed and analyzed in several models of computation— such as RAM, 2-level memory, cache-oblivious, streaming— in order to take into account the architectural features and the memory hierarchy of modern PCs.

Every lecture will follow a problem-driven approach that starts from a real software-design problem, abstracts it in a combinatorial way (suitable for an algorithmic investigation), and then introduces algorithmic solutions aimed at minimizing the use of some computational resources like time, space, communication, I/O, energy, etc. Some of these solutions will be discussed at an experimental level, in order to introduce proper engineering and tuning tools for algorithmic development.

# Background

If you wish to refresh your mind on Algorithms and Data Structures, I suggest you to follow the Video Lectures by Erik Demaine and Charles Leiserson, specifically Lectures 1-5, 7 and 10. There it is missing the part on basic graph problems (representation, DFS, BFS, topological sort) which you may browse in any book, such as Introduction to Algorithms by Cormen-Leiserson-Rivest-Stein, third edition.

# Exam

# Books, Notes, etc.

I'll project no slides for teaching, and use *just* the *old-fashioned* blackboard.

Since the course covers many different algorithmic themes and will deal with advanced results and issues which are not yet part of books, I'll distribute notes and copies of papers/books which will constitute the reading material of the course.

This is a preliminary list of books from which I've taken some parts (details in the lectures):

**[M]**U. Manber, “Introduction to Algorithms: A Creative Approach”, Addison-Wesley, 1989.**[MS]**K. Mehlhorn and P. Sanders. “Algorithms and data structures: The basic toolbox”, Springer, 2009**[WMB]**I.H. Witten and A. Moffat and T.C. Bell, “Managing Gigabytes”, Morgan Kauffman, Second edition, 1999.

# List of Lectures

Date | Lecture | Biblio |
---|---|---|

23/09/10 | Refreshing: Analysis of algorithms, (asymptotic) time/space complexity, Big-Oh notation, Divide-and-Conquer, Binary search, Recurrences, and few other examples. | Lectures 1, 2 and 3 of Demaine-Leiserson's course at MIT |

24/09/10 | Refreshing: Mergesort and Quicksort. Heap: properties and operations. | Lectures 4 of Demaine-Leiserson's course at MIT |

27/09/10 | Refreshing: Hashing: hash functions and their properties. Hashing with chaining. Binary Search Trees: recursive visit, pre/in/post visits. Red-Black Trees. | Lectures 7 and 10 of Demaine-Leiserson's course at MIT |

30/09/10 | Refreshing: Graphs: how to represent it in memory. DFS and BFS visits, properties and applications. | part of Lecture 17 of Demaine-Leiserson's course at MIT. Look at the book CLR. |

05/10/10 | Definition of “Algorithm Engineering”. Models of computation: 2-level memory and cache-oblivious models. An example: summing n numbers. | |

06/10/10 | Effect of caching on the performance of external-memory algorithms. Maximum sub-array Sum. | Notes |

07/10/10 | Random sampling in the disk model and in the streaming model. | Notes |

14/10/10 | Permuting and Sorting in the RAM model and the 2-level memory model: binary mergesort, its analysis and its limitations. Some optimizations: the SnowPlow technique. | |

18/10/10 | Multi-way mergesort. The I/O lower-bound for sorting and permuting. Sorting on D disks: disk striping and its UN-optimality. | |

25/10/10 | Quicksort: pivot selection, average-case analysis. Randomised selection of pivot in linear time. | Notes |

04/11/10 | String sorting: limits of qsort and comparison-based algorithms. MSD-first Radixsort, LSD-first Radixsort, Multi-key Quicksort. | Notes |

07/11/10 | Few exercises. | |

11/11/10 | Prefix search: Array of string pointers, front coding, locality preserving front-coding. | |

15/11/10 | Interpolation search. | preliminary notes |

22/11/10 | Tries, Patricia Trie, String B-tree. | |

02/12/2010 | Text indexing: Suffix tree and array. Text mining with the suffix array. Suffix array's construction. | notes and slides |

06/12/2010 | Hashing: chaining, universal, perfect. | notes (no open addressing) |

09/12/2010 | d-left hashing, cuckoo hashing, minimal ordered perfect hashing. | note1, note2, slides |

20/12/2010 | The Bloom Filter and its spectral variant. | slides |

10/01/2011 | Data compression: notion of entropy, Huffman coding (and its optimality), Canonical Huffman codes. | MG (up to page 41) |

13/01/2011 | Data compression: LZ77, LZ78 and gzip. | MG pages 74-81 |

17/01/2011 | Data compression: Burrows-Wheeler Transform, MTF, RLE and bzip. Skip lists: description, properties, proof for heigth and search time. | slides and notes |