This is the course I teached during March 2010 at the Bertinoro International Spring School for PhD students in Computer Science and Engineering. The main goal of the course was to address the new algorithmic challenges posed by the processing of massive data occurring in modern information retrieval and data mining applications, and dictated by the architectural features of the memory hierarchy of current computers. These issues force algorithm designers to address simultaneously data compression (fitting more data in the faster/smaller memory levels) and cache-friendly data access (exploiting the accessing features of memory levels).
Every lecture will follow a problem-driven approach that starts from a real software-design problem, abstracts it in a combinatorial way (suitable for an algorithmic investigation), and then introduces basic and sophisticated algorithmic solutions aimed at minimizing the use of computational resources like time, space, communication, I/O, energy consumption, etc.. The theoretical investigation will go hand-in-hand with some algorithm-engineering considerations.
Lecture 1. Introduction to (Modern) Computational Models. Sorting vs Permuting. [slides, material]
Lecture 2. Hashing: Uniform, Universal, Perfect, Cuckoo. [slides, material]
Lecture 3. Dictionaries: From arrays to (Patricia) tries and Bloom Filters, from plain storage to (Locality Preserving) Front Coding. [slides, material]
Lecture 4. Text indexing and mining: suffix trees and arrays. Some mining queries, and the issue “LCA=RMQ”. [slides, material]
Lecture 5. Data compression: basics, Huffman, Arithmetic, bzip2 and beyond. [slides, material]
The exam consists either in the solution of a bunch of exercises, or in the attack of one of the software project listed below. Each project can be conducted in team (no more than 3 students), and should be preferably close to the research interest of the candidate.
The students willing to give the exam should send me an email with subject [BISS09], specifying the chosen subject for the work, and the list of participants in the team. Once negotiated, the assigned teamwork will be inserted in this wiki. The exam must be completed within 2010.