
Year 2018-2019

Announcements

Please register at http://piazza.com/unipi.it/fall2017/642aa for electronic discussion and other activities.

Schedule

  • Please note that the class on April 16 will not take place.
  • Class hours: Tue 16:00-18:00 (Fib X2), Wed 11:00-13:00 (Fib X2), Fri 11:00-13:00 (Fib X2)
  • Office hours: Fri 13:00-16:00 or by appointment.

Overview

This advanced course focuses on developing algorithm design skills. It exposes students to complex problems that cannot be handled directly by standard libraries (bearing in mind that many basic algorithms and data structures are already provided by the libraries of modern programming languages) and that therefore require a significant problem-solving effort. As a starting point, these problems involve all the basic data types, such as integers, strings, (geometric) points, trees and graphs. The syllabus is structured to highlight the application scenarios in which the corresponding algorithms can be successfully employed, with references to software applications and libraries. The level of detail for each topic may change from year to year, and will be decided according to requests from other courses in the curriculum and/or specific issues arising in possibly novel application scenarios.

Exams

Written exam:

  1. choose one of the topics discussed in class
  2. write a very short to-do list and ask the instructor for approval
  3. if the instructor suggests some changes, revise the to-do list according to the instructor's comments and repeat step 2
  4. if the chosen topic and the to-do list are approved, expand the to-do list into a more detailed one and repeat step 3
  5. write a report in English and submit it to the instructor (remember to add at least 20% new content compared with what was presented in class)
  6. meet the instructor to read the report together and get comments on it
  7. make the necessary changes

Suggested reading: some useful tips for scientific writing in English (first two sections) by J.S. Vitter.

Example of interaction: student and instructor discussing the report's content and structure.

Oral exam: on the topics discussed in class; please read the references listed in the notes.

Topics

Caveat: several topics are the outcome of recent advances in the field, so the course material mostly consists of research papers or book chapters.

Date | Topics | References and notes
28.02.2019 | Introduction to the course and a glimpse of the topics | none

Randomization, hashing and data streaming

Randomization is a powerful tool for solving large-scale problems. After introducing randomized algorithms and hashing, we consider some applications, such as data streaming algorithms, a field that has emerged in the last decade. In this setting, data flow as a stream and are processed by one-pass algorithms using limited memory. We focus on the count-min sketch paradigm and its applications. [Note: to refresh the basic notions of counting and probability, please refer to Appendix C of Cormen-Leiserson-Rivest-Stein's book “Introduction to Algorithms”, 3rd ed., MIT Press.]
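As a rough illustration of the count-min sketch paradigm, here is a minimal Python sketch. It assumes non-negative integer keys and non-negative updates, and uses hash functions of the form ((a·x + b) mod p) mod w with p = 2^61 - 1; the class and method names (`CountMinSketch`, `from_error`, `update`, `query`) are hypothetical and not taken from the course code.

```python
import math
import random


class CountMinSketch:
    """Count-Min sketch: `depth` rows of `width` counters, one hash per row.

    Keys are assumed to be non-negative integers smaller than PRIME; each row
    uses a hash of the form ((a*x + b) mod PRIME) mod width.
    """

    PRIME = (1 << 61) - 1  # Mersenne prime 2^61 - 1

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]
        self.params = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(depth)]

    @classmethod
    def from_error(cls, eps, delta, seed=0):
        # w = ceil(e / eps), d = ceil(ln(1 / delta)): a point estimate exceeds the
        # true count by more than eps * N with probability at most delta.
        return cls(math.ceil(math.e / eps), math.ceil(math.log(1 / delta)), seed)

    def _bucket(self, row, key):
        a, b = self.params[row]
        return ((a * key + b) % self.PRIME) % self.width

    def update(self, key, count=1):
        # Add `count` to one counter per row (only non-negative counts assumed).
        for r in range(self.depth):
            self.table[r][self._bucket(r, key)] += count

    def query(self, key):
        # Collisions can only add to a counter, so every row overestimates:
        # the minimum over the rows is the best available estimate.
        return min(self.table[r][self._bucket(r, key)] for r in range(self.depth))
```

For example, `CountMinSketch.from_error(eps=0.01, delta=0.01)` allocates 5 rows of 272 counters regardless of how many distinct keys the stream contains, and `query` never returns less than the true frequency of a key.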

Date | Topics | References and notes
05.03.2019 | Playing with probability. Indicator random variables: the secretary problem and randomly permuting arrays (suggested reading: the birthday paradox). Randomized quicksort. | [CLRS 5.1-5.3 (optional 5.4.1), par. 7.3]; code
06.03.2019 | Virus scanning and stream analysis with Karp-Rabin fingerprints: randomized checking and pattern matching. Monte Carlo and Las Vegas algorithms. | [RM par. 7.4-7.6]; code
08.03.2019 | Dictionary of keywords. Quick review of classical hashing. Universal hashing. Markov's inequality. Perfect hashing. | [CLRS 11.2, 11.3.3, 11.5]; code
12.03.2019 | Data streaming algorithms: motivations and examples. | Count-Min Sketches, sects. 1-3, 4.1; Site; Notes; code
13.03.2019 | Case study on hashing: rsync and file synchronization using hash functions. | slides
15.03.2019 | Queries with Count-Min Sketches: implementation and analysis. | sects. 3-4; Notes
19.03.2019 | Case study on hashing: document tagging and perfect hashing. | paper; code
20.03.2019 | Document resemblance with MinHash, k-sketches and the Jaccard similarity index. Azuma-Hoeffding bound. | paper; paper; Azuma-Hoeffding; code
22.03.2019 | Proxy caches and dictionaries with errors: Bloom filters. | Survey: except par. 2.5-2.6 (optional: par. 2.2)
26.03.2019 | Bloom filters (continued). Cuckoo hashing. | Notes; Notes; code
27.03.2019 | Case study on data streams (I): probabilistic counting. | slides; code
29.03.2019 | Cuckoo hashing (continued). Distributed servers and load balancing through hashing. | blog; Sect. 7 and 8.1
09.04.2019 | Networked data and the randomized min-cut algorithm for graphs. | par. 1.1
10.04.2019 | Case study on data streams (II): set membership and heavy hitters. | slides; code
12.04.2019
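For the 05.03.2019 lecture, the following minimal Python sketch follows the CLRS pseudocode for randomly permuting an array in place and for randomized quicksort with a Lomuto partition around a uniformly random pivot. The function names are hypothetical; `rng` can be the `random` module itself or a seeded `random.Random` instance.

```python
import random


def randomize_in_place(a, rng=random):
    """Uniform random permutation of list `a`, in place (CLRS RANDOMIZE-IN-PLACE)."""
    n = len(a)
    for i in range(n):
        j = rng.randint(i, n - 1)  # uniform in [i, n-1], both bounds inclusive
        a[i], a[j] = a[j], a[i]
    return a


def randomized_partition(a, lo, hi, rng):
    """Lomuto partition of a[lo..hi] around a uniformly random pivot."""
    k = rng.randint(lo, hi)
    a[k], a[hi] = a[hi], a[k]  # move the random pivot to the end
    pivot, i = a[hi], lo - 1
    for j in range(lo, hi):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[hi] = a[hi], a[i + 1]
    return i + 1


def randomized_quicksort(a, lo=0, hi=None, rng=random):
    """Sort `a` in place; O(n log n) expected comparisons on distinct keys, for any input order."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        q = randomized_partition(a, lo, hi, rng)
        randomized_quicksort(a, lo, q - 1, rng)
        randomized_quicksort(a, q + 1, hi, rng)
    return a
```

The expectation is taken over the pivot choices only, so no input ordering is bad on average; with many repeated keys, however, the Lomuto partition degrades towards quadratic time.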
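The pattern-matching part of the 06.03.2019 lecture can be sketched with a rolling polynomial fingerprint. As an assumption made for simplicity, this sketch fixes the prime p = 2^61 - 1 and draws a random base instead of a random prime; the function name `karp_rabin` is hypothetical. Verifying every candidate makes it a Las Vegas algorithm; skipping the check yields a Monte Carlo variant.

```python
import random

PRIME = (1 << 61) - 1  # fixed prime modulus (assumption of this sketch)


def karp_rabin(text, pattern, seed=None):
    """Return the sorted list of positions where `pattern` occurs in `text`.

    Fingerprint of a length-m window s: sum of ord(s[j]) * base^(m-1-j) mod PRIME,
    with `base` drawn at random. Candidate positions are verified character by
    character, so the answer is always correct (Las Vegas).
    """
    n, m = len(text), len(pattern)
    if m == 0:
        return list(range(n + 1))
    if m > n:
        return []
    base = random.Random(seed).randrange(256, PRIME - 1)
    high = pow(base, m - 1, PRIME)  # weight of the character leaving the window
    fp_pat = fp_win = 0
    for j in range(m):
        fp_pat = (fp_pat * base + ord(pattern[j])) % PRIME
        fp_win = (fp_win * base + ord(text[j])) % PRIME
    hits = []
    for i in range(n - m + 1):
        if fp_win == fp_pat and text[i:i + m] == pattern:
            hits.append(i)
        if i + m < n:
            # Roll the window one step in O(1): remove text[i], append text[i + m].
            fp_win = ((fp_win - ord(text[i]) * high) * base + ord(text[i + m])) % PRIME
    return hits
```

For instance, `karp_rabin("abracadabra", "abra")` returns `[0, 7]`. Two different windows collide only if the base is a root of a nonzero polynomial of degree below m, which happens with probability at most about m/p over the random base.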
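For the 08.03.2019 lecture, here is a minimal sketch of the universal family h(k) = ((a·k + b) mod p) mod m from CLRS 11.3.3, used to build a static two-level perfect hash table in the spirit of CLRS 11.5. It assumes integer keys in [0, p) with p = 2^61 - 1; `random_universal_hash` and `PerfectHashTable` are hypothetical names. Markov's inequality appears in the first-level retry condition.

```python
import random

P = (1 << 61) - 1  # prime larger than every key; keys are ints in [0, P)


def random_universal_hash(m, rng):
    """Draw h(k) = ((a*k + b) mod P) mod m from the universal family of CLRS 11.3.3."""
    a, b = rng.randrange(1, P), rng.randrange(P)
    return lambda k: ((a * k + b) % P) % m


class PerfectHashTable:
    """Static two-level perfect hashing (the key set is fixed at build time)."""

    def __init__(self, keys, seed=0):
        rng = random.Random(seed)
        keys = list(set(keys))
        n = max(1, len(keys))
        # Level 1: n buckets. E[sum n_j^2] < 2n, so by Markov's inequality
        # sum n_j^2 >= 4n with probability < 1/2: retry until it is small.
        while True:
            self.h = random_universal_hash(n, rng)
            buckets = [[] for _ in range(n)]
            for k in keys:
                buckets[self.h(k)].append(k)
            if sum(len(b) ** 2 for b in buckets) < 4 * n:
                break
        # Level 2: a table of size n_j^2 per bucket; a random universal hash is
        # collision-free there with probability > 1/2, so retry until it is.
        self.tables = []
        for bucket in buckets:
            m = len(bucket) ** 2
            while True:
                g = random_universal_hash(m, rng) if m else None
                slots = [None] * m
                for k in bucket:
                    if slots[g(k)] is not None:
                        break  # collision: draw a new g
                    slots[g(k)] = k
                else:
                    break  # every key of the bucket found a free slot
            self.tables.append((g, slots))

    def __contains__(self, k):
        # Worst-case O(1): one first-level and one second-level probe.
        g, slots = self.tables[self.h(k)]
        return bool(slots) and slots[g(k)] == k
```

A dictionary of keywords mapped to integer codes, for example, could be built once with `PerfectHashTable(codes)` and then queried with `code in table`.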
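The rsync case study (13.03.2019) relies on a weak checksum that can be rolled by one byte in constant time, confirmed by a strong hash. The following minimal sketch is modelled on rsync's weak checksum, with a = Σ X_i mod 2^16 and b = Σ (l - i + 1) X_i mod 2^16 kept as a pair rather than packed into one 32-bit word. MD5 stands in for the strong checksum, and `weak_checksum`, `roll` and `find_matches` are hypothetical names.

```python
import hashlib

M = 1 << 16  # modulus of the two 16-bit halves of the weak checksum


def weak_checksum(block):
    """Return (a, b) for a byte string: a = sum of bytes, b = position-weighted sum."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, blen):
    """Slide a window of length blen by one byte in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - blen * out_byte + a) % M
    return a, b


def find_matches(old, new, blen):
    """Return (offset in new, block index in old) for each full block of `old` found in `new`."""
    table = {}
    for idx in range(len(old) // blen):  # a trailing partial block is ignored
        blk = old[idx * blen:(idx + 1) * blen]
        table.setdefault(weak_checksum(blk), []).append((idx, hashlib.md5(blk).digest()))
    matches = []
    if len(new) < blen:
        return matches
    i = 0
    a, b = weak_checksum(new[:blen])
    while True:
        # A weak hit is only a candidate: confirm it with the strong hash.
        for idx, strong in table.get((a, b), []):
            if hashlib.md5(new[i:i + blen]).digest() == strong:
                matches.append((i, idx))
                break
        if i + blen >= len(new):
            break
        a, b = roll(a, b, new[i], new[i + blen], blen)
        i += 1
    return matches
```

With `old = b"abcdefghijkl"`, `new = b"XXabcdYefghijkl"` and blocks of 4 bytes, the three old blocks are found at offsets 2, 7 and 11 of the new file, so only the bytes `XX` and `Y` would need to be transmitted. Unlike rsync, this sketch also reports overlapping matches instead of jumping past a matched block.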
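For the document-resemblance lecture (20.03.2019), here is a minimal MinHash sketch: documents become sets of character 3-grams, each of k random linear hash functions keeps its minimum over the set, and the fraction of agreeing minima estimates the Jaccard index. The use of CRC-32 to map shingles to integers, the linear hashes (only approximately min-wise independent) and all function names are assumptions of this sketch.

```python
import random
import zlib

P = (1 << 61) - 1  # prime modulus for the linear hash functions


def shingles(text, q=3):
    """Set of character q-grams of `text` (q = 3 is an arbitrary choice)."""
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def make_hashes(k, seed=0):
    """k random pairs (a, b) defining h(x) = (a*x + b) mod P."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(P)) for _ in range(k)]


def minhash_signature(items, hashes):
    """sig[i] = minimum of h_i over a non-empty set of strings."""
    keys = [zlib.crc32(x.encode()) for x in items]  # strings -> 32-bit ints
    return [min((a * key + b) % P for key in keys) for a, b in hashes]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of coordinates where the two minima agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard index len(A & B) / len(A | B), for comparison."""
    return len(a & b) / len(a | b)
```

Each coordinate agrees with probability close to J(A, B), so if the k hash functions are treated as independent, Hoeffding's bound gives P(|estimate - J| ≥ ε) ≤ 2·exp(-2kε²). Signatures of fixed length k can be compared without looking at the documents again.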
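The Bloom filter lectures (22.03.2019 and 26.03.2019) can be illustrated with this minimal sketch of an m-bit filter with k hash positions. As an assumption, the k positions are derived from a single SHA-256 digest by double hashing, g_i(x) = h1(x) + i·h2(x) mod m; the class name and the helper `false_positive_rate` are hypothetical.

```python
import hashlib
import math


class BloomFilter:
    """Set membership with no false negatives and a tunable false-positive rate."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)  # m bits, all initially 0

    def _positions(self, item):
        d = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        # True if all k bits are set: may be a false positive, never a false negative.
        return all((self.bits[p >> 3] >> (p & 7)) & 1 for p in self._positions(item))


def false_positive_rate(m, n, k):
    """Standard approximation (1 - e^(-kn/m))^k after n insertions."""
    return (1 - math.exp(-k * n / m)) ** k
```

With m = 1000 bits, n = 100 stored URLs and k = 7 (close to the optimum (m/n)·ln 2 ≈ 6.9), the approximation gives a false-positive rate of about 0.8%, using 10 bits per stored element regardless of the URL length.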
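For the cuckoo hashing lectures (26.03.2019 and 29.03.2019), here is a minimal sketch with two tables and one hash function per table, so a lookup probes exactly two slots. Keys are assumed to be non-negative integers below p = 2^61 - 1. As a simplification of the usual scheme, every failed insertion (more than `max_loop` rounds of evictions) rebuilds with fresh hash functions and doubled tables; the class and parameter names are hypothetical.

```python
import random

P = (1 << 61) - 1  # prime modulus for the linear hash functions


class CuckooHashTable:
    """Each key lives in T0[h0(key)] or T1[h1(key)]; keys are ints in [0, P)."""

    def __init__(self, size=8, max_loop=32, seed=0):
        self.rng = random.Random(seed)
        self.size, self.max_loop = size, max_loop
        self._new_tables()

    def _new_tables(self):
        self.t = [[None] * self.size, [None] * self.size]
        self.h = [(self.rng.randrange(1, P), self.rng.randrange(P)) for _ in range(2)]

    def _slot(self, i, key):
        a, b = self.h[i]
        return ((a * key + b) % P) % self.size

    def __contains__(self, key):
        # Worst-case O(1) lookup: only two possible positions.
        return any(self.t[i][self._slot(i, key)] == key for i in range(2))

    def insert(self, key):
        if key in self:
            return
        for _ in range(self.max_loop):
            for i in range(2):
                s = self._slot(i, key)
                # Place key in T_i, evicting the previous occupant (possibly None).
                key, self.t[i][s] = self.t[i][s], key
                if key is None:
                    return
        # Probably stuck in a cycle of evictions: rebuild everything.
        self._rehash(homeless=key)

    def _rehash(self, homeless):
        items = [k for table in self.t for k in table if k is not None]
        items.append(homeless)
        self.size *= 2  # simplification: grow on every rehash to guarantee progress
        self._new_tables()
        for k in items:
            self.insert(k)
```

Each evicted key moves to its alternative slot in the other table, so an insertion follows a path of evictions; a long path usually signals a cycle, which is why a bounded number of rounds followed by a rehash is used.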
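One classic form of probabilistic counting, relevant to the 27.03.2019 case study, is Morris's approximate counter, which stores only X ≈ log2 n. This sketch assumes that variant (the lecture slides may cover other schemes as well); the class name is hypothetical.

```python
import random


class MorrisCounter:
    """Approximate counter that stores only X, roughly log2 of the count."""

    def __init__(self, rng=None):
        self.x = 0
        self.rng = rng if rng is not None else random.Random()

    def increment(self):
        # Increase X with probability 2^-X.
        if self.rng.random() < 2.0 ** -self.x:
            self.x += 1

    def estimate(self):
        # After n increments E[2^X] = n + 1, so 2^X - 1 is an unbiased estimate.
        return 2 ** self.x - 1
```

A single counter has variance n(n-1)/2, so in practice one averages the estimates of several independent counters (or uses a base smaller than 2) to trade a few more bits for accuracy.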
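The randomized min-cut algorithm of the 09.04.2019 lecture can be sketched with union-find. Scanning the edges in uniformly random order and contracting each edge whose endpoints are still in different super-vertices is equivalent to repeatedly contracting a uniformly random non-loop edge. The function names are hypothetical and the multigraph is assumed to have vertices 0..n-1 with n ≥ 2.

```python
import math
import random


def contraction_cut(n, edges, rng):
    """One run of random contraction down to two super-vertices; returns the cut size found."""
    parent = list(range(n))

    def find(v):  # union-find root with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    order = list(edges)
    rng.shuffle(order)
    remaining = n
    for u, v in order:
        if remaining == 2:
            break
        ru, rv = find(u), find(v)
        if ru != rv:  # contract; edges inside a super-vertex are self-loops and skipped
            parent[ru] = rv
            remaining -= 1
    return sum(find(u) != find(v) for u, v in edges)


def karger_min_cut(n, edges, seed=0):
    """Best of about C(n,2) * ln n runs; wrong with probability at most about 1/n."""
    rng = random.Random(seed)
    trials = math.ceil(n * (n - 1) / 2 * math.log(n))
    return min(contraction_cut(n, edges, rng) for _ in range(trials))
```

A single run returns a minimum cut with probability at least 1/C(n,2), so repeating it C(n,2)·ln n times drives the failure probability down to at most (1 - 1/C(n,2))^(C(n,2) ln n) ≤ 1/n.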
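For the heavy hitters part of the 10.04.2019 case study, one simple deterministic one-pass approach is the Misra-Gries summary, shown here as a minimal sketch (the lecture may use other techniques, such as count-min sketches). With at most k - 1 counters, every item occurring more than n/k times in a stream of length n is guaranteed to survive; the function name is hypothetical.

```python
def misra_gries(stream, k):
    """Misra-Gries summary with at most k - 1 counters.

    Every item with frequency > n/k is in the result; each stored count
    underestimates the true frequency by at most n/k.
    """
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No free counter: decrement all, dropping those that reach zero.
            for y in list(counters):
                counters[y] -= 1
                if counters[y] == 0:
                    del counters[y]
    return counters
```

The result may also contain items that are not heavy, so a second pass (or an extra check) is needed if exact frequencies are required.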
Activity in class
Official documents for the course
magistraleinformatica/ad/ad_18/start.txt · Last modified: 10/04/2019 at 12:21 by Roberto Grossi