Lecturer: Nadia Pisanti
[May 7th] Seminar schedules have been published.
[April 30th] Please be aware that there will be no class on Monday, May 27th: the University has cancelled all lectures on that day to allow students to go home and vote in the EU elections.
[April 15th] As announced in class, on Monday, April 29th we will have 4 academic hours of lectures starting at 14:00, to make up for the classes that were cancelled last week.
[April 1st *not a joke*] A reminder that there will be no classes next week, as communicated at the beginning of the course. See you on April 15th.
[March 26th] Papers have been assigned! The student Antonio Nuzzo is asked to send an email to Prof. Pisanti (his email bounced).
[March 12th] The assignment of papers for the seminar will take place on Monday March 25th.
[March 5th] From now on, unless otherwise announced, lectures will be on Mondays from 14:00 to 15:30 and on Tuesdays from 9:15 to 10:45, always in room L1.
[February 27th] Next week we will have classes on Monday both at 14:00 and at 16:00, and both in room L1.
[February 12th] The classes of April 8th and April 9th will not take place: we will have to find two slots to make them up.
[February 12th] This page has been created!
This course aims to give students an overview of the algorithmic methods that have been conceived for the analysis of genomic sequences. We will focus on theoretical and combinatorial aspects as well as on practical issues such as whole-genome sequencing, sequence alignment, the inference of repeated patterns and of long approximate repetitions, the computation of genomic distances, and several biologically relevant problems in the management and investigation of genomic data. The exam (see below for its description) aims to evaluate the students' understanding of the problems and methods described in the course. The exam is also meant as a chance to learn what a scientific paper is like, and how to give an oral presentation on a scientific/technical topic that is designed for a specific audience.
A brief introduction to molecular biology: DNA, proteins, the cell, the synthesis of a protein.
Sequence Alignment: Dynamic Programming methods for local, global, and semi-local alignments. Computing the Longest Common Subsequence. Multiple Alignments.
Pattern Matching: Exact Pattern Matching: the (Knuth-)Morris-Pratt, Boyer-Moore, and Karp-Rabin algorithms, with preprocessing of the pattern. Algorithms with preprocessing of the text: use of indexes. Motif Extraction: the KMR algorithm for the extraction of exact motifs and its modifications for the inference of approximate motifs.
Finding Repetitions: Algorithms for the inference of long approximate repetitions. Filters for preprocessing.
Fragment Assembly: Genome sequencing: some history, scientific opportunities, and practical problems. Some possible approaches to the problem of assembling sequenced fragments. Link with the “Shortest common superstring” problem, and the Greedy solution. Data structures for representing and searching sequencing data.
New Generation Sequencing: Applications of High Throughput Sequencing and its algorithmic problems and challenges. Investigating data types resulting from the existing biotechnologies, and the possible data structures and algorithms for their storage and analysis.
A basic course on algorithms.
SUFFIX TREE tre.pdf
PATTERN MATCHING patternmatching1.pdf and patternmatching2.pdf
FRAGMENT ASSEMBLY fragmentassembly.pdf
SEQUENCE ALIGNMENTS allineamenti.pdf
FINDING REPETITIONS: FILTERING amb.pdf
NEW GENERATION SEQUENCING illumina-assembly.pdf
OVERVIEW OF SEQUENCING TECHNOLOGIES en104-pisanti.pdf
BUBBLES IN DE BRUIJN GRAPHS (slides) seminar-bubbles.pdf
Each student is assigned a paper: a very recent scientific work on topics related to those of the course (typically a paper accepted for publication in the proceedings of an international conference that is going to be held in a few weeks or months). The paper comes from a pool of papers selected by the lecturer. The assignment follows a brief description of all the papers in the pool, given by the lecturer, and a bidding phase in which the students bid on those papers. Once a paper has been assigned, the student's task is to prepare and give a presentation of that work that: (1) describes the results presented in the paper, (2) is pitched so that the actual audience (the course class) can follow it, and (3) sticks to the allowed time slot.
Student presentations usually take place together at the end of the course. Exceptions are possible upon request for specific needs. Once the course is over, students can take the exam at any time during the academic year by arranging an appointment: please send an email to the lecturer.
MONDAY May 20th
Marco Cardia: MALVA: genotyping by Mapping-free ALlele detection of known VAriants.
Luca Corbucci: Haplotype-aware graph indexes.
Nicolas Manini: Efficient computation of Sequence Mappability.
André Santos: GenMap: Fast and Exact Computation of Genome Mappability.
TUESDAY May 21st
Shadi Shajari: Simulating the DNA String Graph in Succinct Space.
Antonio Nuzzo: Indexing de Bruijn Graphs with minimizers.
Riccardo Manetti: An Efficient, Scalable and Exact representation of High-Dimensional Color Information via de Bruijn Graph Search.
Pouria Faraji: Parallel decompression of gzip-compressed files and random access to DNA sequences.
TUESDAY May 28th
Tabriz Hajiyev: Faster queries for longest substring palindrome after block edit.
Newsha Ozgoli: Safe and Complete algorithms for dynamic programming problems, with an application to RNA folding.
Federico Finocchio: CONSENT - Scalable Self-Correction of long reads with multiple sequence alignment.
Paolo Contenti: Efficient Construction of a Complete Index for Pan-Genomics Read Alignments.