Lecturer: Nadia Pisanti
[May 10th] The seminar's schedule is published in this site.
[May 3rd] The lecture of Monday May 7th will not take place due to the Director's decision to grant students the time to for an assembly.
[April 17th] The lecture of Monday April 30th will not take place.
[March 13th] The assignment of papers for the final exam will take place during the lectures of Monday April 16th and Tuesday April 17th.
[March 8th] On Tuesday March 20th there will be no class due to other professional engagement of the teacher.
[February 27th] On Monday March 5th there will be no classes due to the University's decision to stop teaching activities to support participation to political elections.
[February 19th] The lectures schedule is: Monday from 14 to 15:30 in classroom L1, and Tuesday from 9:15 to 10:45 in classrom L1.
[February 18th] This page has been created!
This course has the goal to give the student an overview of algorithmic methods that have been conceived for the analysis of genomic sequences. We will focus both on theoretical and combinatorial aspects as well as on practical issues such as whole genomes sequencing, sequences alignments, the inference of repeated patterns and of long approximated repetitions, the computation of genomic distances, and several biologically relevant problems for the management and investigation of genomic data. The exam (see below for its description) has the goal to evaluate the students understanding of the problems and the methods described in the course. Moreover, the exam is additionally meant as a chance to learn how a scientific paper is like, and how to make an oral presentation on scientific/technical topics, that is designed for a specific audience.
A brief introduction to molecular biology: DNA, proteins, the cell, the synthesis of a protein.
Sequences Alignments: Dynamic Programming methods for local, global, and semi-local alignments. Computing the Longest Common Subsequences. Multiple Alignments.
Pattern Matching: Exact Pattern Matching: algorithms (Knuth-)Morris-Pratt, Boyer-Moore, Karp-Rabin with preprocessing of the pattern. Algorithm with preprocessing of the text: use of indexes. Motifs Extraction: KMR Algorithm for the extracion of exact motifs and its modifications for the inference of approximate motifs.
Finding Repetitions: Algorithms for the inference of long approximate repetitions. Filters for preprocessing.
Fragment Assembly: Genomes sequencing: some history, scientific opportunities, and practical problems. Some possible approaches for the problem of assembling sequenced fragments. Link with the “Shortest common superstring” problem, the Greedy solution. Data structures for representing and searching sequencing data.
New Generation Sequencing: Applications of High Throughput Sequencing and its algorithmic problems and challenges. Investigating data types resulting from the existing biotechnologies, and the possible data structures and algorithms for their storage and analysis.
A basic course on algorithmic.
SUFFIX TREE tre.pdf
PATTERN MATCHING patternmatching1.pdf e patternmatching2.pdf
FRAGMENT ASSEMBLY fragmentassembly.pdf
SEQUENCES ALIGNMENTS allineamenti.pdf
FINDING REPETITIONS: FILTERING amb.pdf
NEW GENERATION SEQUENCING illumina-assembly.pdf
OVERVIEW OF SEQUENCING TECNOLOGIES en104-pisanti.pdf
BUBBLES IN DE BRUIJN GRAPHS (slides) seminar-bubbles.pdf
Each student is assigned a paper that is a very recent scientific work on topics related to those of the course (tipically it is a paper accepted for publication in the proceedings of an international conference that is going to be held in a few weeks/months). The paper is part of a pool of possible papers selected by the lecturer. The paper assignment follows a brief description of all papers in the pool made by the lecturer, and a bidding phase of the students over such papers. Once the student has his/her paper assigned, the task is to prepare and make a presentation of that work that: (1) describes the results presented in that paper, (2) is suited for the actual audience (that will be the course class) as for comprehension opportunity, (3) sticks to the allowed time slot.
The list of papers suggested for this year are here: paperi-bi-2018.zip.
Students presentations usually take place all together somewhen at the end of the course. Exceptions are possible upon request for specific needs. Once the course is over, students can undergo the examination anytime during the academic year by agreeing an appointment: please, send an email to the teacher.
- Monday May 21st, from 14 to 15:30, room L1:
# MICHAL PUNCOCHAR - Fixed-Parameter Tractable Sampling for RNA Design with Multiple Target Structures
# IVAN GRUJIC - Designing RNA Secondary Structures is Hard
# FEDERICO LAZZARI - Reverse de Bruijn: Utilizing Reverse Peptide Synthesis to Cover All Amino Acid k-mers
# NICOLA FERRONE - Modeling Biological Problems in Computer Science: A Case Study in Genome Assembly
- Tuesday May 22nd, from 9:15 to 11, room L1:
# RUGGIERO DI BENEDETTO - Longest substring palindrome after edit
# GIULIA PUNZI - Nearest constrained circular words
# SILVIA STRANIERI - Order-Preserving Pattern Matching Indeterminate Strings
# FRANCESCO LANDOLFI - Algorithmic Framework for Approximate Matching Under Bounded Edits with Applications to Sequence Analysis
- Wednesday May 23rd, from 16:00 to 17:45, Sala Seminari Ovest:
# AYTAN YAGUBLU - AllSome Sequence Bloom Trees
# FRANCESCA NARETTO - Mantis: A Fast, Small, and Exact Large-Scale Sequence-Search Index
# MADDALENA AMENDOLA - DISCOSNP++: de novo detection of small variant from raw unassembled read set(s)
# MICHELE MICCINESI - Optimal Omnitig Listing for Safe and Complete Contig Assembly
- Thursday May 24th, from 16:00 to 17:45, Sala Seminari Ovest:
# DAVIDE RUCCI - Dynamic Programming on DAGs: Co-Linear Chaining Extended
# GIANMARCO RICCIARELLI - Circular Networks from Distorted Metrics
# ANTONIO SISBARRA - Algorithms for Computing the Family-Free Genomic Similarity under DCJ
# LEONARDO CARIAGGI - GTED: Graph Traversal Edit Distance
- Monday May 28st, from 14 to 15:30, room L1:
# GEMMA MARTINI - Assembly of Long Error-Prone Reads and Repeat Classification
# VLAD PANDELEA - Full-length de novo viral quasispecies assembly through variation graph construction
# CARMINE CASERIO - De novo assembly of viral quasispecies using overlap graphs
# FRANCESCO CARIAGGI - A graph-based approach to diploid genome assembly
- Tuesday May 29th, from 9:15 to 11, room L1:
# GASPARE FERRARO - TwoPaCo: an efficient algorithm to build the compacted de Bruijn graph from many complete genomes
# SIMONE SCHIRINZI - Faster Online Elastic Degenerate String Matching
# JOHANNES BUSCH - Efficient Algorithms to Discover Alterations with Complementary Functional Association in Cancer
# GIULIANO CORNACCHIA - Long Reads Enable Accurate Estimates Of Complexity Of Metagenomes
- Dates to be defined during or after summer exams session:
# JAVAD KHALILI - Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing
# LAPO TOLONI - Analysis of genomic rearrangements by using the Burrows-Wheeler transform of short-read data
# SARA VITELLARO - A safe and complete algorithm for metagenomic assembly