Reading Group: String Algorithms
This seminar will discuss recent algorithms for text processing and sequence similarity, as published in conferences like STOC, FOCS or SODA. We study sequence similarity measures such as edit distance (ED) and Hamming distance (HD), which are ubiquitous in text processing and bioinformatics. Starting from the basic dynamic programming algorithms that run in quadratic time, we study approximation algorithms that run in near-linear time or even sublinear time. We also cover approximate pattern matching as well as embedding, sketching, and possibly streaming algorithms.
The organizers bring expertise on different aspects of these algorithms. One goal of this course is that the organizers teach each other what they know about string algorithms. The seminar will therefore feature about 10 talks by the organizers, presenting their work on string algorithms, in a format similar to the Algorithms and Data Structures core lecture. Additional to attending these lectures, your task is to read and present one paper on string algorithms.
We also encourage students to take the opportunity and actively work on current research questions with us.
Please register on this website to get access to the full course material, including the link to follow the course virtually.
If you want to take this seminar for credit, you must additionally register in the seminar system until April 12: https://seminars.cs.uni-saarland.de/seminars22
The core lecture Algorithms and Data Structures is a mandatory requirement. In exceptional cases we can allow students who do not fulfill this requirement but have attended other relevant courses (e.g. basic lecture on Algorithms and Data Structures, Algorithms for Sequence Analysis).
We will meet once a week on Wednesdays 16:15 to 17:45, starting on April 13. The seminar takes place in building E2 4, lecture hall IV, but it is also possible to join virtually via Microsoft Teams (see the Virtual Guide). If you are a student taking credit points for the seminar, you will be assigned a paper from the list below and will give a 60-to-90-minute presentation about it. One of the organizers will be your advisor and you can discuss the paper with them before your presentation. You are expected to give a practice talk to your advisor at least one week before the scheduled talk. Successful participation is rewarded with 7 credit points, and the final grade is determined by the quality of the presentation.
This is the tentative schedule of the course.
|13.04.2022||Basics I: dynamic programming, kangaroo jumping etc.||Karl Bringmann|
|20.04.2022||Basics II: approximating HD, LCE queries, fine-grained lower bounds for exact ED algorithms||Karl Bringmann|
|27.04.2022||Approximating ED: k-approximation in sublinear time [GKKS] + tree distance framework [AKO]||Alejandro Cassis|
|04.05.2022||Approximating ED: ko(1)-approximation in near-linear time and how to go sublinear [AKO, BCFN]||Nick Fischer|
|11.05.2022||Approximating ED: constant-factor-approximation in subquadratic time [CDGK+, AN]||Nick Fischer|
|18.05.2022||Approximating ED: a simple sublinear algorithm (student topic 1)|
|25.05.2022||Embedding ED into HD: distortion k [CGK], sketching ED [BZ,KPS]||Tomasz Kociumaka|
|01.06.2022||Embedding ED into HD: distortion no(1) [OR] and algorithmic uses [AO]||Alejandro Cassis|
|08.06.2022||Pattern Matching with HD errors: Exact algorithms (student topic 2)|
|15.06.2022||Pattern Matching with HD errors: Lower bounds (student topic 3)|
|29.06.2022||Pattern Matching with ED errors: Exact algorithms [CKWa,CKWb]||Philip Wellnitz|
|13.07.2022||Pattern Matching with HD errors: Approximation algorithms (student topic 5)|
|20.07.2022||Pattern Matching with HD errors: Streaming approximation algorithms + sublinear algorithms [CGKK+]||Tomasz Kociumaka|
|27.07.2022||Pattern Matching with ED errors: Streaming algorithms [KPS]||Tomasz Kociumaka|
The following five topics will be distributed among the students participating in the seminar. As a student, you will be asked to select three of these topics (with priorities) and we will find a distribution making everyone as happy as possible.
You may also decide to present one of the other papers listed in the references, but be warned that they differ in understandability/necessary prerequisites from the student topics.
|1||A simple sublinear algorithm for gap edit distance [BCR]: The authors Brakensiek, Charikar and Rubinstein present one of the arguably easiest algorithms achieving a non-trivial edit distance approximation in sublinear time. The techniques involve rolling hashes and the "ruler trick" which is commonly used for sliding-window Hamming distance problems.||Alejandro Cassis|
|2||Pattern Matching with HD errors: Exact algorithms [A, GU]: In this session we want to understand the best-known algorithms for the pattern matching problem with k mismatches. The presentation starts with the classical algorithm by Abrahamson [A] and shows how to obtain the generalized algorithm by Gawrychowski and Uznański [GU].||This presentation is only concerned about the algorithmic part in [GU] (not the lower bounds).||Tomasz Kociumaka|
|3||Pattern Matching with HD errors: Lower bounds [GU, GLU]: In this session the goal is to prove that the algorithms from the previous topic are best-possible [GU], conditioned on some popular hypothesis in the area of fine-grained complexity theory. Moreover, the presentation includes a selection of the fine-grained equivalences presented in [GLU].||
We especially recommend this paper to students of the lecture "Fine-Grained Complexity Theory". This presentation is only concerned about the lower bound in [GU] (not the algorithmic part).
|4||Pattern Matching (with HD errors) in the Streaming Model [PP]: This paper by Porat and Porat studies the complexity of pattern matching and pattern matching with mismatches in a streaming setting (i.e., in a setting where the input strings can only be read once from left to right and we want to minimize the algorithm's space usage). The techniques involve rolling hashes and a periodicity argument.||Karl Bringmann|
|5||Pattern Matching with HD errors: Approximation algorithms [CGKK+]: This recent paper by Chan, Golan, Kociumaka, Kopelowitz and Porat approximates the pattern matching problem with mismatches in a purely combinatorial way. For context: Previous papers (like [KP]) relied on the Fast Fourier Transform which inherently requires time O(n log n), whereas this new technique gives algorithm with close-to-linear and even linear running times.||This paper [CGKK+] includes lots of results; for this talk only the content in Sections 1-4 is considered relevant.||Nick Fischer|
In the following we give a tentative list of the papers which we will (partially) present in this seminar.