Reading Group: String Algorithms Karl Bringmann, Tomasz Kociumaka

Registration for this course is open until Friday, 22.07.2022 00:00.

News

22.04.2022

Start at 16:15

Since the room is in use earlier, we will start at 16:15 from now on.

13.04.2022

Virtual Participation

After registering on this webpage, you can log in and access Information->Virtual Guide, where you can find the link to follow the course virtually.

 

Reading Group: String Algorithms

This seminar will discuss recent algorithms for text processing and sequence similarity, as published in conferences like STOC, FOCS or SODA. We study sequence similarity measures such as edit distance (ED) and Hamming distance (HD), which are ubiquitous in text processing and bioinformatics. Starting from the basic dynamic programming algorithms that run in quadratic time, we study approximation algorithms that run in near-linear time or even sublinear time. We also cover approximate pattern matching as well as embedding, sketching, and possibly streaming algorithms.

The organizers bring expertise on different aspects of these algorithms. One goal of this course is that the organizers teach each other what they know about string algorithms. The seminar will therefore feature about 10 talks by the organizers, presenting their work on string algorithms, in a format similar to the Algorithms and Data Structures core lecture. Additional to attending these lectures, your task is to read and present one paper on string algorithms.

We also encourage students to take the opportunity and actively work on current research questions with us.

Registration

Please register on this website to get access to the full course material, including the link to follow the course virtually.

If you want to take this seminar for credit, you must additionally register in the seminar system until April 12: https://seminars.cs.uni-saarland.de/seminars22

Requirements

The core lecture Algorithms and Data Structures is a mandatory requirement. In exceptional cases we can allow students who do not fulfill this requirement but have attended other relevant courses (e.g. basic lecture on Algorithms and Data Structures, Algorithms for Sequence Analysis).

Modalities

We will meet once a week on Wednesdays 16:15 to 17:45, starting on April 13. The seminar takes place in building E2 4, lecture hall IV, but it is also possible to join virtually via Microsoft Teams (see the Virtual Guide). If you are a student taking credit points for the seminar, you will be assigned a paper from the list below and will give a 60-to-90-minute presentation about it. One of the organizers will be your advisor and you can discuss the paper with them before your presentation. You are expected to give a practice talk to your advisor at least one week before the scheduled talk. Successful participation is rewarded with 7 credit points, and the final grade is determined by the quality of the presentation.

Schedule

This is the tentative schedule of the course.

Date Topic Speaker
13.04.2022 Basics I: dynamic programming, kangaroo jumping etc. Karl Bringmann
20.04.2022 Basics II: approximating HD, LCE queries, fine-grained lower bounds for exact ED algorithms Karl Bringmann
27.04.2022 Approximating ED: k-approximation in sublinear time [GKKS] + tree distance framework [AKO] Alejandro Cassis
04.05.2022 Approximating ED: ko(1)-approximation in near-linear time and how to go sublinear [AKO, BCFN] Nick Fischer
11.05.2022 Approximating ED: constant-factor-approximation in subquadratic time [CDGK+, AN] Nick Fischer
18.05.2022 Approximating ED: a simple sublinear algorithm (student topic 1)  
25.05.2022 Embedding ED into HD: distortion k [CGK], sketching ED [BZ,KPS] Tomasz Kociumaka
01.06.2022 Embedding ED into HD: distortion no(1) [OR] and algorithmic uses [AO] Alejandro Cassis
08.06.2022 Pattern Matching with HD errors: Exact algorithms (student topic 2)  
15.06.2022 Pattern Matching with HD errors: Lower bounds (student topic 3)  
22.06.2022 Pattern Matching with ED errors: Exact algorithms [CKWa,CKWb] - virtual talk Philip Wellnitz
29.06.2022 Pattern Matching with HD errors: Streaming algorithms (student topic 4)  
06.07.2022 Pattern Matching with HD errors: Approximation algorithms (student topic 5)  
13.07.2022 Pattern Matching with HD errors: Streaming approximation algorithms + sublinear algorithms [CGKK+] Tomasz Kociumaka
20.07.2022 Pattern Matching with ED errors: Streaming algorithms [KPS] Tomasz Kociumaka

Student Topics

The following five topics will be distributed among the students participating in the seminar. As a student, you will be asked to select three of these topics (with priorities) and we will find a distribution making everyone as happy as possible.

You may also decide to present one of the other papers listed in the references, but be warned that they differ in understandability/necessary prerequisites from the student topics.

Ref Topic Comment Advisor
1 A simple sublinear algorithm for gap edit distance [BCR]: The authors Brakensiek, Charikar and Rubinstein present one of the arguably easiest algorithms achieving a non-trivial edit distance approximation in sublinear time. The techniques involve rolling hashes and the "ruler trick" which is commonly used for sliding-window Hamming distance problems.   Alejandro Cassis
2 Pattern Matching with HD errors: Exact algorithms [A, GU]: In this session we want to understand the best-known algorithms for the pattern matching problem with k mismatches. The presentation starts with the classical algorithm by Abrahamson [A] and shows how to obtain the generalized algorithm by Gawrychowski and Uznański [GU]. This presentation is only concerned about the algorithmic part in [GU] (not the lower bounds). Tomasz Kociumaka
3 Pattern Matching with HD errors: Lower bounds [GU, GLU]: In this session the goal is to prove that the algorithms from the previous topic are best-possible [GU], conditioned on some popular hypothesis in the area of fine-grained complexity theory. Moreover, the presentation includes a selection of the fine-grained equivalences presented in [GLU].

We especially recommend this paper to students of the lecture "Fine-Grained Complexity Theory". This presentation is only concerned about the lower bound in [GU] (not the algorithmic part).

Karl Bringmann
4 Pattern Matching (with HD errors) in the Streaming Model [PP]: This paper by Porat and Porat studies the complexity of pattern matching and pattern matching with mismatches in a streaming setting (i.e., in a setting where the input strings can only be read once from left to right and we want to minimize the algorithm's space usage). The techniques involve rolling hashes and a periodicity argument.   Karl Bringmann
5 Pattern Matching with HD errors: Approximation algorithms [CGKK+]: This recent paper by Chan, Golan, Kociumaka, Kopelowitz and Porat approximates the pattern matching problem with mismatches in a purely combinatorial way. For context: Previous papers (like [KP]) relied on the Fast Fourier Transform which inherently requires time O(n log n), whereas this new technique gives algorithm with close-to-linear and even linear running times. This paper [CGKK+] includes lots of results; for this talk only the content in Sections 1-4 is considered relevant. Nick Fischer

References

In the following we give a tentative list of the papers which we will (partially) present in this seminar.

Ref Paper Title Authors
[A] Generalized String Matching Karl Abrahamson
[AKO] Polylogarithmic approximation for edit distance and the asymmetric query complexity Alexandr Andoni, Robert Krauthgamer, Krzysztof Onak
[AN] Edit distance in near-linear time: It's a constant factor Alexandr Andoni, Negev Shekel Nosatzki
[ANSS] Estimating the longest increasing subsequence in nearly optimal time Alexandr Andoni, Negev Shekel Nosatzki, Sandip Sinha, Clifford Stein
[AO] Approximating edit distance in near-linear time Alexandr Andoni, Krzysztof Onak
[BCR] A simple sublinear algorithm for gap edit distance Joshua Brakensiek, Moses Charikar, Aviad Rubinstein
[BCFN] Almost-optimal sublinear-time edit distance in the low distance Regime Karl Bringmann, Alejandro Cassis, Nick Fischer, Vasileios Nakos
[BZ] Edit Distance: Sketching, Streaming and Document Exchange Djamal Belazzougui, Qin Zhang
[CDGK+] Approximating edit distance within constant factor in truly sub-quadratic time Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucký, Michael Saks
[CGK] Streaming algorithms for embedding and computing edit distance in the low distance regime Diptarka Chakraborty, Elazar Goldenberg, Michal Koucký
[CGKK+] Approximating text-to-pattern Hamming distances Timothy M. Chan, Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, Ely Porat
[CKWa] Faster approximate pattern matching: A unified approach Panagiotis Charalampopoulos, Tomasz Kociumaka, Philip Wellnitz
[CKWb] Faster Pattern Matching under Edit Distance Panagiotis Charalampopoulos, Tomasz Kociumaka, Philip Wellnitz
[GU] Optimal trade-offs for pattern matching with k mismatches Paweł Gawrychowski, Przemysław Uznański
[GKKS] Gap Edit Distance via Non-Adaptive Queries: Simple and Optimal Elazar Goldenberg, Tomasz Kociumaka, Robert Krauthgamer, Barna Saha
[GKS] Sublinear algorithms for gap edit distance Elazar Goldenberg, Robert Krauthgamer, Barna Saha
[GLU] Hamming distance completeness and sparse matrix multiplication Daniel Graf, Karim Labib, Przemysław Uznański
[KPS] Small space and streaming pattern matching with k edits Tomasz Kociumaka, Ely Porat, Tatiana Starikovskaya
[KP] A simple algorithm for approximating the text-to-pattern Hamming distance Tsvi Kopelowitz, Ely Porat
[OR] Low distortion embeddings for edit distance Rafail Ostrovsky, Yuval Rabani
[PP] Exact and approximate pattern matching in the streaming model Benny Porat, Ely Porat


Privacy Policy | Legal Notice
If you encounter technical problems, please contact the administrators