News

Missing files uploaded; project deadline 15.09.

Written on 14.08.23 by Sven Rahmann

Dear all,

the missing files from the last lecture have been uploaded.

Please remember your projects. The deadline can be moved if needed. In principle, I need to review the work in a timely manner so we can conduct exams before the 30.09. in principle (end of semester). However, it can also be… Read more

Dear all,

the missing files from the last lecture have been uploaded.

Please remember your projects. The deadline can be moved if needed. In principle, I need to review the work in a timely manner so we can conduct exams before the 30.09. in principle (end of semester). However, it can also be moved into October, but must be finished before the start of the new lecture period. So I'm asking you to send me your project work as soon as it is finished, but in one month (15.09.) at the latest. Exam dates will be made on an individual basis. The exam will discuss the project (for a few minutes) but then ask about the theory (hashes, filters, some Python).

Project Assignment

Written on 24.06.23 by Sven Rahmann

Hello,

I have completed the project assignment based on your preferences. Please read it in the description of the project slides in the Materials section. Please let me know if something is wrong. Next week, I will add slides about possible data to work on. I would like have short individual… Read more

Hello,

I have completed the project assignment based on your preferences. Please read it in the description of the project slides in the Materials section. Please let me know if something is wrong. Next week, I will add slides about possible data to work on. I would like have short individual meetings about each project to discuss specifics.

Project Proposals online

Written on 12.06.23 by Sven Rahmann

Hello,

I have finally uploaded some project proposals. Please find them in the Materials section and have a look. Please send me your preferred project (rahmann@cs) within one week (till Monday, June 19, 23:59). If you have a different idea, you can also do something else (within the general… Read more

Hello,

I have finally uploaded some project proposals. Please find them in the Materials section and have a look. Please send me your preferred project (rahmann@cs) within one week (till Monday, June 19, 23:59). If you have a different idea, you can also do something else (within the general context of the lecture). In that case, please write a short sketch of your idea, also till Monday evening.

Furthermore, the slide sets have been updated, including a preview of the next lecture (on 23.06.)

No tutorial today (31/05)

Written on 31.05.23 by Vu Lam Dang

Hi everyone,

I am sorry to announce that there will be no tutorial today due to a health problem.

In the meantime if you have any questions that we normally discuss during the tutorial session, don't hesitate to write me an email at dang@cs.uni-saarland.de

Have a nice day,
Vu Lam.

Updates (Lecture slides; Project work)

Written on 28.05.23 by Sven Rahmann

Hello everyone,

hope you are enjoying a long weekend with tomorrow's holiday. In the meantime, we have updated the slides on Bloom filters and uploaded the slides on XOR filters, which you can find in the Materials section.

Please remember to complete the practice assignment on computing the FPR… Read more

Hello everyone,

hope you are enjoying a long weekend with tomorrow's holiday. In the meantime, we have updated the slides on Bloom filters and uploaded the slides on XOR filters, which you can find in the Materials section.

Please remember to complete the practice assignment on computing the FPR of blocked Bloom filters by Friday 02.06. Everything you need (formulas) are on the Bloom filter slides; you just have to put it together and run it for some values of h and c.

To clarify a question that came up: These assignments are, in a sense, optional, i.e. there is no punishment for not doing them. The requirement for passing the course is doing a project (to be discussed on Friday 02.06.) and then passing an exam on the course material and your project. However!, doing these assignments will help your understanding of the material (I hope) and give you some more programming practice before things get busy.

Friday 02.06. is also the day of the next lecture, where we complete the XOR filters (I had planned on completing them last time) and then talk about hashing more generally, e.g., what are good and simple hash functions; what are good collision resolution strategies? I will present project ideas, and you are also welcome to propose your own ideas.

After that lecture there will be a longer break and the following lecture will be on 23.06. During this time between lectures, you should start project work.

Slides and Assignment on Bloom Filters + Next Lecture Dates

Written on 19.05.23 by Sven Rahmann

Good afternoon,

the slides of today's lecture (and some more) are now online, as well as assignment 3 which asks you to visualize the false positive rate of standard vs. blocked Bloom filters. It is due in 2 weeks (on 02.06.).

As for the updated schedule, the plan is to move ahead with the… Read more

Good afternoon,

the slides of today's lecture (and some more) are now online, as well as assignment 3 which asks you to visualize the false positive rate of standard vs. blocked Bloom filters. It is due in 2 weeks (on 02.06.).

As for the updated schedule, the plan is to move ahead with the posted schedule on the main page (except that the 12.05. was moved to 19.05.). This means, the next lecture will already be next week (26.05.) and then one more the week after (02.06.). We will continue a bit on Bloom filters (especially their use as k-mer filters!), and then move to hashing, so we can implement key-value stores for k-mers and associated information.

Today's lecture moved

Written on 12.05.23 by Sven Rahmann

Good morning,

unfortunately, due to personal reasons, I have to cancel the lecture today. It will be moved to next week. I am really sorry for the inconvenience. The topic is "Bloom Filters", an important basic data structure, with many variations that make it work much better in practice.

An… Read more

Good morning,

unfortunately, due to personal reasons, I have to cancel the lecture today. It will be moved to next week. I am really sorry for the inconvenience. The topic is "Bloom Filters", an important basic data structure, with many variations that make it work much better in practice.

An updated schedule will be posted later today.

Sven Rahmann

Slides and Assignment online

Written on 28.04.23 by Sven Rahmann

Hello everyone,

after today's lecture, you are well prepared for a first practice assignment: to write your intbitarray.py implementation, which will allow you to create an array of integers of any bit-width 1 <= w <= 64, without wasting any bits.

Please carefully review the bitarray lecture… Read more

Hello everyone,

after today's lecture, you are well prepared for a first practice assignment: to write your intbitarray.py implementation, which will allow you to create an array of integers of any bit-width 1 <= w <= 64, without wasting any bits.

Please carefully review the bitarray lecture before you start with the assignment. It is due in 2 weeks from today.

Happy coding!

Tutorial Today at 15:00

Written on 19.04.23 by Sven Rahmann

Good morning,

Wednesday is our tutorial day; the E2.1 Computer Pool has been reserved for this course from 14 - 18.

However, due to another start-of-semester-kickoff meeting, the tutorial will start 15:00 today!

As you currently do not have a project assignment, we will use this week to make… Read more

Good morning,

Wednesday is our tutorial day; the E2.1 Computer Pool has been reserved for this course from 14 - 18.

However, due to another start-of-semester-kickoff meeting, the tutorial will start 15:00 today!

As you currently do not have a project assignment, we will use this week to make sure everyone has a good setup, mainly: shell, conda, git. Please make sure you do the setup this week, as we cannot repeat the same instructions every week after.

 

Welcome to the Course!

Written on 15.04.23 by Sven Rahmann

Hello,

in our first lecture, we discussed that the focus of the course is on project work using jit-compiled Python for implementing an alignment-free sequence analysis project. I presented a simple example how compilation speeds up Python code, and in particular explained the advantages of late… Read more

Hello,

in our first lecture, we discussed that the focus of the course is on project work using jit-compiled Python for implementing an alignment-free sequence analysis project. I presented a simple example how compilation speeds up Python code, and in particular explained the advantages of late compilation. The example can be found in the Materials section (after registration), together with the slide set.

There are no video recordings of this lecture.

 

Show all

Alignment-free Sequence Analysis with Python

(Special lecture 1V+2Ü, 5 CP for MSc Bioinformatics, open to other CS degree programs as well: Computer Science, DSAI, Cybersecurity, etc.)

The lectures start on Friday April 14, 2023. The last lecture is on Friday, July 21, 2022 (summary and review).

Lectures:

  • Every second Friday 08:30 - 10:00 (this is a 1V course, given as 2V bi-weekly).
  • 8 Lecture Dates: 14.04, 28.04., 12.05.19.05., 26.05. 02.06. (not 09.06.!), 23.06. 07.07., 21.07.
  • Room: E2.1, room 0.01 (Center for Bioinformatics, seminar room, ground floor)

Tutorials and Practical sessions (with Vu Lam Dang)

  • Every Wednesday 14-18 (ZIB Pool)

You need to be registered for this course in order to access the Materials section with lecture slides, homework assignments, etc.

Contents (preliminary; may change)

  • Interpreted vs. eagerly compiled vs. jit-compiled programs
  • jit-compiling Python and the LLVM compiler infrastructure
  • encoding DNA and its k-mers
  • low-level data structures: bit arrays, arbitrary bit-width integer arrays
  • rank and select queries on bit vectors
  • wavelet trees and related data structures
  • probabilistic filtering (set membership data structures)
  • cache-friendly hashing algorithms
  • multi-threaded programming and parallelizing hash tables
  • consumer-producer pipelines as programming tools
  • strong vs. weak k-mers
  • localizing read origin by a single k-mer
  • alignment-free single nucleotide variant detection
  • alignment-free small to medium-size indel detection
  • alignment-free structural variant detection

 

Requirements

  • You should be familiar with the material "Algorithms for Sequence Analysis" (which may also be taken in parallel to this course).
  • You should have very good Python programming skills (beyond basics)
  • You should be interested in low-level nuts-and-bolts algorithm engineering topics.
  • You should enjoy project work.

 

Passing the Course

In order to pass this (graded) course for 5 ECTS credits, a number of conditions needs to be satisfied:

  • You need to register for the course here in the CMS to access the course materials. including projects.
  • You need to deliver an individual programming project, which needs to run on known and unknown test data (to be discussed when the individual projects are started). The project is ungraded, but needs to test successfully in order to be admitted to the exam.
  • You need to pass a graded oral exam of 20-25 min about the lecture contents and your specific programming project. For the oral exam, LSF registration is necessary (up to 1 week before the exam date; no exceptions).

 

Privacy Policy | Legal Notice
If you encounter technical problems, please contact the administrators.