News
31.05.2023
|
No tutorial today (31/05)Hi everyone, I am sorry to announce that there will be no tutorial today due to a health problem. In the meantime if you have any questions that we normally discuss during the tutorial session, don't hesitate to write me an email at... Read more Hi everyone, I am sorry to announce that there will be no tutorial today due to a health problem. In the meantime if you have any questions that we normally discuss during the tutorial session, don't hesitate to write me an email at dang@cs.uni-saarland.de Have a nice day, |
28.05.2023
|
Updates (Lecture slides; Project work)Hello everyone, hope you are enjoying a long weekend with tomorrow's holiday. In the meantime, we have updated the slides on Bloom filters and uploaded the slides on XOR filters, which you can find in the Materials section. Please remember to complete the... Read more Hello everyone, hope you are enjoying a long weekend with tomorrow's holiday. In the meantime, we have updated the slides on Bloom filters and uploaded the slides on XOR filters, which you can find in the Materials section. Please remember to complete the practice assignment on computing the FPR of blocked Bloom filters by Friday 02.06. Everything you need (formulas) are on the Bloom filter slides; you just have to put it together and run it for some values of h and c. To clarify a question that came up: These assignments are, in a sense, optional, i.e. there is no punishment for not doing them. The requirement for passing the course is doing a project (to be discussed on Friday 02.06.) and then passing an exam on the course material and your project. However!, doing these assignments will help your understanding of the material (I hope) and give you some more programming practice before things get busy. Friday 02.06. is also the day of the next lecture, where we complete the XOR filters (I had planned on completing them last time) and then talk about hashing more generally, e.g., what are good and simple hash functions; what are good collision resolution strategies? I will present project ideas, and you are also welcome to propose your own ideas. After that lecture there will be a longer break and the following lecture will be on 23.06. During this time between lectures, you should start project work. |
19.05.2023
|
Slides and Assignment on Bloom Filters + Next Lecture DatesGood afternoon, the slides of today's lecture (and some more) are now online, as well as assignment 3 which asks you to visualize the false positive rate of standard vs. blocked Bloom filters. It is due in 2 weeks (on 02.06.). As for the updated schedule, the... Read more Good afternoon, the slides of today's lecture (and some more) are now online, as well as assignment 3 which asks you to visualize the false positive rate of standard vs. blocked Bloom filters. It is due in 2 weeks (on 02.06.). As for the updated schedule, the plan is to move ahead with the posted schedule on the main page (except that the 12.05. was moved to 19.05.). This means, the next lecture will already be next week (26.05.) and then one more the week after (02.06.). We will continue a bit on Bloom filters (especially their use as k-mer filters!), and then move to hashing, so we can implement key-value stores for k-mers and associated information. |
12.05.2023
|
Today's lecture movedGood morning, unfortunately, due to personal reasons, I have to cancel the lecture today. It will be moved to next week. I am really sorry for the inconvenience. The topic is "Bloom Filters", an important basic data structure, with many variations that make it... Read more Good morning, unfortunately, due to personal reasons, I have to cancel the lecture today. It will be moved to next week. I am really sorry for the inconvenience. The topic is "Bloom Filters", an important basic data structure, with many variations that make it work much better in practice. An updated schedule will be posted later today. Sven Rahmann |
28.04.2023
|
Slides and Assignment onlineHello everyone, after today's lecture, you are well prepared for a first practice assignment: to write your intbitarray.py implementation, which will allow you to create an array of integers of any bit-width 1 <= w <= 64, without wasting any bits. Please... Read more Hello everyone, after today's lecture, you are well prepared for a first practice assignment: to write your intbitarray.py implementation, which will allow you to create an array of integers of any bit-width 1 <= w <= 64, without wasting any bits. Please carefully review the bitarray lecture before you start with the assignment. It is due in 2 weeks from today. Happy coding! |
19.04.2023
|
Tutorial Today at 15:00Good morning, Wednesday is our tutorial day; the E2.1 Computer Pool has been reserved for this course from 14 - 18. However, due to another start-of-semester-kickoff meeting, the tutorial will start 15:00 today! As you currently do not have a project... Read more Good morning, Wednesday is our tutorial day; the E2.1 Computer Pool has been reserved for this course from 14 - 18. However, due to another start-of-semester-kickoff meeting, the tutorial will start 15:00 today! As you currently do not have a project assignment, we will use this week to make sure everyone has a good setup, mainly: shell, conda, git. Please make sure you do the setup this week, as we cannot repeat the same instructions every week after.
|
15.04.2023
|
Welcome to the Course!Hello, in our first lecture, we discussed that the focus of the course is on project work using jit-compiled Python for implementing an alignment-free sequence analysis project. I presented a simple example how compilation speeds up Python code, and in particular... Read more Hello, in our first lecture, we discussed that the focus of the course is on project work using jit-compiled Python for implementing an alignment-free sequence analysis project. I presented a simple example how compilation speeds up Python code, and in particular explained the advantages of late compilation. The example can be found in the Materials section (after registration), together with the slide set. There are no video recordings of this lecture.
|
Alignment-free Sequence Analysis with Python
(Special lecture 1V+2Ü, 5 CP for MSc Bioinformatics, open to other CS degree programs as well: Computer Science, DSAI, Cybersecurity, etc.)
The lectures start on Friday April 14, 2023. The last lecture is on Friday, July 21, 2022 (summary and review).
Lectures:
- Every second Friday 08:30 - 10:00 (this is a 1V course, given as 2V bi-weekly).
- 8 Lecture Dates: 14.04, 28.04.,
12.05.19.05., 26.05. 02.06. (not 09.06.!), 23.06. 07.07., 21.07. - Room: E2.1, room 0.01 (Center for Bioinformatics, seminar room, ground floor)
Tutorials and Practical sessions (with Vu Lam Dang)
- Every Wednesday 14-18 (ZIB Pool)
You need to be registered for this course in order to access the Materials section with lecture slides, homework assignments, etc.
Contents (preliminary; may change)
- Interpreted vs. eagerly compiled vs. jit-compiled programs
- jit-compiling Python and the LLVM compiler infrastructure
- encoding DNA and its k-mers
- low-level data structures: bit arrays, arbitrary bit-width integer arrays
- rank and select queries on bit vectors
- wavelet trees and related data structures
- probabilistic filtering (set membership data structures)
- cache-friendly hashing algorithms
- multi-threaded programming and parallelizing hash tables
- consumer-producer pipelines as programming tools
- strong vs. weak k-mers
- localizing read origin by a single k-mer
- alignment-free single nucleotide variant detection
- alignment-free small to medium-size indel detection
- alignment-free structural variant detection
Requirements
- You should be familiar with the material "Algorithms for Sequence Analysis" (which may also be taken in parallel to this course).
- You should have very good Python programming skills (beyond basics)
- You should be interested in low-level nuts-and-bolts algorithm engineering topics.
- You should enjoy project work.
Passing the Course
In order to pass this (graded) course for 5 ECTS credits, a number of conditions needs to be satisfied:
- You need to register for the course here in the CMS to access the course materials. including projects.
- You need to deliver an individual programming project, which needs to run on known and unknown test data (to be discussed when the individual projects are started). The project is ungraded, but needs to test successfully in order to be admitted to the exam.
- You need to pass a graded oral exam of 20-25 min about the lecture contents and your specific programming project. For the oral exam, LSF registration is necessary (up to 1 week before the exam date; no exceptions).