News
Currently, no news are available
Statistical Data Compression
Advanced lecture with theoretical and practical exercises, V2+Ü2, for Computer Science and related programs.
Please verify with your exam office if you can get ECTS credit for this course if you are not studying Computer Science.
| Prerequisites |
Solid foundations of mathematics (MInf1 - Minf3, or [Bio]StatsLab instead of MInf3); solid programming skills. |
| Credits | 6 ECTS credits |
| Required time | V2+Ü2 (2 hours of lectures, 2 hours of tutorials per week) |
| Language | English |
| Final Exam | written; need to have passed midterm to take part. Course failed if failed. If passed, 70% of your grade. |
| Midterm Exam | mandatory to pass to qualify for final exam. If passed, 30% of your grade. Tentative date: 26 Nov 2026 14:00 - 16:00 c.t. |
| Registration | click on Registration in the menu header |
| Details | Materials will be available after registration under Information > Materials |
| Times |
Lectures: Tentatively HS001 E2.5 on Tuesdays (10:00 - 11:00 s.t.) and Thursdays (14:00 - 15:00 s.t.). |
| Mode | lecture in presence |
| Link | https://cms.sic.saarland/statdc26/ |
| Instructor | Kamila Szewczyk |
| Examiner | Prof. Dr. Sven Rahmann |
| Tutorials | Kamila Szewczyk |
Content Overview
- Basic notions of information theory
- Review of string algorithms (suffix arrays, Burrows-Wheeler Transform BWT)
- Optimal prefix codes (e.g. Huffman codes)
- Optimal codes (arithmetic coding)
- Asymmetric numeral systems (rANS, tANS)
- Lempel-Ziv factoring and compression (LZ77, LZ78, LZP/ROLZ)
- LZ match finding algorithms
- Adaptive Huffman codes
- Adaptive range coding
- Introduction to Prediction by Partial Matching (PPM)
- Context-adaptive binary arithmetic coding
- Dynamic Markov coding
- Advanced techniques im PPM (information inheritance, secondary escape estimation, low order estimation)
- Advanced low-order bitwise models (indirect, finite state machines)
- Linear/logistic context mixing
- Non-linear post-mixing maps (adaptive probability maps, APM)
- Symbol ranking (SR); move-to-front (MTF), quantised local-frequency coding (QLFC)
- BWT post-coding techniques
- Parsing optimization for context mixing (PAQ9A)
- PAQ8 architecture
- Compression via large language models (LLMs)
- Neural network mixing
- Case study: DNA compression
Target audience
This course is offered as an advanced / specialized lecture in B.Sc. or M.Sc. Computer Science programs at UdS.
It should be taken later in the B.Sc. program or any time in the M.Sc. program.
The material requires good mathematical knowledge (probabilities!) and solid low-level programming skill (you need to know your hardware).
Good programming languages to know would be C, Rust or C++, for example.
