Registration for this course is open until Sunday, 25.10.2026 23:59.

News

Currently, no news are available

Statistical Data Compression

Advanced lecture with theoretical and practical exercises, V2+Ü2, for Computer Science and related programs.
Please verify with your exam office if you can get ECTS credit for this course if you are not studying Computer Science.

Prerequisites

Solid foundations of mathematics (MInf1 - Minf3, or [Bio]StatsLab instead of MInf3); solid programming skills.
Good knowledge of basic probability theory (e.g. conditional probabilities) is necessary.
Some knowledge of string algorithms (e.g. full text indices, suffix arrays) is also assumed.
That said, we will offer quick refreshers on both subjects.

Credits 6 ECTS credits
Required time V2+Ü2 (2 hours of lectures, 2 hours of tutorials per week)
Language English
Final Exam written; need to have passed midterm to take part. Course failed if failed. If passed, 70% of your grade.
Midterm Exam mandatory to pass to qualify for final exam. If passed, 30% of your grade.
Tentative date: 26 Nov 2026 14:00 - 16:00 c.t.
Registration click on Registration in the menu header
Details Materials will be available after registration under Information > Materials
Times
 

Lectures: Tentatively HS001 E2.5 on Tuesdays (10:00 - 11:00 s.t.) and Thursdays (14:00 - 15:00 s.t.).
Tutorials: Tentatively after each lecture - HS001 E2.5 on Tuesdays (11:00 - 12:00 c.t.) and Thursdays (15:00 - 16:00 c.t.). c.t.: there will be a 15-minute break between the lecture and tutorial sessions.
Office Hours: On appointment

Mode lecture in presence
Link https://cms.sic.saarland/statdc26/
Instructor Kamila Szewczyk
Examiner Prof. Dr. Sven Rahmann
Tutorials Kamila Szewczyk

 

Content Overview

  • Basic notions of information theory
  • Review of string algorithms (suffix arrays, Burrows-Wheeler Transform BWT)
  • Optimal prefix codes (e.g. Huffman codes)
  • Optimal codes (arithmetic coding)
  • Asymmetric numeral systems (rANS, tANS)
  • Lempel-Ziv factoring and compression (LZ77, LZ78, LZP/ROLZ)
  • LZ match finding algorithms
  • Adaptive Huffman codes
  • Adaptive range coding
  • Introduction to Prediction by Partial Matching (PPM)
  • Context-adaptive binary arithmetic coding
  • Dynamic Markov coding
  • Advanced techniques im PPM (information inheritance, secondary escape estimation, low order estimation)
  • Advanced low-order bitwise models (indirect, finite state machines)
  • Linear/logistic context mixing
  • Non-linear post-mixing maps (adaptive probability maps, APM)
  • Symbol ranking (SR); move-to-front (MTF), quantised local-frequency coding (QLFC)
  • BWT post-coding techniques
  • Parsing optimization for context mixing (PAQ9A)
  • PAQ8 architecture
  • Compression via large language models (LLMs)
  • Neural network mixing
  • Case study: DNA compression


Target audience 

This course is offered as an advanced / specialized lecture in B.Sc. or M.Sc. Computer Science programs at UdS.
It should be taken later in the B.Sc. program or any time in the M.Sc. program.
The material requires good mathematical knowledge (probabilities!) and solid low-level programming skill (you need to know your hardware).
Good programming languages to know would be C, Rust or C++, for example.

 

Privacy Policy | Legal Notice
If you encounter technical problems, please contact the administrators.