News

Final Code Submission Deadline

Written on 21.07.25 by Vaastav Anand

Hi all,

Just a reminder that the deadline for submitting links (or zip files) for the final codebase is today at 8pm.

If we don't receive the links by 8pm tonight, you will receive a 0 for the implementation part of the project.

Presentation Schedule Tomorrow

Written on 15.07.25 by Vaastav Anand

The class will start at 2pm tomorrow (instead of 2:15) to accommodate all the presentations.

Here is the presentation order:

1. Jinhao & Bekhrouz

2. Umair & Talal

3. Aiman & Asim

4. Paritosh

5. Zawyar & Ali

6. Lukas & Bastien

7. Marius & Felix

Each slot will be 15 minutes: 10-minute presentation + 5-minute Q/A

Assignment 3 grades released

Written on 26.06.25 by Matheus Ulhoa Avelar Stolet

Assignment 3 has now been graded and the grades have been pushed to the assn3_grades branch in a file called a3_grades.txt

If you did not receive a grade then please contact the course staff with a link to your repo.

Lecture 10 slides posted

Written on 25.06.25 by Vaastav Anand

Lecture 10 slides have been posted on CMS

Reminder: Assignment 4 Check-In Today

Written on 23.06.25 by Vaastav Anand

The first check-in for assignment 4 will happen during office hours today.

As stated in last week's lecture, please bring an implementation and experiment plan to this check-in.

Assignment 2 grades released

Written on 18.06.25 by Vaastav Anand

Assignment 2 has now been graded and the grades have been pushed to the assn2_grades branch in a file called a2_grading.txt

If you did not receive a grade then please contact the course staff with a link to your repo.

Lecture 9 slides posted

Written on 18.06.25 by Vaastav Anand

Lecture 9 slides have now been posted on CMS

Project Assignments

Written on 18.06.25 by Vaastav Anand

Talal + Umair: Kubernetes Integration with Blueprint and Autoscaling (Assigned: Wednesday 4pm)

Bekhrouz + Jinhao: Evaluating the behavior of different retry algorithms (Assigned: Thursday 1pm)

Felix + Marius: Reproducing Laser of Death emergent misbehavior (Assigned: Thursday 4.40pm)

Ali Fahad + Zawyar: Integrating Metafor with Blueprint (Assigned: Monday 7.30am, Restricted to just building an emulator with Blueprint from the Metafor specs)

Lukas + Bastien: Implement and analyze prioritized load shedding and compare its behavior to circuit breakers in dealing with metastability failures (Assigned: Monday 2.02pm)

Aiman + Asim: Implementing a critical path analysis service that can be integrated with Blueprint (Assigned: Monday 2.04pm)

Paritosh: Implementing Runtime Causal Inference for Root Cause Analysis (Assigned: Monday 2.49pm)

If you don't see your name assigned to any of the projects, then we did not receive a preferences e-mail from you. Please reach out to the course staff as soon as possible.

Project Team + Preferences

Written on 16.06.25 by Vaastav Anand

All teams must submit the list of their team members along with their top 5 project preferences by the end of today.

If you have already sent in your preferences and list of team members, then you do not have any further action to take.

If you have found a teammate but you have not sent the preferences yet, then please do send in the preferences ASAP.

If you have not been able to find a teammate, then please send your individual preferences and we will form the teams from the group of students who haven't been able to find/register a team.

Send your preferences to the email: vaastav@mpi-sws.org by 5pm today.

Lecture 8 slides posted

Written on 11.06.25 by Vaastav Anand

Lecture 8 slides have been posted

Lecture today in the usual room 005

Written on 11.06.25 by Vaastav Anand

Our lecture today will be in our usual room, 005

Lecture 7 slides posted

Written on 04.06.25 by Vaastav Anand

Lecture 7 slides have now been posted on CMS

Assignment 3 released

Written on 02.06.25 by Vaastav Anand

Assignment 3 has now been released.

Due Date: 18th June, 2025 5pm PDT

Lecture 6 slides posted

Written on 29.05.25 by Vaastav Anand

Lecture 6 slides have been posted

Lecture 5 slides posted

Written on 24.05.25 by Vaastav Anand

Lecture 5 slides have been posted

Assignment 2 Deadline change

Written on 24.05.25 by Vaastav Anand

Assignment 2 deadline has moved back to Friday, May 30 5pm

No Seminar Today

Written on 14.05.25 by Vaastav Anand

Hi everyone,

This is just a reminder that there is no seminar today. Our next meeting will be next week on May 21st, 2025.

Assignment 1 grades released

Written on 12.05.25 by Vaastav Anand

Hi everyone,

Assignment 1 grades have been pushed to your private forks in a file called assn1_grade.txt in the luggageshare folder.

Overall, everyone did a very good job in implementing assignment 1.

Office Hours Today in Room 105

Written on 12.05.25 by Vaastav Anand

Office Hours today have been shifted to Room 105 due to an ongoing event in 005.

Assignment 2 released

Written on 12.05.25 by Vaastav Anand

Assignment 2 has now been released on gitlab.

You can find the instructions here: Assignment 2

Assignment 1 Deadline and Assignment 2 release

Written on 10.05.25 by Vaastav Anand

The Assignment 1 deadline has now passed and your submissions for the assignment have now been locked in.

Assignment 2 will be released Monday morning.

Happy Weekend!

Lecture 4 Slides posted

Written on 08.05.25 by Vaastav Anand

Lecture 4 slides are now posted on CMS

LSF Registration Deadline

Written on 06.05.25 by Vaastav Anand

Hi all,

It was brought to my attention that the LSF registration deadline is today. If you are taking this course for credit, then please register in the LSF.

You will not be able to receive credit for the course if you miss the registration deadline.

Lecture 3 slides posted

Written on 03.05.25 by Vaastav Anand

Lecture 3 slides have now been posted on CMS

Office Hours Timings and Location

Written on 28.04.25 by Vaastav Anand

Office Hours Timing: Mondays 2pm - 3pm

Location: Room 005, E1 5 (all Mondays except 12th May, 2025)

Location on May 12th, 2025: Room 029, E1 5

Lecture 2 slides posted

Written on 25.04.25 by Vaastav Anand

Lecture 2 slides are now posted on CMS

Assignment 1 released

Written on 24.04.25 by Vaastav Anand

Assignment 1 is now released at: https://gitlab.cs.uni-saarland.de/os/cldrel-25ss/assignments/-/tree/assn1

Each student should have received an invite to join their own fork of the assignments repository.

If you did not get an invitation to your own fork of the assignments repo, then it means we were unable to find your username in the gitlab system. Please ensure that you have an active gitlab account and then contact the instructors with your account details to get access to your own fork.

Assignment Due Date: 10th May, 2025. 5pm CEST.

Lecture 1 Slides posted

Written on 14.04.25 by Vaastav Anand

Lecture 1 Slides are now posted on the CMS website

Reliability in Modern Cloud Systems

Cloud systems power a large fraction of the computing world today. Ensuring that these systems are correct and performant remains a key challenge that continues to bedevil developers. In this seminar, we will explore themes around the various forms of reliability in modern cloud systems and learn about state-of-the-art strategies for mitigating incidents and understanding issues in these systems.

Pre-requisites: Programming 2, Software Engineering Lab (Praktikum)

Recommended: Distributed Systems

Places: 20

Kickoff Meeting: 14.04.25, Monday 2:15pm-3:45pm

Lecture Time (23.04.25 onwards): Wednesdays, 2:15pm-3:45pm

Lecture Room: 005, E1 5

Office Hours (28.04.25 onwards): Mondays, 2pm-3pm

Office Hours Room: 005, E1 5 on all days except May 12th (Room 029)

Format

Each lecture will be divided into 2 parts:

- Lecture Part: In this part, the instructors will give a lecture on a specific topic in reliability.

- Discussion Part: In this part we will discuss the assigned reading and the previous week's lecture.

Assignments

All assignments will be based on Blueprint, a toolchain for generating microservice implementations and for exploring the design space of microservices.

Grading

- Assignment 1 - Implementing a basic Microservice Application using Blueprint: 10%

- Assignment 2 - Adding Observability to the Application and collecting traces from a workload: 20%

- Assignment 3 - Reproducing a Retry Storm: 25%

- Assignment 4 - Open Ended Project: 40%

- Participation in Discussion: 5%
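
As a rough illustration of how the weights above add up, here is a minimal Python sketch. It assumes that each component is scored on a 0-100 scale and that the components are combined as a plain weighted sum; the page does not state the exact formula, so the function name `course_grade`, the component keys, and the scale are hypothetical.

```python
# Weights (percent of the final grade) copied from the grading list above.
WEIGHTS = {
    "assignment1": 10,
    "assignment2": 20,
    "assignment3": 25,
    "assignment4": 40,
    "participation": 5,
}


def course_grade(scores):
    """Combine per-component scores (each 0-100) into a weighted total.

    Assumption: a plain weighted sum. The course staff may combine the
    components differently.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    example = {
        "assignment1": 80,
        "assignment2": 90,
        "assignment3": 70,
        "assignment4": 85,
        "participation": 100,
    }
    print(course_grade(example))
```

Under these assumptions, the example scores combine to 8 + 18 + 17.5 + 34 + 5 = 82.5.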

Assignment 4: Open-Ended Project Details

List of Project Ideas: Project Ideas

Logistics:

There will be 6 teams of 2 and 1 team of 3. Each team will work on a unique project. If you have a different project idea from the ones listed, please contact the instructors ASAP.

Send top 5 preferences and team information (i.e. the members) by Monday June 16th 5pm PST via email.

Project assignments will be sent out by the start of class on Wednesday, June 18th.

Grading:

60% Implementation

30% Technical Implementation * Difficulty Bonus
10% weekly check-ins (every Monday during office hours)
10% Integration + Use of Blueprint
10% Ease-of-use + Documentation

40% Presentation (16th July, 2025)

25% on technical content
10% on Q/A
5% on presentation

5% Bonus available for those who successfully produce a Pull Request for Blueprint.

Course Schedule

Date	Lecture Details	Readings	Assignment	Slides
09.04.25	No seminar	N/A
14.04.25	Part 1: Kickoff Meeting Part 2: From Monoliths to Microservices			Kickoff Logistics, Lecture 1
23.04.25	Part 1: Paper Discussion Part 2: The Tail at Scale	Blueprint: A Toolchain for Highly Reconfigurable Microservices	Assignment 1 released	Lecture 2
30.04.25	Part 1: Paper Discussion Part 2: Reliability Basics	Tales of The Tail: Past and the Future		Lecture 3
07.05.25	Part 1: Paper Discussion Part 2: The Pillars of Observability	If At First You Don’t Succeed, Try, Try, Again...? Insights and LLM-informed Tooling for Detecting Retry Bugs in Software Systems		Lecture 4
10.05.25	Assignment 1 Submission Deadline		Assignment 1 Deadline: 5pm CEST
12.05.25	Assignment 2 released		Assignment 2
14.05.25	No seminar
21.05.25	Part 1: Discussion Part 2: Of Failures and Incidents	Dapper, a Large-Scale Distributed Systems Tracing Infrastructure		Lecture 5
28.05.25	Part 1: Discussion Part 2: Cross System Interaction Failures	What bugs cause production cloud incidents?		Lecture 6
30.05.25	Assignment 2 Due; Assignment 3 released		Assignment 3
04.06.25	Part 1: Discussion Part 2: Dealing with Metastability (Load Shedding Techniques)	Fail through the Cracks: Cross-System Interaction Failures in Modern Cloud Systems; Metastable Failures in the Wild		Lecture 7
11.06.25	Part 1: Discussion Part 2: Root Cause Analysis	Analyzing Metastable Failures		Lecture 8
18.06.25	Part 1: Discussion Part 2: Testing & Formal Methods	Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems	Assignment 3 due; Assignment 4 released	Lecture 9
25.06.25	Part 1: Discussion Part 2: Resource Utilization	Executing microservice applications on serverless, correctly; Building Reliable Cloud Services Using P# (Experience Report)		Lecture 10
02.07.25	Part 1: Discussion Part 2: Hardware Reliability	Resource Central: Understanding and Predicting Workloads for Improved Resource Management in Large Cloud Platforms		Lecture 11
09.07.25	Part 1: Data Center Design Part 2: Discussion	RAS: Continuously Optimized Region-Wide Datacenter Resource Allocation		Lecture 12
16.07.25	Demos and Presentations		Assignment 4 due