Topics

Topic 1: On-demand shaping of the spectrum of a laser via an acousto-optical modulator 

Acousto-optical modulators (AOMs) change the spectrum of an incoming laser beam depending on an applied radio-frequency (rf) signal. The input-output relation is typically non-linear, and the output spectrum cannot be easily predicted. Instead, the output can be measured by interferometry and used as the input to a feedback loop. We are looking for a learning algorithm that dynamically derives the rf spectrum required to obtain a desired output spectrum of the light. 
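As a starting point, the closed loop can be prototyped against a stand-in model. The sketch below is illustrative only: `aom_response` is a toy saturating non-linearity, not the real AOM physics, and the proportional correction stands in for whatever learning algorithm the project develops.

```python
import numpy as np

def aom_response(rf):
    """Toy stand-in for the AOM: a saturating, hence non-linear,
    map from the applied rf spectrum to the optical output spectrum."""
    return np.tanh(rf)

def feedback_shape(target, n_iter=200, gain=0.5):
    """Iteratively correct the rf spectrum until the (simulated)
    measured output matches the desired target spectrum."""
    rf = np.zeros_like(target)
    for _ in range(n_iter):
        measured = aom_response(rf)        # in the lab: interferometric measurement
        rf = rf + gain * (target - measured)
    return rf

freqs = np.linspace(-1.0, 1.0, 64)
target = 0.8 * np.exp(-freqs**2 / 0.1)     # desired Gaussian output spectrum
rf = feedback_shape(target)
residual = np.max(np.abs(aom_response(rf) - target))
```

A learning algorithm would replace the fixed-gain correction, which only works here because the toy response is smooth and invertible.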

Data will have to be produced, for which there are two options: either the whole problem can be simulated, at various degrees of complexity (and difficulty, from easy to advanced), or an experimental set-up that already implements the apparatus can be built with support from the team (again, at various degrees of difficulty). The choice depends on the background, aspirations, and skills of the students. 

Project Donor: Prof Dr. Jürgen Eschner and team (Experimental Physics - Quantum Photonics) 

------------------------------------------------------------------------------------------------------------- 

Topic 2: Minimization of the cost of a network in the presence of dynamical constraints 

The task is to numerically analyse the problem of finding the optimal path connecting two nodes of a network in the presence of dynamical constraints. The constraints penalize one route over the other as a function of time, for instance by modulating their lengths or the dissipation along them. Consequently, the cost associated with the different routes changes dynamically. Whether the costs are minimized, indicating that transport adapts to the best route, depends on the properties of the algorithm and the rate of modulation.  

The analysis will be performed by means of an algorithm that is inspired by the food search of the slime mold Physarum polycephalum, a primitive yet remarkable organism that is capable of solving optimization problems. The algorithm is based on an activation function that determines the capacity of an edge by means of a nonlinear function of the flow across the edge.  

For a specific activation function (sigmoidal), the supervising team showed that, in the presence of stochastic forces, transport can dynamically adapt to time-dependent conditions. In particular, there is an optimal mean strength of the stochastic forces for which transport synchronizes with time-periodic constraints. The students will extend this analysis to a class of activation functions in the presence of noise, studying when transport selects the instantaneous optimal route. The students will implement the algorithm in Python. Changing the activation function amounts to varying an exponent in the equations. Stochastic effects are modelled by stochastic differential equations, which the students will learn to solve numerically. The adaptivity of the solution will be assessed by analysing correlations between the transport and the modulation of the constraints. 
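A minimal two-route version of these dynamics might look as follows. The activation form, exponent, noise strength, and modulation below are illustrative assumptions, not the supervising team's actual model; the conductivities follow a noisy relaxation dD = (f(|Q|) - D) dt + sigma dW, integrated with the Euler-Maruyama scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def activation(q, gamma):
    """Sigmoidal activation: the edge capacity grows with the flow |q|.
    The exponent gamma selects the member of the activation-function class."""
    return np.abs(q)**gamma / (1 + np.abs(q)**gamma)

def simulate(gamma=1.8, sigma=0.05, T=200.0, dt=0.01, omega=0.1):
    """Two routes between source and sink; their lengths are modulated
    periodically so the optimal route alternates in time. Conductivities
    evolve via an Euler-Maruyama step of dD = (f(|Q|) - D) dt + sigma dW."""
    D = np.array([0.5, 0.5])                 # edge conductivities
    n = int(T / dt)
    hist = np.empty((n, 2))
    for i in range(n):
        t = i * dt
        L = np.array([1 + 0.5*np.sin(omega*t), 1 - 0.5*np.sin(omega*t)])
        g = D / L                            # conductances of the two routes
        Q = g / g.sum()                      # unit total flow, split by conductance
        D += (activation(Q, gamma) - D) * dt + sigma*np.sqrt(dt)*rng.standard_normal(2)
        D = np.clip(D, 1e-6, None)           # keep conductivities positive
        hist[i] = Q
    return hist

flows = simulate()
```

Adaptivity would then be assessed by correlating `flows` with the sign of the length modulation.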

Data is available and will be made accessible. 

Project Donor: Prof. Dr. Giovanna Morigi and team (Theoretical physics) 

------------------------------------------------------------------------------------------------------------- 

Topic 3: Time-sensitive computer vision approaches for filling gaps in mobile coverage maps 

Mobile internet has propelled billions of people into the digital age within the past 20 years. As we can witness every day, “going digital” has altered our behaviour in many ways, from doomscrolling to swiping. Strikingly, the channels through which mobile internet influences us are still little understood. While there are multiple layers to the story (e.g., Instagram can foster economic productivity for one person while being a sole source of endless entertainment for another), they all rely on the same necessary condition: access to the internet. As many countries in the world leap-frogged fixed internet networks, mobile networks are the major access point to the internet for billions of people. However, data on how those mobile networks evolved is patchy. 

This project sets out to fill those gaps by creating a time series of mobile coverage maps for every country in the world for the past 25 years, based on 288 existing coverage maps (stored as .geotiff files) that can be used as training data. The students will implement 2-3 different computer vision approaches to this prediction problem, define appropriate evaluation metrics, and present the results of their comparative analysis in a report. All necessary data is available and will be provided. 
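Before training any computer vision model, a trivial temporal baseline is useful for grounding the evaluation metrics. The sketch below uses tiny synthetic rasters standing in for the .geotiff files (which could be read with e.g. rasterio) and fills each missing year with the map from the nearest observed year.

```python
import numpy as np

def fill_gaps(maps, years_observed, years_all):
    """Baseline gap filler: for each missing year, copy the per-pixel
    coverage from the nearest observed year (ties go to the earlier one).
    maps: (n_obs, H, W) binary coverage rasters. A trained CV model
    would replace this nearest-year lookup."""
    years_observed = np.asarray(years_observed)
    out = np.empty((len(years_all),) + maps.shape[1:], dtype=maps.dtype)
    for i, y in enumerate(years_all):
        j = np.argmin(np.abs(years_observed - y))
        out[i] = maps[j]
    return out

# Two observed 4x4 maps (2000 and 2010) -> a yearly series 2000-2010
obs = np.zeros((2, 4, 4), dtype=np.uint8)
obs[1, :2, :] = 1                           # coverage has expanded by 2010
series = fill_gaps(obs, [2000, 2010], range(2000, 2011))
```

Any proposed model should beat this baseline on held-out coverage maps; otherwise the temporal structure alone explains the performance.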

Project Donor: Prof. Dr. Ingmar Weber / Dr. Till Koebe (Societal Computing) 

------------------------------------------------------------------------------------------------------------- 

Topic 4: Modelling user access costs to mobile internet around the globe 

For people around the world to reap the digital dividend, being connected to the internet is essential. However, there are two sides to connectivity: having a network in place that provides connectivity, and having an end-user device available to access it. For most people, the mobile phone constitutes the primary access point to the world wide web. Thus, to understand the global digital divide, understanding access costs is key. While country-level time-series data on the prices of mobile internet are readily available, the same cannot be said for mobile phone prices. 

This project sets out to scrape local market information for mobile phones on a country-by-country basis for different price segments (basic feature phones, basic smartphones and more expensive smartphones) to come up with a global price database and a data pipeline to re-run data collection at a later point in time. 

Project Donor: Prof. Dr. Ingmar Weber / Dr. Till Koebe (Societal Computing) 

------------------------------------------------------------------------------------------------------------- 

Topic 5: Hybrid ML models for the segmentation of materials science image data 

When analyzing the microstructures of materials, data with very large variances are generated due to differences in the chemical composition of the material, manufacturing process parameters, or microscope settings during analysis. These variances can make machine learning-based evaluations, in this case the classification and segmentation of microstructure images, more difficult. The metadata of the microstructure images (chemical composition, process parameters, microscope settings, etc.) has not yet been taken into account in ML evaluations, although it can explain a large part of the variance that occurs; it can therefore be assumed that this metadata could improve the performance of ML models in certain applications. 

The students' task is therefore to program a hybrid segmentation model based on a U-Net that processes images and tabular metadata. ML approaches from medicine, where image data is linked to patient files, can provide inspiration. Two annotated data sets with associated tabular metadata are provided. These datasets will be used to compare the accuracy of microstructure segmentation with and without metadata. 
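One common fusion pattern, shown below framework-agnostically in NumPy (shapes and values are hypothetical), is to broadcast the per-image metadata vector across the spatial grid and concatenate it to the feature map, e.g. at the U-Net bottleneck. In the actual model, the same operation would be applied to tensors inside the network.

```python
import numpy as np

def fuse_metadata(features, metadata):
    """Fuse a per-image tabular metadata vector into a convolutional
    feature map by broadcasting it to every spatial position and
    concatenating along the channel axis.
    features: (C, H, W) activations; metadata: (M,) normalized values."""
    C, H, W = features.shape
    meta_map = np.broadcast_to(metadata[:, None, None], (metadata.size, H, W))
    return np.concatenate([features, meta_map], axis=0)

feat = np.random.rand(64, 16, 16)     # e.g. a U-Net bottleneck feature map
meta = np.array([0.2, 1.0, -0.5])     # e.g. composition / process / microscope values
fused = fuse_metadata(feat, meta)     # (64 + 3) channels
```

Alternatives worth comparing include injecting the metadata through a learned embedding or via feature-wise modulation rather than plain concatenation.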

Project Donor: Prof. Dr. Frank Mücklich / Dr. Martin Müller (Functional Materials) 

------------------------------------------------------------------------------------------------------------- 

Topic 6: Use of AI to segment the cell shapes of migrating cells 

Cell migration is important in multiple biological contexts, such as wound healing, the immune response, or cancer spreading. To study the dynamics of migrating cells, it is important to extract morphological properties of the cells (such as the eccentricity) and correlate those structural parameters with the cells' dynamics. The morphology of migrating cells can be extracted by fluorescently labelling the cell shape during migration in a live-cell tracking experiment. The automated tool should help to process the time-lapse movies and segment the cell shape based on the fluorescence signal. A further step is the segmentation of the cells based on phase-contrast images, which can be challenging due to image quality and artifacts caused by phase-contrast optics.  
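The morphological extraction step downstream of segmentation needs no ML at all: given a binary mask, the eccentricity follows from the second central moments (the same definition used by e.g. skimage.measure.regionprops). The synthetic ellipse below is purely illustrative.

```python
import numpy as np

def eccentricity(mask):
    """Eccentricity of a binary cell mask, computed from the
    eigenvalues of the second central moment (inertia) tensor."""
    ys, xs = np.nonzero(mask)
    x, y = xs - xs.mean(), ys - ys.mean()
    mxx, myy, mxy = (x*x).mean(), (y*y).mean(), (x*y).mean()
    common = np.sqrt((mxx - myy)**2 + 4*mxy**2)
    l1 = (mxx + myy + common) / 2      # variance along the major axis
    l2 = (mxx + myy - common) / 2      # variance along the minor axis
    return np.sqrt(1 - l2 / l1)

# A 2:1 ellipse has eccentricity sqrt(1 - (20/40)^2) ~ 0.866
yy, xx = np.mgrid[-20:21, -40:41]
cell = (xx**2 / 40**2 + yy**2 / 20**2) <= 1
e = eccentricity(cell)
```

The same moments also give orientation and axis lengths, so one function call per segmented cell per frame yields the time series to correlate with the migration dynamics.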

Images are available and are constantly generated. 

Project Donor: Prof. Dr. Franziska Lautenschläger / Lukas Schuster (Biophysics) 

------------------------------------------------------------------------------------------------------------- 

Topic 7: Use of AI to segment and quantify microtentacles in circulating tumor cells 

We are studying microtentacles: tubulin-based structures found in malignant cells when they detach from the primary tumor and enter the blood stream in order to colonize other tissues and form metastases. Microtentacles protrude from the cell membrane and vary in number and length depending on the cell line. We are interested in generating a tool that helps us quantify these parameters for each individual cell in our experiments, which usually consist of images with several cells in the field of view. The microtentacles are fluorescently labeled and can be easily identified.  
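To make the quantification idea concrete (the criterion below is a deliberately simple assumption, not the lab's method): protrusions of a binary cell mask can be counted by sampling the mask on a ring just outside the cell body and counting connected runs of foreground pixels along that ring.

```python
import numpy as np

def count_protrusions(mask, body_radius):
    """Count protrusions of a binary cell mask by sampling the mask on a
    circle just outside the cell body and counting rising edges (runs of
    'on' samples) along the circle. Illustrative criterion only."""
    cy, cx = np.array(np.nonzero(mask)).mean(axis=1)   # mask centroid
    theta = np.linspace(0, 2*np.pi, 720, endpoint=False)
    ys = np.round(cy + body_radius*np.sin(theta)).astype(int)
    xs = np.round(cx + body_radius*np.cos(theta)).astype(int)
    on = mask[ys, xs]
    # count 0 -> 1 transitions along the circular ring
    return int(np.sum(on & ~np.roll(on, 1)))

# Synthetic cell: a disk body with three bar-shaped protrusions
cell = np.zeros((101, 101), dtype=bool)
yy, xx = np.mgrid[-50:51, -50:51]
cell[xx**2 + yy**2 <= 15**2] = True
cell[48:53, 50:95] = True   # protrusion to the right
cell[50:95, 48:53] = True   # protrusion downward (image coordinates)
cell[48:53, 6:50] = True    # protrusion to the left
n = count_protrusions(cell, body_radius=25)
```

Lengths could be estimated analogously by increasing the ring radius until each run disappears; a real tool would likely use skeletonization of the fluorescence mask instead.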

Images are available and are constantly generated. 

Project Donor: Prof. Dr. Franziska Lautenschläger / Enrique Colina (Biophysics) 

------------------------------------------------------------------------------------------------------------- 

Topic 8: Preserving Safety in Fine-Tuned LLMs: Evaluating the Impact of Different Fine-Tuning Methods 

Recent research has shown that LLM safety guardrails—established through costly RLHF processes—can be easily compromised with minimal fine-tuning on harmful data. Even fine-tuning on benign data can degrade a model’s safety. As more users fine-tune models for specific applications, this raises serious risks for end-users. However, one unexplored question is whether different fine-tuning methods vary in their ability to preserve a model’s safeguards. Specifically, this project aims to answer the question: 

Which fine-tuning method is most effective at preserving safeguards when fine-tuned on benign data? 

For that, we will investigate various fine-tuning approaches, including Self-Supervised Fine-Tuning, Direct Preference Optimization, and combinations of both. Additionally, we may explore efficiency-focused techniques such as LoRA, QLoRA, and full fine-tuning. The resulting models will be evaluated and compared in terms of their safety performance based on established safety benchmarks such as AgentHarm. 
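For the efficiency-focused variants, a configuration sketch with the Hugging Face peft library might look like the following. This is illustrative only: the model identifier and target modules are placeholders, the hyperparameters are not a recommendation, and this is a config fragment, not a training script.

```python
# Illustrative LoRA configuration sketch; placeholders, not a recommended setup.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("some-open-model")  # placeholder id
lora = LoraConfig(
    r=8,                                   # rank of the low-rank adapters
    lora_alpha=16,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections (model-dependent)
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)         # only adapter weights are trainable
model.print_trainable_parameters()
```

QLoRA would additionally load the base model in 4-bit precision; full fine-tuning skips the adapter wrapping entirely, which is part of what makes the comparison interesting.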

 We will likely use open models and fine-tune and evaluate them on publicly available and commonly used datasets. 

Project Donor: Prof. Dr. Ingmar Weber (Societal Computing) 

 ------------------------------------------------------------------------------------------------------------- 

Topic 9: Decoding logical errors on a quantum computer using neural nets 

Quantum computing hardware is inherently noisy, which means that gate operations have a nonzero infidelity and errors that occur during the computation need to be corrected in real time. A fault-tolerant quantum computation thus requires an efficient means to detect and correct errors. Quantum error correction achieves this by encoding a logical qubit redundantly in many physical qubits and by repeatedly measuring error syndromes: binary variables that indicate the presence and potential location of an error. One critical step in quantum error correction is to infer the required correction from the obtained syndrome data. This decoding task can be phrased as a classification problem and can thus be solved using machine learning methods. In this project, you will use neural networks (e.g. a feedforward or a recurrent neural network) to perform this task for the surface code, which is the leading candidate to achieve quantum advantage. 
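The classification framing can be made concrete on the simplest possible code. The sketch below generates (syndrome, error) pairs for the 3-qubit bit-flip repetition code, a toy stand-in for the surface code, and builds a lookup-table decoder by majority vote; this is the baseline a neural decoder should match on small codes and beat once lookup tables become infeasible.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n, p=0.1):
    """Generate (syndrome, error) pairs for the 3-qubit bit-flip
    repetition code: errors e in {0,1}^3 with i.i.d. flip probability p,
    syndrome s = (e1 XOR e2, e2 XOR e3)."""
    e = (rng.random((n, 3)) < p).astype(int)
    s = np.stack([e[:, 0] ^ e[:, 1], e[:, 1] ^ e[:, 2]], axis=1)
    return s, e

# "Decoder" as a classifier: map each syndrome to the most likely error.
# A neural network would learn this mapping from data; here we build it
# explicitly by majority vote over samples.
syndromes, errors = sample(20000)
table = {}
for s_bits in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    idx = np.all(syndromes == s_bits, axis=1)
    vals, counts = np.unique(errors[idx], axis=0, return_counts=True)
    table[s_bits] = vals[np.argmax(counts)]
```

For the surface code the syndrome is a spacetime volume of bits rather than two, which is why feedforward or recurrent networks become attractive.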

Data will be generated during the project. 

Project Donor: Prof. Dr. Peter P. Orth & Prof. Dr. Markus Bläser (AI & Quantum Computing) 

------------------------------------------------------------------------------------------------------------- 

Topic 10: Team and Player Strengths and Expected Goals 

Ranking systems, such as the Elo algorithm and related methodologies like Glicko, are widely used in sports ranging from individual competitions like chess, table tennis, or tennis to team sports like football. These systems aim to estimate player strength while enabling fair rankings even when only a limited subset of players compete against each other. While official ranking systems may only be fed with pure results for fairness reasons, player- and team-strength models can also integrate potentially predictive factors such as a player's age, height, handedness, etc. The objective of this project is to reimplement Elo ratings and official ranking systems using real-world data and to compare their predictive accuracy for future matches with custom-designed player-/team-strength models in either tennis, football (soccer), or cricket. Additionally, students are expected to compare different elements of an Expected Goals (xG) model and optimize it on historic performances of teams and players. 
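The core Elo update is compact; a minimal sketch follows, using the conventional chess constants (K = 32, scale 400), which would need to be tuned per sport.

```python
def elo_expected(r_a, r_b):
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a, r_b, score_a, k=32):
    """Update both ratings after a match; score_a is 1 (A wins),
    0.5 (draw) or 0 (A loses). The update is zero-sum."""
    e_a = elo_expected(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b + k * ((1 - score_a) - (1 - e_a))

# Two equal players, A wins: A gains k/2 points, B loses the same
a, b = elo_update(1500, 1500, 1.0)
```

A strength model with covariates can then be benchmarked against this by comparing the log-loss of `elo_expected` predictions on held-out matches.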

Data for each sport is available. 

Project Donor: Prof. Dr. Pascal Bauer & Luis Holzhauer (Sports Analytics) 

------------------------------------------------------------------------------------------------------------- 

Topic 11: Historical settlement and urban development in digital mapping: data extraction, analysis and visualisation of maps from the Saar region (20th/21st century) 

The research project combines questions and topics of historical research with modern methods of data analysis and automation. It focuses on the computer-aided analysis of scanned historical maps from the 20th and 21st centuries, which document the workers' settlements and industrial cities in the Saar region at different points in time. The aim of the project is to use data extraction, analysis and visualisation techniques to gain new insights into the spatial, social and temporal development of the settlements and cities. 

Possible tasks will include the (object) recognition and digitisation of settlement structures using image processing techniques, the use of clustering algorithms – for example to classify settlement types or development phases – and the comparison of historical development with socio-economic or economic changes in the region. 

A main aspect of the project is the georeferencing of the scanned maps in order to transfer historical data into modern spatial contexts. This involves an interdisciplinary approach to maps as a visual medium of spatial and social transformation. The mapping and processing of these materials opens up new approaches to the analysis of settlement and urban development over a longer period of time and reveals structural continuities and discontinuities in the [urban and settlement] development of the Saar region. 
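Georeferencing can be prototyped as a least-squares affine fit from ground control points, i.e. points identified both on the scanned map and in a modern coordinate system. The sketch below uses made-up pixel coordinates and longitude/latitude values near the Saar region purely as an illustration.

```python
import numpy as np

def fit_affine(px, geo):
    """Least-squares affine transform mapping scanned-map pixel coordinates
    to geographic coordinates from >= 3 ground control points.
    px, geo: (N, 2) arrays. Returns A (2x2) and t (2,) with geo ~ px @ A.T + t."""
    X = np.hstack([px, np.ones((len(px), 1))])       # rows [x, y, 1]
    coef, *_ = np.linalg.lstsq(X, geo, rcond=None)   # (3, 2) coefficient matrix
    return coef[:2].T, coef[2]

# Control points for a map that is scaled, flipped in y, and shifted
px = np.array([[0, 0], [100, 0], [0, 100], [100, 100]], float)
geo = np.array([[6.9, 49.3], [7.0, 49.3], [6.9, 49.2], [7.0, 49.2]])
A, t = fit_affine(px, geo)
pred = px @ A.T + t
```

In practice GIS tools handle this (including non-affine warps for distorted scans), but the affine fit shows what the control points determine mathematically.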

Project Donor: Joana Baumgärtel, Dr. Birgit Metzger (Cultural and Media History) 

------------------------------------------------------------------------------------------------------------- 

Topic 12: From Liberation to Polarisation? Commemoration and Discussion of May 8th on Social Media 

May 8th, 2025 marks the 80th anniversary of the liberation from National Socialism and the end of the Second World War. As social media increasingly serve as arenas for the commemoration, negotiation as well as distortion of history, the project aims to explore how the anniversary and the historical event are debated, remembered and instrumentalised on Instagram and TikTok. By using different data collection and mining approaches, the project focuses on understanding how historical memory is shaped in the digital age, putting a special emphasis on how it might be influenced by present-day events such as the war in Ukraine or the political polarisation in the United States. 

Given the challenges associated with data access to audiovisual social media, possible tasks are divided into two parts: (1) testing and evaluating different data collection methods, such as scraping and API usage, and (2) processing, visualisation and analysis of the gathered data using various NLP and ML techniques to identify topics, actors or networks. Through its interdisciplinary approach, the project offers a more comprehensive perspective on the impact of social media platforms and contemporary issues on historical memory and the perception of history. 

Project Donor: Mia Berg (Cultural and Media History) 

------------------------------------------------------------------------------------------------------------- 

Topic 13: ‘Anarchists', 'terrorists', 'freedom fighters' – how do the media talk about social movements? The potential of historical press analysis 

Words are powerful and shifts in discourse are politically relevant. We see this today not only in the context of migration or gender. It makes a big difference, for example, whether the media speak of ‘climate activists’, ‘Klimakleber’ (‘climate gluers’) or ‘climate terrorists’. This is why media discourse, and especially the press, is also a central source for historical research. However, this presents several challenges: although many European newspapers from the 19th and 20th centuries have been digitised in various formats, they are often not systematically searchable as full texts. Where a full-text search is possible, it is not clear how well it works. Finally, there is the question of how best to analyse this mass data and how data science tools can help, for example by recognising typical word combinations. 

This leads to the possible tasks for the project group: 1) make existing texts searchable and analysable as well as possible using word recognition; 2) test how well existing word recognition works in different platforms and what error rates can be expected; 3) think about the development of tools for better analysis of such textual mass data with the project supervisors. 
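The "typical word combinations" of task 3 can be approached with pointwise mutual information (PMI) over adjacent word pairs, which scores how much more often two words co-occur than their individual frequencies predict. A minimal sketch on a toy text (illustrative only, not a ready analysis tool):

```python
import math
from collections import Counter

def collocations(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information:
    pmi(a, b) = log( p(a, b) / (p(a) * p(b)) ).
    High-PMI pairs are typical combinations rather than merely frequent ones."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (a, b), c in bigrams.items():
        if c >= min_count:
            scores[(a, b)] = math.log(
                (c / (n - 1)) / ((unigrams[a] / n) * (unigrams[b] / n))
            )
    return sorted(scores, key=scores.get, reverse=True)

text = ("climate terrorists block roads . the the the climate terrorists "
        "protest . the press covers climate terrorists").split()
top = collocations(text)
```

Note how the frequent but uninformative pair ('the', 'the') is outranked: PMI corrects for raw frequency, which matters on OCR-noisy newspaper corpora.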

Project Donor: Prof. Dr. Fabian Lemmes (Cultural and Media History) 

------------------------------------------------------------------------------------------------------------- 

Topic 14: Multilingual summarization of court judgments 

In this project, students are expected to automate the summarization of judgments of the European Court of Justice (in German: Europäischer Gerichtshof, EuGH) using large language models. The European Court of Justice publishes most of its judgments in all official languages of the European Union member states. The students' task is to investigate whether a summarization approach can benefit from being provided the same judgment in different languages at the same time. One possible solution could be to implement a RAG system which combines the different language versions of the judgments during indexing and retrieval. Another could be to provide the different language versions during prompting. Students are free to explore other possible solutions. The task can be evaluated using automated metrics like ROUGE or BERTScore.  
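Established implementations of ROUGE exist and should be used for the actual evaluation, but a minimal ROUGE-1 F1 makes the metric concrete: it is unigram overlap between the generated and the reference summary.

```python
from collections import Counter

def rouge1_f(candidate, reference):
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision and
    recall between a candidate summary and a reference summary."""
    c = Counter(candidate.lower().split())
    r = Counter(reference.lower().split())
    overlap = sum((c & r).values())        # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f("the court dismissed the appeal",
                 "the appeal was dismissed by the court")
```

Because ROUGE is purely lexical, it rewards wording overlap rather than legal correctness, which is one reason to report an embedding-based metric like BERTScore alongside it.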

The data for this project is available at: https://huggingface.co/datasets/joelniklaus/eurlex_resources 

Project Donor: Prof. Dr. Christoph Sorge, Bianca Steffes (Legal Informatics) 
