News
Guest Lecture by Michel Steuwer (TU Berlin)
Written on 28.01.2025 06:13 by Sebastian Hack
Hi all,
Michel Steuwer from TU Berlin is visiting today and he will give a guest lecture with the title: "How to design the next 700 optimizing compilers".
Abstract:
Massively parallel hardware combined with carefully optimized software has enabled the deep learning revolution. To deliver the efficiency and performance demanded by the next generations of AI applications, a zoo of highly specialized hardware devices is being developed. Writing software for these devices is highly complex, and only the largest companies can afford the massive investment in such short-lived software, which currently has to be rewritten and re-optimized for each new generation of hardware devices.
Current optimizing compilers that automatically turn high-level programs into low-level code often fail to deliver satisfactory performance. They disappoint by failing to exploit increasingly specialized hardware features, but they equally disappoint by failing to perform crucial high-level optimizations in many important, if less mainstream, application domains.
In this talk, I will present our approach for designing the next generation of optimizing compilers. These compilers systematically optimize domain-specific applications for a diverse set of specialized hardware. They support a wide range of optimization use cases, ranging from automatic and AI-driven design space exploration techniques to precise control of optimizations by performance engineers, as well as gradual combinations of automation and control. Key to our design is that we embrace extensibility and composability of both computations and optimizations. Computations are represented by a pattern-based intermediate representation whose fundamental building blocks are flexible, generic patterns. This intermediate representation is easily extensible with domain- and hardware-specific patterns. Optimizations are composed of simple rewrite rules, either in a purpose-built strategy language that allows precise control of optimization strategies, or semi-automatically using equality saturation. The compiler is easily extensible with domain- and hardware-specific optimization strategies, and experts can control the optimization process to varying degrees.
I aim to demonstrate that this generic and flexible design achieves high performance, comparable to existing domain-specific compilers, on existing massively parallel hardware, and to show exciting future research directions that our approach opens up.