Three become one: How supercomputers work together for fusion research

06.11.2025

Complex simulations such as plasma fusion require enormous computing power that exceeds the capacity of a single supercomputer. But what if the tasks were split between several supercomputers? A research team led by Dirk Pflueger has found a way for three supercomputers at different locations to work together so efficiently that they enable the largest simulations with minimal additional effort.

Understanding what happens during plasma fusion is one of the major research goals related to the energy transition. Plasma fusion, a process in which atomic nuclei fuse at temperatures of around 100 million degrees Celsius, could be an almost inexhaustible source of energy free from the dangers of nuclear fission. Before fusion power plants can become a reality, many problems still need to be solved. However, even the construction of test facilities such as the ITER tokamak in France costs several billion euros. Research into nuclear fusion therefore relies on simulations that can reproduce these experiments using realistic models.



An interdisciplinary undertaking

Advancing such complex simulations demands an interdisciplinary team that combines expertise in high-performance computing, algorithm design, mathematical modeling, and physics. Under the leadership of Dirk Pflueger, Professor of Scientific Computing at the University of Stuttgart, the large DFG-funded project was launched back in 2013 together with colleagues from the University of Bonn, TU Munich, and the Max Planck Institute for Plasma Physics in Garching.

The GENE simulation code developed at the Max Planck Institute for Plasma Physics enables the team to simulate turbulence in magnetized plasmas. However, it has so far been possible to carry out only a tiny part of a complete simulation, because even supercomputers cannot process such enormous amounts of data; it would exceed their computing and storage capacity. Mathematical methods, however, can help to simplify the problem and thus reduce the amount of data without compromising the accuracy of the simulation.

Discretization is the process of converting continuous data into discrete, finite values or intervals. The aim is to make a continuously modeled problem manageable for processing by digital computers or to simplify the analysis.

“To bring the physics into the computer, we perform a standard discretization, meaning the problem is broken down dimension by dimension,” explains Pflueger. For example, a cubic domain can be divided into 10 grid points per direction. In two dimensions, this results in 10² grid points, and in three dimensions, 10³. “However, in plasma physics, the problem is no longer just three-dimensional but rather six-dimensional. That’s three spatial dimensions plus three velocity dimensions, giving 10⁶, plus time,” says Pflueger. Ten grid points per direction are also far from sufficient for plasma fusion: with 100 grid points in each of the six dimensions, this amounts to one trillion points. “This is where we are faced with the curse of dimensionality, a major challenge for both discretization and subsequent simulation,” says Pflueger.
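The arithmetic behind the curse of dimensionality is easy to check. The sketch below only reproduces the counts quoted above; the 8-bytes-per-value memory estimate is an illustrative assumption, not a figure from the article:

```python
def grid_points(points_per_dim: int, dims: int) -> int:
    # A regular grid with n points per dimension in d dimensions
    # has n ** d points in total.
    return points_per_dim ** dims

# The examples from the text: 10 points per direction.
assert grid_points(10, 2) == 100      # 10² in two dimensions
assert grid_points(10, 3) == 1_000    # 10³ in three dimensions

# Six dimensions (3 space + 3 velocity) with 100 points each:
n = grid_points(100, 6)
print(f"{n:.2e} grid points")         # 1.00e+12 -- one trillion

# Storing one double-precision value (8 bytes) per point -- an
# illustrative assumption -- already requires terabytes of memory:
print(f"{n * 8 / 1e12:.0f} TB per stored field")  # 8 TB
```

The exponential growth in `dims` is exactly why halving the work per dimension is not enough; the number of points itself has to scale differently with dimension.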

Methods for simplifying the problem

The researchers therefore applied a new discretization method: sparse grid technology, a model-reduction approach for high-dimensional problems. Pflueger had already worked on this numerical approach in his own dissertation and carried out important preliminary work. Together with several generations of doctoral students, he has since refined the method, which makes it possible to mitigate the curse of dimensionality.

In sparse grid technology, the number of grid points is reduced to simplify computation. The method switches to a hierarchical representation (a change of basis), which allows a drastic reduction in the number of grid points. A sparse grid solution can also be represented by a suitable combination of partial solutions on coarser, anisotropic grids: the partial results are combined in such a way that a good approximation of the desired solution is achieved. This approach requires far fewer computation points and is well suited to parallelization.

The scientists also set out to run the first simulation operating simultaneously on multiple supercomputers rather than on a single one. To do this, they needed a codebase that not only performs the simulation but also coordinates the interaction between the machines. “This poses a challenge for high-performance computing systems,” says Pflueger. “It requires careful consideration of how data is transferred and how computational load is balanced.”

DisCoTec coordinates and distributes computing tasks across multiple supercomputers

The scientists in Pflueger’s team initially developed an open-source code to enable the simulation to be distributed and executed on the various computing systems. Theresa Pollinger then made the breakthrough: building on this preliminary work during her doctoral studies with Pflueger, she developed the DisCoTec software, which uses sparse grid technology and distributes the simulation across the various supercomputers.

“I considered how to divide the work among the machines so that only minimal data exchange would be required, because only a tiny fraction of the vast amount of data needs to be exchanged.” Her work was not just about writing and testing code. “We examined questions such as: Does the performance match our expectations? Are the results still accurate? How can we make it faster? And then we started all over again,” says Pollinger. She also improved the underlying numerical method and developed new algorithms suitable for parallelization, both within the supercomputers and across them. “Only when the full capacity of the machine is used does it make sense to distribute the simulation further,” says Pollinger.
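Why only a tiny fraction of the data has to cross machine boundaries can be sketched with the same combination-technique setup: roughly speaking, what the machines exchange during the combination step is on the order of the sparse grid, i.e. the union of all component-grid points, while each machine holds its full component grids locally. The scheme, levels, and the equating of exchanged data with sparse grid size below are illustrative assumptions, not the project's actual configuration:

```python
from itertools import product

def component_levels(level, dim):
    # Level vectors of the classical combination technique:
    # each l_i >= 1 and level - dim + 1 <= |l|_1 <= level.
    for l in product(range(1, level + 1), repeat=dim):
        if level - dim + 1 <= sum(l) <= level:
            yield l

def grid_coords(l):
    # Points of an anisotropic grid as coordinate tuples i / 2^l_j;
    # these fractions are exact binary floats, so shared points from
    # different grids compare equal in the set below.
    axes = [[i / 2 ** lj for i in range(2 ** lj + 1)] for lj in l]
    return product(*axes)

level, dim = 6, 3
held = 0           # values stored across all component grids
sparse = set()     # union of all component-grid points = the sparse grid
for l in component_levels(level, dim):
    pts = list(grid_coords(l))
    held += len(pts)
    sparse.update(pts)

full = (2 ** level + 1) ** dim
print(f"full grid: {full:,}   held locally: {held:,}   "
      f"exchanged (~sparse grid): {len(sparse):,}")
```

The point of the sketch is the ordering: the sparse grid union is smaller than the locally held data, which in turn is tiny compared to the full grid that a naive distribution would have to ship around.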

“Hawk”, “SuperMUC-NG”, and “JUWELS” working in concert

The team designed an experiment to prove that simulations distributed across several supercomputers are actually possible. They partnered with the computing centers hosting the supercomputers “Hawk” at the High Performance Computing Center Stuttgart, “SuperMUC-NG” at the Leibniz Supercomputing Center in Garching near Munich, and “JUWELS” at Forschungszentrum Juelich. Pollinger encountered some rather mundane problems in the process. For example, the security measures at the computing centers do not permit data to be exchanged directly between computing nodes over the Internet. In addition, moving data around (i.e., communication within and between computers) is now more expensive than the computation itself.


“I really put the IT support teams at the major computing centers through their paces. There were often special requests such as: ‘Could we repurpose your file transfer system for part of our simulation?’ or: ‘Can you give me your whole machine for exactly these three hours in a few months’ time?’ I think I was a bit annoying, and they must have thought: ‘Oh, another email from her’,” says Pollinger. Nevertheless, the support staff did everything they could to make her research possible, and Pollinger is particularly grateful for this.

She eventually developed a six-dimensional geometric configuration of the workflow that resembles a star, which means the computers have to communicate with each other far less. By optimizing load distribution, memory management, and parallel file writing, Pollinger was able to use the computing resources much more efficiently.

The star shows how the data is divided between the three supercomputer systems. Each circle represents a component grid in the experiment on the three supercomputers; the six corners represent the six dimensions. The blue grids run on Hawk, the pink grids on SuperMUC-NG, and the green grids on JUWELS. Where the colors meet, data was transmitted between the machines via the Internet. Image: Theresa Pollinger.

“The overlapping colored circles indicate where data exchange occurs. So where blue and pink or pink and green meet, communication must take place. Most grids don’t need to communicate because they are contiguous areas on the same machine and nothing has to be sent over the Internet,” says Pollinger.

Conquering the “curse of dimensionality”

The scientists have thus achieved a major breakthrough: They overcame the curse of dimensionality for their application. The concept of combining methods for reducing the problem and distributing the tasks across several supercomputers enables simulations on scales that would otherwise be impossible. “To my knowledge, this is the first time that someone has managed to run such a high-level simulation on several supercomputers at the same time,” says Pflueger.

Even though it was not yet a full plasma simulation but rather a somewhat simplified version, it was still large enough to simulate a tokamak. The scientists’ work shows that it is fundamentally possible to use several supercomputers as a coherent system to simulate complex processes. The results are relevant not only for plasma physics but also for other scientific disciplines such as finance, earthquake simulation, astrophysics, quantum chemistry, and climate research.

Manuela Mild | SimTech Science Communication


Read more

Pollinger, T., et al. (2025). DisCoTec: Distributed higher-dimensional HPC simulations with the sparse grid combination technique. Journal of Open Source Software, 10(106), 7018. https://doi.org/10.21105/joss.07018

Pollinger, T., Van Craen, A., Offenhaeuser, P., & Pflueger, D. (2024). Realizing joint extreme-scale simulations on multiple supercomputers – two superfacility case studies. In SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta, GA, USA, pp. 1–17. https://doi.org/10.1109/SC41406.2024.00104

Pollinger, T., Van Craen, A., Niethammer, C., Breyer, M., & Pflueger, D. (2023). Leveraging the compute power of two HPC systems for higher-dimensional grid-based simulations with the widely-distributed sparse grid combination technique. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’23), Article 84, pp. 1–14. https://doi.org/10.1145/3581784.3607036

About the scientists

Dirk Pflueger is Professor of Scientific Computing and Head of the Institute for Parallel and Distributed Systems at the University of Stuttgart. He also studied computer science with a minor in music theory there. He was particularly inspired by the opportunity to work with real artists in the algorithmic composition seminar. Pflueger developed his expertise in sparse grid technology through his doctoral research at the Technical University of Munich. To build on this research, he later led a DFG-funded project within the SPPEXA program (Software for Exascale Computing) together with his doctoral advisor Hans Bungartz (TU Munich), mathematician Michael Griebel (University of Bonn), and physicist Frank Jenko (Max Planck Institute for Plasma Physics). 

The aim of the program was to develop numerical methods that would enable large-scale simulations such as plasma fusion on supercomputers. In 2012, Pflueger began his appointment as the first SimTech Junior Professor for Scientific Computing at the University of Stuttgart; he has since been appointed to his current full professorship. His research focuses on scientific computing, high-performance computing, high-dimensional approximation, and numerical machine learning.

Theresa Pollinger holds a bachelor’s degree in mechatronics and a master’s degree in computational engineering. She earned her doctorate in computer science and considers herself both an engineer and a computer scientist. She studied at FAU Erlangen–Nuremberg and earned her doctorate in Stuttgart under Dirk Pflueger, whom she first met at the joint summer academy of the University of Stuttgart, FAU Erlangen–Nuremberg, and TU Munich, where he taught a course combining hiking and programming. She now lives in Okinawa Prefecture, Japan, where she works in the Supercomputing Performance Research Team at the RIKEN Center for Computational Science. 

As part of her research, she explores the interplay among simulations, high-performance computing hardware, communication networks, algorithms, numerical models, predictive methods, and software along with their underlying concepts.
