Knowledge Shared is Knowledge Gained: the 1st CEEC Community Workshop
This past December 13th, CEEC held its first annual community workshop, hosted by our consortium partner Friedrich-Alexander-Universität in Erlangen, Germany, and online. At least once a year, CEEC holds a community workshop where we share our progress and what we have learned over the past year of work. This year’s workshop was titled “Energy-Efficient, Fault Resilient, and Scalable Solvers for CFD Codes”.
Among the 19 in-person and 26 virtual attendees were representatives from every CEEC partner institution and researchers from across the European CFD ecosystem. Topics covered included the progress we’ve made using optimised multigrid methods to maximise parallelism, mixed-precision algorithms to decrease time to solution, monitoring to increase fault resilience, and more. The full slides and a summary of each talk are below, along with the starting timestamp for watching each talk in the embedded video!
It’s not quite the same as being there with us, but if you like what you see here, we encourage you to sign up for the next community workshop on March 25, 2024!
Watch the Workshop
- Parallel Multigrid Solvers
- Optimizing Multigrid Solvers
- Mixed Precision
- Fault-Resilient Algorithms
- Adaptive Mesh Refinement
- Topology Optimization
Scalable task-parallel multigrid solvers on GPUs
- Starting at 0:00
We present a novel formulation of an additive overlapping Schwarz multigrid method. The new formulation exploits all available task parallelism for increased GPU utilization and improved sustained strong scalability.
Optimizing multigrid solvers using grammar-guided genetic programming
- Starting at 24:00
Multigrid methods, despite being known to be asymptotically optimal algorithms, depend on the careful selection of their individual components for efficiency. They are also mostly restricted to standard cycle types like V-, F-, and W-cycles. We use grammar rules to generate arbitrarily shaped cycles, in which the smoothers and their relaxation weights are chosen independently at each step within the cycle. We observe that the optimized flexible cycles provide higher efficiency and better performance than the standard cycle types.
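To make the idea of per-step relaxation weights concrete, here is a minimal sketch for a 1D Poisson problem. It is not the grammar-guided search from the talk and says nothing about how the speakers implement their cycles: the cycle shape is a plain V-cycle, the smoother is always weighted Jacobi, and the names `cycle` and `FLEXIBLE_SCHEDULE` as well as all weight values are made up for illustration. It only shows the kind of degrees of freedom such a search could tune, namely how many sweeps run at each level and which weight each sweep uses.

```python
def neighbours(u, i):
    """Left and right neighbours of u[i], with zero Dirichlet boundary values."""
    left = u[i - 1] if i > 0 else 0.0
    right = u[i + 1] if i < len(u) - 1 else 0.0
    return left, right


def smooth(u, f, h, weight):
    """One weighted-Jacobi sweep for -u'' = f discretised with spacing h."""
    new = u[:]
    for i in range(len(u)):
        left, right = neighbours(u, i)
        new[i] = u[i] + weight * 0.5 * (h * h * f[i] + left + right - 2.0 * u[i])
    return new


def residual(u, f, h):
    """Residual r = f - A u for the standard 3-point Laplacian."""
    r = []
    for i in range(len(u)):
        left, right = neighbours(u, i)
        r.append(f[i] - (2.0 * u[i] - left - right) / (h * h))
    return r


def restrict(r):
    """Full-weighting restriction from 2m+1 fine points to m coarse points."""
    m = (len(r) - 1) // 2
    return [0.25 * (r[2 * j] + 2.0 * r[2 * j + 1] + r[2 * j + 2]) for j in range(m)]


def prolong(e, n_fine):
    """Linear interpolation of a coarse correction back to the fine grid."""
    out = [0.0] * n_fine
    for j, value in enumerate(e):
        out[2 * j] += 0.5 * value
        out[2 * j + 1] += value
        out[2 * j + 2] += 0.5 * value
    return out


def cycle(u, f, h, schedule, level=0):
    """V-shaped cycle whose sweeps and weights are read from `schedule`."""
    if len(u) == 1:
        return [0.5 * h * h * f[0]]  # exact solve on the single-point grid
    pre, post = schedule[min(level, len(schedule) - 1)]
    for weight in pre:
        u = smooth(u, f, h, weight)
    coarse_rhs = restrict(residual(u, f, h))
    coarse_err = cycle([0.0] * len(coarse_rhs), coarse_rhs, 2.0 * h, schedule, level + 1)
    u = [a + b for a, b in zip(u, prolong(coarse_err, len(u)))]
    for weight in post:
        u = smooth(u, f, h, weight)
    return u


# Illustrative values only: (pre-smoothing weights, post-smoothing weights) per level.
FLEXIBLE_SCHEDULE = [
    ([0.8, 0.6], [0.7]),  # finest level: two pre-sweeps, one post-sweep
    ([0.7], [0.6, 0.8]),
    ([0.6], [0.7]),
    ([0.7], [0.7]),
]

n = 31                      # interior points; must be 2**k - 1
h = 1.0 / (n + 1)
f = [1.0] * n
u = [0.0] * n
initial = max(abs(r) for r in residual(u, f, h))
for _ in range(5):
    u = cycle(u, f, h, FLEXIBLE_SCHEDULE)
final = max(abs(r) for r in residual(u, f, h))
print("residual reduction after 5 cycles:", final / initial)
```

A search procedure would treat the entries of `FLEXIBLE_SCHEDULE` (and, in the talk's setting, the cycle shape and smoother type as well) as the design space and score each candidate by measured convergence and cost.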
A tool-driven approach toward mixed-precision and sustainable solvers for CFD codes
- Starting at 48:00
In scientific computing, we solve problems expressed by equations using computers. Often, these are supercomputers built from hundreds or thousands of processors, used to solve larger, more challenging problems. In practice, we work with finite representations of numbers and with arithmetic operations that lack associativity. To compensate, we tend to use the highest precision available, usually double precision. In CEEC, we propose a sustainable and reliable approach for energy-efficient CFD codes: scan the code with an arithmetic tool to determine its actual precision needs, redesign the algorithm with mixed precision, and verify the improved code in applications.
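The two numerical effects mentioned above are easy to reproduce. The sketch below uses only the Python standard library and is not the arithmetic tool from the talk; the helper names `to_f32`, `sum_single`, and `sum_mixed` are made up for illustration. It first shows that floating-point addition is not associative, then compares a sum accumulated entirely in single precision with one that stores its inputs in single precision but keeps the accumulator in double precision, which is one simple form of mixed precision.

```python
import math
import struct


def to_f32(x):
    """Round a Python float (IEEE 754 binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sum_single(values):
    """Accumulate in single precision: every partial sum is rounded to binary32."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + to_f32(v))
    return acc


def sum_mixed(values):
    """Inputs stored in single precision, accumulator kept in double precision."""
    acc = 0.0
    for v in values:
        acc += to_f32(v)
    return acc


# Floating-point addition is not associative: the grouping changes the result.
left_first = (0.1 + 0.2) + 0.3
right_first = 0.1 + (0.2 + 0.3)
print(left_first == right_first)  # False

data = [0.1] * 100_000
# Correctly rounded sum of the values as they are stored in single precision.
reference = math.fsum(to_f32(v) for v in data)
print("single-precision error:", abs(sum_single(data) - reference))
print("mixed-precision error: ", abs(sum_mixed(data) - reference))
```

In this toy case only the accumulator needs the higher precision, which is the kind of finding a precision analysis of a real code is meant to surface.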
Fault-resilient algorithms for Exascale CFD
- Unfortunately not recorded
To ensure fault-resilient runs of CFD codes, we implement a dynamic checkpointing method, which writes a checkpoint file when there is a significant likelihood of a system failure, as detected by in-band hardware metrics. The performance monitoring tool LIKWID gathers these metrics as the simulation runs, and they are then assessed by a Slurm dependency job. If either the performance or the temperature fails to meet our criteria, this is flagged as an indication of a potential future failure. A signal is then sent to Neko to promptly create a restart file and terminate the process. Concurrently, the dependency job is queued, ready to resume the simulation from the restart file.
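The signal-then-restart pattern described here can be sketched in a few lines. The example below is a toy, not Neko's interface and not the actual monitoring setup: the class `ToySimulation`, the function `hypothetical_monitor`, the choice of `SIGUSR1`, and the JSON restart file are all assumptions made for illustration, and the health check that LIKWID metrics would drive is replaced by a hard-coded trigger. It requires a POSIX system.

```python
import json
import os
import signal
import tempfile

# Assumption: the monitoring side notifies the solver with SIGUSR1 (POSIX only).
CHECKPOINT_SIGNAL = signal.SIGUSR1


class ToySimulation:
    """Stand-in for a time-stepping solver; not Neko's actual interface."""

    def __init__(self, checkpoint_path):
        self.checkpoint_path = checkpoint_path
        self.step = 0
        self.state = 0.0
        self.stop_requested = False

    def request_stop(self, signum, frame):
        # Keep the handler tiny: set a flag and act on it at a safe point.
        self.stop_requested = True

    def write_checkpoint(self):
        # Write to a temporary file first so a crash cannot leave a torn file.
        tmp_path = self.checkpoint_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump({"step": self.step, "state": self.state}, fh)
        os.replace(tmp_path, self.checkpoint_path)

    def restore(self):
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path) as fh:
                saved = json.load(fh)
            self.step, self.state = saved["step"], saved["state"]

    def run(self, n_steps, on_step=None):
        """Advance to n_steps; return False if stopped early after checkpointing."""
        while self.step < n_steps:
            self.state += 0.5  # placeholder for one real time step
            self.step += 1
            if on_step is not None:
                on_step(self.step)
            if self.stop_requested:
                self.write_checkpoint()
                return False
        return True


def hypothetical_monitor(step):
    """Pretend the hardware metrics look unhealthy at step 3 and signal ourselves."""
    if step == 3:
        os.kill(os.getpid(), CHECKPOINT_SIGNAL)


def demo():
    path = os.path.join(tempfile.mkdtemp(), "restart.json")
    first = ToySimulation(path)
    signal.signal(CHECKPOINT_SIGNAL, first.request_stop)
    finished = first.run(10, on_step=hypothetical_monitor)

    # A queued follow-up job would start here and resume from the restart file.
    second = ToySimulation(path)
    second.restore()
    resumed_from = second.step
    second.run(10)
    return finished, resumed_from, second.step, second.state


if __name__ == "__main__":
    print(demo())
```

Setting only a flag in the handler and writing the file from the main loop keeps the checkpoint consistent, because the state is saved between time steps rather than in the middle of one.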
h-type adaptive mesh refinement in spectral-element solvers: Application to Nek5000 and Neko
- Starting at 1:15:07
CEEC and Excellerat P2 are collaborating on implementing an h-type adaptive mesh refinement (AMR) framework in Neko, a spectral element method (SEM) CFD solver. This feature is vital for extreme-scale simulations, as the accuracy of the result depends mostly on mesh quality, and the required mesh structure may not be known a priori. Various aspects of AMR for SEM solvers are discussed based on our previous work with the Nek5000 code, which can be seen as a predecessor of Neko.
Topology optimization in spectral-element codes
- Starting at 1:32:42
We give a general introduction to topology optimization, showing the basic steps required to achieve a successful design, and showcase selected major milestones by CEEC researchers (GPU-based topology optimization 2009; Giga-voxel computational morphogenesis 2017, 2020). We then present preliminary work in Nek5000 on delaying the laminar-turbulent transition by optimizing superhydrophobic surfaces, and discuss the current steps to get acquainted with Neko, identifying what needs to be reimplemented and what needs to be redesigned.