When fluid dynamics meets European hardware
1. Advancing Technological Sovereignty in Europe
One of the main goals of the European Processor Initiative (EPI) and European PILOT (EUPILOT) projects is to build a robust supercomputing ecosystem in Europe. This includes the development of two chip families: a general-purpose CPU, similar to those found in personal computers and capable of handling generic computations, and a specialized accelerator, comparable to today’s GPUs, designed to accelerate scientific computing and artificial intelligence tasks.
The broader ambition of EPI is to lay the foundation for a European technology stack for high-performance computing, thereby reducing Europe’s dependency on non-European technologies. Currently, nearly all general-purpose CPUs are produced by Intel or AMD, and the vast majority of accelerators (GPUs) are developed by NVIDIA or AMD—companies based in North America.
Beyond strategic and geopolitical considerations, building such infrastructures requires more than just hardware. It also demands software development and testing to ensure compatibility with the scientific codes that have been developed over the past decades in areas such as computational fluid dynamics. For this reason, the CEEC project includes a dedicated work package on future technologies, which covers the co-design of CEEC’s scientific codes for the EPAC vector accelerator being developed within EPI.
2. RISC-V and Vector Processing: The Foundation of EPAC
The EPAC accelerator, developed as part of the EPI project, is built on the RISC-V architecture. RISC-V is an open Instruction Set Architecture (ISA) that aims to provide a flexible, extensible, and standardized foundation for processor design, making it particularly suitable for research, innovation, and industry adoption.
EPAC also features a powerful vector processing unit capable of executing a single instruction on vectors containing up to 256 elements. This architectural approach allows for significant acceleration of computations involving large datasets—such as the millions of cells used in fluid dynamics simulations—by packing data into 256-element vectors and processing them using just one instruction, rather than issuing 256 separate scalar operations.
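To make the instruction-count argument concrete, here is a minimal sketch in plain Python that models how a loop over many elements is split into chunks of at most 256 elements (often called strip-mining). It is only a counting model, not RISC-V code: the function names, the `a * x + y` kernel, and the array size are illustrative assumptions, and a real vector loop would also issue separate load and store instructions per chunk.

```python
VLMAX = 256  # maximum elements per vector instruction, as stated above for EPAC


def axpy_scalar(a, x, y):
    """Scalar reference: one multiply-add operation per element."""
    out = []
    ops = 0
    for xi, yi in zip(x, y):
        out.append(a * xi + yi)
        ops += 1
    return out, ops


def axpy_strip_mined(a, x, y, vlmax=VLMAX):
    """Model of a vectorized loop: each chunk of up to vlmax elements
    is counted as a single vector multiply-add."""
    n = len(x)
    out = []
    vector_ops = 0
    start = 0
    while start < n:
        vl = min(vlmax, n - start)  # the last chunk may be shorter
        out.extend(a * x[i] + y[i] for i in range(start, start + vl))
        vector_ops += 1
        start += vl
    return out, vector_ops


# Hypothetical input: 1000 values, standing in for per-cell data.
n = 1000
x = [float(i) for i in range(n)]
y = [1.0] * n

ref, scalar_ops = axpy_scalar(2.0, x, y)
res, vector_ops = axpy_strip_mined(2.0, x, y)
print(scalar_ops, vector_ops)
```

In this model both versions produce the same result, but the chunked version counts one multiply-add per 256-element chunk instead of one per element.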
For a refresher on CEEC and EPI work, check out our podcast episode from fall of 2024!
A critical enabler for achieving high performance on such architectures is the compiler. It must be capable of compiling general-purpose scientific codes (like those used in CEEC) and automatically inserting vector instructions that utilize the underlying vector unit. While it is technically possible to manually write low-level vector instructions, doing so would make the code architecture-specific and difficult to port. One of the key goals of the co-design process is to preserve portability, ensuring that the same code can run efficiently across multiple architectures. Otherwise, users would need a different version of each code for each processor type (e.g., AMD, Intel, NVIDIA), and developers would need to maintain each of these versions separately, which is clearly impractical.
3. Cross-Collaboration Between CEEC and EPI: A Co-Design Effort
The CEEC work package brings together three essential components:
- The EPAC vector accelerator developed in EPI,
- The system software, including the compiler and libraries that enable code vectorization for EPAC, and
- The fluid dynamics application codes from the CEEC scientific community.
The co-design process within CEEC focuses on studying how these codes can be vectorized for the EPAC architecture. This effort produces three critical types of feedback:
- For hardware developers: Kernels from CEEC codes have been used to verify and debug the second generation of the EPAC accelerator.
- For compiler developers: The complexity of CEEC codes has helped improve the compiler’s auto-vectorization capabilities by uncovering bugs and leading to the implementation of new features that will benefit not only CEEC but all future applications.
- For scientific code developers: While portability is a priority, co-design often reveals areas where the code is not fully generic or vector-friendly. In collaboration with CEEC developers, such issues are addressed to retain functionality while enhancing vectorization potential.
This co-design methodology involves several iterative steps:
- Code and input preparation for EPAC prototypes: These machines are often limited in memory and frequency, so simplified yet physically meaningful test cases are selected. This step includes code structure analysis and identifying computation-intensive sections likely to benefit from vectorization (e.g., loops).
- Compilation and execution on generic RISC-V systems: There are already RISC-V-based systems on the market. Although they lack EPAC’s wide vector units, they implement the base RISC-V ISA and can be used to verify that the scientific codes are RISC-V compatible and don’t rely on architecture-specific libraries.
- Compilation with the EPI compiler and analysis in emulation: The EPI compiler translates relevant loop structures into vector instructions tailored for the EPAC architecture. These binaries are then run in an emulator that provides insight into how efficiently the architecture is being used—for instance, the percentage of vector elements utilized. Often, small code changes can significantly improve utilization.
- Execution on EPAC hardware: While emulation reveals functional behavior, execution on real hardware allows precise performance measurement through hardware counters. This step and the preceding emulation step are typically repeated across different code sections to incrementally optimize overall execution.
- Performance portability analysis: Although the co-design process involves agreed-upon code modifications, the final versions must still perform well on other high-performance computing architectures. Therefore, performance is measured across several systems to ensure that optimizations made for EPAC do not degrade portability.
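The emulation step above mentions the percentage of vector elements utilized as an indicator of how well the architecture is being used. The sketch below shows one plausible way to compute such a figure and why a small code change can move it considerably. The definition of utilization, the function names, and the mesh sizes are assumptions made for illustration; they are not taken from the EPI emulator or from any CEEC code.

```python
VLMAX = 256  # maximum vector length in elements, as stated for EPAC


def chunk_lengths(trip_count, vlmax=VLMAX):
    """Vector lengths used when a loop with trip_count iterations
    is processed in chunks of at most vlmax elements."""
    lengths = []
    remaining = trip_count
    while remaining > 0:
        vl = min(vlmax, remaining)
        lengths.append(vl)
        remaining -= vl
    return lengths


def utilization(lengths, vlmax=VLMAX):
    """Percentage of available vector elements actually used
    across a sequence of vector instructions."""
    if not lengths:
        return 0.0
    return 100.0 * sum(lengths) / (vlmax * len(lengths))


# Hypothetical loop nest: 4096 mesh cells, 8 nodes per cell.
cells, nodes = 4096, 8

# Case 1: the short inner loop over nodes is vectorized,
# giving one 8-element vector instruction per cell.
inner = chunk_lengths(nodes) * cells

# Case 2: the loops are interchanged so that the long loop over cells
# is vectorized, giving full 256-element instructions for each node.
outer = chunk_lengths(cells) * nodes

inner_util = utilization(inner)
outer_util = utilization(outer)
print(f"inner loop vectorized: {inner_util:.3f}% of vector elements used")
print(f"outer loop vectorized: {outer_util:.3f}% of vector elements used")
```

Both cases perform the same amount of work in this model, but vectorizing over the short node loop leaves most of each vector empty, whereas vectorizing over cells fills every vector. Loop interchange is used here only as one example of the kind of small change the text refers to.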
4. Final Thoughts and Scientific Impact
Although it may seem like a purely engineering effort, analyzing the behavior of complex applications on novel architectures like EPAC is essential for advancing co-design practices. For this reason, such efforts are often concluded with contributions to the scientific community in the form of research papers, presentations, or posters.
During CEEC’s second year, BSC experts completed the vectorization study of Alya, one of the CEEC applications, and published the results, presenting them at the IPDPS conference in 2024.
The full reference for the paper is:
Blancafort, Marc, et al. “Exploiting long vectors with a CFD code: a co-design showcase.” 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 2024.
The vectorization efforts for the remaining CEEC codes are ongoing and will be included in the final project reports, as well as in future publications resulting from the collaboration between EPAC architecture experts and CEEC’s fluid dynamics code developers.
