Concurrent parallel processing on Graphics and Multicore Processors with OpenACC and OpenMP
Event Type: Workshop

Accelerators
Compilers
Parallel Programming Languages, Libraries, Models and Notations
Runtime Systems
Time: Monday, November 13th, 2:30pm - 3pm
Location: 712
Description: Hierarchical parallel computing is rapidly becoming ubiquitous in high-performance computing (HPC) systems. Programming models commonly used in turbomachinery and other engineering simulation codes have traditionally relied on distributed-memory parallelism with MPI and have ignored thread and data parallelism. This paper presents methods for programming multi-block codes for concurrent computation on host multicore CPUs and many-core accelerators such as graphics processing units (GPUs). Portable, standardized language directives are used to expose data and thread parallelism within the hybrid shared- and distributed-memory simulation system. A single-source, multiple-object strategy simplifies code management and allows for heterogeneous computing. Automated load balancing determines which portions of the domain are computed by the multicore CPUs and which by the GPUs. Benchmark results show that significant parallel speed-up is attainable on multicore CPUs and on many-core devices such as the Intel Xeon Phi Knights Landing using OpenMP SIMD and thread-parallel directives. Modest speed-up, relative to a single CPU core, was achieved with OpenACC offloading to NVIDIA GPUs. Combining GPU offloading with multicore host parallelism improved single-device performance by 30%, but no further speed-up was realized when more heterogeneous CPU-GPU device pairs were added.
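A minimal sketch, in C, of what a single-source, multiple-object build with directives might look like. The same loop carries an OpenACC directive when the file is compiled for the accelerator and an OpenMP thread-plus-SIMD directive when compiled for the host. The kernel `update_block` (a three-point average standing in for real solver work) and the throughput-proportional split in `device_share` are hypothetical illustrations, not the paper's solver or its actual load-balancing algorithm.

```c
#include <assert.h>

/* Hypothetical per-block kernel: three-point average over the interior of
 * one block, boundary values copied through. Stands in for real solver work. */
void update_block(const double *restrict in, double *restrict out, int n)
{
#if defined(_OPENACC)
    /* Accelerator object: offload the loop with OpenACC. */
    #pragma acc parallel loop copyin(in[0:n]) copyout(out[0:n])
#elif defined(_OPENMP)
    /* Host object: thread and SIMD parallelism with OpenMP. */
    #pragma omp parallel for simd
#endif
    /* With neither flag enabled, no directive is emitted and the loop runs serially. */
    for (int i = 0; i < n; ++i) {
        if (i == 0 || i == n - 1)
            out[i] = in[i];                                 /* keep boundary values */
        else
            out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0; /* interior update */
    }
}

/* Hypothetical static load balance: give the device a share of nblocks
 * proportional to its measured throughput (blocks per second), rounded
 * to the nearest whole block. The host computes the remaining blocks. */
int device_share(int nblocks, double host_rate, double dev_rate)
{
    assert(host_rate > 0.0 && dev_rate >= 0.0);
    double frac = dev_rate / (host_rate + dev_rate);
    int ndev = (int)(frac * nblocks + 0.5);
    if (ndev > nblocks) ndev = nblocks;
    if (ndev < 0) ndev = 0;
    return ndev;
}
```

Compiling this one file with OpenMP enabled (for example `gcc -fopenmp`) and again with OpenACC enabled (for example `nvc -acc`) yields a host object and an accelerator object from a single source. For instance, `device_share(10, 1.0, 3.0)` returns 8: a device measured at three times the host's throughput receives 8 of 10 blocks.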