Evaluation of Asynchronous Offloading Capabilities of Accelerator Programming Models for Multiple Devices
Author/Presenter
Event Type
Workshop

Accelerators
Applications
Compiler Analysis and Optimization
Compilers
Parallel Programming Languages, Libraries, Models and Notations
Runtime Systems
Time
Monday, November 13th, 4:30pm - 5pm
Location
712
Description
Accelerator devices are increasingly used to build large supercomputers, and current installations usually include more than one accelerator per system node. To keep all devices busy, kernels have to be executed concurrently, which can be achieved via asynchronous kernel launches. This work compares the performance of an implementation of the Conjugate Gradient method with CUDA, OpenCL, and OpenACC on NVIDIA Pascal GPUs. Furthermore, it examines Intel Xeon Phi coprocessors programmed with OpenCL and OpenMP. In doing so, it tries to answer the question of whether the higher abstraction level of directive-based models is inferior to lower-level paradigms in terms of performance.