Author/Presenter
Event Type: Workshop

Accelerators
Applications
Compiler Analysis and Optimization
Compilers
Parallel Programming Languages, Libraries, Models and Notations
Runtime Systems
Time: Monday, November 13th, 4:30pm - 5pm
Location: 712
Description: Accelerator devices are increasingly used to build
large supercomputers, and current installations usually
include more than one accelerator per system node. To
keep all devices busy, kernels have to be executed
concurrently, which can be achieved via asynchronous
kernel launches. This work compares the performance of
an implementation of the Conjugate Gradient method written
in CUDA, OpenCL, and OpenACC on NVIDIA Pascal GPUs.
Furthermore, it looks at Intel Xeon Phi coprocessors
programmed with OpenCL and OpenMP. In doing so, it tries
to answer the question of whether the higher abstraction
level of directive-based models makes them inferior to
lower-level paradigms in terms of performance.
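
For readers unfamiliar with the Conjugate Gradient method named above, the following is a minimal serial sketch in plain Python. It is not the implementation from this work: the function names (`dot`, `matvec`, `conjugate_gradient`), the dense list-of-lists matrix, and the tolerance are all assumptions chosen for illustration. It only shows the structure of the iteration. The matrix-vector product and the dot products are the operations an accelerator version would typically run as kernels.

```python
def dot(u, v):
    """Inner product of two equally sized vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break        # residual norm is small enough
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                      # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                           # direction update factor
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    # Small symmetric positive definite test system (illustrative values).
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(conjugate_gradient(A, b))
```

Each iteration depends on the result of the previous one, so any concurrency across devices has to come from splitting the vector and matrix operations within an iteration, not from running iterations side by side.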