P85: GPU Mekong: Simplified Multi-GPU Programming Using Automated Partitioning
Abstract: GPU accelerators are pervasively used in the HPC community because they provide excellent computational performance at reasonable power efficiency. While programming single-GPU applications is comparatively productive, programming multiple GPUs using data-parallel languages is tedious and error-prone, as the user has to manually orchestrate data movements and kernel launches.
The Mekong research project aims to improve the productivity of multi-GPU systems through compiler-based partitioning of single-device data-parallel programs. Key to scalable performance improvement is the resolution of data dependencies between kernels and the orchestration of these kernels. Mekong relies on polyhedral compilation to identify memory access patterns in order to compile a single-GPU application into a multi-GPU application.
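To make the idea of resolving data dependencies between partitioned kernels concrete, the following is a minimal illustrative sketch and not Mekong's implementation. It assumes a hypothetical 1D stencil kernel in which output element i reads input elements i-radius through i+radius, splits the iteration range into contiguous chunks (one per device), and uses plain interval arithmetic in place of a polyhedral analysis to work out which ranges each device would have to fetch from its neighbours. All function names are made up for this example.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open interval [start, end)


def partition_1d(n: int, num_devices: int) -> List[Range]:
    """Split the iteration range [0, n) into contiguous chunks, one per device."""
    base, extra = divmod(n, num_devices)
    chunks = []
    start = 0
    for d in range(num_devices):
        size = base + (1 if d < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


def read_range(chunk: Range, n: int, radius: int = 1) -> Range:
    """Elements read by the assumed stencil for all i in chunk, clamped to [0, n)."""
    lo, hi = chunk
    if lo == hi:  # an empty chunk reads nothing
        return (lo, lo)
    return (max(lo - radius, 0), min(hi + radius, n))


def transfers(n: int, num_devices: int, radius: int = 1) -> List[List[Tuple[int, int, int]]]:
    """For each device, list (owner, start, end) ranges it must fetch from other devices."""
    chunks = partition_1d(n, num_devices)
    plan = []
    for d, chunk in enumerate(chunks):
        r_lo, r_hi = read_range(chunk, n, radius)
        needed = []
        for owner, (o_lo, o_hi) in enumerate(chunks):
            if owner == d:
                continue  # data already resident on this device
            lo, hi = max(r_lo, o_lo), min(r_hi, o_hi)
            if lo < hi:  # non-empty overlap with another device's chunk
                needed.append((owner, lo, hi))
        plan.append(needed)
    return plan


if __name__ == "__main__":
    for device, needed in enumerate(transfers(10, 3)):
        print(device, needed)
```

In this sketch each device only exchanges the halo elements at its chunk boundaries. A compiler-based approach would derive such read and write sets from the kernel's memory access patterns rather than from a hard-coded stencil radius.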
In this work, the Mekong project is introduced and its components are explained. While the tool is still under development, preliminary results are available and are briefly discussed, demonstrating the potential of this approach.
Award: Best Poster Finalist (BP): no
Two-page extended abstract: pdf