Snowpack: Efficient Parameter Choice for GPU Kernels via Static Analysis and Statistical Prediction
Author/Presenters
Event Type
Workshop

Algorithms
Exascale
Resiliency
SIGHPC Workshop
Time
Monday, November 13th, 11:30am - 11:50am
Location
607
Description
The running time of a GPU kernel depends on an invocation parameter: the number of threads in each thread block. Sometimes the dependence is quite strong, leading to a 50-100% change in execution time for long-running kernels. Until now, choosing the optimal setting for this parameter has been an art form. Nvidia provides a tool for CUDA kernels, called OCC, that guides a developer toward this goal. In this paper, we show that OCC maximizes occupancy of GPU cores but does not meet the performance goal in a wide class of applications. We develop a solution called Snowpack that uses static features in a statistical learning framework to choose the optimal block size parameter. It does this without needing to execute the kernel multiple times, as a possible alternative solution, Autotuner, does. We evaluate Snowpack on 89 kernels from 10 applications.
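As a rough illustration of the parameter discussed above, the following Python sketch shows how the block size determines the number of thread blocks launched and how a simplified occupancy figure could be computed from it. This is not Snowpack's method, nor the exact calculation performed by OCC. The per-multiprocessor limits and the function names `grid_size` and `simplified_occupancy` are hypothetical, chosen only for illustration, and the model ignores shared memory and register allocation granularity.

```python
import math

# Hypothetical per-multiprocessor limits, for illustration only.
# They are assumptions, not values from the paper or a specific GPU.
MAX_THREADS_PER_SM = 2048
MAX_BLOCKS_PER_SM = 32
REGISTERS_PER_SM = 65536
WARP_SIZE = 32


def grid_size(num_elements: int, block_size: int) -> int:
    """Thread blocks needed to cover num_elements with one thread per element."""
    return math.ceil(num_elements / block_size)


def simplified_occupancy(block_size: int, regs_per_thread: int) -> float:
    """Fraction of warp slots on one multiprocessor filled by resident blocks.

    Simplified model: only thread count, block count, and register limits
    are considered.
    """
    warps_per_block = math.ceil(block_size / WARP_SIZE)
    threads_allocated = warps_per_block * WARP_SIZE
    # Resident blocks are capped by whichever resource runs out first.
    blocks_by_threads = MAX_THREADS_PER_SM // threads_allocated
    blocks_by_regs = REGISTERS_PER_SM // (regs_per_thread * threads_allocated)
    resident_blocks = min(blocks_by_threads, blocks_by_regs, MAX_BLOCKS_PER_SM)
    max_warps = MAX_THREADS_PER_SM // WARP_SIZE
    return resident_blocks * warps_per_block / max_warps


if __name__ == "__main__":
    for bs in (32, 256, 768):
        print(bs, grid_size(1_000_000, bs), simplified_occupancy(bs, 32))
```

Under these assumed limits, a block size of 256 reaches full occupancy while 32 and 768 do not. As the description notes, the setting that maximizes occupancy is not necessarily the one that minimizes running time.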