Snowpack: Efficient Parameter Choice for GPU Kernels via Static Analysis and Statistical Prediction
Workshop: 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems
Authors: Ignacio Laguna (Lawrence Livermore National Laboratory)
Abstract: The running time of GPU kernels depends on an invocation parameter, the number of threads in each thread block. Sometimes this dependence is strong: for long-running kernels, the choice can change execution time by 50-100%. Until now, choosing the optimal setting for this parameter has been more art than science. Nvidia provides a tool for CUDA kernels, called OCC, that guides developers toward this goal. In this paper, we show that OCC maximizes the occupancy of GPU cores but does not meet the performance goal for a wide class of applications. We develop a solution called Snowpack that uses static features in a statistical learning framework to choose the optimal block size. Unlike autotuning, a possible alternative, Snowpack does not need to execute the kernel multiple times. We evaluate Snowpack on 89 kernels from 10 applications.
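To make the notion of occupancy concrete, here is a minimal Python sketch of a simplified theoretical-occupancy calculation as a function of block size. The per-SM limits in SMLimits are illustrative assumptions (real values depend on the GPU's compute capability). Register and shared-memory allocation granularity is ignored, and occupancy and max_occupancy_block_size are hypothetical names. This is not Nvidia's OCC implementation and not the Snowpack model.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    # Hypothetical per-SM resource limits; real values depend on the GPU.
    max_blocks: int = 32
    max_warps: int = 64
    registers: int = 65536
    shared_mem_bytes: int = 65536
    warp_size: int = 32


def occupancy(block_size, regs_per_thread, smem_per_block, sm=SMLimits()):
    """Fraction of the SM's warp slots filled, ignoring allocation granularity."""
    warps_per_block = math.ceil(block_size / sm.warp_size)
    # Each resource caps how many blocks can be resident on one SM at once.
    limits = [
        sm.max_blocks,
        sm.max_warps // warps_per_block,
        sm.registers // (regs_per_thread * warps_per_block * sm.warp_size),
    ]
    if smem_per_block > 0:
        limits.append(sm.shared_mem_bytes // smem_per_block)
    active_blocks = min(limits)
    return active_blocks * warps_per_block / sm.max_warps


def max_occupancy_block_size(regs_per_thread, smem_per_block, sm=SMLimits()):
    """Smallest multiple of the warp size (up to 1024) that maximizes occupancy."""
    candidates = range(sm.warp_size, 1025, sm.warp_size)
    # max() returns the first maximal candidate, so ties go to the smaller size.
    return max(candidates,
               key=lambda b: occupancy(b, regs_per_thread, smem_per_block, sm))


if __name__ == "__main__":
    for b in (32, 64, 96, 128, 256, 1024):
        print(b, occupancy(b, regs_per_thread=32, smem_per_block=0))
```

With 32 registers per thread and no shared memory, this sketch reports full occupancy at 64, 128, 256, 512, and 1024 threads per block. Occupancy alone therefore does not single out one block size, and it says nothing about measured run time, which is the gap the abstract attributes to OCC.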