Parallel Programming Languages, Libraries, Models and Notations
Time: Monday, November 13th, 4pm - 4:30pm
Description: Stencil kernels are important iterative computation patterns, heavily used in scientific simulations and in other workloads such as image processing. Their performance is usually bound by memory bandwidth, and the common way to overcome this is Temporal Blocking (TB), a bandwidth-reducing technique. However, applying TB to existing code incurs a high programming cost, because real-life codes embody complex loop structures; moreover, the multitude of parameters and blocking schemes involved in TB complicates the tuning process. We propose an automated, directive-based compiler approach to TB that extends polyhedral compilation in the Polly/LLVM framework, significantly reducing programming cost while remaining easily subject to auto-tuning. Evaluation of our generated stencil codes on Core i7 and Xeon Phi shows that the auto-generated stencil kernels achieve performance that is close to, and often on par with, codes converted to TB and optimized by hand.
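To make the idea of Temporal Blocking concrete, here is a minimal sketch of one possible scheme (overlapped tiling with redundant halo computation) for a 1D 3-point Jacobi stencil with fixed boundary values. It is an illustration only, not the code generated by the approach described above, and overlapped tiling is just one of several TB schemes. The function names `jacobi_naive` and `jacobi_temporal_blocked` and the parameters `tile` and `bt` are hypothetical names chosen for this example.

```python
def jacobi_naive(a, steps):
    """Reference version: sweep the whole array once per time step."""
    cur = list(a)
    n = len(cur)
    for _ in range(steps):
        nxt = list(cur)  # boundary cells 0 and n-1 keep their values
        for i in range(1, n - 1):
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3.0
        cur = nxt
    return cur


def jacobi_temporal_blocked(a, steps, tile, bt):
    """Overlapped temporal blocking (assumed scheme for this sketch).

    Each spatial tile is loaded together with a halo of width k and then
    advanced k time steps locally, so the data is reused k times per load.
    `tile` is the spatial tile width and `bt` the temporal block size.
    """
    cur = list(a)
    n = len(cur)
    t = 0
    while t < steps:
        k = min(bt, steps - t)  # time steps fused in this block
        out = list(cur)
        for lo in range(1, n - 1, tile):
            hi = min(lo + tile, n - 1)   # tile owns interior cells [lo, hi)
            glo = max(lo - k, 0)         # halo start, clamped to the array
            ghi = min(hi + k, n)         # halo end (exclusive), clamped
            buf = cur[glo:ghi]
            for s in range(1, k + 1):
                # The valid region shrinks by one cell per step on every
                # side that is not a physical (fixed) boundary.
                first = 1 if glo == 0 else glo + s
                last = n - 1 if ghi == n else ghi - s
                nxt = list(buf)
                for g in range(first, last):
                    j = g - glo
                    nxt[j] = (buf[j - 1] + buf[j] + buf[j + 1]) / 3.0
                buf = nxt
            out[lo:hi] = buf[lo - glo:hi - glo]
        cur = out
        t += k
    return cur
```

Both functions apply the same arithmetic to the same operands, so they should agree on any input; the blocked version trades redundant work in the halos for fewer passes over the full array. The values of `tile` and `bt` are the kind of parameters that auto-tuning would search over.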