P01: Cache-Blocking Tiling of Large Stencil Codes at Runtime
SessionPoster Reception
Event Type
ACM Student Research Competition
Poster
Reception

TimeTuesday, November 14th5:15pm - 7pm
LocationFour Seasons Ballroom
DescriptionStencil codes on structured meshes are well-known to be bound by memory bandwidth. Previous research has shown that compiler techniques that reorder loop schedules to improve temporal locality across loop nests, such as tiling, work particularly well. However in large codes the scope of such analysis is limited by the large number of code paths, compilation units, and run-time parameters. We present how, through run-time analysis of data dependencies across stencil loops enables the OPS domain specific language to tile across a large number of different loops. This lets us tackle much larger applications than previously studied: we demonstrate 1.7-3.5x performance improvement on CloverLeaf 2D, CloverLeaf 3D, TeaLeaf and OpenSBLI, tiling across up to 650 subsequent loopnests accessing up to 30 different state variables per gridpoint with up to 46 different stencils. We also demonstrate excellent strong and weak scalability of our approach on up to 4608 Broadwell cores.