P80: Adaptive Loop Scheduling with Charm++ to Improve Performance of Scientific Applications
Session: Poster Reception
Event Type: ACM Student Research Competition, Poster, Reception

Time: Tuesday, November 14th, 5:15pm - 7pm
Location: Four Seasons Ballroom
Description: Supercomputers today employ a large number of cores on each node. The Charm++ parallel programming system provides an intelligent runtime that has been highly effective at dynamic load balancing across the nodes of a supercomputer. Modern multi-core nodes present new challenges and opportunities for Charm++: the large degree of over-decomposition required may lead to high overhead. We modified the Charm++ Runtime System (RTS) to assign Charm++ objects to nodes, which reduces over-decomposition, and to spread work across cores via parallel loops. We also modified a library in the Charm++ software suite that supports loop parallelism, adding a loop scheduling strategy that maximizes load balance across cores while minimizing data movement. We tuned parameters of the RTS and of the loop scheduling strategy to improve the performance of benchmark codes run on a variety of architectures. Our technique improves the performance of a Particle-in-Cell code run on the Blue Waters supercomputer by 17.2%.
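
The abstract does not spell out the scheduling strategy, so the following is only a minimal sketch of one way to trade load balance against data movement within a node: a mixed static/dynamic loop schedule. It is written with plain C++ threads and does not use any Charm++ API. The function name `hybrid_parallel_for` and the parameters `static_fraction` and `chunk` are hypothetical, introduced here for illustration. A fraction of the iterations is pre-assigned to threads in contiguous blocks, which tends to keep each thread on the same data, and the remaining iterations are claimed dynamically in small chunks so that threads that finish early can pick up leftover work.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not part of Charm++): runs body(i) once for every
// i in [0, n) using num_threads threads.
//   static_fraction: share of iterations pre-assigned in contiguous blocks
//                    (favors locality); clamped to [0, 1].
//   chunk:           number of iterations claimed at a time from the shared
//                    remainder (favors load balance).
void hybrid_parallel_for(std::size_t n, unsigned num_threads,
                         double static_fraction, std::size_t chunk,
                         const std::function<void(std::size_t)>& body) {
    assert(num_threads > 0 && chunk > 0);
    if (n == 0) return;
    static_fraction = std::min(1.0, std::max(0.0, static_fraction));

    // Iterations [0, n_static) are scheduled statically,
    // iterations [n_static, n) dynamically.
    const std::size_t n_static =
        static_cast<std::size_t>(static_cast<double>(n) * static_fraction);
    std::atomic<std::size_t> next{n_static};  // next unclaimed dynamic index

    auto worker = [&](unsigned t) {
        // Static phase: thread t owns one contiguous block.
        const std::size_t lo = n_static * t / num_threads;
        const std::size_t hi = n_static * (t + 1) / num_threads;
        for (std::size_t i = lo; i < hi; ++i) body(i);

        // Dynamic phase: claim chunks until the shared range is exhausted.
        for (;;) {
            const std::size_t begin = next.fetch_add(chunk);
            if (begin >= n) break;
            const std::size_t end = std::min(begin + chunk, n);
            for (std::size_t i = begin; i < end; ++i) body(i);
        }
    };

    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) threads.emplace_back(worker, t);
    for (auto& th : threads) th.join();
}
```

In this sketch, `static_fraction` and `chunk` play the role of tunable scheduling parameters: a `static_fraction` of 1.0 gives a purely static schedule, and 0.0 gives a purely dynamic one. A call such as `hybrid_parallel_for(n, 4, 0.5, 16, body)` would pre-assign half of the iterations and hand out the rest 16 at a time.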