Author/Presenters
Event Type
Workshop
Accelerators
Deep Learning
Exascale
GPU
Parallel Application Frameworks
Parallel Programming Languages, Libraries, Models and Notations
SIGHPC Workshop
System Software
Time: Sunday, November 12th, 11:30am - 12pm
Location: 505
Description: The recent rapid increase in the number of
cores per chip has resulted in vast amounts of on-node
parallelism. These high core counts result in hardware
variability that introduces imbalance. Applications are
also becoming more complex themselves, resulting in
dynamic load imbalance. Load imbalance of any kind can
result in a loss of performance and a decrease in system
utilization. In this paper, we propose a new integrated
runtime system that adds OpenMP shared-memory
parallelism to the Charm++ distributed programming model
to improve load balancing on distributed systems. Our
approach combines an infrequent, periodic assignment of
work to cores, based on load measurement, with tasks
created on each core via OpenMP’s parallel loop construct
to handle load imbalance. We demonstrate
the benefits of using this integrated runtime system on
the LLNL ASC proxy application Lassen, achieving
speedups of 50% over runs without any load balancing and
10% over existing distributed-memory-only balancing
schemes in Charm++.
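
The two levels of balancing described above can be illustrated with a minimal sketch. It is not the Charm++ runtime or the authors' implementation: the function names `assign_by_measured_load` and `process_object` are hypothetical, the greedy heaviest-first heuristic is an assumed stand-in for the periodic load balancer, and the loop uses stock OpenMP rather than the integrated runtime. The first function stands for the infrequent, measurement-based assignment of work objects to cores; the second stands for the fine-grained step, where one object's loop iterations are split into chunks that other threads can pick up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <queue>
#include <utility>
#include <vector>

// Coarse-grained step (hypothetical helper, not a Charm++ API):
// assign each work object to a core using its measured load.
// Heuristic assumed here: heaviest object first, onto the least-loaded core.
std::vector<int> assign_by_measured_load(const std::vector<double>& measured_load,
                                         int num_cores) {
    assert(num_cores > 0);

    // Visit objects in order of decreasing measured load.
    std::vector<std::size_t> order(measured_load.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return measured_load[a] > measured_load[b];
    });

    // Min-heap of (accumulated load, core index).
    using Entry = std::pair<double, int>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> cores;
    for (int c = 0; c < num_cores; ++c) cores.push({0.0, c});

    std::vector<int> owner(measured_load.size(), 0);
    for (std::size_t obj : order) {
        Entry least = cores.top();
        cores.pop();
        owner[obj] = least.second;
        least.first += measured_load[obj];
        cores.push(least);
    }
    return owner;
}

// Fine-grained step (hypothetical helper): the work of one object is a loop
// whose iterations are handed out in chunks by OpenMP's parallel loop construct.
// Without OpenMP enabled, the pragma is ignored and the loop runs serially.
double process_object(const std::vector<double>& cells) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum) schedule(dynamic)
    for (long i = 0; i < static_cast<long>(cells.size()); ++i) {
        sum += cells[i] * cells[i];
    }
    return sum;
}
```

In this sketch the coarse assignment would be recomputed only occasionally from fresh measurements, while the dynamically scheduled loop absorbs the imbalance that appears between those reassignments.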