Graph500 on OpenSHMEM: Using a Practical Survey of Past Work to Motivate Novel Algorithmic Developments
Author/Presenters
Event Type
Workshop

Applications
Effective Application of HPC
Parallel Programming Languages, Libraries, Models and Notations
Performance
Programming Systems
SIGHPC Workshop
Scientific Computing
TimeMonday, November 13th11am - 11:30am
Location702
DescriptionGraph500 is an open specification of a graph-based benchmark for high-performance computing (HPC). The core computational kernel of Graph500 is a breadth-first search of an undirected graph. Unlike many other HPC benchmarks, Graph500 is therefore characterized by heavily irregular and fine-grain computation, memory accesses, and network communication. Therefore, it can serve as a more realistic stress test of modern HPC hardware, software, and algorithmic techniques than other benchmarking efforts.
On the other hand, OpenSHMEM is an open, PGAS, and SPMD specification of a communication model for communicating across large numbers of processing elements. OpenSHMEM explicitly focuses on applications characterized by fine-grain communication, of which Graph500 is one example.
Therefore, there is a natural synergy between the communication patterns of Graph500 and the capabilities of OpenSHMEM. In this work we explore that synergy by developing several novel implementations of Graph500 on various OpenSHMEM implementations. We contribute a review of the state-of-the-art in distributed Graph500 implementations, as well as a performance and programmability comparison between the state-of-the-art and our own OpenSHMEM-based implementations. Our results demonstrate improved scaling of Graph500's BFS kernel out to 1,024 nodes of the Edison supercomputer, achieving 2.5x performance improvement relative to the highest performing reference implementation at that scale.
On the other hand, OpenSHMEM is an open, PGAS, and SPMD specification of a communication model for communicating across large numbers of processing elements. OpenSHMEM explicitly focuses on applications characterized by fine-grain communication, of which Graph500 is one example.
Therefore, there is a natural synergy between the communication patterns of Graph500 and the capabilities of OpenSHMEM. In this work we explore that synergy by developing several novel implementations of Graph500 on various OpenSHMEM implementations. We contribute a review of the state-of-the-art in distributed Graph500 implementations, as well as a performance and programmability comparison between the state-of-the-art and our own OpenSHMEM-based implementations. Our results demonstrate improved scaling of Graph500's BFS kernel out to 1,024 nodes of the Edison supercomputer, achieving 2.5x performance improvement relative to the highest performing reference implementation at that scale.