A18: Understanding the Impact of Fat-Tree Network Locality on Application Performance
Supervisor: Ian Karlin (Lawrence Livermore National Laboratory)
Abstract: Network congestion can be a significant cause of performance loss and variability for many message-passing programs. However, few studies have used a controlled environment with virtually no extraneous network traffic to observe how application placement and multi-job interactions affect overall performance. We study different placements and pairings for three DOE applications. We observe that, for a job size typical of an LLNL commodity cluster, the impact of congestion and poor placement is usually less than 2%, which is less dramatic than on torus networks. In addition, in most cases the cyclic MPI task mapping strategy increases performance and reduces placement sensitivity, even though it also increases total network traffic. We also find that the performance difference between controlled placements and runs scheduled through the batch system is less than 3%.
ACM-SRC Semi-Finalist: no
Two-page extended abstract: pdf
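
Illustrative note on the task mappings named in the abstract: the sketch below contrasts a block mapping (consecutive MPI ranks fill one node before moving to the next) with a cyclic mapping (ranks are dealt round-robin across nodes). It is a minimal sketch, not the poster's methodology. The function names, the 8-rank/4-node sizes, and the nearest-neighbor ring communication pattern are all assumptions chosen for illustration; under that assumed pattern, the cyclic mapping turns every neighbor exchange into an off-node message, which is one plausible way a cyclic mapping can raise total network traffic.

```python
def block_mapping(num_ranks, num_nodes):
    """Return node index per rank; consecutive ranks share a node (hypothetical helper)."""
    if num_ranks % num_nodes != 0:
        raise ValueError("sketch assumes num_ranks is a multiple of num_nodes")
    ranks_per_node = num_ranks // num_nodes
    return [rank // ranks_per_node for rank in range(num_ranks)]


def cyclic_mapping(num_ranks, num_nodes):
    """Return node index per rank; ranks are dealt round-robin across nodes."""
    if num_ranks % num_nodes != 0:
        raise ValueError("sketch assumes num_ranks is a multiple of num_nodes")
    return [rank % num_nodes for rank in range(num_ranks)]


def off_node_ring_pairs(mapping):
    """Count (rank, rank + 1) ring pairs whose two ranks sit on different nodes.

    The ring pattern is an assumed stand-in for an application's
    communication graph, not one taken from the poster.
    """
    n = len(mapping)
    return sum(1 for r in range(n) if mapping[r] != mapping[(r + 1) % n])


if __name__ == "__main__":
    for name, fn in (("block", block_mapping), ("cyclic", cyclic_mapping)):
        mapping = fn(8, 4)
        print(name, mapping, "off-node ring pairs:", off_node_ring_pairs(mapping))
```

With 8 ranks on 4 nodes, the block mapping keeps half of the ring exchanges on-node, while the cyclic mapping sends all of them over the network. The sketch says nothing about performance, which the abstract reports as usually improving under the cyclic mapping.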