SC17 Denver, CO

P73: HPC Production Job Quality Assessment

Authors: Omar Aaziz (New Mexico State University), Jonathan Cook (New Mexico State University)

Abstract: Users of HPC systems would benefit from more feedback about the quality of their application runs, such as knowing whether the performance of a particular run was good, or whether the resources requested were too few or too many. Such feedback requires more information to be kept regarding production application runs, and requires some analytics to assess any new runs. In this research, we assess the practicality of using job data, system data, and hardware performance counters in a near-zero overhead manner to assess job performance, in particular whether or not the job runtime was in line with expectations from historical application performance. We show, across four proxy applications and two real applications, that our assessment is within 10% of actual performance.
Award: Best Poster Finalist (BP): no

Poster: pdf
Two-page extended abstract: pdf
