A Scalable Analytical Memory Model for CPU Performance Prediction
Workshop: The 8th International Workshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computer Systems (PMBS17)
Authors: Gopinath Chennupati (Los Alamos National Laboratory)
Abstract: As the US Department of Energy (DOE) invests in exascale computing, performance modeling of physics codes on CPUs remain a challenge in computational co-design due to the complex design of processors including memory hierarchies, instruction pipelining, and speculative execution. We present Analytical Memory Model (AMM), a model of cache memory hierarchy, embedded in the Performance Prediction Toolkit (PPT) -- a suite of discrete-event-simulation-based co-design hardware and software models. AMM enables PPT to significantly improve the quality of its runtime predictions of scientific codes.
AMM uses a computationally efficient, stochastic method to predict the reuse distance profiles of codes, where reuse distance is a hardware architecture-independent measure of the patterns of virtual memory accesses. AMM relies on a stochastic, static basic block-level analysis of reuse profiles measured from the memory traces of applications on small instances. The analytical reuse distribution is useful to estimate the effective latency and throughput of memory access, which in turn are used to predict the overall runtime of a scientific application.
Our experimental results demonstrate the scalability of AMM, where the predicted and actual runtimes of three scientific mini-applications are similar.