The “Attack of the Killer Micros” began approximately 25 years ago, as microprocessor-based systems began to compete with supercomputers in some application areas. It became clear that peak arithmetic rate was not an adequate measure of system performance for many applications, so in 1991 Dr. McCalpin introduced the STREAM Benchmark to estimate “sustained memory bandwidth” as an alternative performance metric.
STREAM embodied a good compromise between generality and ease of use, and it quickly became the de facto standard for measuring and reporting sustained memory bandwidth in High Performance Computing systems.
Since the initial “attack”, Moore’s Law and Dennard Scaling have led to astounding increases in the computational capabilities of microprocessors.
The technology behind memory subsystems has not experienced comparable performance improvements, causing sustained memory bandwidth to fall behind.
This talk reviews the history of the changing balances between computation, memory latency, and memory bandwidth in deployed HPC systems, then discusses how the underlying technology changes led to these market shifts. Key metrics are the exponentially increasing relative performance cost of memory accesses and the massive increases in concurrency that are required to obtain increased memory throughput.
New technologies (such as stacked DRAM) allow more pin bandwidth per package, but do not address the architectural issues that make high memory bandwidth expensive to support. Potential disruptive technologies include near-memory-processing and application-specific system implementations, but all foreseeable approaches fail to provide software compatibility with current architectures.
Due to the absence of practical alternatives, in the near term we can expect systems to become increasingly complex and unbalanced, with constant or slightly increasing per-node prices. These systems will deliver the best rate of performance improvement for workloads with increasingly high compute intensity and increasing available concurrency.
About the Speaker:
Dr. John D. McCalpin is a Research Scientist in the High Performance Computing Group and Co-Director of ACElab at TACC of the University of Texas at Austin. At TACC, he works on performance analysis and performance modeling in support of both current users and future system acquisitions.
McCalpin joined TACC in 2009 after a 12-year career in performance analysis and system architecture in the computer industry. This included three years at SGI (performance analysis and optimization on the SGI Origin 2000 and performance lead on the architecture team for the SGI Altix 3000), six years at IBM (performance analysis for HPC, processor and system design for Power4/4+ and Power5/5+, high-level architecture for the Power7-based PERCS Prototype system), and three years at AMD (technology lead for the “Torrenza” program enabling third-party accelerated computing technologies).
Prior to his industrial career, McCalpin was an oceanographer (Ph.D., Florida State), spending six years as an assistant professor at the University of Delaware engaged in research and teaching on numerical simulation of the large-scale circulation of the oceans.
In 1991, he developed the STREAM Benchmark to provide a simple way to measure and report sustainable memory bandwidth, and for 25 years has been an advocate for a multidimensional approach to understanding performance in HPC.
In 2015 he was named an “Intel Black Belt Software Developer” in recognition of his contributions to the Intel Software Developer communities.