Modeling Large Compute Nodes with Heterogeneous Memories with the Cache-Aware Roofline Model
Event Type
Workshop

Tags
Accelerators
Benchmarks
Compiler Analysis and Optimization
Deep Learning
Effective Application of HPC
Energy
Exascale
GPU
I/O
Parallel Application Frameworks
Parallel Programming Languages, Libraries, Models and Notations
Performance
Simulation
Storage
Time
Monday, November 13th, 3:30pm - 4pm
Location
704-706
Description
To fulfill the needs of modern applications, computing systems are becoming more powerful, heterogeneous, and complex. NUMA platforms and emerging high-bandwidth memories offer new opportunities for performance improvement, but they also increase hardware and software complexity, making application performance analysis and optimization an even harder task. The Cache-Aware Roofline Model (CARM) is an insightful yet simple model designed to address this issue: it provides feedback on potential application bottlenecks and shows how far application performance is from the achievable hardware upper bounds. However, it does not encompass NUMA systems or next-generation processors with heterogeneous memories, even though some application bottlenecks stem from these memory subsystems and would benefit from CARM insights. In this presentation, we extend the CARM to cover recent large shared-memory systems. We provide a methodology to instantiate and validate the model on a NUMA system as well as on the latest Xeon Phi processor equipped with configurable hybrid memory. Finally, we show the model's ability to expose several bottlenecks of such systems that the original CARM could not capture.
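For context, the roofline bound that CARM builds on can be sketched as follows. This is the standard roofline formulation rather than a result from the presentation itself; the symbols F_a (attainable performance), I (arithmetic intensity in flops per byte, measured from the core's perspective in CARM), B_m (bandwidth of memory level m), and F_p (peak floating-point performance) are conventional roofline notation, not taken from the abstract:

F_a(I) = \min\left( I \cdot B_m, \; F_p \right)

In the cache-aware variant, one such roof is drawn per level of the memory hierarchy (e.g., L1, L2, L3, DRAM), each with its own measured bandwidth B_m; extending the model to NUMA nodes and configurable hybrid memories can then be viewed as adding roofs for those additional memory subsystems.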