SC17 Denver, CO

A28: Exploring Use Cases for Non-Volatile Memories in Support of HPC Resilience


Student: Onkar Patil (North Carolina State University)
Supervisor: Frank Mueller (North Carolina State University)

Abstract: Improving resilience and building resilient architectures is one of the major goals of exascale computing. With the advent of non-volatile memory technologies, persistent memory regions will be a significant part of future memory architectures. These regions can be used in more than one way to benefit different applications. We aim to take advantage of this technology to enable novel, more fine-grained methods that improve the resilience and efficiency of exascale applications. We have developed three modes of memory usage for persistent memory that enable efficient checkpointing in HPC applications, along with a simple API that we evaluate with the DGEMM benchmark on a 16-node cluster with independent SSDs on every node. Our aim is to build on this work and enable static and dynamic runtime systems that inherently make HPC applications more fault-tolerant and resistant to errors.
ACM-SRC Semi-Finalist: no

Poster: pdf
Two-page extended abstract: pdf

