Jim Brandt's research interests are in resource-aware computing. He is the HPC Monitoring team lead at Sandia National Laboratories and leads the OVIS project. OVIS seeks to determine dynamic behavioral characterizations of resource state and workload demands, and to use these characterizations to understand performance issues and failures and to drive decision support and automated response.
OVIS's data collection and transport component, the Lightweight Distributed Metric Service (LDMS), is the first platform-independent monitoring tool to provide near-real-time, synchronized, high-fidelity, system-wide awareness down to sub-second intervals across tens of thousands of nodes without adverse impact on running applications. LDMS is included in the NNSA Tri-lab operating stack (TOSS) and is used for continuous production monitoring on Tri-lab platforms and on NCSA’s Blue Waters (27,648 nodes). LDMS and the OVIS log file analysis tool, Baler, are being installed on the ACES Trinity platform (20,000 nodes). LDMS is a 2015 R&D 100 Award winner.