Tracking and Analyzing Job-level Activity Using Open XDMoD, XALT and OGRT
Authors: Dr. Robert McLay (University of Texas; Texas Advanced Computing Center, University of Texas)
Abstract: This BoF is for those interested in the increasingly important need to track and analyze activity on large-scale systems: usage, performance, and impact, down to the level of each individual job. Open XDMoD primarily displays aggregated data: it provides your own web portal to view, summarize, and analyze this data. We will discuss recent developments and improvements in XDMoD. XALT and OGRT are about collection: both are battle-tested tools focused on job-level usage data. They track executables and libraries with the lowest possible overhead. Join us for demos, discussions, and a wide-ranging exchange of information!
Long Description: Let's continue talking about real, high-value cluster analytics at the
level of each job. We're interested in what users are actually doing:
from which applications and libraries they use to preventing the
problems that get in the way of successful research. Moreover, we want
to do this for every single job running on our systems. This year we're
especially interested in some of the next challenges, including
(1) understanding the needs of non-MPI workflows that comprise half the
user community; (2) putting usage data in the hands of end users
interested in records of their own job-level activity; and
(3) combining job-level activity with job-level metrics to enable users
and operations personnel to understand the resource requirements of
applications and use resources more efficiently. Among the emerging
needs: tracking individual usage within other frameworks such as Python.
XALT (xalt.readthedocs.org) is a battle-tested tool focused on
job-level usage data; it enjoys a history of helping
decision makers manage and improve their operations. A current list of
centers that run XALT includes CSCS, NCSA, UK NCC, KAUST, NICS and
TACC. Version 2.0 is now ready to begin tracking non-MPI workflows.
OGRT (https://github.com/georg-rath/ogrt) started out as a way to apply the
capabilities of XALT to non-traditional workloads, using non-traditional technologies.
Its focus is on real-time tracking of job-level activity with the lowest possible overhead.
Join us for a far-ranging discussion that will begin with an overview of
new XALT and OGRT capabilities before it ventures into broader
strategic and technical issues related to job-level activity tracking.