SC17 Denver, CO

Tracking and Analyzing Job-level Activity Using Open XDMoD, XALT and OGRT


Authors: Dr. Robert McLay (University of Texas; Texas Advanced Computing Center, University of Texas)

Abstract: This BoF is for those interested in the increasingly important need to track and analyze activity on large-scale systems: usage, performance, and impact, down to the level of each individual job. Open XDMoD primarily displays aggregated data: it gives you your own web portal to view, summarize, and analyze this data. We will discuss recent developments and improvements in XDMoD. XALT and OGRT are about collection: both are battle-tested tools focused on job-level usage data, and they track executables and libraries with the lowest possible overhead. Join us for demos, discussions, and a wide-ranging exchange of information!

Long Description: Let's continue talking about real, high-value cluster analytics at the level of each job. We're interested in what users are actually doing: which applications and libraries they run, and what gets in the way of successful research. Moreover, we want to do this for every single job running on our systems. This year we're especially interested in some of the next challenges, including (1) understanding the needs of non-MPI workflows that make up half the user community; (2) putting usage data in the hands of end users interested in records of their own job-level activity, where the emerging needs include tracking individual usage within other frameworks such as Python; and (3) combining job-level activity with job-level metrics so that users and operations personnel can understand the resource requirements of applications and use resources more efficiently.

XALT (xalt.readthedocs.org) is a battle-tested tool focused on job-level usage data; it enjoys a history of helping decision makers manage and improve their operations. A current list of centers that run XALT includes CSCS, NCSA, UK NCC, KAUST, NICS and TACC. Version 2.0 is now ready to begin tracking non-MPI workflows.

OGRT (https://github.com/georg-rath/ogrt) started out as a way to apply the capabilities of XALT to non-traditional workloads, using non-traditional technologies. Its focus is on real-time tracking of job-level activity with the lowest possible overhead.

Join us for a far-ranging discussion that will begin with an overview of new XALT and OGRT capabilities before venturing into broader strategic and technical issues related to job-level activity tracking.


