State of the Practice: Energy and Power Aware Job Scheduling and Resource Management (EPA-JSRM)
Authors: Natalie Bates (Energy Efficient HPC Working Group)
Abstract: Supercomputing centers are beginning a transition to “dynamic power and energy management.” Cost control is a major factor. Supercomputer systems have increasingly rapid, unpredictable and large power fluctuations. In addition, electricity service providers may ask supercomputing centers to change the timing and/or magnitude of their demand to help address electricity supply constraints. To adapt to this new landscape, centers may employ JSRM strategies to control their electricity demand dynamically and in real time. This BoF presents results of a global survey of supercomputing centers using these JSRM strategies and seeks those interested in sharing experiences with dynamic power and energy management.
Long Description: Motivation
This is one of the first efforts within the HPC community to bring together representatives of multiple supercomputing centers across the globe for the sole purpose of discussing energy and power-aware job scheduling and resource management (EPA-JSRM). In preparation for this work, the organizers have carried out an extensive survey of multiple sites, including CEA, Cineca, KAUST, LRZ, RIKEN, STFC, Tokyo Institute of Technology, University of Tokyo and University of Tsukuba (JCAHPC), and LANL and Sandia National Laboratories (Trinity). The goal of this survey has been to collect and analyze information regarding:
--Motivation behind investing in EPA-JSRM related activities
--Target infrastructure that is expected to be controlled by JSRM frameworks (e.g. site-wide power budget, cooling capacity, etc.)
--Workload characteristics of the systems
--Adopted design for EPA-JSRM
--Application/task level and topology-aware solutions
--Results and challenges
--Next steps including system procurement
To enable two-way interaction between the BoF attendees and the survey topics, we plan on inviting representatives of the organizations that participated in the survey. The purpose is to give the audience an opportunity to learn about the different mechanisms being funded and adopted by different sites around the world. This will also give the site representatives an opportunity to gauge the level of interest in and acceptance of their proposed techniques by the general public, the end users of their systems.
A few weeks before the event, the organizers plan to publish the information collected during the survey in the form of a white paper. The hope is to advertise the white paper through the EE-HPC WG mailing list, website, and other PDLs, including those that target SC attendees, so that SC17 attendees are aware of this effort before the event starts. A post-event survey will direct the following questions to the attendees:
-- Do you know of any other site that is using or is interested in dynamic power management in a production environment?
-- Are you aware of any other strategies for dynamic power management?
-- Do you have plans to use dynamic power management strategies and, if so, what are your main driving forces for this choice?
-- Please rate as high, medium or low the real-time responsiveness and megawatt impact of the above listed dynamic power management strategies
To gain additional feedback, the organizers plan on sharing a survey form with attendees who did not have an opportunity to share their experiences during the original site-wide survey. The results of this questionnaire will be used by the EE_HPC_WG Demand Response Team and will be posted in summary format on the EE_HPC_WG website.
We require a projector for laptops to display presentations and microphone/speakers for audience participation.
Conference Presentation: pdf