Cross-Layer Allocation and Management of Hardware Resources in Shared Memory Nodes

Authors: Dr. Emmanuel Jeannot (French Institute for Research in Computer Science and Automation (INRIA))

Abstract: The goal of this BoF is to gather the community (from runtime systems to compilers) working on hardware resource allocation for threads. We will discuss this problem, share visions, propose solutions, and coordinate a worldwide effort.

We will consider all the resources of a shared memory node and discuss how to coordinate resource sharing by different parts of the software stack to avoid competition for these resources.

Participants will be able to provide their own vision through discussions. The goal is to come up with a document specifying possible solutions and discuss possible implementations.

Long Description: Although supercomputers are composed of distributed memory nodes, managing each node efficiently is key to performance. With a significant increase in the number of cores per node and ever deeper memory hierarchies, allocating and managing the threads that execute on those cores is a challenge that requires cooperation and coordination between the different components of the software stack. The goal is to consider all shareable resources of a node: cores, memory, power, cache, etc.

For instance, at the application level, one part of the application may use pthreads to express its concurrency. However, it may also rely on computational libraries (e.g., MKL) that are themselves multi-threaded. Moreover, other parts of the application may use OpenMP. Furthermore, an MPI runtime system might have internal parallelism (e.g., a progress thread for communication). Currently, each component of the application is unaware that other components are also using threads, which can cause over-subscription and poor performance. Beyond Linux affinity masks, there is no common mechanism that lets the different components be aware of each other and cooperate in their use of HW resources.
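To make the over-subscription scenario concrete, here is a minimal sketch in Python of what a cooperative split of the cores could look like. It is not an existing mechanism or a proposal from this BoF: the function names `available_cores` and `partition_cores`, the component labels, and the proportional-share policy are all assumptions chosen for illustration. The only real system interface used is the Linux affinity mask, read through `os.sched_getaffinity`.

```python
import os


def available_cores():
    """Return the sorted CPU ids this process is allowed to run on."""
    try:
        # Linux-only: reads the affinity mask of the calling process.
        return sorted(os.sched_getaffinity(0))
    except AttributeError:
        # Fallback on platforms without affinity masks.
        return list(range(os.cpu_count() or 1))


def partition_cores(cores, requests):
    """Split `cores` into disjoint sets, one per component (hypothetical policy).

    `requests` maps a component name to the number of threads it would
    like to run. If the total fits, every component gets what it asked
    for; otherwise shares are scaled down proportionally, with leftover
    cores handed out by largest remainder.
    """
    cores = list(cores)
    total = sum(requests.values())
    if total <= 0:
        raise ValueError("at least one component must request a thread")

    if total <= len(cores):
        shares = dict(requests)
    else:
        # Integer proportional share plus largest-remainder rounding.
        shares = {name: len(cores) * n // total for name, n in requests.items()}
        remainders = {name: len(cores) * n % total for name, n in requests.items()}
        leftover = len(cores) - sum(shares.values())
        by_remainder = sorted(requests, key=lambda name: remainders[name], reverse=True)
        for name in by_remainder[:leftover]:
            shares[name] += 1

    # Hand out consecutive, non-overlapping slices of the core list.
    assignment, start = {}, 0
    for name, count in shares.items():
        assignment[name] = cores[start:start + count]
        start += count
    return assignment


if __name__ == "__main__":
    # Hypothetical components, each asking for more threads than is fair.
    wanted = {"pthreads": 4, "mkl": 8, "openmp": 4}
    for component, share in partition_cores(available_cores(), wanted).items():
        print(component, share)
```

On Linux, a component could then restrict itself to its share with `os.sched_setaffinity(0, share)`. The sketch only works because a single caller knows every component's request, which is exactly the shared knowledge that independently written libraries and runtimes lack today.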

Tools like hwloc provide a portable, static view of the node topology but do not provide any strategy for sharing resources between different application components. Therefore, even though most elements of the HPC software stack that query topology information use hwloc, we still lack a mechanism to allocate and manage HW resources.

Many research teams have identified this problem. It goes by several names: application composition, dynamic topology management, topology-aware core selection, etc. In any case, the basic problem is the same. While there are a variety of proposed solutions, there is no agreement, and since the whole problem is one of cooperation, a common solution is required. We think it would be of great interest for the HPC community to meet and discuss the issue and possible solutions. The goal is to share the different visions and ideas, and then to coordinate a worldwide effort. The ultimate goal is to see whether we can come up with a common way to address the problem and standardize how information related to HW resource usage can be managed and expressed.

We think that SC is the perfect venue for hosting this first BoF on the topic, as it is the only place where all of those who need to be involved will naturally be present. To succeed, we need to involve end users, MPI implementers, OpenMP implementers, implementers of other parallel runtimes (C++17, ...), as well as those implementing HW resource detection code, and others. Many people from the community have already shown their interest in this BoF. They span the whole community, from the US (ANL, Sandia, UTK), Europe (Inria, CEA, RWTH Aachen, LMU München) and Asia (Tokyo Riken) to industry (Intel, Bull), etc.

For more details, see: http://www.labri.fr/perso/ejeannot/SC_BOF/SC17_BOF.html

