Charting the PMIx Roadmap
Authors: Dr. Ralph Castain (Intel Corporation)
Abstract: The PMI Exascale (PMIx) community will conclude its third year of existence this fall, a year that included the release of the PMIx v2.0 standard. We'll discuss what PMIx has accomplished over the past year and present a proposed roadmap for next year.
The PMIx community includes viewpoints from across the HPC runtime community. To that end, we solicit feedback and suggestions on the roadmap in advance of the session, and will include time for a lively discussion at the meeting.
So please join us at the BoF to plan the roadmap. New contributors are welcome!
Long Description: The Process Management Interface (PMI) has been used for quite some time
as a means of supporting HPC programming models, both for wireup
of communication channels and exchange of general application-level
information. However, the evolving effort to achieve exascale performance
and beyond has placed new strains on that support, and has introduced an
ever-expanding set of requirements for interactions between applications and
the host resource manager (RM).
PMI Exascale (PMIx) represents an attempt to address these issues by providing
an extended version of the PMI definition specifically designed to support clusters
up to and including exascale sizes.
PMIx supports the current PMI-1 and PMI-2 APIs and also:
(a) augments the APIs to eliminate current restrictions that impact scalability;
(b) extends the capability of applications to interact with the RM; and
(c) provides an optional standalone "convenience" library (including both client and server support) to
ease adoption of the desired capabilities while removing licensing issues that
exist in some current implementations.
New application-level features in v2.0 include:
* Reduced memory footprint:
- distributed approach to database organization;
- data scoping feature providing several levels of locality to describe a set of
processes that may be interested in the particular information;
- one instance of database per node with "zero-message" data access using shared-memory.
* Reduced amount of communication:
- data scoping helps to exclude local-only data from inter-node communication;
- flexible collective scoping: the PMIx Fence operation can be performed over an arbitrary subset
of processes in the job allocation, spanning several job steps;
- "direct modex" feature well-suited for applications with sparse communication graphs.
* notification of errors at the application and/or system level, including
warning of predicted failures for preemptive response;
* error response actions, including allocation of replacement resources and
launch of replacement processes;
* dynamic modification of allocations, including expansion and/or partial
release of the existing allocation, and new allocations for subsequent spawn.
The proposed roadmap includes support in v3.0 for:
* positioning of files for use by the application or another job step
within the same allocated session;
* storage policies such as hot/warm/cold locations, burst buffer management,
and persistence of files and/or shared memory regions across job steps within
the same allocated session; and
* fabric QoS and security constraints, plus information on network topology.
In this BoF, we will present the current state of the PMIx effort,
describe its planned directions, and stimulate a discussion regarding
desired features and other elements of the roadmap. Here are some of
the highlights of what will be covered during the discussions:
- An overview of PMIx
- State-of-the-Union of RM and programming model support
- PMIx v2.0 release status and feature list
- PMIx v3.0 status and plans
We consider community interaction
vital to the future of PMIx and the development of the project's roadmap.
Be part of the discussion: come contribute to the roadmap and see how you can (and should!) join
Conference Presentation: pdf