SC17 Denver, CO

P17: Fully Non-Blocking Communication-Computation Overlap Using Assistant Cores toward Exascale Computing

Authors: Motoki Nakata (National Institute for Fusion Science), Masanori Nunami (National Institute for Fusion Science), Shinsuke Satake (National Institute for Fusion Science), Yoshihiro Kasai (Fujitsu Ltd), Shinya Maeyama (Nagoya University), Tomo-Hiko Watanabe (Nagoya University), Yasuhiro Idomura (Japan Atomic Energy Agency)

Abstract: A fully non-blocking, optimized communication-computation overlap technique using assistant cores (ACs), which are independent of the calculation cores, is proposed for a five-dimensional plasma turbulence simulation code with spectral (FFT) and finite-difference schemes, toward exascale supercomputing. The effects of the optimization are examined on the Fujitsu FX100 (2.62 PFlop/s), which has 32 ordinary cores and 2 assistant cores per node. There, the ACs make it possible to overlap fully non-blocking MPI communications with thread-parallelized calculations under OpenMP static scheduling, with much lower overhead. The combination of non-blocking communications handled by the ACs and static scheduling not only reduces OpenMP overhead but also improves load/store and cache performance: numerical performance improves by about 22.5% compared with the conventional overlap, in which the master thread handles communications and dynamic scheduling is used.
Award: Best Poster Finalist (BP): no
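
The overlap structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it is a Python analogy in which a dedicated thread stands in for an assistant core, a buffer copy stands in for a non-blocking MPI transfer, and contiguous equal-sized index ranges stand in for OpenMP static scheduling. All function names (`static_chunks`, `exchange_halo`, `compute_interior`, `overlapped_step`) and the doubling kernel are hypothetical, and the sketch shows only the control flow, not real parallel speedup.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def static_chunks(n_items, n_workers):
    """Split range(n_items) into contiguous, near-equal ranges,
    similar in spirit to OpenMP schedule(static) with no chunk size."""
    base, extra = divmod(n_items, n_workers)
    bounds, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def exchange_halo(send_buf, recv_buf):
    """Assumed stand-in for a non-blocking transfer progressed by an
    assistant core; a real code would use MPI here."""
    recv_buf[:] = send_buf


def compute_interior(data, out, start, stop):
    """Hypothetical kernel: each worker touches only its own fixed range."""
    for i in range(start, stop):
        out[i] = 2.0 * data[i]


def overlapped_step(data, halo_out, n_workers=4):
    halo_in = [0.0] * len(halo_out)
    out = [0.0] * len(data)

    # Start communication on a separate thread (the "assistant core"),
    # so no compute worker is diverted to drive the transfer.
    comm = threading.Thread(target=exchange_halo, args=(halo_out, halo_in))
    comm.start()

    # All compute workers proceed on fixed, statically assigned ranges.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(compute_interior, data, out, a, b)
            for a, b in static_chunks(len(data), n_workers)
        ]
        for f in futures:
            f.result()

    # Wait for the transfer only after the interior work is done.
    comm.join()
    return out, halo_in
```

In the conventional scheme the abstract compares against, the master thread drives the communication itself, so the remaining work has to be handed out dynamically to balance the load. Moving communication to a separate core is what allows every compute thread to keep a fixed, contiguous range.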

