P17: Fully Non-Blocking Communication-Computation Overlap Using Assistant Cores toward Exascale Computing
Session: Poster Reception
Authors
Event Type: ACM Student Research Competition, Poster, Reception

Time: Tuesday, November 14th, 5:15pm - 7pm
Location: Four Seasons Ballroom
Description: A fully non-blocking, optimized communication-computation overlap technique using assistant cores (ACs), which are independent of the calculation cores, is proposed and applied to a five-dimensional plasma turbulence simulation code with spectral (FFT) and finite-difference schemes, toward exascale supercomputing. The effects of the optimization are examined on the Fujitsu FX100 (2.62 PFlop/s), which has 32 ordinary cores and 2 assistant cores per node. There, the ACs make it possible to overlap fully non-blocking MPI communications with thread-parallelized calculations that use OpenMP static scheduling, with much less overhead. The combination of non-blocking communications handled by the ACs and static scheduling is shown not only to reduce OpenMP overhead but also to improve load/store and cache performance: numerical performance improves by about 22.5% compared with the conventional overlap, in which the master thread performs the communications and the calculations use dynamic scheduling.
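The overlap pattern described above can be illustrated with a minimal sketch. This is an analogy only, not the poster's implementation: it uses Python threads instead of MPI and OpenMP, the "communication" is a simple buffer copy standing in for a non-blocking exchange, and the names `static_chunks`, `overlapped_step`, `interior`, and `send_halo` are hypothetical. It shows the two ingredients the abstract names: a dedicated thread that handles communication so no compute worker is tied up, and a static schedule in which each worker gets one fixed, contiguous chunk.

```python
import threading


def static_chunks(n_items, n_workers):
    """Split range(n_items) into contiguous, nearly equal (start, stop) chunks.

    This mimics a static schedule: the assignment is fixed up front
    and every worker handles exactly one contiguous block.
    """
    base, extra = divmod(n_items, n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


def overlapped_step(interior, send_halo, n_workers=4):
    """Square every interior value while a separate thread 'exchanges' the halo.

    Assumption: the halo exchange is simulated by copying send_halo into
    recv_halo. A real code would post non-blocking sends/receives here.
    """
    recv_halo = [None] * len(send_halo)
    result = [0.0] * len(interior)

    def communicate():
        # Runs on its own thread, playing the role of an assistant core.
        for i, value in enumerate(send_halo):
            recv_halo[i] = value

    def compute(lo, hi):
        # Each worker writes only to its own index range, so no locking is needed.
        for i in range(lo, hi):
            result[i] = interior[i] * interior[i]

    comm_thread = threading.Thread(target=communicate)
    workers = [
        threading.Thread(target=compute, args=chunk)
        for chunk in static_chunks(len(interior), n_workers)
    ]

    comm_thread.start()          # communication begins first...
    for t in workers:
        t.start()                # ...and computation proceeds alongside it
    for t in workers:
        t.join()
    comm_thread.join()           # halo data is only used after this point
    return result, recv_halo
```

Because of Python's global interpreter lock, this sketch does not achieve real parallel speedup; it only shows the structure. In the conventional scheme the abstract compares against, the master thread would perform the exchange itself and the remaining work would be handed out dynamically, which is what the dedicated communication thread and fixed chunks avoid here.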