Application of a Communication-Avoiding Generalized Minimal Residual Method to a Gyrokinetic Five Dimensional Eulerian Code on ManyCore Platforms
Workshop: 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems
Abstract: A communication-avoiding generalized minimal residual (CA-GMRES) method is applied to the gyrokinetic toroidal five dimensional Eulerian code GT5D, and its performance is compared against the original code, which uses a generalized conjugate residual (GCR) method, on the JAEA ICEX (Haswell), the Plasma Simulator (FX100), and the Oakforest-PACS (KNL). Although the CA-GMRES method dramatically reduces the number of data reduction communications, it requires substantially more computation than the GCR method. To resolve this issue, we propose a modified CA-GMRES method, which reduces both computation and memory access by ~30% while keeping the same CA property as the original CA-GMRES method. The modified CA-GMRES method has ~3.8x higher arithmetic intensity than the GCR method, and is thus suitable for future exascale architectures with limited memory and network bandwidths. The CA-GMRES solver is implemented using a hybrid CA approach, in which CA is applied to data reduction communications and communication overlap is used for halo data communications, and it is highly optimized for distributed caches on KNL. Compared with the GCR solver, its computing kernels are accelerated by 1.47x to 2.39x, and the cost of data reduction communication is reduced from between 5% and 13% to ~1% of the total cost at 1,280 nodes.
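
The reduction-count argument in the abstract can be illustrated with a minimal sketch. It is not taken from GT5D or from the solver described here: `FakeComm`, `dots_one_by_one`, and `dots_batched` are hypothetical names, and the "communicator" only sums per-rank partial results in memory while counting how many global reductions are issued. The sketch assumes the general idea behind communication-avoiding Krylov methods, namely that several inner products can be packed into one global reduction instead of being reduced one at a time.

```python
# Hypothetical sketch: counts simulated all-reduce calls; this is not GT5D code.

class FakeComm:
    """Stand-in for an MPI communicator: sums per-rank partial results."""

    def __init__(self):
        self.allreduce_calls = 0

    def allreduce(self, partials):
        # partials holds one list of floats per rank; return the element-wise sum.
        self.allreduce_calls += 1
        return [sum(vals) for vals in zip(*partials)]


def local_dot(x, y):
    """Inner product of the chunks owned by a single rank."""
    return sum(a * b for a, b in zip(x, y))


def dots_one_by_one(comm, basis, w):
    """One global reduction per inner product (s reductions for s vectors)."""
    result = []
    for v in basis:
        partials = [[local_dot(v_r, w_r)] for v_r, w_r in zip(v, w)]
        result.append(comm.allreduce(partials)[0])
    return result


def dots_batched(comm, basis, w):
    """All s inner products packed into a single global reduction."""
    n_ranks = len(w)
    partials = [[local_dot(v[r], w[r]) for v in basis] for r in range(n_ranks)]
    return comm.allreduce(partials)


# Each vector is stored as a list of per-rank chunks (2 ranks, 2 entries each).
basis = [
    [[1.0, 0.0], [2.0, 1.0]],
    [[0.0, 1.0], [1.0, 3.0]],
    [[2.0, 2.0], [0.0, 1.0]],
]
w = [[1.0, 2.0], [3.0, 4.0]]

comm_a, comm_b = FakeComm(), FakeComm()
seq = dots_one_by_one(comm_a, basis, w)
bat = dots_batched(comm_b, basis, w)

print(seq, comm_a.allreduce_calls)
print(bat, comm_b.allreduce_calls)
```

Both variants return the same inner products, but the batched variant issues one reduction where the other issues one per basis vector. The local arithmetic is unchanged in this toy case; the extra computation mentioned in the abstract comes from other parts of the CA-GMRES algorithm that the sketch does not model.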