Facing the Big Data Challenge in the Fusion Code XGC

Workshop: The 2nd International Workshop on Data Reduction for Big Scientific Data (DRBSD-2)
Authors: Choongseok Chang (Princeton University)

Abstract: The boundary plasma of a magnetic fusion reactor is far from thermodynamic equilibrium, with physics dominated by nonlinear, multiscale, multiphysics interactions in a complicated geometry, and it requires extreme-scale computing for first-principles-based understanding. The scalable particle-in-cell code XGC has been developed for this purpose over the last decade, in partnership with the computer science and applied mathematics communities. XGC’s extreme-scale capability has been recognized by the award of several hundred million hours of computing time on all three US leadership-class computers, and by its selection into all three pre-exascale/exascale programs: CAAR, NESAP, and AURORA ESP. The physics data produced by a 1-day XGC run of an ITER plasma on a present ~20 PF computer amounts to ~100 PB, far above the limit imposed by present technology. We are thus losing most of the valuable physics data in order to keep the data flow within the limits imposed by the I/O rate and the file-system size. Since the problem size will increase in proportion to parallel computing capability, the challenge will grow at least 100-fold as exascale computers arrive. The data size must be reduced by several orders of magnitude while still preserving the accuracy needed to enable various levels of scientific discovery. On-the-fly, in-memory data analysis and visualization must occur at the same time. These issues, as well as the necessity to collaborate tightly with the applied mathematics and computer science communities, will be discussed from the application-driver point of view.
