Description
We refactor and optimize the entire Community Atmosphere Model (CAM) for the full system of the Sunway TaihuLight and deliver petascale climate modeling performance. In the first stage, using OpenACC directives, we scale CAM to 1.5 million cores at a simulation speed of 2.81 simulated years per day. We then apply a more aggressive and challenging finer-grained redesign of the HOMME dynamical core to achieve finer memory control, more efficient vectorization, and overlap between computation and communication. In addition, we propose a register-communication-based parallelization scheme to minimize data dependencies within the modules. As a result, our optimized kernels running on a single 260-core Sunway processor outperform the established HOMME kernels running on a platform with up to 184 Intel Xeon E5-2680V3 CPU cores. Our implementation achieves a sustained double-precision performance of around 2.5 Pflops for a 0.75 km global simulation when using 8,519,680 cores.