P08: Performance Optimization of Matrix-free Finite-Element Algorithms within deal.II
Abstract: We present a performance comparison of highly tuned matrix-free finite element kernels from the deal.II finite element library on three contemporary classes of computer architecture: an NVIDIA P100 GPU, an Intel Knights Landing (KNL) Xeon Phi, and two multi-core Intel CPUs (Haswell and Broadwell). The algorithms are based on fast integration on hexahedra using sum factorization techniques. On Cartesian meshes, where the arithmetic intensity is relatively high, the four platforms provide a surprisingly similar computational throughput. On curved meshes, the kernel is heavily memory-bandwidth limited, which reveals distinct differences between the architectures: the P100 is twice as fast as KNL and almost four times as fast as the Haswell and Broadwell CPUs, because the GPU kernels effectively leverage the higher memory bandwidth and the favorable shared-memory programming model.
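To make the sum factorization idea concrete, the following is a minimal 2D sketch, not the deal.II implementation. It assumes a tensor-product basis whose 1D shape functions, evaluated at the 1D quadrature points, are stored in a small matrix S. The function names and the integer test data are hypothetical and chosen only for illustration. Evaluating the solution at all quadrature points directly costs on the order of n^4 operations per cell in 2D, while applying the 1D matrix one direction at a time costs on the order of 2n^3.

```python
# Illustrative sketch only; this is not deal.II code and all names are hypothetical.
# S[q][i] holds the i-th 1D shape function evaluated at the q-th 1D quadrature point.
# u[i][j] holds the coefficients of a tensor-product expansion on one 2D cell.


def evaluate_naive(S, u):
    """Compute v[q0][q1] = sum over i, j of S[q0][i] * S[q1][j] * u[i][j] directly."""
    nq, n = len(S), len(S[0])
    return [
        [
            sum(S[q0][i] * S[q1][j] * u[i][j] for i in range(n) for j in range(n))
            for q1 in range(nq)
        ]
        for q0 in range(nq)
    ]


def evaluate_sum_factorized(S, u):
    """Compute the same values with one 1D contraction per coordinate direction."""
    nq, n = len(S), len(S[0])
    # Pass 1: contract over j, giving t[i][q1] = sum over j of S[q1][j] * u[i][j].
    t = [
        [sum(S[q1][j] * u[i][j] for j in range(n)) for q1 in range(nq)]
        for i in range(n)
    ]
    # Pass 2: contract over i, giving v[q0][q1] = sum over i of S[q0][i] * t[i][q1].
    return [
        [sum(S[q0][i] * t[i][q1] for i in range(n)) for q1 in range(nq)]
        for q0 in range(nq)
    ]


# Small integer data (assumed, for illustration) so both variants agree exactly.
S = [[1, 2, 0], [0, 1, 3], [2, 0, 1]]
u = [[1, 0, 2], [3, 1, 0], [0, 2, 1]]

v_naive = evaluate_naive(S, u)
v_fast = evaluate_sum_factorized(S, u)
```

Both functions return the same table of values. The factorized variant reuses the intermediate array t, which is where the savings come from, and the gap widens in 3D on hexahedra.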
Award: Best Poster Finalist (BP): no
Two-page extended abstract: pdf