A01: GEMM-Like Tensor-Tensor Contraction (GETT)
Supervisor: Paolo Bientinesi (RWTH Aachen University)
Abstract: Tensor contractions (TCs) are a performance-critical component in numerous scientific computations. Despite the close connection between matrix-matrix products (GEMM) and TCs, the performance of the latter is in general vastly inferior to that of an optimized GEMM. To close this gap, we propose a novel approach: GEMM-like Tensor-Tensor multiplication (GETT). GETT mimics the design of a high-performance GEMM implementation; as such, it systematically reduces an arbitrary tensor contraction to a highly optimized "macro-kernel". This macro-kernel operates on suitably "packed" sub-tensors that reside in specified levels of the cache hierarchy. GETT's decisive feature is its ability to pack sub-tensors via tensor transpositions, yielding efficient packing routines. In contrast to previous approaches to TCs, GETT attains the same I/O cost as an equally sized GEMM, making GETT especially well suited for bandwidth-bound TCs. GETT's excellent performance is demonstrated across a wide range of random tensor contractions.
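The idea of packing sub-tensors via transpositions and feeding them to a GEMM-style kernel can be illustrated with a small NumPy sketch. This is not the GETT implementation: the contraction C[m1,n,m2] = sum_k A[m1,k,m2] * B[k,n], the function name `contract_blocked`, and the tile sizes `mc`, `nc`, `kc` are all assumptions chosen for illustration, and NumPy's `@` merely stands in for the macro-kernel.

```python
import numpy as np

def contract_blocked(A, B, mc=4, nc=4, kc=4):
    """Hypothetical sketch: C[m1,n,m2] = sum_k A[m1,k,m2] * B[k,n].

    Tiles are packed into contiguous 2-D buffers (using a transposition
    for A) and multiplied by a plain matrix product that stands in for
    a macro-kernel. Tile sizes are illustrative, not tuned to any cache.
    """
    M1, K, M2 = A.shape
    K_b, N = B.shape
    assert K == K_b, "contracted dimensions must match"
    C = np.zeros((M1, N, M2), dtype=np.result_type(A, B))

    for n0 in range(0, N, nc):
        for k0 in range(0, K, kc):
            # Pack a (kb x nb) tile of B into a contiguous buffer.
            B_packed = np.ascontiguousarray(B[k0:k0 + kc, n0:n0 + nc])
            nb = B_packed.shape[1]
            for m0 in range(0, M1, mc):
                A_sub = A[m0:m0 + mc, k0:k0 + kc, :]
                mb, kb, _ = A_sub.shape
                # Pack via transposition: (m1, k, m2) -> (m1, m2, k),
                # then fuse (m1, m2) into a single row index.
                A_packed = np.ascontiguousarray(
                    A_sub.transpose(0, 2, 1)).reshape(mb * M2, kb)
                # Stand-in for the macro-kernel: an ordinary matrix product.
                C_tile = A_packed @ B_packed
                # Unpack: (m1*m2, n) -> (m1, m2, n) -> (m1, n, m2).
                C[m0:m0 + mc, n0:n0 + nc, :] += (
                    C_tile.reshape(mb, M2, nb).transpose(0, 2, 1))
    return C

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 7, 3))
    B = rng.standard_normal((7, 6))
    C = contract_blocked(A, B)
    # Compare against a direct reference contraction.
    print(np.allclose(C, np.einsum("akc,kb->abc", A, B)))
```

In this sketch only tile-sized pieces of the operands are copied, rather than transposing whole tensors up front; whether that matches the blocking used in GETT is not stated in the abstract.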
ACM-SRC Semi-Finalist: no
Two-page extended abstract: pdf