SC17 Denver, CO

P31: Understanding the Performance of Small Convolution Operations for CNN on Intel Architecture

Authors: Alexander Heinecke (Intel Corporation), Evangelos Georganas (Intel Corporation), Kunal Banerjee (Intel Corporation), Dhiraj Kalamkar (Intel Corporation), Narayanan Sundaram (Intel Corporation), Anand Venkat (Intel Corporation), Greg Henry (Intel Corporation), Hans Pabst (Intel Corporation)

Abstract: Convolution layers are prevalent in many classes of deep neural networks, including Convolutional Neural Networks (CNNs), which provide state-of-the-art results for tasks such as image recognition, natural language processing, and speech recognition. The computational cost of the convolution operation has led to a proliferation of implementations, including the matrix-matrix multiplication formulation, the FFT formulation, the Winograd transformation, and direct convolution, primarily targeting GPUs. In this paper, we optimize direct convolution and Winograd implementations for x86 architectures, in particular for Xeon Phi systems, via a dynamic compilation approach. We then show how these JIT optimizations can be integrated into a high-level domain-specific language setting. We shed light on what is and is not possible with different data formats and blocking techniques. Our JIT-based Ninja implementation achieves close to theoretical peak performance on modern x86 architectures, depending on the setting and the CPU architecture at hand.
Award: Best Poster Finalist (BP): no
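
Illustration: the abstract contrasts direct convolution with the matrix-matrix multiplication formulation. The sketch below is a minimal pure-Python rendering of both, assuming stride 1, no padding, and a plain channel-first nested-list layout. It is not the authors' JIT-generated code and omits the blocking, data-format, and vectorization work the poster is about; the function names `direct_conv2d` and `im2col_conv2d` are hypothetical and chosen only for this example.

```python
def _shapes(inp, wts):
    # inp[c][h][w] is the input, wts[k][c][r][s] are the filters.
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    K, R, S = len(wts), len(wts[0][0]), len(wts[0][0][0])
    return C, K, R, S, H - R + 1, W - S + 1  # P, Q are the output sizes


def direct_conv2d(inp, wts):
    """Direct convolution: a six-deep loop nest over the output and filter."""
    C, K, R, S, P, Q = _shapes(inp, wts)
    out = [[[0.0] * Q for _ in range(P)] for _ in range(K)]
    for k in range(K):                  # output feature maps
        for c in range(C):              # input feature maps
            for p in range(P):          # output rows
                for q in range(Q):      # output columns
                    for r in range(R):  # filter rows
                        for s in range(S):  # filter columns
                            out[k][p][q] += inp[c][p + r][q + s] * wts[k][c][r][s]
    return out


def im2col_conv2d(inp, wts):
    """Same result via the matrix-matrix multiplication formulation."""
    C, K, R, S, P, Q = _shapes(inp, wts)
    # One column per output pixel, holding the C*R*S inputs it depends on.
    cols = [[inp[c][p + r][q + s]
             for c in range(C) for r in range(R) for s in range(S)]
            for p in range(P) for q in range(Q)]
    # One row per filter, flattened in the same (c, r, s) order.
    rows = [[wts[k][c][r][s]
             for c in range(C) for r in range(R) for s in range(S)]
            for k in range(K)]
    # Plain matrix product, then reshape each row of length P*Q to P x Q.
    flat = [[sum(a * b for a, b in zip(row, col)) for col in cols]
            for row in rows]
    return [[flat[k][p * Q:(p + 1) * Q] for p in range(P)] for k in range(K)]
```

The matrix formulation copies each input value up to R*S times into `cols`, which is the memory overhead a direct convolution avoids; that trade-off is one reason small convolutions can favor the direct approach.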

Poster: pdf
Two-page extended abstract: pdf
