P05: ooc_cuDNN : A Deep Learning Library Supporting CNNs over GPU Memory Capacity

Authors: Yuki Ito (Tokyo Institute of Technology), Ryo Matsumiya (Tokyo Institute of Technology), Toshio Endo (Tokyo Institute of Technology)

Abstract: GPUs are widely used to accelerate deep learning with convolutional neural networks (CNNs). However, because GPU memory capacity is limited, it is difficult to implement efficient programs that compute large CNNs on a GPU. This poster describes the design and implementation of the out-of-core cuDNN (ooc_cuDNN) library, which supports computing CNNs that exceed GPU memory capacity by using CPU memory. ooc_cuDNN is an extension of cuDNN, a popular high-performance deep learning library. ooc_cuDNN divides CNN computation according to its performance model to achieve better performance. In addition, ooc_cuDNN provides fused functions that combine several computations to reduce extra communication. With ooc_cuDNN, we successfully computed a CNN requiring more than 60 GB of memory on a single GPU with 16 GB of memory. Compared with an in-core case using cuDNN, the performance degradation was 13%.
Award: Best Poster Finalist (BP): no
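
The abstract's idea of dividing a computation so that each piece fits in GPU memory can be illustrated with a minimal sketch. This is not the ooc_cuDNN API or its performance model: the function name `plan_chunks`, the choice to split along the batch dimension, and the 12 GiB working budget are all assumptions made for illustration only.

```python
import math


def plan_chunks(batch_size, bytes_per_sample, gpu_budget_bytes):
    """Split a batch into contiguous (start, stop) index ranges.

    Hypothetical helper: each range's working set
    (samples * bytes_per_sample) stays within gpu_budget_bytes,
    and chunk sizes are balanced to differ by at most one sample.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if bytes_per_sample > gpu_budget_bytes:
        raise ValueError("a single sample does not fit in the GPU budget")

    # Largest number of samples that fits in the budget at once.
    max_per_chunk = gpu_budget_bytes // bytes_per_sample
    n_chunks = math.ceil(batch_size / max_per_chunk)

    # Spread samples evenly so no chunk is much smaller than the others.
    base, extra = divmod(batch_size, n_chunks)
    ranges = []
    start = 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    MIB = 2**20
    GIB = 2**30
    # Assumed workload: 240 samples at 256 MiB each (60 GiB in total),
    # with 12 GiB of a 16 GiB GPU assumed to be available for the working set.
    for start, stop in plan_chunks(240, 256 * MIB, 12 * GIB):
        # In a real out-of-core run, this is where a chunk would be copied
        # from CPU memory to the GPU, computed, and copied back.
        print(f"process samples [{start}, {stop})")
```

In a sketch like this the chunk size is picked purely from a memory budget. The library described in the abstract instead chooses its division from a performance model and uses fused functions to cut extra communication, neither of which is modeled here.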

Poster: pdf
Two-page extended abstract: pdf
