SC17 Denver, CO

P41: OpenCL-Based High-Performance 3D Stencil Computation on FPGAs

Authors: Hamid Reza Zohouri (Tokyo Institute of Technology), Artur Podobas (Tokyo Institute of Technology), Naoya Maruyama (RIKEN), Satoshi Matsuoka (Tokyo Institute of Technology)

Abstract: With recent advances in OpenCL-based High-Level Synthesis, FPGAs have become more attractive for accelerating High Performance Computing workloads. Despite their power-efficiency advantage, FPGAs usually fall short of GPUs in raw performance because their memory bandwidth and compute performance are several times lower. In this work, we show that, owing to the architectural advantage of FPGAs for stencil computation, these devices can offer performance comparable to high-end GPUs in addition to better power efficiency. We achieve this with a parameterized OpenCL-based implementation that combines spatial and temporal blocking with multiple advanced FPGA-specific optimizations to maximize performance. We show that it is possible to achieve up to 60 GBps and 230 GBps of effective throughput for 3D stencil computation on Intel Stratix V and Arria 10 FPGAs, respectively, which is comparable to a highly optimized implementation on high-end GPUs.
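
To illustrate what combining spatial and temporal blocking means, the following is a minimal sketch in Python, not the authors' OpenCL implementation. It assumes a 1D 3-point averaging stencil with fixed end cells instead of the 3D stencils in the poster, and the names `step`, `naive`, `blocked`, `tile`, and `t_block` are hypothetical. Each spatial tile is loaded together with a halo of `t_block` cells per side, advanced `t_block` time steps locally, and only its interior is written back.

```python
def step(cells):
    """One sweep of a 3-point averaging stencil; the two end cells stay fixed."""
    out = list(cells)
    for i in range(1, len(cells) - 1):
        out[i] = (cells[i - 1] + cells[i] + cells[i + 1]) / 3.0
    return out


def naive(grid, steps):
    """Reference version: one full pass over the grid per time step."""
    for _ in range(steps):
        grid = step(grid)
    return grid


def blocked(grid, steps, tile=8, t_block=4):
    """Overlapped spatial + temporal blocking (illustrative assumption only).

    The grid is split into tiles of `tile` cells (spatial blocking). Each tile
    is read once with a halo and advanced up to `t_block` time steps before
    being written back (temporal blocking), so the full grid is traversed
    once per `t_block` steps instead of once per step.
    """
    n = len(grid)
    done = 0
    while done < steps:
        t = min(t_block, steps - done)      # last block may be shorter
        result = list(grid)
        for lo in range(0, n, tile):
            hi = min(lo + tile, n)
            start = max(0, lo - t)          # halo of t cells, clipped to grid
            stop = min(n, hi + t)
            local = grid[start:stop]
            for _ in range(t):
                # Stale values creep inward one cell per step from any edge
                # that is not a true grid boundary; the halo absorbs them.
                local = step(local)
            result[lo:hi] = local[lo - start:hi - start]
        grid = result
        done += t
    return grid


if __name__ == "__main__":
    data = [float((i * 7) % 11) for i in range(37)]
    ref = naive(data, 10)
    out = blocked(data, 10, tile=8, t_block=4)
    print(max(abs(a - b) for a, b in zip(ref, out)))
```

The halo cells are recomputed by neighbouring tiles, so this scheme trades redundant arithmetic for fewer passes over the full grid. That trade is the general motivation for temporal blocking on bandwidth-limited devices; the poster's actual design may differ in how tiles and halos are handled.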
Award: Best Poster Finalist (BP): no

Poster: pdf
Two-page extended abstract: pdf
