DescriptionFast Fourier Transform (FFT) is one of the most important numerical algorithms widely used in numerous scientic and engineering computations. With the emergence of big data problems, however, it is challenging to acquire, process and store a sufficient amount of data to compute the FFT in the first place. Recently developed sparse FFT (sFFT) algorithm provides a solution to this problem. sFFT computes a compressed Fourier transform by using only a small subset of the input data, thus achieving significant performance improvement.
While the increase in the number of cores and memory bandwidth on modern architectures provide an opportunity to improve the performance through sophisticated parallel algorithm design, sFFT is inherently complex, and numerous challenges need to be addressed. Among all the challenges, sFFT falls into the category of irregular applications in which memory access patterns are indirect and irregular that exhibit poor data locality. In this paper, we explore data layout transformation algorithms to tackle the challenge. Our approach shows that an optimized and locality-aware parallel sFFT can perform 7x faster than the original sequential sFFT library on a multicore platform. This optimized locality-aware parallel sFFT is also approximately 10x faster than the parallel FFTW.