Description
With the explosion of big data, finding ways to compress large datasets with multi-way relationships - i.e., tensors - quickly and efficiently has become critical in HPC.
The high-order singular value decomposition (HOSVD) method provides a means to attain both extremely high compression ratios and low error rates through low-rank approximation.
However, parallelizing HOSVD efficiently on GPUs remains a challenging problem, largely due to the lack of a fast SVD implementation that can stream data to the limited GPU memory through the PCIe bottleneck.
Our work studies, optimizes, and then compares four different methods for calculating singular vectors in terms of performance, weak and strong scalability, and accuracy in the context of HOSVD. We also discuss ways of load balancing the problem across multiple GPUs on a single node, along with the pros and cons of each algorithm for GPU acceleration.
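
To make the low-rank approximation concrete, the following is a minimal CPU-only sketch of a truncated HOSVD in NumPy. It is an illustration under stated assumptions, not the implementation studied in this work: the function names (`unfold`, `mode_product`, `truncated_hosvd`, `reconstruct`) and the tensor sizes are hypothetical, and the singular vectors are obtained with a plain dense SVD of each unfolding rather than with any of the four methods compared here.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode`."""
    # tensordot puts the new axis first; move it back to position `mode`.
    out = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(out, 0, mode)

def truncated_hosvd(tensor, ranks):
    """Return (core, factors) for the requested per-mode ranks."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of the mode-n unfolding.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        # Project each mode onto its truncated basis.
        core = mode_product(core, u.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Expand the core back to the original tensor shape."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out

if __name__ == "__main__":
    # Hypothetical example data: a random 20 x 30 x 40 tensor.
    rng = np.random.default_rng(0)
    x = rng.standard_normal((20, 30, 40))
    core, factors = truncated_hosvd(x, (5, 5, 5))
    approx = reconstruct(core, factors)
    stored = core.size + sum(f.size for f in factors)
    print("compression ratio:", x.size / stored)
    print("relative error:", np.linalg.norm(x - approx) / np.linalg.norm(x))
```

The compressed representation stores only the small core and one factor matrix per mode, which is where the compression ratio comes from. The per-mode SVD is the step that dominates the cost, which is why the choice of method for computing singular vectors matters for GPU performance.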