Session: Deep Learning
Authors:
Event Type: Paper

Tags: Accelerators, Deep Learning, Machine Learning
Time: Tuesday, November 14th, 11:30am - 12pm
Location: 402-403-404
Description: Training neural networks has become a major bottleneck. For example, training the ImageNet dataset on a single Nvidia K20 GPU takes 21 days. To speed up training, current deep learning systems rely heavily on hardware accelerators. However, these accelerators have limited on-chip memory compared with CPUs.
We use both self-hosted Intel Knights Landing (KNL) clusters and multi-GPU clusters as our target platforms. On the algorithm side, we focus on Elastic Averaging SGD (EASGD) when designing algorithms for HPC clusters.
We redesign four efficient algorithms for HPC systems to address EASGD's poor scaling on clusters. Async EASGD, Async MEASGD, and Hogwild EASGD are faster than their existing counterparts (i.e., Async SGD, Async MSGD, and Hogwild SGD) in all comparisons. Our Sync EASGD achieves a 5.3X speedup over the original EASGD on the same platform. We achieve 91.5% weak scaling efficiency on 4,253 KNL cores, which is higher than the state-of-the-art implementation.
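For context, Elastic Averaging SGD couples each worker's local parameters to a shared center variable through an elastic penalty, so workers can explore locally while being pulled toward consensus. The sketch below illustrates the standard synchronous EASGD update (Zhang et al.) on a toy quadratic problem; it is not the paper's HPC redesign, and the loss, learning rate, and elasticity values are illustrative assumptions.

```python
# Minimal sketch of the standard (synchronous) EASGD update rule on a toy
# quadratic loss. Illustrative only: the loss, learning rate (eta), and
# elasticity (rho) are assumed values, not the paper's HPC-optimized variants.
import numpy as np

rng = np.random.default_rng(0)
dim, n_workers, steps = 10, 4, 200
eta, rho = 0.05, 0.1                      # learning rate and elastic coefficient
target = rng.normal(size=dim)             # minimizer of the toy loss

def grad(x, noise=0.1):
    """Stochastic gradient of f(x) = 0.5 * ||x - target||^2."""
    return (x - target) + noise * rng.normal(size=x.shape)

workers = [rng.normal(size=dim) for _ in range(n_workers)]  # local parameters x_i
center = np.zeros(dim)                                      # center variable x~

for t in range(steps):
    diffs = [x - center for x in workers]
    # Local step: gradient descent plus an elastic pull toward the center.
    workers = [x - eta * (grad(x) + rho * d) for x, d in zip(workers, diffs)]
    # Center step: move toward the average of the workers.
    center = center + eta * rho * sum(diffs)

print("distance of center to optimum:", np.linalg.norm(center - target))
```

The paper's Sync EASGD, Async EASGD, Async MEASGD, and Hogwild EASGD variants restructure how this center update is communicated so that it scales on KNL and multi-GPU clusters.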