SC17 Denver, CO

P62: How To Do Machine Learning on Big Clusters

Authors: Thomas Ashby (IMEC), Tom Vander Aa (IMEC), Stanislav Bohm (Technical University of Ostrava), Vojtech Cima (Technical University of Ostrava), Jan Martinovic (Technical University of Ostrava), Vladimir Chupakhin (Janssen Global Services LLC)

Abstract: Scientific pipelines, such as those in chemogenomics machine learning applications, often consist of multiple interdependent data processing tasks. We are developing HyperLoom - a platform for defining and executing workflow pipelines in large-scale distributed environments. HyperLoom users can easily define dependencies between computational tasks and create a pipeline which can then be executed on HPC systems. The high-performance core of HyperLoom dynamically orchestrates the tasks over the available resources while respecting task requirements. The entire system was designed to have minimal overhead and to deal efficiently with the varying computational times of the tasks. HyperLoom allows users to execute pipelines that contain basic built-in tasks, user-defined Python tasks, tasks wrapping third-party applications, or a combination of these.
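The dependency-driven execution model described in the abstract can be illustrated with a minimal, generic sketch. This is not HyperLoom's actual API; the pipeline graph and task names below are hypothetical, and Python's standard-library topological sorter stands in for HyperLoom's scheduler:

```python
from graphlib import TopologicalSorter

# A toy chemogenomics-style pipeline: each task maps to the set of
# tasks it depends on. (Illustrative only; HyperLoom's real API differs.)
pipeline = {
    "load_data": set(),
    "featurize": {"load_data"},
    "train": {"featurize"},
    "evaluate": {"train", "featurize"},
}

def run_pipeline(graph):
    """Execute tasks in an order that respects the declared dependencies."""
    order = list(TopologicalSorter(graph).static_order())
    results = {}
    for task in order:
        # Each task can see the results of its dependencies.
        deps = {d: results[d] for d in graph[task]}
        results[task] = f"{task}({sorted(deps)})"
    return order, results

order, results = run_pipeline(pipeline)
print(order)
```

In a real distributed setting, tasks whose dependencies are all satisfied could run concurrently on different workers; the sketch runs them sequentially only to keep the ordering logic visible.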
Award: Best Poster Finalist (BP): yes

Poster: pdf
Two-page extended abstract: pdf
