A Machine Learning Approach for Modular Workflow Performance Prediction
Author/Presenters
Event Type
Workshop

TimeMonday, November 13th3:50pm - 4:15pm
Location501
DescriptionScientific workflows provide an opportunity for declarative computational experiment design in an intuitive and efficient way. A distributed workflow is typically executed on a variety of resources and it uses a variety of computational algorithms or tools to achieve the desired outcomes. Such a variety imposes additional complexity in scheduling these workflows on large scale computers. As computation becomes more distributed, insights into expected workload that a workflow presents become critical for effective resource allocation. In this paper, we present a modular framework that leverages Machine Learning for creating precise performance predictions of a workflow. The central idea is to partition a workflow in such a way that makes the task of forecasting each atomic unit manageable and gives us a way to combine the individual predictions efficiently. We recognize a combination of an executable and a specific physical resource as a single module. This gives us a handle to characterize workload and machine power as a single unit of prediction. The modular approach of the presented framework allows it to adapt to highly complex nested workflows and scale to new scenarios. We present performance estimation results of independent workflow modules executed on the XSEDE SDSC Comet cluster using various Machine Learning algorithms. The results provide insights into the behavior and effectiveness of different algorithms in the context of scientific workflow performance prediction.