SC17 Denver, CO

SKA: The Data Domino Enabled by DALiuGE

Workshop: The 2nd International Workshop on Data Reduction for Big Scientific Data (DRBSD-2)
Authors: Andreas Wicenec (University of Western Australia, International Centre for Radio Astronomy Research)

Abstract: The Square Kilometre Array (SKA) will pose interesting new challenges to the way scientific computing is carried out. The processing will require connecting the antenna arrays in South Africa and Australia to dedicated 200 PF-scale HPC centres over WAN connections of some 700 km. The first part of the processing will be carried out at a sub-second cadence on data streams of about 1 TB/s. Further processing will collect the data of a 6-12 hour observation and perform an iterative image reconstruction. With current algorithms the bottleneck appears to be memory bandwidth, and the data parallelism and inherent concurrency reach several tens of millions of tasks and data items to be scheduled and managed during a single processing run. The design of the SKA processing system includes an execution framework that details the concepts of an architecture enabling processing at SKA scale. Alongside the design of this execution framework, we have implemented a prototype to prove the viability of the proposed design decisions and to extract the detailed requirements. The result of the prototyping work is called DALiuGE, which stands for 'Data Activated Flow Graph Engine'. DALiuGE implements most of the concepts required to perform the various radio astronomy workflows while avoiding unnecessary features. DALiuGE is completely generic and can be adapted to any similar workflow problem. This talk will highlight the key concepts and solutions of DALiuGE and present the results of test runs at scale.
