Data sets and software are important by-products of research in fields that depend upon data-intensive and high-performance computing. But these elements are typically absent when research results are recorded in a journal article or conference proceedings. There is a growing sense in the computational community that this gap needs to be filled if we are to create a stable base of research upon which reliable advances may be built. In short, we need to ensure that computational results are as reproducible as those from experiments.
I am part of a small team at SC16 that over the next several years will work to promote and support replication and reproducibility of computational results. One of the approaches we are exploring is to use SC’s Student Cluster Competition (SCC) to test the reproducibility of results presented in papers at previous SC conferences. This is the first time that students have been challenged to reproduce results from a recent computational paper rather than run more traditional benchmark applications.
After evaluating submissions from past SC paper authors, the SCC selected “A parallel connectivity algorithm for de Bruijn graphs in metagenomic applications” by Flick, Jain, Pan, and Aluru for the inaugural reproducibility initiative at SC16 (the paper is available from the ACM Digital Library). Results from this paper will be reproduced by the 14 teams taking part in the cluster competition in November.
We hope our student teams will be especially engaged by the idea that they aren’t running just a sterile benchmark application but rather, by attempting to reproduce previously reported application performance, they are actively taking part in the scientific process. We believe this kind of early engagement will be very important to efforts to make reproducibility a standard part of the computational research process.
One of the paper’s co-authors, Chirag Jain, will be helping the SCC committee to create a challenging competition task for the students and will serve as a judge during the competition.
About the Author:
Michela Taufer is the SC16 Panels Chair and an SC16 SCC Reproducibility Committee Member. She is the David and Beverly J.C. Mills Career Development Chair at the University of Delaware. Michela has a long history of interdisciplinary work with high-profile computational biophysics groups in several research and academic institutions.
Her research interests include software applications and their advanced programmability in heterogeneous computing (i.e., multi-core platforms and GPUs); cloud computing and volunteer computing; and performance analysis, modeling and optimization of multi-scale applications. She has been serving as the principal investigator of several NSF collaborative projects. She also has significant experience in mentoring a diverse population of students on interdisciplinary research.