Practical Reproducibility by Managing Experiments Like Software
Authors: Ivo Jimenez (University of California, Santa Cruz)
Abstract: This BoF is for HPC community members who want to make sharing and re-executing scientific experiments more practical. As a starting point, we will introduce the Popper Protocol (http://falsifiable.us), a set of guidelines for managing scientific explorations as software projects. We will moderate discussions and conduct a survey among attendees on tools/services that can be used to “Popperize” scientific explorations, i.e. fully automate/document experiments by scripting all aspects of their execution/analysis and version-controlling their evolution using DevOps tools. Our goal is to create a list of tool-chain templates that domain-specific communities can use to incentivize researchers/students to generate easily re-executable/shareable experiments.
Long Description: The lack of reproducibility in computational sciences, computer science, and computer systems has led to a credibility crisis. We urgently need to identify tools that help document dependencies on data products, methodologies, and computational environments; that safely archive data products; and that reliably share data products so that scientists can rely on their availability. Over the last decade, software engineering and systems administration communities such as the DevOps movement have developed sophisticated tools to ensure “software reproducibility”, e.g. the reproducibility of errors caused by software bugs or the reproducibility of successful deployments, using versioning, dependency management, and containerization. DevOps tools enjoy wide popularity and industry adoption, and they leverage today’s ubiquity of open-source software and cloud services. We propose to make reproducibility practical by applying these mature and well-maintained tools to scientific explorations.
This BoF is intended for HPC community members interested in getting involved in practical reproducibility that leverages the techniques and strategies of software reproducibility. As a starting point for discussions, we will give an overview of the Popper Protocol (http://falsifiable.us), a series of simple, easy-to-follow steps for generating experiments that are easy to re-execute. The key idea behind the Popper Protocol is to manage scientific explorations as software projects, using tools and services that are readily available and enjoy wide popularity. The Popper Protocol does not mandate a particular set of tools; instead, it rests on the following principles (a short sketch after the list illustrates them):
1. Make use of DevOps tools for implementing all aspects (code, deployment, validation, etc.) of an experiment.
2. Create scripts for these tools and manage them in a version-control system.
3. Document changes to experiments in the form of commits (the commit log is the lab notebook).
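To make these principles concrete, the following minimal sketch shows what a scripted, version-controlled experiment driver could look like. It assumes Docker and Git are installed; the container image name, the experiment script, and the output directory are hypothetical placeholders, and any container runtime or version-control workflow a community already uses would serve equally well:

    #!/usr/bin/env python3
    """Minimal sketch of a 'Popperized' experiment driver (hypothetical paths)."""
    import pathlib
    import subprocess

    IMAGE = "myorg/experiment-env:1.0"  # pinned environment (hypothetical image)

    def run_experiment():
        pathlib.Path("results").mkdir(exist_ok=True)
        # Principle 1: execute the experiment inside a pinned container image
        # so the computational environment is documented and re-creatable.
        subprocess.run(
            ["docker", "run", "--rm",
             "-v", f"{pathlib.Path.cwd()}:/workspace", "-w", "/workspace",
             IMAGE, "bash", "experiment/run.sh"],
            check=True)

    def record(message):
        # Principles 2 and 3: keep scripts and outputs under version control;
        # each re-execution becomes a commit, so the log is the lab notebook.
        subprocess.run(["git", "add", "-A"], check=True)
        subprocess.run(["git", "commit", "-m", message], check=True)

    if __name__ == "__main__":
        run_experiment()
        record(f"re-run experiment with pinned image {IMAGE}")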
A command-line tool (Popper CLI, available at https://github.com/systemslab/popper) helps users compose versioning, dependency management, containerization, orchestration, analysis, and documentation to improve the sharing and reproducibility of scientific explorations. Feedback from early adopters within the distributed systems, numerical weather prediction, genomics, and computational media communities is very encouraging: adopters felt that following the Protocol significantly increased their personal productivity, in part because it made experiments much easier to share and leverage. Feedback also highlights the importance of being able to select the tool chains and services that best fit a particular community of practice.
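To illustrate the analysis/validation aspect mentioned above, an experiment can include a small validation stage that checks the experiment's claims against its output data, so that validation is itself a versioned script rather than a manual step. The file name, column name, and threshold below are hypothetical placeholders:

    #!/usr/bin/env python3
    """Hypothetical validation stage: check a claim against experiment output."""
    import csv
    import sys

    EXPECTED_MIN = 100.0  # hypothetical claim: mean throughput >= 100 MB/s

    def main():
        with open("results/throughput.csv", newline="") as f:
            rows = list(csv.DictReader(f))
        if not rows:
            sys.exit("FAIL: no output data found")
        mean = sum(float(r["throughput_mbps"]) for r in rows) / len(rows)
        ok = mean >= EXPECTED_MIN
        print(f"{'PASS' if ok else 'FAIL'}: mean throughput is {mean:.1f} MB/s")
        sys.exit(0 if ok else 1)

    if __name__ == "__main__":
        main()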
An important goal of this BoF is for participants to determine common templates for domain-specific tool chains and services that enable fast and easy setup of “Popperized” explorations. Concretely, we will survey the audience, asking each participant to describe an experiment that exemplifies the work in their domain and to list the tools and services they consider suitable for code management, software packaging, experiment orchestration, input and output data management, analysis and visualization, and documentation. These examples can serve as the basis for collaboration or help students learn how experiments are carried out in a particular domain. We will summarize our findings in a report.