Toward Preserving Results Confidentiality in Cloud-Based Scientific Workflows
Workshop: WORKS 2017 (12th Workshop on Workflows in Support of Large-Scale Science)
Abstract: Cloud computing has established itself as a solid computational model that allows scientists to deploy their simulation-based experiments on distributed virtual resources. These experiments can be modeled as scientific workflows. Many of these workflows are data-intensive and produce a large volume of data, which Scientific Workflow Management Systems (SWfMS) also store in the cloud using storage services. One main issue regarding cloud storage services is the confidentiality of stored data: if unauthorized people gain access to data files, they can infer knowledge about the results or even about the workflow structure. Encryption is a possible solution, but it may not be sufficient, and a further level of security can be added to preserve data confidentiality: data dispersion. To reduce the risk of such inference, generated data files should not all be stored in the same bucket, or at least sensitive data files should be distributed across multiple cloud storage services. In this paper, we present IPConf, an approach to preserve the confidentiality of workflow results in cloud storage. IPConf generates a distribution plan for the data files produced during a workflow execution. This plan disperses the data files across several cloud storage services to preserve confidentiality. The plan is then sent to the SWfMS, which stores the generated data in the specified buckets during workflow execution. Experiments performed using real data from SciPhy workflow executions indicate the potential of the proposed approach.
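
The dispersion idea can be illustrated with a minimal sketch. This is not IPConf's actual planning algorithm, which the abstract does not specify; it is a simple greedy heuristic chosen for illustration. The function name `build_distribution_plan`, the file names, the activity labels, and the bucket identifiers are all hypothetical. Each file is assigned to the bucket that currently holds the fewest files from the same workflow activity, so that no single bucket accumulates all outputs of one activity.

```python
from collections import defaultdict


def build_distribution_plan(files, buckets):
    """Return a dict mapping each file name to a bucket identifier.

    files:   list of (file_name, activity) pairs (hypothetical labels).
    buckets: list of bucket identifiers (hypothetical).

    Illustrative greedy heuristic only, not the method from the paper.
    """
    if not buckets:
        raise ValueError("at least one bucket is required")

    # per_activity[b][a] counts files of activity a already placed in bucket b.
    per_activity = {b: defaultdict(int) for b in buckets}
    # total[b] counts all files already placed in bucket b.
    total = {b: 0 for b in buckets}
    plan = {}

    for file_name, activity in files:
        # Prefer the bucket with the fewest files from this activity;
        # break ties by the lowest overall load, then by bucket order.
        target = min(buckets, key=lambda b: (per_activity[b][activity], total[b]))
        plan[file_name] = target
        per_activity[target][activity] += 1
        total[target] += 1

    return plan


# Hypothetical example: two activities, two buckets.
example_files = [
    ("out_1.dat", "activity_a"),
    ("out_2.dat", "activity_a"),
    ("out_3.dat", "activity_b"),
    ("out_4.dat", "activity_b"),
]
example_plan = build_distribution_plan(example_files, ["bucket_1", "bucket_2"])
```

In a setup like the one the abstract describes, a plan of this kind would be handed to the SWfMS, which would then write each generated file to its assigned bucket. A real planner would likely also weigh file sensitivity and storage cost, which this sketch ignores.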