Author/Presenters
Event Type
Workshop

Applications
Clouds and Distributed Computing
SIGHPC Workshop
Time: Sunday, November 12th, 9:10am - 9:40am
Location: 507
Description: Large scientific facilities provide researchers with
instrumentation, data, and data products that can
accelerate scientific discovery. However, increasing
data volumes, coupled with limited local computational
power, prevent researchers from taking full advantage of
what these facilities can offer. Many researchers have
looked into using commercial and academic
cyberinfrastructure (CI) to process this data.
Nevertheless, there remains a
disconnect between large facilities and
cyberinfrastructure that requires researchers to be
actively part of the data processing cycle. The
increasing complexity of cyberinfrastructure and data
scale necessitates new data delivery models that
can autonomously integrate large-scale scientific
facilities and cyberinfrastructure to deliver real-time
data and insights. In this paper, we present our initial
efforts using the Ocean Observatories Initiative project
as a use case. In particular, we present a
subscription-based data streaming service for data
delivery that leverages the Apache Kafka data streaming
platform. We also show how our solution can
automatically integrate large-scale facilities with
cyberinfrastructure services for automated data
processing.
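
As a rough illustration of the subscription-based delivery model described above, the following minimal sketch shows how a subscriber could register interest in a data topic and have a processing step triggered automatically when new data is published. It is an in-memory stand-in written for illustration only: it does not use Apache Kafka or any Ocean Observatories Initiative interface, and the names `StreamBroker`, `process_reading`, the topic `ctd.temperature`, and the record fields are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Record = dict
Handler = Callable[[Record], None]


class StreamBroker:
    """Hypothetical in-memory stand-in for a topic-based streaming platform."""

    def __init__(self) -> None:
        # Maps each topic name to the handlers subscribed to it.
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler to be called for every record on a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, record: Record) -> int:
        """Deliver a record to all subscribers; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(record)
        return len(handlers)


# Results of the automated processing step (assumed for illustration).
processed: List[Record] = []


def process_reading(record: Record) -> None:
    """Hypothetical processing step: add a Kelvin temperature field."""
    processed.append({**record, "temp_k": record["temp_c"] + 273.15})


broker = StreamBroker()
broker.subscribe("ctd.temperature", process_reading)

# A facility publishing a new reading triggers processing with no manual step.
delivered = broker.publish("ctd.temperature", {"sensor": "ctd-01", "temp_c": 10.0})

# Publishing to a topic nobody subscribed to delivers nothing.
undelivered = broker.publish("ctd.salinity", {"sensor": "ctd-01", "psu": 35.0})
```

In a real deployment the broker role would be played by the streaming platform itself, and the handler would hand data off to a cyberinfrastructure service rather than a local function.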