File and block storage are well-defined concepts in
computing and have been used as common components of
computer systems for decades. Big data has led to new
types of storage. Object-based storage is now the predominant
data model in cloud storage and has been highly
successful. Object stores follow a simpler API, with
get() and put() operations to interact with the data. A
wide variety of data analysis software has been
developed around objects using these APIs. However,
object storage and traditional file storage are designed
for different purposes and for different applications.
Organizations maintain file-based storage clusters and a
high volume of existing data is stored in files.
Moreover, many new applications need to access data from
both types of storage.
In this paper, we
first explore the key differences between object-based
and the more traditional file-based storage systems. We
have designed and implemented several file-to-object
mapping algorithms to bridge the semantic gap between
these data models. Our evaluation shows that, with an
efficient mapping, our library achieves 2x-27x higher
performance than a naive one-to-one mapping, with
minimal overhead. Our study
exposes various strengths and weaknesses of each mapping
strategy and highlights the broader potential of a unified
data access system.
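
To make the idea of a file-to-object mapping concrete, the following is a minimal sketch and not the library evaluated in this paper. It assumes a toy in-memory object store that exposes only get() and put(). The class InMemoryObjectStore, the helper functions, and the "#chunk" key scheme are all hypothetical names introduced for illustration. The sketch contrasts the naive one-to-one mapping (one file becomes one object) with one possible alternative, fixed-size chunking, which is used here only as an example of a mapping strategy.

```python
from typing import Dict, List


class InMemoryObjectStore:
    """Toy object store with only get() and put() (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        # Objects are written whole; there is no partial update.
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        # Objects are read whole; a missing key raises KeyError.
        return self._objects[key]


def put_file_one_to_one(store: InMemoryObjectStore, path: str, data: bytes) -> List[str]:
    """Naive mapping: the entire file is stored as a single object keyed by its path."""
    store.put(path, data)
    return [path]


def put_file_chunked(
    store: InMemoryObjectStore, path: str, data: bytes, chunk_size: int
) -> List[str]:
    """Illustrative alternative: split the file into fixed-size chunk objects."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    keys: List[str] = []
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        key = f"{path}#chunk{index}"  # hypothetical key scheme
        store.put(key, data[offset:offset + chunk_size])
        keys.append(key)
    return keys


def read_range_chunked(
    store: InMemoryObjectStore, path: str, offset: int, length: int, chunk_size: int
) -> bytes:
    """Read bytes [offset, offset + length), fetching only the overlapping chunks.

    Assumes the requested range lies entirely within the stored file.
    """
    if length <= 0:
        return b""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    blob = b"".join(store.get(f"{path}#chunk{i}") for i in range(first, last + 1))
    start = offset - first * chunk_size
    return blob[start:start + length]
```

Under the one-to-one mapping, reading a small byte range still requires a get() of the whole object, whereas the chunked variant fetches only the chunks that overlap the range. This is one intuition for why the choice of mapping can affect performance; the actual algorithms and measurements are those described in the paper itself.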