Presentation


Workshop: Reliable Access to Massive Restricted Texts: Experience-Based Evaluation

Session

DataCloud 2017: The Eighth International Workshop on Data-Intensive Computing in the Clouds

Author/Presenters

Zong Peng

Beth Plale

Event Type

Workshop


Time

Sunday, November 12th, 11:40am - 12:10pm

Location

507

Description

Libraries hold a growing number of digitized textual corpora whose content is subject to access restrictions. Scholars want to probe and mine these massive corpora, but doing so can be cumbersome because of their size, granularity, access restrictions, and organization. Managing such a collection efficiently, especially under failures, depends on the primary storage system. In this paper, we identify the requirements for managing a massive text corpus, drawing on our experience managing the 5.5 billion pages of the HathiTrust digital library. Using these requirements, we compare candidate storage solutions and, through a combination of experimental evaluation and comparison, identify an optimal choice.

Author/Presenters

Zong Peng

Indiana University

Beth Plale

Indiana University
