Reliable Access to Massive Restricted Texts: Experience-Based Evaluation
Author/Presenters
Event Type
Workshop

Applications
Clouds and Distributed Computing
SIGHPC Workshop
Time
Sunday, November 12th, 11:40am - 12:10pm
Location
507
Description
Libraries are seeing growing numbers of digitized textual corpora with restrictions on their content. Probing and mining these massive corpora, which are of interest to scholars, can be cumbersome because of their size, granularity, access restrictions, and organization. Efficient management of such a collection, especially under failures, depends on the primary storage system. In this paper, we identify the requirements for managing a massive text corpus based on our experience managing the 5.5 billion pages of the HathiTrust digital library. Using these requirements, we compare candidate storage solutions and, through a combination of experimental evaluation and comparison, identify an optimal choice.