A11: Finding a Needle in a Field of Haystacks: Lightweight Metadata Search for Large-Scale Distributed Research Repositories
Event Type: ACM Student Research Competition, Poster


Time: Wednesday, November 15th, 4:05pm - 4:15pm
Location: 701
Description: Fast, scalable, and distributed search services are commonly available for single nodes. Scaling them across tens of thousands of filesystems and repositories, as Globus requires, leads to high infrastructure costs. Alternatively, each endpoint's index can be stored on that endpoint. This distributes storage costs among users, but it also creates significant query overhead, because a query must be dispatched to many endpoints to find where matches might exist. Our solution is a compromise that introduces two levels of indexes. A single centralized "second-level index" (SLI) aggregates and summarizes the terms from each endpoint. Many endpoint-level indexes are referenced by the SLI and consulted only when needed. Experiments on Globus-accessible filesystems show that the SLI reduces the space needed on central servers by over 96%. It also reduces the number of endpoints that must execute each user query.
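The two-level design can be illustrated with a small in-memory sketch. The abstract does not say how the SLI summarizes terms, so this sketch simply assumes that the SLI records which endpoints contain each term (a term-to-endpoint map) and does not record individual files. Full term-to-file postings stay in per-endpoint inverted indexes. The class and method names (`EndpointIndex`, `SecondLevelIndex`, `candidate_endpoints`) are hypothetical and do not come from the poster. Remote endpoints are simulated by local objects.

```python
from collections import defaultdict


class EndpointIndex:
    """Hypothetical endpoint-level inverted index (term -> file paths).
    In the described design, this would live on the endpoint itself."""

    def __init__(self, name):
        self.name = name
        self.postings = defaultdict(set)

    def add_file(self, path, terms):
        for term in terms:
            self.postings[term].add(path)

    def vocabulary(self):
        # Only the set of distinct terms is shared with the central SLI.
        return set(self.postings)

    def search(self, terms):
        # Conjunctive (AND) query: files whose metadata contain every term.
        sets = [self.postings.get(t, set()) for t in terms]
        return set.intersection(*sets) if sets else set()


class SecondLevelIndex:
    """Hypothetical central SLI: stores term -> endpoint names, not file paths."""

    def __init__(self):
        self.term_to_endpoints = defaultdict(set)
        self.endpoints = {}  # stands in for remote query dispatch

    def register(self, endpoint):
        self.endpoints[endpoint.name] = endpoint
        for term in endpoint.vocabulary():
            self.term_to_endpoints[term].add(endpoint.name)

    def candidate_endpoints(self, terms):
        # Endpoints that contain every query term somewhere (may include false positives).
        sets = [self.term_to_endpoints.get(t, set()) for t in terms]
        return set.intersection(*sets) if sets else set()

    def search(self, terms):
        # Only candidate endpoints execute the query against their local index.
        results = {}
        for name in sorted(self.candidate_endpoints(terms)):
            hits = self.endpoints[name].search(terms)
            if hits:
                results[name] = hits
        return results


# Usage with made-up example data.
a = EndpointIndex("endpoint-a")
a.add_file("/data/run1.h5", ["climate", "temperature"])
b = EndpointIndex("endpoint-b")
b.add_file("/proj/genome.fa", ["genomics", "fasta"])
c = EndpointIndex("endpoint-c")
c.add_file("/x/t.csv", ["temperature"])
c.add_file("/x/c.txt", ["climate"])

sli = SecondLevelIndex()
for ep in (a, b, c):
    sli.register(ep)

print(sorted(sli.candidate_endpoints(["climate", "temperature"])))  # ['endpoint-a', 'endpoint-c']
print(sli.search(["climate", "temperature"]))  # {'endpoint-a': {'/data/run1.h5'}}
```

In this sketch, `endpoint-b` is never contacted for the example query. `endpoint-c` is contacted because it holds both terms, just in different files. This false positive is the price of keeping only a per-endpoint vocabulary centrally. Storing term-to-endpoint entries instead of per-file postings is one plausible way a summary index could stay much smaller than a full central index. The poster's actual summarization technique may differ.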