A12: Applying Image Feature Extraction to Cluttered Scientific Repositories
Author
Event Type
ACM Student Research Competition
Poster


Time
Wednesday, November 15th, 4:15pm - 4:25pm
Location
701
Description
Over time, many scientific repositories and file systems become disorganized and come to contain poorly described, error-ridden data. As a result, researchers often have difficulty discovering crucial data. In this poster, we present a collection of image processing modules that collectively extract metadata from a variety of image formats. We implement these modules in Skluma, a system designed to automatically extract metadata from structured and semi-structured scientific formats. Our modules apply several image metadata extraction techniques: processing file system metadata and header information, computing color content statistics, extracting text, forming feature-based clusters, and predicting tags with a supervised learning model. Our goal is to collect a large body of metadata that can then be used to organize, understand, and analyze the data stored in a repository.
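
As a rough illustration of two of the techniques listed above (file system metadata and color content statistics), the following is a minimal sketch using only the Python standard library. It is not Skluma's actual code or API: the function names file_system_metadata, color_statistics, and extract_metadata are hypothetical, and the sketch assumes the image has already been decoded into a sequence of (r, g, b) tuples by some other component.

```python
import os
import statistics


def file_system_metadata(path):
    """Collect basic file-system metadata for one file (hypothetical helper)."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "extension": os.path.splitext(path)[1].lower(),
        "size_bytes": info.st_size,
        "modified": info.st_mtime,  # seconds since the epoch
    }


def color_statistics(pixels):
    """Per-channel mean and population standard deviation.

    Assumes `pixels` is a non-empty iterable of (r, g, b) tuples that
    another component has already decoded from the image file.
    """
    pixels = list(pixels)
    stats = {}
    for index, channel in enumerate(("red", "green", "blue")):
        values = [pixel[index] for pixel in pixels]
        stats[channel] = {
            "mean": statistics.fmean(values),
            "stdev": statistics.pstdev(values),
        }
    return stats


def extract_metadata(path, pixels):
    """Merge the output of both extractors into a single metadata record."""
    return {
        "file_system": file_system_metadata(path),
        "color": color_statistics(pixels),
    }
```

Each extractor returns a plain dictionary, so further modules (header parsing, text extraction, clustering, tag prediction) could add their own keys to the same record without changing the existing ones. That layout is a design assumption of this sketch rather than something the abstract specifies.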