Geospatial datasets are growing in size, complexity, and heterogeneity. High-performance systems are needed to analyze such data efficiently and produce actionable insights. For polygonal (vector) datasets, operations such as I/O, data partitioning, and communication become challenging in a cluster environment.
In this work, we present MPI-GIS equipped with MPI-Vector-IO, a parallel I/O library that we designed using MPI-IO specifically for irregular polygonal (vector) data formats such as Well-Known Text (WKT) and XML. Our system performs in-memory spatial indexing and spatial join efficiently on datasets an order of magnitude larger than those handled in our previous work. It makes MPI aware of spatial data and spatial primitives, and it supports spatial data types embedded within collective computation and communication using the MPI message-passing library. Scanning 2.7 billion geometries in a 96 GB file takes less than 2 minutes using 160 processes.
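To illustrate why parallel I/O on irregular vector formats is harder than on fixed-size records, the following minimal sketch shows one common way to split a line-oriented WKT file among processes: each rank receives an equal byte range and then realigns to record boundaries, so that every geometry is read exactly once. This is not the MPI-Vector-IO API. The function names `byte_range` and `read_records` are hypothetical, ranks are simulated with plain Python file reads, and one geometry per line is assumed. In an actual MPI program, each rank would issue the equivalent read at its own offset through MPI-IO (for example with `MPI_File_read_at`).

```python
import os


def byte_range(file_size, rank, nprocs):
    """Return the half-open byte range [start, end) assigned to a rank."""
    chunk = file_size // nprocs
    start = rank * chunk
    # The last rank also takes the remainder bytes.
    end = file_size if rank == nprocs - 1 else start + chunk
    return start, end


def read_records(path, rank, nprocs):
    """Read the WKT records whose first byte lies in this rank's byte range.

    Assumption: one geometry per line. A record that straddles a range
    boundary belongs to the rank in whose range it starts.
    """
    size = os.path.getsize(path)
    start, end = byte_range(size, rank, nprocs)
    records = []
    with open(path, "rb") as f:
        if start > 0:
            # Step back one byte and consume up to the next newline. This
            # skips a partial record owned by an earlier rank, but keeps a
            # record that begins exactly at `start`.
            f.seek(start - 1)
            f.readline()
        # Keep reading whole lines while the next record starts before `end`.
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            text = line.decode("utf-8").strip()
            if text:
                records.append(text)
    return records
```

Because polygons vary widely in vertex count, equal byte ranges do not imply equal record counts or equal work per rank, which is one reason data partitioning for vector data needs more care than a simple byte split.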