Implementing a scalable geospatial operation in MongoDB
Summary

In this note I document an initial test implementation of a spatial join in MongoDB, matching 22 million points against nearly 16 thousand polygons. I describe the steps needed to run the operation. The run took more time than I expected, more than 12 hours in total. My conclusion is that the approach can scale if combined with other techniques, such as polygon simplification.

Intro

In this post I share an implementation of a spatial-join type of analysis at scale using MongoDB. MongoDB is a NoSQL database system that is extensively used in industry to store large databases distributed over multiple (cloud) machines. My case is the analysis of a large database of over 22 million geo-located tweets. My first objective is to implement a spatial-join kind of analysis that essentially counts tweets within census radii, which are spatial polygons; in this case I have 15,700 polygons. Such an operation is standard in geospatial packages such as ArcGIS or QGIS, and in Python it can be done, for example, with GeoPandas. But my ultimate objective is finding a solution that is scalable with large amounts…
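To make the per-polygon counting concrete, here is a minimal sketch in Python with pymongo and GeoPandas that counts points falling inside each polygon with a $geoWithin query. The database name, collection name, point field, and shapefile path are illustrative assumptions, not the exact setup of my analysis.

```python
# Minimal sketch: count tweets per census polygon with MongoDB's $geoWithin.
# Database, collection, field names, and file path below are assumptions.
import geopandas as gpd
from pymongo import MongoClient
from shapely.geometry import mapping

client = MongoClient("mongodb://localhost:27017")
tweets = client["twitter"]["tweets"]  # assumed database/collection names

# A 2dsphere index speeds up spherical geometry queries on the point field.
tweets.create_index([("coordinates", "2dsphere")])  # assumed GeoJSON Point field

# Census radii polygons, reprojected to WGS84 since MongoDB expects GeoJSON coordinates.
radii = gpd.read_file("census_radii.shp").to_crs(epsg=4326)  # assumed path

counts = {}
for idx, geom in radii.geometry.items():
    # mapping() converts the shapely polygon into a GeoJSON-like dict for the query.
    query = {"coordinates": {"$geoWithin": {"$geometry": mapping(geom)}}}
    counts[idx] = tweets.count_documents(query)

radii["tweet_count"] = radii.index.map(counts)
```

This issues one count query per polygon, so with roughly 16 thousand polygons the total running time is dominated by how fast each $geoWithin scan completes.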
Mapping with geopandas and basemapping with contextily
I find the GeoPandas library really useful for mapping with layers. Contextily is another nice library that adds a background basemap. Using them together makes it fairly simple to visualize shapes such as polygons and points together with contextual mapping information, as in the following figure:

[Figure: polygons and points drawn over a basemap. Basemaps are drawn from OpenStreetMap under CC BY SA, and map tiles are from Stamen Design under CC BY 3.0.]

There are several options for tile design. The full code is available as a Gist.
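As a minimal sketch of this kind of figure, the snippet below plots a polygon layer and a point layer with GeoPandas and then adds a web-tile basemap with Contextily. The file paths are placeholders, and the layers are reprojected to Web Mercator because that is the projection used by the tiles.

```python
# Minimal sketch: polygons and points over a Contextily basemap.
# File paths below are illustrative assumptions.
import geopandas as gpd
import contextily as ctx
import matplotlib.pyplot as plt

# Reproject both layers to Web Mercator (EPSG:3857), the CRS of the web tiles.
radii = gpd.read_file("census_radii.shp").to_crs(epsg=3857)
points = gpd.read_file("tweets_sample.geojson").to_crs(epsg=3857)

ax = radii.plot(figsize=(10, 10), alpha=0.4, edgecolor="black")
points.plot(ax=ax, markersize=1, color="red")

# Draw the background tiles beneath the layers already on the axes;
# the `source` argument of add_basemap selects a different tile provider.
ctx.add_basemap(ax)
plt.savefig("tweets_over_census_radii.png", dpi=150)
```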