Installing H3 on a Linux Subsystem in Windows 10
H3 is a library developed by the Uber team that implements spatial analysis based on indexed hexagons. Hexagons have especially interesting properties for spatial operations. See more here.

My experience installing H3 on Windows was not good: H3 did not work, and the error messages gave no clear indication of the cause. After some googling I concluded that getting that installation to work could become a headache, so I decided to try the alternative of running Linux as a subsystem of Windows. This turned out very well. In this post I collect the steps for the complete setup.

1. Install the Linux subsystem

The procedure for this installation is simple and [explained here](https://docs.microsoft.com/en-us/windows/wsl/install-win10). First, using Windows PowerShell (which comes with Windows), enter the command that enables the subsystem, as given in that guide. Then restart. Next, install a Linux distribution; the distributions are available in the Microsoft Store. In my case I downloaded and installed Ubuntu 18.04 LTS: just download and install, no special configuration is required. Once installed, Ubuntu is available through a shortcut.

2. Preparing the installation of…
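The claim that hexagons have especially interesting properties for spatial operations can be illustrated with a small sketch. It uses a flat, generic hexagonal grid in axial coordinates, which is an assumption for illustration and not H3's own indexing scheme or API. It shows that all six neighbours of a hexagon have their centres at the same distance, whereas a square grid mixes edge neighbours at distance 1 with diagonal neighbours at √2. The names `hex_center`, `square_center` and `neighbor_distances` are hypothetical.

```python
import math

# Generic planar hex grid in axial coordinates (q, r).
# NOTE: illustration only; this is NOT how H3 indexes its cells.
HEX_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]
# 8-neighbourhood of a square grid: 4 edge neighbours + 4 diagonal ones.
SQUARE_DIRECTIONS = [(1, 0), (0, 1), (-1, 0), (0, -1),
                     (1, 1), (1, -1), (-1, 1), (-1, -1)]


def hex_center(q, r, size=1.0):
    """Cartesian centre of a pointy-top hexagon with circumradius `size`."""
    x = size * math.sqrt(3) * (q + r / 2)
    y = size * 1.5 * r
    return x, y


def square_center(i, j, size=1.0):
    """Cartesian centre of a square cell with side `size`."""
    return i * size, j * size


def neighbor_distances(directions, center_fn):
    """Distances (rounded to 6 decimals) from cell (0, 0) to each neighbour."""
    origin = center_fn(0, 0)
    return sorted(
        round(math.dist(origin, center_fn(dq, dr)), 6) for dq, dr in directions
    )


hex_d = neighbor_distances(HEX_DIRECTIONS, hex_center)
sq_d = neighbor_distances(SQUARE_DIRECTIONS, square_center)
print("hexagon:", sorted(set(hex_d)))  # → hexagon: [1.732051]
print("square:", sorted(set(sq_d)))    # → square: [1.0, 1.414214]
```

A single neighbour distance is one reason hexagonal cells are convenient for neighbourhood-based operations: "one step away" means the same thing in every direction.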
Implementing a scalable geospatial operation in MongoDB
Summary

In this note I document an initial test implementation of a spatial join of 22 million points against nearly 16 thousand polygons using MongoDB, including the steps needed to run the operation. The run took longer than I expected: more than 12 hours in total. My conclusion is that the approach can scale if it is combined with other techniques, such as simplifying the polygons.

Intro

In this post I share an implementation of a spatial-join type of analysis at scale using MongoDB. MongoDB is a NoSQL document database, widely used in industry to store large datasets distributed over multiple (cloud) machines. My case is the analysis of a database of over 22 million geolocated tweets. My first objective is to implement a spatial join that essentially counts the tweets in each census radius, where census radii are spatial polygons; in this case there are 15,700 polygons. Such an operation is standard in geospatial packages such as ArcGIS or QGIS and, in Python, can be done with GeoPandas, for example. But my ultimate objective is finding a solution that is scalable with large amounts…
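To make the "count points per polygon" operation concrete, here is a minimal, naive spatial join in plain Python. It is a sketch under stated assumptions, not the MongoDB implementation described in the post: polygons are simple outer rings of (lon, lat) vertices without holes, containment uses planar ray casting (ignoring the Earth's curvature), and the names `point_in_polygon`, `bbox` and `count_points_per_polygon` and the toy data are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon `ring`.

    Points exactly on an edge may be classified either way.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bbox(ring):
    """Axis-aligned bounding box (minx, miny, maxx, maxy) of a ring."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return min(xs), min(ys), max(xs), max(ys)


def count_points_per_polygon(points, polygons):
    """Return {polygon_id: number of points inside that polygon}."""
    counts = {}
    for pid, ring in polygons.items():
        minx, miny, maxx, maxy = bbox(ring)
        n = 0
        for x, y in points:
            # Cheap bounding-box rejection before the exact ray-casting test.
            if minx <= x <= maxx and miny <= y <= maxy and point_in_polygon(x, y, ring):
                n += 1
        counts[pid] = n
    return counts


# Hypothetical toy data: a square "A" and a triangle "B".
polygons = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (3, 2)],
}
points = [(1, 1), (0.5, 1.5), (3, 0.5), (3.9, 1.9), (5, 5)]
print(count_points_per_polygon(points, polygons))  # → {'A': 2, 'B': 1}
```

The double loop shows why scale matters: with 22 million tweets and 15,700 polygons it would consider about 3.45 × 10^11 point-polygon pairs, so pruning candidates (bounding boxes, a spatial index) and simplifying polygons pay off. In MongoDB, a per-polygon count is typically expressed as a `$geoWithin` filter with a GeoJSON `$geometry`, which a `2dsphere` index can accelerate, e.g. `collection.count_documents({"location": {"$geoWithin": {"$geometry": polygon}}})` in PyMongo, where `location` and `polygon` are hypothetical names.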