Implementing a scalable geospatial operation in MongoDB
Summary
In this note I document an initial test implementation of a spatial join between 22 million points and nearly 16 thousand polygons using MongoDB, including the steps needed to run the operation. The run took longer than I expected: more than 12 hours in total. My conclusion is that the approach can scale if it is combined with other techniques, such as simplifying the polygons.

Intro
In this post I share an implementation of a spatial-join type of analysis at scale using MongoDB. MongoDB is a NoSQL database system that is widely used in industry to store large databases distributed over multiple (cloud) machines. My case is the analysis of a large database of over 22 million geo-located tweets. My first objective is to implement a spatial-join kind of analysis that essentially counts tweets in census radii, which are spatial polygons. In this case I have 15,700 polygons. Such an operation is a standard feature of geospatial packages such as ArcGIS or QGIS, and of Python libraries such as GeoPandas. But my ultimate objective is finding a solution that is scalable with large amounts…
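
To make the operation concrete, here is a minimal sketch of what "counting tweets in polygons" computes, written in plain Python with a ray-casting test. It is not the MongoDB implementation from the post. It assumes simple, non-overlapping polygons given as lists of (longitude, latitude) pairs, and all names in it are hypothetical. The last function only builds a filter document in the shape MongoDB's `$geoWithin` operator expects; the field name `location` is an assumption about how the tweets are stored.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)
Ring = List[Point]           # polygon outline, open or closed


def point_in_ring(point: Point, ring: Ring) -> bool:
    """Ray casting: count how many edges a ray going right from the point crosses."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_per_polygon(points: List[Point],
                             polygons: Dict[str, Ring]) -> Dict[str, int]:
    """Naive spatial join: O(points * polygons), fine only for small inputs."""
    counts = {name: 0 for name in polygons}
    for p in points:
        for name, ring in polygons.items():
            if point_in_ring(p, ring):
                counts[name] += 1
                break  # assumption: polygons do not overlap
    return counts


def geo_within_filter(ring: Ring, field: str = "location") -> dict:
    """Build a $geoWithin filter for one polygon (field name is hypothetical)."""
    closed = list(ring)
    if closed[0] != closed[-1]:
        closed.append(closed[0])  # GeoJSON rings must repeat the first point
    return {field: {"$geoWithin": {"$geometry": {
        "type": "Polygon",
        "coordinates": [[list(p) for p in closed]],
    }}}}


polygons = {
    "a": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "b": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
points = [(1, 1), (3, 1), (1.5, 0.5), (5, 5)]
counts = count_points_per_polygon(points, polygons)
query = geo_within_filter(polygons["a"])
```

The nested loop is the part that does not scale: with 22 million points and 15,700 polygons it would need hundreds of billions of point-in-polygon tests. A database with a geospatial index avoids most of them, which is the motivation for passing a filter like `query` to the database once per polygon instead.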
Installing and querying Google BigQuery from Jupyter
Some notes on running a query against Google BigQuery. In this case the goal is to query the Properati database and load the result into a pandas DataFrame. At the end I add a few final steps to persist the data in a local MongoDB instance.

Google Cloud installation
I am going to create a dedicated virtual environment using conda, in this case with Python 3.6. I call it bigquery:

    conda create -n bigquery python=3.6

Activate the environment:

    C:\Users\Richard>activate bigquery

Inside the environment I can start Python and check where it is running from:

    (bigquery) C:\Users\Richard>python
    Python 3.6.7 (default, Jul 2 2019, 02:21:41) [MSC v.1900 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.executable
    'C:\\Users\\Richard\\AppData\\Local\\conda\\conda\\envs\\bigquery\\python.exe'
    >>> exit()

The next step is to install google-cloud in the environment, also from conda. The following will not work:

    (bigquery) C:\Users\Richard>conda install google-cloud
    Solving environment: failed
    PackagesNotFoundError: The following packages are not available from current channels:
    - google-cloud

The correct way is to specify conda-forge:

    (bigquery) C:\Users\Richard>conda…
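
The interpreter check above can also be done from a script, for example at the top of a notebook. This is a minimal sketch, assuming the environment lives under an `envs` directory as in the path printed by `sys.executable` above; `conda_env_name` is a hypothetical helper, not part of conda.

```python
import sys
from pathlib import PureWindowsPath
from typing import Optional


def conda_env_name(executable: str) -> Optional[str]:
    """Return the folder that follows 'envs' in the interpreter path, if any."""
    parts = PureWindowsPath(executable).parts
    if "envs" in parts:
        idx = parts.index("envs")
        if idx + 1 < len(parts):
            return parts[idx + 1]
    return None  # path does not look like a named conda environment


# Path copied from the session above; use sys.executable for the live check.
example = r"C:\Users\Richard\AppData\Local\conda\conda\envs\bigquery\python.exe"
env = conda_env_name(example)
current_env = conda_env_name(sys.executable)
```

If `current_env` is not the expected name, the notebook is running on a different interpreter than the one where the packages were installed.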