Ricardo A. Pasquini
January 21, 2020 by admin
Coding Notes, Uncategorized

Implementing a scalable geospatial operation in MongoDB

Summary
In this note I document an initial test implementation of a spatial join of 22 million points to nearly 16 thousand polygons using MongoDB, along with the steps needed to run the operation. The run took longer than I expected: more than 12 hours in total. My conclusion is that the approach can scale if it is combined with other techniques, such as simplifying the polygons.

Intro
In this post, I share an implementation of a spatial-join type of analysis at scale using MongoDB. MongoDB is a NoSQL database system widely used in industry to store large databases distributed over multiple (cloud) machines. My case is the analysis of a large database of over 22 million geolocated tweets. My first objective is to implement a spatial join that essentially counts tweets within census tracts (radios censales), which are spatial polygons; in this case there are 15,700 of them. Such an operation is standard in geospatial packages such as ArcGIS or QGIS and, in Python, for example, in GeoPandas. But my ultimate objective is finding a solution that is scalable with large amounts…
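Below is a minimal sketch of how such a count could be expressed with pymongo. It assumes a hypothetical schema:

- each tweet document stores a GeoJSON Point in a field called location;
- each polygon is a dict with an id and a GeoJSON geometry.

It uses MongoDB's $geoWithin operator with $geometry and a 2dsphere index. This is an illustration, not necessarily how the post implements the join. The helper names within_filter and count_points_per_polygon are hypothetical.

```python
from typing import Any, Dict, Iterable

# Hypothetical schema: tweets hold a GeoJSON Point in "location";
# polygons are dicts like {"id": ..., "geometry": {"type": "Polygon", ...}}.


def within_filter(geometry: Dict[str, Any], field: str = "location") -> Dict[str, Any]:
    """Build a MongoDB filter for documents whose `field` lies inside `geometry`."""
    return {field: {"$geoWithin": {"$geometry": geometry}}}


def count_points_per_polygon(
    tweets, polygons: Iterable[Dict[str, Any]], field: str = "location"
) -> Dict[Any, int]:
    """Count documents of a pymongo-style collection falling inside each polygon."""
    # A 2dsphere index lets $geoWithin avoid scanning every document;
    # creating an index that already exists with the same spec is a no-op.
    tweets.create_index([(field, "2dsphere")])
    counts: Dict[Any, int] = {}
    for poly in polygons:
        # One count query per polygon.
        counts[poly["id"]] = tweets.count_documents(within_filter(poly["geometry"], field))
    return counts
```

With a real database you would pass a pymongo collection, for example MongoClient()["twitter"]["tweets"]; the database and collection names here are placeholders. This structure issues one query per polygon, so the total time grows with the number of polygons. It plausibly also grows with polygon complexity, which is why simplifying the polygons could help.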

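The summary mentions simplifying the polygons as a way to make the approach scale. One common technique for this is Ramer-Douglas-Peucker simplification, sketched below in plain Python for a single ring of (x, y) coordinates. The post does not say which simplification method it would use, so treat this only as an illustration. The tolerance is in the same units as the coordinates, so it is in degrees for longitude/latitude data. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _perp_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the line through a and b (to a itself if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)


def simplify(points: Sequence[Point], tolerance: float) -> List[Point]:
    """Ramer-Douglas-Peucker: drop vertices closer than `tolerance` to the chord."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    # Find the interior vertex farthest from the chord a-b.
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _perp_distance(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= tolerance:
        return [a, b]
    # Keep that vertex and simplify both halves recursively.
    left = simplify(points[: idx + 1], tolerance)
    right = simplify(points[idx:], tolerance)
    return left[:-1] + right


def simplify_ring(ring: Sequence[Point], tolerance: float) -> List[Point]:
    """Simplify a closed ring (first point == last), keeping at least 4 points."""
    out = simplify(ring, tolerance)
    return out if len(out) >= 4 else list(ring)
```

Plain Douglas-Peucker can make a ring self-intersect or collapse a thin polygon. Shapely offers a topology-preserving variant (simplify with preserve_topology=True), which is usually the safer choice for real census boundaries.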

geopandas Geospatial analysis MongoDB pymongo python scalability spatial join

November 1, 2019 by admin
Coding Notes, Uncategorized

Installing and querying Google BigQuery from Jupyter

Some notes on querying Google BigQuery. In this case the goal is to query the Properati database and load it into pandas. At the end I add a few final steps to persist the data in a local MongoDB.

Installing Google Cloud
I will create a dedicated virtual environment with conda, in this case with Python 3.6. I call it bigquery:

    conda create -n bigquery python=3.6

Activate the environment:

    C:\Users\Richard>activate bigquery

Inside the environment I can start Python and check where it is running from:

    (bigquery) C:\Users\Richard>python
    Python 3.6.7 (default, Jul  2 2019, 02:21:41) [MSC v.1900 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.executable
    'C:\\Users\\Richard\\AppData\\Local\\conda\\conda\\envs\\bigquery\\python.exe'
    >>> exit()

The next step is to install google-cloud in the environment, also from conda. The following will not work:

    (bigquery) C:\Users\Richard>conda install google-cloud
    Solving environment: failed
    PackagesNotFoundError: The following packages are not available from current channels:
      - google-cloud

The correct way is to specify conda-forge:

    (bigquery) C:\Users\Richard>conda…
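The sys.executable check above can be turned into a small helper that reports which conda environment an interpreter belongs to. This sketch assumes conda's default layout, in which named environments live under an envs directory, as in the path shown above. The base environment has no envs component, so the helper returns None for it. The function name conda_env_of is hypothetical.

```python
import sys
from pathlib import PurePosixPath, PureWindowsPath
from typing import Optional


def conda_env_of(executable: str = sys.executable) -> Optional[str]:
    """Return the conda environment name found in an interpreter path, or None."""
    # Parse backslash paths as Windows paths so the check works on any OS.
    path = PureWindowsPath(executable) if "\\" in executable else PurePosixPath(executable)
    parts = path.parts
    # Named environments sit at <conda root>/envs/<name>/...
    for i, part in enumerate(parts[:-1]):
        if part.lower() == "envs":
            return parts[i + 1]
    return None
```

Inside the bigquery environment, conda_env_of() should return "bigquery". When an environment is activated, conda also sets the CONDA_DEFAULT_ENV environment variable, which offers a quicker check.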

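The goal stated above has three steps: query the Properati data in BigQuery, load it into pandas, and persist it in a local MongoDB. Below is a minimal sketch of that pipeline. It assumes the google-cloud-bigquery and pymongo packages are installed and Google Cloud credentials are configured. The SQL and the project, dataset, database and collection names are placeholders, not the ones used in the post. The helpers chunked and persist_records are hypothetical. The third-party imports are deferred into main() so the batching helpers work with the standard library alone.

```python
from typing import Any, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def chunked(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Yield consecutive lists of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # leftover records that did not fill a whole batch
        yield batch


def persist_records(collection, records: Iterable[Record], batch_size: int = 10_000) -> int:
    """Insert records into a pymongo-style collection in batches; return how many."""
    total = 0
    for batch in chunked(records, batch_size):
        collection.insert_many(batch)
        total += len(batch)
    return total


def main() -> None:
    # Third-party imports live here so the helpers above need only the stdlib.
    from google.cloud import bigquery
    from pymongo import MongoClient

    # Placeholder query: replace with the actual Properati table.
    sql = "SELECT * FROM `my-project.my_dataset.my_table` LIMIT 1000"
    df = bigquery.Client().query(sql).to_dataframe()  # needs credentials and pandas

    mongo = MongoClient("mongodb://localhost:27017/")  # local MongoDB, default port
    n = persist_records(mongo["properati"]["listings"], df.to_dict(orient="records"))
    print(f"Inserted {n} documents")
```

Calling main() runs the whole pipeline. Batching keeps each insert_many call to a bounded number of documents instead of sending the entire DataFrame in one request.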

Coding Conda Environments Google Big Query Jupyter Properati pymongo python


Categories

  • Causal Inference (4)
  • Coding Notes (8)
  • Defi (1)
  • Economics (8)
  • Machine Learning (1)
  • Uncategorized (19)
  • Urban Economics (3)

Recent Posts

  • Note on the Impact of Liquidation on Health Factor in Overcollateralized Loans
  • Econometrics with simulations 📚
  • Explicando Inferencia por Aleatorización a un futbolero
  • Optimal calibration of a ML classifier based on business knowledge
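  • Note on AMMs “picked-off” risk
  • Un atlas de deudas para Argentina
  • Bienes públicos, Gitcoin, y financiamiento cuadrático
  • An indebtedness atlas for Argentina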

Tag Cloud

AMM apps bienes publicos causal inference classification conda COVID-19 criptomonedas cryptocurrencies defi econometrics economía de mercados Ethereum Exportar resultados Export output Financial inclusion financiamiento cuadrático fraud geopandas Geospatial analysis Gitcoin h3 hexagons Households Finance Indebtedness Jupyter Loops machinelearning Mercado de Alquileres MongoDB negocios precision proyectos ingeniería public goods pymongo python quadratic funding recall Regression roc-curve scalability Stata Tablas Tables ubuntu