Ricardo A. Pasquini
January 21, 2020 by admin
Coding Notes, Uncategorized

Reshape of lat and lon coordinates in MongoDB, using the aggregate pipeline

Summary

To transform a large number of documents with spatial data in a MongoDB collection, for example {lat: -34.2, lon: -58.1}, into GeoJSON, the format recognized by MongoDB's spatial analysis functions, for example {type: "Point", coordinates: [-58.1, -34.2]}, it seems advisable to use the aggregation framework:

db.tweets.aggregate([{$project: {location: {type: "Point", coordinates: ["$lon", "$lat"]}}}, {$out: "newcollectionname"}]);

The problem

A common task in MongoDB is preparing the data for spatial analysis. As I described in the previous post, the data must be in a compatible format, in this case GeoJSON. For example, when working with points:

{type: "Point", coordinates: [-58.1, -34.2]}

That is, a "type" key stating that the geometry is a point, and a "coordinates" key holding the coordinate pair as an array: longitude first, then latitude (in that order!). When working with polygons:

{type: "Polygon", coordinates: [[[0, 0], [3, 6], [6, 1], [0, 0]]]}

The problem with my data is that it was in the following format and I needed to do…
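For readers working from Python, the same reshape can be sketched with pymongo, where an aggregation pipeline is simply a list of dicts. This is a minimal sketch that assumes documents carry numeric top-level `lat` and `lon` fields. The collection names come from the shell command above, while `build_reshape_pipeline`, `reshape_locally` and `is_valid_linear_ring` are hypothetical helpers. `reshape_locally` mimics what the `$project` stage produces, so the longitude-first order can be checked without a running server. `is_valid_linear_ring` checks the closed-ring rule that polygon coordinates like the example above must satisfy.

```python
from typing import Any, Dict, List, Sequence


def build_reshape_pipeline(out_collection: str) -> List[Dict[str, Any]]:
    """Pipeline that turns top-level lat/lon fields into a GeoJSON Point.

    In pymongo it would be run as db.tweets.aggregate(pipeline); the final
    $out stage writes the result into `out_collection`.
    """
    return [
        # GeoJSON positions are [longitude, latitude], so "$lon" comes first.
        {"$project": {"location": {"type": "Point",
                                   "coordinates": ["$lon", "$lat"]}}},
        {"$out": out_collection},
    ]


def reshape_locally(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Pure-Python stand-in for the $project stage, to check the output shape.

    Like $project, it keeps _id (included by default) and drops other fields.
    """
    out: Dict[str, Any] = {}
    if "_id" in doc:
        out["_id"] = doc["_id"]
    out["location"] = {"type": "Point", "coordinates": [doc["lon"], doc["lat"]]}
    return out


def is_valid_linear_ring(ring: Sequence[Sequence[float]]) -> bool:
    """GeoJSON linear ring (RFC 7946): at least 4 positions, first == last."""
    return len(ring) >= 4 and list(ring[0]) == list(ring[-1])


if __name__ == "__main__":
    print(build_reshape_pipeline("newcollectionname"))
    print(reshape_locally({"_id": 1, "lat": -34.2, "lon": -58.1}))
    # {'_id': 1, 'location': {'type': 'Point', 'coordinates': [-58.1, -34.2]}}
    print(is_valid_linear_ring([[0, 0], [3, 6], [6, 1], [0, 0]]))  # True
```

Because `$project` keeps only `_id` and the new `location` field, `$addFields` (or its alias `$set`, MongoDB 4.2+) is the stage to use if the other tweet fields must survive. Spatial queries on the output collection also benefit from a `2dsphere` index on `location` (and `$near` queries require one), e.g. `db.newcollectionname.create_index([("location", "2dsphere")])` in pymongo.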


aggregation pipeline data reshape MongoDB scalability

January 21, 2020 by admin
Coding Notes, Uncategorized

Implementing a scalable geospatial operation in MongoDB

Summary

In this note I document an initial test implementation, using MongoDB, of a spatial join between 22 million points and nearly 16 thousand polygons, along with the steps needed to run the operation. It took longer than I expected: more than 12 hours in total. My conclusion is that the approach can scale if combined with other techniques, such as polygon simplification.

Intro

In this post, I share an implementation of a spatial join at scale using MongoDB. MongoDB is a NoSQL (document-oriented) database system, widely used in industry to store large databases distributed over multiple (cloud) machines. My use case is the analysis of a large database of over 22 million geolocated tweets. My first objective is to implement a spatial join that essentially counts tweets within census tracts (radios censales), which are spatial polygons; in this case there are 15,700 of them. Such an operation is standard in geospatial packages such as ArcGIS or QGIS, and in Python using, for example, GeoPandas. But my ultimate objective is finding a solution that is scalable with large amounts…
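As a rough sketch of what this spatial join computes, the code below does two things. It builds the `$geoWithin` filter that MongoDB would evaluate for one polygon, which could be passed to pymongo's `count_documents`. It also reproduces the per-polygon counts in plain Python with a ray-casting point-in-polygon test. Issuing one such query per polygon is only one plausible way to structure the join; it is an assumption, since the excerpt does not detail the author's steps. The ray-casting part is a swapped-in technique for illustration, not the author's MongoDB implementation. It treats coordinates as planar, a reasonable approximation for small polygons like census tracts but not identical to MongoDB's spherical `$geoWithin`. All function names and the `tract_1` id are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat), GeoJSON order


def geo_within_filter(polygon: Dict[str, Any]) -> Dict[str, Any]:
    """Query filter for documents whose `location` lies inside `polygon`.

    With pymongo: db.tweets.count_documents(geo_within_filter(poly)).
    """
    return {"location": {"$geoWithin": {"$geometry": polygon}}}


def point_in_ring(pt: Point, ring: Sequence[Sequence[float]]) -> bool:
    """Planar ray-casting test against a closed GeoJSON linear ring."""
    x, y = pt
    inside = False
    # A GeoJSON ring repeats its first vertex at the end, so consecutive
    # pairs (ring[i-1], ring[i]) cover every edge exactly once.
    for i in range(1, len(ring)):
        x1, y1 = ring[i - 1]
        x2, y2 = ring[i]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_in_polygon(pt: Point, polygon: Dict[str, Any]) -> bool:
    """GeoJSON Polygon: first ring is the exterior, the rest are holes."""
    rings = polygon["coordinates"]
    if not point_in_ring(pt, rings[0]):
        return False
    return not any(point_in_ring(pt, hole) for hole in rings[1:])


def count_points_per_polygon(points: List[Point],
                             polygons: Dict[str, Dict[str, Any]]) -> Dict[str, int]:
    """Naive spatial join: for each polygon id, count the points inside it."""
    return {pid: sum(point_in_polygon(p, poly) for p in points)
            for pid, poly in polygons.items()}


if __name__ == "__main__":
    triangle = {"type": "Polygon",
                "coordinates": [[[0, 0], [3, 6], [6, 1], [0, 0]]]}
    tweets = [(3, 3), (5, 5), (2, 2)]
    print(count_points_per_polygon(tweets, {"tract_1": triangle}))  # {'tract_1': 2}
```

The naive double loop performs one test per point and polygon pair, on the order of 3.5 × 10^11 pairs for over 22 million tweets and 15,700 polygons. That is why a `2dsphere` index on `location` and, as the post suggests, polygon simplification matter at this scale.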


geopandas Geospatial analysis MongoDB pymongo python scalability spatial join


Categories

  • Causal Inference (4)
  • Coding Notes (8)
  • Defi (1)
  • Economics (8)
  • Machine Learning (1)
  • Uncategorized (19)
  • Urban Economics (3)

Recent Posts

  • Note on the Impact of Liquidation on Health Factor in Overcollateralized Loans
  • Econometrics with simulations 📚
  • Explicando Inferencia por Aleatorización a un futbolero
  • Optimal calibration of a ML classifier based on business knowledge
  • Note on AMMs “picked-off” risk

Tag Cloud

AMM apps bienes publicos causal inference classification conda COVID-19 criptomonedas cryptocurrencies defi econometrics economía de mercados Ethereum Exportar resultados Export output Financial inclusion financiamiento cuadrático fraud geopandas Geospatial analysis Gitcoin h3 hexagons Households Finance Indebtedness Jupyter Loops machinelearning Mercado de Alquileres MongoDB negocios precision proyectos ingeniería public goods pymongo python quadratic funding recall Regression roc-curve scalability Stata Tablas Tables ubuntu
