Reshape of lat and lon coordinates in MongoDB, using the aggregation pipeline
Reshape of lat and lon coordinates in MongoDB, using the aggregation pipeline Summary To transform a large number of documents in a MongoDB collection with spatial data, for example: {lon: -58.1, lat: -34.2} to the GeoJSON format recognized by MongoDB's spatial analysis functions, for example: {type: "Point", coordinates: [-58.1, -34.2]}, it seems advisable to use the aggregation framework: db.tweets.aggregate([{$project: {location: {type: "Point", coordinates: ["$lon", "$lat"]}}}, {$out: "newcollectionname"}]); The problem A common task in MongoDB is preparing the data for spatial analysis. As I described in the previous post, it is necessary to have the data in a compatible format, in this case GeoJSON. For example, if we are working with points: {type: "Point", coordinates: [-58.1, -34.2]} That is, a "type" key specifying that it is a point, and then a "coordinates" key with an array of the coordinate pair: longitude and latitude (in that order!). If working with polygons: {type: "Polygon", coordinates: [[[0, 0], [3, 6], [6, 1], [0, 0]]]} The problem with my data is that it was in the following format and I needed to do…
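The same reshape can be driven from Python. A minimal sketch, assuming pymongo and a `tweets` collection with `lat`/`lon` fields; the database and output collection names below are illustrative, not from the post:

```python
def geojson_reshape_pipeline(out_collection):
    """Build an aggregation pipeline that rewrites lat/lon pairs
    as GeoJSON Point objects and writes them to a new collection."""
    return [
        {"$project": {
            "location": {
                "type": "Point",
                "coordinates": ["$lon", "$lat"],  # GeoJSON order: [longitude, latitude]
            },
        }},
        {"$out": out_collection},  # materialize results in a new collection
    ]

# Usage (needs a running MongoDB instance and pymongo; names are placeholders):
#   from pymongo import MongoClient
#   db = MongoClient()["test"]
#   db.tweets.aggregate(geojson_reshape_pipeline("tweets_geojson"))
```

Building the pipeline as a plain Python list keeps it easy to inspect or unit-test before running it against the collection.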
Implementing a scalable geospatial operation in MongoDB
Implementing a scalable geospatial operation in MongoDB Summary In this note I document an initial test implementation of a spatial join of 22 million points against nearly 16 thousand polygons using MongoDB, with the steps needed to run the operation. It took more time than I expected, more than 12 hours in total. My conclusion is that the approach can be scalable if combined with other techniques, such as polygon simplification. Intro In this post I share an implementation of a spatial-join type of analysis at scale using MongoDB. MongoDB is a NoSQL database system, extensively used in industry to store large databases distributed over multiple (cloud) machines. My case is the analysis of a large database of over 22 million geolocated tweets. My first objective is to implement a spatial-join kind of analysis that essentially counts tweets within census radios ("radios censales"), which are spatial polygons. In this case I have 15,700 polygons. Such an operation is standardly implemented in geospatial packages such as ArcGIS or QGIS, and in Python, for example, using GeoPandas. But my ultimate objective is finding a solution that is scalable with large amounts…
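To make the operation concrete: a spatial join here means counting, for each polygon, the points that fall inside it. The post's implementation delegates this to MongoDB's geospatial indexes; the sketch below only illustrates the underlying computation, in pure Python with an even-odd ray-casting test (far too slow for 22 million points, which is the point of using MongoDB):

```python
def point_in_polygon(pt, polygon):
    """Even-odd ray casting test; polygon is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from pt cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside

def spatial_join_counts(points, polygons):
    """Naive O(points * polygons) count of points per polygon."""
    counts = {i: 0 for i in range(len(polygons))}
    for pt in points:
        for i, poly in enumerate(polygons):
            if point_in_polygon(pt, poly):
                counts[i] += 1
    return counts
```

A spatial index (as MongoDB, QGIS, or GeoPandas use) avoids testing every point against every polygon, which is what makes the operation feasible at this scale.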
Effects of the CABA rental law
Effects of the Rental Law November Here you will find a first version of a working paper that analyzes the effects of the CABA rental law, enacted in 2017. Comments are welcome. The 2017 CABA law introduced some changes to the operation of the market, including shifting the obligation to pay the broker's commission from the tenant to the landlord, among other measures I mentioned in a previous post. In the paper, I set out to measure the effects of the 2017 Buenos Aires City rental law on rental values. The main motivation behind the proposed methodology is to avoid confounding the measured effect with other factors that could affect values contemporaneously, such as inflation or value changes driven by urban transformation trends, and to avoid selection biases in comparisons with other cities. The whole analysis is based on open data, from rental listings published on Properati. The complete code will also be available soon. The analysis is possible thanks to some particularities of the City of Buenos Aires, more precisely the fact of being…
Installation and queries to Google BigQuery from Jupyter
Installation and queries to Google BigQuery from Jupyter Some notes on making a request to Google BigQuery. In this case the goal is to query the Properati database and bring it into a pandas DataFrame. At the end I add a few final steps to persist the data in a local MongoDB. Installing Google Cloud I will create a dedicated virtual environment using conda, with Python 3.6, named bigquery: conda create -n bigquery python=3.6 Activate the environment: C:\Users\Richard>activate bigquery Inside the environment I can start python and check where it is running from: (bigquery) C:\Users\Richard>python Python 3.6.7 (default, Jul 2 2019, 02:21:41) [MSC v.1900 64 bit (AMD64)] on win32 >>> import sys >>> sys.executable 'C:\\Users\\Richard\\AppData\\Local\\conda\\conda\\envs\\bigquery\\python.exe' >>> exit() The next step is to install google-cloud in the environment, also from conda. The following will NOT work: (bigquery) C:\Users\Richard>conda install google-cloud Solving environment: failed PackagesNotFoundError: The following packages are not available from current channels: - google-cloud The correct way is to specify conda-forge: (bigquery) C:\Users\Richard>conda…
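Once the client library is installed, the query itself is short. A minimal sketch, assuming google-cloud-bigquery, pandas, pymongo, and configured Google Cloud credentials; the project, dataset, and table names below are placeholders, not the post's actual Properati table:

```python
def build_query(table, limit=1000):
    """Assemble a simple SELECT over a BigQuery table.
    The table name is a placeholder -- substitute your own."""
    return f"SELECT * FROM `{table}` LIMIT {limit}"

# Usage (requires google-cloud-bigquery, pandas, pymongo and credentials):
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   df = client.query(build_query("my-project.my_dataset.listings")).to_dataframe()
#   # Persist the rows in a local MongoDB:
#   from pymongo import MongoClient
#   MongoClient()["properati"]["listings"].insert_many(df.to_dict("records"))
```

`to_dataframe()` on the query job is what lands the result in pandas; `to_dict("records")` then turns each row into a document MongoDB can store.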
Effects of the CABA rental law #1
I share here a preliminary analysis of the effects of the rental law, passed in 2017, on rental values. This law, declared unconstitutional in May of this year, relieved tenants from paying the real estate broker's commission, while also imposing a maximum commission to be paid to brokers.
Identification with DAGs: Introduction with simple simulations
In this post I want to share with you some introductory ideas on how Directed Acyclic Graphs (DAGs) are used for causal identification. I am also sharing a few (Stata-based) numerical simulations (here) that illustrate their use in a regression application.
The DAG approach has been around for at least a decade now, and is described at length in Pearl and Mackenzie's (2018) excellent "The Book of Why". There's so much going on in the book that I will be writing more about it in a future post.
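The core point of the simulations can be illustrated numerically. This is not the post's Stata code, just a minimal Python sketch of confounding in the DAG Z → X, Z → Y, X → Y: the data is built so the exogenous part of X is exactly orthogonal to Z, so the naive regression of Y on X is biased while adjusting for Z (here via Frisch-Waugh residualization) recovers the true effect of 1:

```python
def ols_slope(x, y):
    """Slope of the simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    var = sum((xi - mx) ** 2 for xi in x) / n
    return cov / var

z = [0, 0, 1, 1, 2, 2]                   # confounder Z
e = [-1, 1, -1, 1, -1, 1]                # exogenous variation, orthogonal to Z
x = [zi + ei for zi, ei in zip(z, e)]    # X <- Z + e
y = [xi + zi for xi, zi in zip(x, z)]    # Y <- X + Z (true effect of X is 1)

beta_naive = ols_slope(x, y)             # biased upward by the back-door path through Z

# Adjust for Z: residualize X on Z, then regress Y on the residual (Frisch-Waugh).
g = ols_slope(z, x)
a = sum(x) / len(x) - g * sum(z) / len(z)
r = [xi - (a + g * zi) for xi, zi in zip(x, z)]
beta_adjusted = ols_slope(r, y)          # recovers the causal effect of 1
```

With this deterministic data, `beta_naive` works out to 1.4 while `beta_adjusted` is exactly 1, which is the back-door logic the DAG makes explicit: condition on Z and the bias disappears.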
Notes on Matching in Entrepreneurial Finance Networks
We are glad our paper (with Virginia Sarria Allende and Gabriela Robiolo) came out in Venture Capital: An International Journal of Entrepreneurial Finance. A working paper version is available here. Here are some brief comments on the ideas, the econometrics, and the data approach: In the paper we study the "matching" between investors and startups in the entrepreneurial finance market. Broadly speaking, we are concerned with the question of who will invest in whom, and with the role played by (social, professional) networks in the explanation. Specifically, we show evidence on a simple idea: due to information-related frictions in the entrepreneurial finance market, being closer in the network of connections actually matters for matching. Being closer not only increases the attractiveness of a prospective match, but also makes observable attributes more attractive. But "being closer" has a particular interpretation here. The network connections we measure are not the typical social (or follower) style of connections. We recognize a link if there is information that you have worked at, invested in, or mentored a common startup or organization in the past. So we could say that these are really costly (or "signally meaningful", in Spence's sense) connections. For founders (or prospective investors alike)…
Mapping with geopandas and basemapping with contextily
I find the geopandas library to be really useful for mapping with layers. Contextily is also a nice library that allows adding a background basemap. Using them together makes it fairly simple to visualize shapes such as polygons and points, together with contextual mapping information, as in the following figure: Basemaps are drawn from OpenStreetMap under CC BY-SA, and map tiles are from Stamen Design, under CC BY 3.0. There are some options for tile design. View the code on Gist. If the embedded notebook does not render, try here.
Tanchella @CIBSE 2017
Today we presented our Tanchella tool at CIBSE 2017. Tanchella is a tool for collecting and analyzing data on the complex networks (social, professional, and financial) that make up startup ecosystems. It is a development with an academic purpose, which allows us to investigate the different effects these networks have on startup financing markets. Tanchella is the result of multidisciplinary work involving researchers and students from Universidad Austral, including the School of Engineering, the IAE Business School, and the School of Business Sciences. This presentation focuses on the technological aspects and some results. Here are the slides.
Table of Differences in Means Tests / Tabla de Tests de Diferencias de Medias
In this post I leave you a simple Stata code that generates a table of mean differences (between 2 groups) for a set of variables. It looks like this: A table of this type will be useful, for example, when the aim is to compare a treatment group and a control group across a series of variables. Stata has the ttest command to perform tests of this kind, but does not include, as far as I know, a functionality for exporting a table of multiple tests. This code tests a large number of variables, with the advantage that it generates and exports a publication-style table. The table is saved in a text (.txt) file. Then, I usually import this table into Excel (insert > data > text) for final retouching before copying it to the final document. I leave you the Excel template as well. For simplicity, asterisks for significant statistics, parentheses, and brackets are added, also automatically, by the Excel template. From the statistical point of view, it might be worth mentioning false-discovery rates, which could be relevant in an application of this type. I will leave that for a future post. You can test the code…
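The table itself is produced in Stata; for reference, the statistic behind each line of it can be sketched in a few lines of plain Python (Welch's unequal-variance two-sample t, the same family of test `ttest` runs), which is handy for spot-checking a table entry:

```python
from math import sqrt

def welch_t(a, b):
    """Difference in means and Welch's t statistic for two samples."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variance, group a
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)  # sample variance, group b
    diff = ma - mb
    t = diff / sqrt(va / na + vb / nb)             # Welch standard error
    return diff, t
```

The p-value (and hence the asterisks) additionally requires the t distribution's CDF, which Stata computes internally; the sketch stops at the statistic.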