Installing H3 on a Linux Subsystem in Windows 10
H3 is a library developed by the Uber team that implements spatial analysis based on indexed hexagons. Hexagons have especially interesting properties for spatial operations. See more here.

My experience installing H3 on Windows was not good: H3 did not work, and the errors were not clear about the causes either. After some googling I reached the conclusion that getting that installation right could become a headache, so I decided to test the alternative of running Linux as a subsystem of Windows. This turned out very well. I collect in this post the steps for the complete setup:

1. Install the Linux subsystem

The procedure for this installation is simple and [explained here](https://docs.microsoft.com/en-us/windows/wsl/install-win10). The steps:

- Using Windows PowerShell (which is available on Windows), enter the command that authorizes the execution of the subsystem (see the sketch below).
- Restart.
- Install a Linux distribution. The distributions can be found in the Microsoft Store; in my case I downloaded and installed Ubuntu 18.04 LTS. Just download and install, no special configuration is required here.

Once installed we will have access to Ubuntu through a shortcut.

2. Preparing the installation of…
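For reference, the commands involved look roughly like this. The PowerShell line is the one from the Microsoft guide linked above; the Ubuntu packages are my assumption about what the H3 build in the truncated steps needs:

```
# In an elevated Windows PowerShell: enable the subsystem, then restart
Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux

# Inside Ubuntu afterwards: build tools that compiling H3 typically requires (my assumption)
sudo apt-get update
sudo apt-get install -y cmake make gcc g++ git
```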
Reshaping lat and lon coordinates in MongoDB using the aggregation pipeline
Summary

To transform a large number of documents in a MongoDB collection with spatial data, for example:

{lat: -34.2, lon: -58.1}

to a GeoJSON format recognizable by MongoDB's spatial analysis functions, for example:

{type: "Point", coordinates: [-58.1, -34.2]}

it seems advisable to use the aggregation framework:

db.tweets.aggregate([{$project: {location: {type: "Point", coordinates: ["$lon", "$lat"]}}}, {$out: "newcollectionname"}]);

The problem

A usual task in MongoDB is preparing the data for spatial operations. As I described in the previous post, it is necessary to have the data in a compatible format, in this case as GeoJSON. For example, if we are working with points:

{type: "Point", coordinates: [-58.1, -34.2]}

That is, specifying a "type" key that declares it is a point, and then a "coordinates" key with an array of the coordinate pair: longitude and latitude (in that order!). If working with polygons:

{type: "Polygon", coordinates: [[[0, 0], [3, 6], [6, 1], [0, 0]]]}

The problem with my data is that it was in the following format and I needed to do…
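Spelled out in the mongo shell, the whole step looks like this. The index creation at the end is an addition of mine, since MongoDB's spatial operators require a 2dsphere index on the reshaped field:

```javascript
// reshape every document into a GeoJSON point stored under "location",
// writing the output to a new collection ($out must be the last stage)
db.tweets.aggregate([
  { $project: { location: { type: "Point", coordinates: ["$lon", "$lat"] } } },
  { $out: "newcollectionname" }
]);

// spatial operators such as $geoWithin or $geoNear need a 2dsphere index
db.newcollectionname.createIndex({ location: "2dsphere" });
```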
Implementing a scalable geospatial operation in MongoDB
Summary

In this note I document an initial test implementation of a spatial join involving 22 million points and nearly 16 thousand polygons using MongoDB, together with the steps necessary to run the operation. The operation took more time than I expected, a total of more than 12 hours. My conclusion is that the approach can be scalable if combined with other techniques such as the simplification of polygons.

Intro

In this post I share an implementation of a spatial-join type of analysis at scale using MongoDB. MongoDB is a NoSQL database system, extensively used in industry to store large databases distributed over multiple (cloud) machines. My case is the analysis of a large database of over 22 million geo-located tweets. My first objective is to implement a spatial-join kind of analysis that essentially counts tweets within census radii, which are spatial polygons; in this case I have 15,700 polygons. Such an operation is standardly implemented in geospatial packages such as ArcGIS or QGIS, and in Python, for example, using GeoPandas. But my ultimate objective is finding a solution that is scalable with large amounts…
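A minimal sketch of the counting loop in the mongo shell, assuming the points live in a tweets collection with a 2dsphere-indexed location field and the polygons in a radii collection with a GeoJSON geometry field (collection and field names are illustrative, not necessarily the ones in the full post):

```javascript
// for every census radius, count the tweets whose point falls inside its
// polygon, and store the count back on the radius document
db.radii.find().forEach(function (radius) {
  var n = db.tweets.count({
    location: { $geoWithin: { $geometry: radius.geometry } }
  });
  db.radii.update({ _id: radius._id }, { $set: { tweet_count: n } });
});
```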
Installation and queries to Google BigQuery from Jupyter
Some notes on making a request to Google BigQuery. In this case the goal is to query the Properati database and load it into a pandas DataFrame. At the end I add some final steps to persist the data in a local MongoDB.

Installing Google Cloud

I will create a dedicated virtual environment using conda, in this case with Python 3.6, and call it bigquery:

```
conda create -n bigquery python=3.6
```

Activate the environment:

```
C:\Users\Richard>activate bigquery
```

Inside the environment I can start Python and check where it is running from:

```
(bigquery) C:\Users\Richard>python
Python 3.6.7 (default, Jul 2 2019, 02:21:41) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.executable
'C:\\Users\\Richard\\AppData\\Local\\conda\\conda\\envs\\bigquery\\python.exe'
>>> exit()
```

The next step is to install google-cloud in the environment, also with conda. The following will not work:

```
(bigquery) C:\Users\Richard>conda install google-cloud
Solving environment: failed
PackagesNotFoundError: The following packages are not available from current channels:
- google-cloud
```

The correct way is to specify conda-forge:

(bigquery) C:\Users\Richard>conda…
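Once the client library is installed, the query itself looks roughly like this in Python. This is a sketch: the project ID, the credentials setup, the exact Properati table path, and the MongoDB database/collection names are all placeholders:

```python
from google.cloud import bigquery
from pymongo import MongoClient

# assumes GOOGLE_APPLICATION_CREDENTIALS points at a service-account key file
client = bigquery.Client(project="my-project-id")  # placeholder project ID

sql = """
    SELECT *
    FROM `properati-data-public.properties_ar.properties_sell_201901`  -- placeholder table
    LIMIT 1000
"""
# run the query job and load the result into a pandas DataFrame
df = client.query(sql).to_dataframe()

# optional last step: persist the rows in a local MongoDB (names illustrative)
MongoClient()["properati"]["sales"].insert_many(df.to_dict("records"))
```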
Mapping with geopandas and basemapping with contextily
I find the geopandas library really useful for mapping with layers. Contextily is also a nice library that allows adding a background basemap. Using them together makes it fairly simple to visualize shapes such as polygons and points together with contextual mapping information, as in the figure below. Basemaps are drawn from OpenStreetMap under CC BY-SA, and map tiles are from Stamen Design under CC BY 3.0. There are some options for tile design.
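A minimal sketch of the pattern with a recent version of contextily (the file name and styling are illustrative):

```python
import geopandas as gpd
import contextily as ctx

gdf = gpd.read_file("radii.shp")    # any polygon or point layer (illustrative name)
gdf = gdf.to_crs(epsg=3857)         # reproject to web mercator so the tiles line up
ax = gdf.plot(figsize=(10, 10), alpha=0.5, edgecolor="k")
ctx.add_basemap(ax)                 # fetches and draws the background tiles
```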
Table of Differences in Means Tests / Tabla de Tests de Diferencias de Medias
español

In this post I leave you a simple Stata code that generates a table of differences in means (between two groups) for a set of variables. A table of this type is useful, for example, when the aim is to compare a treatment group and a control group across a series of variables. Stata has the ttest command to perform tests of this kind, but does not incorporate, as far as I know, a functionality for exporting a table of multiple tests. This code tests a large number of variables, with the advantage that it generates and exports a publication-style table. The table is saved in a text (.txt) file. I then usually import this table into Excel (Insert > Data > Text) for final retouching before copying it to the final document. I leave you the Excel template as well. For simplicity, the asterisks for significant statistics, the parentheses and the brackets are added, also automatically, by the Excel template. From a statistical point of view, it might be worth mentioning the subject of false-discovery rates, which could be relevant in an application of this type; I will leave it for a future post. You can test the code…
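The core of the approach can be sketched like this (a simplified version, not the full code from the post; the variable list and the treatment grouping dummy are illustrative):

```stata
* a sketch: write one row per t-test to a tab-separated text file
file open results using "ttests.txt", write replace
file write results "variable" _tab "mean_g1" _tab "mean_g2" _tab "diff" _tab "p" _n
foreach v of varlist var1 var2 var3 {
    quietly ttest `v', by(treatment)
    file write results "`v'" _tab (r(mu_1)) _tab (r(mu_2)) ///
        _tab (r(mu_1) - r(mu_2)) _tab (r(p)) _n
}
file close results
```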
Using loops to run (and export) many regressions / Usando loops para correr (y exportar) múltiples regresiones
español

The use of loops becomes essential when you need to perform repetitive calculations. Looping has many advantages, for example, when you need to apply a correction across all the specifications at once. So here are some interesting things that you may want to do when implementing a loop to run many regressions and export their outputs:

- Choose the appropriate estimation method according to the type of dependent variable. For instance, you might want to estimate the model using OLS (regress) when the dependent variable is continuous, and a probit or logit model when it is discrete (a dummy variable).
- Progressively add explanatory variables to the model and export all the output in a single table. This can be done using outreg2's replace and append options, but if you want instead to write a single command line inside a loop you will have to make the appropriate changes (as sketched below).

So assume that you want to estimate a number of econometric models that are quite similar in terms of the explanatory variables that are incorporated, but differ in terms of the dependent variables, for example:

Model 1: outcome1 = b1*x1 + b2*x2 + b3*x3 + b4*x4 + e
Model 2: outcome2 = b1*x1 + b2*x2 + b3*x3 + b4*x4 + e

In addition you also want to progressively add sets of explanatory variables. So for instance you…
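A condensed sketch of both ideas, choosing the estimator per dependent variable and keeping a single outreg2 line inside the loop (variable and file names are illustrative):

```stata
* a sketch: pick the estimator per outcome and stack all models in one table
local first = 1
foreach y of varlist outcome1 outcome2 {
    capture assert inlist(`y', 0, 1) | missing(`y')   // is the outcome a dummy?
    if _rc == 0 {
        probit `y' x1 x2 x3 x4
    }
    else {
        regress `y' x1 x2 x3 x4
    }
    if `first' outreg2 using results.txt, replace     // first model creates the table
    else outreg2 using results.txt, append            // later models append to it
    local first = 0
}
```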
Importing text files into Excel Part II / Importar archivos de texto al Excel Parte II
español

For those who find it useful to transfer the output of their statistics and regressions to Excel, here is another macro that might help. As in the last case, imagine that you are dealing with many tables of statistics and regressions that you have computed with Stata, and you find it useful to take them all to Excel. Such a thing might be useful for visualization, comparing statistics (robustness checks), formatting the tables for presentations or publications, elaborating further graphics, and so on. Building on the macro presented in the previous post, this time I built another one to handle importing multiple files simultaneously. To see how it works, imagine, as an example, that you have 5 key variables that you are analysing, and for each of them you have produced 4 tables (corresponding, for example, to different estimation methods) and exported each to a text file. That gives a total of 20 tables to be imported into Excel. First, it might be useful to organize the text file information in a table in Excel, as shown below. Using the information in that table, the multipletextload macro will: i. Generate a new…
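The mechanics of such a macro can be sketched in VBA like this (an illustrative simplification, not the actual multipletextload code; the "files" sheet listing the paths and the fixed row range are assumptions):

```vb
' sketch: open each tab-delimited file listed in column A of a "files" sheet
' and copy its contents into a new sheet of this workbook
Sub ImportTextFiles()
    Dim r As Long, f As String
    Dim src As Workbook, dst As Worksheet
    For r = 2 To 21                                    ' 20 files, one per row
        f = ThisWorkbook.Sheets("files").Cells(r, 1).Value
        Workbooks.OpenText Filename:=f, DataType:=xlDelimited, Tab:=True
        Set src = ActiveWorkbook                       ' the text file just opened
        Set dst = ThisWorkbook.Worksheets.Add
        src.Sheets(1).UsedRange.Copy Destination:=dst.Range("A1")
        src.Close SaveChanges:=False
    Next r
End Sub
```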