Table of Differences in Means Tests / Tabla de Tests de Diferencias de Medias


In this post I leave you a simple Stata code that generates a table of means differences (between 2 groups) for a set of variables. It looks like this:

A table of this type will be useful, for example, when the aim is to compare a treatment group and a control group across a series of variables.

Stata has the ttest command to perform tests of this kind, but does not incorporate, as far as I know, a functionality for exporting a table of multiple tests.

This code tests a large number of variables, with the advantage that it  generates and exports a publication-style table. The table is saved in a text (.txt) file. Then, I usually import this table into Excel (insert> data> text) for final retouching before copying it to the final document. I leave you the Excel template as well. For simplicity, asterisks for significant statistics, parentheses and brackets are added – also automatically – by the Excel template.

From the statistical point of view, it might be worth mentioning the subject of false-discovery rates, which could be relevant in an application of this type. I will leave it for for a future post.

You can test the code with this database, which contains the results of a survey for two groups (one “treatment” and one “control”) of individuals on urban attributes’ quality .

If you find it useful, or for comments and suggestions please leave me a comment!

*Define folder for text file output
cd "C:\Users\Ricardo Pasquini\Documents\otros"
* A global macro handles the names of the multiple variables 
global TTESTVARS "veredalimpia_nota verdes_nota ilum_nota nocheseg_nota veredallovio_nota arbol_nota ordentrans"
* You can also specify labels. Be shure of keeping word connected by underlines
global TTESTLABELS "veredas_limpias espacios_verdes iluminacion seguridad_de_noche veredas_cuando_llueve arbolado orden_transito"
* Define here the name of the binary variable that signals the two groups. In this case is called "intervencion"
gen groups=intervencion

/* Define el nombre de la tabla*/
capture: erase descstats.txt
 file open fh using descstats.txt, write replace
 file write fh _n _tab "Difference in Means Test" 
 file write fh _n _tab "Variable" _tab "Obs Control" _tab "Media Control" _tab "Obs Tratamiento" _tab "Mean Treatment" _tab "Diff" _tab "t" 
 local i=1
 foreach var of global TTESTVARS {
 local etiqueta : word `i' of $TTESTLABELS
 capture quietly ttest `var', by(groups) 
 local se_1 = (r(sd_1))/(r(N_1))^0.5
 local se_2 = (r(sd_2))/(r(N_2))^0.5
 local dif =(r(mu_1)-r(mu_2))
 file write fh _n _tab "`etiqueta'" _tab %12.0fc (r(N_1)) _tab %12.3fc (r(mu_1)) _tab %12.0fc (r(N_2)) _tab %12.3fc (r(mu_2)) _tab %12.3fc (`dif') _tab %12.3fc (r(t)) 
 file write fh _n _tab _tab _tab %12.3fc (`se_1') _tab _tab %12.3fc (`se_2') _tab %12.3fc (r(se)) _tab %12.3fc (r(p)) 
 local i=`i'+1
 file close fh
 type descstats.txt

Using loops to run (and export) many regressions / Usando loops para correr (y exportar) múltiples regresiones


The use of  loops becomes essential when needing to perform repetitive calculations. Looping has many advantages, for example, when needing to do corrections in all the calculations specifications. So here are some interesting features that you would like to do when implementing a loop to run many regressions, and export their outputs:

  • Choose the appropriate method for the regression according to the type of dependent variable. For instance,  you might want to estimate the model using OLS (regress) when the dependent variable is continuous and or a probit or a logit model when it is discrete (a dummy variable).
  • Progressively add explanatory variables to the model and export all the output in a single table. This can be done using outreg2 ‘s  replace and append options, but if you want instead to write a single command line inside a loop you will have to make the appropriate changes.

So assume that you want to estimate a number of econometric models that are quite similar in terms of the explanatory variables that are incorporated, but differ between them in terms of the dependent variables, for example :

Model 1: outcome1=b1*x1+b2*x2+b3*X3+b4*X4+e
Model 2: outcome2=b1*x1+b2*x2+b3*X3+b4*X4+e

In addition you also want to progressively add sets of explanatory variables. So for instance you want to estimate:

Model 1: outcome1=b1*x1+b2*x2+e
Model 2: outcome1=b1*x1+b2*x2+b3*X3+b4*X4+e
Model 3: outcome2=b1*x1+b2*x2+e
Model 4: outcome2=b1*x1+b2*x2+b3*X3+b4*X4+e

Of course the code should work for n number of explanatory variables and k number of explanatory vars. Let’s do it.

What I am going to do is to first define the sets of explanatory variables using separate globals, for instance:

global CONTROLGROUP1 "x1 x2"
global CONTROLGROUP2 "x3 x4"

* And add the outcome variables into a global

global OUTCOMES "outcomevar1 outcomevar2 "

/*Start by looping on the types of explanatory variables, in this case we have discrete and continuous variables
for each discrete variable we are running a probit model, and for each ordinal variable we will run an ordinal probit model
global TIPO "disc ord"
foreach tipo of global TIPO {

/*Here is an interesting feature: using vartyp to separate the outcome variables into groups of discrete and ordinal variables.
In this case, the first loop will assign all discrete variables to the
global OUTCOMEBYTYPE, the second loop will assign the ordinal variables to the global OUTCOMEBYTYPE and so on.
	vartyp $OUTCOMES, list(`tipo')
	global OUTCOMEBYTYPE "`r(varlist)'"

/*Now assign the appropiate regression methodology to each type of explanatory variable*/

	if "`tipo'"=="disc" {
	local method "dprobit"
	else {
	local method "oprobit"

/*Start the loop of outcomes to be regressed*/
	foreach outcome of global OUTCOMEBYTYPE {

		/*We are not using replace and append when exporting output to tables, so we need to erase the output file
		if already exists*/
		capture confirm file reg_`outcome'.txt
					if !_rc {
							erase reg_`outcome'.txt
					else {
					*no action when file not found

		/*Start the loop that accumulates explanatory variables. Using while instead of foreach will allow adding more
		sets of explanatory variables later*/
		global i = 1

		while $i < 4 {
		/*Finally, the lines for running the regression and exporting the output. It is important to use capture when doing this
		since if any of the regression ends in an error (for example due to no observations), the loop will continue*/

		    capture noisily xi: `method' `outcome' $CONTROLS if respondiente==1 [fweight=pesosf]
			capture noisily outreg2 using reg_`outcome'.txt, append bdec(6) e(all)

		*The following lines do the trick to add the explanatory sets of variables
		global i = $i + 1

		*Close explanatory variables loop
			/* Use this space to add other alternative specifications*/

		macro drop CONTROLS
		macro drop i

	*Close outcomes loop
*Close type loop

Importing text files into Excel Part II / Importar archivos de texto al Excel Parte II


For those who find useful to transfer the output of their statistics and regressions to Excel, here is another macro that might be useful.

As in the last case, imagine that you are dealing with many tables of statistics and regressions that you have computed with Stata, and you will find useful to take them all to Excel. Such a thing might be useful for visualization, comparing statistics (robustness checks), formatting the tables for presentations or publications, elaborating further graphics, and so on.

Building on the macro presented in the previous post, this time I built another one to deal with importing multiple files simultaneously.

To see how it works, imagine, as an example, that you have 5 key variables that you are analysing and for each of them you have produced 4 tables (corresponding, for example to different estimation methods) and exported each respectively to a text file. That gives a total of 20 tables to be imported from Excel.

First, it might be useful to organize the text files information in a table in Excel as shown in the table below:

Using the information in the previous table, the multipletextload macro will :

i. Generate a new sheet for each variable (one for each row in the reference table) andname the sheet with the name of the corresponding variable.
ii. Paste the contents of all the tables corresponding to a same variable (in columns) within a same sheet, one table next to the right to each other.
iii. In case the macro do not find a specific file (might be the case that you did not want to apply a method for a given variable), will skip it and jump to the next file to import.

The macro uses the macro “Textload” that I have posted in the previous post, so you will have to generate the former one in order to run this one.

In order to run it, just select the contents of the table (in my example first select cells A2:E6 and then run the macro)

Sub multipletextload()
  Dim rgFiles As Range
  Dim i As Long, j As Long

  Dim sName As String
  Dim fName As String
  Dim strWork As String

 'Dim LastRow As Long
 Dim LastCol As Long
  strWork = ThisWorkbook.Name
  Set rgFiles = Selection
'Set rgFiles = Range("B2:F4")

  For i = 1 To rgFiles.Rows.Count
    'Row i Column 1 has the name of the variable. generate and name new sheet
    sName = rgFiles.Cells(i, 1).Value
    'sName = ActiveCell.Value
    Sheets.Add(After:=ActiveSheet).Name = sName
    'Add After:=ActiveSheet,
    For j = 2 To rgFiles.Columns.Count
    ' Row i Columns 2 onwards have the names of the files to import
    fName = rgFiles.Cells(i, j).Value
    'If there is no name in the cell, not do anything
    If fName = "" Then

      If j = 2 Then
      ' Only the file in Column 2 will be imported into cell  (1,1)
        Worksheets(sName).Cells(1, 1).Activate
        ActiveCell.FormulaR1C1 = fName
        Application.Run "Textload"
'        Application.Run "'reg results_vtest.xlsm'!Textload"
      ' The remaining will be imported to cell location (1,LastCol)
        Worksheets(sName).Cells(1, 1).Activate
        Set rgLast = Worksheets(sName).Range("A1").SpecialCells(xlCellTypeLastCell)
        'LastRow = rgLast.Row
        LastCol = rgLast.Column
        Worksheets(sName).Range("A1").Cells(1, LastCol).FormulaR1C1 = fName
        Worksheets(sName).Range("A1").Cells(1, LastCol).Activate
        Application.Run "Textload"
      End If
    End If

    Next j
  Next i

End Sub