Table of Differences in Means Tests / Tabla de Tests de Diferencias de Medias


In this post I leave you a simple Stata code that generates a table of means differences (between 2 groups) for a set of variables. It looks like this:

A table of this type will be useful, for example, when the aim is to compare a treatment group and a control group across a series of variables.

Stata has the ttest command to perform tests of this kind, but does not incorporate, as far as I know, a functionality for exporting a table of multiple tests.

This code tests a large number of variables, with the advantage that it  generates and exports a publication-style table. The table is saved in a text (.txt) file. Then, I usually import this table into Excel (insert> data> text) for final retouching before copying it to the final document. I leave you the Excel template as well. For simplicity, asterisks for significant statistics, parentheses and brackets are added – also automatically – by the Excel template.

From the statistical point of view, it might be worth mentioning the subject of false-discovery rates, which could be relevant in an application of this type. I will leave it for for a future post.

You can test the code with this database, which contains the results of a survey for two groups (one “treatment” and one “control”) of individuals on urban attributes’ quality .

If you find it useful, or for comments and suggestions please leave me a comment!

*Define folder for text file output
cd "C:\Users\Ricardo Pasquini\Documents\otros"
* A global macro handles the names of the multiple variables 
global TTESTVARS "veredalimpia_nota verdes_nota ilum_nota nocheseg_nota veredallovio_nota arbol_nota ordentrans"
* You can also specify labels. Be shure of keeping word connected by underlines
global TTESTLABELS "veredas_limpias espacios_verdes iluminacion seguridad_de_noche veredas_cuando_llueve arbolado orden_transito"
* Define here the name of the binary variable that signals the two groups. In this case is called "intervencion"
gen groups=intervencion

/* Define el nombre de la tabla*/
capture: erase descstats.txt
 file open fh using descstats.txt, write replace
 file write fh _n _tab "Difference in Means Test" 
 file write fh _n _tab "Variable" _tab "Obs Control" _tab "Media Control" _tab "Obs Tratamiento" _tab "Mean Treatment" _tab "Diff" _tab "t" 
 local i=1
 foreach var of global TTESTVARS {
 local etiqueta : word `i' of $TTESTLABELS
 capture quietly ttest `var', by(groups) 
 local se_1 = (r(sd_1))/(r(N_1))^0.5
 local se_2 = (r(sd_2))/(r(N_2))^0.5
 local dif =(r(mu_1)-r(mu_2))
 file write fh _n _tab "`etiqueta'" _tab %12.0fc (r(N_1)) _tab %12.3fc (r(mu_1)) _tab %12.0fc (r(N_2)) _tab %12.3fc (r(mu_2)) _tab %12.3fc (`dif') _tab %12.3fc (r(t)) 
 file write fh _n _tab _tab _tab %12.3fc (`se_1') _tab _tab %12.3fc (`se_2') _tab %12.3fc (r(se)) _tab %12.3fc (r(p)) 
 local i=`i'+1
 file close fh
 type descstats.txt

Using loops to run (and export) many regressions / Usando loops para correr (y exportar) múltiples regresiones


The use of  loops becomes essential when needing to perform repetitive calculations. Looping has many advantages, for example, when needing to do corrections in all the calculations specifications. So here are some interesting features that you would like to do when implementing a loop to run many regressions, and export their outputs:

  • Choose the appropriate method for the regression according to the type of dependent variable. For instance,  you might want to estimate the model using OLS (regress) when the dependent variable is continuous and or a probit or a logit model when it is discrete (a dummy variable).
  • Progressively add explanatory variables to the model and export all the output in a single table. This can be done using outreg2 ‘s  replace and append options, but if you want instead to write a single command line inside a loop you will have to make the appropriate changes.

So assume that you want to estimate a number of econometric models that are quite similar in terms of the explanatory variables that are incorporated, but differ between them in terms of the dependent variables, for example :

Model 1: outcome1=b1*x1+b2*x2+b3*X3+b4*X4+e
Model 2: outcome2=b1*x1+b2*x2+b3*X3+b4*X4+e

In addition you also want to progressively add sets of explanatory variables. So for instance you want to estimate:

Model 1: outcome1=b1*x1+b2*x2+e
Model 2: outcome1=b1*x1+b2*x2+b3*X3+b4*X4+e
Model 3: outcome2=b1*x1+b2*x2+e
Model 4: outcome2=b1*x1+b2*x2+b3*X3+b4*X4+e

Of course the code should work for n number of explanatory variables and k number of explanatory vars. Let’s do it.

What I am going to do is to first define the sets of explanatory variables using separate globals, for instance:

global CONTROLGROUP1 "x1 x2"
global CONTROLGROUP2 "x3 x4"

* And add the outcome variables into a global

global OUTCOMES "outcomevar1 outcomevar2 "

/*Start by looping on the types of explanatory variables, in this case we have discrete and continuous variables
for each discrete variable we are running a probit model, and for each ordinal variable we will run an ordinal probit model
global TIPO "disc ord"
foreach tipo of global TIPO {

/*Here is an interesting feature: using vartyp to separate the outcome variables into groups of discrete and ordinal variables.
In this case, the first loop will assign all discrete variables to the
global OUTCOMEBYTYPE, the second loop will assign the ordinal variables to the global OUTCOMEBYTYPE and so on.
	vartyp $OUTCOMES, list(`tipo')
	global OUTCOMEBYTYPE "`r(varlist)'"

/*Now assign the appropiate regression methodology to each type of explanatory variable*/

	if "`tipo'"=="disc" {
	local method "dprobit"
	else {
	local method "oprobit"

/*Start the loop of outcomes to be regressed*/
	foreach outcome of global OUTCOMEBYTYPE {

		/*We are not using replace and append when exporting output to tables, so we need to erase the output file
		if already exists*/
		capture confirm file reg_`outcome'.txt
					if !_rc {
							erase reg_`outcome'.txt
					else {
					*no action when file not found

		/*Start the loop that accumulates explanatory variables. Using while instead of foreach will allow adding more
		sets of explanatory variables later*/
		global i = 1

		while $i < 4 {
		/*Finally, the lines for running the regression and exporting the output. It is important to use capture when doing this
		since if any of the regression ends in an error (for example due to no observations), the loop will continue*/

		    capture noisily xi: `method' `outcome' $CONTROLS if respondiente==1 [fweight=pesosf]
			capture noisily outreg2 using reg_`outcome'.txt, append bdec(6) e(all)

		*The following lines do the trick to add the explanatory sets of variables
		global i = $i + 1

		*Close explanatory variables loop
			/* Use this space to add other alternative specifications*/

		macro drop CONTROLS
		macro drop i

	*Close outcomes loop
*Close type loop