Data Analysis with Stata

estout Package

The estout package automates the creation of publication-quality tables for summary statistics and regression results. To install estout, type ssc install estout in Stata's Command window.

Tabulating Summary Statistics

This tutorial shows you how to produce a publication-quality table of summary statistics in Stata. The goal is to produce Table 2 below. Follow the steps outlined below, or if you prefer, refer to this do-file.

Image for Table 2

Step 1

Load the following dataset into Stata using the sysuse command.

sysuse nlsw88, clear

The sysuse command loads into memory one of the example datasets that ship with your Stata installation. It is very useful for experimenting with commands. To see a list of all available example datasets, type sysuse dir.

Step 2

It is always a good idea to browse and describe the dataset before you delve into it.

To obtain summary statistics of all the variables in the dataset, simply type summarize.
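
As a minimal sketch of what this first look might involve (the particular commands are a suggestion, not part of the original steps), the lines below list the variables, inspect the coding of race, and summarize every variable:

```stata
* Load the example dataset used throughout this tutorial
sysuse nlsw88, clear

* Variable names, storage types, and labels
describe

* Values and value labels of the categorical race variable
codebook race

* Summary statistics for every variable in the dataset
summarize
```

The codebook output is worth a glance here because the next commands rely on race taking exactly three values.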

Note that in the table above, there are three race variables, whereas the dataset as provided only contains a single, categorical race variable called race.

We need to generate dummy or indicator variables for each value of race, and this is easily done using an extended version of the tabulate command:

tabulate race, gen(dum_race)

The above command creates three dummy variables, dum_race1, dum_race2, and dum_race3, one for each race category.

It is useful to cross-tabulate the original race variable and the three derivative race dummies, using the tabulate command, to see for yourself that the variables were generated correctly:

tabulate race dum_race1
tabulate race dum_race2
tabulate race dum_race3
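
The same check can be made without reading the tables by eye. The sketch below assumes that race is coded 1, 2, and 3, as it appears to be in nlsw88. Each assert stops with an error if a dummy disagrees with race, and prints nothing if all is well:

```stata
sysuse nlsw88, clear
tabulate race, gen(dum_race)

* Assumption: race takes the values 1, 2, 3 (check with codebook race)
assert dum_race1 == (race == 1) if !missing(race)
assert dum_race2 == (race == 2) if !missing(race)
assert dum_race3 == (race == 3) if !missing(race)
```

The if !missing(race) qualifier matters in general, because tabulate sets the dummies to missing wherever the original variable is missing.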

Step 3

Table 2 above shows summary statistics for only a subset of variables. To obtain these statistics, simply type summarize followed by the names of the variables for which you want summary statistics.

summarize age wage dum_race1 dum_race2 dum_race3 collgrad

Step 4

The naive way to insert these results into a table would be to copy the output displayed in the Stata Results window and paste it into a word processor or spreadsheet. The estout package offers a way to bypass this tedium.

First, install an add-on package called estout from the SSC (Statistical Software Components) archive. (Skip this step if estout is already installed on your computer.) Simply type

ssc install estout

Step 5

Modify the summarize command described in Step 3 by preceding it with the estpost command:

estpost summarize age wage dum_race1 dum_race2 dum_race3 collgrad

Store the calculated summary statistics as a named set of stored estimates using the eststo command. The name can be anything; let's call it summstats.

eststo summstats

Finally, print the statistics stored in summstats using the esttab command:

esttab summstats using table2.rtf, replace main(mean %6.2f) aux(sd)

This produces a nice enough table. For a few more bells and whistles, refer to this do-file.
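
As one possible refinement (a sketch, not necessarily what the linked do-file does), the table can show variable labels instead of variable names and drop the significance stars, model number, and model title, none of which mean anything for summary statistics. The label text and the output filename table2_labeled.rtf are hypothetical choices made for illustration:

```stata
sysuse nlsw88, clear
tabulate race, gen(dum_race)

* Hypothetical labels; adjust to match the categories shown by codebook race
label variable dum_race1 "White"
label variable dum_race2 "Black"
label variable dum_race3 "Other race"
label variable collgrad "College graduate"

estpost summarize age wage dum_race1 dum_race2 dum_race3 collgrad
eststo summstats

* label uses variable labels; nostar, nonumbers, nomtitles strip regression-style decorations
esttab summstats using table2_labeled.rtf, replace main(mean %6.2f) aux(sd) label nostar nonumbers nomtitles title("Summary statistics")
```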

More...

What if you want the table of summary statistics to look like this?

Image for Table 4

That is, you need separate columns for various subsamples. The basic code is:

eststo summstats: estpost summarize age wage dum_race1 dum_race2 dum_race3 collgrad
eststo grad: estpost summarize age wage dum_race1 dum_race2 dum_race3 if collgrad==1
eststo nograd: estpost summarize age wage dum_race1 dum_race2 dum_race3 if collgrad==0
esttab summstats grad nograd using table4.rtf, replace main(mean %6.2f) aux(sd) mtitle("Full sample" "College graduates" "Non-college graduates")

What if you want to display standard deviations in their own columns?

Image for Table 4 with std dev in own column

esttab summstats grad nograd using table4.rtf, replace cell("mean sd") mtitle("Full sample" "College graduates" "Non-college graduates")

Refer to this do-file for more detail.

You can even display differences in means from t-tests, plus associated statistics like standard errors:

Image for Table 5

eststo groupdiff: estpost ttest age wage dum_race1 dum_race2 dum_race3, by(collgrad)

esttab summstats grad nograd groupdiff using table5.rtf, replace cell("mean(pattern(1 1 1 0) fmt(2)) b(pattern(0 0 0 1) fmt(2)) se(pattern(0 0 0 1) fmt(2))") mtitle("Full sample" "College graduates" "Non-college graduates" "Difference (3)-(2)") nogaps compress
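
In the cell() specification above, each pattern() holds one 0 or 1 per column, so the means appear only in the first three columns and the difference and its standard error only in the fourth. To confirm which way the difference is signed, it may help to compare one row of the table with Stata's built-in ttest command. This is a sanity check, not part of the table-building code:

```stata
sysuse nlsw88, clear

* Built-in two-sample t test of wage by college graduation status;
* the output states the difference as mean(group 0) - mean(group 1)
ttest wage, by(collgrad)
```

The difference and standard error reported here should match the wage row in the last column of the table.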

Tabulating Regression Results

This tutorial shows you how to produce a publication-quality table of regression results in Stata. The goal is to produce Table 3 below. Follow the steps outlined below, or if you prefer, refer to this do-file.

Image for Table 3 regression results

Step 1

Load the following dataset into Stata using the sysuse command.

sysuse nlsw88, clear

The sysuse command loads into memory one of the example datasets that ship with your Stata installation. It is very useful for experimenting with commands. To see a list of all available example datasets, type sysuse dir.

Step 2

It is always a good idea to browse and describe the dataset before you delve into it.

To obtain summary statistics of all the variables in the dataset, simply type summarize.

Note that in the table above, there are three race variables, whereas the dataset as provided only contains a single, categorical race variable called race.

We need to generate dummy or indicator variables for each value of race, and this is easily done using an extended version of the tabulate command:

tabulate race, gen(dum_race)

The above command creates three dummy variables, dum_race1, dum_race2, and dum_race3, one for each race category.

It is useful to cross-tabulate the original race variable and the three derivative race dummies, using the tabulate command, to see for yourself that the variables were generated correctly:

tabulate race dum_race1
tabulate race dum_race2
tabulate race dum_race3

Step 3

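Regress wage on age, an indicator for whether a person graduated from college, and dummy variables for race. One race dummy (here dum_race1) is left out to serve as the reference category, because including all three alongside the constant would make them perfectly collinear: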

regress wage age collgrad dum_race2 dum_race3
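
An alternative to hand-made dummies, shown here only as a sketch, is Stata's factor-variable notation. The i. prefix asks regress to create the race indicators on the fly and to omit the lowest category as the base:

```stata
sysuse nlsw88, clear

* i.race expands to indicators for each race category, omitting the base level
regress wage age collgrad i.race
```

The coefficients reported for 2.race and 3.race should equal those on dum_race2 and dum_race3 in the regression above.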

Step 4

The naive way to insert these results into a table would be to copy the output displayed in the Stata Results window and paste it into a word processor or spreadsheet. The estout package offers a way to bypass this tedium.

First, install an add-on package called estout from the SSC (Statistical Software Components) archive. (Skip this step if estout is already installed on your computer.) Simply type

ssc install estout

Step 5

Run the regression specified in Step 3.

Using the eststo command, store the regression results under a name; call it example:

eststo example

IMPORTANT: eststo stores the most recent estimation results, so it must come immediately after regress, before any other estimation command is run.
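
One way to avoid the ordering problem altogether is to use eststo as a prefix, the form used in the summary statistics tutorial above. The sketch below runs the regression and stores it under the name example in a single line:

```stata
sysuse nlsw88, clear
tabulate race, gen(dum_race)

* Run the regression and store its results as "example" in one step
eststo example: regress wage age collgrad dum_race2 dum_race3
```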

Finally, using the esttab command, print the regression results to a table:

esttab example

The table can be saved in an external file for use by a word processor:

esttab example using table3.rtf

This produces a nice enough table. For a few more bells and whistles, refer to this do-file.
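
A few esttab options go a long way. The sketch below is one possible combination and not necessarily what the linked do-file uses. It overwrites an existing file (replace), reports standard errors instead of t statistics (se), adds the R-squared (r2), and uses variable labels in place of variable names (label). The title text is a hypothetical choice:

```stata
sysuse nlsw88, clear
tabulate race, gen(dum_race)
eststo example: regress wage age collgrad dum_race2 dum_race3

* Without replace, esttab refuses to overwrite an existing table3.rtf
esttab example using table3.rtf, replace se r2 label title("Wage regression")
```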

More...


The estout package makes it easy to display results from more than one regression in a single table, with each regression occupying one column. For example,

regress wage age
eststo model1
regress wage age collgrad
eststo model2
esttab model1 model2

produces the following table:

--------------------------------------------
                      (1)             (2)   
                     wage            wage   
--------------------------------------------
age               -0.0680         -0.0643   
                  (-1.71)         (-1.68)   
collgrad                            3.612***
                                  (13.12)   
_cons               10.43***        9.430***
                   (6.69)          (6.27)   
--------------------------------------------
N                    2246            2246   
--------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
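
When a session involves several tables, it can help to clear previously stored estimates first and to give each column a descriptive title. The following sketch does both; the column titles are hypothetical:

```stata
sysuse nlsw88, clear

* Drop any estimates stored earlier in the session
eststo clear

eststo model1: regress wage age
eststo model2: regress wage age collgrad

* se shows standard errors in parentheses; mtitles replaces the default column titles
esttab model1 model2, se mtitles("Age only" "Age and college")
```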