Michael Deike

Thomas Mahaffey, Jr. Business Library

L001D Mendoza College of Business

University of Notre Dame

Notre Dame, IN 46556

mdeike@nd.edu

This tutorial shows you how to produce a publication-quality table of summary statistics in Stata. The goal is to produce Table 2 below. Follow the steps outlined below, or if you prefer, refer to this do-file.

Load the following dataset into Stata using the sysuse command.

sysuse nlsw88, clear

The sysuse command loads into memory one of the example datasets that are installed with Stata on your computer. These datasets are very useful for experimenting with commands. To see a list of all available example datasets, type sysuse dir.

It is always a good idea to browse and describe the dataset before you delve into it.

To obtain summary statistics of all the variables in the dataset, simply type summarize.
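The inspection steps above can be run together as a short do-file fragment. This is a sketch: browse is left out because it opens the interactive Data Browser, while describe and summarize print to the Results window.

```stata
* Load the example dataset and take a first look at it
sysuse nlsw88, clear
describe        // variable names, storage types, and labels
summarize       // N, mean, sd, min, and max of every numeric variable
sysuse dir      // list the example datasets installed with Stata
```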

Note that in the table above, there are three race variables, whereas the dataset as provided only contains a single, categorical race variable called race.

We need to generate a dummy (indicator) variable for each value of race. This is easily done with the generate() option of the tabulate command (abbreviated gen() below):

tabulate race, gen(dum_race)

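The above command creates three dummy variables, dum_race1, dum_race2, and dum_race3, one for each race category (white, black, and other, in that order). Each equals 1 if the observation belongs to that category and 0 otherwise.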

It is useful to cross-tabulate the original race variable against each of the three derived race dummies to confirm that the dummies were generated correctly:

tabulate race dum_race1

tabulate race dum_race2

tabulate race dum_race3
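As a compact alternative to eyeballing the cross-tabulations, here is a sketch that uses Stata's assert command, which stops with an error if any observation violates the condition. The race codes 1, 2, 3 for white, black, and other are assumed from nlsw88's value label; you can check them with tabulate race, nolabel.

```stata
sysuse nlsw88, clear
tabulate race, gen(dum_race)
* Exactly one dummy is switched on for each observation with a known race
assert dum_race1 + dum_race2 + dum_race3 == 1 if !missing(race)
* Each dummy matches its category (codes 1/2/3 assumed; verify with tabulate race, nolabel)
assert dum_race1 == (race == 1) if !missing(race)
assert dum_race2 == (race == 2) if !missing(race)
assert dum_race3 == (race == 3) if !missing(race)
```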

Table 2 above shows summary statistics for only a subset of variables. To obtain these statistics, simply type summarize followed by the names of the variables for which you want summary statistics.

summarize age wage dum_race1 dum_race2 dum_race3 collgrad

The naive way to put these results into a table would be to copy the output from the Stata Results window and paste it into a word processor or spreadsheet. Stata offers a way to bypass this tedium.

First, install the user-written package estout from the SSC (Statistical Software Components) archive. (Skip this step if estout is already installed on your computer.) Simply type

ssc install estout
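If you share do-files with colleagues, a common idiom is to install estout only when it is missing. In this sketch, which returns a nonzero error code when esttab cannot be found, and capture stores that code in _rc instead of halting the do-file.

```stata
* Install estout from SSC only if esttab cannot be found
capture which esttab
if _rc {
    ssc install estout
}
```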

Modify the summarize command above by preceding it with the estpost command, which stores the results of summarize in a form that esttab can use:

estpost summarize age wage dum_race1 dum_race2 dum_race3 collgrad
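It can help to see what estpost actually does. In this sketch, estpost summarize runs summarize and then posts the statistics in e() as row vectors such as e(mean) and e(sd), one column per variable. That is the form eststo and esttab expect.

```stata
sysuse nlsw88, clear
quietly tabulate race, gen(dum_race)
estpost summarize age wage dum_race1 dum_race2 dum_race3 collgrad
* Inspect everything estpost posted, then two of the result matrices
ereturn list
matrix list e(mean)
matrix list e(sd)
```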

Store the posted summary statistics under a name using the eststo command. The name can be anything; let's call it summstats.

eststo summstats
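eststo also has a prefix form, which the subsample example later in this guide uses. It runs the command and stores its results in one step. The sketch below starts from a cleared store so that stale sets from earlier experiments do not end up in your table:

```stata
sysuse nlsw88, clear
quietly tabulate race, gen(dum_race)
eststo clear                      // drop any previously stored sets
* Prefix form: run estpost and store the result under one name in a single line
eststo summstats: estpost summarize age wage dum_race1 dum_race2 dum_race3 collgrad
eststo dir                        // list the estimation sets stored so far
```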

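Finally, write the statistics stored in summstats to a Rich Text Format (RTF) file with the esttab command. The main(mean %6.2f) option displays means with two decimal places, and aux(sd) places the standard deviations in parentheses below them: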

esttab summstats using table2.rtf, replace main(mean %6.2f) aux(sd)
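Here are a few variations on the command above, as a sketch. Variable labels make the rows readable; the label text is an illustrative choice. A command without using previews the table in the Results window. Changing the file suffix changes the output format (.rtf and .csv are two of the formats esttab recognizes).

```stata
sysuse nlsw88, clear
quietly tabulate race, gen(dum_race)
* Readable row labels (race codes 1/2/3 = white/black/other in nlsw88)
label variable dum_race1 "White"
label variable dum_race2 "Black"
label variable dum_race3 "Other race"
eststo clear
eststo summstats: estpost summarize age wage dum_race1 dum_race2 dum_race3 collgrad
* Preview in the Results window first (no using), then write files
esttab summstats, main(mean %6.2f) aux(sd %6.2f) label nonumbers nomtitles
esttab summstats using table2.rtf, replace main(mean %6.2f) aux(sd %6.2f) label nonumbers nomtitles
esttab summstats using table2.csv, replace main(mean %6.2f) aux(sd %6.2f) label nonumbers nomtitles
```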

This produces a nice enough table. For more bells and whistles, refer to this do-file.

What if you want the table of summary statistics to look like this?

That is, you need separate columns for various subsamples. The basic code is:

eststo summstats: estpost summarize age wage dum_race1 dum_race2 dum_race3 collgrad

eststo grad: estpost summarize age wage dum_race1 dum_race2 dum_race3 if collgrad==1

eststo nograd: estpost summarize age wage dum_race1 dum_race2 dum_race3 if collgrad==0

esttab summstats grad nograd using table4.rtf, replace main(mean %6.2f) aux(sd) mtitle("Full sample" "College graduates" "Non-college graduates")
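When the subsamples are defined by the values of a single variable, a loop avoids repeating the variable list. The sketch below must run from a do-file because of the /// line continuation. The local macro vars and the set names grad1 and grad0 are illustrative choices, not part of the original example.

```stata
sysuse nlsw88, clear
quietly tabulate race, gen(dum_race)
eststo clear
local vars age wage dum_race1 dum_race2 dum_race3
eststo summstats: estpost summarize `vars' collgrad
* One stored set per value of collgrad: grad1 = graduates, grad0 = non-graduates
foreach g in 1 0 {
    eststo grad`g': estpost summarize `vars' if collgrad == `g'
}
esttab summstats grad1 grad0, main(mean %6.2f) aux(sd) ///
    mtitles("Full sample" "College graduates" "Non-college graduates")
```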

What if you want to display standard deviations in their own columns?

esttab summstats grad nograd using table4.rtf, replace cell("mean sd") mtitle("Full sample" "College graduates" "Non-college graduates")
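The sub-columns are typically headed by the statistic names (mean, sd). The sketch below sets two decimal places for both statistics and relabels the sub-columns with esttab's collabels() option. The label text is an illustrative choice. Run it from a do-file because of the /// line continuations.

```stata
sysuse nlsw88, clear
quietly tabulate race, gen(dum_race)
eststo clear
eststo summstats: estpost summarize age wage dum_race1 dum_race2 dum_race3 collgrad
eststo grad: estpost summarize age wage dum_race1 dum_race2 dum_race3 if collgrad==1
eststo nograd: estpost summarize age wage dum_race1 dum_race2 dum_race3 if collgrad==0
* Two decimals for both statistics and explicit labels for the sub-columns
esttab summstats grad nograd using table4.rtf, replace ///
    cells("mean(fmt(2)) sd(fmt(2))") collabels("Mean" "SD") ///
    mtitles("Full sample" "College graduates" "Non-college graduates")
```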

Refer to this do-file for more detail.

You can even display differences in means from t-tests, plus associated statistics like standard errors:

eststo groupdiff: estpost ttest age wage dum_race1 dum_race2 dum_race3, by(collgrad)

esttab summstats grad nograd groupdiff using table5.rtf, replace cell("mean(pattern(1 1 1 0) fmt(2)) b(pattern(0 0 0 1) fmt(2)) se(pattern(0 0 0 1) fmt(2))") mtitle("Full sample" "College graduates" "Non-college graduates" "Difference (3)-(2)") nogaps compress
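It helps to know which way the difference runs. ttest with by() treats the lower value of collgrad (0, non-graduates) as group 1 and reports group 1 minus group 2. That is why the last column is labeled (3)-(2). The sketch below checks this for wage by comparing the plain ttest results with what estpost ttest posts in e(b).

```stata
sysuse nlsw88, clear
* ttest, by() treats the lower value of collgrad (0) as group 1,
* so mu_1 - mu_2 is mean(non-graduates) - mean(graduates)
quietly ttest wage, by(collgrad)
scalar diff_manual = r(mu_1) - r(mu_2)
display "non-graduates minus graduates: " scalar(diff_manual)
* estpost ttest posts the same difference in e(b)
quietly estpost ttest wage, by(collgrad)
matrix list e(b)
```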

This tutorial shows you how to produce a publication-quality table of regression results in Stata. The goal is to produce Table 3 below. Follow the steps outlined below, or if you prefer, refer to this do-file.

Load the nlsw88 dataset and create the race dummies exactly as in the summary-statistics tutorial above. (If you already ran these commands in the current session, skip this step.)

sysuse nlsw88, clear

tabulate race, gen(dum_race)

Regress wage on age, whether a person graduated college, and dummy variables for race:

regress wage age collgrad dum_race2 dum_race3
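dum_race1 (white) is left out of the regression so that it serves as the reference category. Including all three dummies alongside the constant would make them perfectly collinear. As an alternative built into Stata, factor-variable notation (i.race) creates the same indicators on the fly, with the lowest level as the base, so the coefficients should match. This is a sketch; if you tabulate such results with esttab, its nobaselevels option can hide the empty base-level row.

```stata
sysuse nlsw88, clear
quietly tabulate race, gen(dum_race)
* Hand-made dummies: dum_race1 (white) is omitted as the reference category
regress wage age collgrad dum_race2 dum_race3
scalar b_black = _b[dum_race2]
* Factor-variable notation builds the same dummies; level 1 (white) is the base
regress wage age collgrad i.race
display _b[2.race] - scalar(b_black)    // zero up to rounding
```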

As in the summary-statistics tutorial, this requires the estout package. If it is not yet installed, type

ssc install estout

If you have not already done so, run the regression above.

Using the eststo command, store the regression results under a name; call it example:

eststo example

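IMPORTANT: eststo stores the most recent estimation results, so run it immediately after regress, before any other estimation command replaces those results.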
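The prefix form of eststo sidesteps this ordering issue entirely, because the regression is run and stored in a single command. A sketch:

```stata
sysuse nlsw88, clear
quietly tabulate race, gen(dum_race)
eststo clear
* Prefix form: the regression is run and stored in one step,
* so no other command can slip in between regress and eststo
eststo example: regress wage age collgrad dum_race2 dum_race3
esttab example
```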

Finally, using the esttab command, print the regression results to a table:

esttab example

The table can also be saved to an external file for use in a word processor such as Microsoft Word:

esttab example using table3.rtf

This produces a nice enough table. For more bells and whistles, refer to this do-file.
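This sketch shows some commonly used refinements. The options are standard esttab options. The star thresholds, number formats, variable labels, and title text are illustrative choices, not necessarily those in the original do-file. Run it from a do-file because of the /// line continuations.

```stata
sysuse nlsw88, clear
quietly tabulate race, gen(dum_race)
label variable dum_race2 "Black"
label variable dum_race3 "Other race"
eststo clear
eststo example: regress wage age collgrad dum_race2 dum_race3
* se: standard errors instead of t statistics; r2: add R-squared; label: use variable labels
esttab example using table3.rtf, replace ///
    b(%9.3f) se(%9.3f) r2 label ///
    star(* 0.10 ** 0.05 *** 0.01) ///
    title("Wage regression")
```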

The estout package makes it easy to display results from more than one regression in a single table, with each regression occupying one column. For example,

regress wage age

eststo model1

regress wage age collgrad

eststo model2

esttab model1 model2

produces the following table:

--------------------------------------------
                      (1)             (2)
                     wage            wage
--------------------------------------------
age               -0.0680         -0.0643
                  (-1.71)         (-1.68)

collgrad                            3.612***
                                  (13.12)

_cons               10.43***        9.430***
                   (6.69)          (6.27)
--------------------------------------------
N                    2246            2246
--------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
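By default, esttab puts t statistics, not standard errors, in parentheses; add the se option to switch. Each t statistic is the coefficient divided by its standard error. The sketch below recomputes the collgrad entry in column (2) from the stored results.

```stata
sysuse nlsw88, clear
regress wage age collgrad
* The number in parentheses is the t statistic: coefficient / standard error
display "t for collgrad = " _b[collgrad] / _se[collgrad]
display "N = " e(N)
```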

- Last Updated: Feb 7, 2023 2:56 PM
- URL: https://libguides.library.nd.edu/data-analysis-stata