
Data Analysis with Stata

About This Guide

Learn how to start using Stata on this page. Other pages, accessible from the menu bar at left, are set up as stand-alone tutorials on how to accomplish specific tasks.

About Stata

Stata is a commercial statistical software package widely used by quantitative social scientists (e.g., economists, sociologists, political scientists). Its extensive collection of commands covers practically any data manipulation or analysis task you are likely to need, and its programming features are relatively easy to access. You do not need to know a programming language to start using Stata, although an understanding of basic programming concepts is helpful.

The Stata Environment

[Figure: the Stata environment, with its regions labeled Box A through Box D]

Box A: This is the command line interface. This is where you type in your commands, similar to what you would do on any Unix-type terminal.

Box B: This is where all the variables in the dataset that is currently in memory are displayed.

Box C: This is where a history of all commands you have entered is displayed. This history vanishes when you terminate Stata.

Box D: This is where the result of each command entered is displayed.

Reading Data

You can think of data loaded into Stata as a spreadsheet, with each row containing an observation, and each column containing a variable.

Here are the most common commands for loading data into Stata. Which command you use depends on the type of file being read.

| Command | File Type | File Extension | Tip |
|---|---|---|---|
| use | Stata format | *.dta | When using the use command to read a dataset that is in Stata format (*.dta), the file extension can be omitted. |
| infix | Fixed format ASCII | *.dat, *.raw, *.fix, or none | At the very least, the data provider must supply the information that lets you decipher a fixed format file. In practice, data providers frequently complement fixed format data with the appropriate do-file (sometimes called a setup file) to read the data into Stata. Here is an example of a do-file from a data provider that reads in fixed format data. |
| infile (version 1) | Free format ASCII | *.dat, *.raw, *.fix, or none | |
| infile (version 2) | Fixed format ASCII, with a dictionary | *.dat, *.raw, *.fix, or none | |
| import delimited | Text-delimited (e.g. comma, tab, space) ASCII | *.dat, *.raw, *.fix, *.csv, or none | Use the drop-down menu: File > Import > Text data (delimited, *.csv, ...) |
| import excel | Excel | *.xls, *.xlsx | Use the drop-down menu: File > Import > Excel spreadsheet |
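
As a minimal sketch of two of the commands above, the following lines load a dataset, write it out in Stata and comma-delimited formats, and read each copy back in. The auto dataset ships with Stata; the file names auto_copy.dta and auto_copy.csv are hypothetical and are written to the current working directory.

```stata
* Load a dataset that ships with Stata, so no external file is needed
sysuse auto, clear

* Write copies in Stata format and as comma-delimited text
* (auto_copy is a hypothetical file name)
save "auto_copy.dta", replace
export delimited using "auto_copy.csv", replace

* Stata format: the .dta extension can be omitted
use "auto_copy", clear

* Comma-delimited text: take variable names from the first row
import delimited using "auto_copy.csv", varnames(1) clear

* List the variables that are now in memory
describe
```

The clear option discards whatever dataset is currently in memory, so save your work before using it.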

Sending Commands

You instruct Stata to accomplish specific tasks by entering commands. There are three ways to enter commands:

  • Point and click.
  • Enter commands in the command line interface.
  • Write commands in a "do-file" and execute them from the do-file.

Do-Files

Anytime you expect to work on a project in more than one sitting, you should use a do-file.

Why should you work with do-files? A do-file should contain every command you use for your project, from the very first step (loading data) to the very last (exporting your results). It documents every step you took in manipulating and analyzing the data. If you need to modify or repeat certain steps, you simply edit the do-file and run it again instead of redoing everything by hand.
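
A short do-file might look like the sketch below. It is only an illustration: the log file name analysis.log and the variable price_k are hypothetical, and the auto dataset that ships with Stata stands in for your own data.

```stata
* Start from a clean state and keep a log of the results
clear all
set more off
capture log close
log using "analysis.log", replace text

* Step 1: load the data
sysuse auto, clear

* Step 2: manipulate the data
generate price_k = price / 1000
label variable price_k "Price in thousands of dollars"

* Step 3: analyze the data
regress price_k mpg weight

* Close the log so the results are saved to disk
log close
```

If these lines are saved as analysis.do (again a hypothetical name), typing do analysis in the command line runs every step in order.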

To create a do-file, click the "New Do-file Editor" icon on the Stata toolbar. (If you hover over an icon in Stata, its label appears.)

Alternatively, on the top menu bar, choose Window > Do-file Editor > New Do-file Editor.



Basic Rules

Variable Names

Variable names are case sensitive. For example, a variable named AGE is distinct from a variable named age.

Variable names can contain only letters, digits, and underscores, and cannot start with a digit. The underscore is allowed anywhere in a variable name, including at the start.
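
The naming rules can be checked with a small experiment. This is a sketch using made-up variables (age, AGE, income_2020) in an empty three-observation dataset.

```stata
* Create an empty dataset with three observations
clear
set obs 3

* Names that differ only in case refer to different variables
generate age = 20 + _n
generate AGE = 60 + _n

* Digits and underscores are allowed after the first character
generate income_2020 = 1000 * _n

* A name that starts with a digit is rejected; capture traps the error
capture generate 2020income = 1000 * _n
display "Return code for the invalid name: " _rc

* age, AGE, and income_2020 exist; 2020income does not
describe
```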

Operators

Assignment operator

  • Assignment: =

Relational operators

  • Equals: ==
  • Not equals: != (alternatively, ~=)
  • Less than: <
  • Less than or equal to: <=
  • Greater than: >
  • Greater than or equal to: >=

Logical operators

  • And: &
  • Or: |
  • Not: ! (alternatively, ~)
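
The sketch below shows the operators in use. It relies on the auto dataset that ships with Stata; the variable heavy and the cutoff values are arbitrary choices made for illustration.

```stata
sysuse auto, clear

* Assignment uses a single equals sign; the relational test yields 1 or 0
generate heavy = weight > 3000

* Testing for equality uses a double equals sign
count if foreign == 1

* Two equivalent ways of writing "not equal"
count if foreign != 1
count if foreign ~= 1

* Combine conditions with & (and) and | (or)
count if foreign == 0 & mpg >= 25
count if price < 4000 | price > 12000

* Negation; missing values of rep78 also satisfy this condition
count if !(rep78 == 3)
```

A common mistake is to write = where == is intended, for example in an if qualifier.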

Conditional Statements

The if conditional statement can be used in two ways:

  • As a qualifier at the end of a command. An if qualifier restricts the command to the observations that satisfy the condition. The if qualifier is allowed with most Stata commands. For example, the following command regresses gdp on happy using only observations from the years 1975 through 1997, inclusive:
    regress gdp happy if year>=1975 & year<=1997

     

  • As a programming command, like you would in any other language. The syntax for if is strict; for illustrative examples, see Stata's help file for if (enter help ifcmd in the command line).
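
The two uses of if can be compared side by side. This is a minimal sketch that uses the auto dataset shipped with Stata rather than the gdp data mentioned above; the threshold of 50 observations is arbitrary.

```stata
sysuse auto, clear

* if as a qualifier: the command uses only the observations
* for which the condition is true
summarize price if foreign == 1

* if as a programming command: the condition is evaluated once,
* and the braces enclose the commands to run
if _N > 50 {
    display "More than 50 observations in memory"
}
else {
    display "50 or fewer observations in memory"
}
```

Note that the opening brace sits on the same line as if, and that else starts on a new line after the closing brace.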

Other Resources

Other resources to help you learn Stata: