Navari Family Center for Digital Scholarship
250 Hesburgh Library
University of Notre Dame
Notre Dame, IN 46556
Learn how to start using Stata on this page. Other pages, accessible from the menu bar at left, are set up as stand-alone tutorials on how to accomplish specific tasks.
Stata is a commercial statistical software package widely used by quantitative social scientists (e.g., economists, sociologists, and political scientists). Its extensive collection of commands covers most of the data manipulation and analysis tasks you are likely to need, and it offers relatively easy access to programming features. You do not need to know a programming language to start using Stata, although an understanding of basic programming concepts is helpful.
Box A: This is the command line interface. This is where you type in your commands, similar to what you would do on any Unix-type terminal.
Box B: This is where all the variables in the dataset that is currently in memory are displayed.
Box C: This is where a history of all commands you have entered is displayed. This history vanishes when you terminate Stata.
Box D: This is where the result of each command entered is displayed.
You can think of data loaded into Stata as a spreadsheet, with each row containing an observation, and each column containing a variable.
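To see this row-and-column layout, the minimal sketch below loads `auto`, an example dataset that ships with Stata (an assumption made for illustration; it is not part of this tutorial's materials), and inspects it.

```stata
* Load the example dataset bundled with Stata (any dataset would do)
sysuse auto, clear

* One line per variable (column): name, storage type, and label
describe

* The first five observations (rows) for three of the variables
list make price mpg in 1/5
```

In the graphical interface, the `browse` command opens the same data in a spreadsheet-style window.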
Here are the most commonly used commands for loading data into Stata. Which command you use depends on the type of file being read.
Command | File Type | File Extension | Tip |
---|---|---|---|
use | Stata format | *.dta | When using the use command to read a dataset that is in Stata format (*.dta), the file extension can be omitted. |
infix | Fixed format ASCII | *.dat, *.raw, *.fix, or none | At the very least, the data provider must provide the necessary information that lets you decipher a fixed format file. In practice, data providers frequently complement fixed format data with the appropriate do-file (sometimes called a setup file) to read the data into Stata. Here is an example of a do-file from a data provider that reads in fixed format data. |
infile (version 1) | Free format ASCII | *.dat, *.raw, *.fix, or none | |
infile (version 2) | Fixed format ASCII, with a dictionary | *.dat, *.raw, *.fix, or none | |
import delimited | Text-delimited (e.g. comma, tab, space) ASCII | *.dat, *.raw, *.fix, *.csv, or none | Use the drop-down menu: File > Import > Text data (delimited, *.csv, ...) |
import excel | Excel | *.xls, *.xlsx | Use the drop-down menu: File > Import > Excel spreadsheet |
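The commands in the table can be tried without any external data. The sketch below is only an illustration: it first writes demo copies of the bundled `auto` dataset in several formats and then reads each one back. All file names (`auto_demo.*`, `fixed_demo.raw`) and the two-line fixed format file are hypothetical.

```stata
* Start from the auto dataset that ships with Stata
sysuse auto, clear

* Write demo copies in three formats (hypothetical file names)
save "auto_demo.dta", replace
export delimited using "auto_demo.csv", replace
export excel using "auto_demo.xlsx", firstrow(variables) replace

* Stata format: the .dta extension can be omitted
use "auto_demo", clear

* Comma-delimited text
import delimited using "auto_demo.csv", clear

* Excel: firstrow treats the first row as variable names
import excel using "auto_demo.xlsx", firstrow clear

* Fixed format: build a tiny two-line file, then read it by column position
tempname fh
file open `fh' using "fixed_demo.raw", write text replace
file write `fh' "Alice     34" _n
file write `fh' "Bob       27" _n
file close `fh'

* name occupies columns 1-10, age occupies columns 11-12
infix str name 1-10 age 11-12 using "fixed_demo.raw", clear
list
```

The `clear` option discards the dataset currently in memory before loading the new one. With real fixed format data, the column positions come from the data provider's documentation rather than from a file you build yourself.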
You instruct Stata to accomplish specific tasks by entering commands. There are three ways to enter commands:
Anytime you expect to work on a project in more than one sitting, you should use a do-file.
Why should you work with do-files? A do-file contains every command that you ever used for your project, from the very first step (loading data) to the very last (exporting your results). It documents every step you took in the process of manipulating and analyzing data. If you need to modify or repeat certain steps, you simply modify your do-file appropriately instead of redoing everything.
To create a do-file, click the New Do-file Editor icon on the Stata toolbar. (If you hover over the icon, a label saying "New Do-file Editor" appears.)
Alternatively, on the top menu bar, choose Window > Do-file Editor > New Do-file Editor.
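As an illustration of what a do-file might contain, here is a minimal sketch that runs from loading data to saving results. The file names (`example.log`, `auto_modified.dta`) and the new variable `price_thousands` are hypothetical, and the bundled `auto` dataset stands in for your own data.

```stata
* example.do: a hypothetical do-file covering a whole (tiny) project
clear all
capture log close
log using "example.log", replace text

* Step 1: load data (the auto dataset ships with Stata)
sysuse auto, clear

* Step 2: manipulate data
generate price_thousands = price / 1000
label variable price_thousands "Price (thousands of dollars)"

* Step 3: analyze data
summarize price_thousands mpg
regress price_thousands mpg weight

* Step 4: save the modified dataset and close the log
save "auto_modified.dta", replace
log close
```

If these lines are saved as `example.do` in the working directory, typing `do example` at the command line reruns every step in order.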
Variable names are case sensitive. For example, a variable named AGE is distinct from a variable named age.
Variable names can contain only letters, digits, and underscores, and cannot start with a digit. The underscore is allowed anywhere in a variable name.
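The naming rules can be checked with a short sketch. It assumes the bundled `auto` dataset; the new variable names (`Price`, `price_2x`, `2x_price`) are made up for illustration.

```stata
sysuse auto, clear

* price already exists; Price is a different, new variable
generate Price = price * 2
describe price Price

* Underscores are allowed inside a name
generate price_2x = price * 2

* A name starting with a digit is rejected; capture suppresses the
* error message and stores the return code in _rc
capture generate 2x_price = price * 2
display "Return code: " _rc
```

A nonzero return code means the command failed, so the last line confirms that `2x_price` was not created.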
Assignment operator | |
---|---|
Assignment | = |
Relational operators | |
---|---|
Equals | == |
Not equals | != (alternatively, ~=) |
Less than | < |
Less than or equal to | <= |
Greater than | > |
Greater than or equal to | >= |
Logical operators | |
---|---|
And | & |
Or | \| |
Not | ! (alternatively ~) |
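The operators above can be combined in expressions. The following sketch uses the bundled `auto` dataset as an assumed example; `weight_kg` is a hypothetical new variable.

```stata
sysuse auto, clear

* Assignment (=): create a new variable
generate weight_kg = weight * 0.4536

* Equals (==): count foreign cars
count if foreign == 1

* And (&): foreign cars priced above 6000
count if foreign == 1 & price > 6000

* Or (|): cars that are cheap or fuel efficient
list make price mpg if price < 4000 | mpg >= 30

* Not (!): cars that are not foreign
count if !(foreign == 1)

* Not equals (!=): repair record different from 3
count if rep78 != 3
```

Note the difference between `=` (assignment) and `==` (comparison). Also, a missing value is not equal to 3, so observations with missing `rep78` are included in the last count.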
The if conditional statement can be used two ways:
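A minimal sketch of both usages, assuming the two meant here are the `if` qualifier, which restricts a command to the observations that satisfy a condition, and the `if` programming command, which decides whether a block of commands runs at all. The bundled `auto` dataset is used for illustration.

```stata
sysuse auto, clear

* 1. if qualifier: placed after a command, evaluated for every observation
summarize price if foreign == 1

* 2. if command: placed at the start of a line, evaluated once
if _N > 50 {
    display "The dataset has more than 50 observations"
}
else {
    display "The dataset has 50 observations or fewer"
}
```

The qualifier filters rows for a single command, while the command form controls program flow, which is why it is mostly used in do-files and programs.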
Other resources to help you learn Stata: