Thomas Mahaffey, Jr. Business Library
L001D Mendoza College of Business
University of Notre Dame
Notre Dame, IN 46556
Stata has two system variables that always exist as long as data is loaded, _n and _N. _n basically indexes observations (rows): _n = 1 is the first row, _n = 2 is the second, and so on. _N denotes the total number of rows.
To illustrate, let’s use stocks.dta. This portfolio contains 32 observations.
use "https://www3.nd.edu/~jng2/libguide/stocks.dta", clear
browse
The data aren't sorted in any particular order. Before we do anything else, it makes sense to first sort the data:
sort stockid year
This sorts observations first by stockid then by year, in ascending order.
To get a sense of what _n and _N can be used for, enter these two commands:
gen obsnum = _n
gen totnum = _N
_n and _N can also be used within subgroups. To generate counters like the above for each stockid, do this:
use "https://www3.nd.edu/~jng2/libguide/stocks.dta", clear
sort stockid year
by stockid: gen obsnum = _n
by stockid: gen totnum = _N
Equivalently, instead of sorting unsorted data prior to by, use bysort:
bysort stockid (year): gen obsnum = _n
bysort stockid (year): gen totnum = _N
Now that you have a sense of what _n and _N do, let's use _n in combination with by to perform a concrete task. We will fill in the blanks in the ticker variable (this assumes that the ticker symbols for these stocks did not change over time).
First, make sure that the data are sorted by stockid and year.
use "https://www3.nd.edu/~jng2/libguide/stocks.dta", clear
sort stockid year
To fill in row 6 of ticker with the appropriate value ("AMZN"), you could do this:
replace ticker = "AMZN" in 6
To fill in rows 6 through 10 with "AMZN", you could do this:
replace ticker = "AMZN" in 6/10
Clearly, with multiple observations to replace, replacing values line-by-line this way becomes cumbersome and prone to human error. We can automate the process using by and _n.
bysort stockid (year): replace ticker=ticker[1] if _n>1
The above command states that within each stockid group (that has been sorted by ascending order of year), all ticker observations except the first are to be replaced with the value from the first year.
Notice that we messed up since the tickers for stockid==5 and ==6 are now all blanks. We need to modify our algorithm to avoid overwriting ticker values with blanks. Reload the original data and start from scratch.
use "https://www3.nd.edu/~jng2/libguide/stocks.dta", clear
The years don't really matter for this task, so instead of sorting by stockid and year, let's sort in ascending order of stockid and then descending order of ticker:
gsort + stockid - ticker
As you can see, the first observation within each stockid now contains the correct ticker symbol, so filling in the blanks is now a simple matter of
by stockid: replace ticker=ticker[1] if _n>1