---
layout: post
title: Stata Tutorial
---
Stata is a commonly used tool for empirical research. Stata comes
with an extensive library of statistical methods, and there are
additional user-written methods that extend the functionality of Stata
even further.

Stata stores data in memory as a single matrix. If you are familiar
with Microsoft Excel Workbooks, Stata stores a single Worksheet in
memory where each column has a name and each row is numbered from 1 to
the total number of rows in the dataset.

This tutorial aims to introduce you to the key features of Stata and
its documentation so you can start your own empirical work.

Typing Commands
---------------
The `display` command is useful for showing values at the command
line.

    display 1 + 2

Use the `Page Up` key to recall the previously evaluated command. This
is particularly useful if you need to fix a typo.

Commands can be abbreviated; for example, `di` is equivalent to
`display`. I prefer to use the whole command name because it makes
code explicit.

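As a small illustration of `display` beyond simple addition, the sketch below uses it as a calculator and mixes text with a computed value. The particular numbers are arbitrary examples, not part of the original tutorial.

```stata
* display evaluates expressions, so it works as a calculator
display 2^10
display sqrt(16)

* text in double quotes can be combined with a computed value
display "One plus two is " 1 + 2

* a format such as %9.2f controls how the number is printed
display %9.2f 10/3
```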
Getting Help
------------
Use the `help` command if you know the name of a command and want
more details. Use the `findit` command if you want to find a
command. I end up using Google more than `findit`, but this may be a
mistake.

Unfortunately, the `help` command opens a new window each time you use
it. Use the `nonew` option to prevent this behavior, for example
`help help, nonew`.

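A short sketch of the two lookup styles described above. The command name and the search keywords are arbitrary choices for illustration.

```stata
* open the help file for a command whose name you already know
help summarize, nonew

* search by keyword when you do not know the command name
findit instrumental variables
```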
Reading Data Into Stata
-----------------------
There are many different ways to read data into Stata. For a good
overview of how to import data, type `help import` in Stata's Command
window. The commands I use most are `import excel` and `insheet`.
`import excel` is great if you are working with an Excel workbook,
while `insheet` is great if you have a comma-separated values (csv)
file. In Stata 13 and later, `import delimited` supersedes `insheet`.

Stata datasets are generally stored in files with a `.dta` extension.
To read a Stata dataset, use the `use` command. For the purpose of
this tutorial we will use a dataset about automobiles that ships with
Stata. Type `sysuse auto, clear` to load the dataset into memory; the
`clear` option discards any data already in memory.

    sysuse auto, clear

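For your own files, the import commands mentioned above might be used roughly as follows. The file names `mydata.xlsx`, `mydata.csv`, and `mydata.dta` and the sheet name are hypothetical placeholders, so these lines only run once you point them at files that exist.

```stata
* hypothetical file names; replace them with paths to your own files

* Excel workbook: read one sheet and treat the first row as variable names
import excel using "mydata.xlsx", sheet("Sheet1") firstrow clear

* comma-separated values file
insheet using "mydata.csv", comma clear

* Stata dataset
use "mydata.dta", clear
```

Each command includes `clear` because Stata holds only one dataset in memory at a time.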
Descriptive Statistics
----------------------
The `describe` command gives useful information about the variables in
the dataset and the number of rows in the dataset.

    describe

The `summarize` command gives some useful summary statistics for each
variable.

    summarize

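Both commands also accept a list of variables, and `summarize` has a `detail` option. A minimal sketch, with the choice of variables being arbitrary:

```stata
sysuse auto, clear

* restrict the output to a few variables
describe make price mpg

* detail adds percentiles, skewness, and kurtosis to the summary
summarize price mpg, detail
```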
You'll notice that 11 of the 12 variables in the auto dataset are
numeric and that the `make` variable is a string. To see what the
`make` variable looks like, we can list the first few observations.

    list make if _n <= 5

To see if `make` uniquely identifies each row in the dataset we can
use the `isid` command.

    isid make

When `isid` says nothing, the variable list uniquely identifies each
row. Are cars uniquely identified by their weight and length? The
`duplicates report` command counts how many rows share the same
values.

    duplicates report make
    duplicates report weight length

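If the report shows repeated rows, one way to see which cars are involved is to tag them. This is a sketch rather than the only approach, and the variable name `n_dup` is my own choice.

```stata
sysuse auto, clear

* isid stops with an error when the variables do not identify the rows;
* capture suppresses the error and stores the return code in _rc
capture isid weight length
display _rc

* n_dup holds, for each row, the number of other rows with the same
* weight and length (0 when the combination is unique)
duplicates tag weight length, generate(n_dup)
list make weight length if n_dup > 0
```

A return code of 0 means the variables uniquely identify the rows.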
Imagine we are interested in looking at how foreign and domestic cars
differ. As a first step, it would be good to examine some summary
statistics for foreign and domestic cars. The `tabstat` command makes
this fairly easy.

    tabstat price mpg weight length, by(foreign) stat(mean sd)

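Two variations on the same idea, offered as a sketch. The extra statistics and the layout option are my own choices for illustration.

```stata
sysuse auto, clear

* add the median and the group size, with one column per statistic
tabstat price mpg, by(foreign) statistics(mean sd median n) columns(statistics)

* bysort runs summarize separately for each value of foreign
bysort foreign: summarize price mpg
```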
You may have noticed from the output of the `summarize` command that
`rep78` has 5 missing values. We can look at those observations using
the list command:

    list if missing(rep78)

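Missing values deserve some care in `if` conditions, because Stata treats a missing numeric value as larger than any number. The sketch below counts the missing values and shows the difference this makes; the cutoff of 3 is an arbitrary example.

```stata
sysuse auto, clear

* how many observations are missing rep78?
count if missing(rep78)

* list every variable that has missing values
misstable summarize

* this count includes the cars with missing rep78 ...
count if rep78 > 3

* ... while this one excludes them
count if rep78 > 3 & !missing(rep78)
```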
Graphs
------
There are good graph galleries provided by [StataCorp][graphs1],
[UCLA][graphs2], and [Survey Design and Analysis Services][graphs3].
Below is a simple scatter plot of weight versus length:

    graph twoway scatter weight length
    graph export scatter.png, replace

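Graph commands take options after the comma. As a minimal sketch, the first command below adds a title and axis titles (the title text is my own wording), and the second draws separate panels for domestic and foreign cars.

```stata
sysuse auto, clear

* add a title and label the axes
graph twoway scatter weight length, title("Weight versus length") xtitle("Length (in.)") ytitle("Weight (lbs.)")

* one panel per value of foreign
graph twoway scatter weight length, by(foreign)
```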

Creating New Variables
----------------------
There are a number of ways to create new variables or modify
existing variables. The most important command in this section is the
`generate` command. Imagine we are curious about cars that are heavy
for their length. We could create a new variable:

    generate weight_per_length = weight / length

This creates a new column in the dataset: for each car we have
calculated the ratio of that car's weight to its length. Let's take a
look at the five cars with the highest weight per length.

    gsort -weight_per_length
    list make weight_per_length if _n <= 5

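To modify an existing variable rather than create a new one, use `replace`. The sketch below starts from a fresh copy of the data; the cutoff of 18 pounds per inch and the variable name `heavy` are hypothetical choices for illustration.

```stata
sysuse auto, clear
generate weight_per_length = weight / length
label variable weight_per_length "Weight per inch of length (lbs./in.)"

* indicator equal to 1 for cars above a hypothetical cutoff of 18
generate heavy = weight_per_length > 18 if !missing(weight_per_length)

* replace changes the values of an existing variable
replace weight_per_length = round(weight_per_length, 0.1)

tabulate heavy foreign
```

The `if !missing()` condition keeps a missing ratio from being coded as heavy, since Stata treats missing values as larger than any number.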
Another very useful command for generating new variables is the `egen`
command. This is particularly useful if you want to merge summary
statistics for groups of cars back into the larger dataset. For
instance, we might be curious to see how a car's price compares to the
average price among foreign or domestic cars. We can find the average
price for foreign and domestic cars using `tabstat`, but how do we make
a column in the dataset with these values?

    tabstat price, by(foreign)
    egen ave_price = mean(price), by(foreign)
    list foreign ave_price

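With the group average stored as a column, comparing each car's price to it takes one more `generate`. A minimal sketch, where the variable name `price_diff` is my own:

```stata
sysuse auto, clear
egen ave_price = mean(price), by(foreign)

* how far is each car's price from the average of its group?
generate price_diff = price - ave_price

* the five cars priced furthest above their group average
gsort -price_diff
list make foreign price ave_price price_diff if _n <= 5
```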
Regressions
-----------
To further explore the relationship between weight and length we can
run a regression.

    regress weight length

We see that, on average, each additional inch of length is associated
with about 33 additional pounds of weight. We can plot the predicted
values from the regression on the scatter plot from above.

    graph twoway (scatter weight length) (lfit weight length)
    graph export scatter_lfit.png, replace

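After a regression, the estimates stay in memory and `predict` can store fitted values and residuals as new variables. The sketch below assumes the variable names `weight_hat` and `weight_resid`, which are my own.

```stata
sysuse auto, clear
regress weight length

* the estimated slope on length
display _b[length]

* fitted values and residuals from the regression above
predict weight_hat
predict weight_resid, residuals

* the five cars that are heaviest relative to the fitted line
gsort -weight_resid
list make weight weight_hat weight_resid if _n <= 5
```

This is another way to find cars that are heavy for their length, using the residual instead of the weight-to-length ratio.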

Further Reading
---------------
[Germán Rodríguez's Stata Tutorial][1] is an excellent introduction to
Stata.

These [notes on writing code][booth] by Matthew Gentzkow and Jesse
Shapiro have excellent suggestions on how to program with Stata.

[1]: http://data.princeton.edu/stata/
[graphs1]: http://www.stata.com/support/faqs/graphics/gph/statagraphs.html
[graphs2]: http://www.ats.ucla.edu/stat/Stata/library/GraphExamples/default.htm
[graphs3]: http://www.survey-design.com.au/Usergraphs.html
[booth]: http://faculty.chicagobooth.edu/matthew.gentzkow/research/ra_manual_coding.pdf