-
Notifications
You must be signed in to change notification settings - Fork 1
Expand file tree
/
Copy pathtopic.Rmd
More file actions
238 lines (185 loc) · 14.7 KB
/
Copy pathtopic.Rmd
File metadata and controls
238 lines (185 loc) · 14.7 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
# (PART) Initial Steps {-}
# WP-1 - Topic & Data Selection
```{block, type = "rmdgoal"}
**Goal:** Identify a data set and appropriate variables of interest.
```
```{block, type = "rmdpersonnel"}
**Personnel:** This waypoint should be completed by **all** students.
```
```{block, type = "rmdskills"}
**Skills:** There are no course-specific skills for this waypoint.
```
```{block, type = "rmddue"}
**Due Date:** Meeting 1-2 (January 31)
```
```{block, type = "rmddeliver"}
**Deliverables:** A [short memo](/the-memo.html) detailing your data and the specific topic you'll be exploring.
```
## Crafting a Research Question
The process of conducting a research project starts with defining a research question to explore. For many students, this can be daunting. Yale University Sociology's <a href="https://sociology.yale.edu/sites/default/files/files/Writing_Sociology_Senior_Thesis_Guide_Final_Latest_Update.pdf" target="_blank">Handbook for Undergraduate Thesis research</a> usefully breaks down the types of questions sociologists answer into three broad categories. I've provided these three categories below along with some sample research questions on undocumented immigration:
* "Questions about the meaning of certain activities, practices, or experiences for particular social groups."
- "How do classmates respond when a student discloses that they are an undocumented immigrant?"
- "How does participation in an immigrant advocacy group effect undocumented immigrants?"
- "How does disclosing undocumented status change the treatment of immigrants in social services settings?"
* "Questions about the ways that identification with larger social categories – race, ethnicity, religion, political identification, gender – affect aspects of social life."
- "What is the relationship between political conservatism and beliefs about immigration?"
- "How do views of immigration vary based on gender identity?"
- "What is the relationship between religiosity and support for undocumented immigrants?"
* "Questions about the influence of particular variables on other variables or outcomes, including questions that compare groups and track trends across a broader scale."
- "What is the relationship between levels of education and employment for undocumented immigrants?"
- "How does racial segregation impact the outcomes of undocumented immigrants in cities?"
- "What is the effect of DREAMer status on educational attainment?"
Each student should think about the overarching category they want to conduct research in. Like the research process as a whole (see "[Data Analysis is not Linear]"), you may find that crafting a research question is an iterative process. You'll start with a broad topic, make a first attempt at narrowing it down, and then subsequently update the question as you find some related sources (and, if you are in SOC 5650, conduct a full-fledged literature review). Research questions for this course will be necessarily *descriptive* - "what is the spatial distribution of *x*?"
## The Crime Data
```{r load-crime, include=FALSE}
library(dplyr)
library(knitr)
```
The St. Louis Metropolitan Police Department (SLMPD) is the primary law enforcement agency for the City of St. Louis. These data comprise all crimes reported by SLMPD for a five year period between 2015 and 2019.
### License
These data have been published without an explicit license. However, since they are the work of a government agency, they are considered in the public domain and are therefore available for reuse.
### Download
You can download these data from [GitHub](https://github.com/slu-soc5650/crime-data-sp22). Please read the `README.md` file carefully to learn which file you should use.
### Topic Categories
If you want to select a topic from these data, you should pick **one** crime category below. For example, you might pick `Arson` or `Vandalism`.
#### Part 1 Crimes
According to the FBI, Part 1 crimes are serious crimes:
> These offenses were chosen because they are serious crimes, they occur with regularity in all areas of the country, and they are likely to be reported to police.
With the exception of arson, data on these crimes has regularly been collected and reported by the FBI on a national basis since 1930. Arson was added to the list of Part 1 crimes (also known as "Index" crimes) in 1979.
```{r crimes, echo=FALSE}
crimes <- data.frame(
ucr = c(1:8),
category = c("Homicide", "Rape", "Robbery", "Aggravated Assault",
"Burgalry", "Larceny", "Motor Vehicle Theft", "Arson")
)
as_tibble(crimes) %>%
kable()
```
#### Part 2 Offenses
Crimes that are so-called "Part 2" offenses are typically less serious and may not be felonies.
```{r offenses, echo=FALSE}
offenses <- data.frame(
ucr = c(9:28),
category = c("Other Assaults", "Forgery and Counterfeiting", "Fraud",
"Embezzlement", "Stolen Property", "Vandalism", "Weapons",
"Prostitution and Commercialized Vice", "Sex Offenses",
"Drug Abuse Violations", "Gambling",
"Offenses Against the Family and Children", "Liquor Laws",
"Drunkeness", "Disorderly Conduct", "Vagrancy",
"All Other Offenses", "Suspicion",
"Curfew and Loitering Laws-Persons under 18",
"Runaways-Persons under 18")
)
as_tibble(offenses) %>%
kable()
```
## The Citizens' Service Bureau Data
```{r load-csb, include=FALSE}
csb <- readr::read_csv("csb-codes-sp22.csv")
```
The Citizens' Service Bureau (CSB) is a clearing house for the City of St. Louis. Requests for service come primarily from residents or business owners in the City through the CSB's website or by phone. Other City agencies also put requests for service into the CSB, and they take requests via fax, email, and Twitter. These data comprise all crimes reported by SLMPD for a five year period between 2015 and 2019.
### License
All of the data that the City of St. Louis makes public, including the CSB data, are available under the following [license](http://data.stlouis-mo.gov/terms.cfm):
***Description** *The City of St. Louis strives to enhance public access
to and use of data that it collects and publishes. As part of an
initiative to improve the accessibility, transparency, and
accountability of City government, this catalog offers access to a
repository of government-produced, machine-readable data sets. The
datasets are organized by originating department. The City provides
access to the information free of charge subject to the terms of this
agreement. Use of data derived from the datasets, which may appear in
formats such as tables and charts, is also subject to these terms.*
***Terms** *The City of St. Louis reserves the right at any time to
update, modify, or discontinue the release of the datasets. The City
does not warranty the completeness, accuracy, content, or fitness for
any particular purpose or use of any public dataset made available, nor
are any such warranties to be implied or inferred with respect to the
public datasets furnished therein. The City shall not be responsible or
liable for the accuracy, usefulness or availability of any data in the
datasets.*
***Accountability** *When working with the datasets, be aware that these
files are raw extracts derived from various data sources. The City of
St. Louis is aware that errors exist in these datasets. Contact the
originating department if questions/issues arise.*
### Download
You can download these data from [GitHub](https://github.com/slu-soc5650/csb-data-sp22). Please read the `README.md` file carefully to learn which file you should use.
### Topic Categories
If you want to select a topic from these data, you should pick several call types below. For example, I am interested in vacancy so I might select `Debris-Vacant Bldg` and `Debris-Vacant Lot` for my call types. My mapping would therefore focus on these two types of calls to the Citizens Service Bureau.
```{r csb, echo=FALSE}
kable(csb)
```
### Variables
The following are some basic descriptions of what each variable in the Citizens Service Bureau data are supposed to measure:
- `requestid` - CSB assigned identification number for each request
- `datetimeinit` - date request made
- `probaddress` - address where problem is located
- `problemcode` - general description of the problem associated with
the request
- `description` - (sometimes) more detailed description of the problem
associated with the request
- `status` - status of request
- `datetimeclosed` - date request completed
- `srx` - x coordinate for address
- `sry` - y coordinate for address
## Vacancy Data
The Vacancy Collaborative is "a coalition of partners dedicated to reducing the negative impact of vacant property in the city of St. Louis." As part of that work they have spent several years creating a data set of vacant properties that is updated each month. The data are drawn from a variety of public sources, including tax data, data from the Citizens' Service Bureau, and operational data from the Building and Forestry divisions among other sources.
### License
These data are not available under a specific license, but are provided for download from their [dashboard](https://www.stlvacancytools.com).
### Download
You can download these data from [GitHub](https://github.com/slu-soc5650/vacancy-data-sp22). Please read the `README.md` file carefully to learn which file you should use.
### Topic Categories
Unlike the other two pre-made data sets, it is not necessary to choose a specific category. Students are welcome to address vacancy as a whole, or focus on patterns that some of the variables highlight.
### Variables
- `Handle` - Handle for parcel (unique ID for joining with other datasets)
- `ParcelId` - Parcel ID
- `StAddrNum` - Street Address, Number
- `StNameFull` - Street Address, remaining consolidated information - direction, street name, suffix
- `Zip` - Zip Code
- `Type` - Property Type (e.g. Commercial, Single Family, Vacant Lot, etc.)
- `ParcelSqFt` - Square footage of the parcel
- `OwnerName` - Name of Owner
- `OwnerName2` - Additional names or notes for owner
- `OwnerState` - Owner’s Address State
- `OwnerZip` - Owner’s Address Zip Code
- `BldgAge` - Age of Building
- `Vacancy` - Vacancy Score (Number - 10 to 30 is indeterminate, 30 to 70 is possibly vacant, greater than 70 is very likely vacant)
- `VacancyCat` - Vacancy Score Category (e.g. Indeterminate, Definitely, etc.)
- `Burden` - Burden Score
- `BurdenCat` - Burden Score Category (e.g. High, Very High, etc.)
- `BoardUp` - Board Up Provided (true) since last ownership change and after 2016 (due to when processes were modernized)
- `IsLRA` - Whether the property is owned by LRA or not (true)
- `TaxYrsDel` - Number of years the property has been tax delinquent
- `VacRegMonths` - Number of months on the Building Division’s Vacant Building Registry
- `VioMajor` - Number of major building code violations
- `CSBVacancy` - Property has CSB calls for vacancy since last change in ownership (true)
- `CSBNuisance` - Property has CSB calls for nuisance since last change in ownership (true)
- `Forestry` - Forestry services provided since last change in ownership (true)
- `Condemned` - Building has been structurally condemned (true)
- `Lat` - Latitude
- `Lng` - Longitude
## Another Data Set
If the crime and non-emergency call data do not interest you, you may petition to choose a topic of your own provided you can find appropriate data.
### Characteristics of an Appropriate Data Set
Data sets for your final project will have a number of salient characteristics:
1. They should be *spatial* data.
2. They should numerous - there should be at least several hundred observations that vary over space and, ideally, over time as well. Several thousand observations will be ideal.
3. They must be countable - the number of 9-1-1 calls, restaurants, or terrorist attacks, for example.
4. They should be point data (or have the ability to be mapped as point data). Examples would be cities throughout a country or a region, restaurant locations across a city, state, or province, and crime scene locations.
5. They should be able to be aggregated up to some sort of higher order construct. Think about how restaurant locations could be aggregated up to neighborhoods, municipalities, or counties.
### Other Considerations
There are a few other considerations to take into account. If you are a graduate student and have already identified a possible thesis topic, you may want pick a data set that is either a possible candidate for inclusion in your thesis *or*, at the very least, is conceptually related. You want to maximize the impact that your coursework has, so even if you are not sure whether or not the data set itself will be helpful, picking something in the same topic area will mean that your literature search can be put to use on future assignments (such as in your Research Methods course).
### Pandemic Considerations
I am normally flexible about accepting other data sets for the final project. However, because of the nature of the Spring 2021 semester, I am going to be more stringent in approving these. If you want to petition use an outside data set, you may. However, you need to be prepared to do some of the leg work on your own without the benefit of in-person support from Chris.
### Finding an Appropriate Data Set
In general, you are free to use any resource to identify a suitable data set that meets the above criteria with a couple of caveats:
1. There is **not** time for you to collect your own data.
2. There is **not** time for you to go through the IRB process to gain access to confidential data (either data that is not publicly available or data collected by a thesis adviser or other faculty member).
3. The data you use should be licensed for re-use (it cannot be proprietary or otherwise restrictively licensed).
4. The data should be well documented - you want to be very sure what each variable represents. If there is no code book or documentation, the data set is probably not appropriate for this project. *See Chris if you have questions about this.*
## The Memo
Once you have completed **all** of the steps for identifying a suitable data set and outcome variable, you should create a new [issue](https://help.github.com/articles/about-issues/) on your **final project repository**. In your issue:
1. State what your research question is,
2. identify the data you are using to address that question,
3. and the main category within the data you would like to map.
If you are petitioning to use a different data set, please clearly note this in your memo and we can discuss your proposal in greater detail.
Once you have your memo drafted, open the issue and [assign it](https://help.github.com/articles/assigning-issues-and-pull-requests-to-other-github-users/) to Chris for review.