---
title: "Functions"
description: |
  Repeating things? Let's functionalize it!
output:
  distill::distill_article:
    toc: true
    toc_float: true
    toc_depth: 4
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(here)
library(tidyverse)
```
When the same code appears more than once, a function is a great way to reduce duplication. Even a function that is called only once can be a nice way to break a large, complicated process into smaller pieces.

# What's in a function?
1. The Formals
2. The Body
3. The Environment
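
Each of these three parts can be inspected with a base R helper. Here is a minimal sketch, assuming a hypothetical toy function `add_one()` that is not part of the CHAS workflow:

```r
# add_one() is a hypothetical toy function used only for illustration
add_one <- function(x) {
  x + 1
}

formals(add_one)      # the formals: the arguments the function accepts (here, x)
body(add_one)         # the body: the code that runs when the function is called
environment(add_one)  # the environment: where the function was created
```
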
To define a function, here's the basic skeleton:
```{r, eval=FALSE}
my_function_name <- function() {
}
```
# Let's create a custom function!
Here's a CHAS table. Each CSV file will look similar to this:
```{r}
file_01 <- read_csv(here('data', '050', 'Table9.csv'))
head(file_01, 10)
```
Suppose we'd like to clean every CHAS table in the same manner. Let's create a function that does the following:

- filter for WA state and PSRC counties
- pivot longer (so the columns that start with 'T' become rows instead of running across the table)
- create 3 more columns that dissect the column containing the former 'T...' headers:
    - create a 'table' field extracting `T` and the numbers before the underscore
    - create a 'type' field to identify whether values are 'est' or 'moe'
    - create a 'sort' field extracting the numeric digits at the end

```{r eval=FALSE}
# define the skeleton of our function
# add table as a parameter
clean_table <- function(table) {
  # fill it in!
}
```
Fill in the body, using the `table` argument as the data to clean:
```{r eval=FALSE}
clean_table <- function(table) {
  table %>%
    filter(st == 53 & cnty %in% c('033', '035', '053', '061')) %>%
    pivot_longer(cols = str_subset(colnames(table), "^T.*"),
                 names_to = 'header',
                 values_to = 'value') %>%
    mutate(table = str_extract(header, "^T\\d*(?=_)"),
           type = str_extract(header, "(?<=_)\\w{3}"),
           sort = str_extract(header, "\\d+$"))
}
# Regex used:
# table: "^T\\d*(?=_)" string starting with T and numeric digits followed by _
# type: "(?<=_)\\w{3}" 3 letters preceded by _
# sort: "\\d+$" last numeric digits at the end of the string
```
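
To see what each pattern captures, it can help to run them on a single header. The sketch below assumes a made-up header, `T9_est12`, shaped like the 'T...' column names described above; the real headers in your files may differ.

```r
library(stringr)

# Hypothetical header, assumed to follow the T<number>_<est|moe><number> shape
header <- "T9_est12"

str_extract(header, "^T\\d*(?=_)")   # "T9": T plus digits, stopping before the underscore
str_extract(header, "(?<=_)\\w{3}")  # "est": the three characters right after the underscore
str_extract(header, "\\d+$")         # "12": the trailing digits, returned as a string
```
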
<aside>
A function returns the value of its last evaluated expression by default. Because the steps are chained with the pipe (`%>%`), the body of our example is essentially a single expression, so its result is what gets returned. You can always add `return(<name of object>)` to explicitly return a specific object.
</aside>
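
The two styles of returning a value can be compared side by side. This is a small sketch with hypothetical toy functions (`double_implicit()` and `double_explicit()` are made up for illustration):

```r
# Implicit return: the value of the last expression is returned
double_implicit <- function(x) {
  x * 2
}

# Explicit return: return() names the object to hand back
double_explicit <- function(x) {
  result <- x * 2
  return(result)
}

double_implicit(4)  # 8
double_explicit(4)  # 8
```

Both forms give the same result here; an explicit `return()` is mostly useful when a function needs to exit early or when naming the result makes the code easier to read.
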
## Call the function
```{r eval=FALSE}
t9 <- clean_table(file_01)
```
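
If the CHAS files are not at hand, the function can be tried on a small made-up table. This is only a sketch: the tibble `toy` and its values are invented for illustration, and it assumes the real files have `st`, `cnty`, and `T..._est`/`T..._moe` style columns as used in `clean_table()`. The function is repeated so the block runs on its own.

```r
library(dplyr)
library(tidyr)
library(stringr)

clean_table <- function(table) {
  table %>%
    filter(st == 53 & cnty %in% c('033', '035', '053', '061')) %>%
    pivot_longer(cols = str_subset(colnames(table), "^T.*"),
                 names_to = 'header',
                 values_to = 'value') %>%
    mutate(table = str_extract(header, "^T\\d*(?=_)"),
           type = str_extract(header, "(?<=_)\\w{3}"),
           sort = str_extract(header, "\\d+$"))
}

# Hypothetical stand-in for a CHAS table (values are invented)
toy <- tibble(
  st = c(53, 53, 41),
  cnty = c("033", "001", "051"),
  T9_est1 = c(100, 200, 300),
  T9_moe1 = c(10, 20, 30)
)

# Only the first row passes the filter, and it becomes two rows:
# one for T9_est1 and one for T9_moe1
toy_clean <- clean_table(toy)
toy_clean
```
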
Try it with other files:
```{r eval=FALSE}
file_02 <- read_csv(here('data', '050', 'Table10.csv'))
file_03 <- read_csv(here('data', '050', 'Table11.csv'))
t10 <- clean_table(file_02)
t11 <- clean_table(file_03)
```
If we forgot a step in the cleaning process, we can always edit the function and re-run our script:
```{r eval=FALSE}
# Edit the function: inside mutate(), replace the existing `sort = ...` line with this one
# so the sort column is converted from string to numeric
sort = as.numeric(str_extract(header, "\\d+$"))
```
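
One reason this conversion matters: digits stored as strings are sorted as text rather than as numbers. A small sketch with made-up values (`sort_chr` and `sort_num` are hypothetical names):

```r
sort_chr <- c("1", "2", "10")      # hypothetical 'sort' values stored as strings
sort_num <- as.numeric(sort_chr)   # the same values converted to numeric

sort(sort_chr)  # text comparison, so "10" typically lands before "2"
sort(sort_num)  # numeric comparison: 1, 2, 10
```
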
# Benefits of creating functions
- Easier editing of code
- Reduce redundancy
- Break long processes into chunks