---
title: "Chapter Summary of ..."
author: "Your Name Here"
date: last-modified
date-format: "[Last modified on] MMMM DD, YYYY HH:mm:ss zzz"
format:
  html: default
  pdf: default
---

## Bootstrapping to estimate a single parameter

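- Bootstrapping works even with complicated parameters, including ones that have no convenient standard-error formula.
- The bootstrap estimate of a parameter is no better than the "standard" estimate of that parameter, i.e., the statistic computed from the original sample.
- We bootstrap to get an idea of the sampling variability: the standard deviation of the bootstrap distribution is the bootstrap standard error.
- A bootstrap sample is drawn from the original sample, with replacement, and has the same size as the original sample. Bootstrapping repeats this many times.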
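
As a rough illustration of the points above, the sketch below bootstraps a statistic with no simple standard-error formula, the 10% trimmed mean. The choice of statistic, the seed, and the object names (`bs_tmean`, `obs_tmean`, `bs_se`) are assumptions made here for illustration; the data are the `gss` hours from the `infer` package.

```{r}
library(infer)   # supplies the gss data set
set.seed(123)    # arbitrary seed, only for reproducibility
B <- 10^4                       # number of bootstrap samples
n <- length(gss$hours)          # same size as the original sample
# Each replicate: resample with replacement, then compute the 10% trimmed mean
bs_tmean <- replicate(B, mean(sample(gss$hours, size = n, replace = TRUE), trim = 0.10))
obs_tmean <- mean(gss$hours, trim = 0.10)  # "standard" estimate from the original sample
bs_se <- sd(bs_tmean)                      # bootstrap standard error
c(observed = obs_tmean, bootstrap_center = mean(bs_tmean), bootstrap_se = bs_se)
```

The center of the bootstrap distribution should sit close to the observed statistic, so the value gained from bootstrapping is the standard error, not a better point estimate.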

## Bootstrapping code

- We can use the `infer` package, which hides all the work.
- Type `?infer` at the R prompt for the help file.
- You should read [Getting to Know `infer`](https://infer.tidymodels.org/articles/infer.html).

```{r}
# Bootstrap distribution of the mean of hours (gss ships with infer)
library(infer)
gss %>%
  specify(response = hours) %>%                  # variable of interest
  generate(reps = 10000, type = "bootstrap") %>% # 10,000 resamples with replacement
  calculate(stat = "mean") -> bs_dist            # mean of each resample
visualize(bs_dist) # visualize the bootstrap distribution
# Compute a 90% bootstrap percentile CI
get_confidence_interval(bs_dist, level = 0.90, type = "percentile")
```
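
To see some of the work that `infer` hides, it may help to look at what `generate()` returns before `calculate()` collapses it. The sketch below uses only 3 replicates so the output stays small; the seed and the name `resamples` are assumptions made here for illustration.

```{r}
library(infer)
set.seed(1)  # arbitrary seed, only for reproducibility
gss %>%
  specify(response = hours) %>%
  generate(reps = 3, type = "bootstrap") -> resamples
resamples                    # one row per resampled value, labelled by replicate
table(resamples$replicate)   # each replicate has as many rows as gss
```

Each replicate is one bootstrap sample of the same size as the original data, and `calculate(stat = "mean")` then reduces every replicate to a single number.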

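- We can also use a basic `for()` loop to generate the bootstrap samples ourselves.

```{r}
library(infer) # for the gss data set
B <- 10^4                  # number of bootstrap samples
n <- length(gss$hours)     # resample size = original sample size (500)
bs_mean <- numeric(B)      # storage for the bootstrap means
for(i in 1:B){
  bss <- sample(gss$hours, size = n, replace = TRUE)
  bs_mean[i] <- mean(bss)
}
hist(bs_mean)
# 90% bootstrap percentile CI
quantile(bs_mean, probs = c(0.05, 0.95))
```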
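
For the mean, the bootstrap standard error can be checked against the usual formula $s/\sqrt{n}$. The sketch below is one way to do that comparison; the seed and the names `bs_se` and `formula_se` are assumptions introduced here, and `replicate()` is used in place of the explicit loop only for brevity.

```{r}
library(infer) # for the gss data set
set.seed(42)   # arbitrary seed, only for reproducibility
n <- length(gss$hours)
bs_mean <- replicate(10^4, mean(sample(gss$hours, size = n, replace = TRUE)))
bs_se <- sd(bs_mean)                    # bootstrap standard error of the mean
formula_se <- sd(gss$hours) / sqrt(n)   # usual standard error of the mean
c(bootstrap_se = bs_se, formula_se = formula_se)
# The bootstrap distribution is centered near the sample mean, not the unknown mu
c(bootstrap_center = mean(bs_mean), sample_mean = mean(gss$hours))
```

The two standard errors should agree closely, which is some reassurance when the same resampling recipe is applied to statistics that have no such formula.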

## Testing a hypothesis with bootstrapping

- To test a hypothesis, we must make the bootstrap distribution conform to the null hypothesis. Suppose we want to test $H_0: \mu = 41$ versus $H_A: \mu > 41$: we shift the data so that their mean is 41 before resampling.
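
A quick check of what "conform to the null" means in practice: shifting every value by the same amount moves the center of the data to 41 but leaves the spread alone. The names `mu0`, `shift`, and `hours_null` are assumptions introduced here for illustration.

```{r}
library(infer) # for the gss data set
mu0 <- 41                          # mean under the null hypothesis
shift <- mean(gss$hours) - mu0     # amount to subtract from every value
hours_null <- gss$hours - shift    # data recentered to satisfy the null
c(shift = shift, original_mean = mean(gss$hours), shifted_mean = mean(hours_null))
# The shift changes the center only; the spread is unchanged
all.equal(sd(hours_null), sd(gss$hours))
```

Resampling from the shifted values therefore gives a bootstrap distribution centered at 41 with the same variability as the ordinary bootstrap distribution.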

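```{r}
library(infer) # for the gss data set
mu0 <- 41                      # mean under the null hypothesis
obs_mean <- mean(gss$hours)    # observed sample mean (41.382)
# In order for the null to be true,
# need to subtract obs_mean - mu0 = 0.382 from every value in hours.
shift <- obs_mean - mu0
B <- 10^4
bs_mean <- numeric(B)
for(i in 1:B){
  bss <- sample(gss$hours, size = 500, replace = TRUE) - shift
  bs_mean[i] <- mean(bss)
}
hist(bs_mean)
# p-value: proportion of null bootstrap means at least as large as the observed mean
pvalue <- mean(bs_mean >= obs_mean)
pvalue
```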

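- The same test with `infer`: `hypothesize()` states the null, and the bootstrap samples are then generated so that they conform to it.

```{r}
library(infer)
gss %>%
  specify(response = hours) %>%
  hypothesize(null = "point", mu = 41) %>%       # H_0: mu = 41
  generate(reps = 10000, type = "bootstrap") %>% # resamples consistent with the null
  calculate(stat = "mean") -> boot_test
# Right-tailed p-value, matching H_A: mu > 41
get_p_value(boot_test, obs_stat = mean(gss$hours), direction = "right")
```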