---
layout: reveal_markdown
title: "Statistics review 2"
tags: slides
date: 2022-02-16
---
## {{ page.title }}
---
## Outline
- Statistical models
- Hypothesis testing
- Some useful tests
- P-values
- Multiple testing corrections
- Confusion matrix
- Receiver-operating characteristic (ROC)
---
## Data Science
<img src="images/statistics/Picture1.png" width="600">
---
## Computational Biology
<img src="images/statistics/Picture2.png" width="600">
---
## Why Statistics Is Important
- Statistics is the theoretical foundation of machine learning and data science
- Statistics is the bridge between experiments and theories
- Statistics links microscopic and macroscopic worlds
---
## What is a Model?
<iframe width="1024" height="576" src="https://www.youtube.com/embed/yQhTtdq_y9M" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
---
## Models
- Quantitative/mathematical models
  - Describe the relationships between quantities.
- Statistical models
  - Describe the relationships between random variables.
---
### Hypothesis testing
- General thinking:
  - Are they different?
  - Is the difference "statistically significant"?
- Statistical thinking:
  - Null hypothesis
  - Alternative hypothesis
<img src="images/statistics/image580a.png" width="250">
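
One way to make the null hypothesis concrete is an exact permutation test: if the group labels do not matter, every relabeling of the pooled data is equally likely. The sketch below is illustrative only; the function name `permutation_p_value` and the toy data are assumptions, not part of the original slides.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(a, b):
    """Two-sided exact permutation test on the difference in means."""
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    total = sum(pooled)
    extreme = 0
    n_splits = 0
    # Enumerate every way to assign len(a) of the pooled values to group a.
    for idx in combinations(range(len(pooled)), len(a)):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / len(a) - (total - sum_a) / len(b))
        n_splits += 1
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            extreme += 1
    return extreme / n_splits

control = [1, 2, 3]   # hypothetical measurements
treated = [4, 5, 6]
p = permutation_p_value(control, treated)
print(p)  # 2 of the 20 possible splits are as extreme as the data: 0.1
```

Full enumeration is only practical for small samples; larger data sets usually rely on random relabelings instead.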
---
## P-values
<iframe width="1024" height="576" src="https://www.youtube.com/embed/vemZtEM63GY" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
---
## T-test
<img src="images/statistics/William_Sealy_Gosset.jpg" width="250"><br>
William Sealy Gosset (1876-1937)<br>
(pseudonym "Student")
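
A minimal sketch of the statistic behind a two-sample t-test, assuming Welch's unequal-variance form (the slides do not say which variant is meant). The helper name `welch_t` and the data are hypothetical. Turning the statistic into a P-value needs the t distribution, which the standard library lacks; SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` covers both steps.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    # Squared standard error of each sample mean.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

group_a = [1, 2, 3, 4, 5]   # hypothetical data
group_b = [2, 3, 4, 5, 6]
t, df = welch_t(group_a, group_b)
print(t, df)  # -1.0 8.0
```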
---
## Fisher's Exact Test
<img src="images/statistics/ronald-fisher-4-uofa.jpg" width="250"><br>
Ronald Fisher (1890-1962)
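
Fisher's exact test can be sketched directly from the hypergeometric distribution. The version below is an illustration under stated assumptions: a 2x2 table with fixed margins, and a two-sided P-value defined as the sum of all table probabilities no larger than the observed one. The function name and the counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every table that is at most as likely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

p = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p, 3))  # 34/70, about 0.486
```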
---
## Wilcoxon Rank-Sum Test (Mann–Whitney U Test)
<img src="images/statistics/FrankWilcoxon.png" width="250"><br>
Frank Wilcoxon (1892–1965)
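
The rank-sum idea can be sketched by counting pairs: the U statistic for one sample is the number of cross-sample pairs in which its value is larger, with ties counted as one half. This is a sketch of the statistic only (no P-value); `mann_whitney_u` and the data are hypothetical names and values.

```python
def mann_whitney_u(a, b):
    """U statistic for sample a: pairs (x, y) with x > y, ties count 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

low = [1, 2, 3]    # hypothetical data
high = [4, 5, 6]
u_low = mann_whitney_u(low, high)
u_high = mann_whitney_u(high, low)
print(u_low, u_high)  # 0.0 9.0
```

The two U values always add up to `len(a) * len(b)`, so a value near either extreme indicates that one sample tends to rank above the other.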
---
## K-S Test
<img src="images/statistics/kolmogorov.jpg" width="250"> <img src="images/statistics/Nikolai_Smirnov.jpg" width="200"><br>
Andrey Kolmogorov (1903-1987), Nikolai Smirnov (1900-1996)
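
For the two-sample K-S test, the statistic is the largest vertical gap between the two empirical cumulative distribution functions. A minimal sketch, with a hypothetical function name and data, and without the P-value calculation:

```python
def ks_statistic(a, b):
    """Largest absolute difference between the empirical CDFs of a and b."""
    def ecdf(sample, x):
        # Fraction of the sample that is <= x.
        return sum(v <= x for v in sample) / len(sample)

    # The gap can only change at observed values, so checking those suffices.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in set(a) | set(b))

sample_a = [1, 2, 3, 4]   # hypothetical data
sample_b = [3, 4, 5, 6]
d = ks_statistic(sample_a, sample_b)
print(d)  # 0.5
```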
---
Which of the following statements about P-values is true?
- A. A P-value measures how big the difference is between the datasets compared.
- B. A P-value is the probability of observing the data by random chance.
- C. A P-value is the probability of observing data at least as extreme as the observed data, under the assumption that the null hypothesis is true.
---
### ASA Statement on Statistical Significance and P-Values
1. P-values can indicate how incompatible the data are with a specified statistical model.
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
---
### ASA Statement on Statistical Significance and P-Values
4. Proper inference requires full reporting and transparency.
5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
---
## Multiple Testing Correction
---
### Bonferroni correction
<img src="images/statistics/Carlo_Emilio_Bonferroni.jpg" width="250"><br>
Carlo Emilio Bonferroni (1892-1960)<br>
$\text{adjusted } P = \min(1,\ N \times P)$, where $N$ is the number of tests
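
The Bonferroni formula can be sketched in a few lines. The cap at 1 is added here because a probability cannot exceed 1; the function name and the P-values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each P-value by the number of tests, capped at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]

raw = [0.01, 0.04, 0.5]   # hypothetical P-values from 3 tests
adjusted = bonferroni(raw)
print(adjusted)  # approximately [0.03, 0.12, 1.0]
```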
---
### Benjamini-Hochberg (B-H) correction
<img src="images/statistics/Yoav_Benjamini.jpg" width="250"><br>
Yoav Benjamini (1949-) and Yosef Hochberg<br>
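$\text{adjusted } P = \frac{P \times N}{R}$, where $N$ is the number of tests and $R$ is the rank of $P$ among all P-values sorted in ascending order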
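
A sketch of the B-H adjustment, assuming ranks are assigned in ascending order of P-value. Beyond the formula on the slide, it adds the usual step-up rule: each adjusted value is the minimum over its own and all higher ranks, so the adjusted values never decrease with rank. Names and data are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return B-H adjusted P-values in the original input order."""
    n = len(p_values)
    # Indices of the P-values from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest rank down, keeping a running minimum.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.5]   # hypothetical P-values
adjusted = benjamini_hochberg(raw)
print([round(q, 4) for q in adjusted])  # [0.04, 0.0533, 0.0533, 0.5]
```

In this example the raw formula gives 0.06 for the P-value 0.03 (rank 2), but the running minimum lowers it to 0.0533, the value obtained at rank 3.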
---
## Confusion Matrix
<img src="images/statistics/confusionmatrix.png" width="600">
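
The four cells of a binary confusion matrix can be tallied as follows. This is a sketch that assumes labels are coded as 1 (positive) and 0 (negative); the function name and the example labels are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels coded as 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

truth      = [1, 1, 1, 0, 0, 0, 0, 1]   # hypothetical ground truth
prediction = [1, 0, 1, 0, 1, 0, 0, 1]   # hypothetical classifier output
counts = confusion_counts(truth, prediction)
print(counts)  # {'TP': 3, 'FP': 1, 'TN': 3, 'FN': 1}
```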
---
## Sensitivity and Specificity
<iframe width="1024" height="576" src="https://www.youtube.com/embed/vP06aMoz4v8" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
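
Sensitivity is the fraction of actual positives that are detected, TP / (TP + FN), and specificity is the fraction of actual negatives that are correctly rejected, TN / (TN + FP). A short sketch with hypothetical counts:

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return sensitivity, specificity

# Hypothetical counts: 50 actual positives, 50 actual negatives.
sens, spec = sensitivity_specificity(tp=40, fp=5, tn=45, fn=10)
print(sens, spec)  # 0.8 0.9
```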
---
## ROC and AUC
<iframe width="1024" height="576" src="https://www.youtube.com/embed/4jRBRDbJemM" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
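
An ROC curve can be sketched by sweeping a decision threshold over the classifier scores and recording the false positive rate and true positive rate at each step; the AUC is then the area under those points by the trapezoid rule. The function names, labels (coded 1 and 0), and scores below are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per distinct threshold, from (0, 0) to (1, 1)."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; a sample is called positive if score >= threshold.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

labels = [1, 1, 0, 1, 0, 0]                  # hypothetical ground truth
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]      # hypothetical classifier scores
area = auc(roc_points(labels, scores))
print(round(area, 4))  # 0.8889
```

The result equals 8/9, the fraction of positive-negative pairs in which the positive sample has the higher score.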
---
## Summary
<img src="images/statistics/summary.png" width="1200">
---
## Contact
- zanglab.org