# Final notes
## Other learning algorithms
### Semi-supervised learning {-}
The idea behind semi-supervised learning is to use labelled
observations to guide the determination of relevant structure in the
unlabelled data. The figure below illustrates the *phenoDisco*
algorithm described in
[Breckels *et al.* (2013)](https://www.ncbi.nlm.nih.gov/pubmed/23523639).
![Semi-supervised learning and novelty detection](./figure/phenodisco.png)
### Deep learning in R {-}
This book focuses on introductory material in R. This should not,
however, give the impression that more modern approaches are
unavailable. There is plenty of activity around deep learning in R,
for example the `r CRANpkg("keras")` package, an interface
to [Keras](https://keras.io), a high-level neural networks API.
See [this blog post](https://blog.rstudio.com/2017/09/05/keras-for-r/) for
an introduction.
## Model performance
When investigating multi-class problems, it is good to consider
additional performance metrics and to inspect the confusion matrices
in more detail, to check whether some classes suffer from higher
misclassification rates than others.
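
As a minimal sketch of such an inspection, the base R code below builds a confusion matrix with `table()` and derives a misclassification rate for each class. The `truth` and `pred` vectors are made-up toy labels used for illustration only; in practice they would be the known classes and a model's predictions on held-out data.

```r
# Toy example: true and predicted labels for a three-class problem
# (hypothetical data, for illustration only)
truth <- factor(c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C"))
pred  <- factor(c("A", "A", "B", "B", "B", "B", "C", "C", "A", "C"),
                levels = levels(truth))

# Confusion matrix: rows are predictions, columns are the true classes
cm <- table(predicted = pred, truth = truth)
print(cm)

# Per-class misclassification rate: fraction of the observations of
# each true class that were assigned to another class
misclass <- 1 - diag(cm) / colSums(cm)
print(round(misclass, 2))
```

A class whose rate stands out in `misclass` is a good candidate for closer inspection, even when the overall accuracy looks acceptable.
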

Model accuracy can also be evaluated using the F1 score, where $F1 = 2 ~
\frac{precision \times recall}{precision + recall}$, calculated as the
harmonic mean of the precision ($precision = \frac{tp}{tp+fp}$, a
measure of *exactness*: how much of the returned output is relevant)
and recall ($recall=\frac{tp}{tp+fn}$, a measure of *completeness*:
how much of the relevant output was returned rather than missed).
Here $tp$, $fp$ and $fn$ are the numbers of true positives, false
positives and false negatives. What we are aiming for is high
generalisation accuracy, i.e. a high $F1$, indicating that
the marker proteins in the test data set are consistently and
correctly assigned by the algorithms.

For a multi-class problem, the macro F1 (mean of class F1s) can be
used.
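
The definitions above can be sketched in a few lines of base R. This is an illustrative calculation on made-up labels (`truth` and `pred` are hypothetical), not the implementation used by any particular package. It treats each class in turn as the positive class and averages the resulting F1 scores.

```r
# Toy example: true and predicted labels for a three-class problem
# (hypothetical data, for illustration only)
truth <- factor(rep(c("A", "B", "C"), times = c(4, 3, 3)))
pred  <- factor(c("A", "A", "A", "B",   # true class A
                  "B", "B", "B",        # true class B
                  "C", "A", "A"),       # true class C
                levels = levels(truth))

# Confusion matrix: rows are predictions, columns are the true classes
cm <- table(predicted = pred, truth = truth)

tp <- diag(cm)            # correctly assigned to the class
fp <- rowSums(cm) - tp    # assigned to the class, but truly another
fn <- colSums(cm) - tp    # truly in the class, but assigned elsewhere

precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1        <- 2 * precision * recall / (precision + recall)

# Macro F1: unweighted mean of the per-class F1 scores
macro_f1 <- mean(f1)

print(round(rbind(precision, recall, f1), 3))
print(round(macro_f1, 3))
```

Note that a class that is never predicted gives $tp + fp = 0$, so its precision and F1 are undefined (`NaN` in R); how to handle that case is a choice that should be made explicitly.
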
## Credit and acknowledgements
Many parts of this course have been influenced by DataCamp's
[*Machine Learning with R* skill track](https://www.datacamp.com/tracks/machine-learning),
in particular the *Machine Learning Toolbox* (supervised learning
chapter) and *Unsupervised Learning in R* (unsupervised learning
chapter) courses.

[Jamie Lendrum](https://github.com/jl5000) has addressed numerous
typos in the first version.
The very hands-on approach has also been influenced by the Software
and Data Carpentry lessons and teaching styles.
## References and further reading
- caret: Classification and Regression Training. Max Kuhn.
[https://CRAN.R-project.org/package=caret](https://CRAN.R-project.org/package=caret).
- [Applied predictive modeling](https://www.springer.com/us/book/9781461468486),
Max Kuhn and Kjell Johnson (book webpage
[http://appliedpredictivemodeling.com/](http://appliedpredictivemodeling.com/))
and the [caret book](https://topepo.github.io/caret/index.html).
- [An Introduction to Statistical Learning (with Applications in
R)](http://www-bcf.usc.edu/~gareth/ISL/). Gareth James, Daniela
Witten, Trevor Hastie and Robert Tibshirani.
- [mlr: Machine Learning in R](http://jmlr.org/papers/v17/15-066.html). Bischl
B, Lang M, Kotthoff L, Schiffner J, Richter J, Studerus E,
Casalicchio G and Jones Z (2016). Journal of Machine Learning
Research, *17*(170),
pp. 1-5. [https://github.com/mlr-org/mlr](https://github.com/mlr-org/mlr).
- DataCamp's
[*Machine Learning with R* skill track](https://www.datacamp.com/tracks/machine-learning) (requires
paid access).
## Session information
```{r si}
sessionInfo()
```