# Example datasets
## Edgar Anderson's Iris Data
In R:
```{r}
data(iris)
```
From the `?iris` manual page:
> This famous (Fisher's or Anderson's) iris data set gives the
> measurements in centimeters of the variables sepal length and width
> and petal length and width, respectively, for 50 flowers from each
> of 3 species of iris. The species are *Iris setosa*, *versicolor*,
> and *virginica*.
![Iris setosa (credit Wikipedia)](https://upload.wikimedia.org/wikipedia/commons/thumb/5/56/Kosaciec_szczecinkowaty_Iris_setosa.jpg/220px-Kosaciec_szczecinkowaty_Iris_setosa.jpg)
![Iris versicolor (credit Wikipedia)](https://upload.wikimedia.org/wikipedia/commons/thumb/4/41/Iris_versicolor_3.jpg/220px-Iris_versicolor_3.jpg)
![Iris virginica (credit Wikipedia)](https://upload.wikimedia.org/wikipedia/commons/thumb/9/9f/Iris_virginica.jpg/220px-Iris_virginica.jpg)
```{r dtiris}
DT::datatable(iris)
```
For more details, see `?iris`.
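The counts quoted from the manual page can be checked directly. The minimal sketch below tabulates the `Species` factor and prints the dimensions of the data frame. It also computes per-species means with `aggregate()`, which is an illustrative addition and not part of the original text.

```{r}
data(iris)
## 150 flowers (rows) and 5 columns: 4 measurements in cm plus Species
dim(iris)
## 50 flowers for each of the 3 species
table(iris$Species)
## mean of each measurement, per species
aggregate(. ~ Species, data = iris, FUN = mean)
```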
## Motor Trend Car Road Tests
In R:
```{r}
data(mtcars)
```
From the `?mtcars` manual page:
> The data was extracted from the 1974 *Motor Trend* US magazine, and
> comprises fuel consumption and 10 aspects of automobile design and
> performance for 32 automobiles (1973-74 models).
```{r dtmtcars, fig.cap=""}
DT::datatable(mtcars)
```
For more details, see `?mtcars`.
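The description above maps onto the shape of the data frame. There are 32 rows, one per automobile, and 11 columns: fuel consumption (`mpg`) followed by the 10 design and performance variables. The car models are stored as row names. A minimal sketch to check this:

```{r}
data(mtcars)
## 32 automobiles (rows); fuel consumption (mpg) plus 10 design and
## performance variables (columns)
dim(mtcars)
names(mtcars)
## the car models are stored as row names rather than as a column
head(rownames(mtcars))
```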
## Sub-cellular localisation
The `hyperLOPIT2015` data is used to demonstrate t-distributed stochastic neighbor embedding (t-SNE) and to compare it with
principal component analysis (PCA). These data provide the sub-cellular localisation of
proteins in mouse E14TG2a embryonic stem cells, as published
in [Christoforou et al. (2016)](https://doi.org/10.1038/ncomms9992).
The data come as an `MSnSet` object from the `r Biocpkg("MSnbase")`
package, which was specifically developed for quantitative proteomics
data. Alternatively, comma-separated files containing a somewhat
simplified version of the data can be found in the
[hyperLOPIT-csvs](https://github.com/lgatto/hyperLOPIT-csvs/) repository.
These data are only used to illustrate some concepts. They are not
loaded or used directly, to avoid installing numerous dependencies.
They are available through the Bioconductor project and can be
installed with:
```{r prolocinstall, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("MSnbase", "pRoloc")) ## software
BiocManager::install("pRolocdata")          ## data
```
## The diamonds data
The `diamonds` data ships with the `r CRANpkg("ggplot2")` package and
can be used to predict the price (in US dollars) of about 54000 round cut diamonds.
In R:
```{r}
library("ggplot2")
data(diamonds)
```
```{r dtdiamonds, fig.cap=""}
DT::datatable(diamonds)
```
See also `?diamonds`.
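The "about 54000" diamonds correspond to 53940 rows. The sketch below shows the dimensions of the data and the distribution of `price`. It then fits a deliberately simple model to show `price` used as an outcome. The choice of a log-log linear regression on `carat` is an assumption made for illustration only, not a model recommended by the original text.

```{r}
library("ggplot2")
data(diamonds)
## number of diamonds (rows) and variables (columns)
dim(diamonds)
## price is given in US dollars
summary(diamonds$price)
## illustrative only: regress log price on log carat weight
fit <- lm(log(price) ~ log(carat), data = diamonds)
coef(fit)
```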
## The Sonar data
The `Sonar` data from the `r CRANpkg("mlbench")` package can be used
to train a classifier to distinguish mines from rocks using sonar
signals. The data are composed of 60 features, each representing the energy
within a particular frequency band, and a `Class` label (`M` for mine, `R` for rock).
In R:
```{r}
library("mlbench")
data(Sonar)
```
```{r dtsonar, fig.cap=""}
DT::datatable(Sonar)
```
See also `?Sonar`.
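Before training a classifier, it is useful to check the layout of the data and the balance between the two classes. The minimal sketch below does this. The features are named `V1` to `V60` and, per `?Sonar`, lie in the range 0 to 1. The `Class` factor is the outcome.

```{r}
library("mlbench")
data(Sonar)
## 60 numeric features (V1 to V60) plus the Class label
dim(Sonar)
## Class is a factor: M (mine) or R (rock)
table(Sonar$Class)
## feature values lie between 0 and 1
range(Sonar[, 1:60])
```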
## Housing Values in Suburbs of Boston
The `Boston` data from the `r CRANpkg("MASS")` package provides the median
value of owner-occupied homes (`medv`) in $1000s, as well as 13 other
features, for 506 suburbs (census tracts) of Boston.
In R:
```{r, message=FALSE}
library("MASS")
data(Boston)
```
```{r dtboston, fig.cap=""}
DT::datatable(Boston)
```
See also `?Boston`.
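The description implies a regression setting: `medv` is the outcome and the 13 other columns are candidate predictors. The minimal sketch below checks the dimensions and fits an ordinary linear model on all predictors. The choice of `lm()` is only an illustration; the original text does not prescribe any particular model.

```{r}
library("MASS")
data(Boston)
## 506 rows; medv plus 13 other features
dim(Boston)
## median value of owner-occupied homes, in $1000s
summary(Boston$medv)
## medv as outcome, all 13 other variables as predictors
fit <- lm(medv ~ ., data = Boston)
## intercept + 13 coefficients
length(coef(fit))
```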
## Customer churn
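These customer attrition (churn) data were originally distributed by the
`r CRANpkg("C50")` package as a training set with 3333 samples and a test
set with 1667 samples. They are now available as the 5000-sample
`mlc_churn` data in the `r CRANpkg("modeldata")` package, from which the
same split is recreated below.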
In R:
```{r}
library("modeldata")
data(mlc_churn, package = "modeldata")
## first 3333 samples for training, remaining 1667 for testing
churnTrain <- mlc_churn[1:3333, ]
churnTest <- mlc_churn[3334:5000, ]
```
```{r dtchurn, fig.cap=""}
DT::datatable(churnTrain)
```
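Because the split takes consecutive rows rather than a random sample, it is worth checking the sizes of both sets and the proportion of churners in each. The split is recreated in the minimal sketch below so that it runs on its own. It assumes the outcome column is the `churn` factor of `mlc_churn`.

```{r}
library("modeldata")
data(mlc_churn, package = "modeldata")
## same sequential split as above
churnTrain <- mlc_churn[1:3333, ]
churnTest <- mlc_churn[3334:5000, ]
c(train = nrow(churnTrain), test = nrow(churnTest))
## proportion of churners in each set
prop.table(table(churnTrain$churn))
prop.table(table(churnTest$churn))
```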