---
output: github_document
---
```{r, include = FALSE}
lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")
set.seed(0)
options(
datatable.print.nrows = 10,
datatable.print.class = FALSE,
datatable.print.keys = FALSE,
datatable.print.trunc.cols = TRUE,
width = 100)
# mute load messages
library("mlr3fselect")
```
# mlr3fselect <img src="man/figures/logo.png" align="right" width = "120" />
Package website: [release](https://mlr3fselect.mlr-org.com/) | [dev](https://mlr3fselect.mlr-org.com/dev/)
<!-- badges: start -->
[](https://github.com/mlr-org/mlr3fselect/actions/workflows/r-cmd-check.yml)
[](https://cran.r-project.org/package=mlr3fselect)
[](https://lmmisld-lmu-stats-slds.srv.mwn.de/mlr_invite/)
<!-- badges: end -->
*mlr3fselect* is the feature selection package of the [mlr3](https://mlr-org.com/) ecosystem.
It selects the optimal feature set for any mlr3 [learner](https://github.com/mlr-org/mlr3learners).
The package works with several optimization algorithms, e.g. Random Search, Recursive Feature Elimination, and Genetic Search.
Moreover, it can automatically optimize learners and estimate the performance of optimized feature sets with [nested resampling](https://mlr3book.mlr-org.com/chapters/chapter6/feature_selection.html#sec-autofselect).
The package is built on the optimization framework [bbotk](https://github.com/mlr-org/bbotk).
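Automatic optimization with nested resampling can be sketched with `auto_fselector()`; the learner, resampling scheme, and evaluation budget below are illustrative choices, not recommendations:

```{r, eval = FALSE}
library("mlr3verse")

# wrap a learner so feature selection happens inside resampling
afs = auto_fselector(
  fselector = fs("random_search"),
  learner = lrn("classif.svm", type = "C-classification"),
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.ce"),
  terminator = trm("evals", n_evals = 20)
)

# outer resampling gives an unbiased performance estimate
rr = resample(tsk("spam"), afs, rsmp("cv", folds = 3))
rr$aggregate()
```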
## Resources
There are several sections about feature selection in the [mlr3book](https://mlr3book.mlr-org.com).
* Getting started with [wrapper feature selection](https://mlr3book.mlr-org.com/chapters/chapter6/feature_selection.html#sec-fs-wrapper).
* Do a [sequential forward selection](https://mlr3book.mlr-org.com/chapters/chapter6/feature_selection.html#sec-fs-wrapper-example) on the Palmer Penguins data set.
* Optimize [multiple performance measures](https://mlr3book.mlr-org.com/chapters/chapter6/feature_selection.html#sec-multicrit-featsel).
* Estimate model performance with [nested resampling](https://mlr3book.mlr-org.com/chapters/chapter6/feature_selection.html#sec-autofselect).
The [gallery](https://mlr-org.com/gallery.html) features a collection of case studies and demos about optimization.
* Perform wrapper-based [Ensemble Feature Selection](https://mlr-org.com/gallery/technical/2025-01-12-efs/).
* Utilize the built-in feature importance of models with [Recursive Feature Elimination](https://mlr-org.com/gallery/optimization/2023-02-07-recursive-feature-elimination/).
* Run a feature selection with [Shadow Variable Search](https://mlr-org.com/gallery/optimization/2023-02-01-shadow-variable-search/).
The [cheatsheet](https://cheatsheets.mlr-org.com/mlr3fselect.pdf) summarizes the most important functions of mlr3fselect.
## Installation
Install the last release from CRAN:
```{r eval = FALSE}
install.packages("mlr3fselect")
```
Install the development version from GitHub:
```{r eval = FALSE}
# install.packages("pak")
pak::pak("mlr-org/mlr3fselect")
```
## Example
We run a feature selection for a support vector machine on the [Spam](https://mlr3.mlr-org.com/reference/mlr_tasks_spam.html) data set.
```{r}
library("mlr3verse")
tsk("spam")
```
We construct an instance with the `fsi()` function.
The instance describes the optimization problem.
```{r}
instance = fsi(
task = tsk("spam"),
learner = lrn("classif.svm", type = "C-classification"),
resampling = rsmp("cv", folds = 3),
measures = msr("classif.ce"),
terminator = trm("evals", n_evals = 20)
)
instance
```
We select a simple random search as the optimization algorithm.
```{r}
fselector = fs("random_search", batch_size = 5)
fselector
```
To start the feature selection, we simply pass the instance to the fselector.
```{r, results='hide'}
fselector$optimize(instance)
```
The fselector writes the best feature set to the instance.
```{r}
instance$result_feature_set
```
And the corresponding measured performance.
```{r}
instance$result_y
```
The archive contains all evaluated feature sets.
```{r}
as.data.table(instance$archive)
```
We fit a final model with the optimized feature set to make predictions on new data.
```{r}
task = tsk("spam")
learner = lrn("classif.svm", type = "C-classification")
task$select(instance$result_feature_set)
learner$train(task)
```
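The trained learner can then score new observations with `$predict_newdata()`; the rows below merely reuse part of the task data as a stand-in for genuinely new data:

```{r, eval = FALSE}
# stand-in for new observations with the selected feature columns
newdata = task$data(rows = 1:5)
predictions = learner$predict_newdata(newdata)
predictions
```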
## Citation
If you use **mlr3fselect** in your work, please cite the package:
> Becker M, Schratz P, Lang M, Bischl B, Zobolas J (2025). mlr3fselect: Feature Selection for 'mlr3'. R package version 1.4.0, https://github.com/mlr-org/mlr3fselect
The **ensemble feature selection** components (hEFS) are described in the following study:
> Zobolas, J., George, AM., López, A., Fischer, S., Becker, M., & Aittokallio, T. Prognostic biomarker discovery in pancreatic cancer through hybrid ensemble feature selection and multi-omics data. BioData Mining (2026). https://doi.org/10.1186/s13040-026-00546-0