Update v02_Evaluation_using_Trio.Rmd

littlecabiria · littlecabiria · commit ec25e7be0915 · 2026-04-18T23:48:25.000+10:00
diff --git a/vignettes/v02_Evaluation_using_Trio.Rmd b/vignettes/v02_Evaluation_using_Trio.Rmd
@@ -15,17 +15,24 @@ library(glmnet)
 
 # Import microbiome data
 
-The `Trio` object in `BenchHub` can take datasets provided by users. To demonstrate, its ability to take user-provided datasets, we'll be using a microbiome dataset called `Lubomski` obtained from the `PD16Sdata` package. The following code will import the `Lubomksi` data into R. `lubomski_microbiome_data.Rdata` contains two data objects: `x` and `lubomPD`. `x` is a 575 by 1192 matrix containing the abundance of 1192 microbial taxa for 575 samples. `lubom_pd` is a factor vector of binary patient classes for 575 samples where where `1` represents `PD` and `0` represents `HC`.
+The `Trio` object in `BenchHub` can take datasets provided by users. To demonstrate this workflow, we use the `lubomski_microbiome_data` example dataset distributed with `BenchHub`. We first load the example objects into a temporary environment to avoid placing `x` and `lubomPD` directly into the global workspace. The dataset contains two objects: `x`, a 575 by 1192 matrix of microbial abundances for 575 samples, and `lubomPD`, a binary factor indicating Parkinson's disease (`PD`) or healthy control (`HC`) status for each sample.
 
 ```{r}
-# import the microbiome data
-data("lubomski_microbiome_data", package = "BenchHub")
+# import the microbiome data into a temporary environment
+exampleEnv <- new.env(parent = emptyenv())
+data("lubomski_microbiome_data", envir = exampleEnv, package = "BenchHub")
+
+x <- exampleEnv[["x"]]
+lubomPD <- exampleEnv[["lubomPD"]]
 
 # check the dimension of the microbiome matrix
 dim(x)
 
 # check the length of the patient status
 length(lubomPD)
+
+# Add sample IDs so the evidence matches the dataset rows by name.
+names(lubomPD) <- rownames(x)
 ```
 
 The task we'll be evaluating is a binary classification task in which each sample is either a Parkinson's Disease (PD) patient or a Healthy Control (HC). Once the data are ready to be passed to `Trio`, we can load `Trio`.