tidyomics · May 21, 2024
diff --git a/vignettes/Session_1_sequencing_assays.Rmd b/vignettes/Session_1_sequencing_assays.Rmd (+17 -8)
diff --git a/vignettes/Session_2_Tidy_spatial_analyses.Rmd b/vignettes/Session_2_Tidy_spatial_analyses.Rmd (+273 -42)
diff --git a/vignettes/Solutions.Rmd b/vignettes/Solutions.Rmd (+137 -37)
@@ -463,7 +463,7 @@ reducedDimNames(spatial_data)
 reducedDim(spatial_data, "PCA")[1:5, 1:5]
 ```
 
-::: note
+::: {.note}
 As for single-cell data, we need to verify that there is no significant batch effect. If there is, we need to adjust for it (a.k.a. integration) before calculating principal components. Many adjustment methods output adjusted principal components directly. 
 :::
 
@@ -481,7 +481,7 @@ spatial_data <- scater::runUMAP(spatial_data, dimred = "PCA")
 scater::plotUMAP(spatial_data, colour_by = "sample_id", point_size = 0.2) 
 ```
 
-::: note
+::: {.note}
 **Exercise 1.1**
 
 Visualise where the two macro clusters are located spatially. We will take a very pragmatic approach: get cluster labels by splitting the UMAP coordinates in two (`colData()` and `reducedDim()` will help us, see above), and then visualise them with `ggspavis`.
@@ -631,7 +631,7 @@ spe_joint <- do.call(cbind, spatial_data_list)
 
 Here, we perform PCA using the BANKSY algorithm on the joint dataset. The group argument specifies how to treat different samples, ensuring that features are scaled separately per sample group to account for variations among them.
 
-::: note
+::: {.note}
 Note: this step takes a long time.
 :::
 
@@ -740,7 +740,7 @@ ggspavis::plotSpots(spatial_data, annotate = "spatialLIBD") +
   labs(title = "spatialLIBD regions")
 ```
 
-::: note
+::: {.note}
 **Exercise 1.2**
 
 We have applied cluster smoothing using `smoothLabels`. How much do you think this operation has affected the cluster labels? To find out,
@@ -955,7 +955,7 @@ plotCorrelationMatrix(res$mat)
 
 
 
-::: note
+::: {.note}
 **Exercise 1.3**
 
 Rather than looking at the overall correlation matrix, let's observe whether the correlation structure amongst cell types is consistent across samples. Do you think it's consistent or noticeably different?
@@ -967,7 +967,7 @@ Rather than looking at the correlation matrix, overall, let's observe whether th
 lapply(res_spatialLIBD, function(x) plotCorrelationMatrix(as.matrix(x[,-10]))) 
 ```
 
-::: note
+::: {.note}
 **Exercise 1.4**
 
 Now let's observe whether the correlation structure is consistent across spatial regions, irrespective of the sample of origin. Do you think they are consistent or noticeably different?
@@ -980,18 +980,27 @@ lapply(res_spatialLIBD, function(x) plotCorrelationMatrix(as.matrix(x[,-10])))
 
 ```
 
-::: note
+::: {.note}
 **Exercise 1.5**
 
-Some of the most positive correlations involve the end of cells with Oligodendrocytes and Leptomeningeal cells.
+Some of the strongest positive correlations are between endothelial cells and oligodendrocytes or leptomeningeal cells.
 
 Leptomeningeal cells are the cells that make up the leptomeninges, which consist of two of the three layers of the meninges surrounding the brain and spinal cord: the arachnoid mater and the pia mater. These layers play a critical role in protecting the central nervous system and assisting in various physiological processes.
 
 Oligodendrocytes are a type of glial cell in the central nervous system (CNS) of vertebrates, including humans and mice. These cells are crucial for the formation and maintenance of the myelin sheath, a fatty layer that encases the axons of many neurons.
 
 Let's try to visualise the pixels where these cell types co-occur most.
+
+
+- Label pixels that have > 10% (> 0.1) of both endothelial_cell and leptomeningeal_cell
+- Of these, keep pixels where the two cell types together make up > 40% (> 0.4)
+- Plot the pixels, colouring by the new label
+
 :::
 
+```{r}
+# Cell-type proportions per spot, as a data frame (used in Exercise 1.5)
+mat_df = as.data.frame(res$mat)
+```
 
 **Session Information**
 
 
@@ -44,6 +44,26 @@ style="width:100%; height:600px;"
 frameborder="0">
 </iframe>
 
+#### Installation
+
+Let's make sure we get the latest packages available on GitHub.
+
+```{r, eval=FALSE}
+# As of May 2024, the following packages should be installed from their GitHub repositories to use the latest features.
+# Run this even if you already have them installed (installing from GitHub with BiocManager uses the `remotes` package).
+BiocManager::install(c("lmweber/ggspavis", 
+                       "stemangiola/tidySummarizedExperiment", 
+                       "stemangiola/tidySingleCellExperiment", 
+                       "william-hutchison/tidySpatialExperiment", 
+                       "stemangiola/tidybulk", 
+                       "stemangiola/tidygate", 
+                       "stemangiola/CuratedAtlasQueryR"), 
+                     update = FALSE)
+```
+
+**Then please restart your R session** to make sure that the packages we load are the ones we have just installed. 
+
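+After restarting, you can check which versions are installed. The chunk below is a minimal sketch using base R's `packageVersion()`. It assumes that each package name matches its repository name in the command above. It only prints the installed versions and does not compare them with GitHub:
+
+```{r, eval=FALSE}
+# Package names, assumed to match the GitHub repositories installed above
+pkgs = c("ggspavis", "tidySummarizedExperiment", "tidySingleCellExperiment",
+         "tidySpatialExperiment", "tidybulk", "tidygate", "CuratedAtlasQueryR")
+
+# Named character vector: package name -> installed version
+sapply(pkgs, function(p) as.character(packageVersion(p)))
+```
+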
 Let's load the libraries needed for this session
 
 ```{r, message = FALSE}
@@ -87,6 +107,9 @@ spatial_data <-
 # Clear the reductions
 reducedDims(spatial_data) = NULL 
 
+# Make cell IDs unique across samples by appending the sample ID
+colnames(spatial_data) = paste0(colnames(spatial_data), colData(spatial_data)$sample_id)
+
 # Display the object
 spatial_data
 ```
@@ -120,6 +143,10 @@ options("restore_SpatialExperiment_show" = FALSE)
 spatial_data
 ```
 
+::: {.note}
+Note that **rows** in this context refers to rows of the abstraction, not **rows** of the `SpatialExperiment`, which correspond to genes. **tidySpatialExperiment** prioritizes cells as the units of observation in the abstraction, while the full dataset, including measurements of expression of all genes, is still available "in the background".
+:::
+
 #### Original behaviour is preserved
 
 The tidy representation behaves exactly as a native `SpatialExperiment`. It can be interacted with using [SpatialExperiment commands](https://www.bioconductor.org/packages/release/bioc/vignettes/SpatialExperiment/inst/doc/SpatialExperiment.html) 
@@ -133,10 +160,46 @@ assays(spatial_data)
 
 We can also interact with our object as we do with any tidyverse tibble. We can use `tidyverse` commands, such as `filter`, `select` and `mutate` to explore the `tidySpatialExperiment` object. Some examples are shown below and more can be seen at the `tidySpatialExperiment` [website](https://stemangiola.github.io/tidySpatialExperiment/articles/introduction.html#tidyverse-commands-1).
 
+#### Select
+
+We can use `select` to view columns, for example, to see the cell and sample identifiers, whether each spot is in the tissue, and the spatialLIBD region annotation. 
+
+If we use `select`, we will also get any view-only columns returned, such as the spatial coordinates.
+
+```{r}
+spatial_data |> select(.cell, sample_id, in_tissue, spatialLIBD)
+```
+
+::: {.note}
+Note that some columns are always displayed, whatever we select. These columns include special slots in the object, such as reduced dimensions, spatial coordinates (mandatory for `SpatialExperiment`) and the sample identifier (mandatory for `SpatialExperiment`). 
+:::
+
+Although `select` can be used as a display tool to explore our object, it also updates the `SpatialExperiment` metadata, keeping only the selected columns.
+
+```{r}
+spatial_data |> 
+  select(.cell, sample_id, in_tissue, spatialLIBD) |> 
+  colData()
+```
+
+To select columns of interest, we can use the powerful `tidyverse` pattern-matching tools. For example, we can use the helper `contains()` to select all columns whose names contain "sum".
+
+```{r}
+
+spatial_data |> 
+  select(.cell, contains("sum")) 
+```
+
+
 #### Filter
 
-We can use `filter` to choose rows, for example, to select our three samples we are going to work with.
+We can use `filter` to subset rows, for example to keep the three samples we are going to work with.
+
+First, let's display the number of spots (columns of the `SpatialExperiment`) before filtering.
 
+```{r}
+ncol(spatial_data)
+```
 
 ```{r}
 spatial_data = 
@@ -146,6 +209,12 @@ spatial_data =
 spatial_data
 ```
 
+Here we confirm that the tidy manipulation has changed the underlying object.
+
+```{r}
+ncol(spatial_data)
+```
+
 In comparison, the base-R method has to repeat the variable name multiple times
 
 ```{r, eval=FALSE}
@@ -155,35 +224,69 @@ spatial_data = spatial_data[,spatial_data$sample_id %in% c("151673", "151675", "
 Or, for example, to see just the spots of sample 151673 that spatialLIBD annotated as layer L1.
 
 ```{r}
-spatial_data |> dplyr::filter(spatialLIBD == "L1")
+spatial_data |> dplyr::filter(sample_id == "151673", spatialLIBD == "L1")
 ```
 
-:::: {.note}
-Note that **rows** in this context refers to rows of the abstraction, not **rows** of the SpatialExperiment which correspond to genes **tidySpatialExperiment** prioritizes cells as the units of observation in the abstraction, while the full dataset, including measurements of expression of all genes, is still available "in the background".
-::::
+We can write more flexible and powerful filters with `stringr`, for example by matching subject identifiers with a regular expression.
 
-#### Select
+```{r}
 
-We can use `select` to view columns, for example, to see the filename, total cellular RNA abundance and cell phase. 
+spatial_data |> 
+  dplyr::filter(
+    subject |> str_detect("Br[0-9]1"), 
+    spatialLIBD == "L1"
+  )
 
-If we use `select` we will also get any view-only columns returned, such as the UMAP columns generated during the preprocessing.
+```
+
+#### Summarise
+
+Integrating all spot/pixel/cell-related information into one table abstraction speeds up data exploration and analysis. For example, we can count the spots with fewer than 1000 UMIs in each sample.
 
 ```{r}
-spatial_data |> select(.cell, sample_id, in_tissue, spatialLIBD)
+
+spatial_data |> 
+  filter(sum_umi < 1000) |> 
+  count(sample_id)
+
 ```
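+
+Grouped summaries follow the same pattern. The chunk below is a minimal sketch. It assumes that `group_by()` and `summarise()` behave on the tidy abstraction as they do in `dplyr`, returning a plain tibble rather than a `SpatialExperiment`, and it reuses the `sum_umi` column from the chunk above:
+
+```{r}
+spatial_data |> 
+  # One group per sample
+  group_by(sample_id) |> 
+  # Number of spots and median total UMI count per sample (assumes the `sum_umi` column, as above)
+  summarise(n_spots = n(), median_umi = median(sum_umi))
+```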
 
 #### Mutate
 
 We can use `mutate` to create a column. For example, we could create a new `spatialLIBD_lower` column that contains a lower-case version of `spatialLIBD`. 
 
-In this case, three columns that are view only (`sample_id`, `pxl_col_in_fullres`, `pxl_row_in_fullres`, `PC*`) will be always included in the tidy representation because they cannot be omitted from the data container (is opposed to metadata)
+::: {.note}
+Note that the special columns `sample_id`, `pxl_col_in_fullres`, `pxl_row_in_fullres`, `PC*` are view only and cannot be mutated.
+:::
 
 ```{r message=FALSE}
 spatial_data |>
   mutate(spatialLIBD_lower = tolower(spatialLIBD)) |>
   select(.cell, spatialLIBD, spatialLIBD_lower)
 ```
 
+We can update the underlying `SpatialExperiment` object for future analyses, and confirm that its metadata has been mutated.
+
+```{r message=FALSE}
+spatial_data = 
+  spatial_data |>
+  mutate(spatialLIBD_lower = tolower(spatialLIBD))
+
+spatial_data |> 
+  colData() |>
+  _[,c("spatialLIBD", "spatialLIBD_lower")]
+```
+
+We can also mutate columns for on-the-fly analyses and exploration. Let's suppose one column has inconsistent capitalisation, and we want a single filter that works regardless of case.
+
+```{r message=FALSE}
+spatial_data |>
+  mutate(spatialLIBD = tolower(spatialLIBD)) |>
+  filter(spatialLIBD == "wm")
+```
+
+#### Extract
+
 We can use tidyverse commands to polish an annotation column. We will extract the sample and group information from the file name column into separate columns. 
 
 ```{r message=FALSE}
@@ -196,8 +299,6 @@ spatial_data = spatial_data  |> mutate(file_path = glue("../data/single_cell/{sa
 spatial_data |> select(.cell, file_path)
 ```
 
-#### Extract
-
 Extract specific identifiers from complex data paths, simplifying the dataset by isolating crucial metadata. This process allows for clearer identification of samples based on their file paths, improving data organization.
 
 ```{r}
@@ -222,7 +323,6 @@ spatial_data <- spatial_data |> unite("sample_subject", sample_id, subject, remo
 spatial_data |> select(.cell, sample_id, sample_subject, subject)
 ```
 
-
 ### 3. Advanced filtering/gating
 
 `tidySpatialExperiment` provides an interactive, advanced tool for gating regions of interest for streamlined exploratory analyses.
@@ -233,67 +333,172 @@ Let's draw an arbitrary gate interactively
 
 ```{r, eval=FALSE}
 
-spatial_data = 
+spatial_data_gated = 
   spatial_data |> 
   
   # Filter one sample
   filter(in_tissue, sample_id=="151673") |> 
   
   # Gate based on tissue morphology
-  tidySpatialExperiment::gate_spatial(opacity = 0.2, image_index = 1) 
+  tidySpatialExperiment::gate_spatial(alpha = 0.1) 
 ```
 
 `tidySpatialExperiment` records the gate in a new `.gate` column, which we can see in the tibble abstraction.
 
 ```{r, eval=FALSE}
 
-spatial_data |>  select(.cell, .gate)
+spatial_data_gated |>  select(.cell, .gate)
 ```
 
 We can count how many pixels we selected with simple `tidyverse` grammar
 
 ```{r, eval=FALSE}
-spatial_data |> count(.gate)
+spatial_data_gated |> count(.gate)
 ```
 
-We can visualise the gating 
-
+For visual feedback on our selection, we can plot the slide, annotating it by our newly created column.
 
 ```{r, eval=FALSE, fig.width=7, fig.height=8}
-spatial_data |> 
-  
-  # Plot our gate
-  ggspavis::plotSpots(annotate = ".gate") +
-    scale_color_manual(values = libd_layer_colors |> str_remove("ayer")) +
-  labs(title = ".gate regions")
-  
+spatial_data_gated |> 
+  ggspavis::plotVisium(annotate = ".gate")
 ```
 
+
 ```{r, echo=FALSE, out.width="300px"}
 knitr::include_graphics(here("inst/images/tidySPE_gate.png"))
 ```
 
-And filter, for further analyses
+We can also filter the gated pixels for further analyses.
 
 ```{r, eval=FALSE}
-spatial_data |> 
+spatial_data_gated |> 
   filter(.gate)
 ```
 
-#### Summarisation/aggregation
+::: {.note}
+**Exercise 2.1**
+
+Roughly gate the white matter layer of the tissue (bottom-left) and visualise where the gated pixels fall in UMAP space.
+
+- Calculate UMAPs as we did in Session 1
+- Plot the UMAP dimensions, colouring by the gate
+:::
+
+### 4. Work with features
+
+By default, `tidySpatialExperiment` (as well as `tidySingleCellExperiment`) focuses its tidy abstraction on pixels and cells, as these are the key analysis and visualisation units in spatial and single-cell data. This has proven to be a practical solution for elegant `tidy` analyses and visualisation.
+
+In contrast, bulk data analysis focuses on features/genes. Accordingly, its tidy representation with `tidySummarizedExperiment` prioritises features, exposing them to the user.
+
+If you want to interact with features, the method `join_features` will be helpful. For example, we can add one or more features of interest to our abstraction.
+
+Let's add the astrocyte marker GFAP.
+
+First, we look up its Ensembl ID.
+
+```{r}
+rowData(spatial_data) |> 
+  as_tibble() |> 
+  filter( gene_name == "GFAP")
+```
+
+Then, we join the feature to the metadata.
+
+```{r}
+spatial_data = 
+  spatial_data |> 
+  join_features("ENSG00000131095", shape="wide")
+
+spatial_data |> 
+  select(.cell, ENSG00000131095)
+
+```
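+
+We can then look at where GFAP is expressed on the tissue. `ggspavis::plotSpots()` accepts any `colData` column in `annotate`, including continuous values, so the joined column can be plotted directly. This minimal sketch assumes that the wide-shape join above stored the abundance in a column named `ENSG00000131095`, as the `select` output suggests:
+
+```{r}
+spatial_data |> 
+  # Colour spots by the GFAP abundance joined above (column name assumed from the join)
+  ggspavis::plotSpots(annotate = "ENSG00000131095") +
+  facet_wrap(~sample_id)
+```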
+
+
+::: {.note}
+**Exercise 2.2**
+
+Join the endothelial marker PECAM1 (CD31; look up its Ensembl ID), and plot in space the pixels above the 0.75 quantile of PECAM1 expression. Are the PECAM1-positive (putatively endothelial) pixels spatially clustered?
+
+- Get the Ensembl ID
+- Join the feature to the tidy data abstraction
+- Calculate the 0.75 quantile across all pixels
+- Label the pixels with high PECAM1
+- Plot the slide, colouring by the new label 
+:::
+
+
+### 5. Summarisation/aggregation
+
+#### Distinct
+
+We can quickly explore the elements of a variable with distinct
+
+```{r}
+spatial_data |> 
+  distinct(sample_id)
+```
+
+We can also use `distinct` across multiple variables.
+
+```{r}
+spatial_data |> 
+  distinct(sample_id, Cluster)
+```
 
-The gated cells can then be divided into pseudobulks within a SummarizedExperiment object using tidySpatialExperiment’s aggregate_cells utility function.
+#### Count
 
-```{r , eval=FALSE}
+We can gather more information by counting the instances of a variable.
+
+```{r}
+spatial_data |> 
+  count(Cluster) |> 
+  arrange(desc(n))
+```
+
+We can also calculate summary statistics on a subset of the data, for example the number of spots in cluster 1 for each sample.
+
+```{r}
+spatial_data |> 
+  filter(Cluster == 1) |> 
+  count(sample_id) |> 
+  arrange(desc(n))
+
+```
+
+#### Aggregate
+
+For summarised analyses, we can aggregate pixels/cells as pseudobulk with the function `aggregate_cells`, which also works for `SingleCellExperiment`. We obtain a `SummarizedExperiment`. 
+
+```{r}
 spe_regions_aggregated <-
-  spe_regions |>
-  aggregate_cells(c(sample_id, region))
+  spatial_data |>
+  aggregate_cells(c(sample_id, spatialLIBD))
 
 spe_regions_aggregated
 ```
 
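+`tidyomics` lets us move across spatial, single-cell (Bioconductor and Seurat) and bulk data while keeping a consistent interface. Loading `tidySummarizedExperiment` gives the aggregated `SummarizedExperiment` its tidy representation.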
+
+```{r}
+library(tidySummarizedExperiment)
+
+spe_regions_aggregated
+
+```
 
-### 4. tidyfying your workflow
+We can then apply the familiar `tidyverse` operations.
+
+```{r}
+spe_regions_aggregated |> 
+  filter(sample_id == "151673")
+```
+
+### 6. tidyfying your workflow
 
 We will take the workflow used in **Session 2**, performed mostly with base R syntax, and convert it to tidy R syntax. We will show you how the readability and modularity of your workflow will improve. 
 
@@ -321,11 +526,11 @@ The `tidyverse` approach inherently supports chaining further operations without
 
 #### Manipulating feature information
 
-:::: {.note}
+::: {.note}
 For `SingleCellExperiment` there is no tidy API for manipulating feature-wise data yet, unlike `SummarizedExperiment`, whose gene-centric abstraction allows direct manipulation of gene information. Currently, `tidySingleCellExperiment` and `tidySpatialExperiment` do not prioritize the manipulation of features (genes). 
 
 While these functions can employ genes for cell manipulation and visualisation, as demonstrated in `join_features()`, they lack tools for altering feature-related information. Instead, their primary focus is on cell information, which serves as the main observational unit in single-cell data. This contrasts with bulk RNA sequencing data, where features are more central.
-::::
+:::
 
 A tidy feature-manipulation API for `SingleCellExperiment` is among our plans. See [tidyomics challenges](https://github.com/orgs/tidyomics/projects/1)
 
@@ -495,7 +700,7 @@ spatial_data_filtered =
 **Maintainability:** Fewer and self-explanatory lines of code and no need for intermediate steps make the code easier to maintain and modify, especially when conditions change or additional filters are needed.
 
 
-### 5. Visualisation
+### 7. Visualisation
 
 Here, we will show how to use ad-hoc spatial visualisation, as well as `ggplot`, to explore spatial data. We will show how `tidySpatialExperiment` allows us to alternate between tidyverse visualisation and any visualisation compatible with `SpatialExperiment`. 
 
@@ -506,15 +711,14 @@ Let’s visualise the regions that spatialLIBD labelled across three Visium 10X
 ```{r, fig.width=7, fig.height=8}
 spatial_data_filtered |> 
   ggspavis::plotSpots(annotate = "spatialLIBD") +
+  facet_wrap(~sample_id) +
     scale_color_manual(values = libd_layer_colors |> str_remove("ayer")) +
   theme(legend.position = "none") +
   labs(title = "spatialLIBD regions")
 ```
 
 #### Custom visualisation: Plotting the regions
 
-
-
 ```{r, fig.width=7, fig.height=8}
 spatial_data |> 
     ggplot(aes(array_row, array_col)) +
@@ -589,16 +793,43 @@ spatial_data_filtered |>
 
 As you can appreciate, the relationship between the number of genes probed per cell and their mitochondrial transcript abundance is quite consistent.
 
-:::: {.note}
-**Excercise 2.1**
+::: {.note}
+**Exercise 2.3**
 
 To practise the use of `tidyomics` on spatial data, we propose a few exercises that connect manipulation, calculations and visualisation. These exercises are simple use cases that exploit the streamlined language of tidy R.
 
 
 We assume that the cells we filtered out are dead or damaged, characterised by being enriched for mitochondrial genes and genes linked to apoptosis. It is good practice to check this assumption. This exercise aims to estimate which genes are differentially expressed between filtered and unfiltered cells, and then to visualise the results.
 
 Use `tidyomics`/`tidyverse` tools to label dead cells and perform differential expression within each region. Some of the commands you can use are: `mutate`, `nest`, `aggregate_cells`.
-::::
+:::
+
+::: {.note}
+**Exercise 2.4**
+
+Inspired by our audience, let's try to use `tidyomics` to identify potential amyloid plaques.
+
+Amyloid plaques are extracellular deposits primarily composed of aggregated amyloid-beta (Aβ) peptides. They are a hallmark of Alzheimer's disease (AD) and are also found in certain other neurodegenerative conditions.
+
+Amyloid plaques can be found in the brains of mice, particularly in transgenic mouse models that are engineered to develop Alzheimer's disease-like pathology. 
+
+Although amyloid plaques themselves are extracellular, their presence and formation are associated with specific gene expression changes in the surrounding and involved cells. These gene markers are indicative of the processes that contribute to amyloid plaque formation, as well as of the cellular response to these plaques ([Ranman et al., 2021](https://molecularneurodegeneration.biomedcentral.com/articles/10.1186/s13024-021-00465-0)).
+
+```{r}
+marker_genes_of_amyloid_plaques = c("APP", "PSEN1", "PSEN2", "CLU", "APOE", "CD68", "ITGAM", "AIF1")
+
+# Use gene symbols as feature names, so that features can be joined by symbol
+rownames(spatial_data) = rowData(spatial_data)$gene_name
+```
+
+The exercise includes:
+
+- Joining the features
+- Rescaling
+- Summarising the signature (sum) with `mutate()`
+- Plotting, colouring by the signature
+:::
 
 
 **Session Information**
 
@@ -16,7 +16,7 @@ knitr::opts_chunk$set(echo = TRUE, cache = FALSE)
 ```
 
 
-::: note
+::: {.note}
 **Exercise 1.1**
 :::
 
@@ -35,7 +35,7 @@ ggspavis::plotVisium(
 ```
 
 
-::: note
+::: {.note}
 **Exercise 1.2**
 :::
 
@@ -63,9 +63,123 @@ plotSpotQC(
   facet_wrap(~sample_id)
 ```
 
-:::: {.note}
-**Excercise 2.1**
-::::
+
+::: {.note}
+**Exercise 1.3**
+:::
+
+
+```{r, fig.width=7, fig.height=8}
+
+res_spatialLIBD = split(data.frame(res$mat), colData(spatial_data_gene_name)$sample_id ) 
+
+lapply(res_spatialLIBD, function(x) plotCorrelationMatrix(as.matrix(x[,-10]))) 
+
+```
+
+::: {.note}
+**Exercise 1.4**
+:::
+
+
+```{r, fig.width=7, fig.height=8}
+
+res_spatialLIBD = split(data.frame(res$mat), colData(spatial_data_gene_name)$spatialLIBD ) 
+
+lapply(res_spatialLIBD, function(x) plotCorrelationMatrix(as.matrix(x[,-10]))) 
+```
+
+::: {.note}
+**Exercise 1.5**
+:::
+
+
+```{r, fig.width=7, fig.height=8}
+# Label spots with > 10% of each cell type and > 40% of the two combined
+is_endothelial_leptomeningeal = 
+  mat_df$endothelial_cell > 0.1 & 
+  mat_df$leptomeningeal_cell > 0.1 & 
+  mat_df$endothelial_cell + mat_df$leptomeningeal_cell > 0.4
+
+spatial_data$is_endothelial_leptomeningeal = is_endothelial_leptomeningeal
+
+ggspavis::plotSpots(spatial_data, annotate = "is_endothelial_leptomeningeal") +
+  facet_wrap(~sample_id) +
+  scale_color_manual(values = c("TRUE" = "red", "FALSE" = "grey")) +
+  theme(legend.position = "none") +
+  labs(title = "endothelial + leptomeningeal")
+
+# Same labelling for endothelial cells and oligodendrocytes (5% threshold for oligodendrocytes)
+is_endothelial_oligodendrocytes = 
+  mat_df$endothelial_cell > 0.1 & 
+  mat_df$oligodendrocyte > 0.05 & 
+  mat_df$endothelial_cell + mat_df$oligodendrocyte > 0.4
+
+spatial_data$is_endothelial_oligodendrocyte = is_endothelial_oligodendrocytes
+
+ggspavis::plotSpots(spatial_data, annotate = "is_endothelial_oligodendrocyte") +
+  facet_wrap(~sample_id) +
+  scale_color_manual(values = c("TRUE" = "blue", "FALSE" = "grey")) +
+  theme(legend.position = "none") +
+  labs(title = "endothelial + oligodendrocyte")
+```
+
+
+
+::: {.note}
+**Exercise 2.1**
+:::
+
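+```{r, eval=FALSE}
+# Requires `spatial_data_gated` from the interactive gating step, which adds the `.gate` column
+spatial_data_gated = scuttle::logNormCounts(spatial_data_gated)
+
+# Get top variable genes, excluding ribosomal and mitochondrial genes
+genes <- !grepl(pattern = "^Rp[l|s]|Mt", x = rownames(spatial_data_gated))
+dec = scran::modelGeneVar(spatial_data_gated, subset.row = genes)
+hvg = scran::getTopHVGs(dec, n = 1000)
+
+spatial_data_gated <- 
+  spatial_data_gated |> 
+
+  # Calculate PCA on the top variable genes
+  scater::runPCA(subset_row = hvg) |> 
+
+  # Calculate UMAP
+  scater::runUMAP(dimred = "PCA")
+
+# Plot UMAP, colouring by the manual gate
+scater::plotUMAP(spatial_data_gated, colour_by = ".gate")
+```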
+
+
+::: {.note}
+**Exercise 2.2**
+:::
+
+
+```{r}
+rowData(spatial_data) |> 
+  as_tibble() |> 
+  filter( gene_name == "PECAM1")
+
+spatial_data |> 
+
+  # Join the feature
+  join_features("ENSG00000261371", shape="wide") |> 
+
+  # Calculate the quantile
+  mutate(my_quantile = quantile(ENSG00000261371, 0.75)) |> 
+
+  # Label the pixels
+  mutate(PECAM1_positive = ENSG00000261371 > my_quantile) |> 
+
+  # Plot
+  ggspavis::plotSpots(annotate = "PECAM1_positive") +
+  facet_wrap(~sample_id) 
+
+```
+
+
+::: {.note}
+**Exercise 2.3**
+:::
 
 ```{r}
 library(tidySummarizedExperiment)
@@ -102,45 +216,31 @@ differential_analysis |>
 ```
 
 
+::: {.note}
+**Exercise 2.4**
+:::
 
+```{r}
+rownames(spatial_data) = rowData(spatial_data)$gene_name
 
-```{r, fig.width=7, fig.height=8}
-
-res_spatialLIBD = split(data.frame(res$mat), colData(spatial_data_gene_name)$sample_id ) 
-
-lapply(res_spatialLIBD, function(x) plotCorrelationMatrix(as.matrix(x[,-10]))) 
-
-```
-
-
-```{r, fig.width=7, fig.height=8}
-
-res_spatialLIBD = split(data.frame(res$mat), colData(spatial_data_gene_name)$spatialLIBD ) 
-
-lapply(res_spatialLIBD, function(x) plotCorrelationMatrix(as.matrix(x[,-10]))) 
-```
-
+marker_genes_of_amyloid_plaques = c("APP", "PSEN1", "PSEN2", "CLU", "APOE", "CD68", "ITGAM", "AIF1")
 
-```{r, fig.width=7, fig.height=8}
+spatial_data |> 
 
-mat_df = as.data.frame(res$mat)
+  # Join the features
+  join_features(marker_genes_of_amyloid_plaques, shape = "wide") |> 
 
-is_endothelial_leptomeningeal = mat_df$endothelial_cell >0.1 & mat_df$leptomeningeal_cell>0.1 & mat_df$endothelial_cell  + mat_df$leptomeningeal_cell > 0.4 
-is_endothelial_oligodendrocytes = mat_df$endothelial_cell >0.1 & mat_df$oligodendrocyte>0.05 & mat_df$endothelial_cell  + mat_df$oligodendrocyte > 0.4 
+  # Rescaling
+  mutate(across(any_of(marker_genes_of_amyloid_plaques), scales::rescale)) |> 
 
-spatial_data$is_endothelial_leptomeningeal = is_endothelial_leptomeningeal
-spatial_data$is_endothelial_oligodendrocyte = is_endothelial_oligodendrocytes
+  # Summarising signature
+  mutate(amyloid_plaques_signature = APP + PSEN1 + PSEN2 + CLU + APOE + CD68 + ITGAM + AIF1) |> 
 
-ggspavis::plotSpots(spatial_data, annotate = "is_endothelial_leptomeningeal") +
-    facet_wrap(~sample_id) +
-  scale_color_manual(values = c("TRUE"= "red", "FALSE" = "grey"))
-theme(legend.position = "none") +
-  labs(title = "endothelial + leptomeningeal")
+  # Plotting
+  ggspavis::plotSpots(
+    annotate = "amyloid_plaques_signature"
+  ) + 
+  facet_wrap(~sample_id)
 
-ggspavis::plotSpots(spatial_data, annotate = "is_endothelial_oligodendrocyte") +
-    facet_wrap(~sample_id) +
-  scale_color_manual(values = c("TRUE"= "blue", "FALSE" = "grey"))
-theme(legend.position = "none") +
-  labs(title = "endothelial + oligodendrocyte")
 
 ```