reducedDim(spatial_data, "PCA")[1:5, 1:5]
```

::: {.note}
As for single-cell data, we need to verify that there is no significant batch effect. If there is, we need to adjust for it (a.k.a. integration) before calculating principal components. Many adjustment methods output adjusted principal components directly.

Visualise where the two macro clusters are located spatially. We will take a very pragmatic approach and get cluster labels by splitting the UMAP coordinates in two (`colData()` and `reducedDim()` will help us, see above), and then visualise them with `ggspavis`.

Here, we perform PCA using the BANKSY algorithm on the joint dataset. The `group` argument specifies how to treat different samples, ensuring that features are scaled separately per sample group to account for variations among them.

Rather than looking at the correlation matrix as a whole, let's observe whether the correlation structure amongst cell types is consistent across samples. Do you think it is consistent or noticeably different?

Now let's observe whether the correlation structure is consistent across spatial regions, irrespective of the sample of origin. Do you think they are consistent or noticeably different?

Some of the most positive correlations involve endothelial cells with Oligodendrocytes and Leptomeningeal cells.

Leptomeningeal cells are the cells that make up the leptomeninges, which consist of two of the three layers of meninges surrounding the brain and spinal cord: the arachnoid mater and the pia mater. These layers play a critical role in protecting the central nervous system and assisting in various physiological processes.

Oligodendrocytes are a type of glial cell in the central nervous system (CNS) of vertebrates, including humans and mice. These cells are crucial for the formation and maintenance of the myelin sheath, a fatty layer that encases the axons of many neurons.

Let's try to visualise the pixels where these cell types occur most.

- Label pixels that have > 10% (> 0.1) endothelial_cell and leptomeningeal_cell
- Label pixels that have > 40% (> 0.4) across these two cell types
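The two labelling rules can be sketched in base R as follows. This assumes the deconvolution output is a per-pixel table of cell-type proportions with columns named `endothelial_cell` and `leptomeningeal_cell`. The values are hypothetical toy values, not the workshop data. The first rule is read as "both proportions above 0.1", which is an interpretation.

```r
# Toy per-pixel cell-type proportions (hypothetical values, not the workshop data)
proportions <- data.frame(
  pixel               = c("p1", "p2", "p3", "p4"),
  endothelial_cell    = c(0.15, 0.05, 0.30, 0.12),
  leptomeningeal_cell = c(0.20, 0.40, 0.25, 0.11)
)

# Rule 1: each of the two cell types above 10% in the same pixel
proportions$both_above_10 <-
  proportions$endothelial_cell > 0.1 &
  proportions$leptomeningeal_cell > 0.1

# Rule 2: the two cell types together above 40%
proportions$combined_above_40 <-
  (proportions$endothelial_cell + proportions$leptomeningeal_cell) > 0.4

proportions
```

With `tidySpatialExperiment`, the same two expressions would go inside a `mutate()` call on the tidy abstraction. The resulting logical columns can then be used to colour the slide.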

Let's make sure we get the latest packages available on GitHub.

```{r, eval=FALSE}
# As of May 2024, the following packages should be installed from their GitHub
# repositories to use the latest features. Even if you have them pre-installed,
# run the following command.
BiocManager::install(c("lmweber/ggspavis",
                       "stemangiola/tidySummarizedExperiment",
                       "stemangiola/tidySingleCellExperiment",
                       "william-hutchison/tidySpatialExperiment",
                       "stemangiola/tidybulk",
                       "stemangiola/tidygate",
                       "stemangiola/CuratedAtlasQueryR"),
                     update = FALSE)
```

**Then please restart your R session** to make sure the packages we load are the ones we most recently installed.

Note that **rows** in this context refer to rows of the abstraction, not **rows** of the SpatialExperiment, which correspond to genes. **tidySpatialExperiment** prioritizes cells as the units of observation in the abstraction, while the full dataset, including measurements of expression of all genes, is still available "in the background".
:::

#### Original behaviour is preserved

The tidy representation behaves exactly like a native `SpatialExperiment`. It can be interacted with using [SpatialExperiment commands](https://www.bioconductor.org/packages/release/bioc/vignettes/SpatialExperiment/inst/doc/SpatialExperiment.html).

We can also interact with our object as we do with any tidyverse tibble. We can use `tidyverse` commands, such as `filter`, `select` and `mutate`, to explore the `tidySpatialExperiment` object. Some examples are shown below and more can be seen at the `tidySpatialExperiment` [website](https://stemangiola.github.io/tidySpatialExperiment/articles/introduction.html#tidyverse-commands-1).

#### Select

We can use `select` to view columns, for example, to see the filename, total cellular RNA abundance and cell phase.

If we use `select`, we will also get any view-only columns returned, such as the UMAP columns generated during the preprocessing.

Note that some columns are always displayed, no matter what. These columns include special slots in the object, such as reduced dimensions, spatial coordinates (mandatory for `SpatialExperiment`), and the sample identifier (mandatory for `SpatialExperiment`).
:::

Although `select` can be used as a display tool to explore our object, it also updates the `SpatialExperiment` metadata, subsetting the desired columns.

Flexible, more powerful filters with `stringr`

```{r}
spatial_data |>
  dplyr::filter(
    # keep subjects whose ID contains "Br", one digit, then "1"
    subject |> str_detect("Br[0-9]1"),
    # and only spots annotated as layer L1
    spatialLIBD == "L1"
  )
```
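The check below shows which IDs the pattern `"Br[0-9]1"` actually selects. It uses base R `grepl()`, which for a simple pattern like this performs the same regular-expression match as `str_detect()`. The subject IDs are hypothetical examples, not necessarily those stored in `spatial_data`.

```r
# Hypothetical subject identifiers
subjects <- c("Br5292", "Br5595", "Br8100", "Br3011")

# "Br", then any single digit, then "1"; the pattern is unanchored,
# so it can match anywhere inside the ID
matches <- grepl("Br[0-9]1", subjects)

subjects[matches]
```

Because the pattern is not anchored, `"Br8100"` matches through its `"Br81"` prefix. Adding `^` or `$` would make the filter stricter if needed.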

#### Summarise

The integration of all spot/pixel/cell-related information in one table abstraction is very powerful for speeding up data exploration and analysis.

We can use `mutate` to create a column. For example, we could create a new `Phase_l` column that contains a lower-case version of `Phase`.

::: {.note}
Note that the special columns `sample_id`, `pxl_col_in_fullres`, `pxl_row_in_fullres`, `PC*` are view only and cannot be mutated.

We can update the underlying `SpatialExperiment` object for future analyses, and confirm that its metadata has been mutated.

```{r message=FALSE}
spatial_data =
  spatial_data |>
  mutate(spatialLIBD_lower = tolower(spatialLIBD))

# Extracting with the `_` placeholder after `|>` requires R >= 4.3
spatial_data |>
  colData() |>
  _[, c("spatialLIBD", "spatialLIBD_lower")]
```

We can mutate columns for on-the-fly analyses and exploration. Let's suppose one column has capitalisation inconsistencies, and we want to filter on it regardless of case.

```{r message=FALSE}
spatial_data |>
  mutate(spatialLIBD = tolower(spatialLIBD)) |>
  filter(spatialLIBD == "wm")
```
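The effect of normalising case before filtering is easy to see on a plain character vector. The values here are hypothetical and mimic a `spatialLIBD` column with inconsistent capitalisation.

```r
# Hypothetical layer labels with inconsistent capitalisation
spatialLIBD <- c("WM", "wm", "L1", "Wm", "L2")

# An exact comparison misses the upper- and mixed-case variants
exact <- spatialLIBD == "wm"

# Lower-casing first makes the comparison case-insensitive
normalised <- tolower(spatialLIBD) == "wm"

sum(exact)       # 1
sum(normalised)  # 3
```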

#### Extract

We can use tidyverse commands to polish an annotation column. We will extract the sample and group information from the file name column into separate columns.

Extracting specific identifiers from complex data paths simplifies the dataset by isolating crucial metadata. This allows for clearer identification of samples based on their file paths, improving data organization.
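One way to express this kind of extraction is a regular expression with capture groups. The base R sketch below applies `sub()` to hypothetical file paths. The real naming scheme of the workshop files may differ, so treat both the paths and the pattern as assumptions.

```r
# Hypothetical file paths; the real directory layout may differ
file_path <- c(
  "data/groupA/sample_151673/filtered_feature_bc_matrix.h5",
  "data/groupB/sample_151507/filtered_feature_bc_matrix.h5"
)

# Capture groups: \\1 = group name, \\2 = numeric sample identifier
pattern <- ".*/(group[A-Z])/sample_([0-9]+)/.*"

group     <- sub(pattern, "\\1", file_path)
sample_id <- sub(pattern, "\\2", file_path)

data.frame(file_path, group, sample_id)
```

In a tidy pipeline the same capture-group pattern could be passed to a column-splitting verb such as `tidyr::extract()`, so the new columns are added in one step.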

Gate roughly the white matter layer of the tissue (bottom-left) and visualise in UMAP reduced dimensions where this manual gate is distributed.

- Calculate UMAPs as we did for Session 1
- Plot UMAP dimensions according to the gating
:::

### 4. Work with features

By default, `tidySpatialExperiment` (as well as `tidySingleCellExperiment`) focuses its tidy abstraction on pixels and cells, as these are the key analysis and visualisation units in spatial and single-cell data. This has proven to be a practical solution to achieve elegant `tidy` analyses and visualisation.

In contrast, bulk analyses focus on features/genes. Accordingly, the tidy representation of bulk data with `tidySummarizedExperiment` prioritises features, exposing them to the user.

If you want to interact with features, the method `join_features` will be helpful. For example, we can add one or more features of interest to our abstraction.

Let's add the astrocyte marker GFAP.

Find out the Ensembl ID:

```{r}
rowData(spatial_data) |>
  as_tibble() |>
  filter(gene_name == "GFAP")
```

Join the feature to the metadata:

```{r}
spatial_data =
  spatial_data |>
  join_features("ENSG00000131095", shape = "wide")

spatial_data |>
  select(.cell, ENSG00000131095)
```
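Conceptually, a wide join adds one column per requested feature to the cell-level table. Each cell's value is looked up in the expression matrix. The base R sketch below mimics that idea on a hypothetical toy matrix. It is not how `join_features()` is implemented, and it ignores details such as which assay the values come from.

```r
# Hypothetical expression matrix: features (rows) x pixels/cells (columns)
expression <- matrix(
  c(0, 3, 1,
    5, 0, 2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ENSG00000131095", "other_gene"),
                  c("cell_1", "cell_2", "cell_3"))
)

# Hypothetical cell-level table (one row per pixel/cell)
cells <- data.frame(
  .cell     = c("cell_1", "cell_2", "cell_3"),
  sample_id = c("151673", "151673", "151507")
)

# "Wide" join: one new column per requested feature, matched by cell name
feature <- "ENSG00000131095"
cells[[feature]] <- unname(expression[feature, cells$.cell])

cells
```

A long-shaped join would instead repeat each cell once per feature, with feature name and abundance in two columns. That layout is handier when plotting several genes together.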

::: {.note}
**Exercise 2.2**
Join the endothelial marker PECAM1 (CD31; look up its Ensembl ID), and plot in space the pixels that are above the 0.75 quantile of PECAM1 expression. Are the PECAM1-positive pixels (endothelial?) spatially clustered?

- Get the Ensembl ID
- Join the feature to the tidy data abstraction
- Calculate the 0.75 quantile across all pixels
- Label the pixels with high PECAM1
- Plot the slide, colouring by the new label
:::
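The thresholding step of the exercise amounts to computing a quantile and comparing each pixel against it. Below is a base R sketch on a hypothetical expression vector. In the exercise, the values would come from the joined PECAM1 column instead.

```r
# Hypothetical PECAM1 expression for eight pixels
pecam1 <- c(0, 1, 1, 2, 3, 5, 8, 13)

# 0.75 quantile across all pixels (R's default type 7 interpolation)
threshold <- quantile(pecam1, probs = 0.75)

# Label pixels strictly above the threshold
pecam1_label <- ifelse(pecam1 > threshold, "high", "low")

table(pecam1_label)
```

With the default interpolation, the threshold falls between the 6th and 7th sorted values (5.75 here), so the last two pixels are labelled `"high"`.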

### 5. Summarisation/aggregation

#### Distinct

We can quickly explore the unique values of a variable with `distinct`.

```{r}
spatial_data |>
  distinct(sample_id)
```

We can use `distinct` across multiple variables.

```{r}
spatial_data |>
  distinct(sample_id, Cluster)
```

#### Count

We can gather more information by counting the instances of a variable.

```{r}
spatial_data |>
  count(Cluster) |>
  arrange(desc(n))
```

We can calculate summary statistics on a subset of the data.

```{r}
spatial_data |>
  filter(Cluster == 1) |>
  count(sample_id) |>
  arrange(desc(n))
```
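For readers coming from base R, the filter, count and arrange chain above roughly corresponds to subsetting followed by `table()` and `sort()`. The sketch below uses a hypothetical pixel table.

```r
# Hypothetical per-pixel annotation
pixels <- data.frame(
  sample_id = c("151673", "151673", "151507", "151507", "151507", "151673"),
  Cluster   = c(1, 2, 1, 1, 1, 2)
)

# Keep Cluster 1, count pixels per sample, most frequent first
cluster1 <- pixels[pixels$Cluster == 1, ]
counts_per_sample <- sort(table(cluster1$sample_id), decreasing = TRUE)

counts_per_sample
```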

#### Aggregate

For summarised analyses, we can aggregate pixels/cells as pseudobulk with the function `aggregate_cells`, which also works for `SingleCellExperiment`. We obtain a `SummarizedExperiment`.

```{r}
spe_regions_aggregated <-
  spatial_data |>
  aggregate_cells(c(sample_id, spatialLIBD))

spe_regions_aggregated
```
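Pseudobulking essentially sums the counts of all pixels that share the same grouping values. The base R sketch below uses hypothetical data and groups by sample and region, as in the chunk above. Summing is the usual pseudobulk choice, but check the `aggregate_cells()` documentation for the exact aggregation it applies.

```r
# Hypothetical counts: 3 genes x 6 pixels
counts <- matrix(
  1:18, nrow = 3,
  dimnames = list(paste0("gene_", 1:3), paste0("pixel_", 1:6))
)

# Hypothetical grouping of each pixel by sample and region
sample_id <- c("A", "A", "A", "B", "B", "B")
region    <- c("L1", "L1", "WM", "L1", "WM", "WM")
group     <- paste(sample_id, region, sep = "_")

# rowsum() sums rows sharing a group, so transpose to pixels x genes,
# aggregate, then transpose back to genes x pseudobulk samples
pseudobulk <- t(rowsum(t(counts), group))

pseudobulk
```

Each column of `pseudobulk` is one sample-by-region combination. This is the same unit that the aggregated `SummarizedExperiment` exposes for bulk-style analyses.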
294
484
485
+
`tidyomics` allows to cross spatial, single-cell (Bioconductor and seurat), and bulk keeping a consistent interface.
486
+
487
+
```{r}
488
+
library(tidySummarizedExperiment)
489
+
490
+
spe_regions_aggregated
491
+
492
+
```

You will be able to apply the familiar `tidyverse` operations.

```{r}
spe_regions_aggregated |>
  filter(sample_id == "151673")
```

### 6. tidyfying your workflow

We will take the workflow used in **Session 2**, performed mostly with base R syntax, and convert it to tidy R syntax. We will show you how this improves the readability and modularity of your workflow.

#### Manipulating feature information

::: {.note}
For `SingleCellExperiment` there is no tidy API for manipulating feature-wise data yet. This is in contrast to `SummarizedExperiment`, whose gene-centric abstraction allows direct manipulation of gene information. Currently, `tidySingleCellExperiment` and `tidySpatialExperiment` do not prioritize the manipulation of features (genes).

While these functions can employ genes for cell manipulation and visualisation, as demonstrated in `join_features()`, they lack tools for altering feature-related information. Instead, their primary focus is on cell information, which serves as the main observational unit in single-cell data. This contrasts with bulk RNA sequencing data, where features are more central.
:::

A tidy feature-manipulation API for `SingleCellExperiment` is among our plans. See [tidyomics challenges](https://github.com/orgs/tidyomics/projects/1).

**Maintainability:** Fewer, self-explanatory lines of code and no need for intermediate steps make the code easier to maintain and modify, especially when conditions change or additional filters are needed.
496
701
497
702
498
-
### 5. Visualisation
703
+
### 7. Visualisation
499
704
500
705
Here, we will show how to use ad-hoc spatial visualisation, as well as `ggplot` to explore spatial data we will show how `tidySpatialExperiment` allowed to alternate between tidyverse visualisation, and any visualisation compatible with `SpatialExperiment`.

As you can appreciate, the relationship between the number of genes probed per cell and their mitochondrial transcript abundance is quite consistent.

::: {.note}
**Exercise 2.3**

To practice the use of `tidyomics` on spatial data, we propose a few exercises that connect manipulation, calculations and visualisation. These exercises are just meant to be simple use cases that exploit the streamlined language of tidy R.

We assumed that the cells we filtered out were non-alive or damaged, characterised by being enriched for mitochondrial genes and genes linked to apoptosis. It is good practice to check these assumptions. This exercise aims to estimate which genes are differentially expressed between filtered and unfiltered cells, and then to visualise the results.

Use `tidyomics`/`tidyverse` tools to label dead cells and perform differential expression within each region. Some of the commands you can use are: `mutate`, `nest`, `aggregate_cells`.
:::

::: {.note}
**Exercise 2.4**

Inspired by our audience, let's try to use `tidyomics` to identify potential amyloid plaques.

Amyloid plaques are extracellular deposits primarily composed of aggregated amyloid-beta (Aβ) peptides. They are a hallmark of Alzheimer's disease (AD) and are also found in certain other neurodegenerative conditions.

Amyloid plaques can be found in the brains of mice, particularly in transgenic mouse models that are engineered to develop Alzheimer's disease-like pathology.

Although amyloid plaques themselves are extracellular, the presence and formation of these plaques are associated with specific gene expression changes in the surrounding and involved cells. These gene markers are indicative of the processes that contribute to amyloid plaque formation, as well as the cellular response to these plaques ([Ranman et al., 2021](https://molecularneurodegeneration.biomedcentral.com/articles/10.1186/s13024-021-00465-0)).