Spatial omics has transformed the analysis of tissue architecture and cellular heterogeneity by integrating molecular data with spatial localization. In spatially resolved transcriptomics, identifying spatial domains is critical for delineating anatomical regions within heterogeneous datasets and for understanding tissue function. Since 2020, more than 50 spatially aware clustering methods have been developed for this task. However, the reliability of existing benchmarks is undermined by their narrow focus on Visium and brain tissue datasets, as well as by their dependence on questionable ground truth annotations. Here, we implemented a consensus framework that goes beyond traditional benchmarking practices.
Our framework comprises a community-driven benchmark-like platform that streamlines data formatting, method integration, and metric evaluation while accommodating new methods and datasets. Currently, the platform includes 22 spatially aware clustering methods across 15 datasets spanning 9 technologies and diverse tissue types. The benchmark approach uncovered significant limitations in generalizability and reproducibility: methods that perform well on healthy tissues often falter on cancer samples. We also found that anatomical labels commonly used as ground truths are often biased, potentially error-prone, and in some cases unsuitable for benchmarking efforts.
In light of these issues, we adopt a flexible, expert-in-the-loop, consensus-driven approach. This goes beyond traditional ensemble/consensus methods and allows researchers to interact with intermediate results to determine which tools should be used to generate a consensus. We believe that including an expert in the loop is critical to ensure that the computational analysis matches the biological question at hand. When the focus of the analysis is to uncover novel biological discoveries, tissue experts are accessible more often than not.
This framework provides (and allows users to contribute) "modules" in their preferred programming language (as long as that is either R or Python). A module is a set of scripts that implements something in one of the following categories: a dataset, a computational method, or an evaluation metric. Interfaces between the categories enable seamless integration of new data, methods, or metrics, making the framework extensible and community-driven.
This repository contains templates and examples of how to implement your module so that it interfaces seamlessly with other modules in the workflow. For example, if you want to implement a new method, you do not need to worry about input data or evaluation metrics as long as you follow the template for reading input and writing output. If you adhere to the input and output guidelines, your method should interface with our default data modules and default evaluation metric modules.
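The read-input/write-output contract described above can be sketched as follows. This is a minimal, hypothetical illustration and not the repository's actual template: the flag names (`--coordinates`, `--n_clusters`, `--out_file`), the `id`/`x`/`y` input columns, and the `label` output column are all assumptions, and `assign_domains` is only a placeholder for a real clustering method. Consult the templates in this repository for the authoritative interface.

```python
import argparse
import csv
import sys


def read_coordinates(path):
    """Read a TSV with columns id, x, y (assumed layout) into a list of tuples."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [(row["id"], float(row["x"]), float(row["y"])) for row in reader]


def assign_domains(coords, n_clusters):
    """Placeholder clustering: split observations into equal-width bands along x.

    A real method module would call its spatial clustering method here.
    """
    if not coords:
        return {}
    xs = [x for _, x, _ in coords]
    lo, hi = min(xs), max(xs)
    # Fall back to a width of 1.0 when all x values are identical.
    width = (hi - lo) / n_clusters or 1.0
    labels = {}
    for obs_id, x, _ in coords:
        # Clamp so that the maximum x falls into the last band.
        labels[obs_id] = min(int((x - lo) / width), n_clusters - 1)
    return labels


def write_domains(path, labels):
    """Write one row per observation with its domain label (assumed layout)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["id", "label"])
        for obs_id, label in labels.items():
            writer.writerow([obs_id, label])


def main(argv=None):
    # Hypothetical command-line interface; flag names are assumptions.
    parser = argparse.ArgumentParser(description="Hypothetical method module")
    parser.add_argument("--coordinates", required=True)
    parser.add_argument("--n_clusters", type=int, required=True)
    parser.add_argument("--out_file", required=True)
    args = parser.parse_args(argv)

    coords = read_coordinates(args.coordinates)
    write_domains(args.out_file, assign_domains(coords, args.n_clusters))


# Only run the CLI when arguments are supplied, so pasting the sketch is harmless.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Under these assumptions, a module saved as `method.py` (a hypothetical file name) would be run as `python method.py --coordinates coordinates.tsv --n_clusters 7 --out_file domains.tsv`. Because the method only touches files in the agreed layout, any data module that writes that layout and any metric module that reads it can be combined with it unchanged.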
The existing modules are:
- data (currently 28):
  - LIBD Visium DLPFC dataset (4 samples, each with 3 replicates)
  - SEA_AD_data
  - STARmap-2018-mouse-cortex
  - STARmap_plus
  - abc_atlas_wmb_thalamus
  - cosmx_liver
  - cosmx_lung
  - her2st-breast-cancer
  - locus_coeruleus
  - merfish_devheart
  - mouse_brain_sagittal_anterior
  - mouse_brain_sagittal_posterior
  - mouse_kidney_coronal
  - osmfish_Ssp
  - pachter_simulation
  - slideseq2_olfactory_bulb
  - sotip_simulation
  - spatialDLPFC
  - stereoseq_developing_Drosophila_embryos_larvae
  - stereoseq_liver
  - stereoseq_mouse_embryo
  - stereoseq_olfactory_bulb
  - visium_breast_cancer_SEDR
  - visium_chicken_heart
  - visium_hd_cancer_colon
  - xenium-breast-cancer
  - xenium-mouse-brain-SergioSalas
- methods (currently 24):
  - BANKSY
  - BayesSpace
  - CellCharter
  - DRSC
  - DeepST
  - Giotto
  - GraphST
  - SCAN-IT
  - SC_MEB
  - SEDR
  - SOTIP
  - STAGATE
  - SpaceFlow
  - SpiceMix
  - bass
  - conST
  - maple
  - meringue
  - precast
  - scanpy
  - seurat
  - spaGCN
  - spatialGE
  - stardust
- evaluation metrics (currently 17):
  - ARI
  - CHAOS
  - Calinski-Harabasz
  - Completeness
  - Davies-Bouldin
  - Entropy
  - FMI
  - Homogeneity
  - LISI
  - MCC
  - NMI
  - PAS
  - SpatialARI
  - V_measure
  - cluster-specific-silhouette
  - domain-specific-f1
  - jaccard
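To illustrate what an evaluation metric module computes, here is a minimal pure-Python sketch of the adjusted Rand index (ARI), one of the metrics listed above. It follows the standard contingency-table definition of ARI and is not the implementation used in this repository; the function name `adjusted_rand_index` and the convention of returning 1.0 for degenerate inputs are assumptions made for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, predicted):
    """Compute the adjusted Rand index between two labelings of the same items."""
    if len(truth) != len(predicted):
        raise ValueError("label lists must have the same length")

    n = len(truth)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two items: treat as perfect agreement (assumed convention).
        return 1.0

    # Contingency counts: how many items share each (truth, predicted) pair.
    pair_counts = Counter(zip(truth, predicted))
    truth_counts = Counter(truth)
    pred_counts = Counter(predicted)

    # Number of item pairs grouped together in both, in truth, and in predicted.
    index = sum(comb(c, 2) for c in pair_counts.values())
    sum_truth = sum(comb(c, 2) for c in truth_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_truth * sum_pred / total_pairs
    maximum = (sum_truth + sum_pred) / 2
    if maximum == expected:
        # Both labelings are trivial (all one cluster or all singletons).
        return 1.0
    return (index - expected) / (maximum - expected)


if __name__ == "__main__":
    # Same partition with permuted label names.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

ARI depends only on which observations are grouped together, not on the label names, so a method does not need to match the numbering of the ground-truth domains.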
Read our Contributing Guide and Code of Conduct.
We are close to releasing a preprint. Until then, please cite us as follows:
SpaceHack 2.0. Participants. SpaceHack 2.0: an expert in the loop consensus driven framework for spatially aware clustering [Computer software]. https://github.com/SpatialHackathon/SpaceHack2023
We have adopted the "MIT No Attribution" (MIT-0) License. It is currently attributed to the "SpaceHack organizers", but please also make sure to add your name to your contributions. More on MIT-0 here