Skip to content

Commit

Permalink
Merge pull request #5281 from Mennayousef/main
Browse files Browse the repository at this point in the history
Adding single cell GO enrichment tutorial
  • Loading branch information
bgruening committed Sep 17, 2024
2 parents d0702ea + e26a0aa commit 0bf34ca
Show file tree
Hide file tree
Showing 31 changed files with 617 additions and 0 deletions.
4 changes: 4 additions & 0 deletions CONTRIBUTORS.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -2501,6 +2501,10 @@ ThibaudGlinez:
- pndb
- elixir-europe

MennaGamal:
name: Menna Gamal
email: [email protected]
joined: 2024-09

tbrown91:
name: Tom Brown
Expand Down
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added topics/single-cell/images/workflow.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
35 changes: 35 additions & 0 deletions topics/single-cell/tutorials/GO-enrichment/data-library.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
---
destination:
type: library
name: GTN - Material
description: Galaxy Training Network Material
synopsis: Galaxy Training Network Material. See https://training.galaxyproject.org
items:
- name: New topic
description: Topic summary
items:
- name: GO Enrichment Analysis on Single-Cell RNA-Seq Data
items:
- name: 'DOI: 10.5281/zenodo.13461890'
description: latest
items:
- url: https://zenodo.org/api/records/13461890/files/Galaxy3-[GO].obo
src: url
ext: 'obo'
info: https://zenodo.org/records/13461890
- url: https://zenodo.org/api/records/13461890/files/Galaxy2-[GO_annotations_Mus_musculus].tabular
src: url
ext: 'tabular'
info: https://zenodo.org/records/13461890
- url: https://zenodo.org/api/records/13461890/files/Galaxy5-[Markers_-_clusters].tabular
src: url
ext: 'tabular'
info: https://zenodo.org/records/13461890
- url: https://zenodo.org/api/records/13461890/files/Galaxy4-[Background_gene_set].tabular
src: url
ext: 'tabular'
info: https://zenodo.org/records/13461890
- url: https://zenodo.org/api/records/13461890/files/Galaxy1-[Markers_-_genotype_].tabular
src: url
ext: 'tabular'
info: https://zenodo.org/records/13461890
3 changes: 3 additions & 0 deletions topics/single-cell/tutorials/GO-enrichment/faqs/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
---
layout: faq-page
---
233 changes: 233 additions & 0 deletions topics/single-cell/tutorials/GO-enrichment/slides.html
Original file line number Diff line number Diff line change
@@ -0,0 +1,233 @@
---
layout: tutorial_slides
logo: "GTN"
title: "GO Enrichment Analysis on Single-Cell RNA-Seq Data"

subtopic: scintroduction
priority: 3

video: no
zenodo_link: "https://zenodo.org/records/13461890"
tags:
questions:
- "What is Gene Ontology (GO) enrichment analysis, and why should I perform it on my marker genes?"
- "How can I use GO enrichment analysis to better understand the biological functions of the genes in my clusters?"
- "Can GO enrichment analysis help me confirm that my clusters represent distinct cell types or states?"
objectives:
- "Aim to identify enriched GO terms associated with the DEGs."
- "Enhance understanding of biological processes, molecular functions, and cellular components involved."
time_estimation: "1h"
key_points:
- "GO enrichment analysis helps identify which biological processes, molecular functions, and cellular components are significantly represented in a set of genes."
- "The analysis employs statistical methods to determine whether the observed enrichment of GO terms is greater than what would be expected by chance."
contributors:
- nomadscientist
- MennaGamal

---

### scRNA-Seq data analysis roadmap

.image-100[![slide5](../../images/GO-enrichment/slides_images/roadmap_1.png)]


???
Here is a typical workflow for analyzing single-cell RNA sequencing data.
We can break this process down into three main sections:
- Data Preprocessing: This is the initial step, where we focus on quality control, alignment, and quantification of the data. It’s crucial to ensure that our data is clean and reliable.
- General Analyses: In this phase, we filter out low-quality cells, normalize the data, and select highly variable genes, or HVGs. We then perform dimensionality reduction, cluster the cells, and annotate the different cell types. This step allows us to make sense of the data and identify distinct cell populations.
- Exploratory Analyses: Finally, we delve into exploratory analyses. This includes differential expression gene (DEG) analysis, functional enrichment studies, gene set variation analysis (GSVA), and transcription factor (TF) prediction. We also investigate cell trajectories, interactions between cells, cell cycles, and even spatial transcriptomics.
This tutorial will focus on Gene Ontology (GO) Enrichment exploratory analysis.


---

### Ontology

.center[A standardized vocabulary for expressing knowledge within a specific domain.]
.image-100[![slide6](../../images/GO-enrichment/slides_images/ontology_2.png)]

???
- Before we introduce “GO enrichment analysis” let us first understand what it means by Ontology and Gene Ontology (GO).
- Ontology is a set of terms with their precise definitions and defined relationships between them. For example, imagine you are organizing a library of books. You want to classify and organize these books so that others can easily find what they are looking for. Ontology in this context would be a structured system for categorizing books.

---

### Gene Ontology (GO): Unifying Biology

.image-100[![slide7](../../images/GO-enrichment/slides_images/go_3.png)]

???
Gene Ontology has 3 main classifications (Biological process, Molecular function, and Cellular component) this allows scientists to precisely describe what a gene does, how it does it, and where it happens in the cell.

---

### GO Hierarchy

.image-100[![slide8](../../images/GO-enrichment/slides_images/hierarchy_4.png)]

???
- The structure of Gene Ontology is hierarchical, meaning that terms at lower levels are more specific, while those higher up are more general.
We can visualize the structure of GO as a graph, where each term represents a node, and the relationships between these terms are represented as edges connecting the nodes. This organization helps in understanding the connections and relationships among various gene functions, these relationships are either is_a, part_of, has_part, or regulates.
- Note that it is possible for a term to have multiple parents. 
---

.image-100[![slide9](../../images/GO-enrichment/slides_images/components_5.png)]

???
- "GO terms” are the unified vocabulary used within the Gene Ontology framework. Each GO term is accompanied by detailed information, and the GO list provides insights into these terms.
- Additionally, GO annotations illustrate the relationships between each GO term and the genes associated with them.

---

### Example describing gene functions before and after Gene Ontology

.image-100[![slide10](../../images/GO-enrichment/slides_images/example_6.png)]

???
This simple example illustrates how Gene Ontology (GO) adds clarity and standardization when describing gene functions.

---

.center[How to use GO to perform **GO Enrichment Analysis** on scRNA-Seq data?]

???
Now that we understand what GO mean, let’s explore what is GO Enrichment analysis in the context of scRNA-Seq data.

---

### Enrichment analysis of scRNA-Seq data

.image-100[![slide12](../../images/GO-enrichment/slides_images/enrichment_7.png)]
.center[Enrichment analysis is a type of functional annotation process which is the process of associating biological functions with genes or cells based on the expression data.]

???
- Enrichment analysis is a statistical approach used to determine whether certain biological features are overrepresented (enriched) in a subset of genes(i.e., Marker genes) compared to a broader background set.

---

### GO Enrichment Analysis


.image-100[![slide13](../../images/GO-enrichment/slides_images/go_enrichment_8.png)]

???
- Each GO term is associated with a set of genes.
- We try to find out which GO terms or gene sets are enriched in our list of marker genes.
- These GO terms are functionally assigned to various biological functions and locations, aiding our understanding of the roles of marker genes.
- Also note that each gene can be assigned to multiple GO terms allowing the capture of various functions, processes, and cellular components associated with a gene.

---

### Steps of GO enrichment analysis

.left[1- Select the marker genes for each cell cluster / condition]
.image-60[![slide14](../../images/GO-enrichment/slides_images/step1_9.png)]

???
We start the analysis by selecting a list of differentially expressed genes (marker genes). Marker genes could be a list of differentially expressed genes between 2 different conditions or between different cell types.

---

.left[2- Tag the genes with GO terms]
.image-60[![slide14](../../images/GO-enrichment/slides_images/step2_10.png)]

???
Each gene can be "tagged" with one or more GO terms, similar to how books are categorized by genre or topic.

---

.left[3- Count How Many Times Each GO Term Appears]
.image-60[![slide14](../../images/GO-enrichment/slides_images/step3_11.png)]

???
We then count the number of times each GO term shows up in your list of genes. For example, GO term A appears in 2 out of 3 genes

---

.left[4- Compare to the whole background gene set]
.image-60[![slide15](../../images/GO-enrichment/slides_images/step4_12.png)]

???
- Next, we compare how often each GO term appears in our gene list compared to how often it appears across the background gene set, this background may list all genes in the genome or all genes involved in the experiment.
- This comparison tells you whether certain GO terms are "enriched" or more common in the marker genes than you would expect by chance.

---

.left[5- Is It Significant? Basic interpretation:]
.image-60[![slide16](../../images/GO-enrichment/slides_images/step5_13.png)]

???
- Just because you see GO Term A a lot doesn’t necessarily mean it’s important—it could be by chance. So, we use statistical tests to see if the enrichment is statistically significant.
- While in this simple example, it is obvious that the proportion of genes associated with GO term A in the marker gene set (67%) is higher than the proportion in the background set (30%). This suggests that GO term A is potentially enriched in the marker gene set and we can add it to our purple list of enriched GO terms.

---

.left[6- Statistical testing:]
.image-60[![slide17](../../images/GO-enrichment/slides_images/step6_14.png)]

???
- In real-world scenarios where we have hundreds or thousands of genes we need to formally assess whether this difference is statistically significant (i.e., whether GO term A is truly enriched or if this difference is by chance). Fisher's Exact Test and the hypergeometric test are the most commonly used tests in this situation.
- Fischer’s Exact test substitutes the values of the contingency table in a formula to calculate the probability (P-value) that corresponds to how likely the observed distribution is by chance. A lower P-value suggests that the GO term is truly enriched in the list of marker genes.

---

.left[7- Interpret the results:]
.image-60[![slide18](../../images/GO-enrichment/slides_images/step7_15.png)]

???
After we have transformed the long list of marker genes into a short list of biological themes in the form of GO terms we can proceed with the interpretation of the results through visualization of the most common themes to identify patterns or relationships between GO terms, we can also analyze the GO hierarchy where higher-level categories (parent terms) provide broader biological contexts, while lower-level categories (child terms) offer more specific insights, in addition to relating the enriched GO terms to existing biological knowledge.

---

### Example 1: GO Enrichment Analysis of Platelet Proteins in Early-Stage Cancer

.image-90[![slide19](../../images/GO-enrichment/slides_images/example1_16.png)]

???
- In a study by Zúñiga-León et al. (2018) they explored the differentially expressed platelet proteins in early-stage cancer in comparison to a control group.

- The study revealed that out of 4,384 unique proteins expressed in platelets, 85 proteins showed significant differences in abundance in early-stage cancer patients compared to controls. This highlights the potential of platelets as biomarkers for early cancer detection.

- The enrichment analysis identified 19 enriched biological processes, it also uncovered six enriched molecular functions.


---

### Biological interpretation of the results

.image-100[![slide20](../../images/GO-enrichment/slides_images/interpretation_18.png)]

???
- The 19 biological processes identified primarily relate to inflammatory response, immune response, and cancer annotations.
- Among the enriched GO terms, one notable member is protein (P06702), which has been found to be differentially expressed in several cancer types, including breast, colon, liver, gastric, and non-small cell lung cancer. This protein plays a crucial role in promoting cancer growth.
- Additionally, the analysis revealed five proteins that are directly associated with the GO term angiogenesis (GO:0001525). Angiogenesis is essential for tumor development, as it supplies the necessary blood flow for delivering nutrients and oxygen.


---
### Example 2: Identifying Potential Biomarkers of Non-Small Cell Lung Cancer Through GO Enrichment Analysis of scRNA-Seq Data

.image-100[![slide21](../../images/GO-enrichment/slides_images/example2.png)]

???
- Non-small cell lung cancer (NSCLC) is the most prevalent lung cancer. This study aims to identify gene biomarkers for NSCLC diagnosis and prognosis using single-cell RNA sequencing (scRNA-seq) data and bioinformatics techniques.

- 158 differentially expressed genes (DEGs) were identified, comprising 48 upregulated and 110 downregulated genes.

- Gene Ontology enrichment was then conducted on the differentially expressed genes (Marker genes), statistical testing was performed on each GO term and only GO terms with P-value < 0.01 were selected.

---
### Results interpretation

.image-100[![slide22](../../images/GO-enrichment/slides_images/example2_int.png)]

---
### Purpose and Importance

.image-100[![slide23](../../images/GO-enrichment/slides_images/purpose_19.png)]

???
- To conclude, GO enrichment analysis helps scientists make sense of large gene lists by pointing out which biological processes, functions, or cellular locations are most relevant.
- Instead of looking at each gene individually, it allows researchers to focus on broader biological themes, making it easier to interpret the data.


42 changes: 42 additions & 0 deletions topics/single-cell/tutorials/GO-enrichment/tutorial.bib
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@

# This is the bibliography file for your tutorial.
#
# To add bibliography (bibtex) entries here, follow these steps:
# 1) Find the DOI for the article you want to cite
# 2) Go to https://doi2bib.org and fill in the DOI
# 3) Copy the resulting bibtex entry into this file
#
# To cite the example below, in your tutorial.md file
# use {% cite Batut2018 %}
#
# If you want to cite an online resourse (website etc)
# you can use the 'online' format (see below)
#
# You can remove the examples below
@article{Batut2018,
doi = {10.1016/j.cels.2018.05.012},
url = {https://doi.org/10.1016/j.cels.2018.05.012},
year = {2018},
month = jun,
publisher = {Elsevier {BV}},
volume = {6},
number = {6},
pages = {752--758.e1},
author = {B{\'{e}}r{\'{e}}nice Batut and Saskia Hiltemann and Andrea Bagnacani and Dannon Baker and Vivek Bhardwaj and
Clemens Blank and Anthony Bretaudeau and Loraine Brillet-Gu{\'{e}}guen and Martin {\v{C}}ech and John Chilton
and Dave Clements and Olivia Doppelt-Azeroual and Anika Erxleben and Mallory Ann Freeberg and Simon Gladman and
Youri Hoogstrate and Hans-Rudolf Hotz and Torsten Houwaart and Pratik Jagtap and Delphine Larivi{\`{e}}re and
Gildas Le Corguill{\'{e}} and Thomas Manke and Fabien Mareuil and Fidel Ram{\'{i}}rez and Devon Ryan and
Florian Christoph Sigloch and Nicola Soranzo and Joachim Wolff and Pavankumar Videm and Markus Wolfien and
Aisanjiang Wubuli and Dilmurat Yusuf and James Taylor and Rolf Backofen and Anton Nekrutenko and Bj\"{o}rn Gr\"{u}ning},
title = {Community-Driven Data Analysis Training for Biology},
journal = {Cell Systems}
}

@online{gtn-website,
author = {GTN community},
title = {GTN Training Materials: Collection of tutorials developed and maintained by the worldwide Galaxy community},
url = {https://training.galaxyproject.org},
urldate = {2021-03-24}
}
Loading

0 comments on commit 0bf34ca

Please sign in to comment.