A pan-cancer genome-wide analysis reveals tumour dependencies by induction of nonsense-mediated decay
Zhiyuan Hu1,2,3, Christopher Yau3,4* and Ahmed Ashour Ahmed1,2*
1. Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, OX3 9DU, UK
2. Nuffield Department of Obstetrics and Gynaecology, University of Oxford, Oxford, OX3 9DU, UK
3. Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
4. Centre for Computational Biology, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, UK
*Corresponding authors
Note: please address your inquiry to the corresponding authors to make sure that it gets answered.
This repository contains the raw data and code needed to reproduce the results in the manuscript:
Z. Hu, C. Yau and A. Ahmed (2017) A pan-cancer genome-wide analysis reveals tumour dependencies by induction of nonsense-mediated decay (accepted).
Some mutations introduce premature termination codons (PTCs) into genes. Nonsense-mediated decay (NMD) can detect these PTCs and eliminate the abnormal transcripts. PTCs and NMD play important roles in genetic diseases and cancers.
Herein, three rules are used to predict whether a PTC-generating mutation is NMD-elicit or NMD-escape:
- The PTC is more than 50-54 bp upstream of the last exon-exon junction.
- The targeted gene is not intronless.
- The PTC is more than 200 bp downstream of the start codon.
Using these rules, we can predict whether a called mutation will elicit NMD of the mRNA transcribed from the mutated gene, i.e. whether it is an NMD-elicit mutation.
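The three rules can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the masonmd implementation: the function name `classify_ptc`, its arguments, and the use of transcript (mRNA) coordinates are hypothetical, and the junction cutoff is set to 50 bp as one choice within the 50-54 bp range given above.

```r
# Hypothetical helper (not part of masonmd): apply the three rules to one
# PTC-generating mutation. All positions are assumed to be 1-based
# coordinates on the mature mRNA.
classify_ptc <- function(ptc_pos, start_codon_pos, last_junction_pos,
                         n_exons, junction_cutoff = 50, start_cutoff = 200) {
  # Rule 1: the PTC is more than `junction_cutoff` bp upstream of the last
  # exon-exon junction (an intronless gene has no junction, so NA fails here).
  rule1 <- (!is.na(last_junction_pos)) &
    ((last_junction_pos - ptc_pos) > junction_cutoff)
  # Rule 2: the gene is not intronless, i.e. it has more than one exon.
  rule2 <- n_exons > 1
  # Rule 3: the PTC is more than `start_cutoff` bp downstream of the start codon.
  rule3 <- (ptc_pos - start_codon_pos) > start_cutoff
  ifelse(rule1 & rule2 & rule3, "NMD-elicit", "NMD-escape")
}

# PTC far from both the start codon and the last junction
classify_ptc(ptc_pos = 500, start_codon_pos = 100,
             last_junction_pos = 900, n_exons = 5)   # "NMD-elicit"
# PTC only 20 bp upstream of the last junction
classify_ptc(ptc_pos = 880, start_codon_pos = 100,
             last_junction_pos = 900, n_exons = 5)   # "NMD-escape"
```

A mutation is labelled NMD-elicit only when all three conditions hold; failing any single rule is enough to label it NMD-escape.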
To predict the NMD-elicit mutations in your own dataset, please try our R package masonmd.
In R or RStudio, use the following code to install the masonmd package directly from GitHub:
install.packages("devtools")
devtools::install_github("zhiyhu/masonmd")
The raw data is in the data/ folder. Before starting, combine the split RNA-seq data files by running the following commands in a terminal.
cd NMD-paper
cd data
cat PANCAN_HiSeqV2a*.zip > PANCAN_HiSeqV2.zip
In total, eight R scripts are included in the R subdirectory. Run them in numerical order, starting from 01_PANCAN_data.R.
- 00_PANCAN_function.R contains the functions that are used in the prediction or analysis.
- 01_PANCAN_data.R is about loading data and preprocessing.
- 02_PANCAN_classify.R is about prediction of NMD-elicit mutations.
- 06_PANCAN_figures_manuscript.R generates the figures in the manuscript.
- 07_PANCAN_supplementary_figures.R generates the supplementary figures.
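Running the scripts in order can be automated with a short R helper. The sketch below is an assumption-laden convenience, not part of the repository: `ordered_scripts` is a hypothetical name, and it assumes that every analysis script follows the `NN_PANCAN_*.R` naming pattern and that 00_PANCAN_function.R only defines functions and does not need to be run as a step of its own.

```r
# Hypothetical helper (not part of the repository): list the numbered
# analysis scripts in run order, starting from 01_PANCAN_data.R.
ordered_scripts <- function(script_dir = "R") {
  scripts <- list.files(script_dir,
                        pattern = "^[0-9]{2}_PANCAN_.*\\.R$",
                        full.names = TRUE)
  # The two-digit prefix makes alphabetical order equal to run order.
  scripts <- sort(scripts)
  # Skip the function-definition script, which is assumed not to be a step.
  scripts[!grepl("^00_", basename(scripts))]
}

# Usage from the repository root (assumes each script loads what it needs):
# for (script in ordered_scripts("R")) source(script)
```

Listing the files first, before sourcing them, makes it easy to check the order and to resume from a later script if one step fails.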
A manuscript detailing our work has been published by Nature Communications:
Z. Hu, C. Yau and A. Ahmed (2017) A pan-cancer genome-wide analysis reveals tumour dependencies by induction of nonsense-mediated decay. Nat. Commun. 8, 15943. doi: 10.1038/ncomms15943