-
Notifications
You must be signed in to change notification settings - Fork 2
Expand file tree
/
Copy path02-prerequisites.Rmd
More file actions
139 lines (106 loc) · 5.68 KB
/
02-prerequisites.Rmd
File metadata and controls
139 lines (106 loc) · 5.68 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
# Prerequisites
```{r, include = FALSE}
ottrpal::set_knitr_image_path()
```
## Learning Objectives
This chapter will cover the prerequisites for this course, including:
- Installing Docker
- Installing R Studio
- Downloading data files
## Docker
For the purpose of this course, we will be using Docker to run pVACseq and
pVACfuse.
Docker is a tool that is used to automate the deployment of applications
in lightweight containers so that applications can work efficiently in
different environments in isolation. We provide versioned Docker containers
for all pVACtools [releases](https://github.com/griffithlab/pVACtools/releases)
via [Docker Hub using the griffithlab/pvactools image name](https://hub.docker.com/r/griffithlab/pvactools).
In order to use Docker, you will to download the [Docker Desktop software](https://www.docker.com/get-started/).
Please ensure you select the correct install package for your operating
system.
## Terminal
We will be running Docker from the command line on your preferred terminal
using the Docker command line interface (CLI). The Docker CLI is already
included with Docker Desktop. Most operating systems already
come with a Terminal application. If yours doesn't, you will need to first
install one.
## R Studio and R package dependencies
In order to use pVACview, you will need to download R. Please refer
[here](https://cran.rstudio.com/) for downloading R (version 3.5 and above
required). You may also take the additional step of [downloading R
studio](https://www.rstudio.com/products/rstudio/download/) if
you are not familiar with launching R Shiny from the command line.
Additionally, there are a number of packages you will need to install in your R/R studio:
```{r, eval = FALSE}
install.packages("shiny", dependencies=TRUE)
install.packages("ggplot2", dependencies=TRUE)
install.packages("DT", dependencies=TRUE)
install.packages("reshape2", dependencies=TRUE)
install.packages("jsonlite", dependencies=TRUE)
install.packages("tibble", dependencies=TRUE)
install.packages("tidyr", dependencies=TRUE)
install.packages("plyr", dependencies=TRUE)
install.packages("dplyr", dependencies=TRUE)
install.packages("shinydashboard", dependencies=TRUE)
install.packages("shinydashboardPlus", dependencies=TRUE)
install.packages("fresh", dependencies=TRUE)
install.packages("shinycssloaders", dependencies=TRUE)
install.packages("RCurl", dependencies=TRUE)
install.packages("curl", dependencies=TRUE)
install.packages("string", dependencies=TRUE)
install.packages("shinycssloaders", dependencies=TRUE)
install.packages("plotly", dependencies=TRUE)
install.packages("shinyWidgets", dependencies=TRUE)
install.packages("colourpicker", dependencies=TRUE)
```
## Data
For this course, we have put together a set of input data generated from the breast
cancer cell line HCC1395 and a matched normal lymphoblastoid cell line HCC1395BL.
Data from this cell line is commonly used as test data in bioinformatics applications.
For more information on these lines and the generation of test data, please refer to
the [data section of our precision medicine bioinformatics course](https://pmbio.org/module-02-inputs/0002/05/01/Data/).
The input data consists of the following files:
For pVACseq:
- `annotated.expression.vcf.gz`: A somatic (tumor-normal) VCF and its tbi index file. The VCF has been
annotated with VEP and has coverage and expression information added. It has also been annotated with
custom VEP plugins that provide wild type and mutant versions of the full length protein sequences
predicted to arise from each transcript annotated with each variant.
- `phased.vcf.gz`: A phased tumor-germline VCF and its tbi index file to provide information about
in-phase proximal variants that might alter the predicted peptide sequence around a somatic
mutation of interest.
- `optitype_normal_result.tsv`: A OptiType file with HLA allele typing predictions.
For more detailed information on how the variant input file is created, please refer to the
[input file preparation](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep.html)
section of the pVACtools docs.
For pVACfuse:
- `agfusion_results`: An AGFusion output directory with annotated fusion
calls.
- `star-fusion.fusion_predictions.tsv`: A STARFusion prediction file with fusion read support
and expression information.
For pVACsplice:
- `HCC1395.splice_junctions.tsv`: A RegTools junctions output TSV file
General:
- `Homo_sapiens.GRCh38.pep.all.fa.gz`: A reference proteome peptide FASTA to use
for determining whether there are any reference matches of neoantigen candidates.
To download this data, please run the following commands:
```{r, engine = 'bash', eval = FALSE}
wget https://raw.githubusercontent.com/griffithlab/pVACtools_Intro_Course/main/HCC1395_inputs.zip
unzip HCC1395_inputs.zip
```
Additionally, to run pVACsplice, a set of reference files are required that
are too large to include in the HCC1395_inputs.zip archive. To download these
manually please run the following commands:
```{r, engine = 'bash', eval = FALSE}
cd HCC1395_inputs
wget https://ftp.ensembl.org/pub/release-105/gtf/homo_sapiens/Homo_sapiens.GRCh38.105.chr.gtf.gz
wget http://genomedata.org/pmbio-workshop/references/genome/all/ref_genome.tar
tar -xf ref_genome.tar ref_genome.fa.gz ref_genome.fa.fai
gunzip ref_genome.fa.gz
rm -rf ref_genome.tar
```
This will add the following reference files:
- `ref_genome.fa` and `.fai`: A reference DNA FASTA file and index
- `Homo_sapiens.GRCh38.105.chr.gtf.gz`: A reference GTF file
This course will not cover the required pre-processing steps for the pVACtools
input data but extensive instructions on how to prepare your own data for use
with pVACtools can be found at [pvactools.org](http://www.pvactools.org).