systemPipeR is an R/Bioconductor package for building and running automated end-to-end analysis workflows for a wide range of research applications, including next-generation sequencing (NGS) experiments such as RNA-Seq, ChIP-Seq, VAR-Seq and Ribo-Seq. Important features include a uniform workflow interface across different data analysis applications, automated report generation, and support for running both R and command-line software, such as NGS aligners or peak/variant callers, on local computers or compute clusters. On clusters, both interactive job submissions and batch submissions to queuing systems are supported. Efficient handling of complex sample sets and experimental designs is facilitated by a well-defined sample annotation infrastructure, which improves the reproducibility and user-friendliness of many typical analysis workflows in the NGS area.
To install the package, please use the BiocManager::install command:

    if (!requireNamespace("BiocManager", quietly=TRUE))
        install.packages("BiocManager")
    BiocManager::install("systemPipeR")

To obtain the most recent updates immediately, one can install the package directly from GitHub as follows:

    if (!requireNamespace("BiocManager", quietly=TRUE))
        install.packages("BiocManager")
    BiocManager::install("tgirke/systemPipeR", build_vignettes=TRUE, dependencies=TRUE)

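After either installation route, it can be useful to confirm what actually ended up in the library. The following is a minimal sketch using only base R; checking systemPipeRdata alongside systemPipeR is a choice made here for illustration, not a required step.

```r
# Report whether each package is installed and, if so, which version.
pkgs <- c("systemPipeR", "systemPipeRdata")

# requireNamespace() returns TRUE/FALSE without attaching the package.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

for (pkg in pkgs) {
  if (installed[[pkg]]) {
    message(pkg, " ", as.character(packageVersion(pkg)))
  } else {
    message(pkg, " is not installed")
  }
}
```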
Instructions for running systemPipeR are given in its main vignette (manual).
The sample data sets used in the vignette are provided by the data package systemPipeRdata.
The expected format for defining NGS samples (e.g. FASTQ files) and their labels is given in targets.txt and targetsPE.txt (the latter is for paired-end (PE) reads).
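As a rough illustration of how such a targets file can be handled, the sketch below writes a small tab-delimited table to a temporary file and reads it back with base R. The column names (FileName, SampleName, Factor), the file paths, and the sample labels are assumptions made for this example, and the real targets.txt may contain additional columns and header comment lines.

```r
# Hypothetical miniature targets file. Column names and values are
# assumptions for illustration, not the shipped targets.txt.
targets_lines <- c(
  "# Comment lines start with '#' and are skipped on import",
  "FileName\tSampleName\tFactor",
  "./data/sample1.fastq.gz\tM1A\tM1",
  "./data/sample2.fastq.gz\tM1B\tM1",
  "./data/sample3.fastq.gz\tA1A\tA1"
)
targets_path <- tempfile(fileext = ".txt")
writeLines(targets_lines, targets_path)

# Read the tab-delimited table, ignoring '#' comment lines.
targets <- read.delim(targets_path, comment.char = "#",
                      stringsAsFactors = FALSE)

# Sample labels are used to name outputs, so they should be unique.
stopifnot(!anyDuplicated(targets$SampleName))

# Number of samples (replicates) per experimental factor.
replicates <- table(targets$Factor)

print(targets)
print(replicates)
```

Keeping sample paths and labels in one plain-text table means the same file can drive every step of a workflow, which is the role the targets file plays here.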
With Bioconductor release 3.9, we are adopting the widely used community standard
Common Workflow Language (CWL) for describing analysis workflows in a generic and
reproducible manner, and introducing the SYSargs2 workflow control class. Using this
community standard in systemPipeR has many advantages. For instance, the integration
of CWL allows running systemPipeR workflows from a single specification instance
either entirely from within R, from various command-line wrappers (e.g., cwl-runner),
or from other languages (e.g., Bash or Python).
The run parameters of command-line software are defined by param files that
have a simplified YAML name/value structure. Here is a sample param file
for Hisat2: hisat2.cwl.
Templates for setting up custom project reports are provided by systemPipeRdata.
The corresponding PDFs of these report templates are linked here:
systemPipeRNAseq, systemPipeRIBOseq, systemPipeChIPseq and systemPipeVARseq.
Workflow | Description
---|---
systemPipeChIPseq | ChIP-Seq Workflow Template
systemPipeRIBOseq | RIBO-Seq Workflow Template
systemPipeRNAseq | RNA-Seq Workflow Template
systemPipeVARseq | VAR-Seq Workflow Template
systemPipeMethylseq | Methyl-Seq Workflow Template
systemPipeDeNovo | De novo transcriptome assembly Workflow Template
systemPipeCLIPseq | CLIP-Seq Workflow Template
systemPipeMetaTrans | Metatranscriptomic Sequencing Workflow Template