STAR/Salmon pipeline for RNA-Seq on a Slurm cluster

Description

This is an RNA-Seq pipeline based on STAR and Salmon, designed to run on Slurm clusters.

Requirements

Python 3
SLURM cluster
STAR >= 2.7.0 (https://github.com/alexdobin/STAR)
Salmon 0.13.1 (https://anaconda.org/bioconda/salmon/files)
trim_galore (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/)
htseq (https://htseq.readthedocs.io/en/master/)
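Before submitting anything, it can help to confirm that these tools are actually on the PATH, ideally on a compute node rather than the login node, since their environments can differ. The following is a minimal Python sketch. It assumes the executables are named STAR, salmon, trim_galore, htseq-count and sbatch, because the README does not state the exact command names the scripts call.

```python
import shutil

# Assumed executable names: the README lists the tools but not the exact
# commands invoked by the pipeline scripts. sbatch is Slurm's submit command.
REQUIRED_TOOLS = ("STAR", "salmon", "trim_galore", "htseq-count", "sbatch")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that shutil.which() cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```

Running it through srun (for example `srun python3 check_tools.py`, where check_tools.py is a hypothetical file name) checks the environment the jobs will actually see.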

Installation

The scripts are designed to be run from the directory they are located in. From that directory, run:

$ ./submit_dir.sh <genomedata> <data-root> <resultdir>

Data preparation

Genome data folder

Each analysis project needs a genome folder that has been indexed with STAR (STAR --runMode genomeGenerate). The folder should therefore contain:

FASTA file
SA, SAindex, Genome and chromosome files (e.g. chrName.txt, chrLength.txt) generated by STAR
GFF file (optional, for htseq-count)
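A quick way to catch an incomplete genome folder before submitting jobs is to check for these files directly. The Python sketch below is a rough check under stated assumptions. It looks for a core subset of the files STAR's genomeGenerate step writes. The FASTA and annotation file extensions are guesses, since the README does not specify them. It also accepts .gtf, which htseq-count can read.

```python
from pathlib import Path

# Core subset of the index files written by `STAR --runMode genomeGenerate`.
STAR_INDEX_FILES = ("Genome", "SA", "SAindex", "chrName.txt", "chrLength.txt")
# Assumed extensions; the README does not say how these files are named.
FASTA_SUFFIXES = (".fa", ".fasta", ".fna")
GFF_SUFFIXES = (".gff", ".gff3", ".gtf")


def check_genome_dir(genome_dir):
    """Return (errors, warnings) for a genome folder; both empty if all is well."""
    names = {p.name for p in Path(genome_dir).iterdir() if p.is_file()}
    errors = [f"missing STAR index file: {f}" for f in STAR_INDEX_FILES if f not in names]
    if not any(n.endswith(FASTA_SUFFIXES) for n in names):
        errors.append("no FASTA file found")
    warnings = []
    if not any(n.endswith(GFF_SUFFIXES) for n in names):
        # The annotation is optional; it is only needed for htseq-count.
        warnings.append("no GFF/GTF file found; htseq-count cannot be run")
    return errors, warnings
```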

RNA-Seq FASTQ folders

$ ./submit_dir.sh <genomedata> <data-root> <resultdir>

The submit_dir.sh command takes a genome data directory, a data root directory and a result directory as parameters. The data root directory contains one subdirectory per sample, each holding that sample's FASTQ files. These subdirectories typically follow the naming scheme "R[something]": for example, "R123" would contain paired-end read files such as "R123_1.fq.gz" and "R123_2.fq.gz".
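To make the per-sample naming scheme concrete, here is a minimal Python sketch that locates the two read files for one sample directory. It assumes exactly the "<sample>_1.fq.gz" / "<sample>_2.fq.gz" pattern from the example above. The function name find_read_pair is hypothetical and is not part of the pipeline.

```python
from pathlib import Path


def find_read_pair(sample_dir):
    """Return (read1, read2) paths for a sample directory such as R123/.

    Assumes the R123_1.fq.gz / R123_2.fq.gz naming shown in the README;
    other suffixes (e.g. .fastq.gz) are not handled.
    """
    sample_dir = Path(sample_dir)
    name = sample_dir.name
    read1 = sample_dir / f"{name}_1.fq.gz"
    read2 = sample_dir / f"{name}_2.fq.gz"
    missing = [p.name for p in (read1, read2) if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"{name}: missing {', '.join(missing)}")
    return read1, read2
```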

submit_dir.sh scans the data root directory for every subdirectory whose name starts with a capital "R", creates a SLURM job for each one and submits it to the cluster.
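The scan-and-submit step can be approximated as a dry run in Python. This sketch only builds one sbatch command line per matching directory and does not submit anything. The per-sample job script name (run_sample.sh) and its argument order are hypothetical placeholders. The README does not describe the job that submit_dir.sh actually generates.

```python
import shlex
from pathlib import Path


def sample_dirs(data_root):
    """Sorted subdirectories of data_root whose names start with a capital 'R'."""
    return sorted(p for p in Path(data_root).iterdir()
                  if p.is_dir() and p.name.startswith("R"))


def sbatch_commands(genome_dir, data_root, result_dir, job_script="run_sample.sh"):
    """Build one sbatch command line per sample directory (dry run only).

    `job_script` and its arguments are hypothetical stand-ins for whatever
    per-sample job submit_dir.sh creates.
    """
    commands = []
    for sample in sample_dirs(data_root):
        out_dir = Path(result_dir) / sample.name
        argv = ["sbatch", f"--job-name={sample.name}", job_script,
                str(genome_dir), str(sample), str(out_dir)]
        commands.append(shlex.join(argv))  # shlex.join needs Python >= 3.8
    return commands
```

To actually submit, each string could be passed to subprocess.run(shlex.split(cmd), check=True). Naming each job after its sample directory makes the jobs easy to identify in squeue output.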

Once all jobs have finished, the results should be in the result directory, each in a subdirectory named after its sample ("R...").
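To spot samples whose results never appeared, the data root can be compared against the result directory. This Python sketch is a rough check. It treats a missing or empty result subdirectory as an unfinished sample. It does not inspect file contents, so job logs or sacct remain the authoritative source for failures.

```python
from pathlib import Path


def missing_results(data_root, result_dir):
    """Names of R* sample directories with no non-empty counterpart in result_dir."""
    samples = sorted(p.name for p in Path(data_root).iterdir()
                     if p.is_dir() and p.name.startswith("R"))
    missing = []
    for name in samples:
        out = Path(result_dir) / name
        # An absent or empty directory suggests the job did not produce output.
        if not out.is_dir() or not any(out.iterdir()):
            missing.append(name)
    return missing
```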

Optional post-processing: MultiQC

You can run MultiQC on the result directory to get an aggregated overview of all runs:

$ multiqc <resultdir>
