STAR/Salmon pipeline for RNA Seq on Slurm cluster

Description

This is an RNA Seq pipeline based on STAR and Salmon, designed to run on Slurm clusters.

Requirements

Installation

These scripts are designed to be kept together and executed from the directory they are located in. From that directory, run:

$ ./submit_dir.sh <genomedata> <data-root> <resultdir>

Data preparation

Genome data folder

For each analysis project, we need a genome folder that has been indexed with STAR. The folder should therefore contain:

  • FASTA file
  • SA, SAindex, chromosome and Genome files generated by STAR
  • GFF file (optional, for htseq-count)
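Before submitting jobs, it can help to confirm that the genome folder looks complete. The following is a minimal sketch of such a check, not part of the pipeline itself: the function name `check_genome_dir` is hypothetical, the FASTA extensions (`.fa`, `.fasta`, `.fna`) are assumptions, and `chrName.txt` is used as a stand-in for the chromosome files that STAR writes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not one of the pipeline scripts): checks that a
# genome folder contains the files listed above.

check_genome_dir() {
    genome_dir="$1"
    missing=0

    # Index files written by STAR's genome generation step.
    for f in SA SAindex Genome chrName.txt; do
        if [ ! -f "$genome_dir/$f" ]; then
            echo "missing STAR index file: $f" >&2
            missing=1
        fi
    done

    # Assumption: the FASTA file uses one of these common extensions.
    found_fasta=0
    for fa in "$genome_dir"/*.fa "$genome_dir"/*.fasta "$genome_dir"/*.fna; do
        if [ -f "$fa" ]; then
            found_fasta=1
        fi
    done
    if [ "$found_fasta" -eq 0 ]; then
        echo "no FASTA file (*.fa, *.fasta, *.fna) found" >&2
        missing=1
    fi

    # The GFF file is optional, so it is not checked here.
    return "$missing"
}
```

Calling `check_genome_dir <genomedata>` returns a non-zero status and prints the missing items to standard error if anything is absent.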

RNA Seq FASTQ folders

$ ./submit_dir.sh <genomedata> <data-root> <resultdir>

The submit_dir.sh command takes the genome data folder, a data root directory and a result directory as parameters. The data root directory contains the subdirectories that hold the FASTQ files. Typically those follow the naming scheme "R[something]", e.g. "R123", which will contain something like "R123_1.fq.gz" and "R123_2.fq.gz".

The submit_dir.sh file scans the data root directory for every directory that starts with a capital "R", creates a SLURM job for it and submits it to the cluster.
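The scan-and-submit behaviour described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of submit_dir.sh: the function names, the per-sample job script `run_sample.sh`, the log file location, and the `SUBMIT` override (set it to `echo` for a dry run) are all hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; the real submit_dir.sh may work differently.
set -eu

# Print the name of every directory directly under the data root
# whose name starts with a capital "R".
list_samples() {
    for dir in "$1"/R*/; do
        [ -d "$dir" ] || continue   # pattern stays unexpanded when nothing matches
        basename "$dir"
    done
}

# Submit one Slurm job per sample directory.
# SUBMIT defaults to sbatch; SUBMIT=echo prints the commands instead.
submit_all() {
    genome="$1"
    data_root="$2"
    result_dir="$3"

    list_samples "$data_root" | while IFS= read -r sample; do
        # Assumption: each sample gets its own result subdirectory.
        mkdir -p "$result_dir/$sample"
        # run_sample.sh is a hypothetical per-sample job script.
        # stdin is redirected so the submit command cannot consume the sample list.
        ${SUBMIT:-sbatch} \
            --job-name="$sample" \
            --output="$result_dir/$sample/slurm-%j.out" \
            run_sample.sh "$genome" "$data_root/$sample" "$result_dir/$sample" \
            < /dev/null
    done
}
```

With `SUBMIT=echo` set, `submit_all <genomedata> <data-root> <resultdir>` prints one sbatch command line per "R..." directory without contacting the cluster, which is a cheap way to check that the expected directories are picked up.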

After all runs have finished, the results can be found in the result directory, each in its corresponding "R..." subdirectory.

Optional post-processing: MultiQC

You can run MultiQC to obtain an overview of the results:

$ multiqc <resultdir>