
baliga-lab/rnaseq_starsalmon_slurm


STAR/Salmon pipeline for RNA-Seq on Slurm clusters

Description

This is an RNA-Seq pipeline based on STAR and Salmon, designed to run on Slurm clusters.

Requirements

Installation

This set of scripts is designed to be run from the directory it is located in. From that directory, run:

$ ./submit_dir.sh <genomedata> <data-root> <resultdir>

Data preparation

Genome data folder

For each analysis project, we need a genome folder that has been indexed with STAR. This folder should therefore contain:

  • FASTA file
  • SA, SAindex, chromosome and Genome files generated by STAR
  • GFF file (optional, for htseq-count)
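Before submitting any jobs, it can be worth checking that the genome folder actually looks like a STAR index. The following is a minimal sketch that is not part of this repository: `check_genome_dir` and `has_match` are hypothetical helper names, and the exact list of required files (`Genome`, `SA`, `SAindex`, `chrName.txt`, `chrLength.txt`) as well as the accepted FASTA extensions are assumptions based on STAR's usual `genomeGenerate` output.

```sh
# Hypothetical pre-flight check for a STAR genome folder (not part of this repo).
# The required file names are an assumption based on STAR's genomeGenerate output.

# Succeeds if the glob pattern $2 matches at least one entry in directory $1.
has_match() {
  local f
  for f in "$1"/$2; do
    [ -e "$f" ] && return 0
  done
  return 1
}

check_genome_dir() {
  local dir="${1%/}" missing=0 f
  # Core STAR index files expected in the genome folder.
  for f in Genome SA SAindex chrName.txt chrLength.txt; do
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  # The FASTA file itself (the accepted extensions are an assumption).
  if ! has_match "$dir" '*.fa' && ! has_match "$dir" '*.fasta' && ! has_match "$dir" '*.fna'; then
    echo "missing: FASTA file in $dir" >&2
    missing=1
  fi
  # The GFF file is optional (only needed for htseq-count), so it only prints a note.
  if ! has_match "$dir" '*.gff*'; then
    echo "note: no GFF file in $dir (only needed for htseq-count)" >&2
  fi
  return "$missing"
}
```

For example, `check_genome_dir <genomedata> || exit 1` could be run right before calling `submit_dir.sh`.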

RNA-Seq FASTQ folders

$ ./submit_dir.sh <genomedata> <data-root> <resultdir>

The submit_dir.sh command takes three parameters: the genome data directory, a data root directory, and a result directory. The data root directory contains one subdirectory per sample, and each of these subdirectories holds that sample's FASTQ files. The subdirectories typically follow the naming scheme "R[something]"; for example, "R123" would contain paired files such as "R123_1.fq.gz" and "R123_2.fq.gz".
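The naming convention above can be illustrated with a small helper that resolves the two mate files for one sample directory. This is a sketch under assumptions: `fastq_pair` is a hypothetical name, and it only recognizes the `<name>_1.fq.gz` / `<name>_2.fq.gz` pattern from the example, so other suffixes such as `.fastq.gz` would have to be added.

```sh
# Hypothetical helper (not part of this repo): print the two mate files of a
# sample directory such as R123/, assuming the R123_1.fq.gz / R123_2.fq.gz pattern.
fastq_pair() {
  local sample_dir="${1%/}" name r1 r2
  name=$(basename "$sample_dir")
  r1="$sample_dir/${name}_1.fq.gz"
  r2="$sample_dir/${name}_2.fq.gz"
  if [ -f "$r1" ] && [ -f "$r2" ]; then
    # Mate 1 on the first line, mate 2 on the second.
    printf '%s\n%s\n' "$r1" "$r2"
  else
    echo "no ${name}_1.fq.gz / ${name}_2.fq.gz pair in $sample_dir" >&2
    return 1
  fi
}
```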

submit_dir.sh scans the data root directory for every subdirectory whose name starts with a capital "R", creates a Slurm job for each one, and submits it to the cluster.
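The scan-and-submit step described above can be sketched roughly as follows. This is an illustration, not the actual contents of submit_dir.sh: `submit_all` and the per-sample batch script `run_sample.sh` are hypothetical names, and setting `DRY_RUN` makes the function print the `sbatch` commands instead of submitting them.

```sh
# Hypothetical sketch of the submit_dir.sh loop; run_sample.sh is a placeholder
# name for a per-sample Slurm batch script and is not taken from this repository.
submit_all() {
  local genome_dir="$1" data_root="${2%/}" result_dir="${3%/}" sample_dir name
  for sample_dir in "$data_root"/R*; do
    # Skip plain files, and the unexpanded pattern when no R* entry exists.
    [ -d "$sample_dir" ] || continue
    name=$(basename "$sample_dir")
    if [ -n "${DRY_RUN:-}" ]; then
      # Preview mode: print the command instead of submitting it.
      echo sbatch --job-name="$name" run_sample.sh "$genome_dir" "$sample_dir" "$result_dir/$name"
    else
      # One Slurm job per sample; results are meant to go to <resultdir>/<sample name>.
      sbatch --job-name="$name" run_sample.sh "$genome_dir" "$sample_dir" "$result_dir/$name"
    fi
  done
}
```

For example, `DRY_RUN=1 submit_all <genomedata> <data-root> <resultdir>` previews the jobs without submitting anything. Note that the glob `R*` is case-sensitive, so a directory named "r123" would be skipped. After a real submission, `squeue -u "$USER"` lists the jobs that are still pending or running.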

Once all runs have finished, each sample's results are expected in the result directory, in a subdirectory named after the corresponding "R..." input directory.
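Before any post-processing, it can help to confirm that every input directory actually produced output. A minimal sketch, assuming one "R..." result directory per "R..." input directory as described above; `missing_results` is a hypothetical helper that prints the names of samples whose result directory is missing or empty.

```sh
# Hypothetical helper (not part of this repo): list samples under <data-root>
# whose matching directory under <resultdir> is missing or empty.
missing_results() {
  local data_root="${1%/}" result_dir="${2%/}" sample_dir name
  for sample_dir in "$data_root"/R*; do
    [ -d "$sample_dir" ] || continue
    name=$(basename "$sample_dir")
    # ls -A prints nothing for an empty directory.
    if [ ! -d "$result_dir/$name" ] || [ -z "$(ls -A "$result_dir/$name")" ]; then
      echo "$name"
    fi
  done
}
```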

Optional post-processing: MultiQC

You can run MultiQC on the result directory to get a summary report of the whole run:

$ multiqc <resultdir>

About

RNA-Seq pipeline based on STAR Salmon for SLURM clusters
