Genomics

Bulk Sequencing

Logo | Title | Description | Data inputs
STAR Generate Genome Index capsule

Generates necessary files to run STAR RNA alignment.

  • Genome DNA .fasta
  • Genome gene annotation .gtf/.gff
STAR Alignment

RNA-Seq alignment. STAR addresses many of the challenges of RNA-seq read mapping by supporting spliced alignments, so reads that span exon-exon junctions can still align to the genomic DNA reference.

  • Short/long read .fastq
  • STAR Index
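
To illustrate what a spliced alignment looks like downstream, here is a minimal sketch that splits one alignment into its exon blocks using the SAM CIGAR string, where the `N` operation marks a skipped reference region (an intron for RNA-seq). The function name `aligned_blocks` is hypothetical and is not part of STAR or of this capsule.

```python
import re

# CIGAR operations that consume reference bases, per the SAM specification.
REF_CONSUMING = set("MDN=X")

def aligned_blocks(pos, cigar):
    """Return 1-based inclusive (start, end) reference blocks for one alignment.

    `pos` is the 1-based leftmost mapping position. An `N` operation closes
    the current block and skips ahead on the reference.
    """
    blocks = []
    block_start = None
    ref = pos
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "N":
            if block_start is not None:
                blocks.append((block_start, ref - 1))
                block_start = None
            ref += length
        elif op in REF_CONSUMING:
            if block_start is None:
                block_start = ref
            ref += length
        # I, S, H and P do not advance the reference position.
    if block_start is not None:
        blocks.append((block_start, ref - 1))
    return blocks

# A 100 bp read split across a 1000 bp intron.
print(aligned_blocks(100, "50M1000N50M"))  # [(100, 149), (1150, 1199)]
```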
Salmon Preparing Transcriptome Indices for Mapping-Based Mode

Generates the files needed to run Salmon RNA-Seq quantification from a transcript (RNA) FASTA file and a genome (DNA) FASTA file.

  • Genome DNA .fasta
  • Transcripts RNA .fasta
Salmon: mapping-based quantification

RNA-Seq quantification. Salmon is designed for speed and is geared towards quantifying transcripts rather than producing precise read alignments.

  • Short/long read .fastq
  • Salmon Index
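
Transcript quantification results are commonly reported as TPM (transcripts per million). The sketch below shows only the TPM normalization arithmetic from read counts and effective lengths. It is not Salmon's inference procedure, and the function name `tpm` and the example numbers are assumptions made for illustration.

```python
def tpm(counts, effective_lengths):
    """Convert per-transcript read counts to TPM.

    Each count is divided by the transcript's effective length, and the
    resulting rates are rescaled so that they sum to one million.
    """
    rates = [c / l for c, l in zip(counts, effective_lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rates]
    return [r / total * 1e6 for r in rates]

# Hypothetical counts and effective lengths for three transcripts.
values = tpm([100, 200, 300], [1000, 2000, 1000])
print([round(v) for v in values])
```

The first two transcripts end up with the same TPM even though their counts differ, because the second one is twice as long.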
BWA Generate Genome Index

Generates necessary files to run BWA DNA alignment from a DNA fasta file.

  • Genome DNA .fasta
BWA Mem

BWA is a software package for mapping sequences against a large reference genome, such as the human genome.

  • Short/long read .fastq (designed for short reads)
  • BWA Index
Bowtie2 Generate Genome Index

Generates necessary files to run Bowtie2 DNA alignment from a DNA fasta file.

  • Genome DNA .fasta
Bowtie2

Bowtie2 is a software package for mapping sequences against a large reference genome, such as the human genome.

  • Short/long read .fastq (designed for short reads)
  • Bowtie2 Index

Single Cell

Logo | Title | Description | Data Inputs
STAR-Solo Alignment

STAR-Solo analyzes droplet-based single-cell RNA sequencing data, for example from the 10X Genomics Chromium System. It is intended to be a drop-in replacement for CellRanger from 10X.
  • Single cell RNA-seq .fastq
  • STAR Index
RShiny Cell

ShinyCell is an R package that allows users to create interactive Shiny-based web applications to visualize single-cell data.
  • Single cell .rds inputs from Seurat (see README)
1-3. Single Cell Analysis Tutorial (Scanpy & Seurat)

Tutorials on working with Single Cell data in Scanpy and Seurat:

1. Preprocessing and clustering 3k PBMCs

2. Core Plotting Functions

3. How to preprocess UMI count data with analytic Pearson residuals


  • Tutorial datasets (see README for details)
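
As background for the third tutorial, here is a minimal pure-Python sketch of analytic Pearson residuals for a small cells-by-genes count matrix. It assumes a negative binomial null model with a fixed overdispersion `theta` and clips residuals to plus or minus the square root of the number of cells. The default `theta=100` is an assumption for this example, and the tutorial itself relies on Scanpy rather than this code.

```python
import math

def pearson_residuals(counts, theta=100.0):
    """Analytic Pearson residuals for a cells x genes matrix of UMI counts.

    Expected counts are mu = cell_total * gene_total / grand_total and the
    residual is (x - mu) / sqrt(mu + mu^2 / theta), clipped to +/- sqrt(n_cells).
    """
    n_cells = len(counts)
    cell_totals = [sum(row) for row in counts]
    gene_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(cell_totals)
    if grand_total == 0:
        return [[0.0 for _ in row] for row in counts]
    clip = math.sqrt(n_cells)
    residuals = []
    for row, cell_total in zip(counts, cell_totals):
        out = []
        for x, gene_total in zip(row, gene_totals):
            mu = cell_total * gene_total / grand_total
            if mu == 0:
                out.append(0.0)  # empty cell or gene: nothing to normalize
                continue
            z = (x - mu) / math.sqrt(mu + mu * mu / theta)
            out.append(max(-clip, min(clip, z)))
        residuals.append(out)
    return residuals

# Two cells with the same expression profile at different sequencing depths:
# every count matches its expectation, so all residuals are zero.
print(pearson_residuals([[2, 4], [1, 2]]))
```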

4. Single Cell Tutorial Seurat to AnnData (Scanpy) tutorial

Tutorial demonstrating how a Seurat object can be converted to AnnData (Scanpy).
  • Tutorial datasets (see README for details)
5-6. Single Cell Analysis Tutorial (Scanpy)

Tutorials demonstrating how to regress out cell cycle effects and how to simulate data using a literature-curated Boolean gene regulatory network.
  • Tutorial datasets (see README for details)
7-10. Single Cell Analysis Tutorial (Scanpy) Advanced

Tutorials for advanced Single Cell processing.
  • Tutorial datasets (see README for details)

Utilities

Logo | Title | Description | Data Inputs
Download data from BaseSpace

Download demultiplexed (fastq.gz) or raw (bcl) Illumina sequencing data through the Illumina BaseSpace CLI. This capsule requires a BaseSpace account and NGS data owned by or shared with the user.

  • None
Sambamba Filtering (Duplicates, Multimappers, Unaligned)

Remove optical and PCR duplicates from Illumina data using the software tool Sambamba. Sambamba is intended to be a faster drop-in replacement for Picard MarkDuplicates.
  • .bam alignment files.
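
The three filters named in the capsule title map onto fields of each SAM/BAM record. The sketch below decodes the relevant SAM flag bits in plain Python to show the logic. It does not call Sambamba, the function names are hypothetical, and the MAPQ cutoff used to approximate multimappers is an assumption that depends on the aligner.

```python
# SAM flag bits, as defined in the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_DUPLICATE = 0x400

def keep_alignment(flag, mapq, min_mapq=1):
    """Return True if a record survives duplicate/multimapper/unaligned filtering.

    `min_mapq` is a hypothetical cutoff: aligners encode multimapping in MAPQ
    differently, so the right threshold depends on the aligner used.
    """
    if flag & (FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_DUPLICATE):
        return False
    return mapq >= min_mapq

def filter_sam_lines(lines, min_mapq=1):
    """Yield header lines unchanged and only the alignment lines that pass."""
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if keep_alignment(flag, mapq, min_mapq):
            yield line

example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",     # unique, kept
    "r2\t1024\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",  # duplicate, dropped
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",            # unmapped, dropped
    "r4\t0\tchr1\t200\t0\t4M\t*\t0\t0\tACGT\tIIII",      # MAPQ 0, dropped
]
kept = list(filter_sam_lines(example))
print([line.split("\t")[0] for line in kept])  # ['@HD', 'r1']
```

The duplicate bit is only meaningful once duplicates have been marked, so this check assumes a marking step has already run.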
Sambamba Sort and Index

Sort and index Illumina data using the software tool Sambamba. Sambamba is intended to be a faster drop-in replacement for samtools.
  • .bam alignment files.
Trim Galore

Trim Galore is a wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data.

  • .fastq files
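
To make "quality trimming" concrete, here is a small sketch of 3' quality trimming using the running-sum idea described in the Cutadapt documentation: subtract the cutoff from each quality, accumulate from the 3' end, and cut where the sum is smallest. This is a simplified illustration and not Trim Galore's or Cutadapt's actual code. The function names are hypothetical, and Phred+33 quality encoding is assumed.

```python
def quality_trim_index(qualities, cutoff):
    """Return the index at which to cut the 3' end of a read.

    Walks backwards from the 3' end, summing (quality - cutoff), and stops
    once the sum turns positive. The cut is placed at the running minimum.
    """
    cut = len(qualities)
    running = 0
    lowest = 0
    for i in range(len(qualities) - 1, -1, -1):
        running += qualities[i] - cutoff
        if running > 0:
            break
        if running < lowest:
            lowest = running
            cut = i
    return cut

def trim_record(sequence, quality_string, cutoff=20):
    """Trim one FASTQ record, assuming Phred+33 encoded qualities."""
    qualities = [ord(ch) - 33 for ch in quality_string]
    cut = quality_trim_index(qualities, cutoff)
    return sequence[:cut], quality_string[:cut]

# Qualities drop off towards the 3' end; with cutoff 10 the first four bases remain.
print(quality_trim_index([42, 40, 26, 27, 8, 7, 11, 4, 2, 3], 10))  # 4
```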
fastp

A tool designed to provide fast all-in-one preprocessing for FastQ files (adapter trimming, downsampling, etc.). It is developed in C++ with multithreading support for high performance.

  • .fastq files

Other

Title | Description | Input Data
MACS PeakCalling

MACS3 is a peak calling tool generally used on ChIP-seq data to identify transcription factor binding sites.
  • .bam alignment files
  • compare_sheet.csv (see README)
featureCounts

This capsule runs featureCounts from the Subread package to generate an expression matrix.
  • Gene annotation .gtf file
  • .bam alignments
HOMER

HOMER includes annotatePeaks.pl, an all-in-one program for peak annotation. This capsule uses annotatePeaks.pl to annotate *.bed coordinates with gene features.
  • .bed files containing peaks
  • Genome reference .fasta
  • Gene annotation .gtf file.
Gene Enrichment Analysis (GEA)

This capsule presents a user-friendly Streamlit application for gene enrichment analysis. The analysis results are sourced from two widely used platforms, g:Profiler and PANTHER.
  • File containing gene names
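
For readers new to enrichment analysis, the sketch below shows the classic over-representation test: a one-sided hypergeometric p-value for the overlap between a gene list and an annotated gene set. It is an illustration only. g:Profiler and PANTHER apply their own statistics and multiple-testing corrections, and the function name and example numbers here are assumptions.

```python
from math import comb

def enrichment_p_value(population, annotated, sample, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    population: number of background genes
    annotated:  background genes that belong to the gene set
    sample:     size of the submitted gene list
    overlap:    submitted genes that belong to the gene set
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 40 of 200 submitted genes fall in a 500-gene set
# drawn from a 20,000-gene background.
print(enrichment_p_value(20000, 500, 200, 40))
```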
GATK RNAseq short variant discovery (SNPs + Indels)

Based on the GATK RNAseq short variant discovery pipeline. Takes in alignments and outputs a VCF containing SNPs and indels.
  • .bam RNA alignments
Delly somatic complete analysis

Structural variant (SV) prediction to discover, genotype and visualize deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data of somatic cells.
  • Genome reference .fasta
  • .bam DNA alignment files
Delly germline complete analysis

Structural variant (SV) prediction to discover, genotype and visualize deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data of germline cells.
  • Genome reference .fasta
  • .bam DNA alignment files
ART-Simulation-Illumina

ART is a set of simulation tools to generate synthetic next-generation sequencing reads.

  • .fasta containing the sequence to simulate reads from
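
As a rough picture of what read simulation involves, here is a toy sketch that samples fixed-length reads from a reference sequence and adds random substitution errors. It uses a uniform error rate, which is an assumption for illustration. ART itself uses empirically derived, platform-specific error models, and none of the names below come from ART.

```python
import random

def simulate_reads(reference, read_length, n_reads, error_rate=0.01, seed=0):
    """Sample reads uniformly from `reference` with uniform substitution errors.

    Returns a list of (read_name, 0-based start, sequence) tuples.
    """
    if read_length > len(reference):
        raise ValueError("read_length exceeds reference length")
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    reads = []
    for i in range(n_reads):
        start = rng.randrange(len(reference) - read_length + 1)
        bases = list(reference[start:start + read_length])
        for j, base in enumerate(bases):
            if rng.random() < error_rate:
                # Substitute with a different nucleotide.
                bases[j] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((f"read_{i}", start, "".join(bases)))
    return reads

reference = "ACGTTGCAAGCTTAGGCTAACGTTAGCCGATA"
for name, start, seq in simulate_reads(reference, 10, 3, error_rate=0.05):
    print(name, start, seq)
```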
PySpark and EMR Serverless

This capsule runs an example PySpark job on EMR Serverless.
  • NOAA Global Surface Summary of Day dataset