Genomics

Bulk Sequencing

Logo	Title	Description	Data Inputs
	STAR Generate Genome Index capsule	Generates the index files required to run STAR RNA-seq alignment.	Genome DNA .fasta; gene annotation .gtf/.gff
	STAR Alignment	RNA-seq alignment. STAR handles spliced alignments, so RNA reads that span exon-exon junctions can still be mapped to the DNA genome.	Short/long read .fastq; STAR index
	Salmon Preparing Transcriptome Indices for Mapping-Based Mode	Generates the index files required to run Salmon in mapping-based mode from a transcript (RNA) FASTA file and a genome (DNA) FASTA file.	Genome DNA .fasta; transcript RNA .fasta
	Salmon: mapping-based quantification	RNA-seq quantification. Salmon is designed for speed and is geared toward transcript quantification rather than precise read alignment.	Short/long read .fastq; Salmon index
	BWA Generate Genome Index	Generates the index files required to run BWA DNA alignment from a DNA FASTA file.	Genome DNA .fasta
	BWA Mem	BWA is a software package for mapping sequences against a large reference genome, such as the human genome.	Short/long read .fastq (designed for short reads); BWA index
	Bowtie2 Generate Genome Index	Generates the index files required to run Bowtie2 DNA alignment from a DNA FASTA file.	Genome DNA .fasta
	Bowtie2	Bowtie2 is a software package for mapping sequences against a large reference genome, such as the human genome.	Short/long read .fastq (designed for short reads); Bowtie2 index
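The index and alignment capsules above are meant to be chained: an index is generated once from the reference, then reused for every alignment run. As a rough illustration of that two-step pattern, the Python sketch below assembles the corresponding STAR command lines without running them. The file names (`genome.fa`, `genes.gtf`, `sample_R1.fastq`, and so on) and the helper function names are hypothetical, and the capsules may expose these options differently through their own parameters, so treat this as a sketch of the underlying STAR invocation rather than a description of how the capsules are implemented.

```python
import shlex


def star_index_cmd(genome_fasta, annotation_gtf, index_dir, threads=4):
    """Build the argument list for STAR genome index generation."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", str(index_dir),
        "--genomeFastaFiles", str(genome_fasta),
        "--sjdbGTFfile", str(annotation_gtf),
        "--runThreadN", str(threads),
    ]


def star_align_cmd(index_dir, fastq_files, out_prefix, threads=4):
    """Build the argument list for aligning reads against an existing index."""
    return [
        "STAR",
        "--genomeDir", str(index_dir),
        "--readFilesIn", *[str(f) for f in fastq_files],
        "--outFileNamePrefix", str(out_prefix),
        "--runThreadN", str(threads),
    ]


if __name__ == "__main__":
    # Hypothetical input names, used only for illustration.
    index = star_index_cmd("genome.fa", "genes.gtf", "star_index")
    align = star_align_cmd(
        "star_index", ["sample_R1.fastq", "sample_R2.fastq"], "results/sample_"
    )
    # Print the commands instead of executing them.
    print(shlex.join(index))
    print(shlex.join(align))
```

Building the commands as argument lists keeps them safe to pass to `subprocess.run` later, and the same pattern (index once, align many times) applies to the Salmon, BWA, and Bowtie2 capsule pairs.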

Single Cell

Logo	Title	Description	Data Inputs
	STAR-Solo Alignment	STAR-Solo analyzes droplet-based single-cell RNA sequencing data, for example from the 10x Genomics Chromium system. It is intended as a drop-in replacement for Cell Ranger from 10x Genomics.	Single cell RNA-seq .fastq; STAR index
	RShiny Cell	ShinyCell is an R package that allows users to create interactive Shiny-based web applications to visualize single-cell data.	Single cell .rds inputs from Seurat (see README)
	1-3. Single Cell Analysis Tutorial (Scanpy & Seurat)	Tutorials on working with single-cell data in Scanpy and Seurat: (1) preprocessing and clustering 3k PBMCs; (2) core plotting functions; (3) preprocessing UMI count data with analytic Pearson residuals.	Tutorial datasets (see README for details)
	4. Single Cell Tutorial Seurat to AnnData (Scanpy) tutorial	Tutorial demonstrating how to convert a Seurat object to AnnData (Scanpy).	Tutorial datasets (see README for details)
	5-6. Single Cell Analysis Tutorial (Scanpy)	Tutorials demonstrating how to regress out cell cycle effects and how to simulate data using a literature-curated Boolean gene regulatory network.	Tutorial datasets (see README for details)
	7-10. Single Cell Analysis Tutorial (Scanpy) Advanced	Tutorials for advanced single-cell processing.	Tutorial datasets (see README for details)

Utilities

Logo	Title	Description	Data Inputs
	Download data from BaseSpace	Download demultiplexed (fastq.gz) or raw (bcl) Illumina sequencing data through the Illumina BaseSpace CLI. This capsule requires a BaseSpace account and NGS data owned by or shared with the user.	None
	Sambamba Filtering (Duplicates, Multimappers, Unaligned)	Remove optical and PCR duplicates from Illumina data using Sambamba. Sambamba is intended as a faster drop-in replacement for Picard MarkDuplicates.	.bam alignment files
	Sambamba Sort and Index	Sort and index Illumina data using Sambamba. Sambamba is intended as a faster drop-in replacement for samtools.	.bam alignment files
	Trim Galore	Trim Galore is a wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data.	.fastq files
	fastp	A tool designed to provide fast all-in-one preprocessing for FastQ files (adapter trimming, downsampling, etc.). It is written in C++ and supports multithreading for high performance.	.fastq files
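To make "quality trimming" concrete for readers new to the preprocessing capsules above, here is a deliberately simplified Python sketch. It parses FASTQ records and removes trailing bases whose Phred score falls below a threshold. This is not the algorithm that Trim Galore, Cutadapt, or fastp actually use (those tools apply more sophisticated schemes and also handle adapters), and the function names, the threshold of 20, and the Phred+33 encoding are assumptions chosen for illustration.

```python
def parse_fastq(lines):
    """Yield (header, sequence, quality) tuples from FASTQ text lines."""
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    # Each FASTQ record is four lines: header, sequence, '+', quality.
    for i in range(0, len(rows) - 3, 4):
        header, seq, _plus, qual = rows[i:i + 4]
        yield header, seq, qual


def trim_3prime(seq, qual, min_phred=20, offset=33):
    """Drop trailing bases whose Phred score is below min_phred."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_phred:
        end -= 1
    return seq[:end], qual[:end]


def trim_records(lines, min_phred=20, min_length=1):
    """Trim every record and discard reads shorter than min_length."""
    kept = []
    for header, seq, qual in parse_fastq(lines):
        seq, qual = trim_3prime(seq, qual, min_phred)
        if len(seq) >= min_length:
            kept.append((header, seq, qual))
    return kept


if __name__ == "__main__":
    # Tiny made-up example: 'I' encodes Phred 40, '#' encodes Phred 2.
    example = [
        "@read1", "ACGTACGT", "+", "IIIIII##",
        "@read2", "ACGT", "+", "####",
    ]
    for header, seq, qual in trim_records(example):
        print(header, seq, qual)
```

For real data, the capsules are the better choice: they are multithreaded, handle gzip-compressed and paired-end input, and produce quality reports.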

Other

Logo	Title	Description	Data Inputs
	MACS PeakCalling	MACS3 is a peak-calling tool generally used on ChIP-seq data to identify transcription factor binding sites.	.bam alignment files; compare_sheet.csv (see README)
	featureCounts	This capsule runs featureCounts from the Rsubread R package to generate an expression matrix.	Gene annotation .gtf file; .bam alignments
	HOMER	HOMER includes annotatePeaks.pl, an all-in-one program for peak annotation. This capsule uses annotatePeaks.pl to annotate *.bed coordinates with gene features.	.bed files containing peaks; genome reference .fasta; gene annotation .gtf file
	Gene Enrichment Analysis (GEA)	This capsule provides a Streamlit application for gene enrichment analysis. Results are sourced from two widely used platforms, g:Profiler and PANTHER.	File containing gene names
	GATK RNAseq short variant discovery (SNPs + Indels)	Based on the GATK RNA-seq short variant discovery pipeline. Takes alignments as input and outputs a VCF containing SNPs and indels.	.bam RNA alignments
	Delly somatic complete analysis	Structural variant (SV) prediction to discover, genotype and visualize deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data of somatic cells.	Genome reference .fasta; .bam DNA alignment files
	Delly germline complete analysis	Structural variant (SV) prediction to discover, genotype and visualize deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data of germline cells.	Genome reference .fasta; .bam DNA alignment files
	ART-Simulation-Illumina	ART is a set of simulation tools to generate synthetic next-generation sequencing reads.	.fasta containing the sequence to simulate reads from
	PySpark and EMR Serverless	This capsule runs an example PySpark job on EMR Serverless.	NOAA Global Surface Summary of Day dataset