A generalized workflow for processing 16S rRNA gene amplicon data using the DADA2 package in R. This workflow identifies exact amplicon sequence variants (ASVs) with higher resolution than traditional OTU-based methods.
This repository contains an RMarkdown workflow that implements the complete DADA2 pipeline:
- Quality filtering - Remove low-quality reads
- Error rate learning - Learn error rates from the data
- Dereplication - Combine identical reads
- Sample inference - Apply the DADA2 algorithm
- Merger of paired-end reads - Align and merge paired reads
- Chimera removal - Remove artifactual sequences
- Taxonomic assignment - Classify ASVs
- Output generation - Create tables and visualizations
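The steps above map onto a handful of functions exported by the `dada2` package. The following is a minimal sketch of that chain for paired-end reads, not the code in this repository's RMarkdown file. The helper name `run_dada2_sketch`, its arguments, and the default `trunc_len` and `max_ee` values are hypothetical and chosen only for illustration; the reference file paths passed as `train_set` and `species_set` are likewise placeholders you would supply yourself.

```r
# Hypothetical helper (not part of this repository): chains the core dada2
# calls in the order listed above. Requires the dada2 package when called.
# fwd / rev:           paths to raw forward / reverse FASTQ files
# filt_fwd / filt_rev: paths where filtered reads will be written
# train_set / species_set: paths to reference FASTA files (assumed inputs)
run_dada2_sketch <- function(fwd, rev, filt_fwd, filt_rev,
                             train_set, species_set,
                             trunc_len = c(240, 160),  # placeholder values
                             max_ee = c(2, 2)) {       # placeholder values
  # Quality filtering: truncate reads and drop those with too many expected errors
  track <- dada2::filterAndTrim(fwd, filt_fwd, rev, filt_rev,
                                truncLen = trunc_len, maxEE = max_ee,
                                maxN = 0, truncQ = 2, rm.phix = TRUE,
                                compress = TRUE)

  # Error rate learning: fit an error model separately per read direction
  err_f <- dada2::learnErrors(filt_fwd)
  err_r <- dada2::learnErrors(filt_rev)

  # Dereplication: collapse identical reads into unique sequences with counts
  derep_f <- dada2::derepFastq(filt_fwd)
  derep_r <- dada2::derepFastq(filt_rev)

  # Sample inference: apply the DADA2 algorithm with the learned error models
  dada_f <- dada2::dada(derep_f, err = err_f)
  dada_r <- dada2::dada(derep_r, err = err_r)

  # Merge paired reads, build the sequence table, and remove chimeras
  merged <- dada2::mergePairs(dada_f, derep_f, dada_r, derep_r)
  seqtab <- dada2::makeSequenceTable(merged)
  seqtab_nochim <- dada2::removeBimeraDenovo(seqtab, method = "consensus")

  # Taxonomic assignment: genus level first, then exact-match species level
  taxa <- dada2::assignTaxonomy(seqtab_nochim, train_set)
  taxa <- dada2::addSpecies(taxa, species_set)

  list(track = track, seqtab_nochim = seqtab_nochim, taxa = taxa)
}
```

The sketch assumes every sample retains reads after filtering; in practice, samples that lose all reads need to be dropped before the later steps.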
Requirements:
- R ≥ 4.0.0
- Required R packages:
  - dada2
  - ggplot2
  - phyloseq
  - Biostrings
  - ShortRead
  - tidyverse
- Clone this repository:

      git clone https://github.com/yourusername/dada2-workflow.git
      cd dada2-workflow
- Install required R packages:

      install.packages(c("ggplot2", "tidyverse"))
      if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
      BiocManager::install(c("dada2", "phyloseq", "Biostrings", "ShortRead"))
- Reference database:
  - The workflow uses the SILVA database (v138.1) for taxonomic assignment
  - The database is either accessed from the DADA2 package or downloaded from Zenodo
  - Both genus-level and species-level assignments are performed
- Open `dada2_workflow.Rmd` in RStudio and update the following:
  - Path to your sequence files (supports both `_R1_001.fastq.gz` and `_R1.fastq.gz` naming conventions)
  - Read trimming parameters based on your data quality
  - Path to your sample metadata file
- Execute the workflow by running all code chunks in the RMarkdown document.
- View results in the interactive dashboard:

      # Install required packages if needed
      install.packages(c("shiny", "shinydashboard", "plotly", "DT", "vegan", "viridis"))

      # Make sure phyloseq is installed
      if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
      BiocManager::install("phyloseq")

      # Run the dashboard (will open in your browser)
      source("run_dashboard.R")

      # If the dashboard doesn't open automatically, you can access it at:
      # http://127.0.0.1:4321
      # Note: If you encounter a greyed-out dashboard, try restarting R
      # and running the dashboard again
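The two file naming conventions mentioned in the setup steps (`_R1_001.fastq.gz` and `_R1.fastq.gz`) can be matched with a single regular expression. The sketch below shows one way to pair forward and reverse files and derive sample names using base R only. The helper name `find_paired_fastq` is hypothetical, and the `_R2` suffix for reverse reads is an assumption; the workflow itself may discover files differently.

```r
# Hypothetical helper: locate forward/reverse FASTQ files under either naming
# convention and derive sample names by stripping the read suffix.
find_paired_fastq <- function(path) {
  fwd_pattern <- "_R1(_001)?\\.fastq\\.gz$"
  rev_pattern <- "_R2(_001)?\\.fastq\\.gz$"  # assumed reverse-read suffix

  fwd <- sort(list.files(path, pattern = fwd_pattern, full.names = TRUE))
  rev <- sort(list.files(path, pattern = rev_pattern, full.names = TRUE))

  fwd_names <- sub(fwd_pattern, "", basename(fwd))
  rev_names <- sub(rev_pattern, "", basename(rev))

  # Refuse to continue if forward and reverse files do not pair up exactly
  if (!identical(fwd_names, rev_names)) {
    stop("Forward and reverse FASTQ files do not match one-to-one")
  }

  data.frame(sample = fwd_names, fwd = fwd, rev = rev,
             stringsAsFactors = FALSE)
}

# Demonstration on empty placeholder files in a fresh temporary directory
demo_dir <- tempfile("fastq_demo_")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c(
  "A_R1_001.fastq.gz", "A_R2_001.fastq.gz",  # first convention
  "B_R1.fastq.gz", "B_R2.fastq.gz"           # second convention
))))

pairs <- find_paired_fastq(demo_dir)
print(pairs$sample)
```

Checking that the derived names agree before processing guards against a missing mate file silently shifting every later pair.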
The main parameters to adjust in the workflow:
- Read trimming (`truncLen`, `maxEE`): Modify based on your sequencing quality
- Taxonomy thresholds: Adjust confidence thresholds for taxonomic assignments
- Visualization options: Adjust plotting parameters as needed
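To make the read trimming parameters concrete: the expected number of errors in a read is the sum of the per-base error probabilities implied by its Phred quality scores, EE = sum(10^(-Q/10)). `maxEE` caps that sum, and `truncLen` shortens the read before the sum is taken, which removes the low-quality tail. The sketch below uses made-up quality scores; the helper name `expected_errors` and the threshold of 1 are hypothetical and are not recommendations for your data.

```r
# Hypothetical helper: expected errors from a vector of Phred quality scores
expected_errors <- function(q) sum(10^(-q / 10))

# Made-up read of 250 bases whose quality drops toward the 3' end
q <- c(rep(40, 200), rep(20, 40), rep(10, 10))

ee_full  <- expected_errors(q)         # whole read: about 1.42
ee_trunc <- expected_errors(q[1:240])  # truncated at 240 bases: about 0.42

max_ee <- 1  # illustrative threshold only
cat("Full read passes:     ", ee_full  <= max_ee, "\n")
cat("Truncated read passes:", ee_trunc <= max_ee, "\n")
```

In this toy case the last ten bases contribute most of the expected errors, so truncating them lets the read pass a threshold it would otherwise fail.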
The workflow produces several output files in the `results/` directory:
- `seqtab_nochim.csv`: ASV count table
- `taxonomy.csv`: Taxonomic assignments for each ASV (with species-level assignments)
- `phyloseq_object.rds`: R object for downstream analysis
- `ASVs.fasta`: FASTA file containing ASV sequences
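As an illustration of how files like these relate to a DADA2 sequence table, the sketch below writes a toy count matrix to a CSV table and a FASTA file using base R only, then reads the table back. The ASV naming scheme (`ASV1`, `ASV2`, ...), the table orientation, and the temporary output directory are assumptions made for the example; the workflow may lay out its files differently.

```r
# Toy sequence table in the dada2 layout: rows are samples, columns are
# ASV sequences. The sequences and counts are invented for illustration.
seqtab <- matrix(c(10L, 0L, 5L, 7L), nrow = 2,
                 dimnames = list(c("sampleA", "sampleB"),
                                 c("ACGTACGT", "TTGACCAA")))

asv_seqs <- colnames(seqtab)
asv_ids  <- paste0("ASV", seq_along(asv_seqs))  # assumed naming scheme

# Write to a temporary directory so the example leaves no files behind
out_dir <- tempfile("results_")
dir.create(out_dir)

# Count table with short ASV identifiers as column names
counts <- seqtab
colnames(counts) <- asv_ids
write.csv(counts, file.path(out_dir, "seqtab_nochim.csv"))

# FASTA file: interleave ">id" header lines with the sequences
fasta_lines <- as.vector(rbind(paste0(">", asv_ids), asv_seqs))
writeLines(fasta_lines, file.path(out_dir, "ASVs.fasta"))

# Read the count table back for downstream use
counts_in <- read.csv(file.path(out_dir, "seqtab_nochim.csv"), row.names = 1)
print(counts_in)
```

Keeping the identifiers in the CSV and the full sequences in the FASTA file makes the table readable while preserving the exact sequence behind each ASV.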
The included Shiny dashboard (`dashboard.R`) provides interactive visualization of DADA2 results:
- Overview: Quick summary statistics and sample metrics
- Sample Quality: Read count distributions and filtering statistics
- Alpha Diversity: Shannon, Simpson, Observed, and Chao1 diversity metrics
- Beta Diversity: PCoA, NMDS, t-SNE, and UMAP ordinations with PERMANOVA
- Taxonomy: Interactive bar plots and heatmaps of taxonomic composition
- ASV Table: Browse ASV sequences and abundance data
- Differential Abundance: Identify taxa that differ between sample groups using DESeq2 or ALDEx2
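For readers unfamiliar with the alpha diversity metrics listed above, the sketch below computes Observed richness, Shannon, and Simpson indices for a single sample in base R. It illustrates the standard formulas only and is not taken from `dashboard.R`; the helper name `alpha_diversity` is hypothetical, and showing Simpson in its 1 - sum(p^2) form is an assumption about which variant the dashboard reports.

```r
# Hypothetical helper: alpha diversity indices from a vector of ASV counts
alpha_diversity <- function(counts) {
  counts <- counts[counts > 0]  # ASVs absent from the sample do not count
  p <- counts / sum(counts)     # relative abundances
  c(observed = length(counts),
    shannon  = -sum(p * log(p)),  # natural-log Shannon index
    simpson  = 1 - sum(p^2))      # Gini-Simpson form
}

# Two invented samples with the same richness but different evenness
even   <- alpha_diversity(c(25, 25, 25, 25, 0))
skewed <- alpha_diversity(c(97, 1, 1, 1, 0))

print(round(rbind(even, skewed), 3))
```

Both samples contain four observed ASVs, but the evenly distributed sample scores higher on Shannon and Simpson, which is why richness and evenness-aware indices are usually reported together.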
Launch the dashboard by running:

    source("run_dashboard.R")

Contributions to improve this workflow are welcome. Please feel free to submit a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.
- Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP (2016). "DADA2: High-resolution sample inference from Illumina amplicon data." Nature Methods, 13, 581-583. doi: 10.1038/nmeth.3869
- McMurdie PJ, Holmes S (2013). "phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data." PLoS ONE, 8(4):e61217
- Quast C, et al. (2013). "The SILVA ribosomal RNA gene database project: improved data processing and web-based tools." Nucleic Acids Research, 41(D1), D590-D596. doi: 10.1093/nar/gks1219