quantmsdiann is a bigbio bioinformatics pipeline, built following nf-core guidelines, for quantitative mass spectrometry analysis using DIA-NN. It supports Data-Independent Acquisition (DIA) workflows including label-free, plexDIA (mTRAQ, SILAC, Dimethyl), phosphoproteomics with site localization, and Bruker timsTOF/PASEF data.
The pipeline is built using Nextflow, a workflow tool that runs tasks portably across multiple compute infrastructures. It uses Docker/Singularity containers, which makes results highly reproducible. The Nextflow DSL2 implementation uses one container per process, making software dependencies easy to maintain and update.
The pipeline takes SDRF metadata and mass spectrometry data files (.raw, .mzML, .d, .dia) as input and performs:
- Input validation — SDRF parsing and validation via sdrf-pipelines
- File preparation — RAW to mzML conversion (ThermoRawFileParser), indexing, Bruker .d handling (tdf2mzml)
- In-silico spectral library generation — deep learning-based prediction, or use a user-provided library (--diann_speclib)
- Preliminary analysis — per-file calibration and mass accuracy estimation (parallelized)
- Empirical library assembly — consensus library from preliminary results with RT profiling
- Individual analysis — per-file search with the empirical library (parallelized)
- Final quantification — protein/peptide/gene group matrices with cross-run normalization
- MSstats conversion — DIA-NN report to MSstats-compatible format
- Quality control — interactive QC report via pmultiqc
| Version | Profile | Container | Key features |
|---|---|---|---|
| 1.8.1 (default) | diann_v1_8_1 | docker.io/biocontainers/diann:v1.8.1_cv1 | Core DIA analysis, TSV output |
| 2.1.0 | diann_v2_1_0 | ghcr.io/bigbio/diann:2.1.0 | Native .raw support, Parquet output |
| 2.2.0 | diann_v2_2_0 | ghcr.io/bigbio/diann:2.2.0 | Speed optimizations (up to 1.6x on HPC) |
| 2.3.2 | diann_v2_3_2 | ghcr.io/bigbio/diann:2.3.2 | DDA support (beta), InfinDIA, up to 9 var mods |
Switch versions with e.g. -profile diann_v2_2_0,docker. See the DIA-NN Version Selection guide and full parameter reference for details.
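The profile names in the table appear to follow the pattern `diann_v` plus the version with dots replaced by underscores. As a minimal sketch under that assumption, the hypothetical helper `diann_profile` below (not part of the pipeline) maps a listed version to its profile and rejects versions that are not in the table. The command is printed rather than executed so it can be reviewed first.

```shell
# Hypothetical helper (not part of quantmsdiann): map a DIA-NN version
# from the table above to its profile name, e.g. 2.2.0 -> diann_v2_2_0.
diann_profile() {
  case "$1" in
    1.8.1|2.1.0|2.2.0|2.3.2)
      printf 'diann_v%s\n' "$(printf '%s' "$1" | tr '.' '_')"
      ;;
    *)
      echo "no profile listed for DIA-NN version: $1" >&2
      return 1
      ;;
  esac
}

# Build the profile string and print the command instead of running it.
profile="$(diann_profile 2.2.0),docker"
echo "nextflow run bigbio/quantmsdiann -profile ${profile} --input experiment.sdrf.tsv --database proteins.fasta --outdir ./results"
```

Restricting the helper to the four listed versions avoids silently requesting a profile that the pipeline may not define.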
Note
If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow.
Run with test data:
nextflow run bigbio/quantmsdiann -profile test_dia,docker --outdir results

Run with your own data:
nextflow run bigbio/quantmsdiann \
--input 'experiment.sdrf.tsv' \
--database 'proteins.fasta' \
--outdir './results' \
-profile docker

Run with a specific DIA-NN version:
nextflow run bigbio/quantmsdiann \
--input 'experiment.sdrf.tsv' \
--database 'proteins.fasta' \
--outdir './results' \
-profile docker,diann_v2_2_0

Warning
Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files specified with -c must only be used for tuning process resource specifications, not for defining parameters.
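As an illustration of the -params-file route, the sketch below writes a small YAML parameters file containing the same three parameters used in the commands above. The file name `params.yaml` is an arbitrary choice, and the values are the placeholder paths from this README, not real data.

```shell
# Write a parameters file; keys mirror the --input, --database and --outdir
# options shown above. The file name params.yaml is an arbitrary choice.
cat > params.yaml <<'EOF'
input: 'experiment.sdrf.tsv'
database: 'proteins.fasta'
outdir: './results'
EOF
```

The pipeline could then be launched with `nextflow run bigbio/quantmsdiann -profile docker -params-file params.yaml`, which keeps the parameters in a file that can be versioned alongside the analysis.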
- Usage — How to run the pipeline, input formats, optional outputs, and custom configuration
- Parameters — Complete reference of all pipeline parameters organised by category
- Output — Description of all output files produced by the pipeline
quantmsdiann is developed and maintained by:
- Yasset Perez-Riverol (EMBL-EBI)
- Dai Chengxin (Beijing Proteome Research Center)
- Julianus Pfeuffer (Freie Universitat Berlin)
- Vadim Demichev (Charite Universitaetsmedizin Berlin)
- Qi-Xuan Yue (Chongqing University of Posts and Telecommunications)
If you would like to contribute to this pipeline, please see the contributing guidelines.
If you use quantmsdiann in your research, please cite:
Dai et al. "quantms: a cloud-based pipeline for quantitative proteomics" (2024). DOI: 10.5281/zenodo.15573386
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.