bactQC is a comprehensive command-line tool designed for quality control (QC) of bacterial genome data. It integrates multiple QC checks, including Bracken, MLST, CheckM, Assembly Scan, and fastp, to ensure the integrity and quality of your genomic assemblies. Whether you're analyzing a single sample or multiple samples from Bactopia outputs, bactQC provides an efficient and user-friendly interface to streamline your QC workflow.
bactQC is a robust tool for performing quality control on bacterial genome assemblies. By leveraging various bioinformatics tools, it provides a comprehensive assessment of genome quality, ensuring reliable downstream analyses. Key functionalities include:
- Bracken Analysis: Assess the abundance of primary and secondary species.
- MLST Checking: Validate multi-locus sequence typing results.
- CheckM Evaluation: Evaluate genome completeness and contamination.
- Assembly Scan: Analyze assembly metrics like contig counts and N50.
- fastp Assessment: Examine sequencing quality metrics post-filtering.
- Automated QC Pipeline: Run all quality checks with a single command.
- Detailed Reporting: Generates comprehensive TSV reports for results and thresholds.
- Rich Console Output: Enhanced terminal output with tables and emojis for better readability.
- Modular Commands: Execute individual QC checks as needed.
- Customizable Thresholds: Adjust QC parameters to fit specific project requirements.
- Species-Specific Analysis: Tailor QC thresholds based on detected species.
- Python 3.6 or higher
- pip package installer
- Clone the Repository:
  git clone https://github.com/maxlcummins/bactQC.git
  cd bactQC
- Set Up a Virtual Environment (Optional but Recommended):
  python3 -m venv venv
  source venv/bin/activate
- Install Dependencies:
  pip install -r requirements.txt
  If requirements.txt is not provided, install dependencies manually:
  pip install pandas requests click rich emoji
- Install bactQC:
  pip install .
  Alternatively, you can run the CLI directly without installation:
  python bactQC/cli.py
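Before running bactQC, it can help to confirm that the environment meets the prerequisites. The following is a minimal sketch, not part of bactQC itself: it checks the interpreter version and whether the packages named in the manual install command can be imported. The function names are hypothetical helpers introduced for illustration.

```python
import importlib.util
import sys

# Packages listed in the manual "pip install" command above.
REQUIRED_PACKAGES = ["pandas", "requests", "click", "rich", "emoji"]


def python_is_supported(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.6 or newer."""
    return tuple(version_info[:2]) >= (3, 6)


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    print("Missing packages:", missing_packages() or "none")
```

An empty list from `missing_packages()` suggests the dependencies are in place; any names it returns can be passed to `pip install`.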
bactQC requires specific inputs generated by Bactopia. Generate them as follows. To run Bactopia on many of your own genomes, a sample sheet is the best approach.
# Run bactopia's main workflow
bactopia -profile test,docker
# Run Bracken module
bactopia -profile docker --wf bracken --kraken2_db /Users/maxcummins/Documents/RDH/databases/k2_plus_pf_16gb --bactopia bactopia
# Run CheckM module
bactopia -profile docker --wf checkm --bactopia bactopia
bactQC provides a command-line interface (CLI) with multiple commands to perform quality control on bacterial genome data.
Run all quality control checks for a specific sample or all samples in the input directory.
bactQC run [OPTIONS]
Options:
- --sample_name TEXT
  Name of a sample to analyze.
  Example: --sample_name sample1
- --input_dir PATH
  Directory containing Bactopia outputs.
  Default: bactopia
  Example: --input_dir /path/to/bactopia
- --min_primary_abundance FLOAT
  Minimum required abundance for the primary species.
  Default: 0.80
- --min_completeness INTEGER
  Minimum required completeness threshold.
  Default: 80
- --max_contamination INTEGER
  Maximum allowed contamination threshold.
  Default: 10
- --maximum_contigs INTEGER
  Maximum allowed number of contigs.
  Default: 500
- --minimum_n50 INTEGER
  Minimum required N50 contig length.
  Default: 15000
- --min_q30_bases FLOAT
  Minimum required proportion of Q30 bases after filtering.
  Default: 0.90
- --min_coverage INTEGER
  Minimum required coverage after filtering.
  Default: 30
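To make the role of these thresholds concrete, here is a small sketch of how a set of metrics might be compared against the defaults listed above. It is an illustration only and does not reproduce bactQC's internal code: the metric names in `CHECKS` are hypothetical, and treating values exactly at a threshold as passing is an assumption.

```python
# Default values copied from the option list above.
DEFAULT_THRESHOLDS = {
    "min_primary_abundance": 0.80,
    "min_completeness": 80,
    "max_contamination": 10,
    "maximum_contigs": 500,
    "minimum_n50": 15000,
    "min_q30_bases": 0.90,
    "min_coverage": 30,
}

# Hypothetical metric names: each option constrains one metric, either as a
# lower bound ("min") or an upper bound ("max").
CHECKS = {
    "min_primary_abundance": ("primary_abundance", "min"),
    "min_completeness": ("completeness", "min"),
    "max_contamination": ("contamination", "max"),
    "maximum_contigs": ("contigs", "max"),
    "minimum_n50": ("n50", "min"),
    "min_q30_bases": ("q30_bases", "min"),
    "min_coverage": ("coverage", "min"),
}


def evaluate(metrics, overrides=None):
    """Return {option: True/False} for each threshold; boundary values pass."""
    thresholds = {**DEFAULT_THRESHOLDS, **(overrides or {})}
    results = {}
    for option, (metric, kind) in CHECKS.items():
        value, limit = metrics[metric], thresholds[option]
        results[option] = value >= limit if kind == "min" else value <= limit
    return results
```

Passing `overrides={"maximum_contigs": 200}` mirrors what `--maximum_contigs 200` expresses on the command line: one threshold changes while the rest keep their defaults.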
Analyze a specific sample:
bactQC run --sample_name sample1 --input_dir /path/to/bactopia
Analyze all samples in the default bactopia directory:
bactQC run
bactQC also provides individual commands to perform specific QC checks. This is useful for debugging or when you only need to verify a particular aspect of your data.
Check Bracken results for a sample.
bactQC check_bracken [OPTIONS]
Options:
- --sample_name TEXT
  Required. Name of a sample to analyze.
- --input_dir PATH
  Directory containing Bactopia outputs.
  Default: bactopia
- --min_primary_abundance FLOAT
  Minimum required abundance for the primary species.
  Default: 0.80
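As a rough illustration of what a primary-abundance check involves, the sketch below reads a Bracken abundance table, ranks taxa by `fraction_total_reads`, and compares the top entry against the threshold. It assumes the standard Bracken output columns `name` and `fraction_total_reads`; the function name is hypothetical, and how bactQC locates and parses the file inside a Bactopia directory is not shown here.

```python
import csv


def primary_species(bracken_tsv, min_primary_abundance=0.80):
    """Summarise the top two taxa in a Bracken table read from a file object.

    Assumes tab-separated input with `name` and `fraction_total_reads` columns.
    A primary abundance exactly at the threshold is treated as passing, which
    is an assumption of this sketch.
    """
    rows = list(csv.DictReader(bracken_tsv, delimiter="\t"))
    if not rows:
        raise ValueError("Bracken table contains no taxa")
    rows.sort(key=lambda row: float(row["fraction_total_reads"]), reverse=True)
    primary = rows[0]
    secondary = rows[1] if len(rows) > 1 else None
    abundance = float(primary["fraction_total_reads"])
    return {
        "primary": primary["name"],
        "primary_abundance": abundance,
        "secondary": secondary["name"] if secondary else None,
        "passed": abundance >= min_primary_abundance,
    }
```

A low primary abundance alongside a substantial secondary species is the pattern this kind of check is meant to surface, since it can indicate a mixed or contaminated sample.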
Example:
bactQC check_bracken --sample_name sample1
Check MLST results for a sample.
bactQC check_mlst [OPTIONS]
Options:
- --sample_name TEXT
  Required. Name of a sample to analyze.
- --input_dir PATH
  Directory containing Bactopia outputs.
  Default: bactopia
Example:
bactQC check_mlst --sample_name sample1
Check CheckM results for a sample.
bactQC check_checkm [OPTIONS]
Options:
- --sample_name TEXT
  Required. Name of a sample to analyze.
- --input_dir PATH
  Directory containing Bactopia outputs.
  Default: bactopia
- --min_completeness INTEGER
  Minimum required completeness threshold.
  Default: 80
- --max_contamination INTEGER
  Maximum allowed contamination threshold.
  Default: 10
Example:
bactQC check_checkm --sample_name sample1
Check assembly scan results for a sample.
bactQC check_assembly_scan [OPTIONS]
Options:
- --sample_name TEXT
  Required. Name of a sample to analyze.
- --input_dir PATH
  Directory containing Bactopia outputs.
  Default: bactopia
- --maximum_contigs INTEGER
  Maximum allowed number of contigs.
  Default: 500
- --minimum_n50 INTEGER
  Minimum required N50 contig length.
  Default: 15000
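For readers less familiar with N50, the sketch below shows the usual definition (the length of the contig at which the cumulative length of contigs, sorted longest first, reaches half the assembly size) and how it could be combined with a contig-count limit. The function names are hypothetical; bactQC itself reads these metrics from the assembly scan output rather than recomputing them, as far as the description above indicates.

```python
def n50(contig_lengths):
    """Return the N50 of a non-empty list of contig lengths."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def check_assembly(contig_lengths, maximum_contigs=500, minimum_n50=15000):
    """Compare contig count and N50 against the thresholds (boundaries pass)."""
    value = n50(contig_lengths)
    count = len(contig_lengths)
    return {
        "contigs": count,
        "n50": value,
        "passed": count <= maximum_contigs and value >= minimum_n50,
    }
```

A fragmented assembly tends to fail both conditions at once: many contigs push the count above the limit while their short lengths pull the N50 down.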
Example:
bactQC check_assembly_scan --sample_name sample1
Check fastp quality control data for a sample.
bactQC check_fastp [OPTIONS]
Options:
- --sample_name TEXT
  Required. Name of a sample to analyze.
- --input_dir PATH
  Directory containing Bactopia outputs.
  Default: bactopia
- --min_q30_bases FLOAT
  Minimum required proportion of Q30 bases after filtering.
  Default: 0.90
- --min_coverage INTEGER
  Minimum required coverage after filtering.
  Default: 30
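The two fastp thresholds can be illustrated with the `summary.after_filtering` section of a fastp JSON report, which records `total_bases` and `q30_bases`. The sketch below is an assumption-laden example rather than bactQC's own logic: the function names are hypothetical, and `genome_size` must be supplied by the caller because how bactQC derives the genome size for its coverage estimate is not described here.

```python
import json


def load_fastp_report(path):
    """Parse a fastp JSON report from disk."""
    with open(path) as handle:
        return json.load(handle)


def check_fastp(report, genome_size, min_q30_bases=0.90, min_coverage=30):
    """Evaluate post-filtering Q30 proportion and estimated coverage.

    `report` is a parsed fastp JSON report; `genome_size` is in bases.
    Coverage is estimated as filtered bases divided by genome size.
    """
    after = report["summary"]["after_filtering"]
    total_bases = after["total_bases"]
    if total_bases <= 0 or genome_size <= 0:
        raise ValueError("total_bases and genome_size must be positive")
    q30_proportion = after["q30_bases"] / total_bases
    coverage = total_bases / genome_size
    return {
        "q30_proportion": q30_proportion,
        "coverage": coverage,
        "passed": q30_proportion >= min_q30_bases and coverage >= min_coverage,
    }
```

For instance, 200 Mb of filtered bases against a 5 Mb genome gives an estimated 40x coverage, which clears the default of 30.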
Example:
bactQC check_fastp --sample_name sample1
bactQC generates two primary output files in TSV (Tab-Separated Values) format for each analyzed sample:
- QC Results:
  - Filename: <sample_name>_qc_results.tsv, or BactQC_results.tsv for multiple samples.
  - Contents: Contains the QC results for each sample, including status (Passed/Failed) for individual checks and detected species information.
- QC Thresholds:
  - Filename: <sample_name>_qc_thresholds.tsv, or BactQC_thresholds.tsv for multiple samples.
  - Contents: Contains the QC thresholds used for each check, which can be species-specific.
Additionally, individual commands display detailed QC metrics in the terminal using Rich tables with colored formatting and emojis for easy interpretation.
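Because the reports are plain TSV, they are easy to post-process. The following sketch lists which columns hold a failed status for each row of a results file. It avoids assuming specific column names: the first column is taken to identify the sample, and a cell is counted as a failure when it equals `Failed`. Both of those are assumptions about the file layout, so check them against an actual report before relying on this.

```python
import csv


def failed_checks(tsv_file):
    """Return {sample_id: [column names whose value is "Failed"]}.

    `tsv_file` is an open file object for a results TSV. Rows without any
    failed column are omitted from the result.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    failures = {}
    for row in reader:
        sample_id = row[reader.fieldnames[0]]  # assumed: first column is the sample
        failed = [column for column, value in row.items() if value == "Failed"]
        if failed:
            failures[sample_id] = failed
    return failures


if __name__ == "__main__":
    # BactQC_results.tsv is the multi-sample filename described above.
    with open("BactQC_results.tsv", newline="") as handle:
        for sample, checks in failed_checks(handle).items():
            print(sample, "failed:", ", ".join(checks))
```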
bactQC run --sample_name sample1 --input_dir /path/to/bactopia
Output:
- Displays ASCII art and a welcome message.
- Runs all QC checks with specified thresholds.
- Generates sample1_qc_results.tsv and sample1_qc_thresholds.tsv.
- Displays summarized QC thresholds and results in the terminal.
bactQC run
Output:
- Processes all samples in the default bactopia directory.
- Generates BactQC_results.tsv and BactQC_thresholds.tsv.
- Displays summarized QC thresholds and results for all samples in the terminal.
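When bactQC is one step in a larger Python workflow, the `bactQC run` invocations above can be assembled and launched with `subprocess`. This is a minimal sketch using only the options documented in this README; the helper names `build_run_command` and `run_bactqc` are hypothetical and not part of bactQC.

```python
import subprocess

# Threshold options documented for `bactQC run`.
THRESHOLD_OPTIONS = {
    "min_primary_abundance",
    "min_completeness",
    "max_contamination",
    "maximum_contigs",
    "minimum_n50",
    "min_q30_bases",
    "min_coverage",
}


def build_run_command(sample_name=None, input_dir=None, **thresholds):
    """Build the argument list for `bactQC run` without executing it."""
    unknown = set(thresholds) - THRESHOLD_OPTIONS
    if unknown:
        raise ValueError("undocumented options: " + ", ".join(sorted(unknown)))
    command = ["bactQC", "run"]
    if sample_name is not None:
        command += ["--sample_name", sample_name]
    if input_dir is not None:
        command += ["--input_dir", str(input_dir)]
    for option, value in thresholds.items():
        command += ["--" + option, str(value)]
    return command


def run_bactqc(**kwargs):
    """Run bactQC and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_run_command(**kwargs), check=True)
```

Building the command as a list, rather than a single shell string, sidesteps quoting problems when sample names or paths contain spaces.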
Check Bracken results:
bactQC check_bracken --sample_name sample1
Check MLST results:
bactQC check_mlst --sample_name sample1
Check CheckM results:
bactQC check_checkm --sample_name sample1
Check Assembly Scan results:
bactQC check_assembly_scan --sample_name sample1
Check fastp results:
bactQC check_fastp --sample_name sample1
Contributions are welcome! To contribute to bactQC, please follow these steps:
- Fork the Repository
  Click the "Fork" button at the top-right corner of the repository page to create a forked copy of the repository.
- Clone the Forked Repository
  git clone https://github.com/your-username/bactQC.git
  cd bactQC
- Create a New Branch
  git checkout -b feature/YourFeature
- Make Your Changes
  Implement your feature or bug fix. Ensure your code adheres to the project's coding standards.
- Commit Your Changes
  git commit -m "Add feature: YourFeature"
- Push to Your Fork
  git push origin feature/YourFeature
- Create a Pull Request
  Navigate to the original repository and create a pull request from your forked repository.
Please ensure your contributions follow the Code of Conduct and Contributing Guidelines if available.
This project is licensed under the MIT License.
For questions, suggestions, or support, please open an issue on GitHub or contact the maintainer:
- Maintainer: Max L. Cummins
- Email: [email protected]
Thank you for using bactQC! 🦠🧬
