Pull request overview
This PR is a broad update to the Snakemake-based “Dumpling” workflow, expanding configuration/schema support, refactoring processing/analysis scripts (Python + R), and adding substantial automated test coverage for the pipeline components.
Changes:
- Refactors variant-count processing and Rosace execution scripts, adds new utilities for generating Enrich2/MultiQC configs, and adds new variant-generation scripts.
- Updates Snakemake rules, schemas, and conda environments to support new inputs/resources and revised file layouts.
- Adds unit/integration tests plus fixtures (Python `pytest` and R `testthat`) to validate parsing and pipeline behavior.
Reviewed changes
Copilot reviewed 53 out of 58 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| workflow/schemas/experiments.schema.yaml | Fixes schema formatting and corrects numeric type (integer). |
| workflow/schemas/config.schema.yaml | Adds new config keys (variants/oligo files, Rosace settings, ASM thresholds, memory knobs). |
| workflow/rules/scripts/run_rosace.R | Major refactor into functions (logging, counts building, Rosace object building, parsing, execution). |
| workflow/rules/scripts/remove_zeros.py | Refactors zero-removal logic into testable functions; improves path handling & tiling support. |
| workflow/rules/scripts/process_variants.py | Adds typing and refactors variant parsing/processing; introduces noprocess flow changes. |
| workflow/rules/scripts/process_oligo_list.py | Switches warnings from print() to logging and adds processing log statements. |
| workflow/rules/scripts/process_counts.py | Refactors into functions and rewires processing loop; aligns outputs with Snakemake rules. |
| workflow/rules/scripts/install_rosace.R | Adds conditional “local rosace” checks vs renv-based installation; improves toolchain checks. |
| workflow/rules/scripts/generate_variants_2.py | Adds an alternate variants-generation implementation (currently appears unused). |
| workflow/rules/scripts/generate_variants.py | Adds a full variants-generation script with circular-genome handling and deduplication. |
| workflow/rules/scripts/generate_multiqc_configs.py | Adds script to generate Enrich2-style config for MultiQC aggregation. |
| workflow/rules/scripts/generate_enrich_configs.py | Refactors Enrich2 config generation to support tiled/untiled experiments consistently. |
| workflow/rules/scripts/generate_baseline_file_list.py | Minor import cleanup (but currently introduces a duplicate import). |
| workflow/rules/scripts/generate_baseline_configs.py | Removes stray whitespace in header/docstring block. |
| workflow/rules/scripts/__init__.py | Adds package marker for script imports. |
| workflow/rules/rosace.smk | Updates Rosace rule inputs to use Enrich-format TSVs and adds mem resource. |
| workflow/rules/ref.smk | Reworks BBMap index rule to build into a per-experiment directory and passes params to bbmap. |
| workflow/rules/process.smk | Adds generate_variants rule and wires variants file into process_counts/remove_zeros. |
| workflow/rules/map.smk | Switches BBMap mapping to use path= index dir input instead of ref=... directly. |
| workflow/rules/filter.smk | Changes bbtools overwrite behavior to overwrite=false. |
| workflow/rules/enrich.smk | Minor formatting + trailing comma fixes; keeps get_enrich2_input as input provider. |
| workflow/rules/common.smk | Improves sample→file resolution, adds config validation, sets tiling defaults, updates logging call. |
| workflow/rules/baseline_qc.smk | Updates baseline QC inputs to use Enrich-format TSVs. |
| workflow/rules/asm.smk | Adds min_variant_obs parameter pass-through to GATK ASM invocation. |
| workflow/rules/__init__.py | Adds package marker for rules imports. |
| workflow/envs/rosace.yaml | Pins cmake <3.25 with rationale for CmdStan compatibility. |
| workflow/envs/multiqc-baseline.yaml | Bumps MultiQC version and adds numpy dependency. |
| workflow/Snakefile | Adds numpy import (for workflow/runtime usage). |
| tests/unit/test_remove_zeros.py | Adds unit tests for zero-removal functions and file outputs. |
| tests/unit/test_process_variants.py | Adds unit tests for variant parsing/processing functions and outputs. |
| tests/unit/test_process_counts.py | Adds unit tests for count-processing orchestration via mocking. |
| tests/unit/test_generate_variants.py | Adds unit tests for variants generation helpers and integration-style checks. |
| tests/unit/test_enrich_configs.py | Adds unit tests for Enrich2 config filtering/generation in tiled/untiled modes. |
| tests/unit/__init__.py | Adds unit-test package marker. |
| tests/r/testthat/test-parse_hgvs.R | Adds R unit tests for HGVS parsing behavior. |
| tests/r/testthat/test-experiment_utils.R | Adds R unit tests for experiment utility functions. |
| tests/r/testthat/test-build_counts.R | Adds R unit tests for joining Enrich-format counts into Rosace-ready matrices. |
| tests/r/testthat/helper-functions.R | Extracts “pure” R helpers for testing (parser, counts builder, experiment helpers). |
| tests/r/testthat.R | Adds a testthat runner script for R tests. |
| tests/integration/test_processing_pipeline.py | Adds integration tests for processing chain and output formats. |
| tests/integration/test_file_resolution.py | Adds integration tests around path patterns, fixtures, and output layout expectations. |
| tests/integration/test_dag_construction.py | Adds Snakemake DAG/dry-run tests and a lightweight runtime smoke test. |
| tests/integration/test_config_validation.py | Adds integration tests for config/fixture consistency and schema expectations. |
| tests/integration/__init__.py | Adds integration-test package marker. |
| tests/fixtures/mock_variants.csv | Adds mock designed variants fixture. |
| tests/fixtures/mock_reference.fasta | Adds mock reference FASTA fixture. |
| tests/fixtures/mock_oligos.csv | Adds mock oligo CSV fixture. |
| tests/fixtures/mock_gatk_output.variantCounts | Adds mock GATK output fixture. |
| tests/fixtures/mock_experiment.csv | Adds mock experiment definition fixture. |
| tests/fixtures/mock_config.yaml | Adds mock pipeline config fixture. |
| tests/conftest.py | Adds shared pytest fixtures and a Snakemake import shim for script modules. |
| tests/__init__.py | Adds test-suite package marker. |
| pytest.ini | Adds pytest configuration (markers, discovery, warnings). |
| dumpling_env.yaml | Bumps MultiQC version in the top-level environment definition. |
| .gitignore | Adds caches/venv artifacts to gitignore. |
Comments suppressed due to low confidence (1)
workflow/rules/enrich.smk:19
`rule run_enrich` relies on `get_enrich2_input`, but `get_enrich2_input()` (in `common.smk`) currently points to `results/{experiment_name}/processed_counts/{sample}.tsv` when `remove_zeros` is false. The pipeline produces Enrich2 TSVs under `processed_counts/enrich_format/` (or `processed_counts/removed_zeros/`). As-is, `run_enrich` will fail to find its inputs unless `get_enrich2_input` is updated to match the actual output paths.
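
One possible shape for the fix is sketched below. This is only an illustration, not the code in `common.smk`: the helper name `resolve_enrich2_input`, its signature, and the `{sample}.tsv` file name inside each subdirectory are assumptions, and the real input function would presumably take Snakemake wildcards and read `remove_zeros` from the config.

```python
from pathlib import PurePosixPath


def resolve_enrich2_input(experiment_name: str, sample: str, remove_zeros: bool) -> str:
    """Return the Enrich2 input TSV path for one sample.

    Hypothetical helper: picks the subdirectory the pipeline is described
    as writing to, instead of the bare processed_counts/ directory.
    """
    # Assumption: zero-removed counts live in removed_zeros/, all others in enrich_format/.
    subdir = "removed_zeros" if remove_zeros else "enrich_format"
    path = (
        PurePosixPath("results")
        / experiment_name
        / "processed_counts"
        / subdir
        / f"{sample}.tsv"  # assumed file naming; adjust to the actual rule outputs
    )
    return str(path)
```

With a helper like this, both branches point at directories that an upstream rule writes to, so the `remove_zeros: false` case would no longer look for a file directly under `processed_counts/`.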