Big ol PR - fixing bugs, adding tests, refactor, etc etc.... #29

Open
odcambc wants to merge 50 commits into main from testing

@odcambc odcambc commented Mar 12, 2026

No description provided.

Copilot AI left a comment

Pull request overview

This PR is a broad update to the Snakemake-based “Dumpling” workflow, expanding configuration/schema support, refactoring processing/analysis scripts (Python + R), and adding substantial automated test coverage for the pipeline components.

Changes:

  • Refactors variant-count processing and Rosace execution scripts, adds new utilities for generating Enrich2/MultiQC configs, and adds new variant-generation scripts.
  • Updates Snakemake rules, schemas, and conda environments to support new inputs/resources and revised file layouts.
  • Adds unit/integration tests plus fixtures (Python pytest and R testthat) to validate parsing and pipeline behavior.

Reviewed changes

Copilot reviewed 53 out of 58 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| `workflow/schemas/experiments.schema.yaml` | Fixes schema formatting and corrects numeric type (integer). |
| `workflow/schemas/config.schema.yaml` | Adds new config keys (variants/oligo files, Rosace settings, ASM thresholds, memory knobs). |
| `workflow/rules/scripts/run_rosace.R` | Major refactor into functions (logging, counts building, Rosace object building, parsing, execution). |
| `workflow/rules/scripts/remove_zeros.py` | Refactors zero-removal logic into testable functions; improves path handling & tiling support. |
| `workflow/rules/scripts/process_variants.py` | Adds typing and refactors variant parsing/processing; introduces noprocess flow changes. |
| `workflow/rules/scripts/process_oligo_list.py` | Switches warnings from print() to logging and adds processing log statements. |
| `workflow/rules/scripts/process_counts.py` | Refactors into functions and rewires processing loop; aligns outputs with Snakemake rules. |
| `workflow/rules/scripts/install_rosace.R` | Adds conditional “local rosace” checks vs renv-based installation; improves toolchain checks. |
| `workflow/rules/scripts/generate_variants_2.py` | Adds an alternate variants-generation implementation (currently appears unused). |
| `workflow/rules/scripts/generate_variants.py` | Adds a full variants-generation script with circular-genome handling and deduplication. |
| `workflow/rules/scripts/generate_multiqc_configs.py` | Adds script to generate Enrich2-style config for MultiQC aggregation. |
| `workflow/rules/scripts/generate_enrich_configs.py` | Refactors Enrich2 config generation to support tiled/untiled experiments consistently. |
| `workflow/rules/scripts/generate_baseline_file_list.py` | Minor import cleanup (but currently introduces a duplicate import). |
| `workflow/rules/scripts/generate_baseline_configs.py` | Removes stray whitespace in header/docstring block. |
| `workflow/rules/scripts/__init__.py` | Adds package marker for script imports. |
| `workflow/rules/rosace.smk` | Updates Rosace rule inputs to use Enrich-format TSVs and adds mem resource. |
| `workflow/rules/ref.smk` | Reworks BBMap index rule to build into a per-experiment directory and passes params to bbmap. |
| `workflow/rules/process.smk` | Adds generate_variants rule and wires variants file into process_counts/remove_zeros. |
| `workflow/rules/map.smk` | Switches BBMap mapping to use path= index dir input instead of ref=... directly. |
| `workflow/rules/filter.smk` | Changes bbtools overwrite behavior to overwrite=false. |
| `workflow/rules/enrich.smk` | Minor formatting + trailing comma fixes; keeps get_enrich2_input as input provider. |
| `workflow/rules/common.smk` | Improves sample→file resolution, adds config validation, sets tiling defaults, updates logging call. |
| `workflow/rules/baseline_qc.smk` | Updates baseline QC inputs to use Enrich-format TSVs. |
| `workflow/rules/asm.smk` | Adds min_variant_obs parameter pass-through to GATK ASM invocation. |
| `workflow/rules/__init__.py` | Adds package marker for rules imports. |
| `workflow/envs/rosace.yaml` | Pins cmake <3.25 with rationale for CmdStan compatibility. |
| `workflow/envs/multiqc-baseline.yaml` | Bumps MultiQC version and adds numpy dependency. |
| `workflow/Snakefile` | Adds numpy import (for workflow/runtime usage). |
| `tests/unit/test_remove_zeros.py` | Adds unit tests for zero-removal functions and file outputs. |
| `tests/unit/test_process_variants.py` | Adds unit tests for variant parsing/processing functions and outputs. |
| `tests/unit/test_process_counts.py` | Adds unit tests for count-processing orchestration via mocking. |
| `tests/unit/test_generate_variants.py` | Adds unit tests for variants generation helpers and integration-style checks. |
| `tests/unit/test_enrich_configs.py` | Adds unit tests for Enrich2 config filtering/generation in tiled/untiled modes. |
| `tests/unit/__init__.py` | Adds unit-test package marker. |
| `tests/r/testthat/test-parse_hgvs.R` | Adds R unit tests for HGVS parsing behavior. |
| `tests/r/testthat/test-experiment_utils.R` | Adds R unit tests for experiment utility functions. |
| `tests/r/testthat/test-build_counts.R` | Adds R unit tests for joining Enrich-format counts into Rosace-ready matrices. |
| `tests/r/testthat/helper-functions.R` | Extracts “pure” R helpers for testing (parser, counts builder, experiment helpers). |
| `tests/r/testthat.R` | Adds a testthat runner script for R tests. |
| `tests/integration/test_processing_pipeline.py` | Adds integration tests for processing chain and output formats. |
| `tests/integration/test_file_resolution.py` | Adds integration tests around path patterns, fixtures, and output layout expectations. |
| `tests/integration/test_dag_construction.py` | Adds Snakemake DAG/dry-run tests and a lightweight runtime smoke test. |
| `tests/integration/test_config_validation.py` | Adds integration tests for config/fixture consistency and schema expectations. |
| `tests/integration/__init__.py` | Adds integration-test package marker. |
| `tests/fixtures/mock_variants.csv` | Adds mock designed variants fixture. |
| `tests/fixtures/mock_reference.fasta` | Adds mock reference FASTA fixture. |
| `tests/fixtures/mock_oligos.csv` | Adds mock oligo CSV fixture. |
| `tests/fixtures/mock_gatk_output.variantCounts` | Adds mock GATK output fixture. |
| `tests/fixtures/mock_experiment.csv` | Adds mock experiment definition fixture. |
| `tests/fixtures/mock_config.yaml` | Adds mock pipeline config fixture. |
| `tests/conftest.py` | Adds shared pytest fixtures and a Snakemake import shim for script modules. |
| `tests/__init__.py` | Adds test-suite package marker. |
| `pytest.ini` | Adds pytest configuration (markers, discovery, warnings). |
| `dumpling_env.yaml` | Bumps MultiQC version in the top-level environment definition. |
| `.gitignore` | Adds caches/venv artifacts to gitignore. |

Comments suppressed due to low confidence (1)

workflow/rules/enrich.smk:19

  • rule run_enrich relies on get_enrich2_input, but get_enrich2_input() (in common.smk) currently points to results/{experiment_name}/processed_counts/{sample}.tsv when remove_zeros is false. The pipeline produces Enrich2 TSVs under processed_counts/enrich_format/ (or processed_counts/removed_zeros/). As-is, run_enrich will fail to find its inputs unless get_enrich2_input is updated to match the actual output paths.
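
One possible shape for the fix, as a minimal sketch only. The real signature of get_enrich2_input() in common.smk is not shown in this review, so the helper name `enrich2_input_path` and its parameters are hypothetical. The two subdirectory names come from the comment above, and the `{sample}.tsv` file name is assumed to carry over unchanged from the current path.

```python
from pathlib import PurePosixPath


def enrich2_input_path(experiment_name: str, sample: str, remove_zeros: bool) -> str:
    """Return the Enrich2 input TSV path for one sample.

    Hypothetical helper: the name and arguments are illustrative and are
    not taken from common.smk.
    """
    # Select the subdirectory the pipeline is described as writing to,
    # instead of the bare processed_counts/ directory.
    subdir = "removed_zeros" if remove_zeros else "enrich_format"
    return str(
        PurePosixPath(
            "results",
            experiment_name,
            "processed_counts",
            subdir,
            f"{sample}.tsv",  # assumption: file name matches the current pattern
        )
    )
```

If get_enrich2_input() builds its file list from a helper like this, the remove_zeros false branch would resolve to processed_counts/enrich_format/ rather than processed_counts/ itself.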
