Add native Bruker .d file support for QC statistics #34

@ypriverol

Summary

Currently, the MZML_STATISTICS step only processes .mzML files. Bruker .d files are skipped entirely, meaning no QC statistics are generated for timsTOF datasets.

The previous --convert_dotd option (removed in PR #27) converted .d files to mzML so that statistics could be computed, but converting hundreds of GB just to obtain QC metrics is wasteful.

Proposed Solution

Use pyopenms's .d reading capabilities (or another library that supports Bruker .d natively) to compute QC statistics directly from .d files without mzML conversion.
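To make the proposal concrete, here is a minimal sketch of how the statistics step might decide which reader to use and how it could name its output so that .d and .mzML inputs produce the same `*_ms_info.parquet` file name. The function names `detect_input_kind` and `ms_info_name` are hypothetical and not part of the existing module; the actual .d reader (pyopenms or another library) is deliberately left out.

```python
from pathlib import Path


def detect_input_kind(path: str) -> str:
    """Classify an input as 'mzml' or 'dotd' from its suffix (case-insensitive).

    Hypothetical helper: a Bruker .d "file" is a directory on disk, but its
    name still ends in '.d', so the suffix check works for both kinds.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".mzml":
        return "mzml"
    if suffix == ".d":
        return "dotd"
    raise ValueError(f"Unsupported input for QC statistics: {path}")


def ms_info_name(path: str) -> str:
    """Build the output name, keeping the *_ms_info.parquet convention."""
    return f"{Path(path).stem}_ms_info.parquet"
```

With this shape, `ms_info_name("data/run01.d")` and `ms_info_name("data/run01.mzML")` both give `run01_ms_info.parquet`, so downstream steps would not need to know which raw format was used.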

Context

Per @jpfeuffer's review in PR #27:

"I think the number one priority for this should now be to not convert hundreds of gigabytes of .d to mzml just for a bit of qc. Use the new .d reading capabilities of pyopenms if they are sufficient, otherwise there are many other libraries that can read .d these days."

Acceptance Criteria

  • MZML_STATISTICS (or a new module) can compute MS1/MS2 statistics from .d files
  • No mzML conversion required for Bruker data QC
  • Output format remains *_ms_info.parquet for consistency
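As a rough illustration of the first criterion, the sketch below aggregates per-MS-level statistics from any iterable of spectra, independent of where they were read from. The `Spectrum` record and the statistic names (`num_spectra`, `num_peaks`, `total_ion_current`) are assumptions chosen for illustration; the real columns of `*_ms_info.parquet` are not specified in this issue, and the reader that yields spectra from a .d directory is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Sequence


@dataclass
class Spectrum:
    """Hypothetical, reader-agnostic view of one spectrum."""
    ms_level: int
    rt_seconds: float
    intensities: Sequence[float]


def summarize_by_ms_level(spectra: Iterable[Spectrum]) -> Dict[int, dict]:
    """Accumulate simple counts and summed intensity per MS level.

    Works in a single streaming pass, so spectra never need to be
    held in memory or written out as mzML first.
    """
    summary = defaultdict(
        lambda: {"num_spectra": 0, "num_peaks": 0, "total_ion_current": 0.0}
    )
    for spectrum in spectra:
        row = summary[spectrum.ms_level]
        row["num_spectra"] += 1
        row["num_peaks"] += len(spectrum.intensities)
        row["total_ion_current"] += float(sum(spectrum.intensities))
    return dict(summary)
```

Because the function only depends on the `Spectrum` shape, the same code path could serve both an mzML reader and a native .d reader, which would keep the output consistent between formats.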
