Add native Bruker .d file support for QC statistics #34
Description
Summary
Currently, the MZML_STATISTICS step only processes .mzML files. Bruker .d files are skipped entirely, meaning no QC statistics are generated for timsTOF datasets.
The previous --convert_dotd option (removed in PR #27) converted .d to mzML so statistics could be computed, but this is wasteful: it means converting hundreds of GB just for QC metrics.
Proposed Solution
Use pyopenms's .d reading capabilities (or another library that supports Bruker .d natively) to compute QC statistics directly from .d files without mzML conversion.
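One way to structure this is to route each input by its suffix, so that .mzML files keep their current path and .d directories go to a native reader instead of being skipped. The sketch below covers only that routing step and uses just the standard library. The names input_kind and group_inputs are hypothetical, and the reader itself (pyopenms or another library) is deliberately left out, since the choice is still open.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def input_kind(path: Path) -> str:
    """Classify an input by its suffix: 'mzml', 'dotd', or 'unknown'."""
    suffix = path.suffix.lower()
    if suffix == ".mzml":
        return "mzml"
    if suffix == ".d":
        # Bruker .d inputs are directories, but only the name is inspected here.
        return "dotd"
    return "unknown"


def group_inputs(paths: Iterable[str]) -> Dict[str, List[Path]]:
    """Group input paths by kind so each group can go to its own reader."""
    groups: Dict[str, List[Path]] = {"mzml": [], "dotd": [], "unknown": []}
    for raw in paths:
        path = Path(raw)
        groups[input_kind(path)].append(path)
    return groups


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    grouped = group_inputs(["sample_01.mzML", "sample_02.d", "notes.txt"])
    for kind, members in grouped.items():
        print(kind, [str(p) for p in members])
```

Keeping the routing separate from the reading means the .d backend can be swapped later without touching the mzML path.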
Context
Per @jpfeuffer's review in PR #27:
"I think the number one priority for this should now be to not convert hundreds of gigabytes of .d to mzml just for a bit of qc. Use the new .d reading capabilities of pyopenms if they are sufficient, otherwise there are many other libraries that can read .d these days."
Acceptance Criteria
- MZML_STATISTICS (or a new module) can compute MS1/MS2 statistics from .d files
- No mzML conversion required for Bruker data QC
- Output format remains *_ms_info.parquet for consistency
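The criteria above could be checked with two small helpers: one that derives the output name from the input, and one that tallies MS1/MS2 spectra from the MS levels a reader yields. This is a rough sketch under stated assumptions: the * in *_ms_info.parquet is assumed to be the input file stem, the keys num_ms1 and num_ms2 are hypothetical rather than the module's actual columns, and writing the parquet file is not shown.

```python
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable


def ms_info_name(input_path: str) -> str:
    """Build the output name, assuming '*' in '*_ms_info.parquet' is the input stem."""
    return f"{Path(input_path).stem}_ms_info.parquet"


def count_ms_levels(ms_levels: Iterable[int]) -> Dict[str, int]:
    """Count MS1 and MS2 spectra from a sequence of MS levels.

    The key names are hypothetical and only illustrate the shape of the result.
    """
    counts = Counter(ms_levels)
    # Counter returns 0 for levels that never occurred.
    return {"num_ms1": counts[1], "num_ms2": counts[2]}


if __name__ == "__main__":
    # The same stem gives the same output name for .d and .mzML inputs.
    print(ms_info_name("sample_01.d"))
    print(ms_info_name("sample_01.mzML"))
    print(count_ms_levels([1, 2, 2, 1, 2]))
```

Deriving the name from the stem alone keeps downstream consumers of the parquet files unaware of which input format produced them.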