@Shen-YuFei
Collaborator

Enhance mzIdentML Converter with Multi-mzML Support

Summary

Add an mzIdentML converter that can integrate spectral data from multiple mzML files, and upgrade pyopenms compatibility to 3.5.0+.

Changes

New Features

  • Add mzIdentML to PSM parquet converter (qpxc convert mzidentml)
  • Support --mzml-folder option for multi-file mzML spectral data integration
  • Auto-detect native ID formats (scan=, cycle=, index=, spectrum=)
  • Case-insensitive mzML file matching with .mzML and .mzML.gz support
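
As a rough illustration of the native ID auto-detection listed above, the following is a minimal sketch, not the code in this PR. The function names and regular expressions are hypothetical; only the four keys (scan=, cycle=, index=, spectrum=) come from the description. It does not handle key-specific semantics such as whether a number is 0-based or 1-based.

```python
import re
from typing import Optional

# Hypothetical patterns, one per native ID key named in the PR description.
# Each key must start the string or follow whitespace, so "myscan=3" is ignored.
_NATIVE_ID_PATTERNS = {
    key: re.compile(rf"(?:^|\s){key}=(\d+)")
    for key in ("scan", "cycle", "index", "spectrum")
}


def detect_native_id_key(native_id: str) -> Optional[str]:
    """Return the first known key found in a native ID string, or None."""
    for key, pattern in _NATIVE_ID_PATTERNS.items():
        if pattern.search(native_id):
            return key
    return None


def extract_spectrum_number(native_id: str) -> Optional[int]:
    """Return the integer attached to the detected key, or None if no key matches."""
    key = detect_native_id_key(native_id)
    if key is None:
        return None
    match = _NATIVE_ID_PATTERNS[key].search(native_id)
    return int(match.group(1))


thermo_style = "controllerType=0 controllerNumber=1 scan=1234"
sciex_style = "sample=1 period=1 cycle=42 experiment=1"
```

Trying the keys in a fixed order keeps the behaviour predictable when a string happens to contain more than one of them.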

Improvements

  • Upgrade pyopenms dependency to 3.5.0+
  • Keep backward compatibility with the pyopenms < 3.5.0 API
  • Update IdXML loading to use new PeptideIdentificationList API
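
One common way to support both API generations is feature detection rather than version parsing. The sketch below assumes that newer pyopenms releases expose a `PeptideIdentificationList` class (the name used in this PR) and that older releases accept a plain Python list. The helper name is hypothetical, and the stand-in modules exist only so the example runs without pyopenms installed.

```python
from types import SimpleNamespace


def new_peptide_id_container(oms):
    """Return an empty peptide-identification container for the given module.

    Assumption: if the module has `PeptideIdentificationList`, use it;
    otherwise fall back to a plain list, as older releases expect.
    """
    list_cls = getattr(oms, "PeptideIdentificationList", None)
    return list_cls() if list_cls is not None else []


class FakePeptideIdentificationList:
    """Stand-in for the real class, used only in this demonstration."""


# Stand-ins for the two API generations.
old_api = SimpleNamespace()
new_api = SimpleNamespace(PeptideIdentificationList=FakePeptideIdentificationList)

old_container = new_peptide_id_container(old_api)
new_container = new_peptide_id_container(new_api)
```

In real use the argument would be the imported `pyopenms` module, and the returned container would be passed to the IdXML loader in place of a hard-coded list.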

Tests

  • Add unit tests for scan number extraction formats
  • Add tests for multi-mzML folder support
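
For readers unfamiliar with the multi-mzML case, the following is a minimal sketch of case-insensitive matching against a folder of .mzML and .mzML.gz files. It is an illustration under assumed names (`run_name`, `index_mzml_files`, `find_mzml`), not the implementation or the tests in this PR.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional

MZML_SUFFIXES = (".mzml.gz", ".mzml")


def run_name(filename: str) -> str:
    """Lower-cased file name with a known mzML suffix (or any last extension) removed."""
    name = Path(filename).name.lower()
    for suffix in MZML_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return Path(name).stem


def index_mzml_files(folder: str) -> Dict[str, Path]:
    """Map run names to the mzML files found directly inside `folder`."""
    index: Dict[str, Path] = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.name.lower().endswith(MZML_SUFFIXES):
            index[run_name(path.name)] = path
    return index


def find_mzml(index: Dict[str, Path], reference: str) -> Optional[Path]:
    """Look up a spectra file reference, ignoring case and file extension."""
    return index.get(run_name(reference))


# Small demonstration with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for fname in ("Sample_A.mzML", "sample_b.MZML.GZ", "notes.txt"):
        (Path(tmp) / fname).touch()
    index = index_mzml_files(tmp)
    match_a = find_mzml(index, "SAMPLE_A.mzml")
    match_b = find_mzml(index, "Sample_B.raw")
    missing = find_mzml(index, "other_run.mzML")
```

Normalising both sides to a lower-cased run name means a reference such as `Sample_B.raw` can still resolve to `sample_b.MZML.GZ`.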

Documentation

  • Add mzIdentML converter documentation to CLI reference

Testing

  • Validated on PXD000923 dataset (117,891 PSMs with spectral data)

Copilot AI review requested due to automatic review settings January 18, 2026 08:29
@coderabbitai
Contributor

coderabbitai bot commented Jan 18, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.



Contributor

Copilot AI left a comment


Pull request overview

This PR adds a comprehensive mzIdentML converter to QPX with support for multiple mzML file integration and upgrades pyopenms compatibility to version 3.5.0+.

Changes:

  • New mzIdentML to PSM parquet converter with multi-mzML folder support
  • pyopenms API compatibility layer for 3.5.0+ with backward compatibility
  • Auto-detection of native ID formats for various instrument vendors
  • SQL query formatting cleanup and whitespace normalization across test files

Reviewed changes

Copilot reviewed 20 out of 24 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| qpx/core/mzidentml.py | New mzIdentML parser with multi-mzML support and native ID format detection |
| qpx/core/openms.py | Enhanced OpenMSHandler with native ID pattern auto-detection and multi-pattern support |
| qpx/core/idxml.py | Updated IdXML loader with pyopenms 3.5.0+ PeptideIdentificationList API support |
| qpx/core/idxml_utils/idxml.py | Added backward compatibility for pyopenms < 3.5.0 |
| qpx/commands/convert/mzidentml.py | CLI command for mzIdentML conversion with validation |
| qpx/qpxc.py | Registered mzidentml command in CLI |
| docs/cli-convert.md | Comprehensive documentation for mzIdentML converter |
| tests/core/mzidentml/test_mzidentml.py | Test suite covering parser, scan extraction, and multi-mzML features |
| tests/examples/mzidentml/test_sample.mzid | Test fixture with valid mzIdentML structure |
| Multiple test files | SQL query formatting and whitespace cleanup |


def _parse_modification(self, mod_elem, sequence: str) -> Optional[Dict]:
    """Parse a single Modification element"""
    location = mod_elem.get("location")
    residues = mod_elem.get("residues", "")

Copilot AI Jan 18, 2026


Variable residues is not used.

Suggested change (remove this line):
residues = mod_elem.get("residues", "")

    if sii.get("calculatedMassToCharge")
    else None
)
rank = int(sii.get("rank", 1))

Copilot AI Jan 18, 2026


Variable rank is not used.

Suggested change (remove this line):
rank = int(sii.get("rank", 1))

@Shen-YuFei force-pushed the dev branch 4 times, most recently from 15ab654 to a0702f5, on January 18, 2026 10:33
@ypriverol merged commit 5c826af into bigbio:dev on Jan 20, 2026
6 checks passed