Conversation

@ldalcolmo
Contributor

Description

This PR fixes Issue #19 by adding native support for dataset collections in Galaxy MCP, enabling agent workflows to correctly discover, inspect, and navigate histories that contain collections.

It introduces:

  • Proper exposure of dataset collections in get_history_contents()
  • A new MCP tool get_collection_details() for inspecting collection structure and members
  • A robust, API-based guard in get_dataset_details() to prevent agents from treating collections as datasets

These changes make Galaxy MCP fully usable in agent-native environments where histories frequently contain list, paired, or nested dataset collections.

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📚 Documentation update
  • ⬆️ Dependency update
  • 🧰 Maintenance/chore

Checklist

  • I have performed a self-review of my code
  • I have added tests for my changes
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Related Issues

Closes #19

@ldalcolmo
Contributor Author

This PR fixes Issue #19 and enables robust handling of dataset collections in Galaxy MCP, which is critical for agent-native workflows and LLM integrations.

What it does:

  • get_history_contents() now returns both datasets and dataset collections with a clear history_content_type field.
  • Added the new get_collection_details() tool, which returns normalized collection and member dataset info for agent traversal.
  • get_dataset_details() now includes an API-based guard to detect collection IDs and provide a helpful error message, avoiding fragile text parsing.
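A minimal sketch of the first and third points, assuming each history item is a plain dict with `id`, `name`, and a `history_content_type` of either `dataset` or `dataset_collection`. The helper names `split_history_contents` and `guard_dataset_id` are hypothetical and are not the functions added in this PR; the PR's guard queries the Galaxy API, whereas this illustration only checks already-fetched contents.

```python
from typing import Any


def split_history_contents(
    items: list[dict[str, Any]],
) -> dict[str, list[dict[str, Any]]]:
    """Group history items by their history_content_type field."""
    grouped: dict[str, list[dict[str, Any]]] = {
        "dataset": [],
        "dataset_collection": [],
    }
    for item in items:
        # Assumption: items lacking the field are treated as plain datasets.
        kind = item.get("history_content_type", "dataset")
        grouped.setdefault(kind, []).append(item)
    return grouped


def guard_dataset_id(item_id: str, items: list[dict[str, Any]]) -> None:
    """Raise a helpful error if item_id refers to a collection, not a dataset."""
    for item in items:
        is_collection = item.get("history_content_type") == "dataset_collection"
        if item["id"] == item_id and is_collection:
            raise ValueError(
                f"'{item_id}' is a dataset collection; "
                "use get_collection_details() instead of get_dataset_details()."
            )


# Made-up example data, not taken from a real Galaxy history.
contents = [
    {"id": "d1", "name": "reads.fastq", "history_content_type": "dataset"},
    {"id": "c1", "name": "paired reads", "history_content_type": "dataset_collection"},
]
grouped = split_history_contents(contents)
```

With this shape, an agent can decide which tool to call per item from the type field alone, and a wrong call fails with a message that names the right tool.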

Tests included:

  • Mixed history contents
  • Truncation for large collections
  • Correct guard behavior when passing a collection ID
  • All existing tests remain untouched and pass
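To illustrate what truncation for large collections could look like, here is a rough sketch. It assumes a collection dict with an `elements` list whose entries carry `element_identifier`, `element_type`, and a nested `object` with an `id`. The function name `summarize_collection`, the `max_elements` parameter, and the output keys are all hypothetical and may differ from what `get_collection_details()` actually returns.

```python
from typing import Any


def summarize_collection(
    collection: dict[str, Any], max_elements: int = 100
) -> dict[str, Any]:
    """Return a compact view of a collection, capping the listed members."""
    elements = collection.get("elements", [])
    shown = elements[:max_elements]
    return {
        "id": collection.get("id"),
        "collection_type": collection.get("collection_type"),
        "element_count": len(elements),
        "elements": [
            {
                "identifier": e.get("element_identifier"),
                "element_type": e.get("element_type"),
                # Nested collections and datasets both expose an object id here
                # (an assumption about the input shape).
                "object_id": (e.get("object") or {}).get("id"),
            }
            for e in shown
        ],
        # Tell the caller that members were omitted rather than silently drop them.
        "truncated": len(elements) > max_elements,
    }


# Made-up five-member list collection.
example = {
    "id": "c1",
    "collection_type": "list",
    "elements": [
        {"element_identifier": f"s{i}", "element_type": "hda", "object": {"id": f"d{i}"}}
        for i in range(5)
    ],
}
summary = summarize_collection(example, max_elements=2)
```

Reporting `element_count` alongside a `truncated` flag lets an agent know the full size even when only the first members are listed.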

This change is backward-compatible and purely additive, and should help unblock agent workflows that need to navigate collections.

Happy to make any adjustments if needed! 🙌

@bgruening
Member

Welcome @ldalcolmo! I started the CI :)

@ldalcolmo
Contributor Author

Thanks @bgruening!
I cleaned up all ruff/E501 and formatting issues in a few follow-up commits so CI should be green now.
The functional changes are in the initial commit (collections + guards + new MCP tool), the others are formatting only.
Happy to adjust anything if you have suggestions.

@dannon
Member

dannon commented Jan 5, 2026

Thanks for this contribution! Adding collection support is a welcome improvement!

A couple of things I noticed:

  1. Performance consideration: The change from gi.datasets.get_datasets() (server-side pagination) to gi.histories.show_history(contents=True) means we now fetch all items and paginate client-side. For large histories this could be slower, though I understand it's necessary to include collections. Might be worth a comment noting this tradeoff.

  2. Unused parameter: The details parameter in get_history_contents() is no longer used in the new implementation.

  3. Minor test nit: A few test methods have their docstrings after the first line of code rather than immediately after the signature (e.g., test_get_collection_details_list_collection).
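The tradeoff in point 1 can be sketched as follows, assuming the full contents list has already been fetched (for example via `gi.histories.show_history(contents=True)`). The helper name `paginate_contents` and its return keys are hypothetical, not the PR's implementation.

```python
from typing import Any


def paginate_contents(
    items: list[dict[str, Any]], limit: int = 100, offset: int = 0
) -> dict[str, Any]:
    """Slice an already-fetched contents list into one page.

    The slice itself is cheap, but the whole history had to be
    transferred first, so cost grows with history size rather
    than with page size.
    """
    if limit < 0 or offset < 0:
        raise ValueError("limit and offset must be non-negative")
    page = items[offset : offset + limit]
    return {
        "items": page,
        "total": len(items),
        "has_more": offset + len(page) < len(items),
    }


# Made-up history with 250 items.
all_items = [{"id": f"item{i}"} for i in range(250)]
last_page = paginate_contents(all_items, limit=100, offset=200)
```

With server-side pagination only the requested page crosses the wire; here every call pays for the full listing before slicing.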

The new get_collection_details() tool looks well-structured, and the error handling that detects when someone passes a collection ID to get_dataset_details() is a nice touch for agent UX.

@ldalcolmo
Contributor Author

Thanks a lot for the review and the kind words! 🙏
Great points.

Performance: agreed — switching from gi.datasets.get_datasets() (server-side pagination) to show_history(contents=True) does fetch all items and paginates client-side. I’ll add an explicit comment/docstring note explaining the tradeoff and why it’s needed to include dataset collections.

Unused details parameter: good catch — it’s now redundant in the new implementation. I’ll remove it (and update call sites/tests accordingly).

Test docstring placement: also agreed — I’ll move those docstrings to be the first statement in the test methods.
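For context on why placement matters, a small standalone illustration (the function names are made up and unrelated to the PR's test suite): Python only treats a string literal as a docstring when it is the first statement in the function body.

```python
def check_with_leading_docstring():
    """Docstring as the first statement."""
    value = 1
    return value


def check_with_late_string():
    value = 1
    """This is only an expression statement, not a docstring."""
    return value


# Only the first function gets a __doc__ attribute populated.
leading_doc = check_with_leading_docstring.__doc__
late_doc = check_with_late_string.__doc__
```

Tools that read `__doc__`, such as test reporters and `help()`, will therefore not see the misplaced string.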

I’ll push a small follow-up commit with these changes shortly. Will ping you once CI is green again.

Member

@dannon dannon left a comment

Thank you!

@ldalcolmo
Contributor Author

Thanks a lot for the review and approval!
Really appreciate the feedback.

If there are other MCP or Galaxy integration issues where I can help, feel free to point me to them — happy to contribute.

@dannon dannon merged commit 508a6b0 into galaxyproject:main Jan 13, 2026
6 checks passed

Development

Successfully merging this pull request may close these issues.

When encountering dataset collections in history, agents use get_dataset_details with unexpected outputs

3 participants