@jpshackelford
Contributor

@jpshackelford jpshackelford commented Jan 9, 2026

Summary

Fixes test isolation issues and replaces --forked with pytest-xdist for accurate coverage reporting in CI.

Problem

The --forked pytest flag was causing two issues:

  1. Artificially low coverage: Coverage data from forked child processes was not being collected, resulting in near-0% coverage reports
  2. Test pollution: Local classes defined inside test functions could not be properly pickled/unpickled across process boundaries, causing test failures when running in parallel

Solution

1. Move local test classes to module level

Moved classes that were defined inside test functions to module level, with underscore prefixes to indicate they are test-internal:

  • tests/sdk/conversation/local/test_state_serialization.py: _DifferentAgentForVerifyTest
  • tests/sdk/llm/test_reasoning_content.py: _TestActionForReasoningContent
  • tests/sdk/tool/test_schema_immutability.py: _SchemaImmutabilityCustomAction, _SchemaImmutabilityCustomObservation

2. Replace --forked with -n auto

Switched from pytest-forked to pytest-xdist for parallel test execution:

- CI=true uv run python -m pytest -vvs --forked ...
+ CI=true uv run python -m pytest -vvs -n auto ...

This provides:

  • ✅ Parallel test execution (same benefit as --forked)
  • ✅ Proper coverage collection from all worker processes
  • ✅ Better test isolation through process separation

Testing

  • All 1679 tests pass with pytest -n auto
  • Coverage data is now properly collected

Related Issues


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant   Architectures   Base Image                                   Docs / Tags
java      amd64, arm64    eclipse-temurin:17-jdk                       Link
python    amd64, arm64    nikolaik/python-nodejs:python3.12-nodejs22   Link
golang    amd64, arm64    golang:1.21-bookworm                         Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:d615f5b-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-d615f5b-python \
  ghcr.io/openhands/agent-server:d615f5b-python

All tags pushed for this build

ghcr.io/openhands/agent-server:d615f5b-golang-amd64
ghcr.io/openhands/agent-server:d615f5b-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:d615f5b-golang-arm64
ghcr.io/openhands/agent-server:d615f5b-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:d615f5b-java-amd64
ghcr.io/openhands/agent-server:d615f5b-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:d615f5b-java-arm64
ghcr.io/openhands/agent-server:d615f5b-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:d615f5b-python-amd64
ghcr.io/openhands/agent-server:d615f5b-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:d615f5b-python-arm64
ghcr.io/openhands/agent-server:d615f5b-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:d615f5b-golang
ghcr.io/openhands/agent-server:d615f5b-java
ghcr.io/openhands/agent-server:d615f5b-python

About Multi-Architecture Support

  • Each variant tag (e.g., d615f5b-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., d615f5b-python-amd64) are also available if needed

…compatibility

Move local class definitions from inside test functions to module level to
prevent test pollution when running tests in parallel with pytest-xdist.

When classes like DifferentAgent, TestAction, etc. are defined inside test
functions, they get registered with Pydantic's discriminated union system but
are not importable, causing failures in other tests running in the same worker.
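
The mechanism described above can be sketched with a plain-Python stand-in. This is not Pydantic's actual implementation: the `Event` base class and its `registry` dict are hypothetical, and they only mimic the idea of subclasses registering themselves globally when they are defined.

```python
from typing import ClassVar


class Event:
    """Hypothetical base class with a name-based subclass registry."""

    registry: ClassVar[dict[str, type]] = {}

    def __init_subclass__(cls, **kwargs: object) -> None:
        super().__init_subclass__(**kwargs)
        # Every subclass registers itself as soon as it is defined.
        Event.registry[cls.__name__] = cls


def test_defines_local_class() -> None:
    # The class is created inside the test and registers itself globally.
    class TestAction(Event):
        pass

    assert "TestAction" in Event.registry


test_defines_local_class()

# The registration outlives the test that created it ...
leaked = Event.registry["TestAction"]
# ... but the class is not reachable as a module-level name.
importable = "TestAction" in globals()
print(leaked.__qualname__, importable)
```

Any later test in the same process that walks the registry would encounter `TestAction` without being able to import it, which is the kind of cross-test interference the commit message describes.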

Files changed:
- test_state_serialization.py: Move _DifferentAgentForVerifyTest to module level
- test_reasoning_content.py: Move _TestActionForReasoningContent to module level
- test_schema_immutability.py: Move _SchemaImmutabilityCustomAction and
  _SchemaImmutabilityCustomObservation to module level

This fixes test pollution that was masked by the --forked flag, enabling the
switch to pytest-xdist for accurate coverage reporting.

Fixes #1656

Co-authored-by: openhands <[email protected]>
@github-actions
Contributor

github-actions bot commented Jan 9, 2026

Coverage

Coverage Report •
File    Stmts   Miss   Cover   Missing
TOTAL   15587   4881   68%
report-only-changed-files is enabled. No files were changed during this commit :)

The --forked flag prevents coverage collection from child processes,
resulting in artificially low coverage reports (near 0% in some cases).

Switching to pytest-xdist (-n auto) provides:
- Parallel test execution (same benefit as --forked)
- Proper coverage collection from all worker processes
- Better test isolation through process separation

This change affects sdk-tests, tools-tests, and agent-server-tests.

Closes #1656

Co-authored-by: openhands <[email protected]>
@jpshackelford jpshackelford changed the title fix(tests): Move local test classes to module level for pytest-xdist compatibility fix(tests): Enable accurate coverage reporting with pytest-xdist Jan 9, 2026
…lity

Another local class in test_event_immutability.py was causing test pollution
when running tests in parallel. Moved _TestEventForImmutability to module level.

Co-authored-by: openhands <[email protected]>
…dist

Additional local classes causing test pollution when running in parallel:
- _UnknownEventForVisualizerTest in test_visualizer.py
- _NestedActionForMalformedArgs in test_fix_malformed_tool_arguments.py
- _ChildMCPToolActionForSerialization in test_mcp_action_serialization.py

Co-authored-by: openhands <[email protected]>
@openhands-ai

openhands-ai bot commented Jan 9, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Run tests
    • Agent Server

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1658 at branch `fix/test-isolation-module-level-agent`

Feel free to include any additional details that might help me get this PR into a better state.

@jpshackelford
Contributor Author

With this fix, the coverage reported in CI is no longer the incorrect 52% but the more accurate 77%.

jpshackelford and others added 3 commits January 9, 2026 01:24
Terminal tests use shared paths (/tmp/test_dir) and subprocess management
that conflict when run in parallel with pytest-xdist.

Co-authored-by: openhands <[email protected]>
@jpshackelford
Contributor Author

We had to put --forked back in for tools-tests because they were hanging, most likely because the terminal tests conflict when run in parallel:

  • They use hardcoded paths like /tmp/test_dir, which clash between parallel workers
  • They spawn subprocesses and tmux sessions that may interfere with each other

As a consequence, coverage dropped back down to 68%, but that is still closer to accurate than the 52% we were seeing before.
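
One possible way to remove the path clash, sketched under assumptions: the helper `isolated_test_dir` below is hypothetical and not part of this PR. It gives each caller a fresh directory instead of a fixed /tmp/test_dir and tags it with the `PYTEST_XDIST_WORKER` environment variable that pytest-xdist sets in its worker processes. It does not address the subprocess and tmux interference.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def isolated_test_dir() -> Iterator[Path]:
    """Yield a fresh directory that no other worker or test shares."""
    # pytest-xdist sets PYTEST_XDIST_WORKER (e.g. "gw0") in each worker;
    # fall back to "main" when running without xdist.
    worker = os.environ.get("PYTEST_XDIST_WORKER", "main")
    path = Path(tempfile.mkdtemp(prefix=f"test_dir_{worker}_"))
    try:
        yield path
    finally:
        # Clean up even if the test body raised.
        shutil.rmtree(path, ignore_errors=True)


with isolated_test_dir() as first, isolated_test_dir() as second:
    (first / "out.txt").write_text("a")
    (second / "out.txt").write_text("b")
    distinct = first != second
```

Inside pytest itself, the built-in `tmp_path` fixture already provides a unique directory per test and would likely be the simpler choice.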

@jpshackelford jpshackelford marked this pull request as ready for review January 9, 2026 07:17
@xingyaoww xingyaoww added the review-this This label triggers a PR review by OpenHands label Jan 9, 2026
Collaborator

@all-hands-bot all-hands-bot left a comment

Good! The dependencies are properly configured with pytest-cov>=5.0.0 and pytest-xdist>=3.6.0.

Suggested verification:

  • Compare coverage reports before/after this change to confirm the improvement
  • Ensure no tests are being skipped due to xdist compatibility issues
  • Verify that all 1679 tests pass consistently with -n auto

2. Test Parallel Execution Safety

With pytest-xdist, tests run in parallel across multiple workers. Verify that tests don't have race conditions from:

  • Shared file system access (temp files, test fixtures)
  • Environment variable modifications
  • Shared database connections
  • Global state mutations
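
For the environment-variable case, a minimal sketch of one way to keep modifications from leaking between tests, using the standard library's `unittest.mock.patch.dict`. The variable name `EXAMPLE_API_KEY` and the `read_key` helper are hypothetical. Note that xdist workers are separate processes, so the concern here is mainly leakage between tests that share a worker.

```python
import os
from unittest import mock

VAR = "EXAMPLE_API_KEY"  # hypothetical variable name, for illustration only


def read_key() -> str:
    # Code under test that depends on process-wide state.
    return os.environ.get(VAR, "unset")


before = read_key()
# patch.dict restores os.environ to its previous contents on exit.
with mock.patch.dict(os.environ, {VAR: "test-value"}):
    inside = read_key()
after = read_key()
```

Within pytest, the `monkeypatch.setenv` fixture method offers the same restore-on-teardown behavior.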

3. Consider Coverage Configuration

For optimal pytest-xdist coverage collection, consider adding explicit coverage configuration:

[tool.coverage.run]
parallel = true
concurrency = ["multiprocessing"]

This ensures coverage data from parallel workers is properly combined.

✨ Positive Observations

  1. Consistent naming convention: All moved classes use underscore prefix (_ClassName) clearly indicating they're internal test utilities
  2. Excellent documentation: Each moved class has a clear docstring explaining why it's at module level
  3. Minimal changes: The refactoring is focused and doesn't introduce unnecessary modifications
  4. Proper dependency management: Both pytest-xdist and pytest-forked are properly declared in dependencies
  5. Clear comments in workflow: The workflow changes include helpful comments explaining the rationale

📋 Summary

Issues requiring attention:

  • Minor: Import inconsistency in test_reasoning_content.py (low priority, style issue)

Recommendations:

  • Verify coverage improvement claims
  • Test for parallel execution race conditions
  • Consider adding explicit coverage.run configuration for multiprocessing

The PR is well-structured and should achieve its stated goals of fixing test isolation and enabling accurate coverage reporting. The changes are safe to merge after addressing the minor import inconsistency (if desired) and performing the recommended verification steps.

Collaborator

@xingyaoww xingyaoww left a comment

Thank you!

@xingyaoww xingyaoww enabled auto-merge (squash) January 9, 2026 16:30
@xingyaoww xingyaoww merged commit 013583f into main Jan 9, 2026
21 checks passed
@xingyaoww xingyaoww deleted the fix/test-isolation-module-level-agent branch January 9, 2026 16:35
Labels

review-this This label triggers a PR review by OpenHands

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ci: Replace --forked with pytest-xdist for accurate coverage reporting

5 participants