Fix dependency resolution order and reduce transient reconciliation errors #208

Merged
chrisguidry merged 2 commits into main from
199-fix-dependency-reconciliation
Sep 22, 2025

@chrisguidry
Collaborator

Summary

Fixes cascading dependency failures and error noise that occurred when applying Prefect resources simultaneously. The main issue was deployments attempting to sync before their referenced work pools existed in the Prefect API, causing repeated "work pool not found" errors and confusing log spam.

Key Changes

  • Work pool dependency validation: Deployments now check if their referenced work pool exists before attempting sync operations, preventing cascade failures
  • Improved Ready status semantics: Resources are marked Ready once they have synced successfully to the Prefect API, independent of Prefect's internal status
  • Faster user feedback: Reduced requeue intervals from 5 minutes to 10 seconds for status updates
  • Clean error handling: 5-second backoff for dependency failures instead of aggressive immediate retries

Testing Methodology

The fix was developed using systematic integration testing:

  1. Reproduction: Used minikube + make install run + deploy/samples/deployment_end-to-end.yaml to reproduce the exact errors described in the issue
  2. Iterative improvement: Applied fixes and re-ran the integration test until all resources reached Ready state with clean logs
  3. Validation: Comprehensive test suite (186 tests) passes with zero failures
  4. Manual verification: Tested both normal edits and breaking changes to ensure proper error handling

Before/After Comparison

Before:

  • Dozens of "port-forwarding failed to become ready" errors from dependent controllers
  • Cascading "Work pool 'process-pool' not found" failures
  • Aggressive 1-second retry loops creating log noise
  • 5-minute delays for status updates

After:

  • Clean dependency ordering prevents cascade failures
  • Targeted error messages with 5-second backoff
  • 10-second status update intervals
  • All resources reach Ready state reliably

Integration Test Results

PrefectServer: Ready immediately
PrefectWorkPool: Ready when workers are available
PrefectDeployment: Ready within ~10 seconds after sync
Clean logs: Minimal error noise with actionable messages
No regressions: All existing functionality preserved

Closes #199

🤖 Generated with Claude Code

…rrors

Addresses cascading dependency failures and error noise that occurred when applying
Prefect resources simultaneously. The main issue was deployments attempting to sync
before their referenced work pools existed in the Prefect API, causing repeated
"work pool not found" errors.

Changes made:
- Add work pool dependency validation in deployment controller before sync attempts
- Improve Ready status semantics to reflect sync success rather than Prefect API status
- Reduce requeue intervals from 5 minutes to 10 seconds for faster user feedback
- Add utility functions for server health checking and backoff strategies

Testing methodology involved systematic reproduction of the issue using minikube +
`make install run` + `deployment_end-to-end.yaml`, then iterative testing to verify
the fix eliminates error spam while maintaining functionality. Integration tests
confirm all resources reach Ready state reliably with clean logs.

Closes #199

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Member

(non-blocking thought): I wonder if either of these packages would help here to avoid writing our own backoff logic.

Collaborator Author

That's smart, captured it here: #210

@chrisguidry chrisguidry merged commit f92ef7d into main Sep 22, 2025
3 checks passed
@chrisguidry chrisguidry deleted the 199-fix-dependency-reconciliation branch September 22, 2025 20:48
@mitchnielsen mitchnielsen added the enhancement New feature or request label Dec 11, 2025

Development

Successfully merging this pull request may close these issues.

Fix dependency resolution order and reduce transient errors during resource reconciliation

2 participants