@smahima27 (Contributor)

- Implement a dead-letter queue (DLQ) to capture failed VM operations
- Implement auto-purge to clean up stale queue entries
- Implement health checks to monitor queue health
- Add comprehensive tests and documentation
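A minimal Python sketch of the DLQ idea follows. It uses an in-memory list as a stand-in for the real Redis-backed queue. Only `move_to_dlq` and the pending/clone/ready queue names come from this PR. The `DeadLetterQueue` class, the `max_entries` cap, the entry fields, and the method signature are hypothetical and may not match the actual implementation.

```python
import time
from dataclasses import dataclass, field

# Queue names from the PR description; everything else here is hypothetical.
SOURCE_QUEUES = {"pending", "clone", "ready"}


@dataclass
class DeadLetterQueue:
    """In-memory stand-in for a Redis-backed DLQ (assumption, not the PR's code)."""

    max_entries: int = 1000  # hypothetical cap so the DLQ cannot grow without bound
    entries: list = field(default_factory=list)
    failures_total: int = 0  # counter a metrics backend could export

    def move_to_dlq(self, vm, source_queue, reason):
        """Record a failed VM operation together with where and why it failed."""
        if source_queue not in SOURCE_QUEUES:
            raise ValueError(f"unknown source queue: {source_queue}")
        self.entries.append(
            {"vm": vm, "queue": source_queue, "reason": reason, "failed_at": time.time()}
        )
        # Keep only the newest max_entries records.
        if len(self.entries) > self.max_entries:
            del self.entries[: len(self.entries) - self.max_entries]
        self.failures_total += 1
```

Capping the list trades completeness for bounded memory. An operator inspecting failures would see only the most recent `max_entries` records, while `failures_total` still counts every failure.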

Features:

- The DLQ captures failures from the pending, clone, and ready queues
- Auto-purge removes stale VMs based on configurable thresholds
- Health checks expose metrics for monitoring and alerting
- All features are opt-in via configuration (backward compatible)
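The following is a minimal Python sketch of the opt-in behaviour combined with a threshold-based health check. It assumes the current queue sizes are already known. `QueueReliabilityConfig`, its field names and thresholds, the metric names, and the healthy/degraded statuses are all hypothetical and are not taken from the PR's configuration schema.

```python
from dataclasses import dataclass


@dataclass
class QueueReliabilityConfig:
    # Every feature defaults to off, so existing deployments are unaffected.
    dlq_enabled: bool = False
    purge_enabled: bool = False
    health_check_enabled: bool = False
    # Hypothetical alerting thresholds.
    pending_warn: int = 50
    dlq_warn: int = 10


def check_health(config, queue_sizes, dlq_size):
    """Return (status, problems, metrics), or None when health checks are disabled."""
    if not config.health_check_enabled:
        return None
    # One size metric per queue, plus the DLQ size.
    metrics = {f"queue.{name}.size": size for name, size in queue_sizes.items()}
    metrics["dlq.size"] = dlq_size
    problems = []
    if queue_sizes.get("pending", 0) > config.pending_warn:
        problems.append("pending backlog")
    if dlq_size > config.dlq_warn:
        problems.append("dlq growing")
    status = "degraded" if problems else "healthy"
    return status, problems, metrics
```

Returning `None` when the feature is disabled keeps the opt-in contract simple: callers emit nothing unless health checks are explicitly switched on.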

smahima27 requested a review from a team as a code owner on December 19, 2025 at 07:47
- Add a skip_metrics parameter to move_to_dlq so metrics are not double-counted when it is called from purge
- Fix purge_pending_queue to increment the purge count only when not in dry-run mode
- Add a nil check on the redis config before accessing data_ttl
- Update health check tests to allow all gauge calls before asserting on specific metrics
- Reorder push_health_metrics to emit error, queue, and task metrics before the status metric
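Here is a minimal Python sketch of the first three fixes in the list above, with in-memory dicts and lists standing in for Redis. Only the names `move_to_dlq`, `skip_metrics`, `purge_pending_queue`, and `data_ttl` come from this PR. The `Metrics` class, `DEFAULT_DATA_TTL`, the counter names, and every function signature are hypothetical.

```python
import time

DEFAULT_DATA_TTL = 168  # hypothetical fallback value; the PR does not state one


def data_ttl(config):
    # Nil-safe lookup: tolerate a missing or empty "redis" section.
    redis_cfg = config.get("redis") or {}
    return redis_cfg.get("data_ttl", DEFAULT_DATA_TTL)


class Metrics:
    """Tiny counter store standing in for a real metrics client (assumption)."""

    def __init__(self):
        self.counters = {}

    def increment(self, name):
        self.counters[name] = self.counters.get(name, 0) + 1


def move_to_dlq(dlq, metrics, vm, queue, reason, skip_metrics=False):
    dlq.append((vm, queue, reason))
    if not skip_metrics:
        metrics.increment("dlq.moved")


def purge_pending_queue(pending, dlq, metrics, max_age_s, dry_run=False, now=None):
    """pending maps VM name -> enqueue timestamp in seconds; returns (stale, purged)."""
    now = time.time() if now is None else now
    stale = [vm for vm, enqueued_at in pending.items() if now - enqueued_at > max_age_s]
    if dry_run:
        return stale, 0  # report candidates only; nothing is removed or counted
    for vm in stale:
        del pending[vm]
        # The purge records its own counter, so the DLQ counter is skipped here.
        move_to_dlq(dlq, metrics, vm, "pending", "stale", skip_metrics=True)
        metrics.increment("purge.pending")
    return stale, len(stale)
```

Returning the stale list even in dry-run mode lets an operator preview what would be purged without touching the queue or the counters.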

All 851 tests now pass, including 40 queue reliability tests.