fix(poller): use sleep_coalesce for housekeeping loops #69

Closed
HonestMajority wants to merge 4 commits into main from task/fix-manual-clock-freeze-019d243d

Conversation

@HonestMajority
Contributor

Summary

  • Switch keep-alive heartbeat and lost-job detector sleeps from sleep() to sleep_coalesce() so they actually wake when the global manual clock is advanced
  • Bump es-entity from 0.10 to 0.10.31 and adapt to its breaking API changes (per-operation error types, maybe_now(), EntityHydrationError, DbOp-based transaction handling)
  • Remove the now-dropped es-entity/sim-time feature flag from the sim-time feature

Closes GaloyMoney/volcano-wip#730

Detail

When lana-bank runs with time.mode=manual, the global es-entity Clock never advances autonomously. The keep-alive and lost-job loops used regular sleep(), which registers wakes at intermediate boundaries — boundaries that never arrive with a manual clock, causing both loops to freeze indefinitely.

sleep_coalesce() fires once at the end of Clock::advance() instead of at every intermediate tick, so:

  1. The loops actually wake up when advance() is called.
  2. Large time jumps don't cause O(advance / interval) spurious wake-ups.
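
To make point 2 concrete, here is a toy model of the wake counts. It is a sketch only: `SleepKind` and `wakes_for_advance` are hypothetical names, nothing in it calls es-entity, and it assumes a per-tick sleep is re-armed at every interval boundary inside a single jump. It does not model the freeze described above, only the difference in wake-ups once the clock does move.

```rust
use std::time::Duration;

/// Hypothetical labels for the two sleep behaviours compared in this PR.
#[derive(Clone, Copy)]
enum SleepKind {
    /// Wakes at every interval boundary crossed by the jump.
    PerTick,
    /// Wakes at most once, after the whole jump has been applied.
    Coalesced,
}

/// Counts how many times a housekeeping loop that sleeps `interval` between
/// iterations would wake during one manual clock jump of `advance`.
/// The clock is assumed to start at zero. Toy model, not es-entity's Clock.
fn wakes_for_advance(kind: SleepKind, interval: Duration, advance: Duration) -> u64 {
    assert!(!interval.is_zero(), "interval must be non-zero");
    let mut deadline = interval;
    let mut wakes = 0u64;
    match kind {
        SleepKind::PerTick => {
            // One wake per boundary inside the jump, re-arming each time.
            while deadline <= advance {
                wakes += 1;
                deadline += interval;
            }
        }
        SleepKind::Coalesced => {
            // A single wake, and only if the deadline was actually reached.
            if deadline <= advance {
                wakes = 1;
            }
        }
    }
    wakes
}
```

With an assumed 5-second interval and a one-hour jump, `wakes_for_advance(SleepKind::PerTick, ...)` counts 720 wake-ups, while the `Coalesced` variant counts 1.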

es-entity 0.10.31 migration

The bump required adapting to several breaking changes:

  • EsEntityError → EntityHydrationError
  • from_es_entity_error! macro removed → manual From impls for generated per-operation error types (JobCreateError, JobFindError, JobModifyError, JobQueryError)
  • AtomicOperation::now() → maybe_now()
  • EntityEvents::load_first() now returns Option<T>
  • DbOp no longer implements Into<Transaction> → split complete_job/reschedule_job into DbOp and Transaction variants
  • err = "..." attribute removed from #[es_repo]
  • Added constraint = "idx_unique_job_type" for custom index name matching
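
As an illustration of the "manual From impls" item, the pattern could look roughly like the sketch below. The type names `JobCreateError` and `JobFindError` come from the list above, but here they are local stand-ins: their variants, the `JobError` wrapper, and the `find_job` / `create_job` / `create_then_find` helpers are all assumptions made up for the example, not the shapes es-entity generates.

```rust
// Stand-ins for the generated per-operation error types (variants are assumed).
#[derive(Debug, PartialEq)]
enum JobCreateError {
    DuplicateJobType,
}

#[derive(Debug, PartialEq)]
enum JobFindError {
    NotFound,
}

// Hypothetical crate-level error that wraps each per-operation error.
#[derive(Debug, PartialEq)]
enum JobError {
    Create(JobCreateError),
    Find(JobFindError),
}

// Hand-written conversions, taking over from the removed macro.
impl From<JobCreateError> for JobError {
    fn from(e: JobCreateError) -> Self {
        JobError::Create(e)
    }
}

impl From<JobFindError> for JobError {
    fn from(e: JobFindError) -> Self {
        JobError::Find(e)
    }
}

// Hypothetical repo operations, each returning its own error type.
fn create_job(job_type: &str) -> Result<u32, JobCreateError> {
    if job_type == "taken" {
        Err(JobCreateError::DuplicateJobType)
    } else {
        Ok(1)
    }
}

fn find_job(id: u32) -> Result<u32, JobFindError> {
    if id == 0 {
        Err(JobFindError::NotFound)
    } else {
        Ok(id)
    }
}

// `?` converts each per-operation error into `JobError` via the impls above.
fn create_then_find(job_type: &str, id: u32) -> Result<u32, JobError> {
    create_job(job_type)?;
    let found = find_job(id)?;
    Ok(found)
}
```

The point of the pattern is that callers keep a single error type while each operation can fail with its own, narrower type.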

Test plan

  • All 26 existing tests pass (cargo nextest run)
  • Clippy clean on both default and --features sim-time
  • cargo fmt clean
  • Manual clock integration tests deferred — they require a more extensive test harness (global Clock::install_manual(), blocking job runners, DB polling assertions)
  • CI green

🤖 Generated with Claude Code

bodymindarts and others added 4 commits November 25, 2025 09:52
…ual clock

When lana-bank runs with time.mode=manual, the global es-entity Clock
never advances autonomously.  The keep-alive heartbeat and lost-job
detector loops used regular sleep(), which registers a wake that fires
at each intermediate boundary — but with a manual clock those
boundaries never arrive, so both loops freeze indefinitely.

Switch the two housekeeping sleeps in start_keep_alive_handler() and
start_lost_handler() to sleep_coalesce().  Coalesceable sleeps fire
once at the end of Clock::advance() instead of at every intermediate
tick, which means:

  1. The loops actually wake up when advance() is called.
  2. Large time jumps don't cause O(advance / interval) spurious
     wake-ups — only one wake per advance().

Also bumps es-entity from 0.10 to 0.10.31 and adapts to its breaking
API changes (per-operation error types, maybe_now(), EntityHydrationError,
DbOp-based transaction handling).

Closes GaloyMoney/volcano-wip#730

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@HonestMajority
Contributor Author

Closing as superseded by #68, which was already merged and addresses the same issue (sleep_coalesce in housekeeping loops + es-entity 0.10.31 bump).
