feat: resilient background job retry & monitoring #401

Open
TallowX92 wants to merge 2 commits into rohitdash08:main from TallowX92:feat/background-job-retry-monitoring


TallowX92 commented Mar 14, 2026

/claim #130

Demo

demo2026-03-14.19-10-07.mp4

Scheduler started, job firing every 60 seconds, executing successfully with next run scheduled.


Summary

Production-grade background job infrastructure for async reminder dispatch with exponential-backoff retry and a live monitoring API.

What's included

Scheduler — app/services/scheduler.py

  • APScheduler BackgroundScheduler with MemoryJobStore
  • Runs process_due_reminders() every 60 seconds
  • Exponential backoff: 5 min → 15 min → 45 min between retries
  • Permanently marks reminders failed=True after 3 retries, captures last_error
  • Auto-disabled in test environment (FLASK_ENV=testing)
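
The 5 min → 15 min → 45 min schedule above is consistent with a delay that triples on each retry. A minimal sketch of such a helper follows; the name `backoff_delta`, the constants, and the 45-minute cap are assumptions for illustration (the PR only says the delay is "capped at max") and may differ from what `app/services/scheduler.py` actually does.

```python
from datetime import timedelta

BASE_DELAY_MINUTES = 5   # first retry delay, taken from the PR description
BACKOFF_FACTOR = 3       # 5 -> 15 -> 45 implies tripling per retry
MAX_DELAY_MINUTES = 45   # ASSUMPTION: the PR does not state the cap value


def backoff_delta(retry_count: int) -> timedelta:
    """Return how long to wait before the next attempt.

    retry_count is the number of retries already performed (0 for the
    first failure). Negative values are treated as 0.
    """
    minutes = BASE_DELAY_MINUTES * BACKOFF_FACTOR ** max(retry_count, 0)
    # Clamp so that very high retry counts never exceed the cap.
    return timedelta(minutes=min(minutes, MAX_DELAY_MINUTES))
```

Keeping the schedule in a pure function like this makes it straightforward to unit test without a database or a running scheduler.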

New fields on Reminder model

| Field | Type | Purpose |
| --- | --- | --- |
| retry_count | Integer | Attempts so far |
| last_error | String | Last exception message |
| next_retry_at | DateTime | When to next attempt |
| failed | Boolean | Permanently failed flag |
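
To illustrate how these four fields might interact when a delivery fails, here is a rough sketch. `ReminderState` and `record_failure` are hypothetical names standing in for the real model and handler, and the exact point at which a reminder is marked permanently failed (here: a failure after three retries have already been scheduled) is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

MAX_RETRIES = 3
BACKOFF_MINUTES = (5, 15, 45)  # delay before retry 1, 2 and 3


@dataclass
class ReminderState:
    """Hypothetical stand-in for the four new Reminder columns."""
    retry_count: int = 0
    last_error: Optional[str] = None
    next_retry_at: Optional[datetime] = None
    failed: bool = False


def record_failure(state: ReminderState, error: Exception, now: datetime) -> None:
    """Update retry bookkeeping after a failed delivery attempt."""
    state.last_error = str(error)
    if state.retry_count >= MAX_RETRIES:
        # ASSUMPTION: the retry budget is exhausted, so stop for good.
        state.failed = True
        state.next_retry_at = None
        return
    # Schedule the next attempt, then count the retry we just scheduled.
    delay = timedelta(minutes=BACKOFF_MINUTES[state.retry_count])
    state.next_retry_at = now + delay
    state.retry_count += 1
```

Under these assumptions a reminder gets one initial attempt plus three retries before `failed` is set, and `last_error` always holds the most recent exception message.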

Monitoring endpoints — GET/POST /jobs

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /jobs/status | Scheduler running state + job list |
| GET | /jobs/reminders/stats | Counts: sent / pending / overdue / retrying / permanently_failed |
| POST | /jobs/reminders/run | Manual trigger (admin) |
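
For reference, a client could prepare calls to these endpoints roughly as follows. This is only a sketch: `build_jobs_request` is a hypothetical helper, the base URL is a placeholder, and sending the JWT as a `Bearer` token in the `Authorization` header is an assumption about how the API expects it.

```python
import urllib.request


def build_jobs_request(base_url: str, path: str, token: str,
                       method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) an authenticated request to a /jobs endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}{path}",
        method=method,
        # ASSUMPTION: the JWT is passed as a Bearer token.
        headers={"Authorization": f"Bearer {token}"},
    )


# Example: prepare the three calls listed in the table above.
# The base URL and token are placeholders, not values from the PR.
status_req = build_jobs_request("http://localhost:5000", "/jobs/status", "<jwt>")
stats_req = build_jobs_request("http://localhost:5000", "/jobs/reminders/stats", "<jwt>")
run_req = build_jobs_request("http://localhost:5000", "/jobs/reminders/run", "<jwt>", "POST")
```

Passing any of these objects to `urllib.request.urlopen` would perform the call against a running instance.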

Tests — tests/test_jobs.py (17 tests)

  • Backoff delta (4): 5 min at retry 0, 15 min at retry 1, 45 min at retry 2, capped at max
  • ProcessDueReminders (8): dispatches due, skips future-dated, increments retry, sets next_retry_at, marks permanently failed, respects retry window, skips sent, skips failed
  • Endpoints (5): status 200, stats shape, manual trigger, auth required
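
The skip guards exercised by the ProcessDueReminders tests (future-dated, already sent, permanently failed, still inside the retry window) can be summarised as a single predicate. The sketch below is illustrative only: `ReminderRow`, `send_at`, `sent`, and `is_due` are assumed names, and the real `process_due_reminders()` may filter in the database query instead.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ReminderRow:
    """Hypothetical minimal view of a reminder; field names are assumed."""
    send_at: datetime
    sent: bool = False
    failed: bool = False
    next_retry_at: Optional[datetime] = None


def is_due(reminder: ReminderRow, now: datetime) -> bool:
    """Return True if the reminder should be dispatched on this run."""
    if reminder.sent or reminder.failed:
        return False  # already delivered, or permanently failed
    if reminder.send_at > now:
        return False  # future-dated
    if reminder.next_retry_at is not None and reminder.next_retry_at > now:
        return False  # still inside the backoff window
    return True
```

Each test case in the list then corresponds to one branch of this predicate.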

Note: 5 tests require Redis (auth_header fixture stores refresh token) — same constraint across the whole test suite. Core scheduler logic: 12/17 pass without Redis.

Implements production-grade background job infrastructure for async
reminder dispatch with exponential-backoff retry and a monitoring API.

Scheduler:
- APScheduler (BackgroundScheduler) initialized in create_app(), skipped
  in TESTING mode to avoid side effects in tests
- process_due_reminders() runs every 60 seconds via interval trigger
- Graceful shutdown registered via atexit

Retry logic:
- Failed deliveries are retried up to MAX_RETRIES (3) attempts
- Exponential backoff: 5min -> 15min -> 45min between attempts
- Reminders exceeding MAX_RETRIES are marked failed=True (no further attempts)
- retry_count, last_error, next_retry_at, failed columns added to Reminder model
- Schema compat ALTERs added for existing PostgreSQL deployments

Monitoring endpoints (JWT-protected):
- GET  /jobs/status              — scheduler health + registered job list
- GET  /jobs/reminders/stats     — sent/pending/retrying/failed counts
- POST /jobs/reminders/run       — manual trigger for ops/debugging

Tests (17 tests, 12 pass without Redis, 5 require Redis for auth):
- Backoff delta unit tests
- process_due_reminders: success, retry, backoff window, max retries, skip guards
- Endpoint auth, stats, manual trigger

Adds full savings goal management with deposit history and computed
milestone tracking across backend and frontend.

Backend (packages/backend):
- SavingsGoal model: name, target, current_amount, currency, deadline,
  notes, status (ACTIVE/COMPLETED/PAUSED)
- SavingsDeposit model: per-goal contribution history
- /savings CRUD endpoints (GET, POST, PATCH, DELETE)
- POST /savings/:id/deposits — add funds, auto-completes goal at 100%
- GET /savings/:id/deposits — deposit history
- Milestones computed at 25/50/75/100% of target
- Schema SQL added for savings_goals + savings_deposits tables
- Backwards-compatible ALTER statements for existing deployments

Frontend (app/src):
- api/savings.ts — typed API client for all savings endpoints
- pages/Savings.tsx — full page with goal cards, progress bars,
  milestone badges, deposit dialog, pause/resume, delete
- Navbar + App.tsx — /savings route registered and linked

Tests (packages/backend/tests/test_savings.py — 21 tests):
- Goal CRUD: create, list, filter by status, get, update, delete
- Validation: missing fields, invalid amounts, invalid status
- Deposits: accumulation, auto-complete, rejection on completed goal
- Milestones: 25/50/75/100% thresholds
- Auth: all endpoints require JWT, user isolation enforced