feat: resilient background job retry & monitoring (#130) #389
sinatragian wants to merge 7 commits into rohitdash08:main
- Add retry fields to Reminder model: retry_count, max_retries, next_retry_at, last_error, failed_permanently
- Add JobRun model for job execution audit log
- Implement exponential backoff retry (2^n minutes, capped at 60 min) in new services/job_runner.py
- POST /reminders/run now returns full stats dict and uses job_runner
- Add GET /reminders/job-runs monitoring endpoint
- Add SQL migration 002_resilient_job_retry.sql (IF NOT EXISTS safe)
- Add 16 unit/integration tests in tests/test_job_runner.py
- Add docs/resilient-job-retry.md

Closes rohitdash08#130
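
The retry fields and backoff rule listed in this commit can be illustrated with a minimal sketch. It is not the code in services/job_runner.py: the `ReminderRetryState` dataclass, the `record_failure` helper, the default `max_retries=3`, and the choice to compute the delay from the already-incremented retry count are all assumptions made for illustration. Only the field names and the "2^n minutes, capped at 60" rule come from the commit message.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_BACKOFF_MINUTES = 60  # cap stated in the commit message


@dataclass
class ReminderRetryState:
    """Hypothetical stand-in for the retry fields added to the Reminder model."""
    retry_count: int = 0
    max_retries: int = 3  # assumed default; the PR only says it is configurable
    next_retry_at: Optional[datetime] = None
    last_error: Optional[str] = None
    failed_permanently: bool = False


def backoff_minutes(attempt: int) -> int:
    """Return 2^attempt minutes, capped at MAX_BACKOFF_MINUTES."""
    return min(2 ** attempt, MAX_BACKOFF_MINUTES)


def record_failure(state: ReminderRetryState, error: str, now: datetime) -> None:
    """Update the retry fields after one failed dispatch attempt."""
    state.retry_count += 1
    state.last_error = error
    if state.retry_count >= state.max_retries:
        # Retry budget exhausted: stop scheduling further attempts.
        state.failed_permanently = True
        state.next_retry_at = None
    else:
        # Assumption: the delay is derived from the incremented retry count.
        delay = timedelta(minutes=backoff_minutes(state.retry_count))
        state.next_retry_at = now + delay


if __name__ == "__main__":
    state = ReminderRetryState()
    record_failure(state, "smtp timeout", datetime.now(timezone.utc))
    print(state)
```

Under these assumptions the delays grow as 2, 4, 8, 16, 32 minutes and then stay at 60 once 2^n exceeds the cap.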
…var, derive_status logic, test email uniqueness
…return value check)
…User, limit type=int, no_work status, job-runs auth filter, cleanup annotations
Design rationale: extending the existing reminder runner vs. a parallel job system
This PR extends the existing reminder runner (
Also adds JobMonitor page with live refresh and run-now button.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds resilient background job retry & monitoring to fix #130.
The reminder runner now retries failed reminders up to a configurable maximum, records every execution in a `JobRun` audit table, and exposes monitoring endpoints. Race conditions in multi-worker deployments are prevented via `.with_for_update(skip_locked=True)`.

Backend
- `POST /reminders/run`: dispatches due reminders with retry semantics; returns `{processed, failed, retried, status}`
- `GET /reminders/job-runs`: lists recent job execution records (JWT-protected)
- `JobRun` model: `id`, `job_name`, `status`, `started_at`, `finished_at`, `processed`, `failed`, `retried`, `error_message`
- Migration: `006_job_runs.sql`

Frontend
- `/jobs`: Job Monitor page with a live table of job execution history and colored status badges (`success` / `partial` / `failed` / `no_work`); auto-refreshes every 30 s
- Run-now button: calls `POST /reminders/run` and shows a toast with the `{processed, failed, retried}` counts
- `app/src/api/reminders.ts` (`JobRun` type + `listJobRuns`), `app/src/pages/JobMonitor.tsx`, route in `App.tsx`, nav link in `Navbar.tsx`

Design note
- Extended the existing reminder runner rather than building a parallel job system, to minimize surface area and stay backward-compatible with existing schedulers.
- `.with_for_update(skip_locked=True)` at `job_runner.py:73` prevents double-dispatch in multi-worker deployments: a worker skips rows that another worker has already locked instead of waiting on them.

Closes #130
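
As a rough illustration of how the stats returned by `POST /reminders/run` could map to the four status badges, here is a small sketch. The mapping rules are an assumption, since the PR text names the statuses (`success`, `partial`, `failed`, `no_work`) but does not spell out how they are derived. `derive_status` echoes a name mentioned in a commit title; `build_run_stats` is a hypothetical helper.

```python
from typing import Dict, Union


def derive_status(processed: int, failed: int, retried: int) -> str:
    """Map run counters to a status string (assumed rules, not the PR's)."""
    if processed == 0 and failed == 0 and retried == 0:
        return "no_work"   # nothing was due
    if failed == 0 and retried == 0:
        return "success"   # everything dispatched cleanly
    if processed == 0:
        return "failed"    # nothing went through
    return "partial"       # some dispatched, some failed or were rescheduled


def build_run_stats(processed: int, failed: int, retried: int) -> Dict[str, Union[int, str]]:
    """Assemble a stats dict in the shape the endpoint is described as returning."""
    return {
        "processed": processed,
        "failed": failed,
        "retried": retried,
        "status": derive_status(processed, failed, retried),
    }


if __name__ == "__main__":
    print(build_run_stats(processed=4, failed=1, retried=1))
```

Keeping the derivation in one pure function makes it straightforward to unit-test independently of the database and of the dispatch loop.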