## Summary
The `GET /project/{id}/page/{pageId}/resolved` endpoint is the single largest performance bottleneck in the TPEN3 stack. It triggers a deep dependency chain across all three services and degrades rapidly under concurrent load: RERUM fails under the amplified request volume, cascading 502s through the stack.
## How it works
When called, `/resolved` fetches the page data, then makes N parallel RERUM requests (one per page item/annotation) to resolve each annotation's full content. For a page with 700 items, that's 700 outgoing HTTP requests fanning out through TinyPEN to RERUM.
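The fan-out described above can be sketched as follows. This is illustrative only — the function and field names are assumptions, not the actual TPEN3 source:

```javascript
// Sketch of the current /resolved behavior (names are illustrative):
// one outgoing HTTP request per annotation, all fired at once.
async function resolvePage(page, fetchAnnotation) {
  // For a 700-item page this launches 700 concurrent requests,
  // which fan out through TinyPEN to RERUM with no upper bound.
  return Promise.all(page.items.map((item) => fetchAnnotation(item.id)));
}

// Usage with a stub resolver standing in for the RERUM /id/{id} call:
const page = {
  items: Array.from({ length: 700 }, (_, i) => ({ id: `anno-${i}` })),
};
resolvePage(page, async (id) => ({ "@id": id })).then((resolved) => {
  console.log(resolved.length); // 700
});
```

Because `Promise.all` starts every request immediately, request count scales linearly with page size and there is no backpressure — which is why the endpoint degrades as the test page grows.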
## Evidence
From load testing (Run 4):
| Phase | VUs | /resolved p95 | Impact |
|---|---|---|---|
| Phase 1 (baseline) | 1 | 2.8-4.5s | Slow even at single user |
| Phase 2 (load) | 20 | ~4.7s | Drives overall read p95 above threshold |
| Phase 3 (stress) | 50+ | 14.5s+ | Causes queuing that affects ALL endpoints |
| Phase 4 (spike) | 80 | 25s+ | Queue saturation, 54s timeouts at 150 VUs |
Key observations:
- At 80+ VUs, `/resolved` causes a cascading queue that blocks unrelated endpoints (simple GETs that would normally take <100ms start taking 5-10s)
- The page used for testing has grown to 700+ items across test runs. Each run adds lines via the load/stress/conflict phases, making `/resolved` progressively slower (see the finding in "/save high level route and test" #15 in the test plan)
- This is the primary reason the stack can't handle 30+ concurrent users
## Recommendations

In priority order:

1. **Cache resolved responses.** Even a 30-second TTL cache would dramatically reduce RERUM load; annotations rarely change between page loads.
2. **Add pagination.** Return resolved items in pages (e.g., 50 at a time) so the client can render progressively rather than waiting for all 700 items to resolve.
3. **Batch RERUM lookups.** Instead of N individual `/id/{id}` requests, use a single query such as `POST /query` with `{"@id": {"$in": [...ids]}}` to fetch many annotations at once.
4. **Limit concurrent outgoing requests.** If N=700, don't fire all 700 simultaneously; use a concurrency pool (e.g., 20 at a time) to avoid overwhelming RERUM.
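The batch-query and concurrency-pool recommendations above can be combined. A minimal sketch, assuming a hypothetical `queryRerum` helper (not from the TPEN3 codebase) that POSTs an `$in` query to RERUM's `/query` endpoint:

```javascript
// Split annotation ids into fixed-size batches for POST /query.
function chunk(ids, size) {
  const out = [];
  for (let i = 0; i < ids.length; i += size) out.push(ids.slice(i, i + size));
  return out;
}

// Bounded concurrency pool: at most `limit` workers run at once;
// each worker pulls the next item as soon as it finishes.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function run() {
    while (next < items.length) {
      const i = next++; // claim the next index (safe: JS is single-threaded)
      results[i] = await worker(items[i], i);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, run)
  );
  return results; // same order as the input
}

// Hypothetical usage: 700 ids become 7 batch queries, at most 3 in flight.
// `queryRerum` is an assumed helper that POSTs
// {"@id": {"$in": batch}} to RERUM's /query endpoint.
//
// const batches = chunk(ids, 100);
// const results = await mapWithConcurrency(batches, 3, queryRerum);
// const annotations = results.flat();
```

This turns 700 outgoing requests into a handful of batched queries with a hard cap on in-flight work, so load on RERUM no longer scales one-to-one with page size.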