perf: parallelize sync content processing (4m25s → 2m00s) #1974
Draft
Asi0Flammeus wants to merge 3 commits into dev
Replace sequential for...of await loops with concurrent processing using a local pMap helper (no external dependency).

Three-phase execution respecting FK dependencies:
- Phase 1: professors, labs, resources, events, blogs, legals, bcerts
- Phase 2: courses, tutorials (depends on professors)
- Phase 3: quiz questions, assignments (depends on courses)

Also sort tags before insertion to prevent potential deadlocks on the shared content.tags table during concurrent processing.

Benchmarked locally: content processing 210s → 61s (3.4x faster). Total sync 4m25s → 2m00s (remaining time is Typesense indexing).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The 17 DB queries that fetch content for Typesense search indexing were executed sequentially due to the spread-await pattern used since the initial implementation. Each query is an independent SELECT on a different table, with no ordering dependency.

Replace [...(await A), ...(await B)] with (await Promise.all([A, B])).flat() to run all 17 queries concurrently through the connection pool.

Expected: ~35s of sequential DB queries → ~3-5s in parallel.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… parallel)" This reverts commit 5166ba7.
Context
The content sync takes ~4m25s, which is an irritant during local review/test workflows where sync is triggered frequently. The bottleneck is sequential DB upserts (`for...of await`), not network or I/O.

Root cause analysis
The sync architecture uses a mark-and-sweep garbage collection pattern via `last_sync`:
- `sync_date = NOW()` at start
- `last_sync = NOW()` on every entity the sync touches
- `DELETE WHERE last_sync < sync_date` → removes anything not touched

This design is deliberately robust (impossible to leave orphaned data) but forces a full sync every time. Changing the GC mechanism was evaluated and rejected due to its high risk/effort ratio.
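
To make the mark-and-sweep idea concrete, here is a minimal in-memory sketch. It is not the service's code: the `Row` type, the `runSync` function, and the logical clock are hypothetical stand-ins for the `last_sync` column and the SQL statements listed above.

```typescript
// Hypothetical in-memory model of the mark-and-sweep pattern.
// The real sync runs the equivalent SQL against Postgres.
interface Row {
  id: string;
  lastSync: number; // stands in for the last_sync timestamp column
}

function runSync(
  table: Map<string, Row>,
  incomingIds: string[],
  now: () => number,
): string[] {
  // 1. Record the sync start (sync_date = NOW()).
  const syncDate = now();

  // 2. Mark: every entity seen in this sync gets last_sync = NOW().
  for (const id of incomingIds) {
    table.set(id, { id, lastSync: now() });
  }

  // 3. Sweep: DELETE WHERE last_sync < sync_date.
  const removed: string[] = [];
  table.forEach((row, id) => {
    if (row.lastSync < syncDate) {
      table.delete(id);
      removed.push(id);
    }
  });
  return removed;
}

// Usage with a logical clock so the run is deterministic.
let tick = 0;
const clock = () => ++tick;
const table = new Map<string, Row>();
runSync(table, ["a", "b"], clock); // first sync sees a and b
const removed = runSync(table, ["a"], clock); // second sync no longer sees b
// removed is ["b"]; only "a" remains in the table
```

Because the sweep deletes everything not marked in the current run, every entity has to be visited on every sync, which is why the pattern forces a full sync.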
What this PR does
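Replaces sequential processing with concurrent execution using a local `pMap` helper (13 lines, no external dependency). The GC mechanism, all update functions, and all business logic remain untouched.

Three-phase execution respecting foreign key dependencies:
- Phase 1: professors, labs, resources, events, blogs, legals, bcerts
- Phase 2: courses, tutorials (depends on professors)
- Phase 3: quiz questions, assignments (depends on courses)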
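
The phased execution could look roughly like the sketch below. This is not the PR's `concurrency.ts`: the `pMap` signature, the `Task` type, and `runPhases` are assumptions made for illustration, and only a subset of the phase 1 content types is shown.

```typescript
// Sketch of a concurrency-limited map (assumed signature, not the PR's helper).
async function pMap<T, R>(
  items: readonly T[],
  mapper: (item: T, index: number) => Promise<R>,
  concurrency: number,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each worker claims the next unprocessed index until none are left.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await mapper(items[index], index);
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // same order as the input
}

// Hypothetical phase runner: phases run one after another, while the
// independent tasks inside a phase run concurrently.
type Task = () => Promise<void>;

async function runPhases(phases: Task[][], concurrency: number): Promise<void> {
  for (const phase of phases) {
    await pMap(phase, (task) => task(), concurrency);
  }
}

// Usage: each step only records its name; real steps would upsert rows.
const log: string[] = [];
const step = (name: string): Task => async () => {
  log.push(name);
};

const done = runPhases(
  [
    [step("professors"), step("labs"), step("resources")], // phase 1 (subset)
    [step("courses"), step("tutorials")], // phase 2: depends on professors
    [step("quiz questions"), step("assignments")], // phase 3: depends on courses
  ],
  10,
);
```

Awaiting each phase before starting the next is what keeps a course from being upserted before the professor row it references exists.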
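Deadlock prevention: Tags are sorted before insertion (`.sort()` on `lowercaseTags`) to ensure deterministic lock ordering on the shared `content.tags` table.

Concurrency is set to 10, matching the default `postgres.js` connection pool size.

Benchmark results

Benchmarked locally:
- Content processing: 210s → 61s (3.4x faster)
- Total sync: 4m25s → 2m00s (remaining time is Typesense indexing)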
What does NOT change
- The GC mechanism (`last_sync`)
- `update*`/`delete*`/`groupBy*` functions

Files changed
- `packages/service-content/src/lib/utils/concurrency.ts`: NEW, local pMap helper (13 lines)
- `packages/service-content/src/lib/index.ts`: sequential loops → 3 parallel phases
- `.sort()` added on tag insertion

Local tests
Note on GC deletion test
The GC deletion phase (`processDeleteOldEntities`) is skipped when `syncErrors.length > 0`. In local dev, 54 assignment PDF uploads fail with `ECONNREFUSED` because there is no S3 service configured locally. This means the GC never runs in local, regardless of this PR.

This is a pre-existing gap in the local dev setup. Adding a MinIO container (S3-compatible) to `local-dev.sh` would fix this and enable full GC testing. Tracked separately.

The GC code itself (`DELETE WHERE last_sync < sync_date`) was not modified by this PR.
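
For readers unfamiliar with the gate, the behaviour described in this note amounts to something like the following sketch. Only the names `processDeleteOldEntities` and `syncErrors` come from the description above; the `finishSync` wrapper, its signature, and its return values are hypothetical.

```typescript
// Hypothetical sketch of the GC gate; signatures are assumed, not taken
// from the service code.
async function finishSync(
  syncErrors: string[],
  processDeleteOldEntities: () => Promise<void>,
): Promise<"gc-ran" | "gc-skipped"> {
  if (syncErrors.length > 0) {
    // Any recorded error skips the deletion phase entirely.
    return "gc-skipped";
  }
  await processDeleteOldEntities();
  return "gc-ran";
}
```

With even one failed upload recorded in `syncErrors`, the deletion phase is never reached, which matches what happens locally without an S3 service.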