-
-
Notifications
You must be signed in to change notification settings - Fork 699
feat: v4 deadlock detection #1970
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🦋 Changeset detectedLatest commit: 21d9eee The changes in this PR will be included in the next version bump. Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
WalkthroughThis update introduces a comprehensive modular refactor and feature expansion of the task run engine, focusing on queue management, idempotency, concurrency control, payload processing, and tracing. It adds new classes and interfaces for each concern, updates the data model and schema to support run chain state and locked queue identifiers, and enhances error handling for deadlock and concurrency scenarios. The trigger task service is refactored to inject these concerns, and extensive integration tests are added. Supporting changes include new error types, improved API error handling, schema and migration updates, and demonstration tasks for deadlock and queue locking scenarios. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant TriggerAPI
participant RunEngineTriggerTaskService
participant QueueManager
participant PayloadProcessor
participant IdempotencyKeyConcern
participant RunNumberIncrementer
participant TraceEventConcern
participant RunChainStateManager
participant RunEngine
participant DB
Client->>TriggerAPI: POST /api.v1/tasks/:taskId/trigger
TriggerAPI->>RunEngineTriggerTaskService: call(request)
RunEngineTriggerTaskService->>TriggerTaskValidator: validate (tags, entitlement, attempts, parent run)
RunEngineTriggerTaskService->>IdempotencyKeyConcern: handleTriggerRequest(request)
alt Cached run exists
IdempotencyKeyConcern-->>RunEngineTriggerTaskService: {cached: true, run}
RunEngineTriggerTaskService->>TraceEventConcern: traceIdempotentRun(...)
TraceEventConcern-->>RunEngineTriggerTaskService: span
RunEngineTriggerTaskService-->>TriggerAPI: Return cached run
else No cached run
IdempotencyKeyConcern-->>RunEngineTriggerTaskService: {cached: false, idempotencyKey, expiration}
RunEngineTriggerTaskService->>QueueManager: validateQueueLimits(env)
RunEngineTriggerTaskService->>PayloadProcessor: process(request)
RunEngineTriggerTaskService->>QueueManager: resolveQueueProperties(request)
RunEngineTriggerTaskService->>RunChainStateManager: validateRunChain(request, context)
RunEngineTriggerTaskService->>TraceEventConcern: traceRun(request, callback)
TraceEventConcern->>RunNumberIncrementer: incrementRunNumber(request, callback)
RunNumberIncrementer->>RunEngine: trigger(request, queue info, runChainState)
RunEngine->>DB: Create TaskRun (w/ lockedQueueId, runChainState)
RunEngine-->>RunNumberIncrementer: run
RunNumberIncrementer-->>TraceEventConcern: run
TraceEventConcern-->>RunEngineTriggerTaskService: run
RunEngineTriggerTaskService-->>TriggerAPI: Return new run
end
Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 11
🔭 Outside diff range comments (2)
apps/webapp/app/v3/models/workerDeployment.server.ts (2)
83-90
:⚠️ Potential issue
$prisma
is ignored for the fallback query – breaks transactional safetyYou correctly derive
const $prisma = prismaClient ?? prisma;
but the second query (lines 127‑154) still calls the globalprisma.workerDeployment.findFirst
.
Inside a transaction this will silently jump out of the txn context, defeating the whole point of the newprismaClient
parameter and can even lead to inconsistent reads.- const latestDeployment = await prisma.workerDeployment.findFirst({ + const latestDeployment = await $prisma.workerDeployment.findFirst({Please sweep the rest of the file (and callers) for similar slips.
Also applies to: 127-154
197-215
: 🛠️ Refactor suggestionPossible type mismatch –
environment.projectId
may not exist
WorkerGroupService#getDefaultWorkerGroupForProject
expects{ projectId }
butAuthenticatedEnvironment
(see relevant snippets) only guarantees a nestedproject
object.
Unless a custom type augmentation addsprojectId
, this will be a TypeScript error oncestrict
mode is enabled and could be a runtimeundefined
at line 207.Guard or derive the id explicitly:
const projectId = environment.project?.id ?? environment.projectId; if (!projectId) { /* throw */ }
🧹 Nitpick comments (23)
apps/webapp/app/runEngine/concerns/runNumbers.server.ts (1)
4-14
: Expose the dependency onautoIncrementCounter
for testability
DefaultRunNumberIncrementer
directly imports the singletonautoIncrementCounter
, which makes unit‑testing harder (no easy way to stub). Passing the counter in through the constructor keeps the class pure and easier to mock.-export class DefaultRunNumberIncrementer implements RunNumberIncrementer { - async incrementRunNumber<T>( +export class DefaultRunNumberIncrementer implements RunNumberIncrementer { + constructor( + private readonly counter = autoIncrementCounter, // default to singleton + ) {} + + async incrementRunNumber<T>( request: TriggerTaskRequest, callback: (num: number) => Promise<T> ): Promise<T | undefined> { - return await autoIncrementCounter.incrementInTransaction( + return await this.counter.incrementInTransaction( `v3-run:${request.environment.id}:${request.taskId}`, callback ); } }apps/webapp/app/v3/services/triggerTask.server.ts (2)
105-107
: Environment flag parsed inline – extract a typed helper
env.RUN_ENGINE_RELEASE_CONCURRENCY_ENABLED === "1"
is repeated in several places across the codebase. Wrapping this in a helper (e.g.,isConcurrencyReleaseEnabled()
) prevents string‑comparison typos and centralises documentation of the flag.
3-7
: Import list becoming long – consider barrel files for concernsWith eight concerns injected, the import section is getting unwieldy. A simple
~/runEngine/concerns/index.ts
barrel export would improve readability:import { DefaultQueueManager, DefaultPayloadProcessor, ... } from "~/runEngine/concerns";references/hello-world/src/trigger/deadlocks.ts (1)
11-16
: Unused context parameter
{ ctx }
is destructured in both tasks but never used. Removing it avoids the unused‑var lint error and clarifies the example.-run: async (payload: any, { ctx }) => { +run: async (payload: any) => {packages/core/test/taskExecutor.test.ts (2)
10-11
: Unused imports can be safely removed
TaskRunExecutionResult
andTaskRunExecutionRetry
are imported but are not referenced anywhere in this test file. Removing them keeps the import list tidy and avoids misleading readers into thinking the types are used.-import { - TaskRunExecutionResult, - TaskRunExecutionRetry, -} from "../src/v3/index.js"; +// (No longer needed – remove)
1773-1813
: Flaky upper bound on 429‑retry delay
delay
is expected to be< 32 000 ms
(2 s of jitter) but the real implementation might legitimately apply more jitter or simply round up to the nearest second. This could cause sporadic CI failures.Consider asserting only the lower bound (must be greater than ≈ 29 s) or widening the upper bound:
- expect(delay).toBeLessThan(32000); // Account for max 2000ms jitter + expect(delay).toBeLessThan(40000); // Allow wider jitter toleranceor
expect(delay).toBeGreaterThan(29500);This keeps the test focused on “did we honour the rate‑limit reset?” rather than a tight jitter window.
apps/webapp/app/runEngine/concerns/payloads.server.ts (1)
52-61
: Broaden#createPayloadPacket
support beyond string & JSONBinary or stream payloads (e.g.
Uint8Array
,ReadableStream
) are silently treated as “no packet data”, meaning they will never be off‑loaded or uploaded. Consider:if (payload instanceof Uint8Array) { return { data: payload, dataType: payloadType }; } if (payload instanceof ReadableStream) { return { data: payload, dataType: payloadType }; }to make the processor future‑proof.
apps/webapp/app/runEngine/validators/triggerTaskValidator.ts (2)
24-36
: Tag validation ignores comma‑separated string formWhen
tags
is a single string (e.g. “foo,bar”) the max‑tags check is skipped.
Either document that only an array is supported or split the string:const tagList = Array.isArray(tags) ? tags : tags.split(","); if (tagList.length > MAX_TAGS_PER_RUN) { … }
87-99
: Parent‑run validation may double‑log on concurrent callsIf multiple children attempt to resume the same completed parent, each will log the “terminal state” message. Consider throttling or downgrading to
trace
to avoid noisy logs in large fan‑out scenarios.apps/webapp/app/runEngine/concerns/traceEvents.server.ts (1)
44-51
: Avoidany
‐casts when forwarding attributes to the underlying event builderThe helper passed to callers casts the attribute key to
any
before forwarding toevent.setAttribute
.
This defeats the compile‑time guarantees theSemanticInternalAttributes
enum is giving you and silently allows misspelled / unknown attributes to creep in.- setAttribute: (key, value) => event.setAttribute(key as any, value), + setAttribute: <K extends keyof typeof SemanticInternalAttributes>( + key: K, + value: string + ) => event.setAttribute( + SemanticInternalAttributes[key], + value + ),By typing the function generically you keep strict type‑safety and the IDE will auto‑complete valid keys.
apps/webapp/app/runEngine/types.ts (2)
10-11
: Remove unused import to keep the build clean
z
is imported but never referenced in this file.-import { z } from "zod";
Leaving dead imports can produce linter noise and slower type‑checking.
112-118
: MakevalidateTags
& friends consistentlyasync
All validation methods except
validateTags
,validateMaxAttempts
, andvalidateParentRun
are asynchronous.
For consistency and future‑proofing (e.g., remote feature flags), you may want to standardise onPromise<ValidationResult>
for the entire interface.No code diff provided to avoid breaking callers, but worth considering.
apps/webapp/app/v3/models/workerDeployment.server.ts (1)
162-185
: Consider allowing dependency‑injection here too
getCurrentWorkerDeploymentEngineVersion
still hard‑codes the globalprisma
instance.
If callers start executing it inside a transaction we’ll hit the same consistency issue as above. Exporting an overload or optionalprismaClient
parameter keeps the API symmetrical.apps/webapp/app/runEngine/concerns/queues.server.ts (3)
37-45
: Inefficient lookup pattern for locked queues
taskQueue.findFirst
is executed with no projection (select
) which pulls every column even though you only needid
andname
. Reducing the payload cuts memory and lowers wire latency.- const specifiedQueue = await this.prisma.taskQueue.findFirst({ - where: { ... } - }); + const specifiedQueue = await this.prisma.taskQueue.findFirst({ + where: { ... }, + select: { id: true, name: true }, + });
75-92
: Duplicate sanitisation & missed empty‑string edge case
queueName
is sanitised twice (sanitizeQueueName(queueName)
intosanitizedQueueName
and potentially once more in the fallback).
Also, ifsanitizeQueueName
returns""
for the fallback string, we end up with an empty queue name again.A simpler, single‑pass approach:
- const sanitized = sanitizeQueueName(queueName || `task/${request.taskId}`); - if (!sanitized) { - throw new EngineServiceValidationError("Queue name cannot be empty"); - } - queueName = sanitized; + queueName = sanitizeQueueName(queueName || `task/${request.taskId}`); + if (!queueName) { + throw new EngineServiceValidationError("Queue name cannot be empty after sanitisation"); + }
201-215
: InstantiateWorkerGroupService
on each call – cache or inject
getMasterQueue
creates a newWorkerGroupService
every invocation. In hot paths this may be hundreds of objects per second. Either:
- Accept a pre‑built service via the constructor, or
- Promote a singleton via
singleton()
helper like elsewhere in the codebase.apps/webapp/test/engine/triggerTask.test.ts (4)
548-556
: Strayconsole.log
pollutes test output
console.log(result);
will spam CI logs and can hide useful Vitest output. Remove or gate behind a DEBUG flag.
632-659
: Engine configuration duplicated 4× – extract a helperThe boilerplate
new RunEngine({ prisma, worker: { …
block is repeated in every test suite.
A factory likecreateTestEngine(redisOptions, prisma)
would reduce >150 LOC and guarantee consistent config across tests.Also applies to: 832-860, 963-990, 1092-1120
740-776
: Potential brittle assertions onrun.runChainState
The deep‑equality checks hard‑code queue ordering (parent queue first, child second). If implementation switches to
Set
/Map
order or sorts differently these tests will fail although behaviour is identical.Consider validating with
expect.arrayContaining([...])
or custom matcher that ignores ordering and asserts presence & counts instead.
430-447
: Error message assertion tightly couples to wordingThe rejection expectation matches the exact string emitted by
EngineServiceValidationError
. Minor copy edits will break the test suite.Prefer a regex or at least
toContain("Specified queue")
.apps/webapp/app/runEngine/concerns/runChainStates.server.ts (3)
165-172
: PreferfindUnique
tofindFirst
when querying by primary keyAll three queries use an
id
equality filter, which is Prisma’s primary key.
findUnique
is semantically clearer, marginally faster (skipsLIMIT 1
scan logic), and prevents accidental “first match” behaviour if future engineers extend thewhere
clause.-const queue = await this.prisma.taskQueue.findFirst({ +const queue = await this.prisma.taskQueue.findUnique({ where: { id: lockedQueueId }, select: { concurrencyLimit: true }, });Apply the same change to the queries at lines 200‑208 and 237‑241.
Also applies to: 200-208, 237-241
25-28
: Return-by-reference risks accidental mutation of the parent chain state
parentRunChainState
(and{}
in the earlier branch) is returned directly.
Because the object is later stored on the child run, any in‑process mutation by callers would also mutate the parent’s cached state, producing subtle race conditions.A cheap defensive copy avoids the issue:
- if (!parentRun) { - return {}; - } + if (!parentRun) { + return {}; + } ... - return parentRunChainState; + return { ...parentRunChainState };(Use a deep clone if nested objects may subsequently be mutated.)
Also applies to: 38-48
140-142
: Verbose debug log can bloat application logs
runChainState
may become large as chains grow. Consider either:
- Logging only a serialised summary (e.g.
JSON.stringify(runChainState).slice(0, 1_000)
), or- Masking fields you don’t need at debug time.
This keeps CloudWatch/Datadog bills and log‑processing latency under control.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (37)
.vscode/launch.json
(1 hunks)apps/webapp/app/components/runs/v3/SpanEvents.tsx
(1 hunks)apps/webapp/app/routes/api.v1.tasks.$taskId.trigger.ts
(2 hunks)apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.$projectParam.env.$envParam.runs.$runParam.spans.$spanParam/route.tsx
(1 hunks)apps/webapp/app/runEngine/concerns/errors.ts
(1 hunks)apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
(1 hunks)apps/webapp/app/runEngine/concerns/payloads.server.ts
(1 hunks)apps/webapp/app/runEngine/concerns/queues.server.ts
(1 hunks)apps/webapp/app/runEngine/concerns/runChainStates.server.ts
(1 hunks)apps/webapp/app/runEngine/concerns/runNumbers.server.ts
(1 hunks)apps/webapp/app/runEngine/concerns/traceEvents.server.ts
(1 hunks)apps/webapp/app/runEngine/services/batchTrigger.server.ts
(1 hunks)apps/webapp/app/runEngine/services/triggerTask.server.ts
(3 hunks)apps/webapp/app/runEngine/types.ts
(1 hunks)apps/webapp/app/runEngine/validators/triggerTaskValidator.ts
(1 hunks)apps/webapp/app/v3/models/workerDeployment.server.ts
(5 hunks)apps/webapp/app/v3/services/triggerTask.server.ts
(2 hunks)apps/webapp/test/engine/triggerTask.test.ts
(1 hunks)internal-packages/database/prisma/migrations/20250422152423_add_run_chain_state_to_task_run/migration.sql
(1 hunks)internal-packages/database/prisma/schema.prisma
(1 hunks)internal-packages/run-engine/package.json
(1 hunks)internal-packages/run-engine/src/engine/db/worker.ts
(1 hunks)internal-packages/run-engine/src/engine/errors.ts
(1 hunks)internal-packages/run-engine/src/engine/index.ts
(4 hunks)internal-packages/run-engine/src/engine/tests/setup.ts
(4 hunks)internal-packages/run-engine/src/engine/types.ts
(3 hunks)internal-packages/run-engine/src/index.ts
(1 hunks)internal-packages/testcontainers/package.json
(1 hunks)internal-packages/testcontainers/src/index.ts
(1 hunks)packages/core/src/v3/apiClient/errors.ts
(1 hunks)packages/core/src/v3/errors.ts
(1 hunks)packages/core/src/v3/links.ts
(1 hunks)packages/core/src/v3/schemas/schemas.ts
(1 hunks)packages/core/src/v3/workers/taskExecutor.ts
(2 hunks)packages/core/test/taskExecutor.test.ts
(2 hunks)references/hello-world/src/trigger/deadlocks.ts
(1 hunks)references/hello-world/src/trigger/queues.ts
(2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (9)
internal-packages/run-engine/src/engine/types.ts (1)
packages/core/src/v3/schemas/schemas.ts (2)
RunChainState
(301-308)RunChainState
(310-310)
packages/core/src/v3/errors.ts (1)
packages/core/src/v3/links.ts (1)
links
(1-28)
apps/webapp/app/runEngine/validators/triggerTaskValidator.ts (5)
apps/webapp/app/runEngine/types.ts (6)
TriggerTaskValidator
(112-117)TagValidationParams
(84-86)ValidationResult
(103-110)EntitlementValidationParams
(88-90)MaxAttemptsValidationParams
(92-95)ParentRunValidationParams
(97-101)apps/webapp/app/models/taskRunTag.server.ts (1)
MAX_TAGS_PER_RUN
(6-6)apps/webapp/app/runEngine/concerns/errors.ts (1)
EngineServiceValidationError
(1-6)apps/webapp/app/services/platform.v3.server.ts (1)
getEntitlement
(328-346)apps/webapp/app/v3/services/triggerTask.server.ts (2)
OutOfEntitlementError
(34-38)MAX_ATTEMPTS
(45-45)
apps/webapp/app/runEngine/concerns/payloads.server.ts (3)
apps/webapp/app/runEngine/types.ts (2)
PayloadProcessor
(73-75)TriggerTaskRequest
(28-34)apps/webapp/app/v3/r2.server.ts (1)
uploadPacketToObjectStore
(22-62)apps/webapp/app/runEngine/concerns/errors.ts (1)
EngineServiceValidationError
(1-6)
packages/core/test/taskExecutor.test.ts (2)
packages/core/src/v3/lifecycle-hooks-api.ts (1)
lifecycleHooks
(5-5)packages/core/src/v3/workers/taskExecutor.ts (1)
result
(1139-1186)
apps/webapp/app/v3/models/workerDeployment.server.ts (3)
packages/core/src/v3/isomorphic/consts.ts (2)
CURRENT_DEPLOYMENT_LABEL
(1-1)CURRENT_UNMANAGED_DEPLOYMENT_LABEL
(2-2)internal-packages/run-engine/src/engine/tests/setup.ts (1)
AuthenticatedEnvironment
(16-18)apps/webapp/app/services/apiAuth.server.ts (1)
AuthenticatedEnvironment
(30-33)
apps/webapp/app/runEngine/concerns/queues.server.ts (8)
apps/webapp/app/runEngine/types.ts (5)
QueueManager
(63-71)TriggerTaskRequest
(28-34)LockedBackgroundWorker
(57-60)QueueProperties
(52-55)QueueValidationResult
(42-50)apps/webapp/app/v3/runEngine.server.ts (1)
engine
(9-9)apps/webapp/app/runEngine/concerns/errors.ts (1)
EngineServiceValidationError
(1-6)apps/webapp/app/v3/services/triggerTaskV1.server.ts (1)
taskId
(671-718)apps/webapp/app/v3/models/workerDeployment.server.ts (1)
findCurrentWorkerFromEnvironment
(197-223)internal-packages/run-engine/src/engine/tests/setup.ts (1)
AuthenticatedEnvironment
(16-18)apps/webapp/app/v3/queueSizeLimits.server.ts (1)
guardQueueSizeLimitsForEnv
(11-34)apps/webapp/app/v3/services/worker/workerGroupService.server.ts (1)
WorkerGroupService
(7-269)
apps/webapp/app/runEngine/concerns/runNumbers.server.ts (2)
apps/webapp/app/runEngine/types.ts (2)
RunNumberIncrementer
(77-82)TriggerTaskRequest
(28-34)apps/webapp/app/services/autoIncrementCounter.server.ts (1)
autoIncrementCounter
(63-63)
apps/webapp/app/runEngine/concerns/traceEvents.server.ts (3)
apps/webapp/app/runEngine/types.ts (3)
TraceEventConcern
(131-146)TriggerTaskRequest
(28-34)TracedEventSpan
(119-129)apps/webapp/app/v3/eventRepository.server.ts (2)
EventRepository
(189-1262)eventRepository
(1264-1264)packages/core/src/v3/isomorphic/friendlyId.ts (1)
BatchId
(96-96)
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
- GitHub Check: typecheck / typecheck
- GitHub Check: units / 🧪 Unit Tests
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (43)
internal-packages/testcontainers/package.json (1)
23-23
: Trivial change, no functional impact.The only change here is the removal of the trailing newline at the end of the file, which does not affect the package configuration or behavior.
packages/core/src/v3/links.ts (1)
21-21
: Documentation link added for deadlock detection feature.This addition provides a helpful reference link for the new deadlock detection feature mentioned in the PR title. It's properly placed in the concurrency section of documentation links.
internal-packages/run-engine/src/engine/errors.ts (1)
83-88
: New error class for one-time use token validation.This error class follows the same pattern as existing error classes and provides a more semantic error type for token validation failures, improving error handling compared to catching generic Prisma constraint errors.
apps/webapp/app/runEngine/concerns/errors.ts (1)
1-6
: Well-structured validation error class for run engine services.This new error class is properly implemented with support for HTTP status codes, which aligns with its usage in API routes. The class follows TypeScript best practices and will help standardize validation error handling across the run engine.
internal-packages/testcontainers/src/index.ts (1)
64-72
: Improved Prisma initialization with better logging.Extracting the database URL to a local variable improves code readability, and adding a console log helps with debugging test database connections. These are good improvements for test infrastructure.
internal-packages/database/prisma/schema.prisma (1)
1856-1858
: Addition ofrunChainState
field for deadlock detection.This new optional
runChainState
JSON field in theTaskRun
model aligns perfectly with the concurrency control and deadlock detection features introduced in the run engine. Using a nullableJson?
preserves backward compatibility with existing runs.internal-packages/database/prisma/migrations/20250422152423_add_run_chain_state_to_task_run/migration.sql (1)
2-2
: AddrunChainState
JSONB column to TaskRun table.The migration correctly introduces the optional
runChainState
column. Ensure your deployment pipeline applies this migration before rolling out the updated application to avoid schema mismatches.apps/webapp/app/runEngine/services/batchTrigger.server.ts (1)
21-21
: Reposition import ofstartActiveSpan
.The
startActiveSpan
import has been moved to group it with other tracing-related imports, improving consistency with tracing concerns in this service..vscode/launch.json (1)
28-29
: Update debug configuration fortriggerTask.test.ts
.The launch target and command have been updated to point to the new
triggerTask.test.ts
suite, ensuring the debugger runs the correct test file for the trigger task service.internal-packages/run-engine/src/index.ts (1)
2-2
: Export newRunOneTimeUseTokenError
.Exporting
RunOneTimeUseTokenError
alongsideRunDuplicateIdempotencyKeyError
provides upstream services with more precise error handling for one-time token scenarios.apps/webapp/app/components/runs/v3/SpanEvents.tsx (1)
86-92
: Improved error message formatting with preserved whitespaceThe error message is now wrapped in a
<pre>
element with appropriate styling, which preserves whitespace and improves readability with consistent font styling and color.internal-packages/run-engine/src/engine/db/worker.ts (1)
164-166
: Enhanced queue selection logic for locked queuesThe queue selection now prioritizes finding queues by
lockedQueueId
if present, falling back to the name-based lookup. This change properly supports the new locked queue feature for deadlock detection and concurrency management.packages/core/src/v3/schemas/schemas.ts (1)
301-310
: Well-structured schema for run chain concurrency stateThe new
RunChainState
schema provides a clear structure for tracking concurrency state in run chains, including which queues are holding concurrent executions and environment-level concurrency. The optional fields allow for flexibility while maintaining structure.apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.$projectParam.env.$envParam.runs.$runParam.spans.$spanParam/route.tsx (1)
811-817
: Consistent error message formatting improvementsSimilar to the change in
SpanEvents.tsx
, error messages are now wrapped in a<pre>
element with styling that preserves whitespace and provides consistent formatting. This ensures better readability of error details, especially for structured errors related to deadlocks and concurrency issues.internal-packages/run-engine/src/engine/tests/setup.ts (4)
14-14
: Good change to use type-only importChanging to a type-only import is appropriate since
RunEngine
is only used as a type in function parameters, which helps with tree-shaking and prevents potential circular dependencies.
33-44
: Good addition of worker group setupAdding the worker instance group creation is essential for properly testing the run chain state management and queue locking features. This provides the necessary infrastructure for the tests to validate the deadlock detection capabilities.
53-53
: Correctly associates project with worker groupSetting the project's
defaultWorkerGroupId
to the created worker group's ID establishes the necessary relationship between projects and worker groups, which is required for the run engine's queue management features.
173-177
: Good explicit task-queue relationshipsExplicitly connecting tasks to their queues by ID during both create and update operations ensures proper relational integrity in the test database schema. This is important for testing queue locking and concurrency management features.
Also applies to: 189-193
apps/webapp/app/routes/api.v1.tasks.$taskId.trigger.ts (2)
10-10
: Good import of new error typeAdding the import for
EngineServiceValidationError
from the run engine concerns aligns with the modular refactoring approach of the PR.
121-122
: Enhanced error handling for validation errorsThe addition of error handling for
EngineServiceValidationError
properly surfaces validation errors from the run engine with appropriate status codes. This is important for the deadlock detection feature to provide meaningful error responses to API clients.packages/core/src/v3/errors.ts (1)
693-706
: Excellent deadlock error enhancementThis addition provides user-friendly error handling for concurrency deadlocks by:
- Detecting errors with messages that start with "Deadlock detected:"
- Transforming them into a more readable "Concurrency Deadlock Error"
- Including a link to relevant documentation
This improves developer experience by making deadlock errors more understandable and providing guidance on how to resolve them.
internal-packages/run-engine/src/engine/types.ts (3)
4-10
: Good import organizationThe expanded import to include
RunChainState
is organized in a clean, alphabetical manner. This properly imports the type needed for deadlock detection functionality.
93-93
: Good addition of lockedQueueIdAdding the optional
lockedQueueId
property toTriggerParams
enables queue locking functionality, which is an important aspect of the deadlock detection feature. This property allows tasks to be directed to specific queues, supporting more predictable concurrency control.
121-121
: Good addition of runChainStateThe optional
runChainState
property inTriggerParams
is essential for tracking concurrency state across queues and environments. This enables the run engine to detect and prevent deadlocks by maintaining awareness of resource allocation throughout the run chain.apps/webapp/app/runEngine/concerns/runNumbers.server.ts (1)
9-12
: Key prefix hard‑coded to"v3"
– verify it matches the new engine versionThe increment key is hard‑coded as
v3-run:${request.environment.id}:${request.taskId}
even though this PR introduces v4‑specific functionality (deadlock detection, new run‑chain state column, etc.).
If the run‑number counter is supposed to be version‑scoped, using the wrong prefix can mix counters across versions. Please double‑check the expected namespace and either:- `v3-run:${request.environment.id}:${request.taskId}`, + `v4-run:${request.environment.id}:${request.taskId}`,or extract the prefix to a shared constant so it can’t drift.
internal-packages/run-engine/package.json (1)
15-19
: Confirm the build emits the newtests
files intodist/
The new sub‑path export maps runtime and type files to:
./dist/src/engine/tests/setup.{js,d.ts}
Please ensure
tsconfig.build.json
includessrc/engine/tests/**/*
in itsinclude
patterns; otherwise these files won’t be present afterpnpm run build
, causing consumers to fail at runtime.references/hello-world/src/trigger/deadlocks.ts (1)
21-25
: Demonstration task creates an intentional deadlock – add safeguard for productionThis sample mutually triggers
deadlockTester
anddeadlockNestedTask
on a single‑slot queue withreleaseConcurrencyOnWaitpoint: false
, guaranteeing a deadlock. That’s great for testing, but if someone accidentally deploys it in a non‑test project it will consume a worker indefinitely.Consider adding a prominent comment or a runtime guard (e.g., throw if
process.env.NODE_ENV === "production"
) to avoid surprises.packages/core/src/v3/workers/taskExecutor.ts (2)
1024-1035
: Improved error handling for non-retryable client errorsThe addition of this early exit check for client errors (HTTP 400-499, excluding 408 and 429) enhances the retry logic by preventing unnecessary retry attempts for errors that are unlikely to be resolved by retrying.
This change aligns well with the rest of the error handling strategy and will help reduce resource waste on unproductive retries.
1054-1057
: Refined rate limit retry delay logicThe updated logic only sets a custom delay when the rate limit error includes a valid reset time. This is more robust than the previous implementation, as it falls back to the default delay calculation when the rate limit response doesn't include the expected reset time header.
internal-packages/run-engine/src/engine/index.ts (4)
339-340
: Added support for locked queue IDsThis new parameter supports the deadlock detection feature by allowing tasks to be locked to a specific queue ID.
364-365
: Added support for run chain stateThis new parameter enables the engine to track the state of run chains, which is essential for deadlock detection functionality.
420-421
: Persisting lockedQueueId in task run dataThe lockedQueueId is now properly persisted in the database when creating a task run, allowing for consistent queue locking behavior.
450-451
: Persisting runChainState in task run dataThe runChainState is now stored with the task run, enabling deadlock detection and state management across the run chain.
references/hello-world/src/trigger/queues.ts (3)
239-247
: New task for testing locked queue behaviorThe
lockedQueueTask
demonstrates how to handle non-existent queue scenarios, which is useful for testing error handling in the queue locking system. However, this implementation intentionally triggers errors by using a non-existent queue name.Ensure this is intended behavior for testing or demonstration purposes, as it will generate runtime errors when executed.
249-254
: Added simple child task for queue testingThis straightforward implementation provides a target for the locked queue tests.
256-264
: New task for testing non-existent task identifiersSimilar to
lockedQueueTask
, this task intentionally triggers errors by trying to use a non-existent task identifier. This is useful for testing error handling, but also note that it will generate runtime errors.Verify this is intended behavior for testing the system's response to invalid task identifiers. Consider adding a comment explaining this is for testing error handling.
packages/core/src/v3/apiClient/errors.ts (2)
31-39
: Enhanced error message extractionAdded explicit handling for when the error is a string, improving the error message construction logic. This change ensures proper extraction of error messages regardless of their format.
41-55
: Improved error message fallback logicThe refactored code provides a clearer priority order for constructing error messages, using a more readable if-then-else structure instead of nested ternary expressions. This makes the code more maintainable and the error message construction more predictable.
apps/webapp/app/runEngine/validators/triggerTaskValidator.ts (1)
59-71
: Clarify whetherattempt
is 0‑ or 1‑based
attempt > MAX_ATTEMPTS
will allow exactlyMAX_ATTEMPTS
attempts.
If the caller is 0‑based this off‑by‑one blocks one extra try.
Recommend renaming toattemptNumber
and documenting expected base, or changing the comparison to>=
.- if (attempt > MAX_ATTEMPTS) { + if (attempt >= MAX_ATTEMPTS) {apps/webapp/app/runEngine/concerns/traceEvents.server.ts (1)
96-110
:recordEvent
failure isn’t isolated – add error handlingIf
eventRepository.recordEvent
fails (network/db hiccup) the whole trace callback will reject, bubbling the error to the caller and turning a cache‑hit into a 500. Consider isolating the side‑effect:- await this.eventRepository.recordEvent( + try { + await this.eventRepository.recordEvent( `There's an existing run for idempotencyKey: ${idempotencyKey}`, /* … */ ); + } catch (err) { + // best‑effort – don’t block the cached run path + logger.error("Failed to record idempotent‑run note", err); + }Please double‑check with ops whether losing this auxiliary event is acceptable or whether the error should be re‑thrown.
apps/webapp/app/runEngine/services/triggerTask.server.ts (1)
240-329
: Side‑effects before the payload packet is materialised can skew spans
event.setAttribute("queueName", …)
andincrementRunNumber
occur before the (async)payloadProcessor.process()
completes.
If that promise hangs or fails, the span already advertises queue info although no run exists.Consider moving the side‑effects after
payloadProcessor.process
resolves to ensure attributes belong to a valid run.- event.setAttribute("queueName", queueName); - span.setAttribute("queueName", queueName); - … - const payloadPacket = await this.payloadProcessor.process(triggerRequest); + const payloadPacket = await this.payloadProcessor.process(triggerRequest); + event.setAttribute("queueName", queueName); + span.setAttribute("queueName", queueName);apps/webapp/app/runEngine/concerns/queues.server.ts (1)
122-174
: Un‑sanitised external input whenqueue?.name
is providedWhen the caller supplies
options.queue.name
,getQueueName
returns the value verbatim (line 126) skipping sanitisation, but laterresolveQueueProperties
relies ongetQueueName
for non‑locked paths and only sanitises in the outer method. That’s okay.
However, ifgetQueueName
is used directly elsewhere we may push unsafe names into Redis. Consider always sanitising at the boundary (insidegetQueueName
).apps/webapp/app/runEngine/concerns/runChainStates.server.ts (1)
80-81
: Confirm there are no unwanted shared‑reference side effects after mutatingparentQueueState.holding
parentQueueState
is mutated in‑place:parentQueueState.holding += 1;If the object originates from
runChainState.concurrency?.queues
, the mutation also updates the already‑stored state before you filter it out and re‑insert it. That’s currently benign, but if another part of the function throws between the mutation and the re‑assignment the parent state will be left in a partially modified state.Please double‑check that:
- No code between the mutation and the re‑assignment can throw, and
parentRunChainState
is never reused after this method returns.If either condition might be violated, clone
parentQueueState
before mutating.Also applies to: 108-109
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
apps/webapp/app/runEngine/services/triggerTask.server.ts (2)
162-168
:⚠️ Potential issueCached idempotent runs are returned without a trace span
The early‑return branch still shortcuts the method before
traceEventConcern.traceIdempotentRun()
is invoked, so “cached” executions never appear in the traces/telemetry dashboards.The same issue was raised previously and remains unresolved.
- if (idempotencyKeyConcernResult.isCached) { - return idempotencyKeyConcernResult; - } + if (idempotencyKeyConcernResult.isCached) { + return await this.traceEventConcern.traceIdempotentRun( + triggerRequest, + { + existingRun: idempotencyKeyConcernResult.run, + idempotencyKey: options?.idempotencyKey ?? "", + incomplete: false, + isError: Boolean(idempotencyKeyConcernResult.error), + }, + async () => idempotencyKeyConcernResult + ); + }
331-334
:⚠️ Potential issueUnbounded recursion on duplicate idempotency keys
RunDuplicateIdempotencyKeyError
retries viathis.call()
with an incrementedattempt
but with no hard ceiling.
If the race condition never resolves the stack will overflow and/or the function will loop indefinitely.Previous feedback flagged this – please add a bounded retry or back‑off loop.
const MAX_DUPLICATE_RETRIES = 3; if (error instanceof RunDuplicateIdempotencyKeyError && attempt < MAX_DUPLICATE_RETRIES) { return await this.call({ taskId, environment, body, options, attempt: attempt + 1 }); }apps/webapp/test/engine/triggerTask.test.ts (1)
108-200
: 🛠️ Refactor suggestionEnsure
engine.quit()
always runs to avoid hanging test runners
engine.quit()
is only called on the happy path. If an assertion fails the Redis & worker pools stay alive and Vitest hangs.
Please wrap eachcontainerTest
body intry / finally
(or use anafterEach
hook) to guarantee clean‑up, as suggested in the earlier review.containerTest("should trigger a task with minimal options", async ({ prisma, redisOptions }) => { - const engine = new RunEngine({...}); - ... - expect(queueLength).toBe(1); - await engine.quit(); + const engine = new RunEngine({...}); + try { + ... + expect(queueLength).toBe(1); + } finally { + await engine.quit().catch(() => {}); + } });Also applies to: 202-314, 318-478, 482-638, 641-839, 842-968, 971-1099, 1101-1254
🧹 Nitpick comments (4)
apps/webapp/app/runEngine/concerns/traceEvents.server.ts (1)
49-50
: Avoid theas any
cast when forwarding attributesBoth callback wrappers rely on
event.setAttribute(key as any, value)
which circumvents type‑safety and risks silently accepting unsupported attribute types.
Expose a strongly‑typed helper onTracedEventSpan
(e.g.setInternalAttribute<T extends keyof InternalEventAttributes>(key: T, value: InternalEventAttributes[T])
) or overloadsetAttribute
directly so that callers get compile‑time feedback.This minor tweak tightens contracts without affecting runtime behaviour.
Also applies to: 116-117
apps/webapp/app/runEngine/concerns/runChainStates.server.ts (3)
146-149
: Good fix for unlimited concurrency check, but consider adding a comment.You've correctly fixed the issue mentioned in the previous review by checking
request.environment.maximumConcurrencyLimit > 0
. This prevents false deadlock detection when the environment has unlimited concurrency.if ( request.environment.maximumConcurrencyLimit > 0 && + // Only check for deadlocks if there's a limit (0 means unlimited) environmentConcurrency > request.environment.maximumConcurrencyLimit ) {
Consider adding a comment explaining that 0 means unlimited concurrency for better code readability and maintainability.
246-248
: Consider using a more specific error type.Instead of using a generic Error, consider using the
EngineServiceValidationError
class that's already imported and used elsewhere in this file. This would make error handling more consistent.- throw new Error("Deadlock detection failed, parent queue not found"); + throw new EngineServiceValidationError("Deadlock detection failed, parent queue not found");
61-68
: Consider consistent error handling for parent runs without locked queues.Currently, the code logs an error and returns an empty state when the parent run has no locked queue. For consistency with other error handling in this class, consider throwing an
EngineServiceValidationError
instead.if (!parentLockedQueueId) { logger.error("Parent run has no locked queue, cannot determine run chain state", { runId: parentRun.id, runState: parentRun.runChainState, }); - return {}; + throw new EngineServiceValidationError("Parent run has no locked queue, cannot determine run chain state"); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (37)
.vscode/launch.json
(1 hunks)apps/webapp/app/components/runs/v3/SpanEvents.tsx
(1 hunks)apps/webapp/app/routes/api.v1.tasks.$taskId.trigger.ts
(2 hunks)apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.$projectParam.env.$envParam.runs.$runParam.spans.$spanParam/route.tsx
(1 hunks)apps/webapp/app/runEngine/concerns/errors.ts
(1 hunks)apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
(1 hunks)apps/webapp/app/runEngine/concerns/payloads.server.ts
(1 hunks)apps/webapp/app/runEngine/concerns/queues.server.ts
(1 hunks)apps/webapp/app/runEngine/concerns/runChainStates.server.ts
(1 hunks)apps/webapp/app/runEngine/concerns/runNumbers.server.ts
(1 hunks)apps/webapp/app/runEngine/concerns/traceEvents.server.ts
(1 hunks)apps/webapp/app/runEngine/services/batchTrigger.server.ts
(1 hunks)apps/webapp/app/runEngine/services/triggerTask.server.ts
(3 hunks)apps/webapp/app/runEngine/types.ts
(1 hunks)apps/webapp/app/runEngine/validators/triggerTaskValidator.ts
(1 hunks)apps/webapp/app/v3/models/workerDeployment.server.ts
(5 hunks)apps/webapp/app/v3/services/triggerTask.server.ts
(2 hunks)apps/webapp/test/engine/triggerTask.test.ts
(1 hunks)internal-packages/database/prisma/migrations/20250422152423_add_run_chain_state_to_task_run/migration.sql
(1 hunks)internal-packages/database/prisma/schema.prisma
(1 hunks)internal-packages/run-engine/package.json
(1 hunks)internal-packages/run-engine/src/engine/db/worker.ts
(1 hunks)internal-packages/run-engine/src/engine/errors.ts
(1 hunks)internal-packages/run-engine/src/engine/index.ts
(4 hunks)internal-packages/run-engine/src/engine/tests/setup.ts
(4 hunks)internal-packages/run-engine/src/engine/types.ts
(3 hunks)internal-packages/run-engine/src/index.ts
(1 hunks)internal-packages/testcontainers/package.json
(1 hunks)internal-packages/testcontainers/src/index.ts
(1 hunks)packages/core/src/v3/apiClient/errors.ts
(1 hunks)packages/core/src/v3/errors.ts
(1 hunks)packages/core/src/v3/links.ts
(1 hunks)packages/core/src/v3/schemas/schemas.ts
(1 hunks)packages/core/src/v3/workers/taskExecutor.ts
(2 hunks)packages/core/test/taskExecutor.test.ts
(2 hunks)references/hello-world/src/trigger/deadlocks.ts
(1 hunks)references/hello-world/src/trigger/queues.ts
(2 hunks)
✅ Files skipped from review due to trivial changes (4)
- packages/core/src/v3/links.ts
- apps/webapp/app/runEngine/services/batchTrigger.server.ts
- internal-packages/run-engine/src/engine/types.ts
- internal-packages/run-engine/src/engine/index.ts
🚧 Files skipped from review as they are similar to previous changes (29)
- apps/webapp/app/runEngine/concerns/errors.ts
- internal-packages/testcontainers/package.json
- internal-packages/database/prisma/migrations/20250422152423_add_run_chain_state_to_task_run/migration.sql
- .vscode/launch.json
- internal-packages/testcontainers/src/index.ts
- apps/webapp/app/routes/api.v1.tasks.$taskId.trigger.ts
- apps/webapp/app/components/runs/v3/SpanEvents.tsx
- internal-packages/run-engine/src/engine/errors.ts
- references/hello-world/src/trigger/deadlocks.ts
- internal-packages/run-engine/src/index.ts
- packages/core/src/v3/errors.ts
- internal-packages/run-engine/package.json
- apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.$projectParam.env.$envParam.runs.$runParam.spans.$spanParam/route.tsx
- packages/core/src/v3/schemas/schemas.ts
- packages/core/src/v3/workers/taskExecutor.ts
- references/hello-world/src/trigger/queues.ts
- apps/webapp/app/v3/services/triggerTask.server.ts
- apps/webapp/app/v3/models/workerDeployment.server.ts
- packages/core/src/v3/apiClient/errors.ts
- internal-packages/run-engine/src/engine/db/worker.ts
- internal-packages/database/prisma/schema.prisma
- apps/webapp/app/runEngine/concerns/payloads.server.ts
- packages/core/test/taskExecutor.test.ts
- internal-packages/run-engine/src/engine/tests/setup.ts
- apps/webapp/app/runEngine/validators/triggerTaskValidator.ts
- apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
- apps/webapp/app/runEngine/concerns/runNumbers.server.ts
- apps/webapp/app/runEngine/types.ts
- apps/webapp/app/runEngine/concerns/queues.server.ts
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: units / 🧪 Unit Tests
- GitHub Check: typecheck / typecheck
🔇 Additional comments (3)
apps/webapp/app/runEngine/concerns/runChainStates.server.ts (3)
183-187
: Well-implemented queue concurrency limit check.The queue concurrency check properly handles the case where a concurrency limit of 0 means unlimited concurrency. This is consistent with the environment concurrency check and follows best practices.
257-275
: Excellent error message for deadlock situations.The deadlock error message is comprehensive and educational. It clearly explains:
- What a deadlock is
- The current situation that caused it
- Common scenarios that lead to deadlocks
- Multiple approaches to resolve the issue
This is an excellent example of a user-friendly error message that helps developers understand and solve the problem.
1-276
: Overall well-designed deadlock detection implementation.The implementation of the deadlock detection mechanism is thorough and addresses the core issue described in the PR objectives. Key strengths include:
- Clear separation of concerns with well-named methods
- Proper handling of both environment and queue concurrency limits
- Consideration of concurrency release settings
- Comprehensive error messaging
The code successfully propagates the run chain state through the task hierarchy and accounts for concurrency held in queues along the direct ancestor chain, which was the main goal of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
apps/webapp/app/runEngine/services/triggerTask.server.ts (2)
162-168
:isCached
branch still skips tracing – duplicate of earlier feedbackThe early-return path bypasses
traceEventConcern.traceIdempotentRun()
, so cached executions remain invisible to observability dashboards, exactly as pointed out in the previous review.
Please wrap the result in the tracing helper.
331-334
: Unbounded recursion when the duplicate-idempotency race persistsThe retry still recurses indefinitely if the key collision never resolves, risking stack overflow / lambda timeout. A bounded loop or
MAX_DUPLICATE_RETRIES
guard proposed earlier is still required.apps/webapp/app/runEngine/concerns/runChainStates.server.ts (1)
146-152
: Fixed the unlimited environment concurrency check.You've correctly addressed the previous issue with environment concurrency limits by checking if
maximumConcurrencyLimit > 0
before comparing withenvironmentConcurrency
. This properly handles the case where 0 means unlimited concurrency.
🧹 Nitpick comments (5)
apps/webapp/test/engine/triggerTask.test.ts (1)
559-561
: Remove stray console.log from test output
console.log(result);
pollutes test logs and slows CI. Drop or convert to alogger.debug
behind a verbose flag.- console.log(result);
apps/webapp/app/runEngine/concerns/queues.server.ts (3)
155-170
: Use structured logger instead ofconsole.log
for consistency
console.log
is used twice while the rest of the codebase relies onlogger
.
This bypasses log formatting & log-level control.- console.log("Failed to get queue name: No task found", { + logger.debug("Failed to get queue name: No task found", { taskId, environmentId: environment.id, });Apply the same change in the following
console.log
block.
176-194
:lengthOfEnvQueue
can be expensive – consider caching or rate-limiting
validateQueueLimits()
callsengine.lengthOfEnvQueue()
on every trigger, which may perform a full scan on Redis for large installations.
If queue size rarely changes, cache the value for a short TTL or batch calls per request cycle to reduce load.
126-129
: Sanitize user-supplied queue names before early returnWhen a caller specifies
options.queue.name
, the raw value is returned immediately.
Although laterresolveQueueProperties()
re-sanitises, direct callers ofgetQueueName()
(other concerns/tests) might skip that layer.- if (queue?.name) { - return queue.name; + if (queue?.name) { + return sanitizeQueueName(queue.name); }apps/webapp/app/runEngine/concerns/runChainStates.server.ts (1)
231-255
: Potential unhandled error in parent queue lookup.If the parent queue isn't found, you throw a generic error. Consider enhancing this with more context to aid debugging.
- throw new Error("Deadlock detection failed, parent queue not found"); + throw new Error(`Deadlock detection failed, parent queue not found: ${parentLockedQueueId}`);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (37)
.vscode/launch.json
(1 hunks)apps/webapp/app/components/runs/v3/SpanEvents.tsx
(1 hunks)apps/webapp/app/routes/api.v1.tasks.$taskId.trigger.ts
(2 hunks)apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.$projectParam.env.$envParam.runs.$runParam.spans.$spanParam/route.tsx
(1 hunks)apps/webapp/app/runEngine/concerns/errors.ts
(1 hunks)apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
(1 hunks)apps/webapp/app/runEngine/concerns/payloads.server.ts
(1 hunks)apps/webapp/app/runEngine/concerns/queues.server.ts
(1 hunks)apps/webapp/app/runEngine/concerns/runChainStates.server.ts
(1 hunks)apps/webapp/app/runEngine/concerns/runNumbers.server.ts
(1 hunks)apps/webapp/app/runEngine/concerns/traceEvents.server.ts
(1 hunks)apps/webapp/app/runEngine/services/batchTrigger.server.ts
(1 hunks)apps/webapp/app/runEngine/services/triggerTask.server.ts
(3 hunks)apps/webapp/app/runEngine/types.ts
(1 hunks)apps/webapp/app/runEngine/validators/triggerTaskValidator.ts
(1 hunks)apps/webapp/app/v3/models/workerDeployment.server.ts
(5 hunks)apps/webapp/app/v3/services/triggerTask.server.ts
(2 hunks)apps/webapp/test/engine/triggerTask.test.ts
(1 hunks)internal-packages/database/prisma/migrations/20250422152423_add_run_chain_state_to_task_run/migration.sql
(1 hunks)internal-packages/database/prisma/schema.prisma
(1 hunks)internal-packages/run-engine/package.json
(1 hunks)internal-packages/run-engine/src/engine/db/worker.ts
(1 hunks)internal-packages/run-engine/src/engine/errors.ts
(1 hunks)internal-packages/run-engine/src/engine/index.ts
(4 hunks)internal-packages/run-engine/src/engine/tests/setup.ts
(4 hunks)internal-packages/run-engine/src/engine/types.ts
(3 hunks)internal-packages/run-engine/src/index.ts
(1 hunks)internal-packages/testcontainers/package.json
(1 hunks)internal-packages/testcontainers/src/index.ts
(1 hunks)packages/core/src/v3/apiClient/errors.ts
(1 hunks)packages/core/src/v3/errors.ts
(1 hunks)packages/core/src/v3/links.ts
(1 hunks)packages/core/src/v3/schemas/schemas.ts
(1 hunks)packages/core/src/v3/workers/taskExecutor.ts
(2 hunks)packages/core/test/taskExecutor.test.ts
(2 hunks)references/hello-world/src/trigger/deadlocks.ts
(1 hunks)references/hello-world/src/trigger/queues.ts
(2 hunks)
✅ Files skipped from review due to trivial changes (2)
- internal-packages/run-engine/src/engine/errors.ts
- apps/webapp/app/runEngine/services/batchTrigger.server.ts
🚧 Files skipped from review as they are similar to previous changes (31)
- packages/core/src/v3/links.ts
- internal-packages/run-engine/src/index.ts
- internal-packages/testcontainers/package.json
- .vscode/launch.json
- internal-packages/database/prisma/migrations/20250422152423_add_run_chain_state_to_task_run/migration.sql
- apps/webapp/app/components/runs/v3/SpanEvents.tsx
- internal-packages/run-engine/src/engine/tests/setup.ts
- apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.$projectParam.env.$envParam.runs.$runParam.spans.$spanParam/route.tsx
- internal-packages/database/prisma/schema.prisma
- internal-packages/testcontainers/src/index.ts
- packages/core/src/v3/apiClient/errors.ts
- internal-packages/run-engine/src/engine/db/worker.ts
- packages/core/src/v3/schemas/schemas.ts
- apps/webapp/app/runEngine/concerns/errors.ts
- apps/webapp/app/v3/services/triggerTask.server.ts
- internal-packages/run-engine/package.json
- references/hello-world/src/trigger/deadlocks.ts
- apps/webapp/app/runEngine/concerns/runNumbers.server.ts
- packages/core/src/v3/workers/taskExecutor.ts
- packages/core/src/v3/errors.ts
- apps/webapp/app/runEngine/concerns/payloads.server.ts
- internal-packages/run-engine/src/engine/index.ts
- packages/core/test/taskExecutor.test.ts
- apps/webapp/app/routes/api.v1.tasks.$taskId.trigger.ts
- internal-packages/run-engine/src/engine/types.ts
- apps/webapp/app/runEngine/validators/triggerTaskValidator.ts
- apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
- apps/webapp/app/runEngine/concerns/traceEvents.server.ts
- references/hello-world/src/trigger/queues.ts
- apps/webapp/app/v3/models/workerDeployment.server.ts
- apps/webapp/app/runEngine/types.ts
🧰 Additional context used
🧬 Code Graph Analysis (2)
apps/webapp/app/runEngine/concerns/queues.server.ts (8)
apps/webapp/app/runEngine/types.ts (5)
QueueManager
(63-71)TriggerTaskRequest
(28-34)LockedBackgroundWorker
(57-60)QueueProperties
(52-55)QueueValidationResult
(42-50)apps/webapp/app/v3/runEngine.server.ts (1)
engine
(9-9)apps/webapp/app/runEngine/concerns/errors.ts (1)
EngineServiceValidationError
(1-6)apps/webapp/app/v3/services/triggerTaskV1.server.ts (1)
taskId
(671-718)apps/webapp/app/v3/models/workerDeployment.server.ts (1)
findCurrentWorkerFromEnvironment
(197-223)internal-packages/run-engine/src/engine/tests/setup.ts (1)
AuthenticatedEnvironment
(16-18)apps/webapp/app/v3/queueSizeLimits.server.ts (1)
guardQueueSizeLimitsForEnv
(11-34)apps/webapp/app/v3/services/worker/workerGroupService.server.ts (1)
WorkerGroupService
(7-269)
apps/webapp/app/runEngine/concerns/runChainStates.server.ts (3)
apps/webapp/app/runEngine/types.ts (2)
RunChainStateManager
(148-153)TriggerTaskRequest
(28-34)packages/core/src/v3/schemas/schemas.ts (2)
RunChainState
(301-308)RunChainState
(310-310)apps/webapp/app/runEngine/concerns/errors.ts (1)
EngineServiceValidationError
(1-6)
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: typecheck / typecheck
- GitHub Check: units / 🧪 Unit Tests
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (8)
apps/webapp/app/runEngine/concerns/runChainStates.server.ts (8)
1-14
: Well-structured class initialization with feature flag support.The
DefaultRunChainStateManager
is cleanly defined with proper dependency injection for the Prisma client and the concurrency release feature flag. This approach allows for flexible configuration and easier testing.
16-48
: Good defensive programming with validation of parent run chain state.The method properly handles edge cases:
- Returns empty state when there's no parent run
- Safely parses parent run chain state with error logging
- Respects the
resumeParentOnCompletion
flagThis ensures robust behavior even with corrupted or missing data.
50-68
: Proper validation of parent locked queue ID.The code correctly validates the existence of a parent locked queue ID before proceeding with concurrency calculations. The error logging provides helpful context for troubleshooting.
75-98
: Concurrency management with feature flag control.The code handles both the legacy and new concurrency release behaviors through the feature flag, ensuring backward compatibility while allowing for the new functionality.
100-127
: Well-implemented selective concurrency increment based on release settings.The code only increments the queue's holding count when the parent won't release concurrency, which effectively prevents unnecessary concurrency reservation.
183-190
: Proper queue concurrency limit check with unlimited support.The code correctly handles the case where
queueConcurrencyLimit === 0
means unlimited by including an explicit check, preventing false positives in deadlock detection.
195-229
: Comprehensive concurrency release determination logic.The method effectively determines if concurrency should be released by:
- First checking explicit request options
- Then falling back to queue configuration
- Handling undefined/null concurrency limits
- Providing a sensible default
This prioritization gives callers control while maintaining consistent behavior.
257-275
: Excellent error message with clear explanation and actionable suggestions.The deadlock error message is comprehensive and helpful:
- Explains what a deadlock is
- Describes the specific situation
- Lists common causes
- Provides multiple remediation options
This will significantly improve the developer experience when encountering this error.
describe("RunEngineTriggerTaskService", () => { | ||
containerTest("should trigger a task with minimal options", async ({ prisma, redisOptions }) => { | ||
const engine = new RunEngine({ | ||
prisma, | ||
worker: { | ||
redis: redisOptions, | ||
workers: 1, | ||
tasksPerWorker: 10, | ||
pollIntervalMs: 100, | ||
}, | ||
queue: { | ||
redis: redisOptions, | ||
}, | ||
runLock: { | ||
redis: redisOptions, | ||
}, | ||
machines: { | ||
defaultMachine: "small-1x", | ||
machines: { | ||
"small-1x": { | ||
name: "small-1x" as const, | ||
cpu: 0.5, | ||
memory: 0.5, | ||
centsPerMs: 0.0001, | ||
}, | ||
}, | ||
baseCostInCents: 0.0005, | ||
}, | ||
tracer: trace.getTracer("test", "0.0.0"), | ||
}); | ||
|
||
const authenticatedEnvironment = await setupAuthenticatedEnvironment(prisma, "PRODUCTION"); | ||
|
||
const taskIdentifier = "test-task"; | ||
|
||
//create background worker | ||
await setupBackgroundWorker(engine, authenticatedEnvironment, taskIdentifier); | ||
|
||
const queuesManager = new DefaultQueueManager(prisma, engine); | ||
|
||
const idempotencyKeyConcern = new IdempotencyKeyConcern( | ||
prisma, | ||
engine, | ||
new MockTraceEventConcern() | ||
); | ||
|
||
const runChainStateManager = new DefaultRunChainStateManager(prisma, true); | ||
|
||
const triggerTaskService = new RunEngineTriggerTaskService({ | ||
engine, | ||
prisma, | ||
runNumberIncrementer: new MockRunNumberIncrementer(), | ||
payloadProcessor: new MockPayloadProcessor(), | ||
queueConcern: queuesManager, | ||
idempotencyKeyConcern, | ||
validator: new MockTriggerTaskValidator(), | ||
traceEventConcern: new MockTraceEventConcern(), | ||
runChainStateManager, | ||
tracer: trace.getTracer("test", "0.0.0"), | ||
}); | ||
|
||
const result = await triggerTaskService.call({ | ||
taskId: taskIdentifier, | ||
environment: authenticatedEnvironment, | ||
body: { payload: { test: "test" } }, | ||
}); | ||
|
||
expect(result).toBeDefined(); | ||
expect(result?.run.friendlyId).toBeDefined(); | ||
expect(result?.run.status).toBe("PENDING"); | ||
expect(result?.isCached).toBe(false); | ||
|
||
const run = await prisma.taskRun.findUnique({ | ||
where: { | ||
id: result?.run.id, | ||
}, | ||
}); | ||
|
||
expect(run).toBeDefined(); | ||
expect(run?.friendlyId).toBe(result?.run.friendlyId); | ||
expect(run?.engine).toBe("V2"); | ||
expect(run?.queuedAt).toBeDefined(); | ||
expect(run?.queue).toBe(`task/${taskIdentifier}`); | ||
|
||
// Lets make sure the task is in the queue | ||
const queueLength = await engine.runQueue.lengthOfQueue( | ||
authenticatedEnvironment, | ||
`task/${taskIdentifier}` | ||
); | ||
expect(queueLength).toBe(1); | ||
|
||
await engine.quit(); | ||
}); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Ensure engine.quit()
always runs to avoid hanging test runners
engine.quit()
is invoked only on the happy path; an assertion failure will leak Redis connections and leave Vitest hanging.
Wrap each containerTest
body in try / finally
, or register an afterEach()
that gracefully stops the engine.
-containerTest("should trigger a task with minimal options", async ({ prisma, redisOptions }) => {
- const engine = new RunEngine({ /* … */ });
- /* tests */
- await engine.quit();
-});
+containerTest("should trigger a task with minimal options", async ({ prisma, redisOptions }) => {
+ const engine = new RunEngine({ /* … */ });
+ try {
+ /* tests */
+ } finally {
+ await engine.quit().catch(() => {});
+ }
+});
Repeat for the remaining containerTest
blocks.
Also applies to: 202-314, 316-479, 481-638, 640-839, 841-969, 971-1099, 1100-1256
where: { | ||
name: specifiedQueueName, | ||
workers: { some: { id: lockedBackgroundWorker.id } }, // Ensure the queue is associated with any task of the locked worker | ||
}, | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ericallam you should add the environmentId in here because the index on this table is name + runtimeEnvironmentId
@@ -61,10 +61,14 @@ const prisma = async ( | |||
{ postgresContainer }: { postgresContainer: StartedPostgreSqlContainer }, | |||
use: Use<PrismaClient> | |||
) => { | |||
const url = postgresContainer.getConnectionUri(); | |||
|
|||
console.log("Initializing Prisma with URL:", url); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ericallam did you mean to leave this in here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
apps/webapp/test/engine/triggerTask.test.ts (1)
108-201
: Ensureengine.quit()
always executes – still outstandingPrevious reviews flagged the resource-leak risk. Assertions thrown before the final line skip the shutdown call, leaving Redis connections open and hanging Vitest.
Wrap each
containerTest
body intry / finally
(or add anafterEach
) soawait engine.quit()
runs no matter what.This comment repeats earlier feedback.
apps/webapp/app/runEngine/services/triggerTask.server.ts (2)
162-168
:⚠️ Potential issueIdempotent-run path still exits without a trace span
The early-return for cached runs bypasses
TraceEventConcern
, so these executions never show up in observability dashboards.
A virtually identical issue was raised in the previous review and still applies.Consider wrapping the cached-run branch in
traceIdempotentRun()
(or an equivalent helper) before returning. This preserves the fast path while keeping tracing complete.
331-334
:⚠️ Potential issueUnbounded recursion on
RunDuplicateIdempotencyKeyError
can overflow the stackThe method self-recurses indefinitely if the duplicate idempotency key condition persists.
The earlier review suggested capping the retries, but the guard is still missing.+const MAX_DUPLICATE_RETRIES = 3; ... - if (error instanceof RunDuplicateIdempotencyKeyError) { - //retry calling this function, because this time it will return the idempotent run - return await this.call({ taskId, environment, body, options, attempt: attempt + 1 }); - } + if (error instanceof RunDuplicateIdempotencyKeyError && attempt < MAX_DUPLICATE_RETRIES) { + // Retry (bounded) – next call should return the cached run + return await this.call({ taskId, environment, body, options, attempt: attempt + 1 }); + }After the limit, either surface a clear error or return the cached run through the tracing pathway.
🧹 Nitpick comments (3)
apps/webapp/app/runEngine/concerns/traceEvents.server.ts (1)
18-42
: Minor duplication – extract common attribute builder
traceRun
andtraceIdempotentRun
build almost-identicalattributes
,context
, andspan*
objects. A small helper reduces copy-paste risk:private buildBaseOptions(request: TriggerTaskRequest) { return { context: request.options?.traceContext, spanParentAsLink: request.options?.spanParentAsLink, parentAsLinkType: request.options?.parentAsLinkType, kind: "SERVER" as const, environment: request.environment, taskSlug: request.taskId, }; }Then spread:
this.eventRepository.traceEvent( request.taskId, { ...this.buildBaseOptions(request), attributes: { /* … */ }, incomplete: true, immediate: true, }, … );Reduces maintenance overhead when the attribute list evolves.
apps/webapp/test/engine/triggerTask.test.ts (2)
559-561
: Remove strayconsole.log
to keep test output cleanLeft-over logging clutters CI output and can mask important debug traces.
- console.log(result);
40-70
: Centralise mock & engine setup to cut duplicationEach test block re-declares identical
Mock*
classes and lengthynew RunEngine({...})
configs. Extract helpers intotest/utils.ts
, e.g.:export function createTestEngine(prisma, redis) { /* … */ } export function createTriggerService(engine, prisma) { /* … */ }Benefits: smaller tests, single source of truth for defaults, easier future refactors.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (38)
.changeset/tender-jobs-collect.md
(1 hunks).vscode/launch.json
(1 hunks)apps/webapp/app/components/runs/v3/SpanEvents.tsx
(1 hunks)apps/webapp/app/routes/api.v1.tasks.$taskId.trigger.ts
(2 hunks)apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.$projectParam.env.$envParam.runs.$runParam.spans.$spanParam/route.tsx
(1 hunks)apps/webapp/app/runEngine/concerns/errors.ts
(1 hunks)apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
(1 hunks)apps/webapp/app/runEngine/concerns/payloads.server.ts
(1 hunks)apps/webapp/app/runEngine/concerns/queues.server.ts
(1 hunks)apps/webapp/app/runEngine/concerns/runChainStates.server.ts
(1 hunks)apps/webapp/app/runEngine/concerns/runNumbers.server.ts
(1 hunks)apps/webapp/app/runEngine/concerns/traceEvents.server.ts
(1 hunks)apps/webapp/app/runEngine/services/batchTrigger.server.ts
(14 hunks)apps/webapp/app/runEngine/services/triggerTask.server.ts
(3 hunks)apps/webapp/app/runEngine/types.ts
(1 hunks)apps/webapp/app/runEngine/validators/triggerTaskValidator.ts
(1 hunks)apps/webapp/app/v3/models/workerDeployment.server.ts
(5 hunks)apps/webapp/app/v3/services/triggerTask.server.ts
(2 hunks)apps/webapp/test/engine/triggerTask.test.ts
(1 hunks)internal-packages/database/prisma/migrations/20250422152423_add_run_chain_state_to_task_run/migration.sql
(1 hunks)internal-packages/database/prisma/schema.prisma
(1 hunks)internal-packages/run-engine/package.json
(1 hunks)internal-packages/run-engine/src/engine/db/worker.ts
(1 hunks)internal-packages/run-engine/src/engine/errors.ts
(1 hunks)internal-packages/run-engine/src/engine/index.ts
(4 hunks)internal-packages/run-engine/src/engine/tests/setup.ts
(4 hunks)internal-packages/run-engine/src/engine/types.ts
(3 hunks)internal-packages/run-engine/src/index.ts
(1 hunks)internal-packages/testcontainers/package.json
(1 hunks)internal-packages/testcontainers/src/index.ts
(1 hunks)packages/core/src/v3/apiClient/errors.ts
(1 hunks)packages/core/src/v3/errors.ts
(1 hunks)packages/core/src/v3/links.ts
(1 hunks)packages/core/src/v3/schemas/schemas.ts
(1 hunks)packages/core/src/v3/workers/taskExecutor.ts
(2 hunks)packages/core/test/taskExecutor.test.ts
(2 hunks)references/hello-world/src/trigger/deadlocks.ts
(1 hunks)references/hello-world/src/trigger/queues.ts
(2 hunks)
✅ Files skipped from review due to trivial changes (6)
- .changeset/tender-jobs-collect.md
- internal-packages/run-engine/src/engine/errors.ts
- internal-packages/run-engine/package.json
- apps/webapp/app/runEngine/services/batchTrigger.server.ts
- apps/webapp/app/runEngine/concerns/runNumbers.server.ts
- packages/core/src/v3/links.ts
🚧 Files skipped from review as they are similar to previous changes (28)
- internal-packages/testcontainers/package.json
- apps/webapp/app/runEngine/concerns/errors.ts
- internal-packages/database/prisma/migrations/20250422152423_add_run_chain_state_to_task_run/migration.sql
- packages/core/src/v3/errors.ts
- apps/webapp/app/components/runs/v3/SpanEvents.tsx
- internal-packages/run-engine/src/engine/db/worker.ts
- apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.$projectParam.env.$envParam.runs.$runParam.spans.$spanParam/route.tsx
- .vscode/launch.json
- apps/webapp/app/runEngine/concerns/payloads.server.ts
- packages/core/src/v3/schemas/schemas.ts
- internal-packages/testcontainers/src/index.ts
- apps/webapp/app/routes/api.v1.tasks.$taskId.trigger.ts
- apps/webapp/app/runEngine/validators/triggerTaskValidator.ts
- references/hello-world/src/trigger/queues.ts
- packages/core/test/taskExecutor.test.ts
- internal-packages/run-engine/src/engine/types.ts
- internal-packages/run-engine/src/engine/index.ts
- packages/core/src/v3/workers/taskExecutor.ts
- internal-packages/database/prisma/schema.prisma
- internal-packages/run-engine/src/engine/tests/setup.ts
- internal-packages/run-engine/src/index.ts
- apps/webapp/app/v3/services/triggerTask.server.ts
- packages/core/src/v3/apiClient/errors.ts
- apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
- apps/webapp/app/runEngine/concerns/queues.server.ts
- apps/webapp/app/runEngine/types.ts
- apps/webapp/app/v3/models/workerDeployment.server.ts
- apps/webapp/app/runEngine/concerns/runChainStates.server.ts
🧰 Additional context used
🧬 Code Graph Analysis (1)
apps/webapp/app/runEngine/concerns/traceEvents.server.ts (3)
apps/webapp/app/runEngine/types.ts (3)
TraceEventConcern
(131-146)TriggerTaskRequest
(28-34)TracedEventSpan
(119-129)apps/webapp/app/v3/eventRepository.server.ts (2)
EventRepository
(189-1262)eventRepository
(1264-1264)packages/core/src/v3/isomorphic/friendlyId.ts (1)
BatchId
(96-96)
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
- GitHub Check: typecheck / typecheck
- GitHub Check: units / 🧪 Unit Tests
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (4)
references/hello-world/src/trigger/deadlocks.ts (2)
14-34
: Prefer explicit payload typing instead ofany
Both the
payload
parameter and the{ ctx }
object are typed asany
, forfeiting compile-time safety and IntelliSense.
Define a small interface for your message shape and pass it totask<Payload>()
so callers and maintainers know exactly what the task expects.-import { task, queue } from "@trigger.dev/sdk"; +import { task, queue } from "@trigger.dev/sdk"; +interface MessagePayload { + message: string; +} -export const deadlockTester = task({ +export const deadlockTester = task<MessagePayload>({ id: "deadlock-tester", - run: async (payload: any, { ctx }) => { + run: async (payload, { ctx }) => { /* … */ }, });Repeat for
deadlockNestedTask
.
[ suggest_optional_refactor ]
36-61
: Mutual batch-triggering risks unbounded recursion
deadlockTester
waits for twodeadlockNestedTask
runs, each of which immediately triggers two additionaldeadlockTester
runs.
Without a guard (depth counter, unique idempotency key, etc.) this can explode exponentially or recurse forever outside the new deadlock-detection path, exhausting queue capacity or blowing up the DB.If this file is only a synthetic demonstration that deliberately causes a deadlock, add a comment that it must never be imported in production code. Otherwise, add a maximum depth or a “rootRunId” check.
Would you like an example of such a guard?
apps/webapp/app/runEngine/concerns/traceEvents.server.ts (1)
68-96
: Mismatch betweenincomplete
flag and recorded “cached” event
traceIdempotentRun
forwards the caller-suppliedincomplete
boolean to the repository, but always records an additional child event (“There’s an existing run …”).
Whenincomplete === false
that parent span is closed, yet the child event is attached and might violate trace semantics (child must finish before parent). Ensure yourEventRepository
tolerates this ordering or flipincomplete
totrue
while adding the child record, then mark the span complete afterwards.apps/webapp/app/runEngine/services/triggerTask.server.ts (1)
215-219
:lockedQueueId
can beundefined
while the run is flagged as version-locked
resolveQueueProperties()
is called withlockedToBackgroundWorker ?? undefined
, but if the worker lookup fails (e.g., typo inlockToVersion
), the service continues and eventually triggers a run without a validlockedQueueId
.Downstream consumers may assume the field is non-null when
lockToVersion
was provided, potentially leading to mismatches or dead-locks.Please verify that
resolveQueueProperties
(and subsequent logic) either:
- Throws when
lockedToBackgroundWorker
isnull
whilebody.options?.lockToVersion
is specified, or- Validates the presence of
lockedQueueId
before proceeding.If not, augment the validation layer accordingly.
const [parseDelayError, delayUntil] = await tryCatch(parseDelay(body.options?.delay)); | ||
|
||
if (parseDelayError) { | ||
throw new EngineServiceValidationError(`Invalid delay ${body.options?.delay}`); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
parseDelay
validation can silently accept invalid strings
tryCatch
only captures thrown errors. If parseDelay()
returns undefined
for an unparsable string (the common case), parseDelayError
is undefined
and the code proceeds, creating a run with delayUntil = undefined
.
-const [parseDelayError, delayUntil] = await tryCatch(parseDelay(body.options?.delay));
-
-if (parseDelayError) {
- throw new EngineServiceValidationError(`Invalid delay ${body.options?.delay}`);
-}
+const [parseDelayError, delayUntil] = await tryCatch(() =>
+ parseDelay(body.options?.delay)
+);
+
+if (
+ parseDelayError || // threw
+ (body.options?.delay && !delayUntil) // returned undefined for an input
+) {
+ throw new EngineServiceValidationError(
+ `Invalid delay value "${body.options?.delay}"`
+ );
+}
This prevents silent fall-through and gives users immediate feedback.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
const [parseDelayError, delayUntil] = await tryCatch(parseDelay(body.options?.delay)); | |
if (parseDelayError) { | |
throw new EngineServiceValidationError(`Invalid delay ${body.options?.delay}`); | |
} | |
const [parseDelayError, delayUntil] = await tryCatch(() => | |
parseDelay(body.options?.delay) | |
); | |
if ( | |
parseDelayError || | |
(body.options?.delay && !delayUntil) | |
) { | |
throw new EngineServiceValidationError( | |
`Invalid delay value "${body.options?.delay}"` | |
); | |
} |
This PR implements robust deadlock detection when triggering and waiting for a child task chain. It's possible, when not releasing concurrency (e.g. on a queue with a concurrency limit), to get into a situation where the parent tasks are holding all the concurrency available on a queue, so the child task could never be executed, causing the parent task chain to become stuck. We now detect this situation by propagating "Run chain state" through the task hierarchy that accounts for the amount of concurrency behind held in queues along the direct ancestor chain, allowing us to detect issues.
Some required work needed for this PR to work are also useful additions:
TODO:
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests
Refactor
Chores