Run Engine 2.0 (alpha) #1575

Open
wants to merge 500 commits into
base: main

nicktrn
Collaborator

@nicktrn nicktrn commented Dec 17, 2024

Good luck @coderabbitai

Summary by CodeRabbit

  • New Features
    • Introduced new interactive UI elements (e.g., live countdown timers, animated icons) to enhance visual feedback.
    • Added several API endpoints to improve task run management, waitpoint operations, and real-time developer presence.
    • Upgraded engine configuration and background processing for smoother, more resilient performance.
  • Tests
    • Extended automated test suites to validate engine behavior, including task triggering, dequeuing, heartbeat handling, priority, and cancellation.
  • Chores
    • Updated dependency management and configuration settings to boost performance and support asynchronous operations.

nicktrn and others added 30 commits December 2, 2024 16:09
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (4)
apps/webapp/app/components/runs/v3/LiveTimer.tsx (1)

70-76: Add className prop and validate endTime.

For consistency with LiveCountUp, consider adding the className prop. Also, add validation for endTime to prevent invalid dates.

 export function LiveCountdown({
   endTime,
   updateInterval = 100,
+  className,
 }: {
   endTime: Date;
   updateInterval?: number;
+  className?: string;
 }) {
+  if (!(endTime instanceof Date) || isNaN(endTime.getTime())) {
+    throw new Error("Invalid endTime provided");
+  }
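The endTime validation above can be pulled out of the component into small pure helpers, which also makes it easy to unit-test. This is a minimal sketch; `isValidDate` and `remainingMs` are hypothetical names that are not part of the PR, and clamping at zero is an assumed design choice for an elapsed deadline.

```typescript
// Hypothetical helpers (not in the PR): validate a countdown target and
// compute how many milliseconds are left, never going below zero.
function isValidDate(value: unknown): value is Date {
  // `instanceof Date` rejects strings and numbers; getTime() is NaN for "Invalid Date".
  return value instanceof Date && !Number.isNaN(value.getTime());
}

function remainingMs(endTime: Date, now: Date = new Date()): number {
  if (!isValidDate(endTime)) {
    throw new Error("Invalid endTime provided");
  }
  // Clamp at zero so an elapsed deadline renders as 0 rather than a negative duration.
  return Math.max(0, endTime.getTime() - now.getTime());
}
```

LiveCountdown could then call `remainingMs(endTime, now)` on each tick instead of validating inline.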
apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.v3.$projectParam.runs.$runParam.spans.$spanParam/route.tsx (2)

704-712: Improve error and output display logic.

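The error and output display logic has been reordered, and the conditional checks could be simplified.

Consider combining the conditions. Note that this version also hides the payload whenever the run has an error, which the current code does not, so confirm that change is intended: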

-{run.error && <RunError error={run.error} />}
-
-{run.payload !== undefined && (
-  <PacketDisplay data={run.payload} dataType={run.payloadType} title="Payload" />
-)}
-
-{run.error === undefined && run.output !== undefined ? (
-  <PacketDisplay data={run.output} dataType={run.outputType} title="Output" />
-) : null}
+{run.error ? (
+  <RunError error={run.error} />
+) : (
+  <>
+    {run.payload !== undefined && (
+      <PacketDisplay data={run.payload} dataType={run.payloadType} title="Payload" />
+    )}
+    {run.output !== undefined && (
+      <PacketDisplay data={run.output} dataType={run.outputType} title="Output" />
+    )}
+  </>
+)}

1079-1251: Improve SpanEntity component structure.

The component handles different span types well, but there are a few potential improvements:

  1. The waitpoint case has deep nesting that could be simplified
  2. Consider extracting the waitpoint and default cases into separate components for better maintainability

Consider refactoring the component:

function SpanEntity({ span }: { span: Span }) {
  const organization = useOrganization();
  const project = useProject();

+  const renderWaitpoint = () => {
+    if (!span.waitpoint) {
+      return <Paragraph>No waitpoint found: {span.entity.id}</Paragraph>;
+    }
+    
+    return (
+      <>
+        <WaitpointHeader />
+        <WaitpointProperties waitpoint={span.waitpoint} />
+        <WaitpointOutput waitpoint={span.waitpoint} />
+      </>
+    );
+  };

+  const renderDefaultSpan = () => {
+    return (
+      <>
+        <SpanTimeline {...span} />
+        <SpanProperties span={span} />
+        <SpanEvents events={span.events} />
+      </>
+    );
+  };

  switch (span.entityType) {
    case "waitpoint":
-      // Current implementation
+      return renderWaitpoint();
    default:
-      // Current implementation
+      return renderDefaultSpan();
  }
}
apps/webapp/app/assets/icons/AnimatedHourglassIcon.tsx (1)

5-11: Consider extracting the props interface.

For better reusability and documentation, consider extracting the props interface.

+interface AnimatedHourglassIconProps {
+  className?: string;
+  delay?: number;
+}
+
-export function AnimatedHourglassIcon({
-  className,
-  delay,
-}: {
-  className?: string;
-  delay?: number;
-}) {
+export function AnimatedHourglassIcon({ className, delay }: AnimatedHourglassIconProps) {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9600ee8 and 05bdbf5.

📒 Files selected for processing (4)
  • apps/webapp/app/assets/icons/AnimatedHourglassIcon.tsx (1 hunks)
  • apps/webapp/app/components/runs/v3/LiveTimer.tsx (1 hunks)
  • apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.$projectParam.waitpoints.$waitpointFriendlyId.complete/route.tsx (1 hunks)
  • apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.v3.$projectParam.runs.$runParam.spans.$spanParam/route.tsx (11 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.$projectParam.waitpoints.$waitpointFriendlyId.complete/route.tsx
🔇 Additional comments (9)
apps/webapp/app/components/runs/v3/LiveTimer.tsx (2)

72-72: Consider standardizing the update interval.

The default updateInterval is 100ms while other components use 250ms. Consider standardizing this value across components unless there's a specific need for faster updates.

Would you like to explain the reasoning behind using a different update interval for this component?


94-99: LGTM! Good formatting configuration.

The formatting options are well-configured with:

  • Short style for concise display
  • No decimal points for cleaner numbers
  • Comprehensive unit selection (d, h, m, s)
  • Maximum of 4 units for balanced detail
apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.v3.$projectParam.runs.$runParam.spans.$spanParam/route.tsx (5)

51-51: LGTM!

The new imports are correctly added to support admin access functionality and waitpoint form integration.

Also applies to: 70-70


314-314: LGTM!

The admin access check is correctly implemented using the useHasAdminAccess hook.



503-518: LGTM!

The idempotency key display is well-structured with proper expiration date handling.


551-566: LGTM!

Admin-specific queue information is correctly gated behind the admin access check.


193-193: LGTM!

The SpanTitle component is correctly used with the new props structure.


apps/webapp/app/assets/icons/AnimatedHourglassIcon.tsx (2)

12-13: LGTM!

The animation setup using useAnimate is implemented correctly.


26-27: LGTM!

The render implementation is clean and correctly applies the animation ref and className prop.

endTime: Date;
updateInterval?: number;
}) {
const [now, setNow] = useState<Date>();
Contributor


🛠️ Refactor suggestion

Initialize the now state with current date.

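Initializing the now state avoids an undefined value on the first render. If the component is server-rendered, a Date created during render will differ between server and client, so check for hydration mismatches.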

-  const [now, setNow] = useState<Date>();
+  const [now, setNow] = useState<Date>(new Date());

Comment on lines 14 to 24
useEffect(() => {
animate(
[
[scope.current, { rotate: 0 }, { duration: 0.7 }],
[scope.current, { rotate: 180 }, { duration: 0.3 }],
[scope.current, { rotate: 180 }, { duration: 0.7 }],
[scope.current, { rotate: 360 }, { duration: 0.3 }],
],
{ repeat: Infinity, delay }
);
});
Contributor


🛠️ Refactor suggestion

Add dependency array and optimize animation sequence.

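  1. The useEffect hook has no dependency array, so it runs after every render and restarts the animation each time. Add a dependency array so the effect only re-runs when its inputs change; the suggestion below lists [animate, delay, scope].

  2. The sequence could be shortened to two keyframe segments. Note that this changes the animation: the current sequence alternates 0.7s segments that hold the rotation with 0.3s flips, while the suggested one rotates continuously over 1s per half-turn, so the pauses disappear.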

   useEffect(() => {
     animate(
       [
-        [scope.current, { rotate: 0 }, { duration: 0.7 }],
-        [scope.current, { rotate: 180 }, { duration: 0.3 }],
-        [scope.current, { rotate: 180 }, { duration: 0.7 }],
-        [scope.current, { rotate: 360 }, { duration: 0.3 }],
+        [scope.current, { rotate: [0, 180] }, { duration: 1 }],
+        [scope.current, { rotate: [180, 360] }, { duration: 1 }],
       ],
       { repeat: Infinity, delay }
     );
-  });
+  }, [animate, delay, scope]);

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (14)
internal-packages/run-engine/src/engine/tests/triggerAndWait.test.ts (2)

17-44: Consider extracting common engine setup to reduce duplication

Both test cases share nearly identical RunEngine configuration. Consider extracting this to a helper function to reduce code duplication.

+function createTestRunEngine({ prisma, redisOptions }) {
+  return new RunEngine({
+    prisma,
+    worker: {
+      redis: redisOptions,
+      workers: 1,
+      tasksPerWorker: 10,
+      pollIntervalMs: 100,
+    },
+    queue: {
+      redis: redisOptions,
+    },
+    runLock: {
+      redis: redisOptions,
+    },
+    machines: {
+      defaultMachine: "small-1x",
+      machines: {
+        "small-1x": {
+          name: "small-1x" as const,
+          cpu: 0.5,
+          memory: 0.5,
+          centsPerMs: 0.0001,
+        },
+      },
+      baseCostInCents: 0.0001,
+    },
+    tracer: trace.getTracer("test", "0.0.0"),
+  });
+}
+
+// Then in your tests:
+const engine = createTestRunEngine({ prisma, redisOptions });

Also applies to: 206-233


171-171: Use a consistent approach for waiting for asynchronous operations

The tests use await setTimeout(500) to wait for background processing after the child run completes. Consider:

  1. Using a more deterministic approach like polling for a specific state
  2. Adding a comment explaining why the delay is necessary
  3. Extracting the wait time to a named constant to make it configurable
-await setTimeout(500);
+// Wait for waitpoint state to propagate through the system
+const WAITPOINT_PROPAGATION_DELAY_MS = 500;
+await setTimeout(WAITPOINT_PROPAGATION_DELAY_MS);

Also applies to: 410-410
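The polling approach in option 1 can be sketched as a small generic helper. `waitFor` is a hypothetical name, and the default timeout and interval are illustrative assumptions rather than values from the tests. In the tests, the check would query whatever run or waitpoint state the assertion depends on, instead of sleeping for a fixed 500ms.

```typescript
// Hypothetical test helper: poll `check` until it returns a truthy value
// or the timeout elapses, then reject with a descriptive error.
async function waitFor<T>(
  check: () => T | Promise<T>,
  { timeoutMs = 5_000, intervalMs = 50 }: { timeoutMs?: number; intervalMs?: number } = {}
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const result = await check();
    if (result) return result;
    if (Date.now() >= deadline) {
      throw new Error(`waitFor timed out after ${timeoutMs}ms`);
    }
    // Use the global timer so this does not clash with a `timers/promises` setTimeout import.
    await new Promise((resolve) => globalThis.setTimeout(resolve, intervalMs));
  }
}
```

Compared with a fixed delay, this finishes as soon as the state propagates and fails with a clear message when it never does.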

apps/webapp/app/v3/services/triggerTaskV2.server.ts (3)

468-474: Use logger instead of console.log for consistency.

The console.log call is inconsistent with the logger usage pattern established elsewhere in this file.

-      console.log("Failed to get queue name: No task found", {
+      logger.debug("Failed to get queue name: No task found", {
        taskId,
        environmentId: environment.id,
      });

479-485: Use logger instead of console.log for consistency.

The console.log call is inconsistent with the logger usage pattern established elsewhere in this file.

-      console.log("Failed to get queue name: Invalid queue config", {
+      logger.debug("Failed to get queue name: Invalid queue config", {
        taskId,
        environmentId: environment.id,
        queueConfig: task.queueConfig,
      });

233-241: Provide more specific error messages for parent run terminal states.

The error message could be more informative by specifying which terminal states are considered problematic and why they prevent triggering a child task.

        throw new ServiceValidationError(
-          `Cannot trigger ${taskId} as the parent run has a status of ${parentRun.status}`
+          `Cannot trigger ${taskId} as the parent run has already completed with status ${parentRun.status}. Child tasks can only be triggered for active parent runs.`
        );
apps/webapp/app/routes/engine.v1.runs.$runFriendlyId.waitpoints.tokens.$waitpointFriendlyId.wait.ts (1)

66-68: Enhance error feedback.

You currently return a generic “Failed to wait for waitpoint token” message. If feasible, consider returning a more descriptive message to help diagnose errors without exposing sensitive details.

apps/webapp/app/routes/engine.v1.runs.$runFriendlyId.wait.duration.ts (2)

24-30: Clarify 401 vs. 403 for permissions.

Throwing a 401 (Unauthorized) is acceptable, but consider returning a 403 (Forbidden) if the user is authenticated but lacks permission to access this run. This minor difference can help clarify the nature of the access issue.
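The distinction can be expressed as a tiny mapping. This is a hypothetical illustration, not the route's actual code; `AccessCheck` and `accessStatus` are assumed names.

```typescript
// Hypothetical sketch: choose the status code from two separate checks.
type AccessCheck = { authenticated: boolean; canAccessRun: boolean };

function accessStatus({ authenticated, canAccessRun }: AccessCheck): 200 | 401 | 403 {
  // No valid credentials: the client should (re)authenticate.
  if (!authenticated) return 401;
  // Credentials are valid but do not grant access to this run.
  if (!canAccessRun) return 403;
  return 200;
}
```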


36-38: Document the idempotency TTL logic.

The logic for resolving idempotencyKeyTTL is sound, but adding a short comment describing its usage can help future maintainers understand the necessity and implementation details of resolveIdempotencyKeyTTL.

apps/webapp/app/v3/services/worker/workerGroupTokenService.server.ts (6)

38-53: Consider returning only the newly generated plaintext token.
Storing only the hashed token in the database is a secure approach, but returning the tokenHash in response may be unnecessary. Hiding the tokenHash from the final return object could mitigate any theoretical leakage risk.

 return {
   id: workerGroupToken.id,
-  tokenHash: workerGroupToken.tokenHash,
   plaintext: rawToken.plaintext,
 };
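The pattern can be sketched with Node's `crypto` module: persist only the SHA-256 hash, return the plaintext once, and compare hashes in constant time on authentication. The function names and the 32-byte hex token format are assumptions for illustration; the service's actual token format is not shown here.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical sketch: only `tokenHash` is stored; `plaintext` is shown to the caller once.
function hashToken(plaintext: string): string {
  return createHash("sha256").update(plaintext).digest("hex");
}

function generateToken(): { plaintext: string; tokenHash: string } {
  const plaintext = randomBytes(32).toString("hex");
  return { plaintext, tokenHash: hashToken(plaintext) };
}

// The create response carries the id and plaintext, never the stored hash.
function toCreateResponse(id: string, token: { plaintext: string; tokenHash: string }) {
  return { id, plaintext: token.plaintext };
}

// Compare hashes in constant time when a worker presents a token.
function verifyToken(presented: string, storedHash: string): boolean {
  const a = Buffer.from(hashToken(presented), "hex");
  const b = Buffer.from(storedHash, "hex");
  return a.length === b.length && timingSafeEqual(a, b);
}
```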

73-107: Validate workflow when old tokens become invalid.
Currently, rotating a token overwrites the old one, so any worker or in-flight request still using it will fail to authenticate. Verify downstream usage to make sure nothing still depends on the old token when it is rotated.


122-253: Break down the authenticate method for maintainability.
This method handles multiple responsibilities (token validation, instance retrieval, environment logic) and could benefit from extracting sub-logic into helper methods, improving readability and testability.


255-464: Good concurrency handling with Prisma transactions.
The graceful fallback on unique constraint violations is commendable. Consider further splitting logic for MANAGED vs. UNMANAGED flows into separate helper methods or classes to reduce complexity.


506-506: Address the “FIXME” for unmanaged workers.
The comment marks the isLatestDeployment handling for unmanaged workers as a partial implementation.

Would you like to open a new issue or integrate a fix directly for the isLatestDeployment logic?


541-620: Reduce potential duplication in dequeue logic.
The function already branches significantly for managed vs. unmanaged. Consider a unified approach or a small shared helper for retrieving messages while still respecting distinct business logic.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 05bdbf5 and 32b89a2.

📒 Files selected for processing (8)
  • apps/webapp/app/assets/icons/PauseIcon.tsx (1 hunks)
  • apps/webapp/app/components/runs/v3/RunIcon.tsx (2 hunks)
  • apps/webapp/app/routes/engine.v1.runs.$runFriendlyId.wait.duration.ts (1 hunks)
  • apps/webapp/app/routes/engine.v1.runs.$runFriendlyId.waitpoints.tokens.$waitpointFriendlyId.wait.ts (1 hunks)
  • apps/webapp/app/v3/services/triggerTaskV2.server.ts (1 hunks)
  • apps/webapp/app/v3/services/worker/workerGroupTokenService.server.ts (1 hunks)
  • internal-packages/run-engine/src/engine/tests/triggerAndWait.test.ts (1 hunks)
  • internal-packages/run-engine/src/engine/tests/waitpoints.test.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal-packages/run-engine/src/engine/tests/waitpoints.test.ts
🔇 Additional comments (21)
internal-packages/run-engine/src/engine/tests/triggerAndWait.test.ts (4)

13-196: Well-structured test for the basic triggerAndWait scenario

The test thoroughly validates the parent-child relationship with waitpoints by:

  1. Setting up an authenticated environment
  2. Creating and executing parent and child runs
  3. Verifying execution status transitions
  4. Validating waitpoint behavior and completion

Good use of assertions and proper cleanup in the finally block.


198-199: Good documentation of test purpose

The comment clearly explains the specific scenario being tested - when two runs share the same awaited child run, which happens with idempotencyKey reuse.


200-455: Comprehensive test for shared child waitpoint scenario

This test effectively validates that multiple parent runs can be blocked by and subsequently unblocked by the same child run. The test:

  1. Properly sets up two parent runs
  2. Links both to the same child run
  3. Verifies both parents get unblocked upon child completion
  4. Validates the execution status transitions and waitpoint data

Good use of assertions and proper resource cleanup.


1-456: Consider adding tests for error scenarios

The current tests focus on happy path scenarios where everything succeeds. Consider adding tests for error cases such as:

  1. When a child run fails to complete
  2. When a parent run is canceled while waiting
  3. When waitpoints time out (if applicable)

This would ensure the error handling paths are also validated.

apps/webapp/app/assets/icons/PauseIcon.tsx (1)

1-12: Well-structured SVG icon component.

The PauseIcon component is correctly implemented as a React functional component that renders an SVG with appropriate attributes. It accepts an optional className prop for styling flexibility, and uses currentColor to inherit color from its parent element.

apps/webapp/app/components/runs/v3/RunIcon.tsx (4)

10-10: Clean import of TaskCachedIcon.

The import statement is correctly added.


13-13: Clean import of PauseIcon.

The import statement is correctly added.


46-47: Well-implemented new case for "task-cached".

The switch case for "task-cached" follows the same pattern as other cases, using the TaskCachedIcon with blue styling.


53-53: Icon change from ClockIcon to PauseIcon.

The "wait" case now uses PauseIcon instead of ClockIcon, which is a better visual representation for the wait state.

apps/webapp/app/v3/services/triggerTaskV2.server.ts (5)

35-35: Good practice: Class is properly marked as deprecated.

The class is clearly marked as deprecated with a JSDoc comment, directing users to the preferred alternative.


42-44: Check default attempts and over-retries.

There's an attempt parameter that defaults to 0, and the code throws once attempt exceeds MAX_ATTEMPTS. Verify that no race condition can reset the counter, which would turn the retry into an endless loop.


404-407: Good handling of idempotency key race conditions.

The recursive retry approach for handling RunDuplicateIdempotencyKeyError is a good pattern to resolve race conditions that might occur with concurrent requests using the same idempotency key.
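A bounded version of this recursive retry can be sketched as follows. `DuplicateKeyError` stands in for RunDuplicateIdempotencyKeyError, and `MAX_ATTEMPTS = 2` is an illustrative value; both are assumptions. The key property is that `attempt` only ever increments, so the recursion always terminates.

```typescript
// Hypothetical stand-in for RunDuplicateIdempotencyKeyError.
class DuplicateKeyError extends Error {
  constructor(message = "duplicate idempotency key") {
    super(message);
    this.name = "DuplicateKeyError";
    // Keep `instanceof` working even when compiling to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

const MAX_ATTEMPTS = 2; // assumption: illustrative limit

async function withIdempotencyRetry<T>(operation: () => Promise<T>, attempt = 0): Promise<T> {
  if (attempt > MAX_ATTEMPTS) {
    throw new Error(`Failed after ${attempt} attempts`);
  }
  try {
    return await operation();
  } catch (error) {
    if (error instanceof DuplicateKeyError) {
      // Nothing resets `attempt`, so at most MAX_ATTEMPTS + 1 calls are made.
      return withIdempotencyRetry(operation, attempt + 1);
    }
    throw error; // unrelated errors propagate immediately
  }
}
```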


410-435: Good error handling for database constraints.

The code properly detects and handles Prisma unique constraint violations, providing useful error messages for one-time use tokens and idempotency keys.
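Prisma reports unique constraint violations as known request errors with code `P2002`, and `meta.target` typically names the constraint fields. To stay self-contained, this sketch checks the error's shape instead of importing Prisma; the helper names are hypothetical.

```typescript
// Hypothetical type guard for Prisma's unique-constraint error (code "P2002").
function isUniqueConstraintError(
  error: unknown
): error is { code: "P2002"; meta?: { target?: unknown } } {
  return (
    typeof error === "object" &&
    error !== null &&
    "code" in error &&
    (error as { code: unknown }).code === "P2002"
  );
}

function describeUniqueViolation(error: unknown): string | undefined {
  if (!isUniqueConstraintError(error)) return undefined;
  const target = error.meta?.target;
  // meta.target is usually an array of field names; fall back to a string form otherwise.
  const fields = Array.isArray(target) ? target.join(", ") : String(target ?? "unknown");
  return `Unique constraint violated on: ${fields}`;
}
```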


545-564: Well-implemented queue size limit check.

The guardQueueSizeLimitsForEnv function provides a clean way to verify that adding new items won't exceed the environment's queue size limits, helping prevent system overload.
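The check can be sketched as a pure function. This is a hypothetical illustration; the result shape and the "no limit configured" behavior are assumptions, and the PR's guardQueueSizeLimitsForEnv may differ.

```typescript
// Hypothetical sketch of a queue-size guard.
type QueueSizeGuardResult = { isWithinLimits: boolean; maximumSize?: number; queueSize?: number };

function checkQueueSize(
  currentSize: number,
  itemsToAdd: number,
  maximumSize?: number
): QueueSizeGuardResult {
  // No configured limit means there is nothing to enforce.
  if (maximumSize === undefined) return { isWithinLimits: true };
  // Check the size the queue would reach after the new items are added.
  const projected = currentSize + itemsToAdd;
  return { isWithinLimits: projected <= maximumSize, maximumSize, queueSize: currentSize };
}
```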

apps/webapp/app/routes/engine.v1.runs.$runFriendlyId.waitpoints.tokens.$waitpointFriendlyId.wait.ts (3)

20-30: Looks Good Overall

The route setup, parameter validation via z.object, and usage of WaitForWaitpointTokenRequestBody appear consistent and well-structured. Nothing critical to note here.


35-36: Validate negative or excessive timeouts.

You might want to confirm parseDelay properly handles negative values or excessively large timeouts. Returning an error or capping the value could prevent unexpected behavior or resource usage spikes.
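One way to guard the parsed value is to reject negative or non-finite durations and cap large ones. This is a sketch; `normalizeTimeoutMs` and the one-week `MAX_TIMEOUT_MS` cap are hypothetical, not values from the codebase, and rejecting rather than capping `Infinity` is an assumed design choice.

```typescript
// Hypothetical cap for illustration only.
const MAX_TIMEOUT_MS = 7 * 24 * 60 * 60 * 1000; // one week

function normalizeTimeoutMs(timeoutMs: number, maxMs: number = MAX_TIMEOUT_MS): number {
  // Reject NaN, Infinity, and negative durations outright.
  if (!Number.isFinite(timeoutMs) || timeoutMs < 0) {
    throw new RangeError(`Invalid timeout: ${timeoutMs}`);
  }
  // Cap excessively large timeouts instead of failing the request.
  return Math.min(timeoutMs, maxMs);
}
```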


50-57: Check concurrency for multiple blocking calls.

When calling engine.blockRunWithWaitpoint, verify the behavior if multiple clients simultaneously issue waitpoint requests for the same run. Consider concurrency or race conditions to ensure consistent run state.

apps/webapp/app/routes/engine.v1.runs.$runFriendlyId.wait.duration.ts (2)

40-46: Validate date/time boundaries.

When calling engine.createDateTimeWaitpoint, ensure the provided body.date is within valid expected ranges, and handle the possibility of past or far-future dates that could cause unusual scheduling behavior.
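A boundary check on the requested date might look like the following sketch. `checkWaitUntil` is a hypothetical helper, and treating a past date as already elapsed (clamped to now) is an assumption; rejecting it would be an equally valid choice.

```typescript
type WaitDateCheck = { ok: true; date: Date } | { ok: false; reason: string };

// Hypothetical sketch: validate a wait-until date against the current time.
function checkWaitUntil(date: Date, now: Date, maxFutureMs: number): WaitDateCheck {
  const t = date.getTime();
  if (Number.isNaN(t)) return { ok: false, reason: "invalid date" };
  if (t - now.getTime() > maxFutureMs) return { ok: false, reason: "too far in the future" };
  // Assumption: a past date has already elapsed, so clamp it to now.
  return { ok: true, date: t < now.getTime() ? now : date };
}
```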


48-57: Check concurrency release logic.

The releaseQueue: true parameter may have implications for concurrency management if multiple runs are queued. Confirm that this setting aligns with your intended concurrency strategy.

apps/webapp/app/v3/services/worker/workerGroupTokenService.server.ts (2)

1-37: Imports appear consistent and complete.
All necessary dependencies (e.g., nanoid, crypto, zod) are included, and naming remains clear.


809-826: Type definitions look clean.
The union-based approach for managed and unmanaged worker responses is well organized and improves clarity.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
apps/webapp/app/presenters/v3/SpanPresenter.server.ts (1)

437-501: ⚠️ Potential issue

Fix switch statement variable scoping issues.

The switch statement has potential variable scoping issues with declarations that can leak between cases.

Wrap each case in block scope to prevent variable declaration leakage:

 switch (span.entity.type) {
-  case "waitpoint":
+  case "waitpoint": {
+    // Wrap in block to prevent variable declaration leakage
     const waitpoint = await this._replica.waitpoint.findFirst({
       where: {
         friendlyId: span.entity.id,
       },
       // ...
     });

     if (!waitpoint) {
       logger.error(`SpanPresenter: Waitpoint not found`, {
         spanId,
         waitpointFriendlyId: span.entity.id,
       });
       return { ...data, entity: null };
     }

     const output =
       waitpoint.outputType === "application/store"
         ? `/resources/packets/${environmentId}/${waitpoint.output}`
         : typeof waitpoint.output !== "undefined" && waitpoint.output !== null
         ? await prettyPrintPacket(waitpoint.output, waitpoint.outputType ?? undefined)
         : undefined;

     let isTimeout = false;
     if (waitpoint.outputIsError && output) {
       if (isWaitpointOutputTimeout(output)) {
         isTimeout = true;
       }
     }

     return {
       ...data,
       entity: {
         type: "waitpoint" as const,
         object: {
           // ...
         },
       },
     };
+    }

   default:
+    // Validate entity type
+    if (!["waitpoint", "span"].includes(span.entity.type)) {
+      logger.warn(`Unknown entity type: ${span.entity.type}`, { spanId });
+    }
     return { ...data, entity: null };
 }
🧰 Tools
🪛 Biome (1.9.4)

[error] 439-455: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

Unsafe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 465-470: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

Unsafe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 472-472: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

Unsafe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)
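A minimal illustration of what the noSwitchDeclarations rule is asking for: wrapping each case in braces gives it its own block scope, so a `const` declared in one clause cannot leak into (or collide with) another. The `Entity` type here is hypothetical and much simpler than the presenter's span data.

```typescript
// Hypothetical entity type for illustration.
type Entity = { type: "waitpoint"; id: string } | { type: "span"; id: string } | { type: "other" };

function describeEntity(entity: Entity): string {
  switch (entity.type) {
    case "waitpoint": {
      // `label` is scoped to this block; other clauses cannot see it.
      const label = `waitpoint ${entity.id}`;
      return label;
    }
    case "span": {
      // Reusing the name is fine because each clause has its own block.
      const label = `span ${entity.id}`;
      return label;
    }
    default:
      return "unknown";
  }
}
```

Without the braces, the two `const label` declarations would share the switch's single scope and fail to compile.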

🧹 Nitpick comments (14)
CONTRIBUTING.md (3)

233-236: Migration Creation Instruction Clarity and Code Block Language Specifier
The updated instruction "Create a migration" along with the command pnpm run db:migrate:dev:create clearly separates migration creation from its execution. This change improves clarity by prompting users to review the generated migration before applying it. Additionally, it would be beneficial to add a language specifier (e.g., sh) to the fenced code block to enhance readability and conform with markdownlint guidelines.

-   ```
-   pnpm run db:migrate:dev:create
-   ```
+   ```sh
+   pnpm run db:migrate:dev:create
+   ```
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

235-235: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


239-240: Improve Indexing Guidance Wording
The note advising the use of CONCURRENTLY for adding indexes is crucial to prevent unintended table locks. However, the sentence could be rephrased for better clarity and to address the static analysis feedback regarding punctuation. For example:

-   This creates a migration file. Check the migration file does only what you want. If you're adding any database indexes they must use `CONCURRENTLY`, otherwise they'll lock the table when executed.
+   This creates a migration file. Please review it to ensure it matches your requirements. If you're adding any database indexes, be sure to use `CONCURRENTLY` to avoid locking the table during execution.
🧰 Tools
🪛 LanguageTool

[typographical] ~239-~239: The word “otherwise” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...ding any database indexes they must use CONCURRENTLY, otherwise they'll lock the table when executed. ...

(THUS_SENTENCE)


241-247: Separation of Migration Execution Steps and Code Block Enhancements
The new step "Run the migration" with the subsequent commands (pnpm run db:migrate:deploy and pnpm run generate) enhances the instructional clarity by distinctly separating migration creation from its execution. To further improve the document, adding a language specifier to the fenced code block is recommended:

-  ```
-  pnpm run db:migrate:deploy
-  pnpm run generate
-  ```
+  ```sh
+  pnpm run db:migrate:deploy
+  pnpm run generate
+  ```
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

243-243: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.v3.$projectParam.runs.$runParam.spans.$spanParam/route.tsx (1)

1076-1284: Well-structured span entity rendering.

The new SpanEntity component cleanly handles different entity types, particularly for waitpoints, with appropriate status displays and controls.

There's an unnecessary Fragment at lines 1235-1252 that could be removed since it only contains one child element. You can simplify this by removing the fragment:

- {span.entity.object.type === "MANUAL" && (
-   <>
-     <Property.Item>
-       <Property.Label>Timeout at</Property.Label>
-       <Property.Value>
-         <div className="flex w-full flex-wrap items-center justify-between gap-1">
-           {span.entity.object.completedAfter ? (
-             <DateTimeAccurate date={span.entity.object.completedAfter} />
-           ) : (
-             "–"
-           )}
-           {span.entity.object.status === "PENDING" && (
-             <ForceTimeout waitpoint={span.entity.object} />
-           )}
-         </div>
-       </Property.Value>
-     </Property.Item>
-   </>
- )}
+ {span.entity.object.type === "MANUAL" && (
+   <Property.Item>
+     <Property.Label>Timeout at</Property.Label>
+     <Property.Value>
+       <div className="flex w-full flex-wrap items-center justify-between gap-1">
+         {span.entity.object.completedAfter ? (
+           <DateTimeAccurate date={span.entity.object.completedAfter} />
+         ) : (
+           "–"
+         )}
+         {span.entity.object.status === "PENDING" && (
+           <ForceTimeout waitpoint={span.entity.object} />
+         )}
+       </div>
+     </Property.Value>
+   </Property.Item>
+ )}
🧰 Tools
🪛 Biome (1.9.4)

[error] 1235-1252: Avoid using unnecessary Fragment.

A fragment is redundant if it contains only one child, or if it is the child of a html element, and is not a keyed fragment.
Unsafe fix: Remove the Fragment

(lint/complexity/noUselessFragments)

internal-packages/database/prisma/schema.prisma (5)

454-464: New Project Fields for Versioning and Worker Group Relationships
The Project model now includes several new fields:

  • version with a default of V2
  • engine with a default of V1
  • Associations with workerGroups, workers, and the optional defaultWorkerGroup (plus its id field)

These additions extend the model to support Run Engine versioning and attach worker group information. Please verify that the relationships and default values (especially for run engine versioning) align with downstream logic and that any queries or mutations have been updated accordingly.

1714-1719: Enhanced Queue Fields in TaskRun
A new block has been added in TaskRun:

  • masterQueue (defaulting to "main")
  • An optional secondaryMasterQueue
  • attemptNumber (only set once a run is dequeued, for engine v2+)

These additions will support finer control of queue positioning and tracking of attempts in Run Engine 2.0. Please ensure that the computation and downstream processing of these fields are consistent and that any related indexes or query logic are updated for optimal performance.

1770-1775: Waitpoint Relationships in TaskRun
Two new relationships have been introduced on TaskRun:

  • associatedWaitpoint, which will be marked when a run finishes
  • blockedByWaitpoints, which holds any waitpoints preventing the run’s execution

These changes integrate task run execution with waitpoint handling. It is important to review how these relationships are used in job scheduling: ensure that they do not introduce circular dependencies or unwanted cascade effects.

1933-1988: TaskRunExecutionSnapshot Model for Run Engine v2
The TaskRunExecutionSnapshot model has been added to track execution states. Key points include:

  • The engine field now defaults to V2, signifying that snapshots apply to the new engine.
  • It captures execution status, debug description, validity indication, and relationships to TaskRun, batch, environment, and optionally a checkpoint and worker.
This model is central to tracking execution state changes; please ensure that maintenance (such as data retention policies) is addressed elsewhere in your system design.

2037-2099: Enhanced Waitpoint Model for Run Blocking and Idempotency
The updated Waitpoint model now includes several new fields:

  • idempotencyKey with an expiration and a flag for user-provided input
  • An inactiveIdempotencyKey field for when keys are rotated after completion
  • Relations to mark completion by a task run or batch
  • Indexes ensuring uniqueness on [environmentId, idempotencyKey]
These changes support robust waitpoint management. One note: the deactivateIdempotencyKeyWhenCompleted property remains commented out. If it's intended for future use, consider adding a "TODO" with an issue reference.
apps/webapp/app/routes/engine.v1.runs.$runFriendlyId.waitpoints.tokens.$waitpointFriendlyId.wait.ts (1)

19-19: Unused request body parameter.
The body parameter in the callback is not referenced. If there is no plan to consume request data here, consider removing it to keep the code clean and avoid confusion.

apps/webapp/app/presenters/v3/SpanPresenter.server.ts (4)

45-57: Improve error handling in early return path.

The early return approach for non-existent parentRun silently fails without providing any error detail. Since other code paths have explicit error throws, this inconsistency could make debugging more difficult.

Consider replacing the silent return with an appropriate error:

 if (!parentRun) {
-  return;
+  throw new Error("Parent run not found");
 }

217-236: Simplify conditional logic with a cleaner approach.

The current approach creates redundant code between the finishedData variable and the existing finishedAttempt. The engine-based condition creates additional complexity.

Consider consolidating the logic:

 const finishedData =
-  run.engine === "V2"
-    ? run
-    : isFinished
-    ? await this._replica.taskRunAttempt.findFirst({
-        select: {
-          output: true,
-          outputType: true,
-          error: true,
-        },
-        where: {
-          status: { in: FINAL_ATTEMPT_STATUSES },
-          taskRunId: run.id,
-        },
-        orderBy: {
-          createdAt: "desc",
-        },
-      })
-    : null;
+  run.engine === "V2" ? run : finishedAttempt;

457-463: Enhance error handling logic for better traceability.

The current error logging is good, but the return path doesn't adequately capture the error state for the UI layer.

Consider enhancing the error return path to provide more context:

 if (!waitpoint) {
   logger.error(`SpanPresenter: Waitpoint not found`, {
     spanId,
     waitpointFriendlyId: span.entity.id,
   });
-  return { ...data, entity: null };
+  return {
+    ...data,
+    entity: {
+      type: "waitpoint" as const,
+      error: {
+        message: "Waitpoint not found",
+        code: "NOT_FOUND"
+      },
+      object: null
+    }
+  };
 }

472-477: Simplify timeout detection logic.

The current approach uses a variable assignment and conditional check that could be consolidated.

Consider simplifying the timeout detection:

-let isTimeout = false;
-if (waitpoint.outputIsError && output) {
-  if (isWaitpointOutputTimeout(output)) {
-    isTimeout = true;
-  }
-}
+const isTimeout = waitpoint.outputIsError && output ? isWaitpointOutputTimeout(output) : false;
🧰 Tools
🪛 Biome (1.9.4)

[error] 472-472: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Unsafe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)
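A minimal sketch of the fix Biome suggests: wrap each clause's declarations in braces so they are scoped to that clause. The `SpanEntity` type, `looksLikeTimeout` (a stand-in for isWaitpointOutputTimeout), and `describeEntity` are hypothetical and not the actual SpanPresenter code.

```typescript
// Hypothetical span entity shapes; not the real SpanPresenter types.
type SpanEntity =
  | { type: "waitpoint"; outputIsError: boolean; output?: string }
  | { type: "run"; id: string };

// Hypothetical stand-in for isWaitpointOutputTimeout.
function looksLikeTimeout(output: string): boolean {
  return output.includes("TIMEOUT");
}

function describeEntity(entity: SpanEntity): string {
  switch (entity.type) {
    case "waitpoint": {
      // The braces scope `isTimeout` to this clause, which is what
      // lint/correctness/noSwitchDeclarations asks for.
      const isTimeout =
        entity.outputIsError && entity.output ? looksLikeTimeout(entity.output) : false;
      return isTimeout ? "waitpoint timed out" : "waitpoint completed";
    }
    case "run": {
      const label = `run ${entity.id}`;
      return label;
    }
    default: {
      // Exhaustiveness check: compiles only if every entity type is handled.
      const unhandled: never = entity;
      return unhandled;
    }
  }
}
```

The same brace wrapping applies to the other flagged clauses (439-455 and 465-470).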

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 32b89a2 and c6f6eac.

📒 Files selected for processing (13)
  • CONTRIBUTING.md (1 hunks)
  • apps/coordinator/src/checkpointer.ts (2 hunks)
  • apps/docker-provider/src/index.ts (1 hunks)
  • apps/webapp/app/assets/icons/AnimatedHourglassIcon.tsx (1 hunks)
  • apps/webapp/app/presenters/v3/SpanPresenter.server.ts (13 hunks)
  • apps/webapp/app/routes/api.v1.waitpoints.tokens.$waitpointFriendlyId.complete.ts (1 hunks)
  • apps/webapp/app/routes/engine.v1.runs.$runFriendlyId.waitpoints.tokens.$waitpointFriendlyId.wait.ts (1 hunks)
  • apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.$projectParam.waitpoints.$waitpointFriendlyId.complete/route.tsx (1 hunks)
  • apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.v3.$projectParam.runs.$runParam.spans.$spanParam/route.tsx (11 hunks)
  • internal-packages/database/package.json (1 hunks)
  • internal-packages/database/prisma/schema.prisma (19 hunks)
  • internal-packages/redis-worker/src/queue.ts (1 hunks)
  • internal-packages/run-engine/src/engine/tests/waitpoints.test.ts (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • apps/docker-provider/src/index.ts
🚧 Files skipped from review as they are similar to previous changes (4)
  • internal-packages/database/package.json
  • apps/webapp/app/assets/icons/AnimatedHourglassIcon.tsx
  • internal-packages/run-engine/src/engine/tests/waitpoints.test.ts
  • apps/coordinator/src/checkpointer.ts
🧰 Additional context used
🪛 LanguageTool
CONTRIBUTING.md

[typographical] ~239-~239: The word “otherwise” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...ding any database indexes they must use CONCURRENTLY, otherwise they'll lock the table when executed. ...

(THUS_SENTENCE)

🪛 markdownlint-cli2 (0.17.2)
CONTRIBUTING.md

235-235: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


243-243: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Biome (1.9.4)
apps/webapp/app/presenters/v3/SpanPresenter.server.ts

[error] 439-455: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Unsafe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 465-470: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Unsafe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 472-472: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Unsafe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)

apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.v3.$projectParam.runs.$runParam.spans.$spanParam/route.tsx

[error] 1235-1252: Avoid using unnecessary Fragment.

A fragment is redundant if it contains only one child, or if it is the child of a html element, and is not a keyed fragment.
Unsafe fix: Remove the Fragment

(lint/complexity/noUselessFragments)

🔇 Additional comments (29)
internal-packages/redis-worker/src/queue.ts (1)

60-60: Increased Redis retry resilience for better reliability

Changing maxRetriesPerRequest from 3 to 20 significantly improves the queue's resilience to transient Redis connection issues. This is valuable for the Run Engine 2.0 alpha, especially in distributed environments where network hiccups are common.

While this change improves reliability, it's worth verifying how it might affect failure detection and timeout behavior. Please confirm that increased retries won't mask persistent Redis issues or significantly delay final failure responses. You might want to consider adding metrics/monitoring for Redis connection quality to identify persistent issues that are being masked by retries.
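To put a rough number on the delayed-failure concern, here is a back-of-the-envelope sketch. It assumes the client is ioredis using its documented default `retryStrategy` (`Math.min(times * 50, 2000)` ms) and approximates the worst-case wait before a pending command is rejected as the sum of that many reconnect delays. Real timings also depend on connect timeouts and on whether the queue configures its own `retryStrategy`; the function names are illustrative.

```typescript
// Assumed to mirror ioredis's default retryStrategy: linear backoff capped at 2s.
function defaultRetryDelayMs(times: number): number {
  return Math.min(times * 50, 2000);
}

// Rough worst-case wait: the sum of reconnect delays before a command
// exhausts maxRetriesPerRequest. Ignores connect timeouts and command latency.
function approxWorstCaseWaitMs(maxRetriesPerRequest: number): number {
  let total = 0;
  for (let attempt = 1; attempt <= maxRetriesPerRequest; attempt++) {
    total += defaultRetryDelayMs(attempt);
  }
  return total;
}

console.log(approxWorstCaseWaitMs(3));  // 300
console.log(approxWorstCaseWaitMs(20)); // 10500
```

Under these assumptions, raising the limit from 3 to 20 stretches the worst case from about 0.3 s to about 10.5 s per command, which is the window in which a persistent outage would go unreported.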

apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.v3.$projectParam.runs.$runParam.spans.$spanParam/route.tsx (10)

51-51: Access control implementation looks good.

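Using the useHasAdminAccess hook for permission-based rendering is a good way to ensure admin-only features are properly gated in the UI. Note that it only controls visibility, so the underlying data still needs server-side authorization.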


196-196: Code looks good.

Proper use of SpanTitle with appropriate props.


232-292: Improved structure with conditional rendering.

The refactoring of SpanBody to conditionally render either detailed properties or the new SpanEntity component improves code organization and separation of concerns.


311-311: Proper authorization check.

Good implementation of the admin access check that will be used for conditional rendering of admin-only information.


322-330: Enhanced UI for cached runs.

The improved RunIcon usage with conditional naming for cached runs provides better visual feedback to users. This makes the UI more informative by clearly distinguishing cached runs.


500-515: Good addition of idempotency information.

Adding idempotency key and expiration time improves observability, allowing users to track and debug idempotency-related issues more effectively.


548-551: Enhanced run information with engine version.

Adding the engine version display is useful for debugging and understanding which engine processed the run.


552-563: Admin-only information properly gated.

Good implementation of conditional rendering for admin-only properties (masterQueue and secondaryMasterQueue) based on user permissions.


701-709: Improved error and output handling.

The code now handles errors and outputs more cleanly with proper conditional rendering, ensuring outputs are only shown when there's no error and output data exists.


728-728: Better UX with context-aware button text.

Changing the button text to "Jump to original run" for cached runs improves clarity and user understanding.

internal-packages/database/prisma/schema.prisma (10)

151-152: Positive Naming for Worker Associations
The newly added fields workerGroups and workerInstances on the Organization model now use plural names, which addresses the previous comment (“Ideally this would be plural”). This naming clearly indicates that these fields hold arrays of associated entities.


1578-1579: BackgroundWorker Engine Version
The addition of the engine field (defaulting to V1) in the BackgroundWorker model explicitly captures the run engine version for background workers. This clear versioning will help differentiate behavior as features evolve.


1603-1605: Optional Worker Group Association in BackgroundWorker
The new optional relation fields workerGroup and workerGroupId in BackgroundWorker allow workers to be associated with a worker group. Be sure that the onDelete: SetNull behavior is correct for your use case and that any related business logic is updated to handle a missing worker group gracefully.


1741-1742: First Attempt Start Timestamp for Run Engine 2.0+
The new nullable field firstAttemptStartedAt in TaskRun provides an explicit marker for when the first attempt begins under Run Engine 2.0. This will be valuable for performance tracking and debugging. Consider whether indexing this field is required if you plan to filter or sort on it.


1755-1758: Priority Offset Field in TaskRun
The addition of the priorityMs field introduces a mechanism for adjusting the run’s queue positioning via a negative time offset (in milliseconds). This design can offer flexible scheduling; however, please confirm that all components that compute or utilize the effective queue timestamp correctly accommodate both positive and negative values.
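A minimal sketch of the ordering rule, assuming the effective queue timestamp is the run's enqueue time minus priorityMs (so a positive offset moves a run earlier and a negative one pushes it later). Only priorityMs comes from the schema; `QueuedRun`, `createdAtMs`, and the subtraction itself are assumptions about how the engine applies the offset.

```typescript
// Hypothetical queued-run shape; only priorityMs comes from the schema.
interface QueuedRun {
  id: string;
  createdAtMs: number; // enqueue time in epoch milliseconds
  priorityMs: number;  // offset in milliseconds; may be zero or negative
}

// Assumed rule: effective score = enqueue time minus the priority offset.
function effectiveScore(run: QueuedRun): number {
  return run.createdAtMs - run.priorityMs;
}

// Orders runs by effective score, lowest (dequeued first) to highest.
function orderForDequeue(runs: QueuedRun[]): string[] {
  return [...runs]
    .sort((a, b) => effectiveScore(a) - effectiveScore(b))
    .map((r) => r.id);
}

const order = orderForDequeue([
  { id: "a", createdAtMs: 1_000, priorityMs: 0 },
  { id: "b", createdAtMs: 2_000, priorityMs: 5_000 },  // promoted
  { id: "c", createdAtMs: 1_500, priorityMs: -1_000 }, // demoted
]);
console.log(order.join(", ")); // b, a, c
```

Run "c" shows the negative-value case the comment asks about: under this rule it sorts after "a" despite being enqueued earlier than "b".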


1868-1872: RunEngineVersion Enum Introduction
A new enum RunEngineVersion now distinguishes between legacy and new engine versions (V1 and V2). The inline comment (“The original version that uses marqs v1 and Graphile”) is helpful. Consider documenting plans for future versions if applicable.


2010-2030: TaskRunCheckpoint Model Consistency
The TaskRunCheckpoint model now encapsulates details around checkpoints, including the friendly identifier, type, location, image reference, associated project, and runtime environment. It also relates to multiple execution snapshots. The structure is clear and appears consistent with other models.


2115-2145: TaskRunWaitpoint Model for Associating Runs with Waitpoints
The new TaskRunWaitpoint model bridges TaskRun and Waitpoint, adding support for batch-related fields (such as batchId, batchIndex) and a span identifier for cache completion. The unique constraint on [taskRunId, waitpointId, batchIndex] is critical to ensure each association is unique even in a batched context. Note that PostgreSQL treats NULLs as distinct in unique constraints by default, so if batchIndex is nullable, rows with the same taskRunId and waitpointId and a NULL batchIndex will not be rejected as duplicates. Double-check that this is the intended behavior.


2720-2724: WorkerDeploymentType Enum Extension
The WorkerDeploymentType enum now includes three values: MANAGED, UNMANAGED, and V1. This explicit inclusion of V1 clarifies which deployment type follows legacy behavior. Confirm that all logic conditioned on the deployment type is updated to handle the V1 value properly.


2739-2740: Explicit Deployment Versioning in WorkerDeployment
A new field type of type WorkerDeploymentType has been added to the WorkerDeployment model, with a default value of V1. This change ensures that deployments are explicitly versioned and can be differentiated in the system. Review any deployment workflows to guarantee they respect this new field.

apps/webapp/app/routes/engine.v1.runs.$runFriendlyId.waitpoints.tokens.$waitpointFriendlyId.wait.ts (2)

16-16: Question the 10KB content length limit.
While 10KB may be sufficient in most scenarios, consider verifying with stakeholders whether 10KB adequately covers all potential request data for this endpoint. If larger payloads are expected, increasing this limit or making it configurable might help avoid unexpected 413 errors.
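If the limit should be configurable, one option is to pass it in from configuration and reject oversized requests up front with a 413. This sketch uses the standard Fetch API `Headers`/`Response` globals (Node 18+); `rejectIfTooLarge` is a hypothetical helper, and it only inspects the declared Content-Length, so a body sent without that header would still need a streaming check.

```typescript
// Returns a 413 response when the declared body size exceeds maxBytes,
// or null when the request may proceed.
function rejectIfTooLarge(headers: Headers, maxBytes: number): Response | null {
  const declared = headers.get("content-length");
  if (declared === null) return null; // no declared size; enforce while streaming instead
  const size = Number(declared);
  if (!Number.isFinite(size) || size > maxBytes) {
    return new Response(JSON.stringify({ error: "Request body too large" }), {
      status: 413,
      headers: { "Content-Type": "application/json" },
    });
  }
  return null;
}
```

In a route, the limit (10 * 1024 today) would come from config, and a non-null result would be returned before parsing the body.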


37-39: Consider concurrency scenarios for blocking runs.
If multiple clients or processes can request to block the same run with the same waitpoint simultaneously, ensure that engine.blockRunWithWaitpoint handles concurrency gracefully (e.g., verifying the run’s current state and re-checking waitpoint availability).

apps/webapp/app/routes/api.v1.waitpoints.tokens.$waitpointFriendlyId.complete.ts (2)

22-22: Verify environment variable usage.
env.TASK_PAYLOAD_MAXIMUM_SIZE governs the maxContentLength for this route. Confirm that this environment variable is set appropriately across all deployments to avoid unexpected 413 errors or overly permissive sizes.


62-64: Enhance error handling specificity.
Consider differentiating between validation errors, database conflicts, or other potential error types, rather than catching everything in a single generic 500 handler.

apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.$projectParam.waitpoints.$waitpointFriendlyId.complete/route.tsx (2)

285-303: Complete the dependency array for submitForm callback.
You reference both submit and formAction inside submitForm, but only [currentJson] is in the dependency array. Including all referenced variables (e.g., submit and formAction) in the dependency array would prevent stale references if those ever changed.


346-348: Fix invalid HTML attributes.
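JSONEditor is a React component, so min-height and max-height are passed to it as props with those exact kebab-case names and only take effect if the component reads them. If it expects camelCase minHeight and maxHeight, rename them so they are not silently ignored.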

apps/webapp/app/presenters/v3/SpanPresenter.server.ts (2)

82-84: Good job updating method signatures to match implementation needs.

The updated method signatures for getRun and getSpan now properly reflect the parameters they need. This improves the API's clarity and maintainability.

Also applies to: 405-407


472-477:

✅ Verification successful

Add test coverage for the timeout detection functionality.

The timeout detection logic is important but doesn't appear to have associated test coverage.


🏁 Script executed:

#!/bin/bash
# Check for tests covering the waitpoint timeout detection

# First, let's find files that might contain tests for this functionality
fd "test|spec" --type f | grep -i "waitpoint\|timeout" | xargs grep -l "isWaitpointOutputTimeout" || echo "No direct test references found"

# Let's also check if the isWaitpointOutputTimeout function itself is tested
fd "test|spec" --type f | xargs grep -l "isWaitpointOutputTimeout" || echo "No tests for isWaitpointOutputTimeout found"

Length of output: 294


Test coverage for timeout detection already exists.
The timeout detection behavior—via the isWaitpointOutputTimeout function—is exercised in the test file at internal-packages/run-engine/src/engine/tests/waitpoints.test.ts. This confirms that the functionality is appropriately covered. If any integration-specific cases within SpanPresenter.server.ts remain untested, please consider adding additional tests; otherwise, no further action is necessary.


Comment on lines +2532 to +2548
/// Friendly IDs
runIds String[] @default([])
runCount Int @default(0)
payload String?
payloadType String @default("application/json")
options Json?
batchVersion String @default("v1")

//engine v2
/// Snapshots that reference this batch
executionSnapshots TaskRunExecutionSnapshot[]
/// Specific run blockers,
runsBlocked TaskRunWaitpoint[]
/// Waitpoints that are blocked by this batch.
/// When a Batch is created it blocks execution of the associated parent run (for andWait)
waitpoints Waitpoint[]


🛠️ Refactor suggestion

BatchTaskRun Model Updates for Engine v2 Batching
The BatchTaskRun model now has several new columns:

  • Arrays for runIds and a counter runCount for friendly ID purposes
  • Payload-related fields (payload, payloadType, options, batchVersion)
  • Engine v2–specific fields: executionSnapshots, runsBlocked, and waitpoints
These additions aim to better manage batch processing in the new engine mode. Please review the integration points where batch runs are created and processed to ensure these fields are appropriately updated and queried.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.$projectParam.waitpoints.$waitpointFriendlyId.complete/route.tsx (2)

314-314: 🛠️ Refactor suggestion

Complete the dependency array for submitForm.

The submitForm callback is missing dependencies in its dependency array.

   const submitForm = useCallback(
     (e: React.FormEvent<HTMLFormElement>) => {
       // ... function body ...
     },
-    [currentJson]
+    [currentJson, submit, formAction]
   );

358-359: 🛠️ Refactor suggestion

Fix invalid HTML attributes.

The JSONEditor component has invalid HTML attributes.

     height="100%"
-    min-height="100%"
-    max-height="100%"
+    minHeight="100%"
+    maxHeight="100%"
   />
🧹 Nitpick comments (2)
apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.$projectParam.waitpoints.$waitpointFriendlyId.complete/route.tsx (2)

340-341: Fix typo in the tooltip content.

There's a small typo in the tooltip text.

-              "This is will immediately complete this waitpoint with the payload you specify. This is useful during development for testing."
+              "This will immediately complete this waitpoint with the payload you specify. This is useful during development for testing."

345-347: Remove duplicate scrollable container.

There are two nested scrollable containers that both apply overflow-y-auto with the same background and scrollbar classes, which is unnecessary and could cause scrolling issues.

-        <div className="overflow-y-auto bg-charcoal-900 scrollbar-thin scrollbar-track-transparent scrollbar-thumb-charcoal-600">
           <div className="max-h-[70vh] min-h-40 overflow-y-auto bg-charcoal-900 scrollbar-thin scrollbar-track-transparent scrollbar-thumb-charcoal-600">
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between c6f6eac and e8b7bee.

📒 Files selected for processing (1)
  • apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.$projectParam.waitpoints.$waitpointFriendlyId.complete/route.tsx (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Analyze (javascript-typescript)

<div className="flex items-center gap-1">
<AnimatedHourglassIcon
className="text-dimmed-dimmed size-4"
delay={(waitpoint.completedAfter.getMilliseconds() - Date.now()) / 1000}

⚠️ Potential issue

Fix delay calculation in AnimatedHourglassIcon.

The delay calculation for the AnimatedHourglassIcon appears incorrect. It uses milliseconds from completedAfter instead of calculating the difference correctly.

-              delay={(waitpoint.completedAfter.getMilliseconds() - Date.now()) / 1000}
+              delay={(waitpoint.completedAfter.getTime() - Date.now()) / 1000}
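A small sketch illustrating the difference between the two methods; `secondsUntil` is a hypothetical helper that takes the current time as a parameter so the result is deterministic.

```typescript
// Seconds from `nowMs` until `target`; negative if `target` is in the past.
function secondsUntil(target: Date, nowMs: number = Date.now()): number {
  return (target.getTime() - nowMs) / 1000;
}

const now = Date.UTC(2024, 0, 1, 12, 0, 0);
const completedAfter = new Date(now + 90_500);

// getMilliseconds() only sees the sub-second part of the date.
console.log(completedAfter.getMilliseconds()); // 500
console.log(secondsUntil(completedAfter, now)); // 90.5
```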

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

🧹 Nitpick comments (18)
apps/webapp/app/routes/engine.v1.dev.runs.$runFriendlyId.logs.debug.ts (4)

37-39: Improve error response details.

The error response doesn't provide enough information about what went wrong. Consider adding more specific error details to help API consumers understand the issue.

  if (!run) {
-    throw new Response("You don't have permissions for this run", { status: 401 });
+    return new Response(
+      JSON.stringify({ error: "Run not found or you don't have access to it" }),
+      { status: 401, headers: { "Content-Type": "application/json" } }
+    );
  }

56-63: Add response headers and body for error cases.

The response objects for error cases lack Content-Type headers and error details in the body, which could make debugging harder for API consumers.

  switch (eventResult.code) {
    case "FAILED_TO_RECORD_EVENT":
-     return new Response(null, { status: 400 }); // send a 400 to prevent retries
+     return new Response(
+       JSON.stringify({ error: "Failed to record event" }),
+       { status: 400, headers: { "Content-Type": "application/json" } }
+     ); // send a 400 to prevent retries
    case "RUN_NOT_FOUND":
-     return new Response(null, { status: 404 });
+     return new Response(
+       JSON.stringify({ error: "Run not found" }),
+       { status: 404, headers: { "Content-Type": "application/json" } }
+     );
    default:
      return assertExhaustive(eventResult.code);
  }
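Both suggestions above repeat the same JSON.stringify-plus-header pattern, so a small helper could keep error bodies consistent. `jsonError` is a hypothetical name; Remix's `json()` helper, already used in engine.v1.dev.config.ts, would be an equally reasonable choice.

```typescript
// Builds a JSON error response with an explicit Content-Type header.
function jsonError(message: string, status: number): Response {
  return new Response(JSON.stringify({ error: message }), {
    status,
    headers: { "Content-Type": "application/json" },
  });
}
```

For example, the not-found branch becomes `case "RUN_NOT_FOUND": return jsonError("Run not found", 404);`.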

64-70: Enhance error handling with structured logging.

The current error logging might not properly serialize complex error objects. Consider using a more structured approach.

  try {
    // ...existing code
  } catch (error) {
    logger.error("Failed to record dev log", {
      environmentId: authentication.environment.id,
-     error,
+     error: error instanceof Error ? 
+       { message: error.message, stack: error.stack } : 
+       String(error),
    });
    throw error;
  }
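The explicit mapping matters if the logger serializes metadata with JSON.stringify: an Error's message and stack are non-enumerable own properties, so `JSON.stringify(new Error("x"))` yields `"{}"`. A reusable sketch (the `serializeError` name is hypothetical), which also keeps the error's name:

```typescript
// Shape logged for errors; non-Error values are stringified.
type SerializedError = { name: string; message: string; stack?: string } | string;

// Converts an unknown thrown value into something a JSON logger can serialize.
function serializeError(error: unknown): SerializedError {
  if (error instanceof Error) {
    return { name: error.name, message: error.message, stack: error.stack };
  }
  return String(error);
}
```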

53-54: Add Content-Type header to success response.

The success response lacks a Content-Type header. Since a 204 response has no body, the header is optional here; adding it mainly keeps headers uniform across the route's responses.

  if (eventResult.success) {
-   return new Response(null, { status: 204 });
+   return new Response(null, { 
+     status: 204,
+     headers: { "Content-Type": "application/json" }
+   });
  }
apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.latest.ts (1)

1-28: Consider enhancing the error handling with detailed logs

The implementation for retrieving the latest snapshot is well-structured, but the error handling could be improved. When the snapshot retrieval fails, the code throws a generic error without logging details that would be helpful for debugging.

Consider adding error logging before throwing the error to make troubleshooting easier:

  if (!executionData) {
+     logger.error("Failed to retrieve latest snapshot", { runFriendlyId });
      throw new Error("Failed to retrieve latest snapshot");
  }

Also, consider adding a debug log at the beginning of the function to track when this API is being called, similar to other routes in this PR.

apps/webapp/app/routes/engine.v1.dev.config.ts (1)

18-23: Add validation for environment variables

The code retrieves configuration values from environment variables but doesn't validate that they exist or have valid values.

Consider adding validation to ensure the environment variables are defined and have appropriate values:

  try {
+     if (typeof env.DEV_DEQUEUE_INTERVAL_WITH_RUN !== 'number' || 
+         typeof env.DEV_DEQUEUE_INTERVAL_WITHOUT_RUN !== 'number') {
+       logger.warn("Missing or invalid dequeue interval environment variables", {
+         withRun: env.DEV_DEQUEUE_INTERVAL_WITH_RUN,
+         withoutRun: env.DEV_DEQUEUE_INTERVAL_WITHOUT_RUN,
+       });
+     }
      return json({
        environmentId: authentication.environment.id,
        dequeueIntervalWithRun: env.DEV_DEQUEUE_INTERVAL_WITH_RUN,
        dequeueIntervalWithoutRun: env.DEV_DEQUEUE_INTERVAL_WITHOUT_RUN,
      });
apps/webapp/app/routes/engine.v1.dev.runs.$runFriendlyId.snapshots.latest.ts (2)

34-36: Improve the unauthorized error message.

The error message could be more specific about why permissions are insufficient. Consider providing more context about what permissions are required.

-        throw new Response("You don't have permissions for this run", { status: 401 });
+        throw new Response("You don't have permissions to access this run in the current environment", { status: 401 });

47-53: Enhance error handling for better debugging.

The current error handling catches, logs, and re-throws the error without adding context. Consider enriching the error with additional information before re-throwing.

    } catch (error) {
      logger.error("Failed to get latest snapshot", {
        environmentId: authentication.environment.id,
        error,
      });
-      throw error;
+      throw error instanceof Error 
+        ? new Error(`Failed to get latest snapshot: ${error.message}`) 
+        : new Error("Failed to get latest snapshot");
    }
apps/webapp/app/routes/engine.v1.worker-actions.deployments.$deploymentFriendlyId.dequeue.ts (1)

39-47: Improve readability with if-else instead of ternary.

The ternary operator makes this code section harder to read. Consider using an if-else statement for better readability.

-    const dequeuedMessages = (await isCurrentDeployment(deployment.id, deployment.environmentId))
-      ? await authenticatedWorker.dequeueFromEnvironment(
-          deployment.worker.id,
-          deployment.environmentId
-        )
-      : await authenticatedWorker.dequeueFromVersion(
-          deployment.worker.id,
-          searchParams.maxRunCount
-        );
+    let dequeuedMessages;
+    if (await isCurrentDeployment(deployment.id, deployment.environmentId)) {
+      dequeuedMessages = await authenticatedWorker.dequeueFromEnvironment(
+        deployment.worker.id,
+        deployment.environmentId
+      );
+    } else {
+      dequeuedMessages = await authenticatedWorker.dequeueFromVersion(
+        deployment.worker.id,
+        searchParams.maxRunCount
+      );
+    }
apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.logs.debug.ts (1)

15-37: Consider logging success scenarios.
While successful recording returns a 204, adding a debug/info log upon success may help trace the lifecycle of debug logs and facilitate future troubleshooting.

apps/webapp/app/routes/engine.v1.dev.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.heartbeat.ts (1)

24-56: Consider using 403 or 404 response code.
Currently returning 401 if the run is not found or not allowed. If authentication is valid but permissions are insufficient, using 403 typically better reflects the scenario. Alternatively, using 404 can help obscure resource existence.

apps/webapp/app/routes/engine.v1.dev.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.attempts.complete.ts (1)

27-61: Refine unauthorized response code.
Like in the heartbeat route, consider returning 403 or 404 instead of 401 if the user lacks permissions or the resource is not found, to differentiate between authentication vs. authorization issues.
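One way to keep these routes consistent is to centralize the mapping from lookup outcome to status code. This sketch follows the conventions above (401 for missing or invalid credentials, 403 when an authenticated caller lacks access, 404 when the run does not exist in the caller's environment); `AccessOutcome` and `STATUS_BY_OUTCOME` are hypothetical names.

```typescript
// Hypothetical outcomes of authenticating a request and looking up a run.
type AccessOutcome = "unauthenticated" | "forbidden" | "not_found";

// Record<...> forces every outcome to have a status code.
const STATUS_BY_OUTCOME: Record<AccessOutcome, number> = {
  unauthenticated: 401, // credentials missing or invalid
  forbidden: 403,       // authenticated, but not allowed to access this run
  not_found: 404,       // no such run in the caller's environment (also hides existence)
};
```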

apps/webapp/app/routes/engine.v1.dev.presence.ts (2)

43-45: Address the in-code TODO comment
The code references a plan to set the string and its expiry in a single call (for example, Redis SET with the EX option). Doing so would reduce complexity and avoid a race in which the key is written but the expiry is never applied. Consider addressing this TODO or removing the comment if it's no longer necessary.


66-78: Ensure proper cleanup for unexpected terminations
The cleanup handler is triggered during normal disconnections, but might not fire if the process abruptly terminates. Evaluate whether additional safeguards or keepalive checks are needed to handle abrupt termination scenarios.

apps/webapp/app/routes/engine.v1.dev.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.attempts.start.ts (1)

38-48: Secure error handling when run is not found
A 401 signals missing or invalid credentials. If the caller is authenticated but the run is absent or belongs to another environment, consider returning a more descriptive message or a distinct status code (e.g., 404 for not found, if that aligns with the domain logic). This helps clarify the reason for failures.

apps/webapp/app/routes/engine.v1.dev.dequeue.ts (3)

16-16: Be cautious with a single fixed max dequeue count
Relying on one global value, env.DEV_DEQUEUE_MAX_RUNS_PER_PULL, could limit throughput unexpectedly for certain workloads. Document the rationale for the chosen default, and consider whether a finer-grained setting is needed.


30-62: Add error handling for background worker dequeue operations
If any error happens while calling engine.dequeueFromBackgroundWorkerMasterQueue, the loop will break silently. Consider catching and logging errors more explicitly to ensure issues are visible.
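A minimal sketch of per-worker error isolation: each dequeue call is wrapped so a failing worker is logged and skipped instead of aborting the whole pull. `dequeueFrom` stands in for engine.dequeueFromBackgroundWorkerMasterQueue and `logError` for the app logger; both signatures, and `dequeueAcrossWorkers` itself, are assumptions.

```typescript
// Hypothetical signatures for the dequeue call and the logger.
type DequeueFn<T> = (workerId: string, maxCount: number) => Promise<T[]>;
type LogFn = (message: string, meta: Record<string, unknown>) => void;

// Pulls from each worker in turn until maxRuns is reached. Asking each worker
// for only the remaining count avoids dequeuing messages that would be dropped.
async function dequeueAcrossWorkers<T>(
  workerIds: string[],
  maxRuns: number,
  dequeueFrom: DequeueFn<T>,
  logError: LogFn
): Promise<T[]> {
  const results: T[] = [];
  for (const workerId of workerIds) {
    const remaining = maxRuns - results.length;
    if (remaining <= 0) break;
    try {
      results.push(...(await dequeueFrom(workerId, remaining)));
    } catch (error) {
      // Log and continue with the next worker instead of failing the whole pull.
      logError("Failed to dequeue from background worker", {
        workerId,
        error: error instanceof Error ? error.message : String(error),
      });
    }
  }
  return results;
}
```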


64-74: Consider factoring out resource checks to a dedicated module
The resource-based queuing logic is repeated for background workers and environment master queue. Refactoring it into a reusable method can improve maintainability and reduce duplication.
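A sketch of what a shared resource check might look like. The actual resource fields used by the dequeue route are not shown here, so the `Resources` shape (cpu and memory) and `takeRunsThatFit` are hypothetical. Candidates are taken in queue order until the next one no longer fits the remaining budget.

```typescript
// Hypothetical resource shape for a run or a machine budget.
interface Resources {
  cpu: number;      // vCPUs
  memoryGb: number; // gigabytes
}

// True when `needed` fits inside `available`.
function fits(needed: Resources, available: Resources): boolean {
  return needed.cpu <= available.cpu && needed.memoryGb <= available.memoryGb;
}

// Takes candidates in order while they fit, subtracting each from the budget.
// Stops at the first candidate that does not fit, to preserve queue order.
function takeRunsThatFit<T extends { resources: Resources }>(
  candidates: T[],
  budget: Resources
): { taken: T[]; remaining: Resources } {
  const taken: T[] = [];
  let remaining: Resources = { ...budget };
  for (const candidate of candidates) {
    if (!fits(candidate.resources, remaining)) break;
    taken.push(candidate);
    remaining = {
      cpu: remaining.cpu - candidate.resources.cpu,
      memoryGb: remaining.memoryGb - candidate.resources.memoryGb,
    };
  }
  return { taken, remaining };
}
```

Both the background-worker path and the environment master-queue path could then call the same helper with their own budgets.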

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between e8b7bee and c63915f.

📒 Files selected for processing (20)
  • apps/webapp/app/routes/engine.v1.dev.config.ts (1 hunks)
  • apps/webapp/app/routes/engine.v1.dev.dequeue.ts (1 hunks)
  • apps/webapp/app/routes/engine.v1.dev.presence.ts (1 hunks)
  • apps/webapp/app/routes/engine.v1.dev.runs.$runFriendlyId.logs.debug.ts (1 hunks)
  • apps/webapp/app/routes/engine.v1.dev.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.attempts.complete.ts (1 hunks)
  • apps/webapp/app/routes/engine.v1.dev.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.attempts.start.ts (1 hunks)
  • apps/webapp/app/routes/engine.v1.dev.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.heartbeat.ts (1 hunks)
  • apps/webapp/app/routes/engine.v1.dev.runs.$runFriendlyId.snapshots.latest.ts (1 hunks)
  • apps/webapp/app/routes/engine.v1.worker-actions.connect.ts (1 hunks)
  • apps/webapp/app/routes/engine.v1.worker-actions.deployments.$deploymentFriendlyId.dequeue.ts (1 hunks)
  • apps/webapp/app/routes/engine.v1.worker-actions.dequeue.ts (1 hunks)
  • apps/webapp/app/routes/engine.v1.worker-actions.heartbeat.ts (1 hunks)
  • apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.logs.debug.ts (1 hunks)
  • apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.attempts.complete.ts (1 hunks)
  • apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.attempts.start.ts (1 hunks)
  • apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.continue.ts (1 hunks)
  • apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.heartbeat.ts (1 hunks)
  • apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.restore.ts (1 hunks)
  • apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.suspend.ts (1 hunks)
  • apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.latest.ts (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (23)
apps/webapp/app/routes/engine.v1.dev.runs.$runFriendlyId.logs.debug.ts (2)

29-35: LGTM! Good database query security.

The database query correctly filters by both friendlyId and runtimeEnvironmentId, ensuring that users can only access runs from their own environment.


41-50: LGTM! Good use of structured logging parameters.

The function passes the necessary parameters to recordRunDebugLog in a structured manner, including properly formatting the run ID using RunId.fromFriendlyId().

apps/webapp/app/routes/engine.v1.worker-actions.heartbeat.ts (1)

1-13: Clean implementation of worker heartbeat API route.

This implementation follows a good pattern for API route handlers with proper type safety using TypeScript. The route has a single responsibility and handles the worker heartbeat action effectively.

apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.heartbeat.ts (1)

1-26: Well-structured run snapshot heartbeat API route with good parameter validation.

This implementation effectively uses Zod for route parameter validation and follows the established pattern for API routes. The destructuring of parameters and passing to the heartbeat method is clean and maintainable.

apps/webapp/app/routes/engine.v1.worker-actions.dequeue.ts (1)

1-16: Clean implementation of worker dequeue API route with appropriate response handling.

This endpoint correctly follows the established pattern for API routes while properly handling the response from the dequeue operation. The direct return of the dequeued result as JSON is appropriate for this use case.

apps/webapp/app/routes/engine.v1.worker-actions.connect.ts (1)

1-19: Well-implemented worker connect API route with informative response.

This endpoint follows the established pattern while providing a more detailed response that includes the worker group information. The structured response with both status and worker metadata is helpful for clients consuming this API.

apps/webapp/app/routes/engine.v1.dev.config.ts (1)

10-10: Clarify the purpose of the findResource function

The findResource function simply returns 1 without any apparent purpose. This seems like a placeholder or a required parameter that doesn't serve a meaningful function in this context.

Is this a required parameter for the createLoaderApiRoute function? If so, consider adding a comment explaining why it returns 1 or refactor it to serve a meaningful purpose.

apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.restore.ts (1)

2-5: Address schema naming mismatch

There's a mismatch between the route name ("restore") and the schemas being used ("WorkerApiSuspendRun*"). This creates confusion since suspension and restoration are opposite operations.

Confirm that these are the correct schemas to use for a restoration operation. If not, update to use the appropriate schemas for restoration.

apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.attempts.start.ts (1)

1-32: Well-structured API route implementation!

The implementation correctly handles starting a run attempt with proper validation and error handling. The code follows good practices with type safety, input validation using Zod, and a focused handler function.

apps/webapp/app/routes/engine.v1.dev.runs.$runFriendlyId.snapshots.latest.ts (1)

12-12: Consider clarifying the resource finder implementation.

The resource finder always returns 1, which seems unusual. If this is intentional, consider adding a comment explaining why this approach is used rather than finding an actual resource.

apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.attempts.complete.ts (1)

1-33: Well-implemented API route!

The code correctly handles completing a run attempt with proper validation and error handling. The implementation follows good practices with type safety, input validation using Zod, and a focused handler function.

apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.logs.debug.ts (2)

1-6: Good import structure.
No issues found here. The imports are straightforward and properly scoped.


8-14: Well-structured route definition.
The use of Zod schema validation for parameters and body is clear and robust.

apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.suspend.ts (2)

1-9: Straightforward imports.
No concerns. The imports are reasonable and well-organized.


10-17: Clean route definition.
Parameter and body validators are clearly defined.

apps/webapp/app/routes/engine.v1.dev.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.heartbeat.ts (3)

1-12: Imports look good.
Nothing to note regarding syntax or usage.


16-23: Route definition is concise.
The structure is consistent and clear.


59-60: Module export is fine.
No issues found here.

apps/webapp/app/routes/engine.v1.dev.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.attempts.complete.ts (3)

1-17: Import statements are valid.
Everything appears in order.


19-26: Config and schema usage are clear.
Nicely defined parameters and body schema.


64-65: Export statement is consistent.
No additional feedback.

apps/webapp/app/routes/engine.v1.dev.presence.ts (1)

61-65: Validate the periodic presence update frequency
Frequent calls to Redis (based on the defined interval) could increase load. Ensure the chosen interval balances real-time accuracy with resource usage.
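A quick way to evaluate the interval is to estimate the steady-state write rate it implies. The sketch below assumes one Redis write per connected dev session per interval; the PR's actual interval and the number of commands per tick are not shown here, so the inputs are placeholders.

```typescript
// Estimated Redis writes per second for periodic presence updates,
// assuming one write per connected session per interval.
function presenceWritesPerSecond(sessions: number, intervalMs: number): number {
  if (intervalMs <= 0) {
    throw new RangeError("intervalMs must be positive");
  }
  // Multiply before dividing to keep the arithmetic exact for whole numbers.
  return (sessions * 1000) / intervalMs;
}

console.log(presenceWritesPerSecond(100, 5000)); // 20
console.log(presenceWritesPerSecond(100, 1000)); // 100
```

The rate scales linearly with both the number of sessions and the update frequency, so halving the interval doubles the load.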

apps/webapp/app/routes/engine.v1.dev.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.attempts.start.ts (1)

50-53: Verify engine dependency injection
The engine.startRunAttempt call relies on an external engine import. Consider verifying that it’s correctly initialized with relevant configuration for the environment to prevent unexpected behavior.

Comment on lines +22 to +26
  async ({
    authentication,
    body,
    params,
  }): Promise<TypedResponse<WorkerApiRunAttemptStartResponseBody>> => {
Contributor

🛠️ Refactor suggestion

Fix return type inconsistency.

The function's return type is specified as Promise<TypedResponse<WorkerApiRunAttemptStartResponseBody>>, but the function actually returns regular Response objects with no body data. This could cause type errors.

Update the return type to match the actual implementation:

  async ({
    authentication,
    body,
    params,
-  }): Promise<TypedResponse<WorkerApiRunAttemptStartResponseBody>> => {
+  }): Promise<Response> => {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

      return json(continuationResult);
    } catch (error) {
      logger.error("Failed to suspend run", { runFriendlyId, snapshotFriendlyId, error });
Contributor

⚠️ Potential issue

Fix incorrect error message

The function is for continuing run execution, but the error message says "Failed to suspend run". This appears to be a copy-paste error.

Update the error message to correctly reflect the operation:

-      logger.error("Failed to suspend run", { runFriendlyId, snapshotFriendlyId, error });
+      logger.error("Failed to continue run", { runFriendlyId, snapshotFriendlyId, error });

Comment on lines +27 to +36
    if (!body.success) {
      // TODO: we could create a debug span here
      logger.error("Failed to restore run", {
        runFriendlyId,
        snapshotFriendlyId,
        error: body.error,
      });

      return json({ ok: true });
    }
Contributor

🛠️ Refactor suggestion

Revisit error handling strategy

The current implementation returns { ok: true } on success and also on failure (when body.success is false), so API consumers cannot distinguish the two outcomes.

Consider returning an error status or a distinct response body for failures:

      logger.error("Failed to restore run", {
        runFriendlyId,
        snapshotFriendlyId,
        error: body.error,
      });

-      return json({ ok: true });
+      return json({ ok: false, error: body.error?.message || "Unknown error" });

    logger.debug("Restoring run", { runFriendlyId, snapshotFriendlyId, body });

    if (!body.success) {
      // TODO: we could create a debug span here
Contributor

💡 Verification agent

🧩 Analysis chain

Address TODO comment

There's a TODO comment about creating a debug span that should be addressed before finalizing this code.


🏁 Script executed:

#!/bin/bash
# Check if the debug span has been implemented in similar routes
grep -r "debug span" --include="*.ts" apps/webapp/app/routes/

Length of output: 379


Action Required: Implement Debug Span Functionality

The file still contains the TODO comment for creating a debug span (line 28), and no implementation has been added; the search above shows the same TODO in related routes. Either implement the debug span or remove the TODO comment if it no longer applies.

Comment on lines +31 to +37
    if (!deployment) {
      throw new Error("Deployment not found");
    }

    if (!deployment.worker) {
      throw new Error("Worker not found");
    }
Contributor

🛠️ Refactor suggestion

Improve error handling with proper status codes.

Instead of throwing generic errors, consider using structured responses with appropriate status codes for better API consistency.

    if (!deployment) {
-      throw new Error("Deployment not found");
+      throw new Response("Deployment not found", { status: 404 });
    }

    if (!deployment.worker) {
-      throw new Error("Worker not found");
+      throw new Response("Worker not found for this deployment", { status: 404 });
    }

Comment on lines +18 to +50
  async ({
    authenticatedWorker,
    params,
    body,
  }): Promise<TypedResponse<WorkerApiSuspendRunResponseBody>> => {
    const { runFriendlyId, snapshotFriendlyId } = params;

    logger.debug("Suspending run", { runFriendlyId, snapshotFriendlyId, body });

    if (!body.success) {
      // TODO: we could create a debug span here
      logger.error("Failed to suspend run", {
        runFriendlyId,
        snapshotFriendlyId,
        error: body.error,
      });

      return json({ ok: true });
    }

    try {
      await authenticatedWorker.createCheckpoint({
        runFriendlyId,
        snapshotFriendlyId,
        checkpoint: body.checkpoint,
      });

      return json({ ok: true });
    } catch (error) {
      logger.error("Failed to suspend run", { runFriendlyId, snapshotFriendlyId, error });
      throw error;
    }
  }
Contributor

🛠️ Refactor suggestion

Clarify handling of failure states.
When body.success is false, the code logs an error but still returns { ok: true }. This might mask underlying issues, making debugging more difficult. Consider returning a non-200 response or including a more descriptive payload to reflect the failed suspension.
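A sketch of the "more descriptive payload" option, assuming the worker's report is a discriminated union on `success` with an optional `error.message`, as the handler's `body.success` and `body.error` checks suggest. The `WorkerReport` and `AckBody` types are hypothetical. Mapping the failure case to a non-200 status is the other option mentioned in the comment; which fits depends on whether the worker should treat the acknowledgement itself as failed (for example, to trigger a retry).

```typescript
// Hypothetical shape of the worker's report, inferred from the handler's
// `body.success` and `body.error` checks.
type WorkerReport =
  | { success: true }
  | { success: false; error?: { message?: string } };

// Acknowledgement payload that distinguishes a reported failure from success.
type AckBody = { ok: true } | { ok: false; error: string };

function ackForReport(report: WorkerReport): AckBody {
  if (report.success) {
    return { ok: true };
  }
  // Surface the worker's message so API consumers can see why it failed.
  return { ok: false, error: report.error?.message || "Unknown error" };
}

console.log(ackForReport({ success: true })); // { ok: true }
console.log(ackForReport({ success: false, error: { message: "checkpoint failed" } }));
// { ok: false, error: 'checkpoint failed' }
```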

Comment on lines +9 to +16
const redis = new Redis({
  port: env.RUN_ENGINE_DEV_PRESENCE_REDIS_PORT ?? undefined,
  host: env.RUN_ENGINE_DEV_PRESENCE_REDIS_HOST ?? undefined,
  username: env.RUN_ENGINE_DEV_PRESENCE_REDIS_USERNAME ?? undefined,
  password: env.RUN_ENGINE_DEV_PRESENCE_REDIS_PASSWORD ?? undefined,
  enableAutoPipelining: true,
  ...(env.RUN_ENGINE_DEV_PRESENCE_REDIS_TLS_DISABLED === "true" ? {} : { tls: {} }),
});
Contributor

🛠️ Refactor suggestion

Consider improved error handling when connecting to Redis
If Redis fails to connect due to incorrect host/port/password, the app could silently fail. Implementing retry logic or throwing an explicit error if the connection cannot be created would improve resiliency.
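ioredis reconnects by default, and its `retryStrategy` option controls the delay between attempts; returning a non-number stops reconnecting. A sketch that caps the number of attempts, so that a wrong host, port, or password surfaces as a hard failure instead of endless quiet retries, might look like this. The attempt cap is an assumption, and attaching an `error` listener to log connection errors is a complementary step.

```typescript
// Hypothetical retry policy in the shape ioredis expects for its `retryStrategy`
// option: given the attempt count (starting at 1), return the delay in ms before
// the next reconnect, or null to stop reconnecting.
function presenceRetryStrategy(times: number, maxAttempts = 10): number | null {
  if (times > maxAttempts) {
    // Give up so the failure is visible instead of retrying indefinitely.
    return null;
  }
  // Linear backoff capped at 2s (this mirrors ioredis's default formula).
  return Math.min(times * 50, 2000);
}

// Usage sketch (requires ioredis; not executed here):
// const redis = new Redis({ ...options, retryStrategy: (times) => presenceRetryStrategy(times) });
// redis.on("error", (error) => logger.error("Dev presence Redis error", { error }));

console.log(presenceRetryStrategy(1)); // 50
console.log(presenceRetryStrategy(100, 200)); // 2000
console.log(presenceRetryStrategy(11)); // null
```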

Comment on lines +74 to +101
  }
);

async function getEnvVars(
  environment: RuntimeEnvironment,
  runId: string,
  machinePreset: MachinePreset
): Promise<Record<string, string>> {
  const variables = await resolveVariablesForEnvironment(environment);

  const jwt = await generateJWTTokenForEnvironment(environment, {
    run_id: runId,
    machine_preset: machinePreset.name,
  });

  variables.push(
    ...[
      { key: "TRIGGER_JWT", value: jwt },
      { key: "TRIGGER_RUN_ID", value: runId },
      { key: "TRIGGER_MACHINE_PRESET", value: machinePreset.name },
    ]
  );

  return variables.reduce((acc: Record<string, string>, curr) => {
    acc[curr.key] = curr.value;
    return acc;
  }, {});
}
Contributor

⚠️ Potential issue

Safeguard sensitive token data
You are generating JWT tokens and storing them in the environment variables array. Ensure they are not inadvertently logged or exposed through debug statements to avoid leaking tokens.

- logger.debug("Created environment vars:", variables);
+ // Avoid logging actual JWT tokens or any secret credentials
+ logger.debug("Created environment vars: [filtered for sensitive data]");

Committable suggestion skipped: line range outside the PR's diff.
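If the environment variables do need to be logged for debugging, one option is to redact sensitive values first rather than dropping the log entirely. The sketch below masks any key whose name looks secret; the pattern is an assumption and would need to match the project's own naming. With it, `TRIGGER_JWT` is masked while `TRIGGER_RUN_ID` and `TRIGGER_MACHINE_PRESET` stay visible.

```typescript
// Key-name pattern treated as sensitive. This is an assumption: extend it to
// cover whatever naming conventions the project's secrets actually use.
const SENSITIVE_KEY = /JWT|TOKEN|SECRET|PASSWORD|API_KEY|PRIVATE_KEY/i;

// Returns a copy that is safe to log: sensitive values are replaced, others kept.
function redactEnvVars(vars: Record<string, string>): Record<string, string> {
  const redacted: Record<string, string> = {};
  for (const [key, value] of Object.entries(vars)) {
    redacted[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : value;
  }
  return redacted;
}

// Usage: TRIGGER_JWT is masked; the other two values are logged unchanged.
console.log(
  redactEnvVars({
    TRIGGER_JWT: "eyJhbGciOi...",
    TRIGGER_RUN_ID: "run_123",
    TRIGGER_MACHINE_PRESET: "example-preset",
  })
);
```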

3 participants