Skip to content

Conversation

matt-aitken
Copy link
Member

No description provided.

Copy link

changeset-bot bot commented Aug 19, 2025

⚠️ No Changeset found

Latest commit: 1bac20b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

Copy link
Contributor

coderabbitai bot commented Aug 19, 2025

Walkthrough

Changes across the repo adjust logging behavior and error propagation without altering functional control flow or public APIs. Logger.error now checks for an ignoreError flag before invoking Logger.onError. Prisma writer error logs attach ignoreError: true. Multiple modules (runtimeEnvironment, eventRepository, machinePresets, runEngineHandlers, deliverAlert, createCheckpoint, cancelTaskRunV1, shared queue consumer, realtime streams, alerts worker entries, route loader) change log levels or the log method and, in one route, convert a rethrow to returning a structured 422 JSON error. Realtime ingest adds a special ECONNRESET branch. The Redis worker worker.ts adds a logErrors?: boolean catalog flag and centralizes per-item logging behavior and messages.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch false-error-downgrades

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
apps/webapp/app/db.server.ts (2)

308-317: Parity: also gate Prisma replica error events with ignoreError

Replica “error” events can be just as noisy as writer events. For consistency and to fully achieve the PR’s objective of reducing false alerts, add ignoreError: true to the replica $on("error") path as well.

Apply this diff:

     replicaClient.$on("error", (log) => {
       logger.error("PrismaClient error", {
         clientType: "reader",
         event: {
           timestamp: log.timestamp,
           message: log.message,
           target: log.target,
         },
+        ignoreError: true,
       });
     });

1-372: Add missing ignoreError: true to replica PrismaClient error log

The writer client’s error handler correctly sets ignoreError: true, but the replica client’s handler on line 309 does not. For consistency and to prevent alert noise, add ignoreError: true there.

• File: apps/webapp/app/db.server.ts
– Line 309: include ignoreError: true in the replica logger.error call

Suggested diff:

   replicaClient.$on("error", (log) => {
-    logger.error("PrismaClient error", {
+    logger.error("PrismaClient error", {
       clientType: "reader",
       event: {
         timestamp: log.timestamp,
         message: log.message,
         target: log.target,
       },
+      ignoreError: true,
     });
   });
apps/webapp/app/models/runtimeEnvironment.server.ts (1)

33-44: Potential null dereference when no environment is found

environment can be null from findFirst. Accessing environment.type will throw instead of returning null as per the function signature. Add an early null return before checking type.

Apply this diff:

-  //don't return deleted projects
-  if (environment?.project.deletedAt !== null) {
-    return null;
-  }
-
-  if (environment.type === "PREVIEW") {
+  // Don't return deleted projects
+  if (environment?.project.deletedAt !== null) {
+    return null;
+  }
+
+  // If no environment found, return null
+  if (!environment) {
+    return null;
+  }
+
+  if (environment.type === "PREVIEW") {
🧹 Nitpick comments (16)
internal-packages/run-engine/src/engine/machinePresets.ts (1)

28-31: Prefer logging parse issues and a minimal summary instead of the full config

Good downgrade to info. To improve signal while avoiding noisy/PII-prone payloads, log Zod issues and a compact config summary rather than the entire raw config.

Apply this diff:

-    logger.info("Failed to parse machine config", { config });
+    logger.info("Failed to parse machine config", {
+      // Log Zod issues for debuggability
+      issues: parsedConfig.error?.issues,
+      // Avoid dumping the entire config. Keep a tiny, useful summary.
+      configSummary:
+        typeof config === "object" && config !== null
+          ? {
+              hasPreset: "preset" in (config as any),
+              cpu: (config as any)?.cpu,
+              memory: (config as any)?.memory,
+            }
+          : { type: typeof config },
+    });
apps/webapp/app/v3/services/cancelTaskRunV1.server.ts (1)

46-49: Include friendlyId to ease log correlation (minor)

Including the friendly run ID alongside the internal ID helps correlate with user-facing references and other logs.

Apply this diff:

-      logger.info("Task run is not in a cancellable state", {
-        runId: taskRun.id,
-        status: taskRun.status,
-      });
+      logger.info("Task run is not in a cancellable state", {
+        runId: taskRun.id,
+        friendlyRunId: taskRun.friendlyId,
+        status: taskRun.status,
+      });
packages/core/src/logger.ts (1)

69-79: Optional: document and surface ignoreError behavior

Consider a brief doc comment on Logger.error explaining the ignoreError flag, since it now materially affects forwarding via onError. This reduces surprise for consumers adding structured args.

apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts (1)

187-194: Make ECONNRESET handling type-safe and add stream context to the log

The current "code" in error && error.code === "ECONNRESET" will still trip TS since Error doesn't declare code. Replace with a narrow type guard and include streamKey (and code) in the log context for faster triage.

Apply this diff within the catch:

-      if (error instanceof Error) {
-        if ("code" in error && error.code === "ECONNRESET") {
-          logger.info("[RealtimeStreams][ingestData] Connection reset during ingestData:", {
-            error,
-          });
-          return new Response(null, { status: 500 });
-        }
-      }
+      if (isConnResetError(error)) {
+        logger.info("[RealtimeStreams][ingestData] Connection reset during ingestData", {
+          streamKey,
+          code: error.code,
+          error,
+        });
+        return new Response(null, { status: 500 });
+      }

And add this helper (outside the function, e.g., near the top of the module):

function isConnResetError(e: unknown): e is NodeJS.ErrnoException {
  return e instanceof Error && "code" in (e as any) && (e as any).code === "ECONNRESET";
}
apps/webapp/app/v3/machinePresets.server.ts (1)

9-9: Downgrade to info makes sense; consider logging Zod issues at debug

Keeping this at info reduces noise. For debuggability, adding the Zod issues at debug level is low risk and helpful.

   if (!parsedConfig.success) {
     logger.info("Failed to parse machine config", { config });
+    logger.debug("Machine config parse issues", { issues: parsedConfig.error.issues });
apps/webapp/app/v3/eventRepository.server.ts (1)

1249-1251: Good downgrade; add flush/context fields to the log for easier correlation

Since this is a retriable path, the info level is appropriate. Add flushId, depth, and eventsCount to the structured log to correlate with spans and metrics.

-          logger.info("Failed to insert events, will attempt bisection", {
-            error: errorDetails,
-          });
+          logger.info("[EventRepository][doFlushBatch] Retriable insert failure; bisection next", {
+            error: errorDetails,
+            flushId,
+            depth,
+            eventsCount: events.length,
+          });
apps/webapp/app/v3/runEngineHandlers.server.ts (1)

385-409: Right-severity logging for oversized metadata; consider adding status code to context

Using warn for a known 413-style validation is appropriate. Consider including an explicit httpStatus: 413 (or equivalent) in the log context to make filtering/searching easier.

-        logger.warn("[runMetadataUpdated] Failed to update metadata, too large", {
-          taskRun: run.id,
+        logger.warn("[runMetadataUpdated] Failed to update metadata, too large", {
+          taskRun: run.id,
+          httpStatus: 413,
           error:
             e instanceof Error
               ? {
                 name: e.name,
                 message: e.message,
                 stack: e.stack,
               }
               : e,
         });
apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.continue.ts (1)

30-36: Fix misleading log message; consider error payload typing for 422 responses

  • The log message says "Failed to suspend run" but this route is a "continue" endpoint.
  • Optional: the loader’s declared return type is TypedResponse, but you throw a { error: string } body on failures. If your route builder surfaces thrown JSON directly, this is fine at runtime; otherwise, you may want to widen the return type or return json(...) instead of throwing.

Suggested tweak:

- logger.warn("Failed to suspend run", { runFriendlyId, snapshotFriendlyId, error });
+ logger.warn("Failed to continue run", { runFriendlyId, snapshotFriendlyId, error });
apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts (1)

624-626: Downgrading to info for expected “no matching deployment” is appropriate; consider adding run id for quicker triage

LGTM on severity. For easier log querying, you might include the run id alongside messageId (it’s already included in later attrs, just not in this log call).

- logger.info("No matching deployment found for task run", {
-   queueMessage: message.data,
-   messageId: message.messageId,
- });
+ logger.info("No matching deployment found for task run", {
+   queueMessage: message.data,
+   messageId: message.messageId,
+   runId: existingTaskRun.id,
+ });
packages/redis-worker/src/worker.ts (5)

39-41: Clarify JSDoc: errors are downgraded to info when logErrors is false

The current comment implies “no logging,” but the implementation still logs at info. Clarify intent to avoid confusion.

- /** Defaults to true. If false, errors will not be logged. */
+ /** Defaults to true. If false, errors will be logged at info level instead of error. */

546-552: Improve “no handler/catalog item found” logs (grammar and context)

Minor polish for readability and triage. Include the job id to make searching easier and adjust wording.

- this.logger.error(`Worker no handler found for job type: ${job}`);
+ this.logger.error("Worker: No handler found for job type", { job, id });
...
- this.logger.error(`Worker no catalog item found for job type: ${job}`);
+ this.logger.error("Worker: No catalog item found for job type", { job, id });

595-606: Centralized error logging looks good; enrich attributes

Good consolidation. Add attempt and deduplicationKey to logAttributes to make downstream analysis simpler.

   const logAttributes = {
     name: this.options.name,
     id,
     job,
     item,
     visibilityTimeoutMs,
     error,
     errorMessage,
+    attempt,
+    deduplicationKey,
   };

646-655: Reuse logAttributes for consistency (optional)

These attributes mirror logAttributes; consider reusing and extending it to keep logs uniform and reduce drift.

- this.logger.info(`Worker requeuing failed item with delay`, {
-   name: this.options.name,
-   id,
-   job,
-   item,
-   retryDate,
-   retryDelay,
-   visibilityTimeoutMs,
-   attempt: newAttempt,
- });
+ this.logger.info("Worker requeuing failed item with delay", {
+   ...logAttributes,
+   retryDate,
+   retryDelay,
+   attempt: newAttempt,
+ });

666-675: Consider applying logErrors gating to requeue failure as well

If requeue fails, you currently always log at error. If the catalog item opted out of error-level logs, mirror that choice here too.

- this.logger.error(
-   `Worker failed to requeue item. It will be retried after the visibility timeout.`,
-   {
-     name: this.options.name,
-     id,
-     job,
-     item,
-     visibilityTimeoutMs,
-     error: requeueError,
-   }
- );
+ const requeueLog = {
+   ...logAttributes,
+   error: requeueError,
+ };
+ if (shouldLogError) {
+   this.logger.error(
+     "Worker failed to requeue item. It will be retried after the visibility timeout.",
+     requeueLog
+   );
+ } else {
+   this.logger.info(
+     "Worker failed to requeue item. It will be retried after the visibility timeout.",
+     requeueLog
+   );
+ }
apps/webapp/app/v3/services/alerts/deliverAlert.server.ts (1)

929-935: Log the hex signature (or omit), not the raw ArrayBuffer

You compute signatureHex but log signature (ArrayBuffer), which is noisy and often useless in logs. Prefer logging signatureHex or drop it entirely.

- logger.info("[DeliverAlert] Failed to send alert webhook", {
+ logger.info("[DeliverAlert] Failed to send alert webhook", {
     status: response.status,
     statusText: response.statusText,
     url: webhook.url,
     body: payload,
-    signature,
+    signature: signatureHex,
   });
apps/webapp/app/v3/alertsWorker.server.ts (1)

46-46: Consider logging only on final attempt (feature follow-up)

If the current noise is mostly transient retries, consider an enhancement in the redis-worker API to support logging only on the last attempt (or providing an onError hook with attempt metadata) instead of a blanket suppression. This preserves signal for hard failures without spamming logs on transient ones.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b41129a and 1bac20b.

📒 Files selected for processing (16)
  • apps/webapp/app/db.server.ts (1 hunks)
  • apps/webapp/app/models/runtimeEnvironment.server.ts (1 hunks)
  • apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.continue.ts (1 hunks)
  • apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts (1 hunks)
  • apps/webapp/app/v3/alertsWorker.server.ts (3 hunks)
  • apps/webapp/app/v3/eventRepository.server.ts (1 hunks)
  • apps/webapp/app/v3/machinePresets.server.ts (1 hunks)
  • apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts (1 hunks)
  • apps/webapp/app/v3/runEngineHandlers.server.ts (2 hunks)
  • apps/webapp/app/v3/services/alerts/deliverAlert.server.ts (1 hunks)
  • apps/webapp/app/v3/services/cancelTaskRunV1.server.ts (1 hunks)
  • apps/webapp/app/v3/services/createCheckpoint.server.ts (2 hunks)
  • internal-packages/run-engine/src/engine/machinePresets.ts (1 hunks)
  • internal-packages/run-engine/src/engine/systems/dequeueSystem.ts (1 hunks)
  • packages/core/src/logger.ts (1 hunks)
  • packages/redis-worker/src/worker.ts (6 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Always prefer using isomorphic code like fetch, ReadableStream, etc. instead of Node.js specific code
For TypeScript, we usually use types over interfaces
Avoid enums
No default exports, use function declarations

Files:

  • internal-packages/run-engine/src/engine/machinePresets.ts
  • apps/webapp/app/db.server.ts
  • apps/webapp/app/v3/machinePresets.server.ts
  • apps/webapp/app/v3/services/cancelTaskRunV1.server.ts
  • apps/webapp/app/v3/alertsWorker.server.ts
  • apps/webapp/app/models/runtimeEnvironment.server.ts
  • apps/webapp/app/v3/eventRepository.server.ts
  • packages/core/src/logger.ts
  • apps/webapp/app/v3/services/createCheckpoint.server.ts
  • apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts
  • internal-packages/run-engine/src/engine/systems/dequeueSystem.ts
  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/v3/services/alerts/deliverAlert.server.ts
  • apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts
  • apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.continue.ts
  • packages/redis-worker/src/worker.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)

We use zod a lot in packages/core and in the webapp

Files:

  • apps/webapp/app/db.server.ts
  • apps/webapp/app/v3/machinePresets.server.ts
  • apps/webapp/app/v3/services/cancelTaskRunV1.server.ts
  • apps/webapp/app/v3/alertsWorker.server.ts
  • apps/webapp/app/models/runtimeEnvironment.server.ts
  • apps/webapp/app/v3/eventRepository.server.ts
  • packages/core/src/logger.ts
  • apps/webapp/app/v3/services/createCheckpoint.server.ts
  • apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts
  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/v3/services/alerts/deliverAlert.server.ts
  • apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts
  • apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.continue.ts
apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.cursor/rules/webapp.mdc)

apps/webapp/**/*.{ts,tsx}: In the webapp, all environment variables must be accessed through the env export of env.server.ts, instead of directly accessing process.env.
When importing from @trigger.dev/core in the webapp, never import from the root @trigger.dev/core path; always use one of the subpath exports as defined in the package's package.json.

Files:

  • apps/webapp/app/db.server.ts
  • apps/webapp/app/v3/machinePresets.server.ts
  • apps/webapp/app/v3/services/cancelTaskRunV1.server.ts
  • apps/webapp/app/v3/alertsWorker.server.ts
  • apps/webapp/app/models/runtimeEnvironment.server.ts
  • apps/webapp/app/v3/eventRepository.server.ts
  • apps/webapp/app/v3/services/createCheckpoint.server.ts
  • apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts
  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/v3/services/alerts/deliverAlert.server.ts
  • apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts
  • apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.continue.ts
apps/webapp/app/services/**/*.server.ts

📄 CodeRabbit Inference Engine (.cursor/rules/webapp.mdc)

For testable services, separate service logic and configuration, as exemplified by realtimeClient.server.ts (service) and realtimeClientGlobal.server.ts (configuration).

Files:

  • apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts
🧠 Learnings (1)
📚 Learning: 2025-07-18T17:49:24.468Z
Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-18T17:49:24.468Z
Learning: Applies to internal-packages/database/**/*.{ts,tsx} : We use prisma in internal-packages/database for our database interactions using PostgreSQL

Applied to files:

  • apps/webapp/app/db.server.ts
🧬 Code Graph Analysis (7)
apps/webapp/app/v3/machinePresets.server.ts (1)
packages/core/src/v3/tracer.ts (1)
  • logger (56-64)
apps/webapp/app/models/runtimeEnvironment.server.ts (1)
apps/webapp/app/v3/tracer.server.ts (1)
  • logger (101-106)
apps/webapp/app/v3/eventRepository.server.ts (3)
packages/core/src/v3/tracer.ts (1)
  • logger (56-64)
apps/webapp/app/services/logger.server.ts (1)
  • logger (49-59)
apps/webapp/app/v3/tracer.server.ts (1)
  • logger (101-106)
apps/webapp/app/v3/services/createCheckpoint.server.ts (1)
apps/webapp/app/services/logger.server.ts (1)
  • logger (49-59)
apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts (1)
apps/webapp/app/services/logger.server.ts (1)
  • logger (49-59)
apps/webapp/app/v3/runEngineHandlers.server.ts (1)
apps/webapp/app/utils/packets.ts (1)
  • MetadataTooLargeError (4-9)
apps/webapp/app/v3/services/alerts/deliverAlert.server.ts (1)
packages/trigger-sdk/src/v3/index.ts (1)
  • logger (39-39)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (9)
apps/webapp/app/db.server.ts (1)

196-206: LGTM on gating Prisma writer error with ignoreError

Adding ignoreError: true to the Prisma writer error event aligns with the new Logger.onError gating and should reduce false alerts without losing error visibility.

apps/webapp/app/v3/services/cancelTaskRunV1.server.ts (1)

46-49: LGTM on log level downgrade

Downgrading this non-actionable path to info makes sense. It’s a frequent state check, not an exception.

apps/webapp/app/models/runtimeEnvironment.server.ts (1)

40-43: LGTM on downgrading to warn

Using warn here is appropriate: it’s a client input gap rather than a server-side failure.

apps/webapp/app/v3/runEngineHandlers.server.ts (1)

20-20: Import addition looks correct

Importing MetadataTooLargeError from packets aligns with the new conditional logging path.

internal-packages/run-engine/src/engine/systems/dequeueSystem.ts (2)

138-144: Structured logging improvement is a win

Switching to a concise message with structured context (runId, snapshotId, executionStatus) improves log queryability and reduces duplication.


147-153: Consistent structured context on warn path

Same here—moving details to the context payload keeps messages readable and machine-parsable.

apps/webapp/app/v3/services/createCheckpoint.server.ts (2)

135-139: LGTM: log level downgrade for already-resumed child run

Logging as info for an expected pre-check is the right call; return shape remains consistent.


171-175: LGTM: log level downgrade for already-resumed batch

Same here—appropriate severity reduction for a non-exceptional condition.

packages/redis-worker/src/worker.ts (1)

624-634: LGTM: DLQ transition logging respects per-item logErrors

Conditionally downgrading to info when logErrors is false matches the catalog contract and reduces noise while preserving context.

@ericallam ericallam changed the title More false errors turned to logs chore(webapp): More false errors turned to logs Aug 19, 2025
@matt-aitken matt-aitken force-pushed the false-error-downgrades branch from 1bac20b to d402945 Compare August 20, 2025 08:51
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal-packages/run-engine/src/engine/machinePresets.ts (1)

69-74: “Smallest preset” selection is order-dependent; sort to actually choose the smallest.

Current iteration depends on object insertion order and may pick a larger-than-necessary machine. Sort by cpu then memory to honor the function’s contract and reduce cost.

 function derivePresetNameFromValues(
   machines: Record<string, MachinePreset>,
   cpu: number,
   memory: number
 ): MachinePresetName | undefined {
-  for (const [name, preset] of Object.entries(machines)) {
+  const sorted = Object.entries(machines).sort(([, a], [, b]) => {
+    if (a.cpu === b.cpu) return a.memory - b.memory;
+    return a.cpu - b.cpu;
+  });
+  for (const [name, preset] of sorted) {
     if (preset.cpu >= cpu && preset.memory >= memory) {
       return name as MachinePresetName;
     }
   }
 }
🧹 Nitpick comments (13)
internal-packages/run-engine/src/engine/machinePresets.ts (7)

28-31: Avoid logging the entire config; log structured validation errors and high-level shape instead.

Full config objects can be large and may contain sensitive data. Prefer Zod issue summaries and config keys for context.

Apply this diff to improve signal and reduce risk:

-  if (!parsedConfig.success) {
-    logger.info("Failed to parse machine config", { config });
+  if (!parsedConfig.success) {
+    logger.info("Failed to parse machine config", {
+      errorIssues: parsedConfig.error.issues.map((i) => ({
+        path: i.path.join("."),
+        code: i.code,
+      })),
+      configKeys:
+        typeof config === "object" && config !== null
+          ? Object.keys(config as Record<string, unknown>)
+          : undefined,
+    });

20-21: Consider downgrading this parse failure to info for consistency.

You’ve treated config parse failures as informational. The preset parse failure here seems similar and may also be a frequent, non-exceptional case.

Apply this diff if you agree:

-      logger.error("Failed to parse machine preset", { machinePreset: run.machinePreset });
+      logger.info("Failed to parse machine preset", { machinePreset: run.machinePreset });

31-31: Prefer the provided defaultMachine over a hardcoded fallback.

Using the caller-provided default keeps behavior consistent with environments that may not include "small-1x".

Proposed change:

-    return machinePresetFromName(machines, "small-1x");
+    return machinePresetFromName(machines, defaultMachine);

If "small-1x" is the intentional canonical fallback across the platform, feel free to ignore.


51-51: Same here: use defaultMachine instead of hardcoding "small-1x".

Keeps fallbacks consistent across all branches.

-  return machinePresetFromName(machines, "small-1x");
+  return machinePresetFromName(machines, defaultMachine);

38-49: Don’t use truthiness for numeric checks (0 is falsy).

Be explicit about undefined/null to avoid skipping valid values.

-  if (parsedConfig.data.cpu && parsedConfig.data.memory) {
+  if (parsedConfig.data.cpu !== undefined && parsedConfig.data.memory !== undefined) {

58-61: Add a guard to produce a clear error if the preset name is unknown.

Spreading an undefined value throws a confusing TypeError. Fail fast with a clear message.

 export function machinePresetFromName(
   machines: Record<string, MachinePreset>,
   name: MachinePresetName
 ): MachinePreset {
-  return {
-    ...machines[name],
-  };
+  const preset = machines[name];
+  if (!preset) {
+    throw new Error(`Unknown machine preset: ${name}`);
+  }
+  return { ...preset };
 }

4-4: Nit: align logger name with the exported function for easier log tracing.

Optional, but it helps when correlating call sites and logs.

-const logger = new Logger("machinePresetFromConfig");
+const logger = new Logger("getMachinePreset");
apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts (1)

624-629: Enrich the info log and tighten the inline comment

Add commonly-needed fields to the log for quicker triage, and reword the comment to reflect the actual flow (we’re about to mark it as WAITING_FOR_DEPLOY here).

Apply this diff:

-      // This happens when a run is "WAITING_FOR_DEPLOY" and is expected
-      logger.info("No matching deployment found for task run", {
-        queueMessage: message.data,
-        messageId: message.messageId,
-      });
+      // No matching deployment found; mark run as "WAITING_FOR_DEPLOY" (expected during deploy gaps)
+      logger.info("No matching deployment found for task run", {
+        queueMessage: message.data,
+        messageId: message.messageId,
+        runId: existingTaskRun.id,
+        runStatus: existingTaskRun.status,
+        lockedById: existingTaskRun.lockedById ?? undefined,
+        lockedToVersionId: existingTaskRun.lockedToVersionId ?? undefined,
+        environmentId: existingTaskRun.runtimeEnvironmentId,
+      });
apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts (5)

187-194: LGTM on downgrading ECONNRESET to info; consider a small type-guard + context tweak

This achieves the PR goal (reducing false errors) without changing control flow. To improve readability and add better diagnostics, consider:

  • Using a narrow type-guard for ECONNRESET instead of the nested "code" in error checks.
  • Including streamKey/runId/streamId in the info log for correlation.

Apply this diff within the catch block:

-      if (error instanceof Error) {
-        if ("code" in error && error.code === "ECONNRESET") {
-          logger.info("[RealtimeStreams][ingestData] Connection reset during ingestData:", {
-            error,
-          });
-          return new Response(null, { status: 500 });
-        }
-      }
+      if (isConnReset(error)) {
+        logger.info("[RealtimeStreams][ingestData] Connection reset during ingestData", {
+          error,
+          streamKey,
+          runId,
+          streamId,
+        });
+        return new Response(null, { status: 500 });
+      }

Add this helper near the top of the file (outside the selected range):

type ErrorWithCode = Error & { code?: string | number };

function isConnReset(e: unknown): e is ErrorWithCode {
  return e instanceof Error && (e as ErrorWithCode).code === "ECONNRESET";
}

110-114: Prefer logger over console.error in cleanup to keep logging consistent

Using the shared logger keeps formatting consistent and lets you attach metadata (e.g., ignoreError).

-      await redis.quit().catch(console.error);
+      await redis.quit().catch((error) =>
+        logger.info(
+          "[RealtimeStreams][streamResponse] Error during Redis quit in cleanup",
+          { error, ignoreError: true }
+        )
+      );

136-140: Demote cleanup failures to info and mark as ignored to reduce noise

Cleanup failures are non-critical here. Logging them at error can inflate error budgets. Keep them, but at info with ignoreError for consistency with the PR’s goal.

-      } catch (error) {
-        logger.error("[RedisRealtimeStreams][ingestData] Error in cleanup:", { error });
-      }
+      } catch (error) {
+        logger.info("[RedisRealtimeStreams][ingestData] Error in cleanup:", {
+          error,
+          ignoreError: true,
+        });
+      }

144-169: Release the reader lock after the read loop

After you finish reading, explicitly release the reader lock to help avoid holding resources longer than necessary.

       while (true) {
         const { done, value } = await reader.read();

         if (done || !value) {
           break;
         }

         logger.debug("[RedisRealtimeStreams][ingestData] Reading data", {
           streamKey,
           runId,
           value,
         });

         await redis.xadd(
           streamKey,
           "MAXLEN",
           "~",
           String(env.REALTIME_STREAM_MAX_LENGTH),
           "*",
           "data",
           value
         );
       }
+
+      // Ensure the lock on the reader is released before proceeding.
+      reader.releaseLock();

187-194: Consider returning 204/200 instead of 500 for ECONNRESET in ingestData

ECONNRESET usually indicates the client hung up mid-upload. Returning a 5xx here may trigger upstream retries or alerting even though there’s nothing actionable. Switching to a 204 (or 200) can reduce noise if callers treat non-2xx as retryable.

Affected location:

  • apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts (ingestData, lines 187–194)

Proposed diff:

- if (error instanceof Error && "code" in error && error.code === "ECONNRESET") {
-   logger.info("[RealtimeStreams][ingestData] Connection reset during ingestData:", { error });
-   return new Response(null, { status: 500 });
- }
+ if (error instanceof Error && "code" in error && error.code === "ECONNRESET") {
+   logger.info("[RealtimeStreams][ingestData] Connection reset during ingestData:", { error });
+   // Client disconnected—return 204 to avoid unnecessary retries/alerts
+   return new Response(null, { status: 204 });
+ }

• Verify how your callers treat 2xx vs 5xx on this endpoint (e.g. retries, error reporting) before rolling out.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1bac20b and d402945.

📒 Files selected for processing (16)
  • apps/webapp/app/db.server.ts (1 hunks)
  • apps/webapp/app/models/runtimeEnvironment.server.ts (1 hunks)
  • apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.continue.ts (1 hunks)
  • apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts (1 hunks)
  • apps/webapp/app/v3/alertsWorker.server.ts (3 hunks)
  • apps/webapp/app/v3/eventRepository.server.ts (1 hunks)
  • apps/webapp/app/v3/machinePresets.server.ts (1 hunks)
  • apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts (1 hunks)
  • apps/webapp/app/v3/runEngineHandlers.server.ts (2 hunks)
  • apps/webapp/app/v3/services/alerts/deliverAlert.server.ts (1 hunks)
  • apps/webapp/app/v3/services/cancelTaskRunV1.server.ts (1 hunks)
  • apps/webapp/app/v3/services/createCheckpoint.server.ts (2 hunks)
  • internal-packages/run-engine/src/engine/machinePresets.ts (1 hunks)
  • internal-packages/run-engine/src/engine/systems/dequeueSystem.ts (1 hunks)
  • packages/core/src/logger.ts (1 hunks)
  • packages/redis-worker/src/worker.ts (6 hunks)
✅ Files skipped from review due to trivial changes (1)
  • apps/webapp/app/v3/services/cancelTaskRunV1.server.ts
🚧 Files skipped from review as they are similar to previous changes (12)
  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • internal-packages/run-engine/src/engine/systems/dequeueSystem.ts
  • apps/webapp/app/v3/services/alerts/deliverAlert.server.ts
  • apps/webapp/app/v3/machinePresets.server.ts
  • apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.continue.ts
  • apps/webapp/app/models/runtimeEnvironment.server.ts
  • apps/webapp/app/db.server.ts
  • packages/redis-worker/src/worker.ts
  • apps/webapp/app/v3/alertsWorker.server.ts
  • apps/webapp/app/v3/eventRepository.server.ts
  • apps/webapp/app/v3/services/createCheckpoint.server.ts
  • packages/core/src/logger.ts
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Always prefer using isomorphic code like fetch, ReadableStream, etc. instead of Node.js specific code
For TypeScript, we usually use types over interfaces
Avoid enums
No default exports, use function declarations

Files:

  • internal-packages/run-engine/src/engine/machinePresets.ts
  • apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts
  • apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)

We use zod a lot in packages/core and in the webapp

Files:

  • apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts
  • apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts
apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.cursor/rules/webapp.mdc)

apps/webapp/**/*.{ts,tsx}: In the webapp, all environment variables must be accessed through the env export of env.server.ts, instead of directly accessing process.env.
When importing from @trigger.dev/core in the webapp, never import from the root @trigger.dev/core path; always use one of the subpath exports as defined in the package's package.json.

Files:

  • apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts
  • apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts
apps/webapp/app/services/**/*.server.ts

📄 CodeRabbit Inference Engine (.cursor/rules/webapp.mdc)

For testable services, separate service logic and configuration, as exemplified by realtimeClient.server.ts (service) and realtimeClientGlobal.server.ts (configuration).

Files:

  • apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts
🧬 Code Graph Analysis (1)
apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts (1)
apps/webapp/app/services/logger.server.ts (1)
  • logger (49-59)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (4)
internal-packages/run-engine/src/engine/machinePresets.ts (2)

29-29: Downgrading to info is appropriate and aligns with PR intent.

Good call reducing noise for non-exceptional validation failures.


1-75: Fallback safe: "small-1x" present in all machine maps
Verified that every machines map—including run-engine test fixtures, core schema defaults, and webapp presets—defines an entry for "small-1x". The hard-coded fallback will not cause missing-key errors.

apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts (2)

624-642: LGTM: Downgrade to info + WAITING_FOR_DEPLOY ack looks right for expected gaps

This aligns with the PR goal of reducing noisy errors when a deploy isn’t available yet, and setting the run to WAITING_FOR_DEPLOY matches the intended state transition.


630-641: Re-enqueue Path Exists for WAITING_FOR_DEPLOY

We’ve confirmed that WAITING_FOR_DEPLOY runs are picked up and re-enqueued by a dedicated background service:

• apps/webapp/app/v3/services/executeTasksWaitingForDeploy.ts (around lines 37–43)
– Queries for runs where status = "WAITING_FOR_DEPLOY"
– Calls marqs.enqueueMessage(...) for each run to put it back on the shared queue

Because this service handles all ACKed WAITING_FOR_DEPLOY runs, the ack_and_do_more_work path in sharedQueueConsumer.server.ts will not starve tasks.

@matt-aitken matt-aitken merged commit 4264fc0 into main Aug 20, 2025
56 of 57 checks passed
@matt-aitken matt-aitken deleted the false-error-downgrades branch August 20, 2025 10:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants