chore(webapp): More false errors turned to logs #2422

matt-aitken · 2025-08-19T17:01:56Z

No description provided.

changeset-bot · 2025-08-19T17:02:00Z

⚠️ No Changeset found

Latest commit: 1bac20b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

coderabbitai · 2025-08-19T17:02:03Z

Walkthrough

Changes across the repo adjust logging behavior and error propagation without altering functional control flow or public APIs. Logger.error now checks for an ignoreError flag before invoking Logger.onError. Prisma writer error logs attach ignoreError: true. Multiple modules (runtimeEnvironment, eventRepository, machinePresets, runEngineHandlers, deliverAlert, createCheckpoint, cancelTaskRunV1, shared queue consumer, realtime streams, alerts worker entries, route loader) change log levels or the log method and, in one route, convert a rethrow to returning a structured 422 JSON error. Realtime ingest adds a special ECONNRESET branch. The Redis worker worker.ts adds a logErrors?: boolean catalog flag and centralizes per-item logging behavior and messages.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch false-error-downgrades

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

Visit our Status Page to check the current availability of CodeRabbit.
Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

…rovided issue

… have errors for complete failures

…or undefined

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

apps/webapp/app/db.server.ts (2)
308-317: Parity: also gate Prisma replica error events with ignoreError

Replica “error” events can be just as noisy as writer events. For consistency and to fully achieve the PR’s objective of reducing false alerts, add ignoreError: true to the replica $on("error") path as well.

Apply this diff:
     replicaClient.$on("error", (log) => {
       logger.error("PrismaClient error", {
         clientType: "reader",
         event: {
           timestamp: log.timestamp,
           message: log.message,
           target: log.target,
         },
+        ignoreError: true,
       });
     });
1-372: Add missing ignoreError: true to replica PrismaClient error log

The writer client’s error handler correctly sets ignoreError: true, but the replica client’s handler on line 309 does not. For consistency and to prevent alert noise, add ignoreError: true there.

• File: apps/webapp/app/db.server.ts
– Line 309: include ignoreError: true in the replica logger.error call

Suggested diff:
   replicaClient.$on("error", (log) => {
-    logger.error("PrismaClient error", {
+    logger.error("PrismaClient error", {
       clientType: "reader",
       event: {
         timestamp: log.timestamp,
         message: log.message,
         target: log.target,
       },
+      ignoreError: true,
     });
   });
apps/webapp/app/models/runtimeEnvironment.server.ts (1)
33-44: Potential null dereference when no environment is found

environment can be null from findFirst. Accessing environment.type will throw instead of returning null as per the function signature. Add an early null return before checking type.

Apply this diff:
-  //don't return deleted projects
-  if (environment?.project.deletedAt !== null) {
-    return null;
-  }
-
-  if (environment.type === "PREVIEW") {
+  // Don't return deleted projects
+  if (environment?.project.deletedAt !== null) {
+    return null;
+  }
+
+  // If no environment found, return null
+  if (!environment) {
+    return null;
+  }
+
+  if (environment.type === "PREVIEW") {

🧹 Nitpick comments (16)

internal-packages/run-engine/src/engine/machinePresets.ts (1)
28-31: Prefer logging parse issues and a minimal summary instead of the full config

Good downgrade to info. To improve signal while avoiding noisy/PII-prone payloads, log Zod issues and a compact config summary rather than the entire raw config.

Apply this diff:
-    logger.info("Failed to parse machine config", { config });
+    logger.info("Failed to parse machine config", {
+      // Log Zod issues for debuggability
+      issues: parsedConfig.error?.issues,
+      // Avoid dumping the entire config. Keep a tiny, useful summary.
+      configSummary:
+        typeof config === "object" && config !== null
+          ? {
+              hasPreset: "preset" in (config as any),
+              cpu: (config as any)?.cpu,
+              memory: (config as any)?.memory,
+            }
+          : { type: typeof config },
+    });
apps/webapp/app/v3/services/cancelTaskRunV1.server.ts (1)
46-49: Include friendlyId to ease log correlation (minor)

Including the friendly run ID alongside the internal ID helps correlate with user-facing references and other logs.

Apply this diff:
-      logger.info("Task run is not in a cancellable state", {
-        runId: taskRun.id,
-        status: taskRun.status,
-      });
+      logger.info("Task run is not in a cancellable state", {
+        runId: taskRun.id,
+        friendlyRunId: taskRun.friendlyId,
+        status: taskRun.status,
+      });
packages/core/src/logger.ts (1)

69-79: Optional: document and surface ignoreError behavior

Consider a brief doc comment on Logger.error explaining the ignoreError flag, since it now materially affects forwarding via onError. This reduces surprise for consumers adding structured args.
apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts (1)
187-194: Make ECONNRESET handling type-safe and add stream context to the log

The current "code" in error && error.code === "ECONNRESET" will still trip TS since Error doesn't declare code. Replace with a narrow type guard and include streamKey (and code) in the log context for faster triage.

Apply this diff within the catch:
-      if (error instanceof Error) {
-        if ("code" in error && error.code === "ECONNRESET") {
-          logger.info("[RealtimeStreams][ingestData] Connection reset during ingestData:", {
-            error,
-          });
-          return new Response(null, { status: 500 });
-        }
-      }
+      if (isConnResetError(error)) {
+        logger.info("[RealtimeStreams][ingestData] Connection reset during ingestData", {
+          streamKey,
+          code: error.code,
+          error,
+        });
+        return new Response(null, { status: 500 });
+      }
And add this helper (outside the function, e.g., near the top of the module):
function isConnResetError(e: unknown): e is NodeJS.ErrnoException {
  return e instanceof Error && "code" in (e as any) && (e as any).code === "ECONNRESET";
}
apps/webapp/app/v3/machinePresets.server.ts (1)
9-9: Downgrade to info makes sense; consider logging Zod issues at debug

Keeping this at info reduces noise. For debuggability, adding the Zod issues at debug level is low risk and helpful.
   if (!parsedConfig.success) {
     logger.info("Failed to parse machine config", { config });
+    logger.debug("Machine config parse issues", { issues: parsedConfig.error.issues });
apps/webapp/app/v3/eventRepository.server.ts (1)
1249-1251: Good downgrade; add flush/context fields to the log for easier correlation

Since this is a retriable path, the info level is appropriate. Add flushId, depth, and eventsCount to the structured log to correlate with spans and metrics.
-          logger.info("Failed to insert events, will attempt bisection", {
-            error: errorDetails,
-          });
+          logger.info("[EventRepository][doFlushBatch] Retriable insert failure; bisection next", {
+            error: errorDetails,
+            flushId,
+            depth,
+            eventsCount: events.length,
+          });
apps/webapp/app/v3/runEngineHandlers.server.ts (1)
385-409: Right-severity logging for oversized metadata; consider adding status code to context

Using warn for a known 413-style validation is appropriate. Consider including an explicit httpStatus: 413 (or equivalent) in the log context to make filtering/searching easier.
-        logger.warn("[runMetadataUpdated] Failed to update metadata, too large", {
-          taskRun: run.id,
+        logger.warn("[runMetadataUpdated] Failed to update metadata, too large", {
+          taskRun: run.id,
+          httpStatus: 413,
           error:
             e instanceof Error
               ? {
                 name: e.name,
                 message: e.message,
                 stack: e.stack,
               }
               : e,
         });
apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.continue.ts (1)
30-36: Fix misleading log message; consider error payload typing for 422 responses

The log message says "Failed to suspend run" but this route is a "continue" endpoint.

Optional: the loader’s declared return type is TypedResponse, but you throw a { error: string } body on failures. If your route builder surfaces thrown JSON directly, this is fine at runtime; otherwise, you may want to widen the return type or return json(...) instead of throwing.

Suggested tweak:
- logger.warn("Failed to suspend run", { runFriendlyId, snapshotFriendlyId, error });
+ logger.warn("Failed to continue run", { runFriendlyId, snapshotFriendlyId, error });
apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts (1)
624-626: Downgrading to info for expected “no matching deployment” is appropriate; consider adding run id for quicker triage

LGTM on severity. For easier log querying, you might include the run id alongside messageId (it’s already included in later attrs, just not in this log call).
- logger.info("No matching deployment found for task run", {
-   queueMessage: message.data,
-   messageId: message.messageId,
- });
+ logger.info("No matching deployment found for task run", {
+   queueMessage: message.data,
+   messageId: message.messageId,
+   runId: existingTaskRun.id,
+ });
packages/redis-worker/src/worker.ts (5)
39-41: Clarify JSDoc: errors are downgraded to info when logErrors is false

The current comment implies “no logging,” but the implementation still logs at info. Clarify intent to avoid confusion.
- /** Defaults to true. If false, errors will not be logged. */
+ /** Defaults to true. If false, errors will be logged at info level instead of error. */
546-552: Improve “no handler/catalog item found” logs (grammar and context)

Minor polish for readability and triage. Include the job id to make searching easier and adjust wording.
- this.logger.error(`Worker no handler found for job type: ${job}`);
+ this.logger.error("Worker: No handler found for job type", { job, id });
...
- this.logger.error(`Worker no catalog item found for job type: ${job}`);
+ this.logger.error("Worker: No catalog item found for job type", { job, id });
595-606: Centralized error logging looks good; enrich attributes

Good consolidation. Add attempt and deduplicationKey to logAttributes to make downstream analysis simpler.
   const logAttributes = {
     name: this.options.name,
     id,
     job,
     item,
     visibilityTimeoutMs,
     error,
     errorMessage,
+    attempt,
+    deduplicationKey,
   };
646-655: Reuse logAttributes for consistency (optional)

These attributes mirror logAttributes; consider reusing and extending it to keep logs uniform and reduce drift.
- this.logger.info(`Worker requeuing failed item with delay`, {
-   name: this.options.name,
-   id,
-   job,
-   item,
-   retryDate,
-   retryDelay,
-   visibilityTimeoutMs,
-   attempt: newAttempt,
- });
+ this.logger.info("Worker requeuing failed item with delay", {
+   ...logAttributes,
+   retryDate,
+   retryDelay,
+   attempt: newAttempt,
+ });
666-675: Consider applying logErrors gating to requeue failure as well

If requeue fails, you currently always log at error. If the catalog item opted out of error-level logs, mirror that choice here too.
- this.logger.error(
-   `Worker failed to requeue item. It will be retried after the visibility timeout.`,
-   {
-     name: this.options.name,
-     id,
-     job,
-     item,
-     visibilityTimeoutMs,
-     error: requeueError,
-   }
- );
+ const requeueLog = {
+   ...logAttributes,
+   error: requeueError,
+ };
+ if (shouldLogError) {
+   this.logger.error(
+     "Worker failed to requeue item. It will be retried after the visibility timeout.",
+     requeueLog
+   );
+ } else {
+   this.logger.info(
+     "Worker failed to requeue item. It will be retried after the visibility timeout.",
+     requeueLog
+   );
+ }
apps/webapp/app/v3/services/alerts/deliverAlert.server.ts (1)
929-935: Log the hex signature (or omit), not the raw ArrayBuffer

You compute signatureHex but log signature (ArrayBuffer), which is noisy and often useless in logs. Prefer logging signatureHex or drop it entirely.
- logger.info("[DeliverAlert] Failed to send alert webhook", {
+ logger.info("[DeliverAlert] Failed to send alert webhook", {
     status: response.status,
     statusText: response.statusText,
     url: webhook.url,
     body: payload,
-    signature,
+    signature: signatureHex,
   });
apps/webapp/app/v3/alertsWorker.server.ts (1)

46-46: Consider logging only on final attempt (feature follow-up)

If the current noise is mostly transient retries, consider an enhancement in the redis-worker API to support logging only on the last attempt (or providing an onError hook with attempt metadata) instead of a blanket suppression. This preserves signal for hard failures without spamming logs on transient ones.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b41129a and 1bac20b.

📒 Files selected for processing (16)

apps/webapp/app/db.server.ts (1 hunks)
apps/webapp/app/models/runtimeEnvironment.server.ts (1 hunks)
apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.continue.ts (1 hunks)
apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts (1 hunks)
apps/webapp/app/v3/alertsWorker.server.ts (3 hunks)
apps/webapp/app/v3/eventRepository.server.ts (1 hunks)
apps/webapp/app/v3/machinePresets.server.ts (1 hunks)
apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts (1 hunks)
apps/webapp/app/v3/runEngineHandlers.server.ts (2 hunks)
apps/webapp/app/v3/services/alerts/deliverAlert.server.ts (1 hunks)
apps/webapp/app/v3/services/cancelTaskRunV1.server.ts (1 hunks)
apps/webapp/app/v3/services/createCheckpoint.server.ts (2 hunks)
internal-packages/run-engine/src/engine/machinePresets.ts (1 hunks)
internal-packages/run-engine/src/engine/systems/dequeueSystem.ts (1 hunks)
packages/core/src/logger.ts (1 hunks)
packages/redis-worker/src/worker.ts (6 hunks)

🧰 Additional context used

📓 Path-based instructions (4)

**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Always prefer using isomorphic code like fetch, ReadableStream, etc. instead of Node.js specific code
For TypeScript, we usually use types over interfaces
Avoid enums
No default exports, use function declarations

Files:

internal-packages/run-engine/src/engine/machinePresets.ts
apps/webapp/app/db.server.ts
apps/webapp/app/v3/machinePresets.server.ts
apps/webapp/app/v3/services/cancelTaskRunV1.server.ts
apps/webapp/app/v3/alertsWorker.server.ts
apps/webapp/app/models/runtimeEnvironment.server.ts
apps/webapp/app/v3/eventRepository.server.ts
packages/core/src/logger.ts
apps/webapp/app/v3/services/createCheckpoint.server.ts
apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts
internal-packages/run-engine/src/engine/systems/dequeueSystem.ts
apps/webapp/app/v3/runEngineHandlers.server.ts
apps/webapp/app/v3/services/alerts/deliverAlert.server.ts
apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts
apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.continue.ts
packages/redis-worker/src/worker.ts

{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)

We use zod a lot in packages/core and in the webapp

Files:

apps/webapp/app/db.server.ts
apps/webapp/app/v3/machinePresets.server.ts
apps/webapp/app/v3/services/cancelTaskRunV1.server.ts
apps/webapp/app/v3/alertsWorker.server.ts
apps/webapp/app/models/runtimeEnvironment.server.ts
apps/webapp/app/v3/eventRepository.server.ts
packages/core/src/logger.ts
apps/webapp/app/v3/services/createCheckpoint.server.ts
apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts
apps/webapp/app/v3/runEngineHandlers.server.ts
apps/webapp/app/v3/services/alerts/deliverAlert.server.ts
apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts
apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.continue.ts

apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.cursor/rules/webapp.mdc)

apps/webapp/**/*.{ts,tsx}: In the webapp, all environment variables must be accessed through the env export of env.server.ts, instead of directly accessing process.env.
When importing from @trigger.dev/core in the webapp, never import from the root @trigger.dev/core path; always use one of the subpath exports as defined in the package's package.json.

Files:

apps/webapp/app/db.server.ts
apps/webapp/app/v3/machinePresets.server.ts
apps/webapp/app/v3/services/cancelTaskRunV1.server.ts
apps/webapp/app/v3/alertsWorker.server.ts
apps/webapp/app/models/runtimeEnvironment.server.ts
apps/webapp/app/v3/eventRepository.server.ts
apps/webapp/app/v3/services/createCheckpoint.server.ts
apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts
apps/webapp/app/v3/runEngineHandlers.server.ts
apps/webapp/app/v3/services/alerts/deliverAlert.server.ts
apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts
apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.continue.ts

apps/webapp/app/services/**/*.server.ts

📄 CodeRabbit Inference Engine (.cursor/rules/webapp.mdc)

For testable services, separate service logic and configuration, as exemplified by realtimeClient.server.ts (service) and realtimeClientGlobal.server.ts (configuration).

Files:

apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts

🧠 Learnings (1)

📚 Learning: 2025-07-18T17:49:24.468Z

Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-18T17:49:24.468Z
Learning: Applies to internal-packages/database/**/*.{ts,tsx} : We use prisma in internal-packages/database for our database interactions using PostgreSQL

Applied to files:

apps/webapp/app/db.server.ts

🧬 Code Graph Analysis (7)

apps/webapp/app/v3/machinePresets.server.ts (1)

packages/core/src/v3/tracer.ts (1)

logger (56-64)

apps/webapp/app/models/runtimeEnvironment.server.ts (1)

apps/webapp/app/v3/tracer.server.ts (1)

logger (101-106)

apps/webapp/app/v3/eventRepository.server.ts (3)

packages/core/src/v3/tracer.ts (1)

logger (56-64)

apps/webapp/app/services/logger.server.ts (1)

logger (49-59)

apps/webapp/app/v3/tracer.server.ts (1)

logger (101-106)

apps/webapp/app/v3/services/createCheckpoint.server.ts (1)

apps/webapp/app/services/logger.server.ts (1)

logger (49-59)

apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts (1)

apps/webapp/app/services/logger.server.ts (1)

logger (49-59)

apps/webapp/app/v3/runEngineHandlers.server.ts (1)

apps/webapp/app/utils/packets.ts (1)

MetadataTooLargeError (4-9)

apps/webapp/app/v3/services/alerts/deliverAlert.server.ts (1)

packages/trigger-sdk/src/v3/index.ts (1)

logger (39-39)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)

GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
GitHub Check: typecheck / typecheck
GitHub Check: Analyze (javascript-typescript)

🔇 Additional comments (9)

apps/webapp/app/db.server.ts (1)

196-206: LGTM on gating Prisma writer error with ignoreError

Adding ignoreError: true to the Prisma writer error event aligns with the new Logger.onError gating and should reduce false alerts without losing error visibility.

apps/webapp/app/v3/services/cancelTaskRunV1.server.ts (1)

46-49: LGTM on log level downgrade

Downgrading this non-actionable path to info makes sense. It’s a frequent state check, not an exception.

apps/webapp/app/models/runtimeEnvironment.server.ts (1)

40-43: LGTM on downgrading to warn

Using warn here is appropriate: it’s a client input gap rather than a server-side failure.

apps/webapp/app/v3/runEngineHandlers.server.ts (1)

20-20: Import addition looks correct

Importing MetadataTooLargeError from packets aligns with the new conditional logging path.

internal-packages/run-engine/src/engine/systems/dequeueSystem.ts (2)

138-144: Structured logging improvement is a win

Switching to a concise message with structured context (runId, snapshotId, executionStatus) improves log queryability and reduces duplication.

147-153: Consistent structured context on warn path

Same here—moving details to the context payload keeps messages readable and machine-parsable.

apps/webapp/app/v3/services/createCheckpoint.server.ts (2)

135-139: LGTM: log level downgrade for already-resumed child run

Logging as info for an expected pre-check is the right call; return shape remains consistent.

171-175: LGTM: log level downgrade for already-resumed batch

Same here—appropriate severity reduction for a non-exceptional condition.

packages/redis-worker/src/worker.ts (1)

624-634: LGTM: DLQ transition logging respects per-item logErrors

Conditionally downgrading to info when logErrors is false matches the catalog contract and reduces noise while preserving context.

apps/webapp/app/v3/alertsWorker.server.ts

packages/core/src/logger.ts

apps/webapp/app/db.server.ts

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

internal-packages/run-engine/src/engine/machinePresets.ts (1)
69-74: “Smallest preset” selection is order-dependent; sort to actually choose the smallest.

Current iteration depends on object insertion order and may pick a larger-than-necessary machine. Sort by cpu then memory to honor the function’s contract and reduce cost.
 function derivePresetNameFromValues(
   machines: Record<string, MachinePreset>,
   cpu: number,
   memory: number
 ): MachinePresetName | undefined {
-  for (const [name, preset] of Object.entries(machines)) {
+  const sorted = Object.entries(machines).sort(([, a], [, b]) => {
+    if (a.cpu === b.cpu) return a.memory - b.memory;
+    return a.cpu - b.cpu;
+  });
+  for (const [name, preset] of sorted) {
     if (preset.cpu >= cpu && preset.memory >= memory) {
       return name as MachinePresetName;
     }
   }
 }

🧹 Nitpick comments (13)

internal-packages/run-engine/src/engine/machinePresets.ts (7)
28-31: Avoid logging the entire config; log structured validation errors and high-level shape instead.

Full config objects can be large and may contain sensitive data. Prefer Zod issue summaries and config keys for context.

Apply this diff to improve signal and reduce risk:
-  if (!parsedConfig.success) {
-    logger.info("Failed to parse machine config", { config });
+  if (!parsedConfig.success) {
+    logger.info("Failed to parse machine config", {
+      errorIssues: parsedConfig.error.issues.map((i) => ({
+        path: i.path.join("."),
+        code: i.code,
+      })),
+      configKeys:
+        typeof config === "object" && config !== null
+          ? Object.keys(config as Record<string, unknown>)
+          : undefined,
+    });
20-21: Consider downgrading this parse failure to info for consistency.

You’ve treated config parse failures as informational. The preset parse failure here seems similar and may also be a frequent, non-exceptional case.

Apply this diff if you agree:
-      logger.error("Failed to parse machine preset", { machinePreset: run.machinePreset });
+      logger.info("Failed to parse machine preset", { machinePreset: run.machinePreset });
31-31: Prefer the provided defaultMachine over a hardcoded fallback.

Using the caller-provided default keeps behavior consistent with environments that may not include "small-1x".

Proposed change:
-    return machinePresetFromName(machines, "small-1x");
+    return machinePresetFromName(machines, defaultMachine);
If "small-1x" is the intentional canonical fallback across the platform, feel free to ignore.

51-51: Same here: use defaultMachine instead of hardcoding "small-1x".

Keeps fallbacks consistent across all branches.
-  return machinePresetFromName(machines, "small-1x");
+  return machinePresetFromName(machines, defaultMachine);
38-49: Don’t use truthiness for numeric checks (0 is falsy).

Be explicit about undefined/null to avoid skipping valid values.
-  if (parsedConfig.data.cpu && parsedConfig.data.memory) {
+  if (parsedConfig.data.cpu !== undefined && parsedConfig.data.memory !== undefined) {
58-61: Add a guard to produce a clear error if the preset name is unknown.

Spreading an undefined value throws a confusing TypeError. Fail fast with a clear message.
 export function machinePresetFromName(
   machines: Record<string, MachinePreset>,
   name: MachinePresetName
 ): MachinePreset {
-  return {
-    ...machines[name],
-  };
+  const preset = machines[name];
+  if (!preset) {
+    throw new Error(`Unknown machine preset: ${name}`);
+  }
+  return { ...preset };
 }
4-4: Nit: align logger name with the exported function for easier log tracing.

Optional, but it helps when correlating call sites and logs.
-const logger = new Logger("machinePresetFromConfig");
+const logger = new Logger("getMachinePreset");
apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts (1)
624-629: Enrich the info log and tighten the inline comment

Add commonly-needed fields to the log for quicker triage, and reword the comment to reflect the actual flow (we’re about to mark it as WAITING_FOR_DEPLOY here).

Apply this diff:
-      // This happens when a run is "WAITING_FOR_DEPLOY" and is expected
-      logger.info("No matching deployment found for task run", {
-        queueMessage: message.data,
-        messageId: message.messageId,
-      });
+      // No matching deployment found; mark run as "WAITING_FOR_DEPLOY" (expected during deploy gaps)
+      logger.info("No matching deployment found for task run", {
+        queueMessage: message.data,
+        messageId: message.messageId,
+        runId: existingTaskRun.id,
+        runStatus: existingTaskRun.status,
+        lockedById: existingTaskRun.lockedById ?? undefined,
+        lockedToVersionId: existingTaskRun.lockedToVersionId ?? undefined,
+        environmentId: existingTaskRun.runtimeEnvironmentId,
+      });
apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts (5)
187-194: LGTM on downgrading ECONNRESET to info; consider a small type-guard + context tweak

This achieves the PR goal (reducing false errors) without changing control flow. To improve readability and add better diagnostics, consider:

Using a narrow type-guard for ECONNRESET instead of the nested "code" in error checks.

Including streamKey/runId/streamId in the info log for correlation.

Apply this diff within the catch block:
-      if (error instanceof Error) {
-        if ("code" in error && error.code === "ECONNRESET") {
-          logger.info("[RealtimeStreams][ingestData] Connection reset during ingestData:", {
-            error,
-          });
-          return new Response(null, { status: 500 });
-        }
-      }
+      if (isConnReset(error)) {
+        logger.info("[RealtimeStreams][ingestData] Connection reset during ingestData", {
+          error,
+          streamKey,
+          runId,
+          streamId,
+        });
+        return new Response(null, { status: 500 });
+      }
Add this helper near the top of the file (outside the selected range):
type ErrorWithCode = Error & { code?: string | number };

function isConnReset(e: unknown): e is ErrorWithCode {
  return e instanceof Error && (e as ErrorWithCode).code === "ECONNRESET";
}
110-114: Prefer logger over console.error in cleanup to keep logging consistent

Using the shared logger keeps formatting consistent and lets you attach metadata (e.g., ignoreError).
-      await redis.quit().catch(console.error);
+      await redis.quit().catch((error) =>
+        logger.info(
+          "[RealtimeStreams][streamResponse] Error during Redis quit in cleanup",
+          { error, ignoreError: true }
+        )
+      );
136-140: Demote cleanup failures to info and mark as ignored to reduce noise

Cleanup failures are non-critical here. Logging them at error can inflate error budgets. Keep them, but at info with ignoreError for consistency with the PR’s goal.
-      } catch (error) {
-        logger.error("[RedisRealtimeStreams][ingestData] Error in cleanup:", { error });
-      }
+      } catch (error) {
+        logger.info("[RedisRealtimeStreams][ingestData] Error in cleanup:", {
+          error,
+          ignoreError: true,
+        });
+      }
144-169: Release the reader lock after the read loop

After you finish reading, explicitly release the reader lock to help avoid holding resources longer than necessary.
       while (true) {
         const { done, value } = await reader.read();

         if (done || !value) {
           break;
         }

         logger.debug("[RedisRealtimeStreams][ingestData] Reading data", {
           streamKey,
           runId,
           value,
         });

         await redis.xadd(
           streamKey,
           "MAXLEN",
           "~",
           String(env.REALTIME_STREAM_MAX_LENGTH),
           "*",
           "data",
           value
         );
       }
+
+      // Ensure the lock on the reader is released before proceeding.
+      reader.releaseLock();
187-194: Consider returning 204/200 instead of 500 for ECONNRESET in ingestData

ECONNRESET usually indicates the client hung up mid-upload. Returning a 5xx here may trigger upstream retries or alerting even though there’s nothing actionable. Switching to a 204 (or 200) can reduce noise if callers treat non-2xx as retryable.

Affected location:

apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts (ingestData, lines 187–194)

Proposed diff:
- if (error instanceof Error && "code" in error && error.code === "ECONNRESET") {
-   logger.info("[RealtimeStreams][ingestData] Connection reset during ingestData:", { error });
-   return new Response(null, { status: 500 });
- }
+ if (error instanceof Error && "code" in error && error.code === "ECONNRESET") {
+   logger.info("[RealtimeStreams][ingestData] Connection reset during ingestData:", { error });
+   // Client disconnected—return 204 to avoid unnecessary retries/alerts
+   return new Response(null, { status: 204 });
+ }
• Verify how your callers treat 2xx vs 5xx on this endpoint (e.g. retries, error reporting) before rolling out.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1bac20b and d402945.

📒 Files selected for processing (16)

apps/webapp/app/db.server.ts (1 hunks)
apps/webapp/app/models/runtimeEnvironment.server.ts (1 hunks)
apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.continue.ts (1 hunks)
apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts (1 hunks)
apps/webapp/app/v3/alertsWorker.server.ts (3 hunks)
apps/webapp/app/v3/eventRepository.server.ts (1 hunks)
apps/webapp/app/v3/machinePresets.server.ts (1 hunks)
apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts (1 hunks)
apps/webapp/app/v3/runEngineHandlers.server.ts (2 hunks)
apps/webapp/app/v3/services/alerts/deliverAlert.server.ts (1 hunks)
apps/webapp/app/v3/services/cancelTaskRunV1.server.ts (1 hunks)
apps/webapp/app/v3/services/createCheckpoint.server.ts (2 hunks)
internal-packages/run-engine/src/engine/machinePresets.ts (1 hunks)
internal-packages/run-engine/src/engine/systems/dequeueSystem.ts (1 hunks)
packages/core/src/logger.ts (1 hunks)
packages/redis-worker/src/worker.ts (6 hunks)

✅ Files skipped from review due to trivial changes (1)

apps/webapp/app/v3/services/cancelTaskRunV1.server.ts

🚧 Files skipped from review as they are similar to previous changes (12)

apps/webapp/app/v3/runEngineHandlers.server.ts
internal-packages/run-engine/src/engine/systems/dequeueSystem.ts
apps/webapp/app/v3/services/alerts/deliverAlert.server.ts
apps/webapp/app/v3/machinePresets.server.ts
apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.continue.ts
apps/webapp/app/models/runtimeEnvironment.server.ts
apps/webapp/app/db.server.ts
packages/redis-worker/src/worker.ts
apps/webapp/app/v3/alertsWorker.server.ts
apps/webapp/app/v3/eventRepository.server.ts
apps/webapp/app/v3/services/createCheckpoint.server.ts
packages/core/src/logger.ts

🧰 Additional context used

📓 Path-based instructions (4)

**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Always prefer using isomorphic code like fetch, ReadableStream, etc. instead of Node.js specific code
For TypeScript, we usually use types over interfaces
Avoid enums
No default exports, use function declarations

Files:

internal-packages/run-engine/src/engine/machinePresets.ts
apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts
apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts

{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)

We use zod a lot in packages/core and in the webapp

Files:

apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts
apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts

apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.cursor/rules/webapp.mdc)

apps/webapp/**/*.{ts,tsx}: In the webapp, all environment variables must be accessed through the env export of env.server.ts, instead of directly accessing process.env.
When importing from @trigger.dev/core in the webapp, never import from the root @trigger.dev/core path; always use one of the subpath exports as defined in the package's package.json.

Files:

apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts
apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts

apps/webapp/app/services/**/*.server.ts

📄 CodeRabbit Inference Engine (.cursor/rules/webapp.mdc)

For testable services, separate service logic and configuration, as exemplified by realtimeClient.server.ts (service) and realtimeClientGlobal.server.ts (configuration).

Files:

apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts

🧬 Code Graph Analysis (1)

apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts (1)

apps/webapp/app/services/logger.server.ts (1)

logger (49-59)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)

GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
GitHub Check: typecheck / typecheck
GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
GitHub Check: Analyze (javascript-typescript)

🔇 Additional comments (4)

internal-packages/run-engine/src/engine/machinePresets.ts (2)

29-29: Downgrading to info is appropriate and aligns with PR intent.

Good call reducing noise for non-exceptional validation failures.

1-75: Fallback safe: "small-1x" present in all machine maps
Verified that every machines map—including run-engine test fixtures, core schema defaults, and webapp presets—defines an entry for "small-1x". The hard-coded fallback will not cause missing-key errors.

apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts (2)

624-642: LGTM: Downgrade to info + WAITING_FOR_DEPLOY ack looks right for expected gaps

This aligns with the PR goal of reducing noisy errors when a deploy isn’t available yet, and setting the run to WAITING_FOR_DEPLOY matches the intended state transition.

630-641: Re-enqueue Path Exists for WAITING_FOR_DEPLOY

We’ve confirmed that WAITING_FOR_DEPLOY runs are picked up and re-enqueued by a dedicated background service:

• apps/webapp/app/v3/services/executeTasksWaitingForDeploy.ts (around lines 37–43)
– Queries for runs where status = "WAITING_FOR_DEPLOY"
– Calls marqs.enqueueMessage(...) for each run to put it back on the shared queue

Because this service handles all ACKed WAITING_FOR_DEPLOY runs, the ack_and_do_more_work path in sharedQueueConsumer.server.ts will not starve tasks.

matt-aitken added 15 commits August 19, 2025 18:03

Remove variables from dequeue log message

b8b83e3

Continue snapshot throw json

402d974

Waiting for deploy error removed

eeffe13

Realtime ECONNRESET is expected

0f6c2ed

Redis worker logErrors changes, removed ids

a3f6d96

Preview branch without a branch shouldn't log an error, it's a user p…

38518b6

…rovided issue

"Task run is not in a cancellable state" isn't an error, it's expected

dd2933d

"CreateCheckpointService: Child run already resumed" is expected

9a09afe

"CreateCheckpointService: Batch already resumed" is expected

58df053

"Failed to insert events, will attempt bisection" changed to info, we…

4ce3e22

… have errors for complete failures

Ignore "PrismaClient error"

b97bea8

Don't log Redis worker DLQ errors if we're ignoring

2d36180

"Failed to parse machine config" is fine, sometimes a config is null …

f8a89e2

…or undefined

"Failed to parse machine config" for v3

201fc57

MetadataTooLargeError shouldn't log an error

d402945

coderabbitai bot reviewed Aug 19, 2025

View reviewed changes

apps/webapp/app/v3/alertsWorker.server.ts Show resolved Hide resolved

apps/webapp/app/v3/alertsWorker.server.ts Show resolved Hide resolved

packages/core/src/logger.ts Show resolved Hide resolved

ericallam changed the title ~~More false errors turned to logs~~ chore(webapp): More false errors turned to logs Aug 19, 2025

nicktrn reviewed Aug 19, 2025

View reviewed changes

apps/webapp/app/db.server.ts Show resolved Hide resolved

nicktrn approved these changes Aug 19, 2025

View reviewed changes

matt-aitken force-pushed the false-error-downgrades branch from 1bac20b to d402945 Compare August 20, 2025 08:51

coderabbitai bot reviewed Aug 20, 2025

View reviewed changes

matt-aitken merged commit 4264fc0 into main Aug 20, 2025
56 of 57 checks passed

matt-aitken deleted the false-error-downgrades branch August 20, 2025 10:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

chore(webapp): More false errors turned to logs #2422

chore(webapp): More false errors turned to logs #2422

Uh oh!

matt-aitken commented Aug 19, 2025

Uh oh!

changeset-bot bot commented Aug 19, 2025

Uh oh!

coderabbitai bot commented Aug 19, 2025 •

edited

Loading

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Status, Documentation and Community

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chore(webapp): More false errors turned to logs #2422

chore(webapp): More false errors turned to logs #2422

Uh oh!

Conversation

matt-aitken commented Aug 19, 2025

Uh oh!

changeset-bot bot commented Aug 19, 2025

⚠️ No Changeset found

Uh oh!

coderabbitai bot commented Aug 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Estimated code review effort

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Status, Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot commented Aug 19, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)