Use _is_spammy_exception for action_health metric filtering#193

Merged
cmttt merged 3 commits into main from ls/action-health-metrics-v2-clean
Mar 25, 2026

@cmttt cmttt commented Mar 25, 2026

Summary

Fixes the osprey.action_health metric (added in #191) to use _is_spammy_exception for filtering errors instead of just ExpectedUdfException.

Problem

The previous filter counted MissingJsonPath, TypeError, and NodeFailurePropagationException as "unexpected" errors. These are operationally expected — MissingJsonPath fires for every optional JsonData field not present in the payload, and NodeFailurePropagationException cascades from those. This caused ~100% "unexpected error" rate for action types like payment_blocked that import models with many optional fields (~211 errors per action, all from cascade).

Changes

  • Uses _is_spammy_exception (same filter as the existing udf_execution metric) to only count truly actionable errors
  • Renames tag from had_unexpected_errors to had_actionable_errors
  • Drops the always-true had_errors tag
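The difference between the two filters can be sketched as follows. This is an illustration only: the exception classes and `_is_spammy_exception` are stand-ins defined here (the real helper's rules are not shown in this PR), and `old_had_unexpected_errors`, `had_actionable_errors`, `action_health_tags`, and the tag value format are hypothetical names chosen for the example.

```python
from typing import Dict, Optional, Sequence


# Stand-in exception types; the real classes live in the project and are not shown here.
class ExpectedUdfException(Exception):
    pass


class MissingJsonPath(Exception):
    pass


class NodeFailurePropagationException(Exception):
    pass


# Assumption: the exception types this PR describes as operationally expected.
_SPAMMY_TYPES = (
    ExpectedUdfException,
    MissingJsonPath,
    NodeFailurePropagationException,
    TypeError,
)


def _is_spammy_exception(error: Optional[Exception]) -> bool:
    """Hypothetical stand-in for the real helper; its actual rules may differ."""
    return isinstance(error, _SPAMMY_TYPES)


def old_had_unexpected_errors(errors: Sequence[Exception]) -> bool:
    """Previous filter: anything that is not ExpectedUdfException counts."""
    return any(not isinstance(e, ExpectedUdfException) for e in errors)


def had_actionable_errors(errors: Sequence[Exception]) -> bool:
    """New filter: only errors that are not spammy count."""
    return any(not _is_spammy_exception(e) for e in errors)


def action_health_tags(action_name: str, errors: Sequence[Exception]) -> Dict[str, str]:
    """Build illustrative metric tags; the real tag format is an assumption."""
    return {
        "action": action_name,
        "had_actionable_errors": str(had_actionable_errors(errors)).lower(),
    }


# A cascade made only of expected errors, as described for payment_blocked.
cascade = [
    MissingJsonPath("optional field absent"),
    NodeFailurePropagationException("upstream node failed"),
]
print(old_had_unexpected_errors(cascade))
print(had_actionable_errors(cascade))
print(action_health_tags("payment_blocked", cascade + [ValueError("real bug")]))
```

Under these assumptions, an action whose errors all come from the optional-field cascade is flagged by the old filter but not by the new one, and the tag only flips once a non-spammy error such as the `ValueError` appears.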

Test plan

  • Run full test suite via Docker (./run-tests.sh)
  • Deploy and verify had_actionable_errors tag appears in Datadog
  • Confirm payment_blocked no longer shows ~100% error rate

… metric

The previous filter (not ExpectedUdfException) counted MissingJsonPath,
TypeError, and NodeFailurePropagationException as "unexpected" errors.
These are operationally expected — MissingJsonPath fires for every
optional JsonData field not present in the payload, and
NodeFailurePropagationException cascades from those, resulting in
~100% error rate for action types like payment_blocked.

Now uses _is_spammy_exception (same filter as udf_execution metric)
to only count truly actionable errors. Renames tag from
had_unexpected_errors to had_actionable_errors and drops the
always-true had_errors tag.
@cmttt cmttt requested review from a team, EXBreder, ayubun, haileyok and vinaysrao1 as code owners March 25, 2026 00:54

@haileyok haileyok left a comment

lgtm

NodeErrorInfo.error is typed as BaseException but _is_spammy_exception
expects Optional[Exception]. Add isinstance guard to satisfy mypy.
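
A minimal sketch of the kind of guard this commit describes, assuming a simplified `NodeErrorInfo` with only the `error` field and a stand-in `_is_spammy_exception`. The `is_actionable` helper is hypothetical, and how the real change treats a `BaseException` that is not an `Exception` is not shown here; this sketch simply does not count it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeErrorInfo:
    """Stand-in for the real class; only the `error` field is modeled."""

    error: BaseException


def _is_spammy_exception(error: Optional[Exception]) -> bool:
    """Hypothetical stand-in; here only TypeError is treated as spammy."""
    return isinstance(error, TypeError)


def is_actionable(info: NodeErrorInfo) -> bool:
    """Return True when the node error should count as actionable."""
    error = info.error
    # The isinstance check narrows BaseException to Exception, so mypy
    # accepts the call below without a cast or a type: ignore comment.
    if not isinstance(error, Exception):
        # Assumption: non-Exception errors (e.g. KeyboardInterrupt) are not counted.
        return False
    return not _is_spammy_exception(error)
```

Narrowing with `isinstance` keeps the helper's signature unchanged; without the guard, passing a `BaseException` where `Optional[Exception]` is expected is reported by mypy as an incompatible argument type.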
@cmttt cmttt merged commit 5261d84 into main Mar 25, 2026
11 checks passed

2 participants