Use _is_spammy_exception for action_health metric filtering #193
Merged
… metric The previous filter (not ExpectedUdfException) counted MissingJsonPath, TypeError, and NodeFailurePropagationException as "unexpected" errors. These are operationally expected — MissingJsonPath fires for every optional JsonData field not present in the payload, and NodeFailurePropagationException cascades from those, resulting in ~100% error rate for action types like payment_blocked. Now uses _is_spammy_exception (same filter as udf_execution metric) to only count truly actionable errors. Renames tag from had_unexpected_errors to had_actionable_errors and drops the always-true had_errors tag.
NodeErrorInfo.error is typed as BaseException but _is_spammy_exception expects Optional[Exception]. Add isinstance guard to satisfy mypy.
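A minimal sketch of the isinstance guard described above. The class bodies and the body of _is_spammy_exception here are hypothetical stand-ins (the real definitions are not shown in this PR), and treating non-Exception errors as actionable is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


class MissingJsonPath(Exception):
    """Hypothetical stand-in for the real exception class."""


def _is_spammy_exception(error: Optional[Exception]) -> bool:
    # Stand-in body: the real filter's rules are not shown in this PR.
    return isinstance(error, (MissingJsonPath, TypeError))


@dataclass
class NodeErrorInfo:
    # Typed as BaseException, as the commit message states.
    error: BaseException


def is_actionable(info: NodeErrorInfo) -> bool:
    error = info.error
    # Narrow BaseException to Exception so the call below type-checks under mypy.
    if not isinstance(error, Exception):
        # Assumption: errors such as KeyboardInterrupt count as actionable.
        return True
    return not _is_spammy_exception(error)
```

Without the guard, mypy would reject passing a BaseException where Optional[Exception] is expected; after the isinstance check, the variable is narrowed to Exception.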
Summary
Fixes the osprey.action_health metric (added in #191) to use _is_spammy_exception for filtering errors instead of just ExpectedUdfException.

Problem
The previous filter counted MissingJsonPath, TypeError, and NodeFailurePropagationException as "unexpected" errors. These are operationally expected: MissingJsonPath fires for every optional JsonData field not present in the payload, and NodeFailurePropagationException cascades from those. This caused a ~100% "unexpected error" rate for action types like payment_blocked that import models with many optional fields (~211 errors per action, all from the cascade).

Changes
- Use _is_spammy_exception (same filter as the existing udf_execution metric) to only count truly actionable errors
- Rename tag had_unexpected_errors → had_actionable_errors
- Drop the always-true had_errors tag

Test plan
- Tests pass (./run-tests.sh)
- had_actionable_errors tag appears in Datadog
- payment_blocked no longer shows ~100% error rate
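
The difference between the old and new filters can be sketched as follows. This is an illustration under stated assumptions, not the PR's actual code: the exception classes are local stand-ins, the contents of _SPAMMY are inferred only from the exceptions named in this description, and the two helper functions are hypothetical names for how each tag value might be computed.

```python
from typing import List, Optional


# Hypothetical stand-ins for the real exception classes.
class ExpectedUdfException(Exception):
    pass


class MissingJsonPath(Exception):
    pass


class NodeFailurePropagationException(Exception):
    pass


# Assumption: the real _is_spammy_exception may cover more or fewer types.
_SPAMMY = (
    ExpectedUdfException,
    MissingJsonPath,
    TypeError,
    NodeFailurePropagationException,
)


def _is_spammy_exception(error: Optional[Exception]) -> bool:
    return isinstance(error, _SPAMMY)


def had_unexpected_errors(errors: List[Exception]) -> bool:
    # Old filter: anything that is not an ExpectedUdfException counts.
    return any(not isinstance(e, ExpectedUdfException) for e in errors)


def had_actionable_errors(errors: List[Exception]) -> bool:
    # New filter: only errors that are not spammy count.
    return any(not _is_spammy_exception(e) for e in errors)


# A missing optional field plus the failure that cascades from it.
cascade: List[Exception] = [
    MissingJsonPath("optional field absent"),
    NodeFailurePropagationException("upstream node failed"),
]

old_flag = had_unexpected_errors(cascade)
new_flag = had_actionable_errors(cascade)
with_real_bug = had_actionable_errors(cascade + [ValueError("real bug")])
```

In this sketch the cascade alone sets the old flag but not the new one, while adding a genuinely unexpected error (here a ValueError) still sets the new flag.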