Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[active_job] support exception reporting only after last retry failed #2573

Merged
merged 20 commits into from
Apr 4, 2025

Conversation

solnic
Copy link
Collaborator

@solnic solnic commented Mar 5, 2025

This is an updated version of #2500 which adds support for active_job_report_after_job_retries option which makes reporting an exception upon a job failure to happen only on the last attempt retry failure.

@solnic solnic force-pushed the solnic/pr-2500-follow-up branch 4 times, most recently from 4ab9ff9 to b26b17a Compare March 6, 2025 13:13
@getsentry getsentry deleted a comment from github-actions bot Mar 7, 2025
@solnic solnic force-pushed the solnic/pr-2500-follow-up branch 2 times, most recently from 90144c4 to 2fa01fc Compare March 7, 2025 14:38
@getsentry getsentry deleted a comment from github-actions bot Mar 7, 2025
@solnic solnic marked this pull request as ready for review March 7, 2025 14:40
@solnic solnic requested review from st0012 and sl0thentr0py March 7, 2025 14:40
@getsentry getsentry deleted a comment from github-actions bot Mar 7, 2025
@solnic solnic force-pushed the solnic/pr-2500-follow-up branch from 2fa01fc to d19050a Compare March 10, 2025 14:07
@solnic solnic changed the title PR 2500 follow up [active_job] support exception reporting only after last retry failed Mar 10, 2025
@solnic solnic force-pushed the solnic/pr-2500-follow-up branch from d19050a to 36c1176 Compare March 11, 2025 14:51
@getsentry getsentry deleted a comment from github-actions bot Mar 11, 2025
@solnic
Copy link
Collaborator Author

solnic commented Mar 11, 2025

Please ignore Danger here, it just fails to see that I added CHANGELOG entry.

Copy link
Collaborator

@st0012 st0012 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left some suggestions for tests.

@@ -52,6 +54,10 @@
expect(Sentry::Rails::Tracing.subscribed_tracing_events).to be_empty
Sentry::Rails::Tracing.remove_active_support_notifications_patch

if defined?(Sentry::Rails::ActiveJobExtensions)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with our current setup this will always be true?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@st0012 not really, it may or may not be defined, depending on the order of loading files which is pretty much random 🙃

@solnic solnic force-pushed the solnic/pr-2500-follow-up branch from 36c1176 to 4e59569 Compare March 31, 2025 14:36
Copy link

github-actions bot commented Mar 31, 2025

Fails
🚫 Please consider adding a changelog entry for the next release.

Instructions and example for changelog

Please add an entry to CHANGELOG.md to the "Unreleased" section. Make sure the entry includes this PR's number.

Example:

## Unreleased

- [active_job] support exception reporting only after last retry failed ([#2573](https://github.com/getsentry/sentry-ruby/pull/2573))

If none of the above apply, you can opt out of this check by adding #skip-changelog to the PR description.

Generated by 🚫 dangerJS against 4e59569

@solnic solnic force-pushed the solnic/pr-2500-follow-up branch from dc38bb6 to 4e59569 Compare March 31, 2025 14:41
@solnic solnic merged commit 7db2c29 into master Apr 4, 2025
217 of 266 checks passed
@solnic solnic deleted the solnic/pr-2500-follow-up branch April 4, 2025 12:46
@aburgel
Copy link

aburgel commented Apr 9, 2025

Hi there, I came across this issue while trying to understand why I don't see any sentry issues from failing ActiveJob jobs until the final retry attempt. I'm using sentry-rails 5.23 and based on what this PR is trying to do, it seems like I should be seeing a Sentry issue on each retry. But I don't. That's a bit confusing to me, so I did a bit of investigation.

The first thing I found is that the test setup is not representative of how ActiveJob typically runs in the real world. In particular, these lines do some subtle things that don't work well with retry_on.

assert_performed_jobs 3 do
  FailedJobWithRetryOn.perform_later rescue nil
end

With the test adapter, assert_performed_jobs will run all enqueued jobs immediately as well as all jobs that those jobs enqueue. Where things fall apart is that retry_on will reenqueue the job when it fails, but because the test adapter immediately runs the job upon enqueuing it, it will immediately fail again. This causes an exception to be raised in the ActiveJob exception handler, causing the exception to propagate.

So in this test scenario, the Sentry ActiveJob error handler will be called upon each retry, but I don't think that's what happens in the real world.

If you change this test to look like this:

-        assert_performed_jobs 3 do
-          FailedJobWithRetryOn.perform_later rescue nil
-        end
+        FailedJobWithRetryOn.perform_later
+
+        perform_enqueued_jobs
+        perform_enqueued_jobs
+        perform_enqueued_jobs rescue nil

Then you should see that only 1 call to the sentry error handler is made. This is more representative of how ActiveJob actually behaves because the retry handler will enqueue the job but not run it until the next call to perform_enqueued_jobs. In other words, with this approach, the retry handler will complete successfully and the exception will not propagate.


But then this leads to my original question, does sentry-rails (even with this PR) capture an exception on every retry with ActiveJob? I don't quite see how this happens.

Sentry::Rails::ActiveJobExtensions gets placed in the ancestor chain above all the ActiveJob internal modules, which means its implementation of perform_now will wrap the internal ActiveJob implementations. And since those internal ActiveJob methods are where the retry logic happens, Sentry won't even see the exception when the retry logic handles it.

So I'm left a little confused, I'm guessing I've missed something. Perhaps some ActiveJob adapters behave differently (I'm using solid_queue)?

@solnic
Copy link
Collaborator Author

solnic commented Apr 9, 2025

@aburgel hey Alex, thanks for bringing this up. I'll look into this and re-visit this PR and related issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants