-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarification on idempotency requirement #22
Comments
Hi @stesen-wefunder! Thanks for reaching out! These are some great questions -- exactly the kinds that I'd hoped this library's documentation could prompt! Here's my attempt to provide a few answers:
ActiveStorage jobs should be idempotent already, insofar as AnalyzeJobs and variant processing should produce deterministic (or at least functionally equivalent) results every time. ActionMailer jobs, on the other hand, don't seem to go out of their way to stamp any kind of message_id eagerly into the serialized job data, which likely means that they are not idempotent (unless an underlying delivery API gem does something particularly fancy involving patching the
I wouldn't say that by any stretch. The real-world impact of an errant double-send is likely very low in practice, and particularly for emails (but for many jobs in general), at-least-once delivery is still far preferable to at-most-once delivery, even without any idempotency guarantees. So I'd argue that you still should generally use In the rare case where idempotency is not possible and you'd rather intervene than chance a double-send, you could set
So, to unpack this, there are a couple things worth knowing. Firstly, the worker's pickup query does safely lock jobs, so you shouldn't need to use any kind of external locking mechanism to prevent double execution. (Plus, I wouldn't think that using an external datastore would provide meaningful sequencing guarantees beyond what Secondly, jobs have a maximum runtime, after which they will be forcibly stopped. Therefore, jobs that have been locked for longer than the maximum runtime (plus a hardcoded 30 second buffer) can be assumed to either be in a terminal "failed" state (attempts == Because of this dequeue issue (i.e. the job retry/cleanup process), there is an extremely small chance that a job may be attempted twice, even if its Footnotes
|
Thanks for the detailed response! :D So it sounds like a job completing twice is actually an exceptionally rare case? More of a "be aware there's a chance this could happen" than an "expect and prepare for this to happen regularly"? I had kind of misinterpreted things as a need for idempotency being the trade-off made to get better performance out of So in practice, would you say most users could drop P.S., How did you make a footnote in a GitHub comment?? 🤯 |
Yes, that's a good way of putting it. Though I'd argue that similar idempotency/retryability strategies will apply if your job raises an exception somewhere in the middle (especially if there are side effects external to the local DB that cannot be rolled-back), which may be a more regular occurrence.
Yes, that was very much the intent! While the two libraries have drifted a bit in terms of functionality and querying strategy, the default resiliency of
Instructions here. It's one of my favorite features that they've added in the last couple years! 😄 |
Wonderful! Thanks for all the info! I'm moving forward with a renewed sense of ease, hahaha. (Sorry for going quiet for a bit there, haha. I do have one other unrelated question, but I'll open another thread for that.) I'll let you decide whether to leave this issue open for others to come across, or as a reminder to add to the docs, or you can just close it. Whatever works for you. :) |
This was a very helpful conversation. I'm also considering switching from delayed_job to delayed, but concerned about the need for idempotency with delayed. A few follow-up questions to dig into this deeper @stesen-wefunder ... Does delayed_job also have an At-Least-Once Delivery guarantee? More importantly, does delayed_job also have the behaviour that under certain conditions a job may run more than once, and if so are these conditions the same ones that delayed has? I'm mostly concerned about the possibility of introducing new edge-case behaviour wherein a job can finish successfully more than once. Your post indicates that this behaviour can occur with the delayed gem. Can this behaviour occur with delayed_job too? Your footnote above seems to have spoken to this question/concern, but I want to ensure I understood correctly that the behaviour can occur with delayed_job too. If this behaviour can occur with delayed_job too, is it just as likely to occur with delayed as it is with delayed_job? In other words, with delayed, is it as common to have jobs finish successfully more than once as with delayed_job? |
Jobs being worked more than once is possible in the default configurations of both delayed_job and delayed. Delayed's opinionated worker draining should in fact reduce the incidence of jobs being worked multiple times vs delayed_job, assuming that jobs never take longer than the recommended 20 minute timeout to complete. |
Thanks @jmileham ! How about if jobs take longer than 20 minutes? We have jobs that take many hours to run, and it would be a very large amount of work to decompose them into smaller jobs. |
You can tune your timeout settings, but delayed's default tunings are designed to:
leading us to recommend decomposing work into short-running jobs and setting a short timeout. If you are ok to leave your old worker to drain for longer than the longest running job, that could work, but you'll be subject to all the usual challenges if the worker dies for any other reason. |
Hi! This project looks very exciting for teams looking to migrate away from
delayed_job
. However, I find myself having some lingering questions about the idempotency requirement of this gem:delayed
not be used with these jobs?after_perform
would there still be a race condition?)Massive thanks to the team at Betterment for this project!
The text was updated successfully, but these errors were encountered: