fix: reject zero and negative periodic tasks schedule #887
@Roiocam have you seen the busy loop behaviour in the real world? What about the case where someone sets a very short value (e.g. 1)? Isn't that almost as bad?
The issue was discovered in a unit test on our CI; the cause was that our heartbeat configuration was not being loaded correctly. A very short value might also cause a busy loop, but the scheduler still gets a chance to rest. With a zero period, however, the actor system may never be able to shut down. Please refer to the specific unit test at this link for more details: https://gist.github.com/Roiocam/d317683d54bdbf3afe75b1b945c7f115
The test failures seem to be down to a duplicate test name.
I checked both Netty and Java; both throw an IllegalArgumentException.
actor/src/main/scala/org/apache/pekko/actor/LightArrayRevolverScheduler.scala
private def checkZeroPeriod(delayNanos: Long): Unit =
  if (delayNanos <= 0)
    throw new IllegalArgumentException(
      "Task scheduled with zero or negative period, which is create an an infinite loop")
Use a message like "delay: xxx, expected > 0" or "delay should be > 0". Users do not know what delayNanos is.
what about "Task scheduled with [${delay.toSeconds}] seconds delay, which means creating an infinite loop. The expected delay must be greater than 0."?
I am heading to the airport; I will try to review this when I am free.
    time
  }

  override protected def getShutdownTimeout: FiniteDuration = (10 seconds).dilated

  override protected def waitNanos(ns: Long): Unit = {
    // println(s"waiting $ns")
remove the printlns
stray println comments
if (delay.toNanos <= 0)
  throw new IllegalArgumentException(
    s"Task scheduled with [${delay.toSeconds}] seconds delay, which means creating an infinite loop. " +
    s"The expected delay must be greater than 0.")
I'm not going to block this over this, but I would strongly argue that this should be a function, like checkMaxDelay.
I'm not convinced by all this inline debate. Modern JIT compilers are good, in fact very good, at this. I struggle to accept that this coding practice is justified without someone proving it to me.
Since this is an exception it shouldn't be evaluated in the hot path anyways, or is this an exception which is expected to be thrown and caught in the standard business logic flow?
the pre-existing line right after this does a similar check and throws the same type of exception - checkMaxDelay(roundUp(delay).toNanos)
I have no problem with validating the values here.
The scaladoc for this method already has @throws java.lang.IllegalArgumentException
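As a rough illustration of the validation being discussed, the check can be written as a small standalone function. This is only a sketch: the object name `PeriodCheckSketch` is an assumption made for the example, and the real check is a private method inside LightArrayRevolverScheduler that sits next to checkMaxDelay.

```scala
import scala.concurrent.duration._

// Hypothetical standalone version of the period check (the object name is
// an assumption; in the PR the check lives inside the scheduler class).
object PeriodCheckSketch {

  // Rejects zero and negative periods before a periodic task is scheduled.
  def checkPeriod(delay: FiniteDuration): Unit =
    if (delay.toNanos <= 0)
      throw new IllegalArgumentException(
        s"Task scheduled with [${delay.toSeconds}] seconds delay, which means creating an infinite loop. " +
        "The expected delay must be greater than 0.")

  def main(args: Array[String]): Unit = {
    checkPeriod(10.millis) // a positive period passes silently
    try checkPeriod(Duration.Zero)
    catch {
      case e: IllegalArgumentException => println(s"rejected: ${e.getMessage}")
    }
  }
}
```

Keeping the check in its own function mirrors how checkMaxDelay is structured, at the cost of one extra frame at the top of the stack trace.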
You are right about the compiler's knowledge, but this change is mostly about reducing the stack frame.
java.lang.IllegalArgumentException: Task scheduled with [0] seconds delay, which means creating an infinite loop. The expected delay must be greater than 0.
+ at org.apache.pekko.actor.LightArrayRevolverScheduler.checkDelay(LightArrayRevolverScheduler.scala:205)
at org.apache.pekko.actor.LightArrayRevolverScheduler.scheduleWithFixedDelay(LightArrayRevolverScheduler.scala:108)
at org.apache.pekko.actor.Scheduler.scheduleWithFixedDelay(Scheduler.scala:157)
at org.apache.pekko.actor.Scheduler.scheduleWithFixedDelay$(Scheduler.scala:149)
at org.apache.pekko.actor.LightArrayRevolverScheduler.scheduleWithFixedDelay(LightArrayRevolverScheduler.scala:50)
at org.apache.pekko.actor.LightArrayRevolverSchedulerSpec.$anonfun$new$52(SchedulerSpec.scala:688)
It seems like everything is okay because most of the time we only care about the top frame of the stack.
This is acceptable even in a real-world case, where only a few of the stack frames belong to the dispatcher while the other frames are mostly noise.
[2024-01-04 10:09:59,876] [ERROR] [ispatcher-55] [akka.actor.typed.Behavior$ ]: Supervisor StopSupervisor saw failure: Task scheduled with [42949672] seconds delay, which is too far in future, maximum delay is [21474835] seconds MDC: {akkaAddress=akka://[email protected]:2551, akkaSource=akka://bank-account/system/sharding/xxxxxxxxxx, sourceActorSystem=bank-account}
java.lang.IllegalArgumentException: Task scheduled with [42949672] seconds delay, which is too far in future, maximum delay is [21474835] seconds
at akka.actor.LightArrayRevolverScheduler.checkMaxDelay(LightArrayRevolverScheduler.scala:216)
at akka.actor.LightArrayRevolverScheduler.scheduleWithFixedDelay(LightArrayRevolverScheduler.scala:98)
at akka.actor.typed.internal.adapter.SchedulerAdapter.scheduleWithFixedDelay(SchedulerAdapter.scala:44)
at akka.actor.typed.internal.TimerSchedulerImpl.startTimer(TimerSchedulerImpl.scala:127)
at akka.actor.typed.internal.TimerSchedulerImpl.startTimerWithFixedDelay(TimerSchedulerImpl.scala:101)
at akka.actor.typed.internal.TimerSchedulerCrossDslSupport.startTimerWithFixedDelay(TimerSchedulerImpl.scala:63)
at akka.actor.typed.internal.TimerSchedulerCrossDslSupport.startTimerWithFixedDelay$(TimerSchedulerImpl.scala:62)
at akka.actor.typed.internal.TimerSchedulerImpl.startTimerWithFixedDelay(TimerSchedulerImpl.scala:83)
at akka.actor.typed.javadsl.BuiltBehavior.receive(BehaviorBuilder.scala:197)
at akka.actor.typed.javadsl.BuiltBehavior.receive(BehaviorBuilder.scala:186)
at akka.actor.typed.Behavior$.interpret(Behavior.scala:274)
at akka.actor.typed.Behavior$.interpretMessage(Behavior.scala:230)
at akka.actor.typed.internal.InterceptorImpl$$anon$2.apply(InterceptorImpl.scala:57)
at akka.actor.typed.internal.InterceptorImpl.receive(InterceptorImpl.scala:87)
at akka.actor.typed.Behavior$.interpret(Behavior.scala:274)
at akka.actor.typed.Behavior$.interpretMessage(Behavior.scala:230)
at akka.actor.typed.internal.InterceptorImpl$$anon$2.apply(InterceptorImpl.scala:57)
at akka.actor.typed.internal.SimpleSupervisor.aroundReceive(Supervision.scala:131)
at akka.actor.typed.internal.InterceptorImpl.receive(InterceptorImpl.scala:85)
at akka.actor.typed.Behavior$.interpret(Behavior.scala:274)
at akka.actor.typed.Behavior$.interpretMessage(Behavior.scala:230)
at akka.actor.typed.internal.adapter.ActorAdapter.handleMessage(ActorAdapter.scala:128)
at akka.actor.typed.internal.adapter.ActorAdapter.aroundReceive(ActorAdapter.scala:107)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:579)
at akka.actor.ActorCell.invoke(ActorCell.scala:547)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
at akka.dispatch.Mailbox.run(Mailbox.scala:231)
at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
@pjfanning Do you want to recheck this?
the printlns are commented out still but the lines should be removed altogether
@Roiocam Do you want to remove this so we can get the PR over the finish line?
Done, let's keep this pull request simple.
Lgtm
@@ -193,6 +195,12 @@ class LightArrayRevolverScheduler(config: Config, log: LoggingAdapter, threadFac
    task
  }

  private def checkPeriod(delay: FiniteDuration): Unit =
I think @He-Pin made a comment here before, which now seems to be gone, regarding inlining.
I just wanted to say you can inline this method if you want; you just have to make a scala-2 version using the @inline annotation and a scala-3 version with the inline keyword.
The easiest way to do this would be to place the checkPeriod function into a private[pekko] trait in the scala-2/scala-3 source folders respectively and make class LightArrayRevolverScheduler extend that trait.
It might be best to do this in a separate PR though, since iirc we are wanting to backport this to 1.0.x and 1.0.x doesn't have the scala-2 inliner available.
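The trait-based approach suggested above might look roughly like the following for the scala-2 source folder. This is a sketch under assumptions: the names `PeriodCheck` and `SketchScheduler` are hypothetical, the `private[pekko]` qualifier is omitted so the snippet compiles outside the pekko package, and the `@inline` annotation only has an effect when the Scala 2 optimizer is enabled in the build.

```scala
import scala.concurrent.duration.FiniteDuration

// Scala 2 flavour of a hypothetical helper trait (the name is an assumption).
// In a cross-built project this file would sit in the scala-2 source folder;
// the scala-3 copy would declare `inline def checkPeriod(...)` instead of
// using the annotation.
trait PeriodCheck {
  @inline final protected def checkPeriod(delay: FiniteDuration): Unit =
    if (delay.toNanos <= 0)
      throw new IllegalArgumentException(
        s"Task scheduled with [${delay.toSeconds}] seconds delay, which means creating an infinite loop. " +
        "The expected delay must be greater than 0.")
}

// Hypothetical stand-in for a scheduler class that mixes in the helper.
final class SketchScheduler extends PeriodCheck {
  def scheduleWithFixedDelay(delay: FiniteDuration): String = {
    checkPeriod(delay) // validate before doing any scheduling work
    s"scheduled every ${delay.toMillis} ms"
  }
}
```

Whether inlining brings a measurable benefit here is exactly what the reviewers dispute, so a benchmark would be needed before adopting this layout.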
Yes, we can do that in a separate PR. I have to say, though, this is a hot path.
Hotspot Compiler is very good at optimising this. There is zero evidence that this code needs excess optimisation.
"Hotspot Compiler is very good at optimising this. There is zero evidence that this code needs excess optimisation."
I am also skeptical but it can be proved with benchmarks in a separate PR if need be
lgtm
@pjfanning Do you want to re-review it now?
lgtm
@Roiocam could you create a backport PR (to 1.0.x branch)?
Thank you @Roiocam
* fix: reject zero and negative periodic tasks schedule
* fix: undo the symbol change
* use different test name, redescribe the exception
* abstract check function
* remove the printlns change
* reduce time units scale convert
Creating a zero-period task on the scheduler keeps the scheduler busy, so it cannot schedule other tasks.
In the JDK's ScheduledThreadPoolExecutor, zero-period tasks are already forbidden.
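The JDK behaviour mentioned above can be observed with a small probe. This is a minimal sketch; the object name `JdkZeroPeriodDemo` and the helper `rejectsPeriod` are made up for the example, while `ScheduledThreadPoolExecutor.scheduleAtFixedRate` is the standard JDK method, documented to throw IllegalArgumentException when the period is less than or equal to zero.

```scala
import java.util.concurrent.{ ScheduledThreadPoolExecutor, TimeUnit }

// Hypothetical probe object; only the executor API itself comes from the JDK.
object JdkZeroPeriodDemo {

  // Returns true if the JDK executor refuses to schedule a task with this period.
  def rejectsPeriod(periodMillis: Long): Boolean = {
    val executor = new ScheduledThreadPoolExecutor(1)
    val noop = new Runnable { def run(): Unit = () }
    try {
      executor.scheduleAtFixedRate(noop, 0L, periodMillis, TimeUnit.MILLISECONDS)
      false
    } catch {
      case _: IllegalArgumentException => true
    } finally {
      executor.shutdownNow() // always release the worker thread
    }
  }

  def main(args: Array[String]): Unit = {
    println(s"period 0 rejected: ${rejectsPeriod(0L)}")
    println(s"period -1 rejected: ${rejectsPeriod(-1L)}")
    println(s"period 1 rejected: ${rejectsPeriod(1L)}")
  }
}
```

The fix in this PR brings the Pekko scheduler in line with that behaviour by failing fast instead of accepting the task.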