Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: reject zero and negative periodic tasks schedule #887

Merged
merged 6 commits into from
Jan 18, 2024

Conversation

Roiocam
Copy link
Member

@Roiocam Roiocam commented Dec 28, 2023

create a zero periodic tasks on scheduler will let scheduler busy and can not scheduler other task.

on jdk ScheduledThreadPoolExecutor, the zero period task currently is forrbidden.

@pjfanning pjfanning added this to the 1.0.x milestone Dec 28, 2023
@pjfanning
Copy link
Contributor

@Roiocam have you seen the busy loop behaviour in the real world? What about the case where some sets a very short value (eg 1) - isn't that almost as bad?

@Roiocam
Copy link
Member Author

Roiocam commented Dec 28, 2023

@Roiocam have you seen the busy loop behaviour in the real world? What about the case where some sets a very short value (eg 1) - isn't that almost as bad?

The issue was discovered in our unit test on CI, and the reason was that our heartbeat configuration was not being loaded correctly.

The "short value" might cause a busy loop, but we always get a chance to rest. However, in the case of a zero period, this can lead to the actor system never being able to shut down.

Please refer to the specific unit test on the link for more details: https://gist.github.com/Roiocam/d317683d54bdbf3afe75b1b945c7f115

@pjfanning
Copy link
Contributor

tests fail

seems to be down to a duplicate test name

8T08:21:22.8083608Z [12-28 08:21:22.808] �[0m[�[0m�[0minfo�[0m] �[0m�[0m�[31morg.apache.pekko.actor.LightArrayRevolverSchedulerSpec *** ABORTED *** (1 millisecond)�[0m�[0m
2023-12-28T08:21:22.8088023Z [12-28 08:21:22.808] �[0m[�[0m�[0minfo�[0m] �[0m�[0m�[31m  org.scalatest.exceptions.DuplicateTestNameException was thrown inside "using scheduleWithFixedDelay" must, construction cannot continue: "Duplicate test name: A LightArrayRevolverScheduler when using scheduleWithFixedDelay must reject periodic tasks scheduled with zero interval" (SchedulerSpec.scala:727)�[0m�[0m

@He-Pin
Copy link
Member

He-Pin commented Dec 28, 2023

I checked both Netty and Java, which will throw a IllegalArgumentException.

@Roiocam Roiocam marked this pull request as draft December 28, 2023 09:14
private def checkZeroPeriod(delayNanos: Long): Unit =
if (delayNanos <= 0)
throw new IllegalArgumentException(
"Task scheduled with zero or negative period, which is create an an infinite loop")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use:

  • delay :xxx expected > 0 or delay should > 0.

user do not know what's delayNanos

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about "Task scheduled with [${delay.toSeconds}] seconds delay, which means creating an infinite loop. The expected delay must be greater than 0."?

@Roiocam Roiocam marked this pull request as ready for review December 28, 2023 09:46
@He-Pin
Copy link
Member

He-Pin commented Dec 28, 2023

I will go to airport, will try to review this when free

time
}

override protected def getShutdownTimeout: FiniteDuration = (10 seconds).dilated

override protected def waitNanos(ns: Long): Unit = {
// println(s"waiting $ns")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the printlns

Copy link
Contributor

@pjfanning pjfanning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stray println comments

if (delay.toNanos <= 0)
throw new IllegalArgumentException(
s"Task scheduled with [${delay.toSeconds}] seconds delay, which means creating an infinite loop. " +
s"The expected delay must be greater than 0.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not going to block this over this - but I would I strongly argue that this should be a function - like checkMaxDelay.

I'm not convinced by all this inline debate. Modern JIT compilers are good - in fact, very good - at this. I struggle to accept that this coding pratcice is justified without someone proving it to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is an exception it shouldn't be evaluated in the hot path anyways, or is this an exception which is expected to be thrown and caught in the standard business logic flow?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the pre-existing line right after this does a similar check and throws the same type of exception - checkMaxDelay(roundUp(delay).toNanos)

I have no problem with validating the values here.

The scaladoc for this method already has @throws java.lang.IllegalArgumentException

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right about the compiler's knowledge, but this change is mostly about reducing the stack frame.

java.lang.IllegalArgumentException: Task scheduled with [0] seconds delay, which means creating an infinite loop. The expected delay must be greater than 0.
+       at org.apache.pekko.actor.LightArrayRevolverScheduler.checkDelay(LightArrayRevolverScheduler.scala:205)
	at org.apache.pekko.actor.LightArrayRevolverScheduler.scheduleWithFixedDelay(LightArrayRevolverScheduler.scala:108)
	at org.apache.pekko.actor.Scheduler.scheduleWithFixedDelay(Scheduler.scala:157)
	at org.apache.pekko.actor.Scheduler.scheduleWithFixedDelay$(Scheduler.scala:149)
	at org.apache.pekko.actor.LightArrayRevolverScheduler.scheduleWithFixedDelay(LightArrayRevolverScheduler.scala:50)
	at org.apache.pekko.actor.LightArrayRevolverSchedulerSpec.$anonfun$new$52(SchedulerSpec.scala:688)

It seems like everything is okay because most of the time we only care about the top frame of the stack.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is acceptable, even in real-world case, where the dispatcher stack frame is only a few. while other frames are more like noise.

[2024-01-04 10:09:59,876] [ERROR] [ispatcher-55] [akka.actor.typed.Behavior$    ]: Supervisor StopSupervisor saw failure: Task scheduled with [42949672] seconds delay, which is too far in future, maximum delay is [21474835] seconds MDC: {akkaAddress=akka://[email protected]:2551, akkaSource=akka://bank-account/system/sharding/xxxxxxxxxx, sourceActorSystem=bank-account}
java.lang.IllegalArgumentException: Task scheduled with [42949672] seconds delay, which is too far in future, maximum delay is [21474835] seconds
	at akka.actor.LightArrayRevolverScheduler.checkMaxDelay(LightArrayRevolverScheduler.scala:216)
	at akka.actor.LightArrayRevolverScheduler.scheduleWithFixedDelay(LightArrayRevolverScheduler.scala:98)
	at akka.actor.typed.internal.adapter.SchedulerAdapter.scheduleWithFixedDelay(SchedulerAdapter.scala:44)
	at akka.actor.typed.internal.TimerSchedulerImpl.startTimer(TimerSchedulerImpl.scala:127)
	at akka.actor.typed.internal.TimerSchedulerImpl.startTimerWithFixedDelay(TimerSchedulerImpl.scala:101)
	at akka.actor.typed.internal.TimerSchedulerCrossDslSupport.startTimerWithFixedDelay(TimerSchedulerImpl.scala:63)
	at akka.actor.typed.internal.TimerSchedulerCrossDslSupport.startTimerWithFixedDelay$(TimerSchedulerImpl.scala:62)
	at akka.actor.typed.internal.TimerSchedulerImpl.startTimerWithFixedDelay(TimerSchedulerImpl.scala:83)
	at akka.actor.typed.javadsl.BuiltBehavior.receive(BehaviorBuilder.scala:197)
	at akka.actor.typed.javadsl.BuiltBehavior.receive(BehaviorBuilder.scala:186)
	at akka.actor.typed.Behavior$.interpret(Behavior.scala:274)
	at akka.actor.typed.Behavior$.interpretMessage(Behavior.scala:230)
	at akka.actor.typed.internal.InterceptorImpl$$anon$2.apply(InterceptorImpl.scala:57)
	at akka.actor.typed.internal.InterceptorImpl.receive(InterceptorImpl.scala:87)
	at akka.actor.typed.Behavior$.interpret(Behavior.scala:274)
	at akka.actor.typed.Behavior$.interpretMessage(Behavior.scala:230)
	at akka.actor.typed.internal.InterceptorImpl$$anon$2.apply(InterceptorImpl.scala:57)
	at akka.actor.typed.internal.SimpleSupervisor.aroundReceive(Supervision.scala:131)
	at akka.actor.typed.internal.InterceptorImpl.receive(InterceptorImpl.scala:85)
	at akka.actor.typed.Behavior$.interpret(Behavior.scala:274)
	at akka.actor.typed.Behavior$.interpretMessage(Behavior.scala:230)
	at akka.actor.typed.internal.adapter.ActorAdapter.handleMessage(ActorAdapter.scala:128)
	at akka.actor.typed.internal.adapter.ActorAdapter.aroundReceive(ActorAdapter.scala:107)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:579)
	at akka.actor.ActorCell.invoke(ActorCell.scala:547)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
	at akka.dispatch.Mailbox.run(Mailbox.scala:231)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)

@mdedetrich
Copy link
Contributor

stray println comments

@pjfanning Do you want to recheck this?

@pjfanning
Copy link
Contributor

stray println comments

@pjfanning Do you want to recheck this?

the printlns are commented out still but the lines should be removed altogether

@mdedetrich
Copy link
Contributor

the printlns are commented out still but the lines should be removed altogether

@Roiocam Do you want to remove this so we can get the PR over the finish line?

@Roiocam
Copy link
Member Author

Roiocam commented Jan 17, 2024

the printlns are commented out still but the lines should be removed altogether

@Roiocam Do you want to remove this so we can get the PR over the finish line?

Done, let's keep this pull request simple.

Copy link
Member

@He-Pin He-Pin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

@@ -193,6 +195,12 @@ class LightArrayRevolverScheduler(config: Config, log: LoggingAdapter, threadFac
task
}

private def checkPeriod(delay: FiniteDuration): Unit =
Copy link
Contributor

@mdedetrich mdedetrich Jan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @He-Pin made a comment here before which now seems to be gone regarding inlining.

I just wanted to say you can inline this method if you want, you just have to make a scala-2 version using the @inline annotation and a scala-3 version with the inline keyword.

The easiest way to do this would be to place the checkPeriod function into a private[pekko] trait in the scala-2/scala-3 source folders respectively and make class LightArrayRevolverScheduler extend that trait.

It might be best to do this in a separate PR though, since iirc we are wanting to backport this to 1.0.x and 1.0.x doesn't have the scala-2 inliner available.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can do that in a separate pr, I have to say, this is a hot path.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hotspot Compiler is very good at optimising this. There is zero evidence that this code needs excess optimisation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hotspot Compiler is very good at optimising this. There is zero evidence that this code needs excess optimisation.

I am also skeptical but it can be proved with benchmarks in a separate PR if need be

Copy link
Member

@He-Pin He-Pin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mdedetrich
Copy link
Contributor

@pjfanning Do you want to re-review it now?

Copy link
Contributor

@pjfanning pjfanning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@pjfanning pjfanning merged commit 7b2a3d8 into apache:main Jan 18, 2024
18 checks passed
@pjfanning
Copy link
Contributor

@Roiocam could you create a backport PR (to 1.0.x branch)?

@He-Pin
Copy link
Member

He-Pin commented Jan 18, 2024

Thank you @Roiocam

@Roiocam Roiocam deleted the reject-zero-periodic-tasks branch January 19, 2024 01:57
Roiocam added a commit to Roiocam/pekko that referenced this pull request Jan 19, 2024
* fix: reject zero and negative periodic tasks schedule

* fix: undo the symbol change

* use different test name, redescribe the exception

* abstract check function

* remove the printlns change

* reduce time units scale convert
He-Pin pushed a commit that referenced this pull request Jan 19, 2024
* fix: reject zero and negative periodic tasks schedule

* fix: undo the symbol change

* use different test name, redescribe the exception

* abstract check function

* remove the printlns change

* reduce time units scale convert
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants