-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add switchMap operator #1787
Conversation
|
||
/* | ||
* Copyright (C) 2014-2022 Lightbend Inc. <https://www.lightbend.com> | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is needed, but: a good fraction of my test code is a result of copy/paste/modify from org.apache.pekko.stream.scaladsl.FlowFlattenMergeSpec
which is code "inherited" from Lightbend. (See various comments // copied (with modifications) from org.apache.pekko.stream.scaladsl.FlowFlattenMergeSpec
below.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - if you copy code from a class that has headers like this into a new source file then that source file needs the headers from the original file(s).
|
||
package org.apache.pekko.stream.scaladsl | ||
|
||
import org.apache.pekko.NotUsed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one nit is that the project code style is to import org.apache.pekko
and then the subsequent imports can begin import pekko.stream...
(ie omit the org.apache bit).
* Licensed to the Apache Software Foundation (ASF) under one or more | ||
* license agreements; and to You under the Apache License, version 2.0: | ||
* | ||
* https://www.apache.org/licenses/LICENSE-2.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
brand new source files with no copying from existing source files should have the standard Apache source header not this one.
https://github.com/apache/pekko/blob/main/project/AddMetaInfLicenseFiles.scala#L1-L16
*/ | ||
package org.apache.pekko.stream.impl.fusing | ||
|
||
import org.apache.pekko.annotation.InternalApi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comment elsewhere about scala imports for import org.apache.pekko
- adjust import style - adjust header
(add missing newline)
needs
|
Thanks - and sorry for choosing the "submit your PR straight away ... without reading the rest of this wonderful doc" route. |
@@ -0,0 +1,29 @@ | |||
# switchMap | |||
|
|||
Transforms each input element into a `Source` of output elements that is then flattened into the output stream until a new input element is received. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and then the old stream is been cancelled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be good to be more explicit in the docs about the behaviour
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was struggling to find the right trade of between brevety (after all, there is a one-line-limit enforced for this part here) and explicitness. I now added a little bit more to this line and even more in the desciption section below.
(I'd be very open for any specific wording suggestions if you feel the updated version is not clear enough.)
} | ||
|
||
setHandler(in, | ||
new InHandler { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fold the in handler and the outhandler to the graphstagelogic to avoid some allocation
override def createLogic(enclosingAttributes: Attributes) = | ||
new GraphStageLogic(shape) { | ||
|
||
var source = Option.empty[SubSinkInlet[T]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use OptionVal to avoid boxing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private[pekko] object OptionVal { |
def setSource(source: Graph[SourceShape[T], M]): Unit = { | ||
cancelCurrentSource() | ||
removeCurrentSource(completeIfClosed = false) | ||
val sinkIn = new SubSinkInlet[T]("SwitchSink") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can avoid some materialization if the source is SingleSource
or IterableSource.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a brief shot at that before I submitted the PR but then refrained from that because my initial naive approach lead to inconsistencies (my test case "not behave differently for substreams Source.single(x) and Source(List(x))" results from that).
The thing I found challenging: I'd expect that as long as further values are available from upstream, any previous values from upstream are discarded without emitting a single item.
For example, the following should and currently does result in Seq(2, 3) (with the 1 being discarded):
Source(List(Source(List(1)), Source(List(2, 3))))
.switchMap(identity)
.runWith(Sink.seq)
I'd expect the same for
Source(List(Source.single(1), Source(List(2, 3))))
.switchMap(identity)
.runWith(Sink.seq)
That means that for example, I detect a SingleSource
as current substream, I cannot just emit the single value, but need to check wether further items are available from upstream before doing that (and discard the single value if there are).
At the point where I had understood that, I decided to keep the initial implementation simple.
Now that almost everything else seems to be more or less agreed on, I might try to give it another shot sometime during the coming week and see if I can get that optimization done without breaking the above expectations (maybe one of you has some pointers?). But I might end up at a point where I'd suggest to merge this at is it is (then hoping for someone else to get that optoimization done).
Thank you for this. I have some pr that avoids the materialization of subSource, which you can refer to. |
private val in = Inlet[Graph[SourceShape[T], M]]("switch.in") | ||
private val out = Outlet[T]("switch.out") | ||
|
||
override def initialAttributes = DefaultAttributes.switch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better with the lambda line number too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@He-Pin do you still want this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented it, but not within this file (because the actual lambda f
is not available here): https://github.com/MartinHH/pekko/blob/90c8ea8b7aca4ecd0c5e39b94e101174d8ddd617/stream/src/main/scala/org/apache/pekko/stream/scaladsl/Flow.scala#L2824
/** | ||
* INTERNAL API | ||
*/ | ||
@InternalApi private[pekko] final class Switch[T, M] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe SwitchMap
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be fine with renaming this, but maybe it's worth to explain my choice of name before we decide whether to rename:
- This
GraphStage
does not evaluate the lambda - similar toflatMapMerge
andflatMapConcat
, the evalation off
is delegated to themap
operator and then only the "flattening" is done in theGraphStage
. (My reasons for implementing it that way: consistency with the afforementioned operators and less redundancy in code because I did not need to care about things likef
throwing.) - For those other operators that delegate evaluation of
f
tomap
, the naming scheme seems to be to not haveMap
in the name of theGraphStage
(the stage forflatMapMerge
is calledFlattenMerge
) - The transformation that is implemented by this
GraphStage
("switching" without the "mapping" part) corresponds to what an operator that is namedswitch
in both ReactiveX and Monix does
So following my train of thought, if this stage were named SwitchMap
one should move the evaluation of f
into the GraphStage
.
Don't get me wrong: I'd be fine with any solution (keep it as it is; rename to SwitchMap
and move evaluation of f
into the GraphStage
or just rename and keep evaluation of f
where it is), I just wanted to share my reasoning before I make any changes.
I'm Ok with the current shape too, which we can polish it later, but let's wait a little longer for @MartinHH if he has some more updates, wdyt. I would also like to have #1702 later If you have time @pjfanning . |
@pjfanning @MartinHH BTW, the Kotlin flow is using |
I'm kind of biased to the naming from certain libraries (RxScala, Monix) because I learned about all those stream operators back when the official scala coursera courses had Erik Meijer teach RxScala, so for me, this is "switchMap". Given that Rx is still very popular in certain ecosystems (e.g. for Js/Ts frontend), I'd say one can't go wrong following their naming, but then again: I don't really mind. |
docs/src/main/paradox/stream/operators/Source-or-Flow/switchMap.md
Outdated
Show resolved
Hide resolved
docs/src/main/paradox/stream/operators/Source-or-Flow/switchMap.md
Outdated
Show resolved
Hide resolved
@MartinHH Yes, I see the code in Spring Ai, which is using Flux#switchMap too. Thanks for the input. |
* | ||
* '''Emits when''' the current substream has an element available | ||
* | ||
* '''Backpressures when''' never |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this never
is really great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought: this might be formally correct as it never backpressures on upstream (and backpressure on upstream is what is to be adressed here if I'm not mistaken), but maybe it would be more helpful to mention the backpressure on substream, e.g.:
* '''Backpressures when''' never | |
* '''Backpressures when''' never on upstream (backpressures on current substream when downstream backpressures) |
wdyt?
* @since 1.2.0 | ||
*/ | ||
def switchMap[T, M](f: Out => Graph[SourceShape[T], M]): Repr[T] = | ||
map(f).via(new Switch[T, M].addAttributes(Attributes(SourceLocation.forLambda(f)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You implemented it with a map
and switch
cool.
} | ||
|
||
setHandler(in, this) | ||
setHandler(out, this) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setHandlers(in, out, this)
} | ||
|
||
private def cancelCurrentSource(): Unit = { | ||
if (source.isDefined) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better with pattern matching
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I stopped being passionate about such things so I just followed your suggestion, although I do not agree 😉 .
} | ||
} | ||
|
||
override def postStop(): Unit = cancelCurrentSource() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should cancel the current source in onDownStreamFinished
instead of here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. I followed the example of FlattenMerge
where the cancelling of substreams is done in postStop()
:
pekko/stream/src/main/scala/org/apache/pekko/stream/impl/fusing/StreamOfStreams.scala
Line 146 in 5f64ccb
override def postStop(): Unit = sources.foreach(_.cancel()) |
onDownStreamFinished
is not overriden).
I'd expect that postStop
will eventually be called in any case and don't see a risk of cancelCurrentSource()
being called from here when it shouldn't, so doing it in this "catch-all" handler feels safer to me. (But maybe I'm just missing the significant difference?)
} | ||
} | ||
|
||
def setSource(source: Graph[SourceShape[T], M]): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setupCurrentSource
maybe a better name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall lgtm, thank you very much for this.
I submitted another set of suggested changes. Looking at the sunny weather forecast, I doubt that I will get to figure out how to properly implement avoiding materialization for certain sources in the coming days, so I'd suggest to merge if this my latest changes get approval. |
@MartinHH, I think we can merge this. I can prepare to continue a PR after your excellent work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Ok, I think I'm done here, so go ahead and merge as you feel. Btw: thanks for the welcoming experience on here! |
Thank you very much, it completes the pekko stream for ai scenarios. |
Implements #974