Closed
Description
A note in the docs says the `unique` operator must block until it has received every value from the source channel. In my testing it neither does this nor needs to.
This is important to know because non-blocking behavior lets downstream work start earlier, which allows for higher parallelism and efficiency.
The note in question (nextflow/docs/reference/operator.md, line 1660 in 8e37b5b):
> The difference between `unique` and `distinct` is that `unique` removes *all* duplicate values, whereas `distinct` removes only *consecutive* duplicate values. As a result, `unique` must process the entire source channel before it can emit anything, whereas `distinct` can emit each value immediately.
In the repro, values are already emitted well before the 100-second task finishes.
Repro (Nextflow 25.10.0):
```nextflow
process sleep {
    input:
    val v1

    output:
    val v1

    script:
    """
    sleep ${v1}
    """
}

workflow {
    sleep(Channel.of("10", "1", "2", "1", "3", "100", "1", "5")).unique().view()

    // For comparison, simulate a blocking unique: collect() waits for every task to finish first
    // sleep(Channel.of("10", "1", "2", "1", "3", "100", "1", "5")).collect().flatten().unique().view()
}
```
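Why `unique` does not inherently need to block: removing *all* duplicates only requires remembering which values have already been seen. Each value can therefore be emitted the moment it first arrives, and later repeats are dropped. The sketch below is a minimal Python illustration of that idea, not Nextflow's actual (Groovy) implementation. The function names `streaming_unique`, `streaming_distinct` and `source_with_log` are hypothetical, and values are assumed to be hashable. The two filters mirror the `unique_everseen` and `unique_justseen` recipes from Python's `itertools` documentation.

```python
from typing import Hashable, Iterable, Iterator, List, TypeVar

T = TypeVar("T", bound=Hashable)

_SENTINEL = object()  # marks "no previous value yet" for streaming_distinct


def streaming_unique(source: Iterable[T]) -> Iterator[T]:
    """Yield each value the first time it is seen; drop all later duplicates.

    Only a set of already-seen values is kept, so no value has to wait for
    the end of the source before being emitted.
    """
    seen = set()
    for value in source:
        if value not in seen:
            seen.add(value)
            yield value  # emitted immediately, before the source is exhausted


def streaming_distinct(source: Iterable[T]) -> Iterator[T]:
    """Yield a value unless it equals the value immediately before it."""
    previous: object = _SENTINEL
    for value in source:
        if previous is _SENTINEL or value != previous:
            yield value
        previous = value


def source_with_log(values: Iterable[T], log: List[str]) -> Iterator[T]:
    """Hypothetical source that records when each value is produced."""
    for v in values:
        log.append(f"produced {v}")
        yield v


if __name__ == "__main__":
    values = ["10", "1", "2", "1", "3", "100", "1", "5"]
    log: List[str] = []
    for v in streaming_unique(source_with_log(values, log)):
        log.append(f"emitted {v}")
    # "emitted 10" directly follows "produced 10": nothing waits for the end
    print("\n".join(log))
```

Interleaving the log of produced and emitted values shows that each first occurrence goes out as soon as it is read. The only cost compared with `distinct` is memory: the seen-set grows with the number of distinct values rather than staying constant.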