
Incorrect docs: `unique` doesn't block #6531

@feiloo


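A note in the docs says that the `unique` operator must block until it has received all values from the source channel, but in my testing it neither does this nor needs to.
This is important to know because a non-blocking `unique` allows higher parallelism and efficiency: downstream tasks can start on each new value without waiting for the whole channel to complete.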

> The difference between `unique` and `distinct` is that `unique` removes *all* duplicate values, whereas `distinct` removes only *consecutive* duplicate values. As a result, `unique` must process the entire source channel before it can emit anything, whereas `distinct` can emit each value immediately.
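Removing *all* duplicates only requires remembering which values have already been seen, not seeing the whole channel. A minimal sketch of the difference, assuming a DSL2 script where one channel can feed several operators. `firstSeen` is a hypothetical helper for illustration, not a Nextflow operator, and it assumes `filter` invokes the closure for one item at a time.

```nextflow
// Hypothetical helper (not part of Nextflow): returns a closure that is true
// only the first time a value is seen. Set.add() returns false for repeats.
def firstSeen() {
    def seen = new HashSet()
    return { v -> seen.add(v) }
}

workflow {
    def values = Channel.of(1, 1, 2, 2, 1, 3)

    // removes all duplicates: 1, 2, 3
    values.unique().view { v -> "unique:    ${v}" }
    // removes only consecutive duplicates: 1, 2, 1, 3
    values.distinct().view { v -> "distinct:  ${v}" }
    // first-occurrence filter: 1, 2, 3, each decided as soon as it arrives
    values.filter(firstSeen()).view { v -> "firstSeen: ${v}" }
}
```

The `firstSeen` filter keeps only a set of past values, so each item can be passed on or dropped immediately; nothing in "remove all duplicates" forces it to wait for the channel to close.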

In the repro below, `unique` emits values before the 100-second task has finished.

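Repro (Nextflow 25.10.0):

```nextflow
// Each task sleeps for `v1` seconds, then emits `v1` unchanged
process sleep {
  input:
  val v1

  output:
  val v1

  script:
  """
  sleep ${v1}
  """
}

workflow {
    // `unique` emits each new value as its task finishes,
    // without waiting for the 100-second task
    sleep(Channel.of("10","1","2","1","3","100","1","5")).unique().view()
    // simulate blocking: collect all values first, then apply `unique`
    // sleep(Channel.of("10","1","2","1","3","100","1","5")).collect().flatten().unique().view()
}
```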
