Add nonunique. #506
Nonunique returns the already seen elements of a sequence.
Guarding the seen_add call can improve performance when there is a high ratio of duplicates.
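To make the second note concrete, here is a minimal sketch of one reading of "guarding the seen_add call". The function names are hypothetical and this is not necessarily the code in the PR. The assumption is that the guard means skipping the set insertion for elements that are already in the set.

```python
def nonunique_unguarded(seq):
    """Hypothetical baseline: calls seen_add for every element."""
    seen = set()
    seen_add = seen.add  # bind once to avoid repeated attribute lookups
    for item in seq:
        if item in seen:
            yield item
        seen_add(item)  # redundant whenever item is a duplicate


def nonunique_guarded(seq):
    """Hypothetical guarded variant: calls seen_add only for new elements."""
    seen = set()
    seen_add = seen.add
    for item in seq:
        if item in seen:
            yield item
        else:
            seen_add(item)  # skipped for duplicates


data = [1, 2, 1, 3, 2, 1]
unguarded_result = list(nonunique_unguarded(data))
guarded_result = list(nonunique_guarded(data))
```

Both variants yield the same elements. The guarded one makes fewer set insertions when most elements are repeats, which is where any speedup would come from.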
@eriknw Can I get your thoughts on this?
Thanks @groutr! Everything here looks reasonable and good. I'm curious: do you have a use case for this? And sorry for my delay. This year has been, uh, a little crazy.
I'm sure that I had a better use case when I created the PR that I cannot recall now. One use case that currently comes to mind: when I'm asking "is this distinct", many times I'm really meaning to ask "why isn't this distinct"? If
Yeah, that sounds reasonable.
@eriknw which name do you find easier to remember?
I think I prefer the name
I think this is ready. What do you think @eriknw?
itertoolz.unique yields the never before seen elements of a sequence. nonunique is the complement, yielding the already seen elements of a sequence.

This is incredibly useful for finding duplicates in a sequence.
This isn't really a new feature for itertoolz; it exposes an existing one. isdistinct already had this logic, but instead of returning True/False, nonunique yields the already seen elements as they are encountered. This PR simply moves the logic into its own function.

ping: @eriknw
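The relationship between the three functions can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: the `_sketch` names are hypothetical, elements are assumed to be hashable, and the real toolz functions may differ (for example, in supporting a key function).

```python
def unique_sketch(seq):
    """Yield each element the first time it appears."""
    seen = set()
    for item in seq:
        if item not in seen:
            seen.add(item)
            yield item


def already_seen_sketch(seq):
    """Complement of unique_sketch: yield elements seen earlier in seq."""
    seen = set()
    for item in seq:
        if item in seen:
            yield item
        else:
            seen.add(item)


def isdistinct_sketch(seq):
    """True when no element repeats, i.e. the complement yields nothing."""
    for _ in already_seen_sketch(seq):
        return False  # stop at the first duplicate
    return True


items = [1, 2, 1, 3, 2]
first_occurrences = list(unique_sketch(items))
duplicates = list(already_seen_sketch(items))
```

Here a False result from `isdistinct_sketch` can be followed up by iterating `already_seen_sketch` to see which elements caused it, which matches the "why isn't this distinct" use case mentioned in the conversation.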