Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Contains for Distinct/Union with custom comparer #112815

Merged
merged 3 commits into from
Feb 26, 2025

Conversation

stephentoub
Copy link
Member

The recently-added optimization needs to factor in a non-default comparer.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Overview

This PR fixes the behavior of the Contains method in the LINQ Union and Distinct implementations to correctly handle custom comparers. Key changes include:

  • Updating Union.SpeedOpt.cs to explicitly return false for default comparer cases instead of default.
  • Modifying Distinct.SpeedOpt.cs to use a conditional expression that differentiates between default and custom comparer handling.
  • Enhancing test coverage in ContainsTests.cs for both Union and Distinct with custom comparer scenarios.

Reviewed Changes

File Description
src/libraries/System.Linq/src/System/Linq/Union.SpeedOpt.cs Adjusts the Contains override to return false explicitly when using the default comparer.
src/libraries/System.Linq/src/System/Linq/Distinct.SpeedOpt.cs Updates the Contains override to choose between _source.Contains(value) and base.Contains(value) based on the comparer.
src/libraries/System.Linq/tests/ContainsTests.cs Adds new tests to verify behavior changes when a custom comparer is specified.

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

Tip: Copilot code review supports C#, Go, Java, JavaScript, Markdown, Python, Ruby and TypeScript, with more languages coming soon. Learn more

Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-linq
See info in area-owners.md if you want to be subscribed.

// source.Contains(value), as the Distinct() won't remove anything that could have caused
// Contains to return true. If, however, there is a custom comparer, Distinct might remove
// the elements that would have matched, and thus we can't skip it.
_comparer is null || _comparer == EqualityComparer<TSource>.Default ? _source.Contains(value) :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing the reason we need to check both for null and EqualityComparer<TSource>.Default even though they functionally have the same behavior is because we use the latter representation for reference types?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have to check for both; we could just check for null. But the latter is functionally equivalent and is substituted in for null in various places. I can remove it, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the iterator constructor just normalize to one of the two representations?

The recently-added optimization needs to factor in a non-default comparer.
Copy link
Contributor

@IDisposable IDisposable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we do a check for _comparer is null || _comparer == EqualityComparer<TSource>.Default
in Union.SpeedOpt.cs but only check _comparer is null in Distinct.SpeedOpt.cs?

Seems like we should do the same test in both places.

@stephentoub
Copy link
Member Author

stephentoub commented Feb 27, 2025

Why do we do a check for _comparer is null || _comparer == EqualityComparer<TSource>.Default in Union.SpeedOpt.cs but only check _comparer is null in Distinct.SpeedOpt.cs?

Seems like we should do the same test in both places.

I deleted it from both but apparently didn't push the update that deleted it from one of them. Would you like to submit a pr that temoves the extra check?

@IDisposable
Copy link
Contributor

IDisposable commented Feb 27, 2025

I have only a slight bias toward checking for .Default in both places, but if eliding it is the preference I can delete it in .Union I welcome a decision either way from @stephentoub and/or @eiriktsarpalis :) Will be happy to add a PR then ;)

Also, it seems a bit "fragile" / "insider knowledge" in the ContainsTests.cs that the survivor will be the first value encountered (e.g. "a" will be what remains after the .Distinct or .Union on line 264 and 286). Is that something that is disclosed/guaranteed?

@IDisposable
Copy link
Contributor

I have only a slight bias toward checking for .Default in both places

I realize I should explain this bias. Since having _comparer is null means the same thing as/causes the EqualityComparer<TSource>.Default to be used, it would seem checking for just is null would be enough, but what happens when the Enumerable is created with an explicit use of EqualityComparer<TSource>.Default passed in the constructor? In that case, we're not going to trigger this speedopt, which seems suboptimal.

@stephentoub
Copy link
Member Author

I have only a slight bias toward checking for .Default in both places

I realize I should explain this bias. Since having _comparer is null means the same thing as/causes the EqualityComparer<TSource>.Default to be used, it would seem checking for just is null would be enough, but what happens when the Enumerable is created with an explicit use of EqualityComparer<TSource>.Default passed in the constructor? In that case, we're not going to trigger this speedopt, which seems suboptimal.

The flip side is that it's an extra check, and it's not super common for EqualityComparer.Default to be explicitly passed in. The initial code I'd added had both checks, and I got feedback to cull it back to just the null check, so that's what we'd like it to be. I just somehow missed updating both locations.

@IDisposable
Copy link
Contributor

PR #112973 to remove the test of .Default

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants