Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add findDuplicates Method to ListUtils with Comprehensive Test Cases #537

Closed
wants to merge 35 commits into from

Conversation

hemanth0525
Copy link

Summary

This pull request introduces the findDuplicates method to the ListUtils class, a feature designed to enhance the functionality of list operations by providing a straightforward way to detect duplicate elements.

Why This Change?

In many software development scenarios, detecting duplicates within a list is a common task. This functionality is essential for tasks such as data validation, cleaning, and processing. Instead of requiring developers to implement custom solutions or rely on third-party libraries, this method integrates seamlessly into the ListUtils class, offering a reliable and efficient way to handle duplicate detection.

How This Change is Useful

  • Efficiency:

    • The findDuplicates method is implemented with optimal performance in mind, ensuring that duplicate detection is handled efficiently even for large lists.
  • Simplification:

    • Provides a single, reusable method for duplicate detection, reducing the need for repetitive code across different projects.
  • Versatility:

    • Handles a variety of scenarios, including lists with different types of elements, empty lists, and lists with no duplicates.

Example Usage

List<Integer> numbers = Arrays.asList(1, 2, 3, 2, 4, 5, 3);
List<Integer> duplicates = ListUtils.findDuplicates(numbers); // [2, 3]

Copy link
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the result is effectively a set and the input does not have to be restricted to a list, it would be better to do this IMO:

    public static <E> Set<E> duplicates(final Collection<E> list) {
        final HashSet<E> dups = new HashSet<>();
        final HashSet<E> set = new HashSet<>();
        for (E e : list) {
            (set.contains(e) ? dups : set).add(e);
        }
        return dups;
    }

and put it in CollectionUtils probably.

@hemanth0525
Copy link
Author

Since the result is effectively a set and the input does not have to be restricted to a list, it would be better to do this IMO:

    public static <E> Set<E> duplicates(final Collection<E> list) {
        final HashSet<E> dups = new HashSet<>();
        final HashSet<E> set = new HashSet<>();
        for (E e : list) {
            (set.contains(e) ? dups : set).add(e);
        }
        return dups;
    }

and put it in CollectionUtils probably.

Thanks for the feedback. The ListUtils implementation is tailored for List<E>, preserving the order of elements and identifying duplicates where the sequence matters. This approach is particularly useful in scenarios such as:

  • Preserving Order: When the sequence of elements is important, such as in order-sensitive algorithms or data processing tasks.
  • List-Specific APIs: When working with APIs that return or expect List, ensuring that the utility aligns with the data structure's inherent properties.
  • Performance Considerations: In cases where developers need efficient duplicate detection while maintaining the list's order, which can be crucial in performance-sensitive applications.

Keeping this method in ListUtils supports these specific needs and maintains consistency for developers working with ordered collections.

Copy link
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comments. The input can still be

@garydgregory
Copy link
Member

garydgregory commented Sep 1, 2024

Hello @hemanth0525

Please see git master for:

  • Add CollectionUtils.duplicateList(Collection)
  • Add CollectionUtils.duplicateSequencedSet(Collection)
  • Add CollectionUtils.duplicateSet(Collection)

You are credited in changes.xml

Are there tests or functionality we could provide specific to ListUtils?

I believe that this PR's new ListUtils method is now the same in functionality as CollectionUtils.duplicateList(Collection<E>)

Note that this PR is still failing. Run 'mvn' by itself before you push to catch all build errors.

TY!

@hemanth0525
Copy link
Author

hemanth0525 commented Sep 1, 2024

hello @garydgregory,

TY !

I guess all the change requests are done.

i have run mvn locally and build success !

@hemanth0525
Copy link
Author

hello @garydgregory
is this PR ready to merge ?

* @since 4.5.0-M3
*/
public static <E> List<E> findDuplicates(final List<E> list) {
Objects.requireNonNull(list, "The input list must not be null.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hemanth0525

If you replace the body of this (git master) method with:

return IterableUtils.duplicateList(list);

You'll find that all tests pass.
So I don't think we need this method in this class.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

u mean from CollectionUtils ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if yes, I have done it. So yeah utilizing the other function to help list functionalities.

* @since 4.5.0-M3
*/
public static <E> List<E> findDuplicates(final List<E> list) {
return CollectionUtils.duplicateList(list);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you rebase on git master, you'll find this method in 'IterableUtils'.

@hemanth0525
Copy link
Author

Oh Yeah !!
Done!!

@garydgregory
Copy link
Member

garydgregory commented Sep 3, 2024

Hello @hemanth0525
If all the new method does is delegate to IterableUtils.duplicateList(), it is unnecessary. The tests in this PR may cover use cases not in IterableUtilsTest.

@hemanth0525
Copy link
Author

@garydgregory Okay so what u want me to do in this case ?

@garydgregory
Copy link
Member

I would either close the PR or update it if there are missing use cases in IterableUtilsTest (if yes, then also remove the new method from this PR).

@hemanth0525
Copy link
Author

you can update

@garydgregory
Copy link
Member

I don't understand what you mean.

@hemanth0525
Copy link
Author

hemanth0525 commented Sep 3, 2024

sorry !
I will look for the use cases.

Suggested Improvement
For the duplicateList method, it might be more appropriate to call duplicateSequencedSet instead of duplicateSet. Since the method returns a List, preserving the order of elements is often expected or desirable. By using duplicateSequencedSet, we can ensure that the order of duplicate elements is maintained, providing a more intuitive and consistent behavior for users of this method.

    /**
     * Finds and returns the List of duplicate elements in the given collection.
     *
     * @param <E> the type of elements in the collection.
     * @param iterable the list to test, must not be null.
     * @return the set of duplicate elements, may be empty.
     * @since 4.5.0-M3
     */
    public static <E> List<E> duplicateList(final Iterable<E> iterable) {
        return new ArrayList<>(duplicateSet(iterable));
    }

Shall i make this update ?

@garydgregory
Copy link
Member

@hemanth0525

In git master, we now make sure IterableUtils.duplicateList() has predictable order in its
result.

  • Add testDuplicateListMultipleDuplicatesInDequeReverse()
  • Add testDuplicateListMultipleDuplicatesInListReverse()
  • If you do not care about order, you can use IterableUtils.duplicateSet()

@hemanth0525
Copy link
Author

@garydgregory I just did that and thought of commiting...........

@hemanth0525
Copy link
Author

Hello @garydgregory,

I have added several diverse test cases to IterableUtilsTest.java. Kindly review them.

@hemanth0525
Copy link
Author

update on this PR ?

@garydgregory
Copy link
Member

Hello @hemanth0525

Does the PR now provide additional code coverage? You can tell when you run "mvn clean install site" and you look at the JaCoCo report.

Gary

@hemanth0525
Copy link
Author

i dont see any additional code coverage

@hemanth0525
Copy link
Author

this PR started to provide functionalities to find duplicates to list which u have implemented in IterableUtils for all collections. so ig lets close this PR now

@hemanth0525 hemanth0525 closed this by deleting the head repository Sep 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants