Implement shortcut for 0.0 and 1.0 percentile calculations, and add two new tests #382

Merged
merged 5 commits into from
May 21, 2021

Conversation

JingHuaMan
Contributor

#376

If the percentile is 0 or 1, it's unnecessary to sort all the elements in the container before getting the output by the index; instead, we can just scan all the items in one traversal and get the minimum or maximum with a time complexity of O(n).
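
The shortcut can be illustrated with a minimal sketch in plain Java. The class and method names below are hypothetical, and the index formula is one common definition of a discrete percentile assumed for illustration; it is not taken from jOOL's source.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class PercentileShortcutSketch {

    // Hypothetical sort-based discrete percentile (assumed formula, not jOOL's code).
    static <T> Optional<T> percentileBySorting(double percentile, List<T> items, Comparator<? super T> comparator) {
        if (percentile < 0.0 || percentile > 1.0)
            throw new IllegalArgumentException("Percentile must be between 0.0 and 1.0");
        if (items.isEmpty())
            return Optional.empty();

        List<T> sorted = new ArrayList<>(items);
        sorted.sort(comparator); // O(n log n)

        // Smallest position whose cumulative share reaches the percentile.
        int index = (int) Math.ceil(percentile * sorted.size()) - 1;
        return Optional.of(sorted.get(Math.max(index, 0)));
    }

    // For percentile 0.0 a single O(n) scan is enough: no sorting, no copy.
    static <T> Optional<T> minInOnePass(List<T> items, Comparator<? super T> comparator) {
        return items.stream().min(comparator);
    }
}
```

For percentile 0.0 both methods return the same element, but the second one never sorts.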

All the tests passed after the modification. Besides, in the test testPercentileWithStringsAndFunction, since the function is String::length and each string in the stream is of length 1, the sorting result should be the same as the original order of the input elements.
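
The claim about the original order relies on the sort being stable. A small sketch, assuming the elements are sorted with `List.sort` (which is stable for `ArrayList`); the class name is hypothetical:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class StableSortSketch {

    // Sorts a copy of the input by string length only.
    static List<String> sortByLength(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(Comparator.comparingInt(String::length));
        return copy;
    }

    public static void main(String[] args) {
        // All keys are equal (length 1), so a stable sort keeps the input order.
        System.out.println(sortByLength(List.of("c", "a", "b"))); // prints [c, a, b]
    }
}
```

Because every key ties, such a test cannot tell whether the comparator is applied correctly, which is why additional tests with distinct keys are useful.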

I think the original tests are not sufficient, so I added two new tests for the function percentileBy.

*
* @param function map the items in the streams into values
* @param comparator comparator used for sorting the items
* @return a collector that calculates the derived <code>PERCENTILE_DISC(percentile)</code> function
Member

It's probably a good idea to document these in general, but I prefer this be done in a separate task, for the entire API, not just this method. I've created a new issue to track this: #388.

// CS304 Issue link: https://github.com/jOOQ/jOOL/issues/376
if (percentile == 0.0)
// If percentile is 0, this is the same as taking the item with the minimum value.
return minBy(function, comparator);
Member

The comment says the same thing as the method call, so it isn't really necessary. I'll remove it again after merging.

@lukaseder lukaseder merged commit ee8b7e1 into jOOQ:main May 21, 2021
*/
public static <T, U> Collector<T, ?, Optional<T>> percentileBy(double percentile, Function<? super T, ? extends U> function, Comparator<? super U> comparator) {
if (percentile < 0.0 || percentile > 1.0)
throw new IllegalArgumentException("Percentile must be between 0.0 and 1.0");

// CS304 Issue link: https://github.com/jOOQ/jOOL/issues/376
Member

I don't know what CS304 means. Some reference to some external tracking system? There's no need for this, I will remove it. The convention to track github issues (if necessary) would be to use:

// [#376] Rationale

// If there are multiple maxima, take the last one.
return maxBy(function, (o1, o2) -> {
int compareResult = comparator.compare(o1, o2);
return compareResult == 0 ? -1 : compareResult;
Member

This hack violates the Comparator contract. We can't implement it like this.
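
To make the violation concrete, here is a small sketch that wraps a comparator the same way as the diff and checks the sign symmetry that `java.util.Comparator` requires, namely `sgn(compare(x, y)) == -sgn(compare(y, x))`. The class and helper names are hypothetical.

```java
import java.util.Comparator;

final class ComparatorContractSketch {

    // Same tie-breaking idea as in the diff: a tie is reported as "less than".
    static <U> Comparator<U> tiesAsLess(Comparator<? super U> comparator) {
        return (o1, o2) -> {
            int compareResult = comparator.compare(o1, o2);
            return compareResult == 0 ? -1 : compareResult;
        };
    }

    // Checks sgn(compare(x, y)) == -sgn(compare(y, x)) for one pair of values.
    static <U> boolean isSignSymmetricFor(Comparator<? super U> comparator, U x, U y) {
        return Integer.signum(comparator.compare(x, y)) == -Integer.signum(comparator.compare(y, x));
    }

    public static void main(String[] args) {
        Comparator<Integer> natural = Comparator.naturalOrder();
        Comparator<Integer> hacked = tiesAsLess(natural);

        System.out.println(isSignSymmetricFor(natural, 1, 1)); // prints true
        System.out.println(isSignSymmetricFor(hacked, 1, 1));  // prints false
    }
}
```

With the wrapped comparator, two equal values each claim to be smaller than the other, so any code that relies on the contract may behave unpredictably.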

Member

This seems to be the correct way to implement this:

collectingAndThen(maxAllBy(function, comparator), s -> s.findLast())
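
The idea behind this one-liner can be sketched in plain Java without jOOL: keep every element whose key ties with the current maximum, then return the last one. The class and method names are hypothetical stand-ins, not jOOL's `maxAllBy` or `findLast`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

final class LastMaximumSketch {

    // Hypothetical stand-in for "collect all maxima, then take the last one".
    static <T, U> Optional<T> lastMaxBy(List<T> items, Function<? super T, ? extends U> function, Comparator<? super U> comparator) {
        List<T> maxima = new ArrayList<>();

        for (T item : items) {
            if (maxima.isEmpty()) {
                maxima.add(item);
                continue;
            }

            int c = comparator.compare(function.apply(item), function.apply(maxima.get(0)));
            if (c > 0) {
                // Strictly larger key: previous maxima are obsolete.
                maxima.clear();
                maxima.add(item);
            }
            else if (c == 0) {
                // Tie: remember it, in encounter order.
                maxima.add(item);
            }
        }

        return maxima.isEmpty() ? Optional.empty() : Optional.of(maxima.get(maxima.size() - 1));
    }
}
```

The comparator is used unchanged, so its contract stays intact. The cost is the list of ties, which holds every element when all keys are equal.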

Contributor Author

Emmm, I think my solution is not incorrect, because the function maxBy will not sort the elements but compare them one by one, and the input comparator will not be modified. So even though this implementation violates the design philosophy of Comparator, it works and there seems no potential problem with it.

It's perfectly fine to implement this with maxAllBy, but in the worst case the space complexity will be O(n). That's my concern.
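
For illustration only, the space concern could in principle be addressed without altering the comparator: a single pass that lets a later tie replace the current candidate needs constant extra space. This is a hypothetical sketch in plain Java, assuming non-null items; it is not what was merged and not part of jOOL's API.

```java
import java.util.Comparator;
import java.util.Optional;
import java.util.function.Function;

final class LastMaxSinglePassSketch {

    // Hypothetical helper: returns the last element with a maximal key, assuming non-null items.
    static <T, U> Optional<T> lastMaxBy(Iterable<T> items, Function<? super T, ? extends U> function, Comparator<? super U> comparator) {
        T best = null;
        U bestKey = null;
        boolean found = false;

        for (T item : items) {
            U key = function.apply(item);

            // ">= 0" means a later element with an equal key replaces the candidate.
            // The tie-breaking lives in this loop; the comparator itself is untouched.
            if (!found || comparator.compare(key, bestKey) >= 0) {
                best = item;
                bestKey = key;
                found = true;
            }
        }

        return found ? Optional.of(best) : Optional.empty();
    }
}
```

The difference from the wrapped comparator is where the tie rule lives: here it is part of the selection logic, so nothing that receives the comparator ever sees inconsistent results.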

Member

It's incorrect. If comparator.compare(o1, o2) == 0, then comparator.compare(o2, o1) == 0, yet you return -1 in both cases. I don't want to spend the time now to find an edge case where this breaks sorting algorithms, but it should be easy to get an intuition about how this hack feels very wrong.

> because the function maxBy will not sort the elements but compare them one by one

You shouldn't rely on such an implementation detail.

> So even though this implementation violates the design philosophy of Comparator, it works and there seems no potential problem with it.

Famous last words :)

> It's perfectly fine to implement this with maxAllBy, but in the worst case the space complexity will be O(n). That's my concern.

I'm open to other suggestions, but correctness always beats performance.

Contributor Author

Yes, you are right. Thanks for your reply!
