Implement shortcut for 0.0 and 1.0 percentile calculations, and add two new tests #382
 *
 * @param function map the items in the streams into values
 * @param comparator comparator used for sorting the items
 * @return a collector that calculates the derived <code>PERCENTILE_DISC(percentile)</code> function
It's probably a good idea to document these in general, but I prefer this be done in a separate task, for the entire API, not just this method. I've created a new issue to track this: #388.
// CS304 Issue link: https://github.com/jOOQ/jOOL/issues/376
if (percentile == 0.0)
    // If percentile is 0, this is the same as taking the item with the minimum value.
    return minBy(function, comparator);
The comment says the same thing as the method call, so it isn't really necessary. I'll remove it again after merging.
 */
public static <T, U> Collector<T, ?, Optional<T>> percentileBy(double percentile, Function<? super T, ? extends U> function, Comparator<? super U> comparator) {
    if (percentile < 0.0 || percentile > 1.0)
        throw new IllegalArgumentException("Percentile must be between 0.0 and 1.0");

    // CS304 Issue link: https://github.com/jOOQ/jOOL/issues/376
I don't know what CS304 means. Is it a reference to some external tracking system? There's no need for this, so I will remove it. The convention for tracking GitHub issues (if necessary) would be to use:
// [#376] Rationale
// If there are multiple maxima, take the last one.
return maxBy(function, (o1, o2) -> {
    int compareResult = comparator.compare(o1, o2);
    return compareResult == 0 ? -1 : compareResult;
This hack violates the Comparator contract. We can't implement it like this.
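To make the objection concrete, here is a minimal sketch that checks one clause of the Comparator contract, sgn(compare(x, y)) == -sgn(compare(y, x)), against the tie-breaking wrapper from the diff. The class and helper names (ComparatorContractDemo, tiesAsLess, isAntisymmetricFor) are made up for this illustration and are not part of jOOL.

```java
import java.util.Comparator;

final class ComparatorContractDemo {

    // Same logic as the lambda in the diff: a tie is reported as "less than".
    // Hypothetical helper name, used only for this illustration.
    static <U> Comparator<U> tiesAsLess(Comparator<? super U> comparator) {
        return (o1, o2) -> {
            int compareResult = comparator.compare(o1, o2);
            return compareResult == 0 ? -1 : compareResult;
        };
    }

    // Checks sgn(compare(x, y)) == -sgn(compare(y, x)) for a single pair.
    static <U> boolean isAntisymmetricFor(Comparator<U> c, U x, U y) {
        return Integer.signum(c.compare(x, y)) == -Integer.signum(c.compare(y, x));
    }

    public static void main(String[] args) {
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        Comparator<String> hacked = tiesAsLess(byLength);

        // "a" and "b" tie under byLength, so both directions return 0.
        System.out.println(isAntisymmetricFor(byLength, "a", "b")); // true
        // The wrapper returns -1 in both directions for the same pair.
        System.out.println(isAntisymmetricFor(hacked, "a", "b"));   // false
    }
}
```

The wrapper also reports an element as less than itself, so any caller that assumes a well-behaved comparator may misbehave.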
This seems to be the correct way to implement this:
collectingAndThen(maxAllBy(function, comparator), s -> s.findLast())
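The idea behind that one-liner can be sketched with the plain JDK, without depending on jOOL: collect every element whose key is maximal, in encounter order, then take the last one. The names LastMaxViaAllMaxima, allMaximaBy and lastMaxBy are hypothetical stand-ins for this illustration; this is not how jOOL implements maxAllBy.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LastMaxViaAllMaxima {

    // Hypothetical stand-in for maxAllBy: keeps all elements with a maximal key, in encounter order.
    static <T, U> Collector<T, ?, List<T>> allMaximaBy(
            Function<? super T, ? extends U> function, Comparator<? super U> comparator) {
        return Collector.<T, List<T>>of(
            ArrayList::new,
            (List<T> maxima, T item) -> {
                int c = maxima.isEmpty()
                    ? 1
                    : comparator.compare(function.apply(item), function.apply(maxima.get(0)));
                if (c > 0) maxima.clear();    // strictly larger key: discard the old maxima
                if (c >= 0) maxima.add(item); // larger or tied key: keep this item
            },
            (left, right) -> {
                if (left.isEmpty()) return right;
                if (right.isEmpty()) return left;
                int c = comparator.compare(function.apply(left.get(0)), function.apply(right.get(0)));
                if (c < 0) return right;
                if (c == 0) left.addAll(right);
                return left;
            });
    }

    // Last of the maxima; the unmodified comparator is used throughout.
    static <T, U> Collector<T, ?, Optional<T>> lastMaxBy(
            Function<? super T, ? extends U> function, Comparator<? super U> comparator) {
        return Collectors.collectingAndThen(
            allMaximaBy(function, comparator),
            maxima -> maxima.isEmpty() ? Optional.<T>empty() : Optional.of(maxima.get(maxima.size() - 1)));
    }

    public static void main(String[] args) {
        Optional<String> result = Stream.of("a", "bb", "cc", "d")
            .collect(lastMaxBy(String::length, Comparator.<Integer>naturalOrder()));
        System.out.println(result); // Optional[cc]
    }
}
```

The intermediate list holds every tied maximum, so memory use grows with the number of ties.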
Emmm, I think my solution is not incorrect, because the function maxBy will not sort the elements but compare them one by one, and the input comparator will not be modified. So even though this implementation violates the design philosophy of Comparator, it works and there seems to be no potential problem with it.
It's perfectly fine to implement this with maxAllBy, but in the worst case the space complexity will be O(n). That's my concern.
It's incorrect. If comparator.compare(o1, o2) == 0, then comparator.compare(o2, o1) == 0, yet you return -1 in both cases. I don't want to spend the time now to find an edge case where this breaks sorting algorithms, but it should be easy to get an intuition about how this hack feels very wrong.
> because the function maxBy will not sort the elements but compare them one by one
You shouldn't rely on such an implementation detail.
> So even though this implementation violates the design philosophy of Comparator, it works and there seems no potential problem with it.
Famous last words :)
> It's perfectly fine to implement this with maxAllBy, but in the worst case the space complexity will be O(n). That's my concern.
I'm open to other suggestions, but correctness always beats performance.
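One possible suggestion, offered only as a sketch and not as what was merged: put the tie-breaking rule in the reduction step and leave the comparator untouched. On ties the later element wins, so a single pass with constant extra space yields the last maximum. The class and method names are hypothetical, and the sketch assumes an ordered stream, so that the left operand of the reduction comes earlier in encounter order.

```java
import java.util.Comparator;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LastMaxSinglePass {

    // Hypothetical helper: returns the last element (in encounter order) whose key is maximal.
    // The caller's comparator is called as-is; only the choice between operands handles ties.
    static <T, U> Collector<T, ?, Optional<T>> lastMaxBy(
            Function<? super T, ? extends U> function, Comparator<? super U> comparator) {
        return Collectors.reducing((earlier, later) ->
            // "<= 0" means a tie is resolved in favour of the later element.
            comparator.compare(function.apply(earlier), function.apply(later)) <= 0 ? later : earlier);
    }

    public static void main(String[] args) {
        Optional<String> result = Stream.of("a", "bb", "cc", "d")
            .collect(lastMaxBy(String::length, Comparator.<Integer>naturalOrder()));
        System.out.println(result); // Optional[cc]
    }
}
```

The trade-off is that the key function is re-applied to the running maximum on every step, which may matter if it is expensive.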
Yes, you are right. Thanks for your reply!
#376
If the percentile is 0 or 1, it's unnecessary to sort all the elements in the container before getting the output by index; instead, we can scan all the items in one traversal and take the minimum or maximum, with a time complexity of O(n). All the tests passed after the modification.
Besides, in the test testPercentileWithStringsAndFunction, since the function is String::length and each string in the stream has length 1, the sorted result should be the same as the original order of the input elements. I think the original test is not enough, so I added two new tests for the function percentileBy.
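The equivalence claimed above can be sketched on a plain list. The sketch assumes the usual PERCENTILE_DISC indexing, ceil(percentile * n) - 1 clamped at 0, applied after a stable sort; that indexing is an assumption here, not copied from jOOL. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class PercentileShortcutSketch {

    // General path: stable sort, then index ceil(p * n) - 1, clamped at 0 (assumed indexing).
    static <T> Optional<T> percentileBySorting(List<T> items, double percentile, Comparator<? super T> comparator) {
        if (items.isEmpty()) return Optional.empty();
        List<T> sorted = new ArrayList<>(items);
        sorted.sort(comparator); // List.sort is stable, so ties keep their input order
        int index = Math.max(0, (int) Math.ceil(percentile * sorted.size()) - 1);
        return Optional.of(sorted.get(index));
    }

    // Shortcut path: one O(n) scan for the two extreme percentiles, no sorting.
    static <T> Optional<T> percentileByShortcut(List<T> items, double percentile, Comparator<? super T> comparator) {
        if (percentile == 0.0) // first of the minima
            return items.stream().reduce((a, b) -> comparator.compare(b, a) < 0 ? b : a);
        if (percentile == 1.0) // last of the maxima
            return items.stream().reduce((a, b) -> comparator.compare(a, b) <= 0 ? b : a);
        return percentileBySorting(items, percentile, comparator);
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("x", "yy", "zz", "w");
        Comparator<String> byLength = Comparator.comparingInt(String::length);

        System.out.println(percentileBySorting(items, 0.0, byLength));  // Optional[x]
        System.out.println(percentileByShortcut(items, 0.0, byLength)); // Optional[x]
        System.out.println(percentileBySorting(items, 1.0, byLength));  // Optional[zz]
        System.out.println(percentileByShortcut(items, 1.0, byLength)); // Optional[zz]
    }
}
```

With ties in the input, the two paths agree only because the shortcut picks the first minimum and the last maximum, which is what a stable sort places at the two ends.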