fix: correct weight handling in approx_percentile_cont_with_weight #19941
+38
−43
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The approx_percentile_cont_with_weight function was producing incorrect results due to wrong weight handling in the TDigest implementation.
Root cause: In TDigest::new_with_centroid(), the count field was hardcoded to 1 regardless of the actual centroid weight, while the weight was correctly used in the sum calculation. This mismatch caused incorrect percentile calculations since estimate_quantile() uses count to compute the rank.
Changes:
Which issue does this PR close?
Rationale for this change
The
approx_percentile_cont_with_weightfunction produces incorrect weighted percentile results. The bug is in the TDigest implementation wherenew_with_centroid()setscount: 1regardless of the actual centroid weight, while the weight is used elsewhere in centroid merging. This mismatch corrupts the percentile calculation.What changes are included in this PR?
TDigest::countfromu64tof64to properly support fractional weights (consistent with ClickHouse's TDigest implementation)new_with_centroid()to usecentroid.weightfor countstate_fields()inapprox_percentile_contandapprox_medianto useFloat64for the count fieldmerge_digests()when all centroids have zero weight to prevent panicAre these changes tested?
Yes.
Are there any user-facing changes?
Yes, this is a breaking change: