[RDF] Add Skewness and Kurtosis actions to RDataFrame using Welford's algorithm #20517

edish-github · 2025-11-25T16:48:47Z

This Pull Request:

Adds .Skewness() and .Kurtosis() actions to RDataFrame.

Motivation

In high-energy physics, looking beyond the mean and standard deviation is usually critical.

Skewness helps in detecting asymmetry in energy distributions (which potentially indicates parity violation).
Kurtosis helps identify "heavy tails" where rare events (like new heavy particles or dark matter candidates) might hide.

But currently it requires users to either bin data into TH1 (losing precision) or write manual loops. This action allows for exact calculation in a single pass.

Implementation Details

I used Welford's Online Algorithm, which allows us to calculate mean, variance, skewness, and kurtosis simultaneously in one pass. This is numerically stable and fits the RDataFrame parallel map-reduce pattern.

Implementation Note: I initially explored an explicit SIMD implementation using ROOT::RVec.However, benchmarks on my local machine (M3, Clang -O3) showed that the compiler auto-vectorizes the scalar Welford loop very effectively, so i went for the scalar implementation

Changes or fixes:

Added SkewnessHelper and KurtosisHelper in tree/dataframe/inc/ROOT/RDF/ActionHelpers.hxx.
Exposed .Skewness() and .Kurtosis() in tree/dataframe/inc/ROOT/RDF/RInterface.hxx.
Registered the new action tags in tree/dataframe/inc/ROOT/RDF/InterfaceUtils.hxx.

Checklist:

tested changes locally (Verified against a reference Welford implementation via test macro)
updated the docs (Added Doxygen comments matching the StdDev style)

hahnjo · 2025-11-25T19:25:44Z

Thanks for the pointers to the numerically stable algorithm, I think I will borrow this for the global statistics of ROOT's new histograms 😉

edish-github · 2025-11-25T19:52:27Z

@hahnjo Thanks, It’s really cool that the algorithm might be useful for the new histograms too. 🚀

I’m currently working to implement the weighted version (West's algorithm) to support weighted skewness/kurtosis in RDF. The parallel merge steps get complex for cubic terms, but I verified the prototype against calculations, it holds up perfectly.

If you end up needing the weighted merge derivations (adapted from West/Chan) for the histogram work, please let me know—happy to share my prototype if it saves you some digging! 😊

dpiparo · 2025-11-26T07:11:44Z

Hello. A bit of context around this PR might be useful. Did you propose these changes to address an issue related to a task at hand or about which you learned? In other words, is this done to fix someone's problems?

edish-github · 2025-11-26T07:47:05Z

@dpiparo Hello, thanks for looking into this.

Motivation for this was primarily about the feature completeness. Since Mean and StdDev are already supported, adding Skewness and Kurtosis seemed like a logical step to bring RDataFrame closer to parity with other data analysis tools. I used Welford's algorithm to make sure the calculations are numerically stable, and I'm currently working on the statistical analysis for weighted distributions for these metrics as well.

I was also happy to see Jonas mention above that this approach might be helpful for the new histograms, so hopefully, this proves to be a useful addition to the codebase.

Best regards,
Edish.

hageboeck · 2025-11-26T10:26:48Z

Hello @edish-github,

I could imagine extending RDF, but before that, I have a little request 😅:
Could we separate whitespace changes and code changes? This tremendously helps the review.

We typically use git clang-format to format only the lines that have been touched. Like this, we progressively format the code base, but we don't introduce noise in "older" lines that were unaffected.

…ithm

edish-github · 2025-11-26T12:10:17Z

Hello @hageboeck, thanks for reviewing!

I've cleaned up the diff by reverting the unrelated whitespace changes and applying clang-format strictly to the new code. I also added the example usage snippets to the documentation to match the style of the other actions.

On a side note, I am really enjoying working with the RDF internals. Currently I'm working on tackling weighted skewness/kurtosis next (e.g., .Skewness(col, weight_col)), as weighted events are standard in MC analysis.
Would that be a welcome addition, or is there a higher priority area in RDF you would suggest?

Best regards,
Edish

edish-github · 2025-12-03T10:14:27Z

Hello @hageboeck (and @vepadulano @martamaja10),

I've updated the PR and ran the tests locally.
Could you please trigger the CI workflows when you have a moment? I'd like to ensure the tests pass on the official runners.

Best,
Edish

edish-github · 2025-12-19T08:24:14Z

Hello @hageboeck (@vepadulano and @martamaja10)

It's been a little while, and I completely respect your time and busy schedules. I just wanted to gently bump this PR to see if we can get the workflows approved or if you have any initial feedback on the implementation.

I am standing by for any feedback or changes required. Thanks!

vepadulano

Thank you @edish-github for this contribution! I have left some minor comments to start the review process. Even if I started the CI workflows now, there would be no new test to verify the new features. Could you add a few tests so that the CI will have something to process? For example you could add them in tree/dataframe/test/dataframe_simple.cxx

vepadulano · 2026-01-07T10:46:45Z

tree/dataframe/inc/ROOT/RDF/ActionHelpers.hxx

 #include <utility> // std::index_sequence
 #include <vector>
 #include <numeric> // std::accumulate in MeanHelper
+#include <cmath>


Just to know that we need this header, please add a comment to the right signaling which function is used from this header

vepadulano · 2026-01-07T10:49:27Z

tree/dataframe/inc/ROOT/RDF/ActionHelpers.hxx

   }
 };

+// Implements Welford's Online Algorithm for Skewness


Just for reference, I would add a link to the explanation of the algorithm, e.g. https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm

vepadulano · 2026-01-07T10:50:06Z

tree/dataframe/inc/ROOT/RDF/ActionHelpers.hxx

+   SkewnessHelper(SkewnessHelper &&) = default;
+   SkewnessHelper(const SkewnessHelper &) = delete;


Should apply rule of five here (missing assignment operators and destructor)

vepadulano · 2026-01-07T10:54:53Z

tree/dataframe/inc/ROOT/RDF/ActionHelpers.hxx

+   // Helper functions for RMergeableValue
+   std::unique_ptr<RMergeableValueBase> GetMergeableValue() const final { return nullptr; }


Please add a TODO here, without the mergeable value implementation we can't support these methods in distributed RDataFrame

edish-github · 2026-01-09T18:00:40Z

Hi @vepadulano, really appreciate your detailed feedback! I have updated the PR to address the comments:

Documentation: Added the explanation for and the Wikipedia references for Welford's algorithm.
Rule of Five: Explicitly handled the destructor, copy, and move constructors/assignments for both SkewnessHelper and KurtosisHelper.
Distributed Support: Added the GetMergeableValue implementation returning nullptr with the requested TODO.
Testing: Added a regression test in dataframe_simple.cxx covering both Skewness and Kurtosis.
Ready for another look when you have a moment!

edish-github requested review from martamaja10 and vepadulano as code owners November 25, 2025 16:48

edish-github changed the title ~~Add Skewness and Kurtosis actions to RDataFrame using Welford's algor…~~ [RDF] Add Skewness and Kurtosis actions to RDataFrame using Welford's algorithm Nov 25, 2025

dpiparo assigned hageboeck Nov 26, 2025

edish-github force-pushed the feat-rdf-stats branch 2 times, most recently from a011edb to bc4d707 Compare November 26, 2025 11:34

Add Skewness and Kurtosis actions to RDataFrame using Welford's algor…

64041cc

…ithm

edish-github force-pushed the feat-rdf-stats branch from bc4d707 to 64041cc Compare November 26, 2025 11:51

Fix: Add overflow protection and missing semicolon

dc12911

vepadulano requested changes Jan 7, 2026

View reviewed changes

Address review comments: Rule of Five, docs, and tests

1a2b0ac

		SkewnessHelper(SkewnessHelper &&) = default;
		SkewnessHelper(const SkewnessHelper &) = delete;

		// Helper functions for RMergeableValue
		std::unique_ptr<RMergeableValueBase> GetMergeableValue() const final { return nullptr; }

[RDF] Add Skewness and Kurtosis actions to RDataFrame using Welford's algorithm #20517

Are you sure you want to change the base?

[RDF] Add Skewness and Kurtosis actions to RDataFrame using Welford's algorithm #20517

Uh oh!

Conversation

edish-github commented Nov 25, 2025

This Pull Request:

Motivation

Implementation Details

Changes or fixes:

Checklist:

Uh oh!

hahnjo commented Nov 25, 2025

Uh oh!

edish-github commented Nov 25, 2025

Uh oh!

dpiparo commented Nov 26, 2025

Uh oh!

edish-github commented Nov 26, 2025

Uh oh!

hageboeck commented Nov 26, 2025

Uh oh!

edish-github commented Nov 26, 2025

Uh oh!

edish-github commented Dec 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

edish-github commented Dec 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vepadulano left a comment

Choose a reason for hiding this comment

Uh oh!

vepadulano Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

vepadulano Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

vepadulano Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

vepadulano Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

edish-github commented Jan 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

edish-github commented Dec 3, 2025 •

edited

Loading

edish-github commented Dec 19, 2025 •

edited

Loading