[MRG] FIX macro_averaged_mean_absolute_error: skip classes absent from y_true (#1094) #1172

Open
jbbqqf wants to merge 2 commits into scikit-learn-contrib:master from jbbqqf:feat/1094-fix-macro-mae-empty-class

Conversation

jbbqqf commented May 9, 2026


Reference Issue

Fixes #1094

What does this implement/fix? Explain your changes.

macro_averaged_mean_absolute_error(y_true, y_pred) raised
ValueError: Found array with 0 sample(s) whenever a class appeared in
y_pred but was absent from y_true. The implementation iterated over
unique_labels(y_true, y_pred) and called mean_absolute_error on the
slice y_true == c; when no ground-truth samples carried label c,
the slice was empty and sklearn's mean_absolute_error raised on
zero-length input.

The fix iterates over unique_labels(y_true) only — per-class MAE is
undefined for a class with zero ground-truth samples, so it is excluded
from the macro average. This matches the docstring intent ("computes
each MAE for each class") and aligns with the graceful handling of
absent labels in sklearn.metrics.f1_score(average='macro').
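
The before/after behaviour described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the imbalanced-learn source: the names macro_mae_over, macro_mae_before and macro_mae_after are hypothetical, set(...) stands in for unique_labels, and the explicit ValueError stands in for the one mean_absolute_error raises on an empty slice.

```python
def macro_mae_over(y_true, y_pred, labels):
    """Average the per-class MAE over `labels` (hypothetical helper)."""
    per_class = []
    for c in labels:
        # Absolute errors on the samples whose ground-truth label is c.
        errors = [abs(t - p) for t, p in zip(y_true, y_pred) if t == c]
        if not errors:
            # Stand-in for sklearn rejecting a zero-length slice.
            raise ValueError(f"Found array with 0 sample(s) for class {c!r}")
        per_class.append(sum(errors) / len(errors))
    return sum(per_class) / len(per_class)


def macro_mae_before(y_true, y_pred):
    # Old label set: union of ground-truth and predicted classes.
    return macro_mae_over(y_true, y_pred, sorted(set(y_true) | set(y_pred)))


def macro_mae_after(y_true, y_pred):
    # New label set: ground-truth classes only.
    return macro_mae_over(y_true, y_pred, sorted(set(y_true)))


print(macro_mae_after([0, 0], [0, 1]))  # 0.5
try:
    macro_mae_before([0, 0], [0, 1])
except ValueError as exc:
    print("before:", exc)
```

With y_true = [0, 0] and y_pred = [0, 1], the union contains class 1, whose ground-truth slice is empty, so the old label set fails; the new label set contains only class 0, whose absolute errors are 0 and 1.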

Reproduce BEFORE/AFTER yourself (copy-paste)

A reviewer can verify this fix end-to-end by pasting the block below.

# --- one-time setup ---
git clone https://github.com/scikit-learn-contrib/imbalanced-learn.git /tmp/repro-1094 && cd /tmp/repro-1094
python -m venv .venv && source .venv/bin/activate
pip install -q -e '.[tests]'

# --- BEFORE (origin/master) ---
git checkout origin/master
python -c "from imblearn.metrics import macro_averaged_mean_absolute_error; print(macro_averaged_mean_absolute_error([0, 0], [0, 1]))"
# Expected: ValueError: Found array with 0 sample(s) (shape=(0,)) while a minimum of 1 is required.

# --- AFTER (this PR) ---
git fetch https://github.com/jbbqqf/imbalanced-learn.git feat/1094-fix-macro-mae-empty-class
git checkout FETCH_HEAD
python -c "from imblearn.metrics import macro_averaged_mean_absolute_error; print(macro_averaged_mean_absolute_error([0, 0], [0, 1]))"
# Expected: 0.5  (no exception — the predicted-only class 1 is skipped)

What I ran locally
  • pytest imblearn/metrics/ -v → 210 passed (208 existing + 2 new
    parametrised assertions in test_macro_averaged_mean_absolute_error_class_only_in_y_pred).
  • The new regression test (test_macro_averaged_mean_absolute_error_class_only_in_y_pred)
    fails on origin/master with the exact ValueError from the bug
    report and passes on this branch.
  • Verified that the existing parametrised happy-path tests
    (test_macro_averaged_mean_absolute_error, ..._sample_weight)
    still pass — the change only affects the degenerate "class missing
    from y_true" case.

Edge cases tested

| # | Scenario | Input (y_true, y_pred) | Expected | Verified by |
|---|----------|------------------------|----------|-------------|
| 1 | Class only in y_pred, single ground-truth class | [0,0], [0,1] | 0.5 (single-class macro = MAE on full slice) | test_macro_averaged_mean_absolute_error_class_only_in_y_pred |
| 2 | Class only in y_true | [0,1], [0,0] | 0.5 (per-class MAE = [0, 1], mean = 0.5) | test_macro_averaged_mean_absolute_error_class_only_in_y_pred |
| 3 | Both classes present (existing happy path) | [1,1,1,2,2,2], [1,2,1,2,1,2] | 0.333 (unchanged) | test_macro_averaged_mean_absolute_error[y_true0-...] |

Risk / blast radius

The behaviour change is narrow: the function used to raise on inputs
where a label is predicted but absent from y_true, and now returns a
finite value. Callers that hit the bug got an exception rather than a
meaningful number, so no working caller can regress.

Any other comments?

  • Changelog entry added under "Bug fixes" in doc/whats_new/v0.15.rst
    (current dev version). The PR-number placeholder is :pr:`0`; I'll push a
    follow-up commit with the real number once GitHub assigns one, per the
    check-changelog.yml action.
  • The [MRG] prefix is on the title.

FIX :func:`imblearn.metrics.macro_averaged_mean_absolute_error` no
longer raises ``ValueError`` when a class is present in ``y_pred`` but
absent from ``y_true``; such classes are now skipped, mirroring
``sklearn.metrics.f1_score(average='macro')``.

PR drafted with assistance from Claude Code. The change was reviewed
manually against scikit-learn-contrib/imbalanced-learn's source and the
upstream spec/docs cited above. The reproducer block above was used
during development; it is the same one a reviewer can paste verbatim.

jbbqqf and others added 2 commits May 9, 2026 20:07
…ue (scikit-learn-contrib#1094)

When a class appeared only in y_pred (predicted but not present in
ground truth), the per-class loop dispatched mean_absolute_error on an
empty ground-truth slice, which sklearn rejects with
"Found array with 0 sample(s)". The macro average is now computed only
over classes present in y_true, mirroring the docstring intent
("computes each MAE for each class") and aligning the behaviour with
sklearn's f1_score(average='macro') on labels missing from y_true.

Co-Authored-By: Claude Code <noreply@anthropic.com>
@glemaitre

Member

Looking at the code, this is not the right fix. We should add a labels parameter to define the set of labels used to compute the metric. That would be consistent with the specificity and sensitivity scores, which closely follow the scikit-learn API in this regard.
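
To make the suggestion concrete, here is one possible shape for such a parameter, sketched in plain Python. Everything in it is an assumption for illustration: the function name macro_mae_with_labels, the default of falling back to the classes found in y_true, and the choice to raise on a requested label that has no ground-truth samples are not taken from imbalanced-learn or from this thread.

```python
def macro_mae_with_labels(y_true, y_pred, labels=None):
    """Macro-averaged MAE over an explicit label set (hypothetical sketch)."""
    if labels is None:
        # Assumed default: the classes that occur in the ground truth.
        labels = sorted(set(y_true))
    per_class = []
    for c in labels:
        # Absolute errors on the samples whose ground-truth label is c.
        errors = [abs(t - p) for t, p in zip(y_true, y_pred) if t == c]
        if not errors:
            # Assumed policy: a requested label without samples is an error.
            raise ValueError(f"label {c!r} has no samples in y_true")
        per_class.append(sum(errors) / len(errors))
    return sum(per_class) / len(per_class)


# Default label set: classes 0, 1 and 2 have MAE 0, 1 and 2.
print(macro_mae_with_labels([0, 1, 2], [0, 0, 0]))                 # 1.0
# Restricted label set: only classes 0 and 1 are averaged.
print(macro_mae_with_labels([0, 1, 2], [0, 0, 0], labels=[0, 1]))  # 0.5
```

How a requested label with no ground-truth samples should be treated (raise, skip, or warn) is a design decision this sketch does not settle.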

@immu4989

immu4989 commented Jun 7, 2026

Contributor

I reviewed this alongside the alternative PR #1168 for the same issue — this is the cleaner of the two and I'd lean towards it.

The fix (unique_labels(y_true) instead of unique_labels(y_true, y_pred)) addresses the root cause directly: a class present only in y_pred has no ground-truth samples, so its per-class MAE is genuinely undefined, and mean_absolute_error rightly rejects the empty slice. Restricting the macro average to ground-truth classes matches the metric's documented intent and mirrors sklearn's f1_score(average='macro') handling of absent labels — exactly the comparison the reporter drew in #1094.

Why I prefer this over #1168: that PR keeps unique_labels(y_true, y_pred) and instead skips empty classes inside the loop with if len(indices) == 0: continue, then guards the final divide. That works, but it's treating the symptom — it still enumerates a class it will always skip, and it changes the return to np.mean(mae) if mae else 0.0, introducing a 0.0 fallback for the all-empty case that isn't really reachable once you filter to y_true. Computing the label set correctly up front (this PR) is simpler and leaves the averaging logic untouched.
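
The two approaches compared above can be put side by side in a small plain-Python sketch. It follows the descriptions in this comment only and is not the code of either PR: the function names are hypothetical, set(...) stands in for unique_labels, and plain arithmetic stands in for mean_absolute_error and np.mean.

```python
def macro_mae_skip_in_loop(y_true, y_pred):
    """Enumerate every class, skip the empty ones (the alternative described above)."""
    mae = []
    for c in sorted(set(y_true) | set(y_pred)):
        indices = [i for i, t in enumerate(y_true) if t == c]
        if len(indices) == 0:
            # Class appears only in y_pred: nothing to average.
            continue
        mae.append(sum(abs(y_true[i] - y_pred[i]) for i in indices) / len(indices))
    # Fallback for the case where no class contributed a value.
    return sum(mae) / len(mae) if mae else 0.0


def macro_mae_filter_up_front(y_true, y_pred):
    """Restrict the label set to ground-truth classes before looping (this PR's idea)."""
    mae = []
    for c in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == c]
        mae.append(sum(abs(y_true[i] - y_pred[i]) for i in indices) / len(indices))
    return sum(mae) / len(mae)


for y_true, y_pred in [([0, 0], [0, 1]), ([0, 1], [0, 0])]:
    print(macro_mae_skip_in_loop(y_true, y_pred), macro_mae_filter_up_front(y_true, y_pred))
```

In this sketch the two functions agree on every non-empty input, and the 0.0 fallback is reached only when y_true is empty.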

Checks:

  • The symmetric test (label missing from y_pred vs. from y_true) is a nice touch and documents the asymmetry clearly.
  • Changelog entry included.
  • The sample_weight path is unaffected since indices are still computed per ground-truth class.

LGTM. One optional nit: a brief inline note that y_pred-only classes are intentionally excluded (you have it in the test, but a word at the unique_labels(y_true) line would help future readers) — though the comment you added already covers the reasoning well.



Development

Successfully merging this pull request may close these issues.

[BUG] macro_averaged_mean_absolute_error() raises ValueError

3 participants