
FIX sensitivity_specificity_support: correct specificity with sample_weight (#1180) #1181

Open
immu4989 wants to merge 2 commits into scikit-learn-contrib:master from immu4989:fix/specificity-sample-weight-1180

Conversation

immu4989 commented Jun 15, 2026

Contributor

Reference Issues/PRs

Fixes #1180.

What does this implement/fix? Explain your changes.

sensitivity_specificity_support computes an incorrect specificity when sample_weight is provided — the value can even exceed 1, which is impossible for a rate. Because specificity_score, geometric_mean_score, classification_report_imbalanced and the make_index_balanced_accuracy wrappers all delegate to it, the bug propagates through the whole metrics module.

Root cause. When sample_weight is given, tp_sum, pred_sum and true_sum are weighted sums, but the true-negative count was formed using the raw sample count:

tn_sum = y_true.size - (pred_sum + true_sum - tp_sum)

Mixing a count (y_true.size) with weighted sums makes tn_sum wrong — it can go negative — and the downstream specificity = tn_sum / (tn_sum + pred_sum - tp_sum) is then wrong as well.

import numpy as np
from imblearn.metrics import sensitivity_specificity_support

_, spe, _ = sensitivity_specificity_support(
    [0, 0, 1, 1], [0, 1, 1, 1], sample_weight=[1.0, 2.0, 3.0, 4.0], average=None
)
print(spe)
# master: [1.         1.66666667]   <- 1.667 is impossible
# fixed:  [1.         0.33333333]   <- TN=1, FP=2 -> 1/(1+2)
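The numbers in the reproducer can be checked by hand. The sketch below recomputes the weighted per-class sums in plain NumPy and forms the true-negative term both ways: with the raw sample count (as on master) and with the total sample weight (as in this fix). The variable names mirror the ones used in the description, but the code is an illustration only, not the imblearn implementation.

```python
import numpy as np

y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])
w = np.array([1.0, 2.0, 3.0, 4.0])
labels = np.array([0, 1])

# Weighted per-class sums (illustrative re-derivation, not library code).
tp_sum = np.array([w[(y_true == c) & (y_pred == c)].sum() for c in labels])
pred_sum = np.array([w[y_pred == c].sum() for c in labels])
true_sum = np.array([w[y_true == c].sum() for c in labels])

# master: raw sample count mixed with weighted sums.
tn_count = y_true.size - (pred_sum + true_sum - tp_sum)
# fixed: total sample weight used as the population size.
tn_weight = w.sum() - (pred_sum + true_sum - tp_sum)

fp_sum = pred_sum - tp_sum
spec_master = tn_count / (tn_count + fp_sum)
spec_fixed = tn_weight / (tn_weight + fp_sum)

print(tn_count)     # class 1 gets a negative "true negative" total: 4 - 9 = -5
print(spec_master)  # approx. [1, 1.667]
print(spec_fixed)   # approx. [1, 0.333]
```

For class 1 the weighted sums are tp_sum = 7, pred_sum = 9 and true_sum = 7, so the count-based term is 4 - 9 = -5 while the weight-based term is 10 - 9 = 1.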

Fix. Use the total sample weight as the population size when sample_weight is provided, falling back to y_true.size otherwise. The unweighted path is unchanged: with no weights, every term is a raw count, so the formula was already consistent there.

Scope: a module-wide audit, single root cause

I audited every sample_weight-aware metric in imblearn.metrics against one invariant: giving a sample an integer weight k must produce the same result as physically repeating that sample k times. The results:

| Metric | Before fix | After fix |
| --- | --- | --- |
| sensitivity_specificity_support (sensitivity) | ✅ correct | ✅ correct |
| sensitivity_specificity_support (specificity) | ❌ wrong (negative TN, values > 1) | ✅ correct |
| specificity_score | ❌ wrong (delegates) | ✅ correct |
| geometric_mean_score | ❌ wrong, incl. nan (delegates) | ✅ correct |
| make_index_balanced_accuracy(...) wrappers | ❌ wrong (delegates) | ✅ correct |
| classification_report_imbalanced | ❌ wrong specificity column (delegates) | ✅ correct |
| macro_averaged_mean_absolute_error | ✅ correct (independent path) | ✅ correct |

So the entire weighted-metric bug class traces to this one line; there are no other independent weighting bugs in the module.
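The repeat-equivalence invariant can be sketched without imblearn. In the block below, `specificity` is a hypothetical stand-alone helper written for this illustration (it is not the library function), and its `weighted_population` flag is an assumed switch that toggles between the fixed and the master-style population size. It does not guard against a zero denominator, which the real code would need to handle.

```python
import numpy as np


def specificity(y_true, y_pred, sample_weight=None, weighted_population=True):
    """Per-class specificity. Hypothetical helper for illustration only."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    if sample_weight is None:
        w = np.ones(y_true.size)
    else:
        w = np.asarray(sample_weight, dtype=float)
    labels = np.unique(np.concatenate([y_true, y_pred]))
    tp = np.array([w[(y_true == c) & (y_pred == c)].sum() for c in labels])
    pred = np.array([w[y_pred == c].sum() for c in labels])
    true = np.array([w[y_true == c].sum() for c in labels])
    # Fixed behaviour: population is the total weight.
    # Master-style behaviour: population is the raw sample count.
    population = w.sum() if weighted_population else y_true.size
    tn = population - (pred + true - tp)
    return tn / (tn + pred - tp)


y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])
weights = np.array([1, 2, 3, 4])

# Physically repeat sample i exactly weights[i] times.
repeated = specificity(np.repeat(y_true, weights), np.repeat(y_pred, weights))
fixed = specificity(y_true, y_pred, sample_weight=weights)
buggy = specificity(y_true, y_pred, sample_weight=weights,
                    weighted_population=False)

print(np.allclose(repeated, fixed))  # True: the invariant holds
print(np.allclose(repeated, buggy))  # False: count-based population breaks it
```

Any metric that delegates to a specificity computed this way inherits the same property, which is why a single check of this form is enough to cover the whole delegation chain.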

Tests

  • The existing test_geometric_mean_sample_weight parametrization asserted the weighted value produced under the buggy behaviour (0.333). I corrected it to the right value (0.609), verified by an independent hand-computation of the weighted confusion matrix (per-class specificity [0.667, 0.5], sensitivity [1.0, 0.5]).
  • Added test_sensitivity_specificity_support_sample_weight, which runs the #1180 reproducer, asserts that specificity ≤ 1, and checks that integer weights give the same result as repeated samples.
  • Added a parametrized test_metrics_sample_weight_repeat_equivalence that enforces the repeat-equivalence invariant across the whole delegation chain (sensitivity, specificity, specificity_score, geometric_mean_score, the IBA wrappers) and every average mode (None/macro/weighted/micro), plus a [0, 1] rate-bound check — 28 cases. This locks the fix in across the module, not just where the bug originated.
  • Full metrics suite passes locally: pytest imblearn/metrics/ → 236 passed (scikit-learn 1.9.0).

immu4989 added 2 commits June 14, 2026 19:09
…weight (scikit-learn-contrib#1180)

When sample_weight is provided, tp_sum/pred_sum/true_sum are weighted sums but
the true-negative count was formed as y_true.size - (pred_sum + true_sum -
tp_sum), mixing a raw sample count with weighted sums. This makes tn_sum wrong
(it can go negative), so the resulting specificity is incorrect and can exceed
1. specificity_score, geometric_mean_score and classification_report_imbalanced
all delegate here, so their weighted results were affected too.

Use the total sample weight as the population size when sample_weight is given,
falling back to y_true.size otherwise (unweighted path unchanged).

The existing test_geometric_mean_sample_weight parametrization asserted the
weighted value produced under the buggy behaviour (0.333); it is corrected to
the right value (0.609). A dedicated non-regression test is added that checks
specificity never exceeds 1 and that integer weights match repeated samples.
…t-learn-contrib#1180)

Audit of the metrics module showed the specificity weighting bug propagated
through every metric that delegates to sensitivity_specificity_support:
specificity_score, geometric_mean_score, and the make_index_balanced_accuracy
wrappers were all affected, while sensitivity and
macro_averaged_mean_absolute_error were already correct.

Add a parametrized test asserting that integer sample_weight equals physically
repeating each sample, across all of these metrics and every average mode, plus
the rate-bound [0, 1] check. This locks in the fix across the whole delegation
chain rather than only the function where the bug originated.
