[BUG] sensitivity_specificity_support returns wrong specificity with sample_weight (can exceed 1) #1180

Description

@immu4989

Describe the bug

imblearn.metrics.sensitivity_specificity_support (and therefore every metric built on it: specificity_score, geometric_mean_score and classification_report_imbalanced) computes an incorrect specificity when sample_weight is provided. The returned specificity can even exceed 1, which is impossible for a rate.

The root cause is a units mismatch when forming the true-negative count (imblearn/metrics/_classification.py#L247):

# tp_sum, pred_sum, true_sum are *weighted* sums when sample_weight is given
tp_sum  = np.bincount(tp_bins,  weights=tp_bins_weights, minlength=len(labels))
pred_sum = np.bincount(y_pred,  weights=sample_weight,    minlength=len(labels))
true_sum = np.bincount(y_true,  weights=sample_weight,    minlength=len(labels))

# ...but the total here is the raw sample COUNT, not the total weight:
tn_sum = y_true.size - (pred_sum + true_sum - tp_sum)

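When sample_weight is supplied, pred_sum, true_sum and tp_sum are sums of weights, while y_true.size is a plain sample count. For each class, pred_sum + true_sum - tp_sum is the weight of TP + FP + FN, so TN has to be obtained by subtracting it from the total weight, not from the number of samples. Mixing the two units makes tn_sum wrong (it can even go negative), and the downstream specificity = tn_sum / (tn_sum + pred_sum - tp_sum) is then wrong as well.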

Steps/Code to Reproduce

import numpy as np
from imblearn.metrics import sensitivity_specificity_support

y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
sample_weight = [1.0, 2.0, 3.0, 4.0]

_, spe, _ = sensitivity_specificity_support(
    y_true, y_pred, sample_weight=sample_weight, average=None
)
print(spe)

Expected Results

The per-class specificity is TN / (TN + FP), where TN and FP are sums of sample weights. For class 1 (treating 1 as the positive class): TN = 1 (sample 0, weight 1) and FP = 2 (sample 1, weight 2), so specificity = 1 / (1 + 2) = 0.3333. For class 0: TN = 3 + 4 = 7 (samples 2 and 3) and FP = 0, so specificity = 1.0.

[1.         0.33333333]
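As an independent cross-check of these expected values, the weighted specificity can be computed directly from boolean masks, without any of the pred_sum/true_sum bookkeeping. The helper name weighted_specificity_reference is hypothetical (it is not part of imblearn), and the sketch assumes every label has at least one negative sample, since it does no zero-division handling.

```python
import numpy as np


def weighted_specificity_reference(y_true, y_pred, sample_weight, labels):
    """Hypothetical helper: one-vs-rest specificity TN / (TN + FP) per label,
    with TN and FP measured as sums of sample weights instead of counts."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    weights = np.asarray(sample_weight, dtype=float)
    specificity = []
    for label in labels:
        negative = y_true != label  # samples whose true class is not `label`
        tn = weights[negative & (y_pred != label)].sum()  # correctly rejected
        fp = weights[negative & (y_pred == label)].sum()  # wrongly predicted as `label`
        specificity.append(tn / (tn + fp))  # no zero-division guard
    return np.array(specificity)


spe = weighted_specificity_reference(
    [0, 0, 1, 1], [0, 1, 1, 1], [1.0, 2.0, 3.0, 4.0], labels=[0, 1]
)
print(spe)  # values 1.0 and 1/3, matching the expected result
```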

Actual Results

[1.         1.66666667]

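A specificity of 1.667 is impossible for a rate. Internally, for class 1, pred_sum = 2 + 3 + 4 = 9, true_sum = 3 + 4 = 7 and tp_sum = 7, so tn_sum = 4 - (9 + 7 - 7) = -5 and specificity = -5 / (-5 + 9 - 7) = -5 / -3 ≈ 1.667.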
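The internal arithmetic can be reproduced with plain NumPy. This is a sketch that mirrors the quoted lines rather than the library code itself: it assumes the labels are already encoded as 0..n-1 and builds the true-positive bins from a simple equality mask.

```python
import numpy as np

# Reproduction data, with labels already encoded as 0..n-1 (an assumption
# that holds here because the labels are exactly 0 and 1).
y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])
sample_weight = np.array([1.0, 2.0, 3.0, 4.0])
n_labels = 2

# Weighted per-class sums, in the spirit of the np.bincount calls quoted above.
tp = y_true == y_pred
tp_sum = np.bincount(y_true[tp], weights=sample_weight[tp], minlength=n_labels)
pred_sum = np.bincount(y_pred, weights=sample_weight, minlength=n_labels)
true_sum = np.bincount(y_true, weights=sample_weight, minlength=n_labels)
# tp_sum == [1, 7], pred_sum == [1, 9], true_sum == [3, 7]

# Current formula: a sample count (4) minus sums of weights.
tn_sum = y_true.size - (pred_sum + true_sum - tp_sum)  # == [1, -5]
specificity = tn_sum / (tn_sum + pred_sum - tp_sum)   # == [1, 5/3]
print(tn_sum, specificity)
```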

Impact

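specificity_score, geometric_mean_score, and classification_report_imbalanced all delegate to sensitivity_specificity_support, so their results are also affected whenever sample_weight is passed. This matters because class and sample weighting is a common need in imbalanced settings. Without sample_weight the result is correct, because every term is then a plain sample count and the units agree. With weights, the bug appears whenever they do not sum to the number of samples, including a uniform weight such as 2.0 on every sample, which should leave every rate unchanged.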
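A quick way to see why the unweighted path is unaffected is to compare the current and corrected true-negative formulas under unit weights and under a uniform weight of 2.0. The function tn_sums is a hypothetical helper written only for this comparison, assuming labels already encoded as 0..n-1.

```python
import numpy as np


def tn_sums(y_true, y_pred, sample_weight, n_labels):
    """Hypothetical helper returning (current, corrected) true-negative sums
    for labels encoded as 0..n_labels-1."""
    tp = y_true == y_pred
    tp_sum = np.bincount(y_true[tp], weights=sample_weight[tp], minlength=n_labels)
    pred_sum = np.bincount(y_pred, weights=sample_weight, minlength=n_labels)
    true_sum = np.bincount(y_true, weights=sample_weight, minlength=n_labels)
    covered = pred_sum + true_sum - tp_sum  # weight of TP + FP + FN per class
    current = y_true.size - covered  # sample count minus weights
    corrected = sample_weight.sum() - covered  # total weight minus weights
    return current, corrected


y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])

# Unit weights behave like no weights: both formulas agree.
ones_current, ones_corrected = tn_sums(y_true, y_pred, np.ones(4), 2)

# Doubling every weight should simply double TN, but the current formula
# subtracts doubled weights from an undoubled count.
twos_current, twos_corrected = tn_sums(y_true, y_pred, np.full(4, 2.0), 2)
print(ones_current, ones_corrected, twos_current, twos_corrected)
```

With unit weights both formulas give TN sums of [2, 1]. With every weight set to 2.0 the corrected TN sums double to [4, 2], while the current formula yields [0, -2].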

Suggested fix

Use the total of the weights as the population size when weights are provided:

total = np.sum(sample_weight) if sample_weight is not None else y_true.size
tn_sum = total - (pred_sum + true_sum - tp_sum)

This yields the correct [1.0, 0.3333] above and leaves the unweighted path unchanged. I have a fix with tests ready and am happy to open a PR. Note that the existing test_geometric_mean_sample_weight parametrization currently asserts a value computed under the buggy behaviour, so it would need updating to the corrected number as part of the fix.
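For illustration, the corrected computation can be sketched end to end in NumPy. The function sensitivity_specificity_weighted is hypothetical and much simpler than the real sensitivity_specificity_support: it assumes integer-encoded labels 0..n-1, returns per-class values only (no averaging), and performs no zero-division handling.

```python
import numpy as np


def sensitivity_specificity_weighted(y_true, y_pred, sample_weight=None, n_labels=None):
    """Hypothetical sketch of the corrected per-class computation.

    Assumes y_true/y_pred are already encoded as integers 0..n_labels-1 and
    performs no zero-division handling (unlike the library's _prf_divide).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if n_labels is None:
        n_labels = int(max(y_true.max(), y_pred.max())) + 1
    if sample_weight is not None:
        sample_weight = np.asarray(sample_weight, dtype=float)

    tp = y_true == y_pred
    tp_weights = sample_weight[tp] if sample_weight is not None else None
    tp_sum = np.bincount(y_true[tp], weights=tp_weights, minlength=n_labels)
    pred_sum = np.bincount(y_pred, weights=sample_weight, minlength=n_labels)
    true_sum = np.bincount(y_true, weights=sample_weight, minlength=n_labels)

    # The fix: measure the population in the same units as the other sums.
    total = np.sum(sample_weight) if sample_weight is not None else y_true.size
    tn_sum = total - (pred_sum + true_sum - tp_sum)

    sensitivity = tp_sum / true_sum
    specificity = tn_sum / (tn_sum + pred_sum - tp_sum)
    return sensitivity, specificity


sen, spe = sensitivity_specificity_weighted(
    [0, 0, 1, 1], [0, 1, 1, 1], sample_weight=[1.0, 2.0, 3.0, 4.0]
)
print(sen, spe)
```

On the reproduction data this returns a specificity of [1.0, 0.3333]. Called without sample_weight, total falls back to y_true.size and the ordinary unweighted specificity ([1.0, 0.5] for the same labels) is returned.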

Versions

imbalanced-learn: 0.15.dev0 (current master)
scikit-learn: 1.9.0
numpy: 2.x
