Describe the bug
imblearn.metrics.sensitivity_specificity_support (and therefore everything built on it — specificity_score, geometric_mean_score, classification_report_imbalanced) computes an incorrect specificity when sample_weight is provided. The returned specificity can even exceed 1, which is impossible for a rate.
The root cause is a units mismatch when forming the true-negative count (imblearn/metrics/_classification.py#L247):
```python
# tp_sum, pred_sum, true_sum are *weighted* sums when sample_weight is given
tp_sum = np.bincount(tp_bins, weights=tp_bins_weights, minlength=len(labels))
pred_sum = np.bincount(y_pred, weights=sample_weight, minlength=len(labels))
true_sum = np.bincount(y_true, weights=sample_weight, minlength=len(labels))
# ...but the total here is the raw sample COUNT, not the total weight:
tn_sum = y_true.size - (pred_sum + true_sum - tp_sum)
```
When sample_weight is supplied, pred_sum/true_sum/tp_sum are sums of weights, while y_true.size is a plain sample count. Mixing the two makes tn_sum wrong (it can go negative), and the downstream specificity = tn_sum / (tn_sum + pred_sum - tp_sum) is then wrong as well.
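To make the mismatch concrete, the sketch below recomputes the three sums with plain NumPy for the data used in the reproduction. It assumes labels are already encoded as 0..n_labels-1, and it builds tp_bins/tp_bins_weights from the correctly predicted samples; that construction is my reading of the surrounding code, not a verbatim copy of it.

```python
import numpy as np

# Same data as the reproduction; labels assumed encoded as 0..n_labels-1.
y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])
sample_weight = np.array([1.0, 2.0, 3.0, 4.0])
n_labels = 2

# Assumption: correct predictions contribute their true label and weight to tp_sum.
tp = y_true == y_pred
tp_bins = y_true[tp]
tp_bins_weights = sample_weight[tp]

# All three sums are in units of *weight*.
tp_sum = np.bincount(tp_bins, weights=tp_bins_weights, minlength=n_labels)  # [1, 7]
pred_sum = np.bincount(y_pred, weights=sample_weight, minlength=n_labels)   # [1, 9]
true_sum = np.bincount(y_true, weights=sample_weight, minlength=n_labels)   # [3, 7]

# The sample count (4) and the total weight (10) disagree.
n_samples = y_true.size
total_weight = sample_weight.sum()

# The buggy line subtracts weights from a sample *count*: class 1 goes negative.
tn_sum_buggy = n_samples - (pred_sum + true_sum - tp_sum)  # [1, -5]
```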
Steps/Code to Reproduce
```python
import numpy as np
from imblearn.metrics import sensitivity_specificity_support

y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
sample_weight = [1.0, 2.0, 3.0, 4.0]

_, spe, _ = sensitivity_specificity_support(
    y_true, y_pred, sample_weight=sample_weight, average=None
)
print(spe)
```
Expected Results
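The per-class specificity is TN / (TN + FP) using weighted counts. For class 1 (treating 1 as the positive class): TN = 1 (sample 0, weight 1) and FP = 2 (sample 1, weight 2), so specificity = 1 / (1 + 2) = 0.3333. For class 0, both negatives (samples 2 and 3, weights 3 and 4) are correctly predicted as 1, so TN = 7, FP = 0 and specificity = 1.0. The expected output is therefore [1.0, 0.3333].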
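These numbers can be cross-checked without going through bincount at all. The sketch below counts weighted true negatives and false positives directly with boolean masks, treating each class as positive in turn. weighted_specificity_ovr is a hypothetical helper written only for this check, not part of imblearn.

```python
import numpy as np

y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])
sample_weight = np.array([1.0, 2.0, 3.0, 4.0])


def weighted_specificity_ovr(y_true, y_pred, sample_weight, label):
    """Hypothetical helper: specificity of `label` vs. rest, weighting each sample.

    Assumes at least one sample has a true class other than `label`
    (otherwise tn + fp == 0 and the division is undefined).
    """
    negative = y_true != label            # true class is not `label`
    predicted_negative = y_pred != label  # not predicted as `label`
    tn = sample_weight[negative & predicted_negative].sum()
    fp = sample_weight[negative & ~predicted_negative].sum()
    return tn / (tn + fp)


spe = [weighted_specificity_ovr(y_true, y_pred, sample_weight, c) for c in (0, 1)]
print([round(float(s), 4) for s in spe])  # [1.0, 0.3333]
```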
Actual Results
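For class 1 the returned specificity is 1.667, which is impossible for a rate. Internally, tn_sum for class 1 is 4 - (9 + 7 - 7) = -5, so the specificity becomes -5 / (-5 + 9 - 7) = -5 / -3 ≈ 1.667.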
Impact
specificity_score, geometric_mean_score, and classification_report_imbalanced all delegate to sensitivity_specificity_support, so their sample_weight results are affected too, and class/sample weighting is a common need in imbalanced settings. Without weights the result is correct, because every term is then a plain sample count and the units agree.
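As a rough illustration of how the error propagates, here is the G-mean arithmetic for class 1 in the reproduction example, assuming the binary formula G = sqrt(sensitivity × specificity) described in the imbalanced-learn documentation. This is hand arithmetic on the weighted sums, not output captured from geometric_mean_score.

```python
import math

# Class 1 as positive; weighted sums from the example: tp=7, pred=9, true=7, total weight=10.
sensitivity = 7 / 7                  # weighted TP / weighted positives = 1.0
spe_buggy = -5 / (-5 + 9 - 7)        # buggy tn_sum = 4 - 9 = -5
spe_fixed = 1 / (1 + 9 - 7)          # fixed tn_sum = 10 - 9 = 1

# Assumed binary G-mean formula: sqrt(sensitivity * specificity).
gmean_buggy = math.sqrt(sensitivity * spe_buggy)
gmean_fixed = math.sqrt(sensitivity * spe_fixed)
print(round(gmean_buggy, 4), round(gmean_fixed, 4))  # 1.291 0.5774
```

Under this formula the buggy specificity pushes the G-mean above 1, which is as impossible as the specificity itself.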
Suggested fix
Use the total of the weights as the population size when weights are provided:
```python
total = np.sum(sample_weight) if sample_weight is not None else y_true.size
tn_sum = total - (pred_sum + true_sum - tp_sum)
```
This yields the correct [1.0, 0.3333] for the example above and leaves the unweighted path unchanged. I have a fix with tests ready and am happy to open a PR. Note that the existing test_geometric_mean_sample_weight parametrization currently asserts a value computed under the buggy behaviour, so it would need updating to the corrected number as part of the fix.
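To check the arithmetic of the suggested fix in isolation, here is a minimal NumPy sketch. specificity_from_sums is a hypothetical stand-alone helper that mirrors the bincount logic quoted above, assuming integer-encoded labels; it does not handle label encoding or zero denominators.

```python
import numpy as np


def specificity_from_sums(y_true, y_pred, sample_weight=None, n_labels=None):
    """Hypothetical helper: per-class specificity using the suggested tn_sum fix.

    Assumes y_true / y_pred are integer labels in 0..n_labels-1 and that every
    class has at least one negative sample (no zero-division handling).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if n_labels is None:
        n_labels = int(max(y_true.max(), y_pred.max())) + 1
    if sample_weight is not None:
        sample_weight = np.asarray(sample_weight, dtype=float)

    # Correct predictions contribute their true label (and weight) to tp_sum.
    tp = y_true == y_pred
    tp_bins = y_true[tp]
    tp_bins_weights = sample_weight[tp] if sample_weight is not None else None

    tp_sum = np.bincount(tp_bins, weights=tp_bins_weights, minlength=n_labels)
    pred_sum = np.bincount(y_pred, weights=sample_weight, minlength=n_labels)
    true_sum = np.bincount(y_true, weights=sample_weight, minlength=n_labels)

    # The fix: measure the population in the same units as the sums.
    total = np.sum(sample_weight) if sample_weight is not None else y_true.size
    tn_sum = total - (pred_sum + true_sum - tp_sum)

    # Specificity = TN / (TN + FP), with FP = pred_sum - tp_sum.
    return tn_sum / (tn_sum + pred_sum - tp_sum)


y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
weighted = specificity_from_sums(y_true, y_pred, sample_weight=[1.0, 2.0, 3.0, 4.0])
unweighted = specificity_from_sums(y_true, y_pred)
uniform = specificity_from_sums(y_true, y_pred, sample_weight=[2.0] * 4)

print(np.round(weighted, 4).tolist())   # [1.0, 0.3333]
print(np.allclose(unweighted, uniform))  # True
```

The uniform-weights comparison seems worth keeping in a test: with every weight equal to 2.0, the fixed code matches the unweighted result, whereas the original y_true.size line would give tn_sum = 4 - [4, 6] = [0, -2] for this data.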
Versions
imbalanced-learn: 0.15.dev0 (current master)
scikit-learn: 1.9.0
numpy: 2.x