Conversation

John-Curcio commented Jul 2, 2025

  • add CensoredMixin class to extend fitting univariate distributions to censored data
  • add CensoredLogNormal and CensoredWeibull
  • test cases for censored data

xgb.DMatrix already has label_lower_bound, label_upper_bound for the user to specify right-, left-, or interval-censored data. This PR adds a CensoredMixin class which simply overrides objective_fn, metric_fn to accommodate censored data. So to fit a LogNormal distribution to such a dataset, just use CensoredLogNormal instead of LogNormal.

I've added CensoredLogNormal and CensoredWeibull.

I'm happy to further update the docs and add examples.

@StatMixedML
Owner

@John-Curcio Thanks for opening the PR and your effort!

Would need some time to review it.

StatMixedML requested a review from Copilot, August 8, 2025 08:15

Copilot AI left a comment


Pull Request Overview

This PR adds support for fitting censored data by introducing a CensoredMixin class that extends univariate distributions to handle interval-censored observations. The mixin overrides objective_fn and metric_fn to compute likelihood functions using cumulative distribution functions (CDFs) for censored intervals.

  • Adds CensoredMixin class with censored likelihood computation
  • Implements CensoredLogNormal and CensoredWeibull distribution classes
  • Adds comprehensive test coverage for censored data functionality

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Summary per file:

  • xgboostlss/distributions/censored_utils.py: Core CensoredMixin class implementing censored likelihood functions
  • xgboostlss/distributions/Weibull.py: Adds CensoredWeibull class inheriting from CensoredMixin and Weibull
  • xgboostlss/distributions/LogNormal.py: Adds CensoredLogNormal class inheriting from CensoredMixin and LogNormal
  • tests/utils.py: Extends test data generation to support censored data scenarios
  • tests/test_distribution_utils/test_censored_utils.py: Test suite validating censored distribution functionality

mass = cdf_hi - cdf_low
log_density = dist.log_prob(low)
censored_inds = low != hi
loss = -torch.sum(torch.log(mass[censored_inds])) - torch.sum(log_density[~censored_inds])

Copilot AI Aug 8, 2025


The log density is computed using only the lower bound, but this should only be used for exact observations (non-censored data). For censored intervals where low != hi, this log_density value is incorrect and shouldn't contribute to the loss.

Suggested change
loss = -torch.sum(torch.log(mass[censored_inds])) - torch.sum(log_density[~censored_inds])
exact_inds = (low == hi)
log_density = dist.log_prob(low[exact_inds])
loss = -torch.sum(torch.log(mass[~exact_inds])) - torch.sum(log_density)

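Copilot's suggestion can be sketched as a self-contained PyTorch function. This is an illustrative reconstruction, not the PR's actual implementation: exact observations (low == hi) contribute the log density, while censored intervals contribute the log of the CDF mass.

```python
# Sketch of the corrected interval-censored negative log-likelihood.
# Assumes a torch.distributions object with scalar parameters so it
# broadcasts over indexed subsets; names mirror the review comment.
import torch
from torch.distributions import LogNormal

def censored_nll(dist, low, hi):
    """NLL where low == hi marks an exact observation, low != hi an interval."""
    exact_inds = (low == hi)
    # Exact observations: contribute log f(y)
    log_density = dist.log_prob(low[exact_inds])
    # Censored intervals: contribute log(F(hi) - F(low))
    mass = dist.cdf(hi[~exact_inds]) - dist.cdf(low[~exact_inds])
    return -torch.sum(torch.log(mass)) - torch.sum(log_density)

dist = LogNormal(torch.tensor(0.0), torch.tensor(1.0))
low = torch.tensor([1.0, 0.5, 2.0, 1.0])
hi = torch.tensor([1.0, 1.5, float("inf"), 1.0])  # inf => right-censored
loss = censored_nll(dist, low, hi)
```

Note that an infinite upper bound gives F(hi) = 1, so right-censoring falls out of the interval case for free.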
return super().objective_fn(predt, data)
if data.get_weight().size == 0:
# initialize weights as ones with correct shape
weights = torch.ones((lower.shape[0], 1), dtype=torch.as_tensor(lower).dtype).numpy()

Copilot AI Aug 8, 2025


Creating a tensor just to get its dtype and then converting back to numpy is inefficient. Consider using weights = np.ones((lower.shape[0], 1), dtype=lower.dtype) directly.

Suggested change
weights = torch.ones((lower.shape[0], 1), dtype=torch.as_tensor(lower).dtype).numpy()
weights = np.ones((lower.shape[0], 1), dtype=lower.dtype)

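The suggested simplification can be sketched in isolation (the `lower` array here is a made-up stand-in for the PR's variable):

```python
# Sketch of the reviewer's suggestion: build the default weight column
# directly in NumPy, instead of creating a torch tensor only to read its
# dtype and convert back.
import numpy as np

lower = np.array([0.5, 1.0, 2.0], dtype=np.float32)  # illustrative bounds
weights = np.ones((lower.shape[0], 1), dtype=lower.dtype)
```

This keeps the dtype aligned with the label bounds without a torch round-trip.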
predt, labels, *rest = gen_test_data(model, weights=False, censored=False)
dmat = rest[-1]
name_c, loss_c = model.dist.metric_fn(predt, dmat)
underlying_cls = model.dist.__class__.__mro__[2]

Copilot AI Aug 8, 2025


Using hardcoded index [2] in the MRO (Method Resolution Order) is fragile and could break if the inheritance hierarchy changes. Consider using a more explicit approach like checking class names or using hasattr to find the base distribution class.

Suggested change
underlying_cls = model.dist.__class__.__mro__[2]
# Find the first base class in the MRO that is not a censored distribution and not 'object'
underlying_cls = next(
cls for cls in model.dist.__class__.__mro__
if cls is not model.dist.__class__ and not cls.__name__.startswith("Censored") and cls is not object
)

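To see why the name-based MRO scan is more robust than a hardcoded index, here is a runnable sketch with stand-in classes that mirror the PR's inheritance shape (the real classes live in xgboostlss):

```python
# Stand-in classes mimicking CensoredWeibull(CensoredMixin, Weibull).
# MRO is [CensoredWeibull, CensoredMixin, Weibull, object]; the scan
# below finds Weibull without relying on a fixed position.
class Weibull:
    pass

class CensoredMixin:
    pass

class CensoredWeibull(CensoredMixin, Weibull):
    pass

underlying_cls = next(
    cls for cls in CensoredWeibull.__mro__
    if cls is not CensoredWeibull
    and not cls.__name__.startswith("Censored")
    and cls is not object
)
```

If another mixin were inserted into the hierarchy, `__mro__[2]` would silently point at the wrong class, while the filtered scan still lands on the underlying distribution.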
dist = self.distribution(**dict(zip(self.distribution_arg_names, params_transformed)))
# compute cdf bounds: convert lower & upper once to tensor with correct dtype
low = torch.as_tensor(lower, dtype=params_transformed[0].dtype).reshape(-1, 1)
hi  = torch.as_tensor(upper, dtype=params_transformed[0].dtype).reshape(-1, 1)

Copilot AI Aug 8, 2025


Inconsistent spacing: 'hi =' has two spaces before the equals sign while 'low =' on the previous line has one space. This should be consistent.

Suggested change
hi  = torch.as_tensor(upper, dtype=params_transformed[0].dtype).reshape(-1, 1)
hi = torch.as_tensor(upper, dtype=params_transformed[0].dtype).reshape(-1, 1)

StatMixedML mentioned this pull request Aug 8, 2025
@StatMixedML
Owner

@claude Review this PR


claude bot commented Aug 8, 2025

Claude encountered an error: failed with exit code 128.

I'll analyze this and get back to you.

@StatMixedML
Owner

@John-Curcio Can you please look into these?
