@kash2104
Contributor

@kash2104 kash2104 commented Dec 23, 2025

Why are these changes needed?

These changes run the replica validation logic before RayCluster pod creation. They move the replica validation logic from utils.go to validation.go and remove the redundant tests. Unit tests are also added to cover the moved logic.

Related issue number

Closes #4101

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@JiangJiaWei1103
Contributor

Thanks for your effort.

cc @Future-Outlier
This PR duplicates #4116, which seems to be inactive now. Should we close #4116 and work on this one?

@kash2104
Contributor Author

kash2104 commented Dec 24, 2025

I went through #4116, but it was inactive and no changes had been made since the Volcano PR was merged. Many changes are required after that merge, which is why I opened this PR.

@JiangJiaWei1103
Contributor

Gotcha. Let's wait for maintainers' reply. I'll help review, thank you.

@kash2104
Contributor Author

kash2104 commented Jan 1, 2026

@Future-Outlier Just wanted to follow up - do you prefer closing #4116 and continuing the work in this PR?

@Future-Outlier
Member

@Future-Outlier Just wanted to follow up - do you prefer closing #4116 and continuing the work in this PR?

Hi @kash2104, I just left a comment on #4116 asking her whether she has time to finish the work.
If she hasn't replied after 2 weeks, you can take it over.

1. Remove validation logic from GetWorkerGroupDesiredReplicas (utils.go)
and add this logic to ValidateRayClusterSpec (validation.go).
2. Remove unnecessary tests from TestGetWorkerGroupDesiredReplicas.
3. Remove the unused ctx.
@kash2104 kash2104 force-pushed the issue-4101 branch 2 times, most recently from 0ad38fc to 529b021 Compare January 1, 2026 16:54
This is added since we moved the validation logic.
Member

@Future-Outlier Future-Outlier left a comment

@Future-Outlier
Member

cursor review

// as users can manually update them and the autoscaler will handle the adjustment.
if !isAutoscalingEnabled && (workerGroup.MinReplicas == nil || workerGroup.MaxReplicas == nil) {
	return fmt.Errorf("worker group %s must set both minReplicas and maxReplicas when autoscaling is disabled", workerGroup.GroupName)
}

Nil pointer dereference when autoscaling enabled with unset replicas

High Severity

The new validation allows MinReplicas and MaxReplicas to be nil when autoscaling is enabled (line 124 skips the nil check when isAutoscalingEnabled is true). However, GetWorkerGroupDesiredReplicas, CalculateMinReplicas, and CalculateMaxReplicas all dereference these pointers without nil checks. When autoscaling is enabled with unset replica fields, the controller will panic with a nil pointer dereference. The test case at lines 2536-2549 in validation_test.go explicitly tests that nil values pass validation when autoscaling is enabled, confirming this mismatch.

Development

Successfully merging this pull request may close these issues.

Move Replica validation to validation.go