Fix: Move replica validation logic to right place. #4307
base: master
d1d7b57 to eb654d7
Thanks for your effort. cc @Future-Outlier
I went through #4116, but it has been inactive, with no changes made since the Volcano PR was merged. Many changes are required now that the Volcano PR is in, which is why I opened this PR.
Gotcha. Let's wait for the maintainers' reply. I'll help review, thank you.
@Future-Outlier Just wanted to follow up - do you prefer closing #4116 and continuing the work in this PR? |
Hi @kash2104, I just left a comment on #4116 asking whether she has time to finish the work.
1. Remove validation logic from GetWorkerGroupDesiredReplicas (utils.go) and add this logic to ValidateRayClusterSpec (validation.go).
2. Remove unnecessary tests from TestGetWorkerGroupDesiredReplicas.
3. Remove the unused ctx.
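The split described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `WorkerGroup`, `validateReplicas`, `desiredReplicas`, and `ptr` are hypothetical stand-ins for the real spec type and for ValidateRayClusterSpec / GetWorkerGroupDesiredReplicas, and the exact checks in the real validation may differ.

```go
package main

import "fmt"

// WorkerGroup is a hypothetical, trimmed-down stand-in for the real worker group spec.
type WorkerGroup struct {
	GroupName   string
	Replicas    *int32
	MinReplicas *int32
	MaxReplicas *int32
}

// validateReplicas plays the role of the validation step: it rejects
// unset or inconsistent bounds before any pods would be created.
func validateReplicas(wg WorkerGroup) error {
	if wg.MinReplicas == nil || wg.MaxReplicas == nil {
		return fmt.Errorf("worker group %s must set both minReplicas and maxReplicas", wg.GroupName)
	}
	if *wg.MinReplicas > *wg.MaxReplicas {
		return fmt.Errorf("worker group %s has minReplicas %d greater than maxReplicas %d",
			wg.GroupName, *wg.MinReplicas, *wg.MaxReplicas)
	}
	return nil
}

// desiredReplicas assumes wg already passed validateReplicas, so it only
// clamps the requested replica count into [MinReplicas, MaxReplicas].
func desiredReplicas(wg WorkerGroup) int32 {
	if wg.Replicas == nil || *wg.Replicas < *wg.MinReplicas {
		return *wg.MinReplicas
	}
	if *wg.Replicas > *wg.MaxReplicas {
		return *wg.MaxReplicas
	}
	return *wg.Replicas
}

// ptr is a small helper for building *int32 literals.
func ptr(v int32) *int32 { return &v }

func main() {
	wg := WorkerGroup{GroupName: "workers", Replicas: ptr(10), MinReplicas: ptr(1), MaxReplicas: ptr(5)}
	if err := validateReplicas(wg); err != nil {
		fmt.Println("invalid spec:", err)
		return
	}
	fmt.Println("desired replicas:", desiredReplicas(wg))
}
```

With validation done up front, the replica helper no longer needs a logger or a context just to report bad bounds, which is presumably why the unused ctx could be dropped.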
0ad38fc to 529b021
This is added since we moved the validation logic.
Future-Outlier left a comment
cc @400Ping @justinyeh1995 @machichima to review
cursor review
// as users can manually update them and the autoscaler will handle the adjustment.
if !isAutoscalingEnabled && (workerGroup.MinReplicas == nil || workerGroup.MaxReplicas == nil) {
	return fmt.Errorf("worker group %s must set both minReplicas and maxReplicas when autoscaling is disabled", workerGroup.GroupName)
}
Nil pointer dereference when autoscaling enabled with unset replicas
High Severity
The new validation allows MinReplicas and MaxReplicas to be nil when autoscaling is enabled (line 124 skips the nil check when isAutoscalingEnabled is true). However, GetWorkerGroupDesiredReplicas, CalculateMinReplicas, and CalculateMaxReplicas all dereference these pointers without nil checks. When autoscaling is enabled with unset replica fields, the controller will panic with a nil pointer dereference. The test case at lines 2536-2549 in validation_test.go explicitly tests that nil values pass validation when autoscaling is enabled, confirming this mismatch.
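One possible way to close the gap this review describes is to read the pointers through a nil-safe accessor. The sketch below is only an illustration under stated assumptions: `WorkerGroup`, `int32OrDefault`, and `desiredReplicasNilSafe` are hypothetical names, and the defaults chosen for unset fields (0 for the minimum, `math.MaxInt32` for the maximum) are assumptions, not behavior confirmed by the PR.

```go
package main

import (
	"fmt"
	"math"
)

// WorkerGroup is a hypothetical, trimmed-down stand-in for the real worker group spec.
type WorkerGroup struct {
	Replicas    *int32
	MinReplicas *int32
	MaxReplicas *int32
}

// int32OrDefault returns *p, or def when p is nil, so callers never
// dereference an unset optional field.
func int32OrDefault(p *int32, def int32) int32 {
	if p == nil {
		return def
	}
	return *p
}

// desiredReplicasNilSafe clamps Replicas into [min, max] and tolerates
// unset bounds. The fallback values are assumptions made for this sketch.
func desiredReplicasNilSafe(wg WorkerGroup) int32 {
	minR := int32OrDefault(wg.MinReplicas, 0)
	maxR := int32OrDefault(wg.MaxReplicas, math.MaxInt32)
	replicas := int32OrDefault(wg.Replicas, minR)
	if replicas < minR {
		return minR
	}
	if replicas > maxR {
		return maxR
	}
	return replicas
}

func main() {
	// Autoscaling-style spec with every replica field left unset:
	// a direct dereference would panic here, the accessor does not.
	fmt.Println(desiredReplicasNilSafe(WorkerGroup{}))
}
```

The alternative is to keep the nil check in validation for every case, so the downstream helpers can keep dereferencing directly; which of the two fits better depends on whether unset bounds are meant to be legal when autoscaling is enabled.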
Why are these changes needed?
These changes run the replica validation before RayCluster pod creation. The PR moves the replica validation logic and removes the redundant tests. Unit tests are also added, since the logic moved from utils.go to validation.go.
Related issue number
Closes #4101
Checks