Feat/background goroutine get job info test #4368
Open
fscnick wants to merge 23 commits into ray-project:master from fscnick:feat/background-goroutine-get-job-info-test
Commits
1bae584 [RayJob] background job info poc (fscnick)
293937b [RayJob] encapsulate the worker pool (fscnick)
b4326bc [RayJob] replace concurrency map with lru cache (fscnick)
fef0c77 [RayJob] remove cache on stop and config flag (fscnick)
13070f0 [RayJob] remove delete cache from deleteClusterResources and add lock… (fscnick)
3d07403 [Helm] add argument for useBackgroundGoroutine (fscnick)
5db324b [RayJob] remove unused function and background goroutine observability (fscnick)
026e9f0 [RayJob] rename useBackgroundGoroutine to asyncJobInfoQuery (fscnick)
645aaed [RayJob] make cache immutable to avoid data race (fscnick)
bcb2a38 [RayJob] remove unused function (fscnick)
6dc8cf6 [RayJob] If error on fetching job info, it removes from loop (fscnick)
b0b2753 [RayJob] task queue is extendable (fscnick)
db5aa09 [RayJob] change slice to ring buffer (fscnick)
98a17d1 [RayJob] async job info query use feature gate instead (fscnick)
38a8602 [RayJob] add test for async job info query (fscnick)
adc8003 [Test][RayJob] fix e2e test and add unit test (fscnick)
31eea56 [RayJob] remove redundent code in async job query (fscnick)
8231132 Merge remote-tracking branch 'upstream/master' into feat/background-g… (fscnick)
82de99a [Test][RayJob] fix cache key name in test (fscnick)
18d6c84 Merge remote-tracking branch 'upstream/master' into feat/background-g… (fscnick)
b64d60c [Test] e2e RayJob test with AsyncJobInfoQuery and without AsyncJobInf… (fscnick)
f2f717f [Test] rename to with AsyncJobInfoQuery for testing (fscnick)
720dee5 Merge remote-tracking branch 'upstream/master' into feat/background-g… (fscnick)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -18,3 +18,5 @@ featureGates: | |
| enabled: true | ||
| - name: RayMultiHostIndexing | ||
| enabled: true | ||
| - name: AsyncJobInfoQuery | ||
| enabled: true | ||
12 changes: 12 additions & 0 deletions
...perator/config/overlays/test-overrides-with-async-job-info-query/deployment-override.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| # Strategic merge patch for kuberay-operator Deployment (test / CI only). | ||
| apiVersion: apps/v1 | ||
| kind: Deployment | ||
| metadata: | ||
| name: kuberay-operator | ||
| spec: | ||
| template: | ||
| spec: | ||
| containers: | ||
| - name: kuberay-operator | ||
| args: | ||
| - --feature-gates=RayClusterStatusConditions=true,RayJobDeletionPolicy=true,RayMultiHostIndexing=true,AsyncJobInfoQuery=true | ||
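The `--feature-gates` argument above is a comma-separated list of `Name=bool` pairs. As a minimal sketch of how such a value decomposes, the hypothetical helper `parseFeatureGates` below splits it into a map. It is an illustration only and is not the parser the operator uses, which this diff does not show.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFeatureGates is a hypothetical helper (not part of KubeRay) that
// turns "A=true,B=false" into a map from gate name to enabled state.
func parseFeatureGates(s string) (map[string]bool, error) {
	gates := make(map[string]bool)
	for _, pair := range strings.Split(s, ",") {
		pair = strings.TrimSpace(pair)
		if pair == "" {
			continue // tolerate empty segments such as a trailing comma
		}
		name, value, ok := strings.Cut(pair, "=")
		if !ok {
			return nil, fmt.Errorf("missing '=' in %q", pair)
		}
		enabled, err := strconv.ParseBool(strings.TrimSpace(value))
		if err != nil {
			return nil, fmt.Errorf("invalid value for %q: %w", name, err)
		}
		gates[strings.TrimSpace(name)] = enabled
	}
	return gates, nil
}

func main() {
	// The value is copied from the deployment override in this PR.
	arg := "RayClusterStatusConditions=true,RayJobDeletionPolicy=true,RayMultiHostIndexing=true,AsyncJobInfoQuery=true"
	gates, err := parseFeatureGates(arg)
	if err != nil {
		panic(err)
	}
	fmt.Println("AsyncJobInfoQuery enabled:", gates["AsyncJobInfoQuery"])
}
```

Because the override replaces the container's `args`, every gate the tests rely on has to be listed in the single flag value, not only the newly added `AsyncJobInfoQuery`.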
17 changes: 17 additions & 0 deletions
ray-operator/config/overlays/test-overrides-with-async-job-info-query/kustomization.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| ## ============================================================================ | ||
| ## Kustomize overlay: test-overrides-with-async-job-info-query (CI / e2e only) | ||
| ## ---------------------------------------------------------------------------- | ||
| ## Purpose: Enable alpha / experimental feature gates (including AsyncJobInfoQuery) | ||
| ## for end-to-end testing without modifying base manifests or Helm defaults. | ||
| ## ============================================================================ | ||
| apiVersion: kustomize.config.k8s.io/v1beta1 | ||
| kind: Kustomization | ||
|
|
||
| resources: | ||
| - ../../default | ||
|
|
||
| patches: | ||
| - path: deployment-override.yaml | ||
| target: | ||
| kind: Deployment | ||
| name: kuberay-operator |
143 changes: 143 additions & 0 deletions
ray-operator/controllers/ray/utils/dashboardclient/dashboard_cache_client_test.go
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,143 @@ | ||
| package dashboardclient | ||
|
|
||
| import ( | ||
| "errors" | ||
| "testing" | ||
| "testing/synctest" | ||
| "time" | ||
|
|
||
| "github.com/go-logr/logr" | ||
| "github.com/stretchr/testify/assert" | ||
| "github.com/stretchr/testify/require" | ||
| "go.uber.org/mock/gomock" | ||
| "k8s.io/apimachinery/pkg/types" | ||
|
|
||
| rayv1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1" | ||
| "github.com/ray-project/kuberay/ray-operator/controllers/ray/utils/dashboardclient/mocks" | ||
| utiltypes "github.com/ray-project/kuberay/ray-operator/controllers/ray/utils/types" | ||
| ) | ||
|
|
||
| func TestAsyncJobInfoQuery(t *testing.T) { | ||
| var mockClient *mocks.MockRayDashboardClientInterface | ||
|
|
||
| ctrl := gomock.NewController(t) | ||
| mockClient = mocks.NewMockRayDashboardClientInterface(ctrl) | ||
|
|
||
| synctest.Test(t, func(t *testing.T) { | ||
| ctx := logr.NewContext(t.Context(), logr.Discard()) | ||
|
|
||
| clusterName := types.NamespacedName{ | ||
| Namespace: "test-namespace", | ||
| Name: "raycluster-async-job-info-query", | ||
| } | ||
| asyncJobInfoQueryClient := RayDashboardCacheClient{} | ||
| asyncJobInfoQueryClient.InitClient(ctx, clusterName, mockClient) | ||
| synctest.Wait() | ||
|
|
||
| jobId := "test-job-id" | ||
|
|
||
| // Set up the mock expectation for the background fetch up front to avoid a flaky test. | ||
| mockJobInfo := &utiltypes.RayJobInfo{ | ||
| JobId: jobId, | ||
| } | ||
| mockClient.EXPECT().GetJobInfo(ctx, jobId).Return(mockJobInfo, nil) | ||
|
|
||
| // First call, the job info is not in cache, so it should return ErrAgain | ||
| jobInfo, err := asyncJobInfoQueryClient.GetJobInfo(ctx, jobId) | ||
| assert.Nil(t, jobInfo) | ||
| assert.Equal(t, ErrAgain, err) | ||
|
|
||
| synctest.Wait() | ||
|
|
||
| // Second call: after GetJobInfo has been called in the background, the job info should be in the cache. | ||
| jobInfo, err = asyncJobInfoQueryClient.GetJobInfo(ctx, jobId) | ||
| require.NoError(t, err) | ||
| assert.Equal(t, mockJobInfo, jobInfo) | ||
|
|
||
| expectedError := errors.New("test error") | ||
| mockClient.EXPECT().GetJobInfo(ctx, jobId).Return(nil, expectedError) | ||
|
|
||
| // Wait for longer than queryInterval to ensure the task has been re-queued and executed. | ||
| time.Sleep(queryInterval + time.Millisecond) | ||
| synctest.Wait() | ||
|
|
||
| // Third call: after the background GetJobInfo has returned an error, the error should be in the cache. | ||
| jobInfo, err = asyncJobInfoQueryClient.GetJobInfo(ctx, jobId) | ||
| assert.Nil(t, jobInfo) | ||
| assert.Equal(t, expectedError, err) | ||
|
|
||
| // The previous call consumed the error, so the job info is no longer in the cache and ErrAgain is returned. | ||
| jobInfo, err = asyncJobInfoQueryClient.GetJobInfo(ctx, jobId) | ||
| assert.Nil(t, jobInfo) | ||
| assert.Equal(t, ErrAgain, err) | ||
|
|
||
| mockJobInfo = &utiltypes.RayJobInfo{ | ||
| JobId: jobId, | ||
| JobStatus: rayv1.JobStatusSucceeded, | ||
| } | ||
| mockClient.EXPECT().GetJobInfo(ctx, jobId).Return(mockJobInfo, nil) | ||
|
|
||
| // Wait for longer than queryInterval to ensure the task has been re-queued and executed. | ||
| time.Sleep(queryInterval + time.Millisecond) | ||
| synctest.Wait() | ||
|
|
||
| // Fourth call, the job has reached the terminal status. | ||
| jobInfo, err = asyncJobInfoQueryClient.GetJobInfo(ctx, jobId) | ||
| require.NoError(t, err) | ||
| assert.Equal(t, mockJobInfo, jobInfo) | ||
|
|
||
| // Wait for longer than queryInterval to ensure the task has been re-queued. | ||
| time.Sleep(queryInterval + time.Millisecond) | ||
| synctest.Wait() | ||
|
|
||
| // Fifth call: the job has reached a terminal status, so the task has been removed from the worker. | ||
| // The underlying GetJobInfo should not be called again, and the cached job info should be returned. | ||
| jobInfo, err = asyncJobInfoQueryClient.GetJobInfo(ctx, jobId) | ||
| require.NoError(t, err) | ||
| assert.Equal(t, mockJobInfo, jobInfo) | ||
|
|
||
| // Wait for longer than cacheExpiry to ensure the cache entry has expired and been removed. | ||
| time.Sleep(cacheExpiry + 10*queryInterval) | ||
| synctest.Wait() | ||
|
|
||
| cached, ok := cacheStorage.Get(cacheKey(clusterName, jobId)) | ||
| assert.Nil(t, cached) | ||
| assert.False(t, ok) | ||
|
||
|
|
||
| // With a persistent error, the cache entry should eventually be removed. | ||
| nonExistedJobId := "not-existed-job-id" | ||
|
Comment on lines +107 to +108 (Collaborator, Author): Another test in the same bubble. The goroutine is created by a singleton, and synctest does not allow access to a goroutine outside of the bubble. |
||
|
|
||
| // Set up the mock expectation for the background fetch up front to avoid a flaky test. | ||
| expectedError = errors.New("no such host") | ||
| mockClient.EXPECT().GetJobInfo(ctx, nonExistedJobId).Return(nil, expectedError).AnyTimes() | ||
|
|
||
| jobInfo, err = asyncJobInfoQueryClient.GetJobInfo(ctx, nonExistedJobId) | ||
| assert.Nil(t, jobInfo) | ||
| assert.Equal(t, ErrAgain, err) | ||
|
|
||
| time.Sleep(queryInterval + time.Millisecond) | ||
| synctest.Wait() | ||
|
|
||
| jobInfo, err = asyncJobInfoQueryClient.GetJobInfo(ctx, nonExistedJobId) | ||
| assert.Nil(t, jobInfo) | ||
| assert.Equal(t, expectedError, err) | ||
|
|
||
| // The previous call consumed the error, so the job info is no longer in the cache and ErrAgain is returned. | ||
| jobInfo, err = asyncJobInfoQueryClient.GetJobInfo(ctx, nonExistedJobId) | ||
| assert.Nil(t, jobInfo) | ||
| assert.Equal(t, ErrAgain, err) | ||
|
|
||
| time.Sleep(queryInterval + time.Millisecond) | ||
| synctest.Wait() | ||
|
|
||
| // Get the same error again without continuing to requeue the task. | ||
| jobInfo, err = asyncJobInfoQueryClient.GetJobInfo(ctx, nonExistedJobId) | ||
| assert.Nil(t, jobInfo) | ||
| assert.Equal(t, expectedError, err) | ||
|
|
||
| // The cache entry should be removed after the previous GetJobInfo call. | ||
| cached, ok = cacheStorage.Get(cacheKey(clusterName, nonExistedJobId)) | ||
| assert.Nil(t, cached) | ||
| assert.False(t, ok) | ||
| }) | ||
| } | ||