[Feature][history server] Add Google Cloud Storage (GCS) support to history server#4478

Open
chiayi wants to merge 4 commits into ray-project:master from chiayi:history-server-gcs

chiayi (Contributor) commented Feb 4, 2026

Why are these changes needed?

Adds Google Cloud Storage (GCS) as a storage backend for the history server (selected with `--runtime-class-name=gcs`).

Related issue number

Part of #4453

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

chiayi force-pushed the history-server-gcs branch 3 times, most recently from 5c8c161 to 83a0115 on February 5, 2026 08:35
Copilot AI mentioned this pull request Feb 5, 2026
chiayi (Contributor, Author) commented Feb 5, 2026

I used the github.com/fsouza/fake-gcs-server/fakestorage library in the test suite. There is also an emulator that we could run in CI; I think that can be the next step.

chiayi (Contributor, Author) commented Feb 5, 2026

To test manually, I built the collector and historyserver images with `make -C historyserver localimage-build` and uploaded them to Google Artifact Registry (AR). I then set up a GKE cluster and created a GCS bucket. Both the AR repository and the GCS bucket need permissions that grant the cluster access.
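The image references in the manifests follow Artifact Registry's `LOCATION-docker.pkg.dev/PROJECT-ID/REPOSITORY/IMAGE:TAG` naming. The hypothetical helper below only makes those pieces explicit; after building, each local image would be tagged with such a reference and pushed (for example with `docker tag` and `docker push`, after `gcloud auth configure-docker us-west1-docker.pkg.dev`).

```go
package main

import "fmt"

// arImage builds an Artifact Registry Docker image reference of the form
// LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG. Hypothetical helper.
func arImage(location, project, repo, image, tag string) string {
	return fmt.Sprintf("%s-docker.pkg.dev/%s/%s/%s:%s", location, project, repo, image, tag)
}

func main() {
	// Values taken from the manifests in this PR comment.
	for _, name := range []string{"ray-collector", "ray-historyserver"} {
		fmt.Println(arImage("us-west1", "aaronliang-dev", "ray-ar", name, "v0.1.0"))
	}
}
```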

My GKE cluster already existed, so I updated it to use Workload Identity:

```shell
gcloud container clusters update <clustername> \
    --location=<region> \
    --workload-pool=<project_id>.svc.id.goog
```

I followed this guide to set up Workload Identity and link the Kubernetes service account (KSA) to a Google service account (GSA).
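For reference, these are the identifiers that the KSA-to-GSA linking ties together. The project ID, the `default` namespace, and the GSA email below are placeholders; only the `historyserver` KSA name comes from the Deployment's `serviceAccountName`. In that flow, the member string is granted `roles/iam.workloadIdentityUser` on the GSA (`gcloud iam service-accounts add-iam-policy-binding`), and the annotation is applied to the KSA with `kubectl annotate serviceaccount`. The `wiBinding` type is hypothetical.

```go
package main

import "fmt"

// wiBinding holds the strings used when linking a Kubernetes service account
// (KSA) to a Google service account (GSA) with GKE Workload Identity.
type wiBinding struct {
	WorkloadPool  string // value passed to --workload-pool
	IAMMember     string // --member for roles/iam.workloadIdentityUser on the GSA
	KSAAnnotation string // key=value annotation to set on the KSA
}

func newWIBinding(project, namespace, ksa, gsaEmail string) wiBinding {
	pool := project + ".svc.id.goog"
	return wiBinding{
		WorkloadPool:  pool,
		IAMMember:     fmt.Sprintf("serviceAccount:%s[%s/%s]", pool, namespace, ksa),
		KSAAnnotation: "iam.gke.io/gcp-service-account=" + gsaEmail,
	}
}

func main() {
	// Placeholder project, namespace, and GSA; "historyserver" matches the Deployment.
	b := newWIBinding("my-project", "default", "historyserver", "hs-gcs@my-project.iam.gserviceaccount.com")
	fmt.Println(b.WorkloadPool)
	fmt.Println(b.IAMMember)
	fmt.Println(b.KSAAnnotation)
}
```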

Deploy a RayCluster whose head pod includes this collector container (excerpt from the pod template):

```yaml
        - name: collector
          image: us-west1-docker.pkg.dev/aaronliang-dev/ray-ar/ray-collector:v0.1.0
          imagePullPolicy: Always
          env:
          - name: GCS_BUCKET
            value: "hs-ray-bucket"
          command:
          - collector
          - --role=Head
          - --runtime-class-name=gcs
          - --ray-cluster-name=raycluster-historyserver
          - --ray-root-dir=log
          - --events-port=8084
          volumeMounts:
          - name: historyserver
            mountPath: /tmp/ray
        tolerations:
        - key: ray
          operator: Equal
          value: cpu
        volumes:
        - name: historyserver
          emptyDir: {}
```
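To make the collector's inputs explicit, here is a sketch that parses the same arguments with Go's standard `flag` package (which accepts the `--name=value` form) and reads `GCS_BUCKET` from the environment. It is a hypothetical illustration; the real collector's flag handling, defaults, and validation may differ, and `collectorConfig`/`parseCollectorArgs` are invented names.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// collectorConfig mirrors the arguments in the collector manifest above.
type collectorConfig struct {
	Role             string
	RuntimeClassName string
	RayClusterName   string
	RayRootDir       string
	EventsPort       int
	Bucket           string
}

// parseCollectorArgs parses the flags and pulls the bucket name from the
// environment via getenv (os.Getenv in production, a stub in tests).
func parseCollectorArgs(args []string, getenv func(string) string) (collectorConfig, error) {
	var c collectorConfig
	fs := flag.NewFlagSet("collector", flag.ContinueOnError)
	fs.StringVar(&c.Role, "role", "", "node role, e.g. Head")
	fs.StringVar(&c.RuntimeClassName, "runtime-class-name", "", "storage backend, e.g. gcs")
	fs.StringVar(&c.RayClusterName, "ray-cluster-name", "", "RayCluster name")
	fs.StringVar(&c.RayRootDir, "ray-root-dir", "", "Ray root directory")
	fs.IntVar(&c.EventsPort, "events-port", 0, "events port")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	c.Bucket = getenv("GCS_BUCKET")
	// Fail fast: the GCS backend cannot work without a bucket name.
	if c.RuntimeClassName == "gcs" && c.Bucket == "" {
		return c, errors.New("GCS_BUCKET must be set when --runtime-class-name=gcs")
	}
	return c, nil
}

func main() {
	args := []string{
		"--role=Head",
		"--runtime-class-name=gcs",
		"--ray-cluster-name=raycluster-historyserver",
		"--ray-root-dir=log",
		"--events-port=8084",
	}
	cfg, err := parseCollectorArgs(args, os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing `getenv` as a parameter keeps the function testable without mutating the process environment.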

And the history server Deployment:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: historyserver-demo
  labels:
    app: historyserver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: historyserver
  template:
    metadata:
      labels:
        app: historyserver
    spec:
      serviceAccountName: historyserver
      containers:
      - name: historyserver
        env:
          - name: GCS_BUCKET
            value: "hs-ray-bucket"
        image: us-west1-docker.pkg.dev/aaronliang-dev/ray-ar/ray-historyserver:v0.1.0
        imagePullPolicy: Always
        command:
        - historyserver
        - --runtime-class-name=gcs
        - --ray-root-dir=log
        ports:
        - containerPort: 8080
        resources:
          limits:
            cpu: "500m"
```
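Both manifests select the GCS backend with `--runtime-class-name=gcs`. One way such a flag could map to a storage implementation is a small registry keyed by the class name. The sketch below assumes that design; `StorageBackend`, `newBackend`, and the `gcs://` display string are hypothetical and not the PR's actual types.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// StorageBackend is a hypothetical, minimal interface; a real one would expose
// read/list operations over the uploaded Ray logs.
type StorageBackend interface {
	Name() string
}

type gcsBackend struct{ bucket string }

func (g gcsBackend) Name() string { return "gcs://" + g.bucket }

// factories maps a --runtime-class-name value to a constructor.
var factories = map[string]func(bucket string) (StorageBackend, error){
	"gcs": func(bucket string) (StorageBackend, error) {
		if bucket == "" {
			return nil, errors.New("gcs backend requires a bucket (GCS_BUCKET)")
		}
		return gcsBackend{bucket: bucket}, nil
	},
}

// newBackend looks up the runtime class and reports the known classes on a miss.
func newBackend(runtimeClass, bucket string) (StorageBackend, error) {
	f, ok := factories[runtimeClass]
	if !ok {
		known := make([]string, 0, len(factories))
		for k := range factories {
			known = append(known, k)
		}
		sort.Strings(known)
		return nil, fmt.Errorf("unknown runtime class %q (known: %v)", runtimeClass, known)
	}
	return f(bucket)
}

func main() {
	b, err := newBackend("gcs", "hs-ray-bucket")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(b.Name()) // gcs://hs-ray-bucket
}
```

A registry like this lets additional backends be added under new class names without changing the code that selects them.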

The GCS bucket should then start filling with data from the collector, and every existing history server endpoint should work as expected.

chiayi force-pushed the history-server-gcs branch from 83a0115 to 37ff80e on February 6, 2026 00:14
chiayi marked this pull request as ready for review February 6, 2026 00:14
chiayi force-pushed the history-server-gcs branch 2 times, most recently from 192ba5f to 06b8db3 on February 6, 2026 02:11
chiayi force-pushed the history-server-gcs branch 2 times, most recently from 020734c to b50592e on February 6, 2026 19:25
chiayi force-pushed the history-server-gcs branch from b50592e to e70ec5c on February 6, 2026 21:41
chiayi force-pushed the history-server-gcs branch from e70ec5c to a268548 on February 6, 2026 23:21
cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


```go
// Check if bucket exists
_, err = storageClient.Bucket(c.Bucket).Attrs(ctx)
if err == gstorage.ErrBucketNotExist {
```

Equality check instead of errors.Is for sentinel error

Medium Severity

The check err == gstorage.ErrBucketNotExist uses direct equality instead of errors.Is(). If the error is wrapped, this comparison will fail, causing the code to fall through to the generic error branch instead of attempting bucket creation. The same file correctly uses errors.Is(err, gstorage.ErrObjectNotExist) on line 48, making this inconsistent.


