Skip to content

Conversation

@aslonnie
Copy link
Collaborator

@aslonnie aslonnie commented Oct 8, 2025

No description provided.

Copilot AI review requested due to automatic review settings October 8, 2025 07:23
@gitar-bot
Copy link

gitar-bot bot commented Oct 8, 2025

Gitar automatically fixes CI failures and addresses comments starting with Gitar.

⚙️ Options:

  • Fix CI failures*

🔄 To revert changes, post a comment:

Gitar revert to commit <desired commit SHA>

📚 Docs

* Gitar never force pushes to your branch

@gemini-code-assist
Copy link

Summary of Changes

Hello @aslonnie, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This PR enhances wanda steps in the CI pipeline by adding cache hit prediction and step-level priority configuration. It introduces a new predict_cache.go file with logic to determine if a wanda build will hit the cache by checking registry tags. The raycicmd package is updated to integrate this prediction, marking cached steps in Buildkite and allowing priority to be set per wanda step, overriding global settings. Relevant tests have been added for both new features.

Highlights

  • Intent: This pull request introduces a 'wanda shortcut' by implementing cache hit prediction for wanda steps in the CI pipeline. This allows the system to determine if a wanda build would result in a cache hit without actually executing the build, potentially speeding up CI runs by marking steps as cached. Additionally, it adds support for specifying a step-level priority for wanda steps, which can override the global builder priority.
  • Key Changes: 1. Cache Hit Prediction Logic: A new file wanda/predict_cache.go is added, containing PredictCacheHit and resolveBases functions. PredictCacheHit calculates the build input digest for a given wanda spec file and checks for the existence of the corresponding cache tag in the container registry using go-containerregistry. It handles scenarios like disabled caching, forced rebuilds, or local-only builds.
  1. Integration into raycicmd: The wandaStep struct in raycicmd/wanda.go now includes priority (as an optional integer) and cacheHit (boolean) fields. The convert function is updated to parse the priority from the step configuration and call the new predictCacheHit function. The buildkiteStep function is modified to apply the step-level priority if provided, and to append [cache hit] to the Buildkite step label if cacheHit is true.
  2. Configuration Update: The wandaStepAllowedKeys list in raycicmd/rayci_pipeline.go is updated to include "priority", enabling its use in wanda step definitions.
  3. Testing: New unit tests are added in raycicmd/wanda_test.go to verify the priority handling and in wanda/predict_cache_test.go to cover various PredictCacheHit scenarios.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Adds predictive cache hit functionality for Wanda builds and introduces step-level priority overrides with corresponding Buildkite label annotation for cache hits. Key changes include implementing PredictCacheHit with base image resolution and integrating cache prediction into pipeline conversion, plus allowing per-step priority to override global builder priority.

  • Add PredictCacheHit logic to pre-check remote cache availability.
  • Add step-level priority and cache hit label decoration to Buildkite steps.
  • Extend allowed wanda step keys and tests for priority and negative cache prediction scenarios.

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
wanda/predict_cache.go Adds cache prediction logic and base image resolution utilities.
wanda/predict_cache_test.go Adds tests for disabled caching, local-only builds, and forced rebuild behavior.
raycicmd/wanda.go Integrates cache prediction and adds step-level priority & cache-hit labeling.
raycicmd/wanda_test.go Tests step-level priority override behavior.
raycicmd/rayci_pipeline.go Allows priority key in wanda step configuration.

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

for _, from := range froms {
// Local images (prefixed with @) - skip for prediction
// We can't reliably predict local image digests without docker
if from[0] == '@' {
Copy link

Copilot AI Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indexing from[0] without checking length will panic if an empty string is present in froms. Add a length guard (e.g., if len(from) > 0 && from[0] == '@').

Suggested change
if from[0] == '@' {
if len(from) > 0 && from[0] == '@' {

Copilot uses AI. Check for mistakes.
// We can't reliably predict local image digests without docker
if from[0] == '@' {
log.Printf("skipping local image %s for cache prediction", from)
return nil, fmt.Errorf("cannot predict cache for local images: %s", from)
Copy link

Copilot AI Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment says 'skip for prediction' but the code returns an error, aborting prediction. Either update the comment to reflect the behavior or change the code to actually skip (e.g., continue) without returning an error.

Suggested change
return nil, fmt.Errorf("cannot predict cache for local images: %s", from)
continue

Copilot uses AI. Check for mistakes.
}

// resolveBases resolves base images for cache prediction
func resolveBases(froms []string, config *ForgeConfig, workDir string) (map[string]*imageSource, error) {
Copy link

Copilot AI Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workDir parameter is never used in this function. Remove it if unnecessary, or incorporate it if it was intended for resolving relative references.

Suggested change
func resolveBases(froms []string, config *ForgeConfig, workDir string) (map[string]*imageSource, error) {
func resolveBases(froms []string, config *ForgeConfig) (map[string]*imageSource, error) {

Copilot uses AI. Check for mistakes.
// Set environment variables for the prediction
// This allows the wanda package to expand variables in the spec file
for k, v := range envs {
_ = setEnvIfNotSet(k, v)
Copy link

Copilot AI Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignoring the error from setEnvIfNotSet discards potential OS-level failures. Capture and log (or propagate) the error instead of using the blank identifier.

Suggested change
_ = setEnvIfNotSet(k, v)
if err := setEnvIfNotSet(k, v); err != nil {
log.Printf("failed to set environment variable %s=%s: %v", k, v, err)
}

Copilot uses AI. Check for mistakes.
Comment on lines +155 to +160
func setEnvIfNotSet(key, value string) error {
if os.Getenv(key) == "" {
return os.Setenv(key, value)
}
return nil
}
Copy link

Copilot AI Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function returns an error that all callers currently ignore. Either handle the error at call sites or document that failures are intentionally suppressed.

Copilot uses AI. Check for mistakes.
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a feature to predict wanda cache hits and allow setting step-level priorities. The implementation is mostly solid, with good test coverage for the new functionality. I have a few suggestions to improve robustness and maintainability, mainly around avoiding global state mutation and improving error handling. Please see my detailed comments below.

Comment on lines +123 to +153
func (c *wandaConverter) predictCacheHit(file string, envs map[string]string) bool {
// Only predict cache hits if we have the necessary config
if c.config.CIWorkRepo == "" {
return false
}

// Set environment variables for the prediction
// This allows the wanda package to expand variables in the spec file
for k, v := range envs {
_ = setEnvIfNotSet(k, v)
}

forgeConfig := &wanda.ForgeConfig{
WorkDir: ".",
WorkRepo: c.config.CIWorkRepo,
NamePrefix: c.config.ForgePrefix,
BuildID: c.info.buildID,
Epoch: wanda.DefaultCacheEpoch(),
RayCI: true,
Rebuild: false,
}

cacheHit, err := wanda.PredictCacheHit(file, forgeConfig)
if err != nil {
// If prediction fails, log the error but don't fail the build
log.Printf("failed to predict cache hit for %s: %v", file, err)
return false
}

return cacheHit
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Modifying the process-wide environment using os.Setenv is not thread-safe and can lead to race conditions and unpredictable behavior if rayci ever runs converters concurrently. The environment is a global state.

A safer approach is to pass the environment information explicitly without mutating the global state. The wanda.PredictCacheHit function (also introduced in this PR) can be modified to accept a lookup function for environment variables.

Suggested Refactoring:

  1. Export lookupFunc in wanda/spec.go:
    Rename lookupFunc to LookupFunc to make it public.

  2. Update wanda.PredictCacheHit in wanda/predict_cache.go:
    Modify its signature to accept the lookup function.

    // In wanda/predict_cache.go
    // The wanda.LookupFunc type needs to be exported from the wanda package.
    func PredictCacheHit(specFile string, config *ForgeConfig, lookupEnv wanda.LookupFunc) (bool, error) {
        // ...
        spec, err := parseSpecFile(specFile)
        if err != nil {
            return false, fmt.Errorf("parse spec file: %w", err)
        }
    
        if lookupEnv == nil {
            lookupEnv = os.LookupEnv
        }
        // Expand env variables just like the actual build does
        spec = spec.expandVar(lookupEnv)
        // ...
    }
  3. Update predictCacheHit in this file (raycicmd/wanda.go):
    This suggestion replaces the function body to create a lookup function and pass it to wanda.PredictCacheHit, removing the need for setEnvIfNotSet.

func (c *wandaConverter) predictCacheHit(file string, envs map[string]string) bool {
	// Only predict cache hits if we have the necessary config
	if c.config.CIWorkRepo == "" {
		return false
	}

	// Create a lookup function for environment variables to avoid mutating global state.
	// This allows the wanda package to expand variables in the spec file safely.
	lookup := func(key string) (string, bool) {
		if val, ok := envs[key]; ok {
			return val, true
		}
		return os.LookupEnv(key)
	}

	forgeConfig := &wanda.ForgeConfig{
		WorkDir:    ".",
		WorkRepo:   c.config.CIWorkRepo,
		NamePrefix: c.config.ForgePrefix,
		BuildID:    c.info.buildID,
		Epoch:      wanda.DefaultCacheEpoch(),
		RayCI:      true,
		Rebuild:    false,
	}

	// Assumes PredictCacheHit is modified to accept a lookup function.
	cacheHit, err := wanda.PredictCacheHit(file, forgeConfig, lookup)
	if err != nil {
		// If prediction fails, log the error but don't fail the build
		log.Printf("failed to predict cache hit for %s: %v", file, err)
		return false
	}

	return cacheHit
}

Comment on lines +155 to +160
func setEnvIfNotSet(key, value string) error {
if os.Getenv(key) == "" {
return os.Setenv(key, value)
}
return nil
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This function will no longer be needed after refactoring predictCacheHit as suggested in my other comment to avoid using global environment variables.

As a side note, os.Getenv(key) == "" is not a reliable way to check if an environment variable is unset, as it could be explicitly set to an empty string. The correct way is to use _, ok := os.LookupEnv(key).

Comment on lines +97 to +101
_, err = remote.Get(ct, remoteOpts...)
if err != nil {
// Cache miss or error checking
return false, nil
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This logic swallows all errors from remote.Get, treating them as a cache miss. This can hide underlying issues like authentication failures or network problems, making them hard to debug. It would be better to distinguish between a 'not found' error (which is a cache miss) and other unexpected errors.

You can achieve this by inspecting the error type and status code. You will need to add the following imports:

import (
    "errors"
    "net/http"

    "github.com/google/go-containerregistry/pkg/v1/remote/transport"
)
	_, err = remote.Get(ct, remoteOpts...)
	if err != nil {
		var terr *transport.Error
		if errors.As(err, &terr) && terr.StatusCode == http.StatusNotFound {
			// This is a cache miss, not an error.
			return false, nil
		}

		// Any other error is unexpected and should be returned to be logged by the caller.
		return false, fmt.Errorf("failed to check remote cache tag %q: %w", cacheTag, err)
	}

@aslonnie aslonnie marked this pull request as draft October 8, 2025 07:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants