feat: support parse source info from git repo #158

Merged
merged 1 commit into from
Apr 18, 2025
chlins
Contributor

@chlins chlins commented Apr 18, 2025

This pull request introduces the ability to capture and manage source control metadata (e.g., source URL and revision) within the build process. It includes changes to the build command, configuration structures, and backend logic, as well as the addition of a new source package for parsing source information. Below are the most important changes grouped by theme:

Feature: Source Metadata Integration

  • Added SourceURL and SourceRevision flags to the build command, allowing users to specify source metadata directly. (cmd/build.go)
  • Updated the Build configuration struct to include SourceURL and SourceRevision fields. (pkg/config/build.go)
  • Enhanced the backend Build function to retrieve source metadata using the new getSourceInfo helper. This includes parsing Git repository details if metadata is not explicitly provided. (pkg/backend/build.go)

Backend Enhancements

  • Extended the Model struct to include SourceURL and SourceRevision fields, which are populated during the build process. (pkg/backend/build/config/model.go)
  • Updated the buildModelConfig function to include source metadata in the model configuration. (pkg/backend/build/builder.go)

New source Package

  • Introduced the source package to handle source metadata parsing, including a Git parser that retrieves the remote URL, commit hash, and workspace cleanliness. (pkg/source/parser.go, pkg/source/git.go)
  • Added unit tests for the Git parser to validate its behavior with a sample Git repository. (pkg/source/git_test.go, pkg/source/testdata/git-repo)

Dependency Updates

  • Added github.com/go-git/go-git/v5 to the project dependencies for interacting with Git repositories. (go.mod)

These changes collectively enhance the build process by enabling automatic or user-specified tracking of source control metadata, improving traceability and reproducibility of builds.

Summary by CodeRabbit

  • New Features
    • Added support for capturing and embedding source repository URL and revision metadata during the build process.
    • Introduced new build command options to specify source URL and revision.
  • Bug Fixes
    • Improved error handling for source information retrieval during builds.
  • Tests
    • Added tests to verify correct extraction of Git repository metadata.
  • Chores
    • Updated dependencies to include Git handling libraries and related utilities.

@chlins chlins added the enhancement New feature or request label Apr 18, 2025
coderabbitai bot commented Apr 18, 2025

Walkthrough

This update introduces source control metadata tracking into the model build process. The build command now accepts --source-url and --source-revision flags, which are recorded in the build configuration. If not provided, the system attempts to auto-detect Git repository information from the workspace using a new parser interface and implementation. The model configuration and its descriptor are extended to include source URL and revision fields. Supporting code for Git parsing and corresponding tests are added, along with new dependencies for Git operations.
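
As a rough illustration of the flag binding described above, the sketch below wires --source-url and --source-revision into a small config struct. It uses the standard library flag package only to stay self-contained; the project's real command setup in cmd/build.go may use a different CLI library, and the buildConfig type and parseBuildFlags helper are hypothetical names invented for this example.

```go
package main

import (
	"flag"
	"fmt"
)

// buildConfig mirrors only the two fields this PR adds to the Build struct.
// It is a stand-in for illustration, not the project's actual type.
type buildConfig struct {
	SourceURL      string
	SourceRevision string
}

// parseBuildFlags (hypothetical helper) binds the two source flags to a
// buildConfig. Both flags default to the empty string, which the backend
// treats as "not specified".
func parseBuildFlags(args []string) (*buildConfig, error) {
	cfg := &buildConfig{}
	fs := flag.NewFlagSet("build", flag.ContinueOnError)
	fs.StringVar(&cfg.SourceURL, "source-url", "", "source URL of the model")
	fs.StringVar(&cfg.SourceRevision, "source-revision", "", "source revision of the model")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return cfg, nil
}

func main() {
	cfg, err := parseBuildFlags([]string{"--source-url", "https://example.com/org/repo.git"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("url=%q revision=%q\n", cfg.SourceURL, cfg.SourceRevision)
}
```

Leaving a flag unset keeps the corresponding field empty, which is the signal the walkthrough describes for falling back to auto-detection.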

Changes

File(s) | Change Summary
cmd/build.go | Added --source-url and --source-revision flags to build command, binding them to buildConfig.
pkg/config/build.go | Added SourceURL and SourceRevision fields to the exported Build struct.
pkg/backend/build/config/model.go | Added SourceURL and SourceRevision fields to the exported Model struct.
pkg/backend/build.go | Integrated source info retrieval into the build process; added getSourceInfo helper; error handling for source parsing.
pkg/backend/build/builder.go | Modified buildModelConfig to include SourceURL and Revision in the model descriptor.
pkg/source/parser.go | Introduced Parser interface, Info struct, ParserTypeGit constant, and NewParser factory for source parsing.
pkg/source/git.go | Implemented git parser for extracting repository URL, commit hash, and dirty state using go-git.
pkg/source/git_test.go | Added unit test for the git parser using a test repository.
pkg/source/testdata/git-repo | Added test data representing a Git submodule commit for parser testing.
.gitmodules | Added Git submodule configuration for test data repository.
go.mod | Added github.com/go-git/go-git/v5 and related dependencies for Git operations; updated/added several indirect dependencies.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI (build command)
    participant Backend
    participant Source Parser (Git)
    participant Model Config

    User->>CLI (build command): Run build with optional --source-url/--source-revision
    CLI (build command)->>Backend: Pass buildConfig (with or without source info)
    Backend->>Source Parser (Git): If source info missing, call Parse(workspace)
    Source Parser (Git)-->>Backend: Return source URL, commit, dirty state
    Backend->>Model Config: Populate SourceURL and SourceRevision fields
    Backend-->>CLI (build command): Complete build, now with source metadata

Poem

In the warren of code, a new trail appears,
With Gitty footprints marking our model’s frontiers.
Source URLs and revisions now hop along,
Ensuring our builds remember where they belong.
With every commit, a story to tell—
This rabbit’s delighted: provenance as well!
🐇✨

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (1.64.8)

Error: you are using a configuration file for golangci-lint v2 with golangci-lint v1: please use golangci-lint v2
Failed executing command with error: you are using a configuration file for golangci-lint v2 with golangci-lint v1: please use golangci-lint v2

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 33de02b and 0429c15.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (11)
  • .gitmodules (1 hunks)
  • cmd/build.go (1 hunks)
  • go.mod (3 hunks)
  • pkg/backend/build.go (5 hunks)
  • pkg/backend/build/builder.go (1 hunks)
  • pkg/backend/build/config/model.go (1 hunks)
  • pkg/config/build.go (1 hunks)
  • pkg/source/git.go (1 hunks)
  • pkg/source/git_test.go (1 hunks)
  • pkg/source/parser.go (1 hunks)
  • pkg/source/testdata/git-repo (1 hunks)
✅ Files skipped from review due to trivial changes (3)
  • cmd/build.go
  • pkg/source/git_test.go
  • pkg/source/parser.go
🚧 Files skipped from review as they are similar to previous changes (8)
  • .gitmodules
  • pkg/source/testdata/git-repo
  • pkg/backend/build/config/model.go
  • pkg/backend/build/builder.go
  • pkg/config/build.go
  • pkg/source/git.go
  • pkg/backend/build.go
  • go.mod

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (11)
pkg/config/build.go (2)

34-35: Initialize the new source fields in NewBuild() function.

The new fields SourceURL and SourceRevision are added to the Build struct but are not initialized in the NewBuild() function. While they will default to empty strings, it would be more consistent to explicitly initialize them.

func NewBuild() *Build {
	return &Build{
		Concurrency:  defaultBuildConcurrency,
		Target:       "",
		Modelfile:    "Modelfile",
		OutputRemote: false,
		PlainHTTP:    false,
		Insecure:     false,
		Nydusify:     false,
+		SourceURL:      "",
+		SourceRevision: "",
	}
}

50-70: Consider validating source URL format if non-empty.

If SourceURL is provided, it might be worth validating that it has a valid URL format. This would prevent invalid URLs from being propagated to the model descriptor.

func (b *Build) Validate() error {
	if b.Concurrency <= 0 {
		return fmt.Errorf("concurrency must be greater than 0")
	}

	if len(b.Target) == 0 {
		return fmt.Errorf("target model artifact name is required")
	}

	if len(b.Modelfile) == 0 {
		return fmt.Errorf("model file path is required")
	}

	if b.Nydusify {
		if !b.OutputRemote {
			return fmt.Errorf("nydusify only works with output remote")
		}
	}
+
+	// Validate SourceURL format if provided
+	if len(b.SourceURL) > 0 {
+		if _, err := url.Parse(b.SourceURL); err != nil {
+			return fmt.Errorf("invalid source URL format: %w", err)
+		}
+	}

	return nil
}

Note: You would need to add "net/url" to your imports.
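
One caveat on the suggestion above: url.Parse is lenient and accepts relative references, so a bare word parses without error. The sketch below is one possible stricter variant, assuming the project wants absolute URLs with a scheme and host. The validateSourceURL helper is a hypothetical name, not part of this PR. Note that this check also rejects scp-style Git remotes such as git@github.com:org/repo.git, which may or may not be desirable.

```go
package main

import (
	"fmt"
	"net/url"
)

// validateSourceURL (hypothetical helper) accepts only absolute URLs that
// carry both a scheme and a host. This is stricter than url.Parse alone.
func validateSourceURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("invalid source URL format: %w", err)
	}
	if u.Scheme == "" || u.Host == "" {
		return fmt.Errorf("invalid source URL %q: scheme and host are required", raw)
	}
	return nil
}

func main() {
	for _, raw := range []string{
		"https://github.com/org/repo.git",
		"ssh://git@github.com/org/repo.git",
		"not a url",
		"git@github.com:org/repo.git",
	} {
		fmt.Printf("%q -> %v\n", raw, validateSourceURL(raw))
	}
}
```

If scp-style remotes need to be accepted, they would require a separate code path, since they are not URLs in the net/url sense.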

pkg/source/git_test.go (2)

1-15: Update copyright year

The copyright year is set to 2025, which appears to be incorrect. This should be updated to match the actual year when the code was created (likely 2024).

-/*
- *     Copyright 2025 The CNAI Authors
- *
+/*
+ *     Copyright 2024 The CNAI Authors
+ *

25-32: Add tests for edge cases

The test validates the happy path for Git repository parsing but lacks coverage for edge cases such as:

  • Repositories without an "origin" remote
  • Repositories with uncommitted changes (dirty state)
  • Non-Git directories or invalid repositories

Consider adding more test cases to improve coverage of these scenarios.

pkg/source/git.go (2)

1-15: Update copyright year

The copyright year is set to 2025, which appears to be incorrect. This should be updated to match the actual year when the code was created (likely 2024).

-/*
- *     Copyright 2025 The CNAI Authors
- *
+/*
+ *     Copyright 2024 The CNAI Authors
+ *

25-63: Handle repositories without "origin" remote

The implementation assumes the repository has an "origin" remote. Consider adding fallback logic for repositories that use a different name for their primary remote.

A more robust approach would be to try "origin" first, but then fall back to using any available remote if "origin" doesn't exist:

 func (g *git) Parse(workspace string) (*Info, error) {
 	repo, err := gogit.PlainOpen(workspace)
 	if err != nil {
 		return nil, fmt.Errorf("failed to open repo: %w", err)
 	}
 
-	// By default, use the origin as the remote.
+	// Try to get the "origin" remote first
 	remote, err := repo.Remote("origin")
 	if err != nil {
-		return nil, fmt.Errorf("failed to get remote: %w", err)
+		// Fall back to the first available remote
+		remotes, err := repo.Remotes()
+		if err != nil {
+			return nil, fmt.Errorf("failed to get remotes: %w", err) 
+		}
+		if len(remotes) == 0 {
+			return nil, fmt.Errorf("no remotes found in repository")
+		}
+		remote = remotes[0]
 	}
-	url := remote.Config().URLs[0]
+	remoteURLs := remote.Config().URLs
+	if len(remoteURLs) == 0 {
+		return nil, fmt.Errorf("no URLs found for remote")
+	}
+	url := remoteURLs[0]
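
The selection rule in the diff above can be sketched without go-git, as a pure function over a list of remotes. The remote type and pickRemoteURL helper are hypothetical stand-ins for this example. One deliberate difference from the suggested diff: rather than taking whichever remote happens to come first, the fallback here picks the lexicographically smallest name, so the result stays the same even if the underlying list has no stable order.

```go
package main

import (
	"errors"
	"fmt"
)

// remote is a simplified stand-in for a configured Git remote.
type remote struct {
	Name string
	URLs []string
}

// pickRemoteURL (hypothetical helper) prefers the "origin" remote and
// otherwise falls back to the remote with the smallest name. It returns
// the first URL of the chosen remote.
func pickRemoteURL(remotes []remote) (string, error) {
	if len(remotes) == 0 {
		return "", errors.New("no remotes found in repository")
	}
	chosen := remotes[0]
	for _, r := range remotes {
		if r.Name == "origin" {
			chosen = r
			break
		}
		if r.Name < chosen.Name {
			chosen = r
		}
	}
	if len(chosen.URLs) == 0 {
		return "", fmt.Errorf("no URLs found for remote %q", chosen.Name)
	}
	return chosen.URLs[0], nil
}

func main() {
	url, err := pickRemoteURL([]remote{
		{Name: "upstream", URLs: []string{"https://example.com/upstream.git"}},
		{Name: "origin", URLs: []string{"https://example.com/origin.git"}},
	})
	fmt.Println(url, err)
}
```

Keeping the choice in a small pure function like this also makes the edge cases (no remotes, a remote without URLs) easy to cover in unit tests without a real repository.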

pkg/backend/build.go (1)

195-225: Clarify precedence between user-specified and auto-detected source information

The logic for when to use user-specified vs. auto-detected source information is not entirely clear. Currently, if the user specifies a source URL but not a revision, the auto-detection is skipped entirely.

Consider refining the logic to:

  1. Always attempt to detect Git info if in a Git repository
  2. Use user-specified values as overrides for any detected values
  3. Provide clear fallback behavior

 func getSourceInfo(workspace string, buildConfig *config.Build) (*source.Info, error) {
+	// Initialize with user-provided values
 	info := &source.Info{
 		URL:    buildConfig.SourceURL,
 		Commit: buildConfig.SourceRevision,
 	}
 
-	// Try to parse the source information if user not specified.
-	if info.URL == "" {
-		var parser source.Parser
-		gitPath := filepath.Join(workspace, ".git")
-		if _, err := os.Stat(gitPath); err == nil {
-			parser, err = source.NewParser(source.ParserTypeGit)
-			if err != nil {
-				return nil, err
-			}
-		}
-
-		// Parse the source information if available.
-		if parser != nil {
-			info, err := parser.Parse(workspace)
-			if err != nil {
-				return nil, err
-			}
-
-			return info, nil
+	// Try to detect Git repository information
+	gitPath := filepath.Join(workspace, ".git")
+	if _, err := os.Stat(gitPath); err == nil {
+		parser, err := source.NewParser(source.ParserTypeGit)
+		if err != nil {
+			return nil, fmt.Errorf("failed to create Git parser: %w", err)
+		}
+		
+		parsedInfo, err := parser.Parse(workspace)
+		if err != nil {
+			// Log the error but continue with user-provided values
+			fmt.Printf("Warning: Failed to parse Git repository: %v\n", err)
+		} else {
+			// Use parsed values if user didn't specify
+			if info.URL == "" {
+				info.URL = parsedInfo.URL
+			}
+			if info.Commit == "" {
+				info.Commit = parsedInfo.Commit
+			}
+			info.Dirty = parsedInfo.Dirty
 		}
 	}
 
 	return info, nil
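
The override rule proposed above can be isolated into a small merge function. The sketch below is an illustration under assumptions: the Info type copies the three fields used in the diff, and mergeSourceInfo is a hypothetical helper. It differs from the suggestion in one respect that is a design choice of this example, not of the PR: the dirty flag is carried over only when the commit itself came from detection, so a user-supplied revision is never marked dirty because of the local workspace state.

```go
package main

import "fmt"

// Info copies the fields used in the diff above; it is a stand-in for
// the Info struct in pkg/source.
type Info struct {
	URL    string
	Commit string
	Dirty  bool
}

// mergeSourceInfo (hypothetical helper) treats user-provided values as
// overrides and fills any gaps from detected values. detected may be nil
// when the workspace is not a Git repository or parsing failed.
func mergeSourceInfo(user Info, detected *Info) Info {
	merged := user
	if detected == nil {
		return merged
	}
	if merged.URL == "" {
		merged.URL = detected.URL
	}
	if merged.Commit == "" {
		merged.Commit = detected.Commit
		// Only a detected commit reflects the local workspace state.
		merged.Dirty = detected.Dirty
	}
	return merged
}

func main() {
	detected := &Info{URL: "https://example.com/org/repo.git", Commit: "abc123", Dirty: true}
	fmt.Printf("%+v\n", mergeSourceInfo(Info{URL: "https://mirror.example.com/repo.git"}, detected))
}
```

With this shape, specifying only --source-url still lets the revision be auto-detected, which addresses the gap the comment describes.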

pkg/source/parser.go (4)

2-2: Update copyright year to current year.

The copyright year is set to 2025, which is in the future. Please update to the current year or appropriate copyright year range.

- *     Copyright 2025 The CNAI Authors
+ *     Copyright 2024 The CNAI Authors

44-51: Add documentation comment for the NewParser function.

The NewParser function is missing a documentation comment. For consistency and better code documentation, please add a comment describing its purpose, parameters, and return values.

+// NewParser creates a new source parser of the specified type.
+// Currently only "git" type is supported.
+// Returns an error if the specified parser type is not supported.
 func NewParser(typ string) (Parser, error) {
     switch typ {
     case ParserTypeGit:
         return &git{}, nil
     default:
         return nil, fmt.Errorf("unsupported parser type: %s", typ)
     }
 }

21-24: Consider using iota for parser type constants.

If you plan to add more parser types in the future, consider using iota for the constants and string mapping functions for better type safety and extensibility.

 const (
 	// ParserTypeGit is the type of parser for git repositories.
-	ParserTypeGit = "git"
+	ParserTypeGit ParserType = iota
 )
+
+// ParserType represents the type of source parser
+type ParserType int
+
+// String returns the string representation of the parser type
+func (p ParserType) String() string {
+	switch p {
+	case ParserTypeGit:
+		return "git"
+	default:
+		return "unknown"
+	}
+}

Then update the NewParser function to use this type:

-func NewParser(typ string) (Parser, error) {
+func NewParser(typ ParserType) (Parser, error) {
 	switch typ {
 	case ParserTypeGit:
 		return &git{}, nil
 	default:
-		return nil, fmt.Errorf("unsupported parser type: %s", typ)
+		return nil, fmt.Errorf("unsupported parser type: %s", typ.String())
 	}
 }

33-41: Consider adding String() method to Info struct.

For better debugging and logging, consider adding a String() method to the Info struct to provide a standardized string representation.

// String returns a string representation of the source info
func (i *Info) String() string {
	dirtyStatus := ""
	if i.Dirty {
		dirtyStatus = " (dirty)"
	}
	return fmt.Sprintf("URL: %s, Commit: %s%s", i.URL, i.Commit, dirtyStatus)
}

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4764d65 and f2c662b.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (10)
  • cmd/build.go (1 hunks)
  • go.mod (3 hunks)
  • pkg/backend/build.go (5 hunks)
  • pkg/backend/build/builder.go (1 hunks)
  • pkg/backend/build/config/model.go (1 hunks)
  • pkg/config/build.go (1 hunks)
  • pkg/source/git.go (1 hunks)
  • pkg/source/git_test.go (1 hunks)
  • pkg/source/parser.go (1 hunks)
  • pkg/source/testdata/git-repo (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
pkg/source/git.go (1)
pkg/source/parser.go (1)
  • Info (32-42)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Analyze (go)
🔇 Additional comments (8)
pkg/source/testdata/git-repo (1)

1-1: Approve the test fixture for Git parser

The testdata file correctly includes the submodule commit line needed for validating the Git parser implementation.

go.mod (2)

12-12: Looks good: Adding the go-git dependency enables source control information extraction.

This dependency will allow the application to parse Git repository details like remote URL and commit hash.


27-77: These indirect dependencies support the Git functionality correctly.

The added indirect dependencies (SSH configuration, Git utilities, etc.) are required by go-git to function properly.

Also applies to: 100-108

pkg/backend/build/builder.go (1)

269-270: Source metadata is correctly propagated to the model descriptor.

The implementation correctly maps the source information from the build configuration to the model descriptor, enabling traceability of the model's origin.

pkg/backend/build/config/model.go (1)

28-29: Source metadata fields are correctly added to the Model struct.

The addition of SourceURL and SourceRevision fields enables storing and tracking source control metadata at the model configuration level.

cmd/build.go (1)

62-63: Source tracking flags added correctly

The new flags for source URL and revision are well-integrated with the existing command structure. They provide users with the ability to explicitly specify source control metadata during the build process.

pkg/backend/build.go (1)

100-103: LGTM: Proper handling of dirty repository state

The code correctly handles the case where the repository has uncommitted changes by appending "-dirty" to the revision string. This follows common versioning practices for distinguishing clean vs. dirty builds.
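
The "-dirty" convention mentioned above can be sketched as a one-line helper. The revisionString name is hypothetical, and the guard against an empty commit is an assumption of this example rather than something confirmed by the PR.

```go
package main

import "fmt"

// revisionString (hypothetical helper) appends "-dirty" to the commit
// when the workspace has uncommitted changes. An empty commit is returned
// unchanged so that no bare "-dirty" revision is produced.
func revisionString(commit string, dirty bool) string {
	if dirty && commit != "" {
		return commit + "-dirty"
	}
	return commit
}

func main() {
	fmt.Println(revisionString("abc123", true))
	fmt.Println(revisionString("abc123", false))
}
```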

pkg/source/parser.go (1)

26-42: Well-designed interface and struct with good documentation.

The Parser interface and Info struct have a clean design with appropriate documentation for each field. The comments provide good context and examples which will help users understand how to use this API.

@chlins chlins force-pushed the feat/source-git branch 2 times, most recently from 33de02b to a5cdf09 on April 18, 2025 04:22
Contributor

@gaius-qi gaius-qi left a comment

LGTM

@gaius-qi gaius-qi merged commit 3a2aab2 into main Apr 18, 2025
6 checks passed
@gaius-qi gaius-qi deleted the feat/source-git branch April 18, 2025 09:43