10 changes: 5 additions & 5 deletions pipelines/azure-benchmarks.yml
@@ -19,10 +19,10 @@ stages:
- job: RunBenchmark
displayName: 'Run Copilot Benchmarks'
steps:
- task: UsePythonVersion@0
displayName: 'Use Python 3.12'
inputs:
versionSpec: '3.12'
- template: /pipelines/templates/steps/install-uv.yml

- pwsh: uv python install 3.12 && uv python pin 3.12
displayName: 'Install Python 3.12 with uv'
Comment on lines +22 to +25
Copilot AI Mar 3, 2026

uv python install 3.12 installs a Python runtime, but nothing here ensures subsequent uv pip/uv run commands will actually use 3.12 (and UsePythonVersion@0 was removed). Consider explicitly selecting the interpreter (e.g., passing --python 3.12 to the uv commands or setting the appropriate env var) so the benchmark runs against the intended Python version.
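
One possible way to make the interpreter choice explicit is sketched below. It assumes that uv treats the `UV_PYTHON` environment variable as the equivalent of its `--python` option; the pipeline wiring is illustrative and not part of this PR.

```powershell
# Sketch only: make the interpreter choice explicit for later uv commands.
# Assumption: uv reads UV_PYTHON as the equivalent of its --python option.
$env:UV_PYTHON = "3.12"

if ($env:TF_BUILD -eq "True") {
    # In Azure Pipelines, publish the variable so later steps inherit it.
    Write-Host "##vso[task.setvariable variable=UV_PYTHON]$($env:UV_PYTHON)"
}

if (Get-Command uv -ErrorAction SilentlyContinue) {
    # Report which interpreter uv resolves for the requested version.
    uv python find $env:UV_PYTHON
}
```

The same request can also be made per command, for example `uv venv --python 3.12` or `uv run --python 3.12 ...`, which keeps the selection visible at each call site.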

Copilot uses AI. Check for mistakes.

- task: PipAuthenticate@1
displayName: 'Authenticate pip with MicrosoftSweBench feed'
@@ -36,4 +36,4 @@ stages:
addSpnToEnvironment: true
scriptType: 'pscore'
scriptLocation: 'scriptPath'
scriptPath: $(Build.SourcesDirectory)/pipelines/scripts/Invoke-CopilotBenchmarks.ps1
scriptPath: $(Build.SourcesDirectory)/pipelines/scripts/Invoke-CopilotBenchmarks.ps1
53 changes: 53 additions & 0 deletions pipelines/scripts/Activate-Venv.ps1
@@ -0,0 +1,53 @@
<#!
.SYNOPSIS
Activates a virtual environment for a CI machine. Any further usages of "python" will utilize this virtual environment.

.DESCRIPTION
When activating a virtual environment, only a few things are actually functionally changed on the machine.

# 1. PATH = path to the bin directory of the virtual env. "Scripts" on windows machines
# 2. VIRTUAL_ENV = path to root of the virtual env
# 3. VIRTUAL_ENV_PROMPT = the prompt that is displayed next to the CLI cursor when the virtual env is active
# within a CI machine, we only need the PATH and VIRTUAL_ENV variables to be set.
# 4. (optional and inconsistently) _OLD_VIRTUAL_PATH = the PATH before the virtual env was activated. This is not set in this script.

.PARAMETER VenvName
The name of the virtual environment to activate.

.PARAMETER RepoRoot
The root of the repository.
#>
param (
[string]$VenvName = "venv"
)

Set-StrictMode -Version 4
$ErrorActionPreference = "Stop"

$repoRoot = Join-Path $PSScriptRoot ".." ".." -Resolve
Comment on lines +21 to +27
Copilot AI Mar 3, 2026

The help text documents a RepoRoot parameter, but the script only accepts VenvName. Since this script is dot-sourced by the benchmark runner, callers passing -RepoRoot will error. Add a RepoRoot parameter (and use it for $venvPath) or remove the RepoRoot documentation and update callers accordingly.

Suggested change
[string]$VenvName = "venv"
)
Set-StrictMode -Version 4
$ErrorActionPreference = "Stop"
$repoRoot = Join-Path $PSScriptRoot ".." ".." -Resolve
[string]$VenvName = "venv",
[string]$RepoRoot
)
Set-StrictMode -Version 4
$ErrorActionPreference = "Stop"
if ($RepoRoot) {
$repoRoot = $RepoRoot
} else {
$repoRoot = Join-Path $PSScriptRoot ".." ".." -Resolve
}

$venvPath = Join-Path $repoRoot $VenvName
$pipelineRun = $env:TF_BUILD -eq "True"

if (-not (Test-Path $venvPath)) {
Write-Error "Virtual environment '$venvPath' does not exist at $venvPath"
exit 1
Copilot AI Mar 3, 2026

Because this script is intended to be dot-sourced, calling exit 1 will terminate the entire PowerShell session immediately, which makes it hard for the caller to handle/report errors consistently. Prefer throw here so the calling script can fail naturally (and optionally catch/log) while still stopping execution.

Suggested change
exit 1
throw
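
A minimal sketch of why `throw` is easier for a caller to handle. The script block below is a hypothetical stand-in for a dot-sourced helper such as Activate-Venv.ps1, and the temporary path is invented for illustration.

```powershell
# Hypothetical stand-in for a dot-sourced helper script.
$activate = {
    param([string]$VenvPath)
    if (-not (Test-Path $VenvPath)) {
        # throw raises a terminating error that the caller can catch,
        # unlike exit, which cannot be intercepted with try/catch.
        throw "Virtual environment does not exist at $VenvPath"
    }
}

# A path that is assumed not to exist, used only to trigger the failure.
$missingVenv = Join-Path ([System.IO.Path]::GetTempPath()) "no-such-venv-$([guid]::NewGuid())"

$caught = $null
try {
    # Dot-source the block so it runs in the caller's scope, as the runner does.
    . $activate -VenvPath $missingVenv
}
catch {
    $caught = $_.Exception.Message
    Write-Host "Caller handled the failure: $caught"
}
```

With this shape the benchmark runner decides how to log and fail, rather than the helper ending execution on its own.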

}

Write-Host "Activating virtual environment '$VenvName' via VIRTUAL_ENV variable at $venvPath.'"
Copilot AI Mar 3, 2026

Log message has an extra trailing apostrophe after the venv path (... at $venvPath.'). This looks like a typo and can be confusing when reading pipeline logs.

Suggested change
Write-Host "Activating virtual environment '$VenvName' via VIRTUAL_ENV variable at $venvPath.'"
Write-Host "Activating virtual environment '$VenvName' via VIRTUAL_ENV variable at $venvPath."

$env:VIRTUAL_ENV = $venvPath
if ($pipelineRun) {
Write-Host "##vso[task.setvariable variable=VIRTUAL_ENV]$($env:VIRTUAL_ENV)"
}

$venvBinPath = $IsWindows ? (Join-Path $venvPath "Scripts") : (Join-Path $venvPath "bin")

Write-Host "Prepending path with $venvBinPath"
$env:PATH = $IsWindows ? "$venvBinPath;$($env:PATH)" : "$venvBinPath`:$($env:PATH)"
if ($pipelineRun) {
Write-Host "##vso[task.prependpath]$venvBinPath"
}

if ($pipelineRun) {
Write-Host "Unset of PYTHONHOME"
Write-Host "##vso[task.setvariable variable=PYTHONHOME]"
}
40 changes: 40 additions & 0 deletions pipelines/scripts/Create-Venv.ps1
@@ -0,0 +1,40 @@
<#!
.SYNOPSIS
Creates a virtual environment for a CI machine.

.DESCRIPTION
If the virtual environment directory already exists, it will skip the creation. The location of the virtual environment will be stored in a variable
named <VenvName>_LOCATION. The location will be RepoRoot + VenvName.

.PARAMETER VenvName
The name of the virtual environment which will be created.

.PARAMETER RepoRoot
The root of the repository.
#>
param(
[string] $VenvName = "venv"
)

$repoRoot = Join-Path $PSScriptRoot ".." ".." -Resolve
Comment on lines +16 to +19
Copilot AI Mar 3, 2026

The help text documents a RepoRoot parameter, but the script only accepts VenvName. This mismatch is also what causes callers to pass -RepoRoot and fail. Either add a RepoRoot parameter (and honor it when building $venvPath) or remove the RepoRoot documentation.

Suggested change
[string] $VenvName = "venv"
)
$repoRoot = Join-Path $PSScriptRoot ".." ".." -Resolve
[string] $VenvName = "venv",
[string] $RepoRoot
)
if ($PSBoundParameters.ContainsKey('RepoRoot') -and $RepoRoot) {
$repoRoot = $RepoRoot
}
else {
$repoRoot = Join-Path $PSScriptRoot ".." ".." -Resolve
}

$venvPath = Join-Path $repoRoot $VenvName

if (!(Test-Path $venvPath)) {
if (Get-Command uv -ErrorAction SilentlyContinue) {
Write-Host "Creating virtual environment '$VenvName' using uv."
uv venv $venvPath --verbose
}
else {
$invokingPython = (Get-Command "python").Source

Write-Host "Creating virtual environment '$VenvName' using virtualenv and python located at '$invokingPython'."
python -m pip install virtualenv==20.25.1
python -m virtualenv $venvPath
}
Comment on lines +22 to +33
Copilot AI Mar 3, 2026

Native command failures in this script aren’t checked. uv venv, python -m pip install ..., and python -m virtualenv ... can fail without throwing, and the script will continue (PowerShell doesn’t automatically stop on non-zero exit codes from native commands). Consider checking $LASTEXITCODE/$? after each native command and throwing on failure to avoid silently continuing with a missing/broken venv.
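
The check described above could be wrapped once and reused. This is a sketch, not code from this PR: `Invoke-NativeChecked` is a hypothetical helper name, and the demonstration uses the current PowerShell executable as a stand-in for `uv` or `python`.

```powershell
# Hypothetical helper: run a native command and turn a non-zero
# exit code into a terminating error.
function Invoke-NativeChecked {
    param(
        [Parameter(Mandatory)][string]$Command,
        [string[]]$Arguments = @()
    )
    & $Command @Arguments
    if ($LASTEXITCODE -ne 0) {
        throw "$Command $($Arguments -join ' ') failed with exit code $LASTEXITCODE"
    }
}

# Demonstration: a child PowerShell process that exits with code 3.
$shellPath = (Get-Process -Id $PID).Path
$message = $null
try {
    Invoke-NativeChecked -Command $shellPath -Arguments @('-NoProfile', '-Command', 'exit 3')
}
catch {
    $message = $_.Exception.Message
    Write-Host "Native failure surfaced as: $message"
}
```

In Create-Venv.ps1 the calls would then look like `Invoke-NativeChecked -Command uv -Arguments @('venv', $venvPath, '--verbose')`, mirroring the `$LASTEXITCODE` checks that Invoke-CopilotBenchmarks.ps1 already performs inline.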

$pythonVersion = python --version
Copilot AI Mar 3, 2026

$pythonVersion = python --version will report the invoking shell’s python, not necessarily the interpreter used to create the venv (especially in the uv venv path). If the intent is to log the venv’s Python version, query the venv interpreter directly (e.g., $venvPath/bin/python or $venvPath/Scripts/python.exe).

Suggested change
$pythonVersion = python --version
if ($IsWindows) {
$venvPython = Join-Path $venvPath "Scripts/python.exe"
}
else {
$venvPython = Join-Path $venvPath "bin/python"
}
$pythonVersion = & $venvPython --version

Write-Host "Virtual environment '$VenvName' created at directory path '$venvPath' utilizing python version $pythonVersion."
Write-Host "##vso[task.setvariable variable=$($VenvName)_LOCATION]$venvPath"
}
else {
Write-Host "Virtual environment '$VenvName' already exists. Skipping creation."
}
167 changes: 87 additions & 80 deletions pipelines/scripts/Invoke-CopilotBenchmarks.ps1
@@ -24,97 +24,104 @@
.LINK
https://github.com/devdiv-microsoft/MicrosoftSweBench/wiki
#>
param(
[string]$Benchmark = "azure",
[string]$Model = "claude-sonnet-4.5-autodev-test",
[switch]$NoWait
)

param(
[string]$Benchmark = "azure",
[string]$Model = "claude-sonnet-4.5-autodev-test",
[switch]$NoWait
)
Set-StrictMode -Version Latest
$ErrorActionPreference = "Stop"

Set-StrictMode -Version Latest
$ErrorActionPreference = "Stop"
if (!$Benchmark) {
throw "Benchmark parameter is required."
}

if (!$Benchmark) {
throw "Benchmark parameter is required."
}

if (!$Model) {
throw "Model parameter is required."
}

$vaultName = "kv-msbench-eval-azuremcp"
$secretName = "azure-eval-gh-pat"
if (!$Model) {
throw "Model parameter is required."
}

Write-Host "Benchmark: $Benchmark"
Write-Host "Model: $Model"
Write-Host "NoWait: $NoWait"
$repoRoot = Join-Path $PSScriptRoot ".." ".." -Resolve
$vaultName = "kv-msbench-eval-azuremcp"
$secretName = "azure-eval-gh-pat"

$pipelineRun = $env:TF_BUILD -eq "True"
Write-Host "Benchmark: $Benchmark"
Write-Host "Model: $Model"
Write-Host "NoWait: $NoWait"

# --- Retrieve GitHub PAT from KeyVault ---
try {
Write-Host "Retrieving GitHub PAT from KeyVault $vaultName secret $secretName"
$pat = az keyvault secret show --vault-name $vaultName --name $secretName --query value -o tsv
$pipelineRun = $env:TF_BUILD -eq "True"

if (!$pat) {
throw "Secret $secretName not found in KeyVault $vaultName."
}
. "$PSScriptRoot/Create-Venv.ps1" -VenvName "venv" -RepoRoot $repoRoot
. "$PSScriptRoot/Activate-Venv.ps1" -VenvName "venv" -RepoRoot $repoRoot
Comment on lines +54 to +55
Copilot AI Mar 3, 2026

The dot-sourced venv scripts are invoked with -RepoRoot $repoRoot, but neither Create-Venv.ps1 nor Activate-Venv.ps1 defines a RepoRoot parameter. This will fail with “A parameter cannot be found…” before the benchmark runs. Either add a [string]$RepoRoot param to both scripts (and use it instead of recomputing $repoRoot), or remove -RepoRoot from these calls.

Suggested change
. "$PSScriptRoot/Create-Venv.ps1" -VenvName "venv" -RepoRoot $repoRoot
. "$PSScriptRoot/Activate-Venv.ps1" -VenvName "venv" -RepoRoot $repoRoot
. "$PSScriptRoot/Create-Venv.ps1" -VenvName "venv"
. "$PSScriptRoot/Activate-Venv.ps1" -VenvName "venv"


$env:GITHUB_MCP_SERVER_TOKEN = $pat

# Log the PAT as a secret variable to avoid exposing it in logs
if ($pipelineRun) {
Write-Host "##vso[task.setsecret]$pat"
}
}
catch {
throw "Failed to retrieve GitHub PAT from KeyVault: $_"
}
# --- Retrieve GitHub PAT from KeyVault ---
try {
Write-Host "Retrieving GitHub PAT from KeyVault $vaultName secret $secretName"
$pat = az keyvault secret show --vault-name $vaultName --name $secretName --query value -o tsv

# --- Feed auth is handled by the PipAuthenticate@1 pipeline task ---
# PipAuthenticate sets PIP_EXTRA_INDEX_URL for the azure-sdk/internal/MicrosoftSweBench feed.
if ($env:PIP_EXTRA_INDEX_URL) {
Write-Host "PIP_EXTRA_INDEX_URL is set (feed auth configured by PipAuthenticate task)"
} else {
Write-Warning "PIP_EXTRA_INDEX_URL is not set. Feed authentication may fail. Ensure PipAuthenticate@1 runs before this script."
if (!$pat) {
throw "Secret $secretName not found in KeyVault $vaultName."
}

$pythonCommand = Get-Command python
Write-Host "Using python from: $($pythonCommand.Path). Version: $(python --version 2>&1)"

Write-Host "Install/upgrade pip"
python -m pip install --upgrade pip
if ($LASTEXITCODE -ne 0) {
throw "pip install/upgrade failed with exit code $LASTEXITCODE"
$env:GITHUB_MCP_SERVER_TOKEN = $pat

# Log the PAT as a secret variable to avoid exposing it in logs
if ($pipelineRun) {
Write-Host "##vso[task.setsecret]$pat"
}

Write-Host "Installing/upgrading MSBench CLI"
python -m pip install msbench-cli --no-input
if ($LASTEXITCODE -ne 0) {
throw "pip install msbench-cli failed with exit code $LASTEXITCODE"
}
catch {
throw "Failed to retrieve GitHub PAT from KeyVault: $_"
}

# --- Feed authentication ---
# In CI, PipAuthenticate@1 sets PIP_EXTRA_INDEX_URL automatically.
# For local runs, fall back to az CLI token acquisition.
if ($env:PIP_EXTRA_INDEX_URL) {
Write-Host "PIP_EXTRA_INDEX_URL is set (feed auth configured by PipAuthenticate task). Forwarding to UV_EXTRA_INDEX_URL for MSBench CLI."
$env:UV_EXTRA_INDEX_URL = $env:PIP_EXTRA_INDEX_URL
} else {
Write-Host "PIP_EXTRA_INDEX_URL not set — acquiring Azure DevOps AAD token for local feed auth"
$feedUrl = "https://pkgs.dev.azure.com/azure-sdk/internal/_packaging/MicrosoftSweBench/pypi/simple/"
$adoResourceId = "499b84ac-1321-427f-aa17-267ca6975798"
$adoAccessToken = az account get-access-token --resource $adoResourceId --query accessToken -o tsv

if (!$adoAccessToken) {
throw "Failed to acquire Azure DevOps AAD token. Run 'az login' first."
}

Write-Host "MSBench CLI version"
& 'msbench-cli' version
if ($LASTEXITCODE -ne 0) {
throw "msbench-cli version failed with exit code $LASTEXITCODE"
}

$runArgs = @(
"run",
"--agent", "github-copilot-cli",
"--benchmark", $Benchmark,
"--model", $Model,
"--env", "GITHUB_MCP_SERVER_TOKEN"
)

if ($NoWait) {
$runArgs += "--no-wait"
}

Write-Host "Running: msbench-cli $($runArgs -join ' ')"
& 'msbench-cli' @runArgs

if ($LASTEXITCODE -ne 0) {
throw "msbench-cli run failed with exit code $LASTEXITCODE"
}
$encodedToken = [System.Uri]::EscapeDataString($adoAccessToken)
$env:UV_EXTRA_INDEX_URL = $feedUrl -replace "https://", "https://vsts:$encodedToken@"
Write-Host "UV_EXTRA_INDEX_URL set via az CLI token"
}


Write-Host "`n> uv pip install msbench-cli"
& uv pip install msbench-cli
if ($LASTEXITCODE -ne 0) {
throw "uv pip install msbench-cli failed with exit code $LASTEXITCODE"
}

Write-Host "`n> uv run 'msbench-cli' version"
uv run 'msbench-cli' version
if ($LASTEXITCODE -ne 0) {
throw "uv run msbench-cli failed with exit code $LASTEXITCODE"
}

$runArgs = @(
"run",
"--agent", "github-copilot-cli",
"--benchmark", $Benchmark,
"--model", $Model,
"--env", "GITHUB_MCP_SERVER_TOKEN"
)

if ($NoWait) {
$runArgs += "--no-wait"
}

Write-Host "`n> msbench-cli $($runArgs -join ' ')"
uv run 'msbench-cli' @runArgs
if ($LASTEXITCODE -ne 0) {
throw "msbench-cli run failed with exit code $LASTEXITCODE"
}
25 changes: 25 additions & 0 deletions pipelines/templates/steps/install-uv.yml
@@ -0,0 +1,25 @@
steps:
- task: Bash@3
displayName: 'Install uv (Linux/macOS)'
inputs:
targetType: inline
script: |
curl -LsSf https://astral.sh/uv/install.sh | sh
condition: or(eq(variables['Agent.OS'], 'Linux'), eq(variables['Agent.OS'], 'Darwin'))

- task: Bash@3
inputs:
targetType: inline
script: |
echo "##vso[task.prependpath]$HOME/.local/bin"
displayName: 'Prepend path for MacOS'
condition: eq(variables['Agent.OS'], 'Darwin')

- task: PowerShell@2
displayName: 'Install uv (Windows)'
inputs:
targetType: inline
script: |
iex (irm https://astral.sh/uv/install.ps1)
Write-Host "##vso[task.prependpath]$env:USERPROFILE\.local\bin"
condition: eq(variables['Agent.OS'], 'Windows_NT')