Azure · Dec 12, 2023
diff --git a/.pipelines/e2e.yaml b/.pipelines/e2e.yaml
@@ -40,8 +40,11 @@ jobs:
       cd e2e
       go test -timeout 45m -v -run Test_All ./
     displayName: Run AgentBaker E2E
+    env:
+      VHD_BUILD_ID: $(VHD_BUILD_ID)
+      ADO_PAT: $(ADO_PAT)
   - publish: $(System.DefaultWorkingDirectory)/e2e/scenario-logs
     artifact: scenario-logs
     condition: always()
 
- 
+ 
diff --git a/e2e/README.md b/e2e/README.md
@@ -6,34 +6,33 @@ E2E testing for Linux is currently implemented using a Golang framework built fr
 
 The goal of E2E testing with AgentBaker is to ensure that the node bootstrapping artifacts generated and returned by the primary AgentBaker API not only contain *expected* content, but also contain *correct* content that can be used as-is to bootstrap real Azure VMs so they can join real AKS clusters.
 
-From a high-level, each E2E scenario makes a call out to the primary node-bootstrapping API [GetLatestNodeBootstrapping](https://github.com/Azure/AgentBaker/blob/2e730b5a498c5be9b082d912fd08ac9346582db9/pkg/agent/bakerapi.go#L14) with a set of parameters (represented by a NodeBootstrappingConfiugration) which define the given scenario to generate CSE and custom data. A new VMSS containing a single VM will then be created and associated with an AKS cluster that is already running in the Azure. The CSE and custom data generated by AgentBaker will then be applied to the new VM such that it can be properly bootstrapped and register itself with the apiserver of the running cluster. Liveness and health checks and then run to make sure the new VM's kubelet is posting NodeReady to the cluster's apiserver, and that workload pods can successfully be run on it. Lastly, a set of validation commands are remotely executed on the VM after it has successfully been bootstrapped to ensure that its live state (file existsnce, sysctl settings, etc.) is as expected.
+At a high level, each E2E scenario calls the primary node-bootstrapping API [GetLatestNodeBootstrapping](https://github.com/Azure/AgentBaker/blob/2e730b5a498c5be9b082d912fd08ac9346582db9/pkg/agent/bakerapi.go#L14) with a set of parameters (represented by a `NodeBootstrappingConfiguration`) that define the given scenario, in order to generate CSE and custom data. A new VMSS containing a single VM is then created and associated with an AKS cluster that is already running in Azure. The CSE and custom data generated by AgentBaker are then applied to the new VM so it can bootstrap and register itself with the apiserver of the running cluster. Liveness and health checks are then run to make sure the new VM's kubelet is posting NodeReady to the cluster's apiserver, and that workload pods can successfully run on it. Lastly, a set of validation commands is remotely executed on the VM to ensure its live state (file existence, sysctl settings, etc.) is as expected.
 
 ## Running Locally
 
 **Note: if you have changed code or artifacts used to generate custom data or custom script extension payloads, you should first run `make generate` from the root of the AgentBaker repository.**
 
-To run the Go implementation of the E2E test suite locally, simply use `e2e-local.sh`. This script will setup the call to `go test` for you while also implementing default logic for a set of required environment variables used to interact with Azure. These required environment variables include:
+To run the Go implementation of the E2E test suite locally, simply use `e2e-local.sh`. This script sets up the `go test` command for you and applies default values for a set of required environment variables used to interact with Azure. These environment variables include:
 
-- `SUBSCRIPTION_ID` - default `8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8`
-- `RESOURCE_GROUP_NAME` - defualt: `agentbaker-e2e-tests`
+- `SUBSCRIPTION_ID` - default: `8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8` (ACS Test Subscription)
 - `LOCATION` - default: `eastus`
-- `CLUSTER_NAME` - default `agentbaker-e2e-test-cluster`
 - `AZURE_TENANT_ID` - default: `72f988bf-86f1-41af-91ab-2d7cd011db47`
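
Defaulting of this kind is commonly written with bash's `${VAR:=default}` parameter expansion. The sketch below assumes that style and is not taken from `e2e-local.sh`; the `apply_e2e_defaults` function is hypothetical. In this sketch, a value that is already set always takes priority over the listed default.

```bash
#!/usr/bin/env bash
# Hypothetical helper: assign the documented defaults, but only to variables
# that are unset or empty, so caller-provided values always win.
apply_e2e_defaults() {
  : "${SUBSCRIPTION_ID:=8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8}"  # ACS Test Subscription
  : "${LOCATION:=eastus}"
  : "${AZURE_TENANT_ID:=72f988bf-86f1-41af-91ab-2d7cd011db47}"
  export SUBSCRIPTION_ID LOCATION AZURE_TENANT_ID
}

# Example: override only the region; the other two variables fall back to defaults.
LOCATION=westus2
apply_e2e_defaults
echo "subscription=${SUBSCRIPTION_ID} location=${LOCATION} tenant=${AZURE_TENANT_ID}"
```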
 
+<br>
+
 `SCENARIOS_TO_RUN` may also optionally be set to specify a subset of the E2E scenarios to run during the testing session as a comma-separated list, for example:
 
 ```bash
-SCENARIOS_TO_RUN=base,gpu ./e2e-local.sh
+SCENARIOS_TO_RUN=base,gpu,ubuntu2204,ubuntu2204-arm64 ./e2e-local.sh
 ```
 
 Furthermore, `SCENARIOS_TO_EXCLUDE` may also optionally be set to specify the set of scenarios which will be excluded from the testing session as a comma-separated list. If both `SCENARIOS_TO_RUN` and `SCENARIOS_TO_EXCLUDE` are specified, `SCENARIOS_TO_RUN` will take precedence.
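
The precedence rule above can be sketched in bash as follows. The `select_scenarios` function and its arguments are illustrative assumptions, not the suite's actual implementation; in particular, the sketch assumes that when `SCENARIOS_TO_RUN` is set, `SCENARIOS_TO_EXCLUDE` is ignored entirely.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the documented selection rules.
#   $1: space-separated list of all known scenario names
#   $2: SCENARIOS_TO_RUN value (comma-separated, may be empty)
#   $3: SCENARIOS_TO_EXCLUDE value (comma-separated, may be empty)
# Prints the selected scenario names, one per line.
select_scenarios() {
  # Wrap the lists in commas so ",name," matches whole entries only.
  local all="$1" run=",${2:-}," exclude=",${3:-}," name
  for name in $all; do  # intentional word splitting on spaces
    if [[ "$run" != ",," ]]; then
      # SCENARIOS_TO_RUN is set: it takes precedence, exclusions are ignored.
      if [[ "$run" == *",$name,"* ]]; then echo "$name"; fi
    elif [[ "$exclude" != *",$name,"* ]]; then
      echo "$name"
    fi
  done
}

# Prints base, ubuntu2204 and ubuntu2204-arm64 (gpu is excluded).
select_scenarios "base gpu ubuntu2204 ubuntu2204-arm64" "" "gpu"
# Prints base and gpu: the run list wins over the exclusion.
select_scenarios "base gpu ubuntu2204 ubuntu2204-arm64" "base,gpu" "gpu"
```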
 
-`KEEP_VMSS` can also be optionally specified to have the test suite retain the bootstrapped VMSS VMs for further debugging. When this option is specified, the private SSH key used to bootstrap the VMs will be included within each scenario's log bundle.
-NOTE: if this option is specified please make sure to manually delete your bootstrapped VMs later. Though, all bootstrapped VMs will eventually be deleted by the ACS test GC regardless.
+`KEEP_VMSS` can optionally be set to have the test suite retain the bootstrapped VM(s) for further debugging. When this option is set, the private SSH key used to connect to each VM is included in that scenario's log bundle.
 
-**Note that when using `e2e-local.sh`, a timeout value of 30 minutes is applied to the `go test` command.**
+**Note that when using `e2e-local.sh`, a timeout value of 45 minutes is applied to the `go test` command.**
 
-You may also run the test command yourself assuming you've properly setup the required environment variables like so:
+You may also run the test command yourself with custom arguments from within the `e2e/` directory (assuming you've properly set up the required environment variables), like so:
 
 ```bash
 go test -timeout 30m -v -run Test_All ./
@@ -47,10 +46,59 @@ The `e2e_test` package has a dependency on subpackage located in the [scenario](
 
 The primary testing function is located in [suite_test.go](suite_test.go), which is run by `go test ...`.
 
-## Updating the Test Images
-The [images.go](scenario/images.go) file contains the hard-coded references to a set of delete-locked SIG versions used by the e2e scenarios.
+## E2E VHDs
+When configuring E2E scenarios, a `VHDSelector` must be specified in order to tell the suite which particular VHD it should use to bootstrap the VM.
+
+`VHDSelector`s select from a "base" VHD catalog, which is initialized from [scenario/base_vhd_catalog.json](scenario/base_vhd_catalog.json), a file embedded into the suite. Each entry in the catalog is represented as a `VHD`, which contains a resource ID that is injected into the VMSS model when the given scenario is run. The JSON file contains the configuration for the current set of default catalog entries, which at any given time point to VHDs stored within our testing subscription and guarded by resource deletion locks.
+
+For example, [scenario_ubuntu2204.go](scenario/scenario_ubuntu2204.go) defines the Ubuntu 2204 scenario, which specifies the `Ubuntu2204Gen2Containerd` VHD selector. By default, this selector selects the Ubuntu2204/gen2 entry from the base catalog. If the suite is run against an arbitrary VHD build, the selector instead takes the corresponding Ubuntu2204/gen2 VHD from that build.
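
The default-versus-build selection can be illustrated with a small bash sketch. The real selectors are Go methods on the `*VHDCatalog` type in [scenario/vhd.go](scenario/vhd.go); here the two associative arrays, their keys, the placeholder resource IDs, and the `select_vhd` function are all hypothetical stand-ins. The sketch requires bash 4 or later.

```bash
#!/usr/bin/env bash
# Hypothetical base catalog: catalog key -> default (delete-locked) resource ID.
declare -A BASE_CATALOG=(
  [ubuntu2204/gen2]="default-ubuntu2204-gen2-resource-id"
  [azurelinuxv2/gen2]="default-azurelinuxv2-gen2-resource-id"
)
# Hypothetical entries taken from an arbitrary VHD build; a build may only
# produce some of the SKUs, so this map can be partial.
declare -A BUILD_CATALOG=(
  [ubuntu2204/gen2]="build-123456789-ubuntu2204-gen2-resource-id"
)

# Prints the resource ID for a catalog key, preferring the build's entry
# and falling back to the default entry from the base catalog.
select_vhd() {
  local key="$1"
  if [[ -n "${BUILD_CATALOG[$key]:-}" ]]; then
    echo "${BUILD_CATALOG[$key]}"
  elif [[ -n "${BASE_CATALOG[$key]:-}" ]]; then
    echo "${BASE_CATALOG[$key]}"
  else
    echo "no catalog entry for $key" >&2
    return 1
  fi
}

select_vhd ubuntu2204/gen2    # taken from the build
select_vhd azurelinuxv2/gen2  # falls back to the default entry
```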
+
+
+### Updating Default Catalog Entries
+To update the default VHD catalog entries to point to new VHDs, update the `resourceId` field of the respective VHD within [scenario/base_vhd_catalog.json](scenario/base_vhd_catalog.json). If you're making this change as part of a PR, make sure to protect the new VHDs with resource deletion locks so they remain available going forward. Note that if you run the suite in a region other than eastus, you'll also need to make sure the VHDs the suite points to are replicated to that region.
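
One way to add the deletion lock from the command line is the Azure CLI's `az lock create` command with `--lock-type CanNotDelete`, passing the VHD's full resource ID via `--resource`. The wrapper below is a sketch: the `lock_vhd` helper, its default lock name, and the `DRY_RUN` switch are hypothetical, and locks can equally be added through the Azure Portal.

```bash
#!/usr/bin/env bash
# Hypothetical helper: add a CanNotDelete lock to a VHD (e.g. a SIG image
# version) identified by its full resource ID.
# Set DRY_RUN=1 to print the az command instead of executing it.
lock_vhd() {
  local resource_id="${1:-}"
  local lock_name="${2:-agentbaker-e2e-vhd-lock}"  # hypothetical default name
  if [[ -z "$resource_id" ]]; then
    echo "usage: lock_vhd <resource-id> [lock-name]" >&2
    return 1
  fi
  local -a cmd=(az lock create --name "$lock_name" --lock-type CanNotDelete --resource "$resource_id")
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For a SIG image version, the resource ID has the form `/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Compute/galleries/<gallery>/images/<image-definition>/versions/<version>`.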
+
+### Using Arbitrary VHD Builds
+If you'd like to run the E2E suite using a set of VHDs built from an arbitrary run of the VHD build pipeline in the MSFT tenant, you can do so by specifying the ID of the build. This is an alternative to manually updating the default VHD catalog entries. If a scenario selects a VHD that was not built as part of the specified VHD build, the selector falls back to the corresponding default catalog entry.
+
+To use a build, simply specify its ID using the `VHD_BUILD_ID` environment variable like so:
+
+```bash
+VHD_BUILD_ID=123456789 SCENARIOS_TO_RUN=base,gpu,ubuntu2204,ubuntu2204-arm64 ./e2e-local.sh
+```
+
+***NOTE: To use this feature, you'll also need to provide the suite with an ADO PAT (personal access token) that it can use to access ADO and download the appropriate build artifacts.***
+
+To specify your PAT, simply set the `ADO_PAT` environment variable accordingly:
+
+```bash
+ADO_PAT=<secret> VHD_BUILD_ID=123456789 SCENARIOS_TO_RUN=base,gpu,ubuntu2204,ubuntu2204-arm64 ./e2e-local.sh
+```
+
+or:
+
+```bash
+export ADO_PAT=<secret>
+VHD_BUILD_ID=123456789 SCENARIOS_TO_RUN=base,gpu,ubuntu2204,ubuntu2204-arm64 ./e2e-local.sh
+VHD_BUILD_ID=234567891 SCENARIOS_TO_RUN=base,gpu,ubuntu2204,ubuntu2204-arm64 ./e2e-local.sh
+...
+VHD_BUILD_ID=345678912 SCENARIOS_TO_RUN=base,gpu,ubuntu2204,ubuntu2204-arm64 ./e2e-local.sh
+```
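
Since a build ID is only usable together with a PAT, a wrapper around `e2e-local.sh` could fail fast when `VHD_BUILD_ID` is set but `ADO_PAT` is not. The `check_vhd_build_env` function below is a hypothetical pre-flight check, not something `e2e-local.sh` is documented to perform.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: a VHD build ID requires an ADO PAT so the
# suite can download that build's artifacts from ADO.
check_vhd_build_env() {
  if [[ -n "${VHD_BUILD_ID:-}" && -z "${ADO_PAT:-}" ]]; then
    echo "VHD_BUILD_ID=${VHD_BUILD_ID} is set but ADO_PAT is empty" >&2
    return 1
  fi
  return 0
}

# Example wrapper usage:
#   check_vhd_build_env && ./e2e-local.sh
```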
+
+
+### Registering New VHD SKUs for E2E Testing
+When adding a new scenario which uses a VHD that doesn't currently have an associated entry in the base catalog, please make sure to follow these steps to register it with the suite:
+
+1. Build and delete-lock the underlying image version to be referenced in the base catalog
+2. Update [base_vhd_catalog.json](scenario/base_vhd_catalog.json) with a new entry, referencing the resource ID of the new VHD built in the previous step, as well as the VHD's artifact name. The artifact name is used when downloading publishing info artifacts from VHD builds in ADO. To determine this value:
+    1. Navigate to the latest run of the `[TEST All VHDs] AKS Linux VHD Build - Msft Tenant` build which has built the SKU you'd like to register (or queue a new build which includes the particular SKU).
+    2. Navigate to that run's published artifacts and identify the `publishing-info-<artifactName>` artifact for your SKU. Everything after the `publishing-info-` prefix is the artifact name.
+    3. Alternatively, you can get this value by navigating to [.vsts-vhd-builder-release.yaml](../.pipelines/.vsts-vhd-builder-release.yaml), identifying the corresponding build stage for your SKU, and looking at the value of `artifactName` passed when calling the `.builder-release-template.yaml` template.
+3. Within [scenario/vhd.go](scenario/vhd.go), update the corresponding subcatalog struct (e.g. `Ubuntu2204`, `AzureLinuxV2`) with the new entry, and add the JSON tag used to unmarshal it from base_vhd_catalog.json.
+4. Also within scenario/vhd.go, add a corresponding case block to the switch statement within `addEntryFromPublishingInfo()` so that the VHD's name (parsed from the publishing info file) is associated with the new subcatalog entry added in the previous step. This ensures that catalog entries are properly overwritten when using VHDs from arbitrary testing builds.
+5. Add a new `VHDSelector` within scenario/vhd.go, in the form of a method on the `*VHDCatalog` type, which returns the new subcatalog entry added in step 3.
+6. Reference the new `VHDSelector` added in the previous step when defining the new E2E scenario(s).
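
The second sub-step of step 2 boils down to stripping the `publishing-info-` prefix from the artifact's name. A minimal bash sketch, assuming you already have the artifact name (for example, copied from the ADO UI); the `artifact_name_from_publishing_info` function and the `my-new-sku` name are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive a VHD's artifact name from the name of its
# "publishing-info-<artifactName>" build artifact.
artifact_name_from_publishing_info() {
  local artifact="${1:-}"
  case "$artifact" in
    publishing-info-?*)
      # Drop the fixed prefix; what remains is the artifact name.
      echo "${artifact#publishing-info-}"
      ;;
    *)
      echo "not a publishing-info artifact: $artifact" >&2
      return 1
      ;;
  esac
}

artifact_name_from_publishing_info "publishing-info-my-new-sku"  # prints my-new-sku
```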
 
-**If you decide to update some or all of these SIG versions, you need to make sure to add delete locks to each one via the Azure Portal so they don't get automatically deleted and eventually cause failuires**
+Example PR: TODO(cameissner)
 
 ## Scenarios