Commit a5fb524

cameronmeissner (Cameron Meissner) and Cameron Meissner authored Dec 12, 2023
feat: run abe2e off of arbitrary VHD builds (#3845)
Co-authored-by: Cameron Meissner <[email protected]>

53 files changed, +1023 -433 lines changed
 

‎.pipelines/e2e.yaml

+4 -1
@@ -40,8 +40,11 @@ jobs:
      cd e2e
      go test -timeout 45m -v -run Test_All ./
    displayName: Run AgentBaker E2E
+   env:
+     VHD_BUILD_ID: $(VHD_BUILD_ID)
+     ADO_PAT: $(ADO_PAT)
  - publish: $(System.DefaultWorkingDirectory)/e2e/scenario-logs
    artifact: scenario-logs
    condition: always()

-
+

‎e2e/README.md

+61 -13
@@ -6,34 +6,33 @@ E2E testing for Linux is currently implemented using a Golang framework built fr

  The goal of E2E testing with AgentBaker is to ensure that the node bootstrapping artifacts generated and returned by the primary AgentBaker API not only contain *expected* content, but also contain *correct* content that can be used as-is to bootstrap real Azure VMs so they can join real AKS clusters.

- From a high level, each E2E scenario makes a call out to the primary node-bootstrapping API [GetLatestNodeBootstrapping](https://github.com/Azure/AgentBaker/blob/2e730b5a498c5be9b082d912fd08ac9346582db9/pkg/agent/bakerapi.go#L14) with a set of parameters (represented by a NodeBootstrappingConfiguration) which define the given scenario to generate CSE and custom data. A new VMSS containing a single VM will then be created and associated with an AKS cluster that is already running in the Azure. The CSE and custom data generated by AgentBaker will then be applied to the new VM such that it can be properly bootstrapped and register itself with the apiserver of the running cluster. Liveness and health checks are then run to make sure the new VM's kubelet is posting NodeReady to the cluster's apiserver, and that workload pods can successfully be run on it. Lastly, a set of validation commands are remotely executed on the VM after it has successfully been bootstrapped to ensure that its live state (file existence, sysctl settings, etc.) is as expected.
+ From a high level, each E2E scenario makes a call out to the primary node-bootstrapping API [GetLatestNodeBootstrapping](https://github.com/Azure/AgentBaker/blob/2e730b5a498c5be9b082d912fd08ac9346582db9/pkg/agent/bakerapi.go#L14) with a set of parameters (represented by a NodeBootstrappingConfiguration) which define the given scenario to generate CSE and custom data. A new VMSS containing a single VM will then be created and associated with an AKS cluster that is already running in Azure. The CSE and custom data generated by AgentBaker will then be applied to the new VM so it can bootstrap and register itself with the apiserver of the running cluster. Liveness and health checks are then run to make sure the new VM's kubelet is posting NodeReady to the cluster's apiserver, and that workload pods can successfully be run on it. Lastly, a set of validation commands are remotely executed on the VM to ensure its live state (file existence, sysctl settings, etc.) is as expected.

  ## Running Locally

  **Note: if you have changed code or artifacts used to generate custom data or custom script extension payloads, you should first run `make generate` from the root of the AgentBaker repository.**

- To run the Go implementation of the E2E test suite locally, simply use `e2e-local.sh`. This script will set up the call to `go test` for you while also implementing default logic for a set of required environment variables used to interact with Azure. These required environment variables include:
+ To run the Go implementation of the E2E test suite locally, simply use `e2e-local.sh`. This script will set up the `go test` command for you while also implementing defaulting logic for a set of required environment variables used to interact with Azure. These environment variables include:

- - `SUBSCRIPTION_ID` - default `8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8`
- - `RESOURCE_GROUP_NAME` - default: `agentbaker-e2e-tests`
+ - `SUBSCRIPTION_ID` - default `8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8` (ACS Test Subscription)
  - `LOCATION` - default: `eastus`
- - `CLUSTER_NAME` - default `agentbaker-e2e-test-cluster`
  - `AZURE_TENANT_ID` - default: `72f988bf-86f1-41af-91ab-2d7cd011db47`
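The defaulting described above lives in `e2e-local.sh`, whose contents are not shown in this diff. As a rough illustration of the idea only, here is a minimal Go sketch that falls back to the documented defaults when a variable is unset or empty. The `defaults` map and the `resolve` helper are hypothetical names introduced for this example, and the real script may behave differently (for instance, in how it treats empty values).

```go
package main

import (
	"fmt"
	"os"
)

// defaults mirrors the default values listed in the README after this commit.
var defaults = map[string]string{
	"SUBSCRIPTION_ID": "8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8",
	"LOCATION":        "eastus",
	"AZURE_TENANT_ID": "72f988bf-86f1-41af-91ab-2d7cd011db47",
}

// resolve returns the value reported by lookup for key when it is set and
// non-empty, and the documented default otherwise. Treating an empty value
// as "unset" is an assumption of this sketch.
func resolve(key string, lookup func(string) (string, bool)) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return defaults[key]
}

func main() {
	for _, key := range []string{"SUBSCRIPTION_ID", "LOCATION", "AZURE_TENANT_ID"} {
		fmt.Printf("%s=%s\n", key, resolve(key, os.LookupEnv))
	}
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the helper easy to exercise without touching the real environment.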

+ <br>
+
  `SCENARIOS_TO_RUN` may also optionally be set to specify a subset of the E2E scenarios to run during the testing session as a comma-separated list, for example:

  ```bash
- SCENARIOS_TO_RUN=base,gpu ./e2e-local.sh
+ SCENARIOS_TO_RUN=base,gpu,ubuntu2204,ubuntu2204-arm64 ./e2e-local.sh
  ```

  Furthermore, `SCENARIOS_TO_EXCLUDE` may also optionally be set to specify the set of scenarios which will be excluded from the testing session as a comma-separated list. If both `SCENARIOS_TO_RUN` and `SCENARIOS_TO_EXCLUDE` are specified, `SCENARIOS_TO_RUN` will take precedence.
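One way to read the precedence rule above is that `SCENARIOS_TO_EXCLUDE` is ignored entirely whenever `SCENARIOS_TO_RUN` is non-empty. The following Go sketch illustrates that reading. The function names `splitList` and `selectScenarios` are hypothetical and are not taken from the suite, whose actual filtering code is not part of this diff.

```go
package main

import (
	"fmt"
	"strings"
)

// splitList parses a comma-separated list into a set, ignoring empty items.
func splitList(s string) map[string]bool {
	set := map[string]bool{}
	for _, item := range strings.Split(s, ",") {
		if item = strings.TrimSpace(item); item != "" {
			set[item] = true
		}
	}
	return set
}

// selectScenarios keeps only the scenarios named in toRun when it is
// non-empty; otherwise it keeps everything not named in toExclude.
func selectScenarios(all []string, toRun, toExclude string) []string {
	run, exclude := splitList(toRun), splitList(toExclude)
	var selected []string
	for _, name := range all {
		switch {
		case len(run) > 0:
			// SCENARIOS_TO_RUN takes precedence: the exclude list is not consulted.
			if run[name] {
				selected = append(selected, name)
			}
		case !exclude[name]:
			selected = append(selected, name)
		}
	}
	return selected
}

func main() {
	all := []string{"base", "gpu", "ubuntu2204", "ubuntu2204-arm64"}
	fmt.Println(selectScenarios(all, "base,gpu", "gpu"))
	fmt.Println(selectScenarios(all, "", "gpu"))
}
```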

- `KEEP_VMSS` can also be optionally specified to have the test suite retain the bootstrapped VMSS VMs for further debugging. When this option is specified, the private SSH key used to bootstrap the VMs will be included within each scenario's log bundle.
- NOTE: if this option is specified please make sure to manually delete your bootstrapped VMs later. Though, all bootstrapped VMs will eventually be deleted by the ACS test GC regardless.
+ `KEEP_VMSS` can also be optionally specified to have the test suite retain the bootstrapped VM(s) for further debugging. When this option is specified, the private SSH key used to connect to each VM will be included within each scenario's log bundle respectively.

- **Note that when using `e2e-local.sh`, a timeout value of 30 minutes is applied to the `go test` command.**
+ **Note that when using `e2e-local.sh`, a timeout value of 45 minutes is applied to the `go test` command.**

- You may also run the test command yourself assuming you've properly set up the required environment variables like so:
+ You may also run the test command with custom arguments yourself (assuming you've properly set up the required environment variables) from within the `e2e/` directory like so:

  ```bash
  go test -timeout 30m -v -run Test_All ./
@@ -47,10 +46,59 @@ The `e2e_test` package has a dependency on subpackage located in the [scenario](

  The primary testing function is located in [suite_test.go](suite_test.go), which is run by `go test ...`.

- ## Updating the Test Images
- The [images.go](scenario/images.go) file contains the hard-coded references to a set of delete-locked SIG versions used by the e2e scenarios.
+ ## E2E VHDs
+ When configuring E2E scenarios, a `VHDSelector` must be specified in order to tell the suite which particular VHD it should use to bootstrap the VM.
+
+ `VHDSelector`s select from a "base" VHD catalog, which is initialized from the embedded file [scenario/base_vhd_catalog.json](scenario/base_vhd_catalog.json). Each entry in the catalog is represented as a `VHD`, which contains a resource ID that gets injected into the VMSS model when the given scenario is run. The aforementioned JSON file contains configurations for the current set of default catalog entries. At any given time, those default entries will point to VHDs stored within our testing subscription, guarded by resource deletion locks.
+
+ For example, [scenario_ubuntu2204.go](scenario/scenario_ubuntu2204.go) defines the Ubuntu 2204 scenario, which specifies the `Ubuntu2204Gen2Containerd` VHD selector. This selector will always select the Ubuntu2204/gen2 VHD catalog entry from the base catalog. If running the suite using some arbitrary VHD build for testing, then the selector will take the corresponding Ubuntu2204/gen2 VHD from the given build instead of the default entry.
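To make the catalog and selector relationship more concrete, the sketch below unmarshals a small JSON catalog into nested structs and exposes a selector as a method on `*VHDCatalog`. It is only an approximation under stated assumptions: the JSON keys other than `resourceId`, the `catalogJSON` constant (standing in for the embedded `base_vhd_catalog.json`), the `loadCatalog` helper, and the placeholder resource ID are all hypothetical. Only the names `VHD`, `VHDCatalog`, `VHDSelector`, `Ubuntu2204`, and `Ubuntu2204Gen2Containerd` come from the README text.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// catalogJSON stands in for the embedded base_vhd_catalog.json file.
// The layout and values here are placeholders, not the real catalog.
const catalogJSON = `{
  "ubuntu2204": {
    "gen2containerd": {
      "artifactName": "example-artifact",
      "resourceId": "/example/default/ubuntu2204-gen2"
    }
  }
}`

// VHD is a single catalog entry.
type VHD struct {
	ArtifactName string `json:"artifactName"` // hypothetical JSON key
	ResourceID   string `json:"resourceId"`
}

// Ubuntu2204 is a subcatalog grouping the entries for one distro.
type Ubuntu2204 struct {
	Gen2Containerd VHD `json:"gen2containerd"` // hypothetical JSON key
}

// VHDCatalog is the base catalog that selectors read from.
type VHDCatalog struct {
	Ubuntu2204 Ubuntu2204 `json:"ubuntu2204"` // hypothetical JSON key
}

// VHDSelector picks one VHD out of a catalog.
type VHDSelector func() VHD

// Ubuntu2204Gen2Containerd selects the Ubuntu2204/gen2 entry.
func (c *VHDCatalog) Ubuntu2204Gen2Containerd() VHD {
	return c.Ubuntu2204.Gen2Containerd
}

// loadCatalog unmarshals raw JSON into a catalog.
func loadCatalog(raw string) (*VHDCatalog, error) {
	var c VHDCatalog
	if err := json.Unmarshal([]byte(raw), &c); err != nil {
		return nil, fmt.Errorf("unmarshal base VHD catalog: %w", err)
	}
	return &c, nil
}

func main() {
	catalog, err := loadCatalog(catalogJSON)
	if err != nil {
		panic(err)
	}
	// A scenario would hold the selector and call it when building the VMSS model.
	var selector VHDSelector = catalog.Ubuntu2204Gen2Containerd
	fmt.Println(selector().ResourceID)
}
```

Because the selector is bound to the catalog rather than to a fixed resource ID, overwriting a catalog entry (for example, from an arbitrary VHD build) changes what every scenario using that selector receives.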
+
+
+ ### Updating Default Catalog Entries
+ To update the set of default VHD catalog entries to point towards new VHDs, simply update the `resourceId` field of the respective VHD within [scenario/base_vhd_catalog.json](scenario/base_vhd_catalog.json). If you're making this change as a part of a PR, you need to make sure to lock the new VHDs with resource deletion locks to ensure they're always available going forward. Note that if you run the suite in a region other than eastus, you'll need to make sure the VHDs you point the suite towards are appropriately replicated in the given region as well.
+
+ ### Using Arbitrary VHD Builds
+ If you'd like to run the E2E suite using a set of VHDs built from some arbitrary run of the VHD build pipeline in the MSFT tenant, you can do so by specifying the ID of the build. This is an alternative to manually updating the set of default VHD catalog entries. If a given scenario is run which selects a VHD that was not built as a part of the specified VHD build, the selector will select the corresponding default catalog entry instead.
+
+ To use a build, simply specify its ID using the `VHD_BUILD_ID` environment variable like so:
+
+ ```bash
+ VHD_BUILD_ID=123456789 SCENARIOS_TO_RUN=base,gpu,ubuntu2204,ubuntu2204-arm64 ./e2e-local.sh
+ ```
+
+ ***NOTE: To utilize this feature, you'll also need to provide the suite with an ADO PAT (personal access token) with which it can access the ADO resources to download the appropriate build artifacts.***
+
+ To specify your PAT, simply set the `ADO_PAT` environment variable accordingly:
+
+ ```bash
+ ADO_PAT=<secret> VHD_BUILD_ID=123456789 SCENARIOS_TO_RUN=base,gpu,ubuntu2204,ubuntu2204-arm64 ./e2e-local.sh
+ ```
+
+ or:
+
+ ```bash
+ export ADO_PAT=<secret>
+ VHD_BUILD_ID=123456789 SCENARIOS_TO_RUN=base,gpu,ubuntu2204,ubuntu2204-arm64 ./e2e-local.sh
+ VHD_BUILD_ID=234567891 SCENARIOS_TO_RUN=base,gpu,ubuntu2204,ubuntu2204-arm64 ./e2e-local.sh
+ ...
+ VHD_BUILD_ID=345678912 SCENARIOS_TO_RUN=base,gpu,ubuntu2204,ubuntu2204-arm64 ./e2e-local.sh
+ ```
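The pairing of `VHD_BUILD_ID` and `ADO_PAT` described above suggests a simple up-front check: if a build ID is given, a PAT must be present too. The Go sketch below shows one possible shape for such a check. The `vhdBuildConfig` type, the `loadVHDBuildConfig` function, and the choice to fail fast with an error are assumptions of this example; the suite's actual handling is not visible in this diff.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// vhdBuildConfig holds the two settings needed to pull VHDs from a build.
type vhdBuildConfig struct {
	BuildID string
	PAT     string
}

// loadVHDBuildConfig reads VHD_BUILD_ID and ADO_PAT through getenv.
// requested is false when no build ID is set, in which case the default
// catalog entries would be used. An error is returned when a build ID is
// set without a PAT (an assumption of this sketch).
func loadVHDBuildConfig(getenv func(string) string) (cfg vhdBuildConfig, requested bool, err error) {
	cfg.BuildID = getenv("VHD_BUILD_ID")
	if cfg.BuildID == "" {
		return vhdBuildConfig{}, false, nil
	}
	cfg.PAT = getenv("ADO_PAT")
	if cfg.PAT == "" {
		return vhdBuildConfig{}, true, errors.New("ADO_PAT must be set when VHD_BUILD_ID is specified")
	}
	return cfg, true, nil
}

func main() {
	cfg, requested, err := loadVHDBuildConfig(os.Getenv)
	switch {
	case err != nil:
		fmt.Println("error:", err)
	case !requested:
		fmt.Println("using default catalog entries")
	default:
		fmt.Println("using VHDs from build", cfg.BuildID)
	}
}
```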
+
+
+ ### Registering New VHD SKUs for E2E Testing
+ When adding a new scenario which uses a VHD that doesn't currently have an associated entry in the base catalog, please make sure to follow these steps to register it with the suite:
+
+ 1. Build and delete-lock the underlying image version to be referenced in the base catalog
+ 2. Update [base_vhd_catalog.json](scenario/base_vhd_catalog.json) with a new entry, referencing the resource ID of the new VHD built in the previous step, as well as the VHD's artifact name. The artifact name is used when downloading publishing info artifacts from VHD builds in ADO. To determine this value:
+     1. Navigate to the latest run of the `[TEST All VHDs] AKS Linux VHD Build - Msft Tenant` build which has built the SKU you'd like to register (or queue a new build which includes the particular SKU).
+     2. Navigate to the particular run's published artifacts and identify the `publishing-info-<artifactName>` artifact for your SKU. The suffix of this string after `publishing-info-` is the name of the artifact.
+     3. Alternatively, you can get this value from navigating to [.vsts-vhd-builder-release.yaml](../.pipelines/.vsts-vhd-builder-release.yaml), identifying the corresponding build stage for your SKU, and looking at the value of `artifactName` specified when calling the `.builder-release-template.yaml` template.
+ 3. Within [scenario/vhd.go](scenario/vhd.go), update the corresponding subcatalog struct (e.g. `Ubuntu2204`, `AzureLinuxV2`) with the new entry, and correctly add its corresponding JSON tag used to unmarshal from base_vhd_catalog.json
+ 4. Also within scenario/vhd.go, add a corresponding case block to the switch statement within `addEntryFromPublishingInfo()` to make sure the VHD's name (parsed from the publishing info file) is associated with the new subcatalog entry added in the previous step - this is to ensure that catalog entries are properly overwritten when using VHDs from arbitrary testing builds
+ 5. Add a new `VHDSelector` within scenario/vhd.go in the form of a method on the `*VHDCatalog` type, which returns the new entry of the given subcatalog added in step 3
+ 6. Reference the new `VHDSelector` added in the previous step when defining the new E2E scenario(s).
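Steps 3 through 5 above can be pictured roughly as follows. This is a hedged Go sketch rather than the contents of scenario/vhd.go: the `publishingInfo` type, its fields, the VHD name strings, the boolean return value, and the placeholder resource IDs are all hypothetical. It shows a switch that overwrites a catalog entry when a build provides a matching VHD, and leaves the default entry in place otherwise.

```go
package main

import "fmt"

// VHD is a single catalog entry.
type VHD struct {
	ArtifactName string
	ResourceID   string
}

// Ubuntu2204 is a subcatalog; a new SKU would add a field here (step 3).
type Ubuntu2204 struct {
	Gen2Containerd VHD
}

// VHDCatalog is the base catalog that selectors read from.
type VHDCatalog struct {
	Ubuntu2204 Ubuntu2204
}

// publishingInfo is a hypothetical, trimmed-down view of a build's
// publishing info file.
type publishingInfo struct {
	VHDName    string
	ResourceID string
}

// addEntryFromPublishingInfo overwrites the matching catalog entry and
// reports whether the VHD name was recognized (step 4). Unrecognized
// names leave the catalog untouched, so defaults remain in effect.
func (c *VHDCatalog) addEntryFromPublishingInfo(info publishingInfo) bool {
	switch info.VHDName {
	case "example-ubuntu2204-gen2": // hypothetical VHD name
		c.Ubuntu2204.Gen2Containerd.ResourceID = info.ResourceID
		return true
	default:
		return false
	}
}

// Ubuntu2204Gen2Containerd is the selector for the entry (step 5).
func (c *VHDCatalog) Ubuntu2204Gen2Containerd() VHD {
	return c.Ubuntu2204.Gen2Containerd
}

func main() {
	catalog := &VHDCatalog{Ubuntu2204: Ubuntu2204{
		Gen2Containerd: VHD{ResourceID: "/example/default/ubuntu2204-gen2"},
	}}
	infos := []publishingInfo{
		{VHDName: "example-ubuntu2204-gen2", ResourceID: "/example/build/ubuntu2204-gen2"},
		{VHDName: "example-unregistered-sku", ResourceID: "/example/build/unregistered"},
	}
	for _, info := range infos {
		if !catalog.addEntryFromPublishingInfo(info) {
			fmt.Println("no catalog entry registered for", info.VHDName)
		}
	}
	fmt.Println(catalog.Ubuntu2204Gen2Containerd().ResourceID)
}
```

Forgetting the case block in step 4 would mean a build's VHD for the new SKU is silently ignored and the default catalog entry is tested instead, which is why the steps call it out explicitly.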

- **If you decide to update some or all of these SIG versions, you need to make sure to add delete locks to each one via the Azure Portal so they don't get automatically deleted and eventually cause failures**
+ Example PR: TODO(cameissner)

  ## Scenarios
