Create top-level AKS folder and SKILL.md #1029
Open: julia-yin wants to merge 43 commits into microsoft:main from julia-yin:main
b10b649 julia-yin: Create AKS folder and SKILL.md
8ff360f julia-yin: Add azure-kubernetes to skill.json
cc20821 julia-yin: Update skills.json
769631d julia-yin: Fix issue of postgres skill missing from skills.json
c37988c julia-yin: Fix skills.json
0a6efd5 julia-yin: Add AKS to architecture.md and testing for AKS skill
eb5c9a2 julia-yin: Update SKILL.md
31523df julia-yin: Update plugin/skills/azure-kubernetes/SKILL.md
be258cd julia-yin: Remove trailing empty lines
295a9ed julia-yin: Add AKS to integration test schedule
2c1d3a2 julia-yin: Fix pr.yaml creating leading space
7d79459 julia-yin: Update SKILL.md
afc8b05 julia-yin: Update triggers.test.ts.snap
0d6e6ef julia-yin: Add in missing best practices (ephemeral disk, auto upgrades, reliabi…
2e15f00 julia-yin: Add security best practices
bd1e5e5 julia-yin: Streamline and reduce token count
8d8b187 julia-yin: Add azure-kubernetes to skills.json
cfc9cc0 julia-yin: Fix naming issues
9d12083 julia-yin: Update trigger and unit tests
14540ae julia-yin: Bump azure-prepare version to 1.0.1
f6fcda1 julia-yin: Fix metadata.version
f6dc996 julia-yin: Add metadata to azure-kubernetes skill
453c478 julia-yin: Apply suggestion from @Copilot
0436b19 julia-yin: Apply suggestion from @Copilot
c65c53f julia-yin: Bump azure-prepare skill version
1bf5c35 julia-yin: Revert pr.yml
3e578e0 julia-yin: Add license to AKS skill
9144b03 julia-yin: Add back azure-prepare description
3294051 julia-yin: Bump azure-prepare version to 1.0.4
4c6c56a julia-yin: Add back license to azure-prepare
02fa831 julia-yin: Fix description
ad0d1f3 julia-yin: Merge branch 'main' into main
6dbda36 julia-yin: Fix Copilot feedback
55bd882 julia-yin: Merge branch 'main' of https://github.com/julia-yin/GitHub-Copilot-fo…
078bd9d julia-yin: Update description and tests
244f5dd julia-yin: Remove AKS MCP and add in kubectl commands to AKS skill
91839ec julia-yin: Merge branch 'main' into main
c1b9455 julia-yin: Add skip reason + fix typo in integration tests
0e69ae6 julia-yin: Merge branch 'main' of https://github.com/julia-yin/GitHub-Copilot-fo…
b8d7141 julia-yin: Fixed Copilot comments + bumped up version for azure prepare skill
b3e960b julia-yin: Update integration.test.ts
b7ce4d2 julia-yin: Update integration.test.ts
fc839ba julia-yin: Revert agent-runner.ts changes from c1b9455 (debug timing logs + Perm…
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,140 @@ | ||
| --- | ||
| name: azure-kubernetes | ||
| license: MIT | ||
| metadata: | ||
| author: Microsoft | ||
| version: "1.0.0" | ||
| description: "Plan, create, and configure production-ready Azure Kubernetes Service (AKS) clusters. Covers Day-0 checklist, SKU selection (Automatic vs Standard), networking options (private API server, Azure CNI Overlay, egress configuration), security (workload identity, Azure Policy, Key Vault CSI driver, Deployment Safeguards), and operations (monitoring, upgrade strategy, autoscaling, cost analysis, node pools). WHEN: provision AKS cluster, design AKS networking, choose AKS SKU, secure AKS, set up AKS." | ||
| --- | ||
|
|
||
| # Azure Kubernetes Service | ||
|
|
||
| > **AUTHORITATIVE GUIDANCE — MANDATORY COMPLIANCE** | ||
| > | ||
| > This skill produces a **recommended AKS cluster configuration** based on user requirements, distinguishing **Day-0 decisions** (networking, API server — hard to change later) from **Day-1 features** (can enable post-creation). See [CLI reference](./references/cli-reference.md) for commands. | ||
|
|
||
| ## Quick Reference | ||
| | Property | Value | | ||
| |----------|-------| | ||
| | Best for | AKS cluster planning and Day-0 decisions | | ||
| | MCP Tools | `mcp_azure_mcp_aks` | | ||
| | CLI | `az aks create`, `az aks show`, `kubectl get`, `kubectl describe` | | ||
| | Related skills | azure-diagnostics (troubleshooting AKS), azure-deploy (app deployment) | | ||
|
|
||
| ## When to Use This Skill | ||
| Activate this skill when the user wants to: | ||
| - Create a new AKS cluster | ||
| - Plan AKS cluster configuration for production workloads | ||
| - Design AKS networking (API server access, pod IP model, egress) | ||
| - Set up AKS identity and secrets management | ||
| - Configure AKS governance (Azure Policy, Deployment Safeguards) | ||
| - Enable AKS observability (monitoring, Prometheus, Grafana) | ||
| - Define AKS upgrade and patching strategy | ||
| - Enable AKS cost visibility and analysis | ||
| - Understand AKS Automatic vs Standard SKU differences | ||
| - Get a Day-0 checklist for AKS cluster setup and configuration | ||
|
|
||
| ## Rules | ||
| 1. Start with the user's requirements for provisioning compute, networking, security, and other settings. | ||
| 2. Use the `azure` MCP server and its AKS-related MCP tools (`mcp_azure_mcp_aks`, `mcp_aks_mcp_az_aks_operations`) to invoke Azure APIs and perform AKS and kubectl operations; fall back to Azure CLI (`az aks`) only when required functionality is not available via MCP tools. | ||
| 3. Determine if AKS Automatic or Standard SKU is more appropriate based on the user's need for control vs convenience. Default to AKS Automatic unless specific customizations are required. | ||
| 4. Document decisions and rationale for cluster configuration choices, especially for Day-0 decisions that are hard to change later (networking, API server access). | ||
|
|
||
|
|
||
| ## Required Inputs (Ask only what’s needed) | ||
| If the user is unsure, use safe defaults. | ||
| - Cluster environment: dev/test or production | ||
| - Region(s), availability zones, preferred node VM sizes | ||
| - Expected scale (node/cluster count, workload size) | ||
| - Networking requirements (API server access, pod IP model, ingress/egress control) | ||
| - Security and identity requirements, including image registry | ||
| - Upgrade and observability preferences | ||
| - Cost constraints | ||
|
|
||
| ## Workflow | ||
|
|
||
| ### 1. Cluster Type | ||
| - **AKS Automatic** (default): Best for most production workloads, provides a curated experience with pre-configured best practices for security, reliability, and performance. Use unless you have specific custom requirements for networking, autoscaling, or node pool configurations not supported by Node Auto-Provisioning (NAP). | ||
| - **AKS Standard**: Use if you need full control over cluster configuration; expect additional overhead to set up and manage. | ||
|
|
||
| ### 2. Networking (Pod IP, Egress, Ingress, Dataplane) | ||
|
|
||
| **Pod IP Model** (Key Day-0 decision): | ||
| - **Azure CNI Overlay** (recommended): pod IPs come from a private overlay range and are not VNet-routable; scales to large clusters and suits most workloads | ||
| - **Azure CNI (VNet-routable)**: pod IPs directly from VNet (pod subnet or node subnet), use when pods must be directly addressable from VNet or on-prem | ||
| - Docs: https://learn.microsoft.com/azure/aks/azure-cni-overlay | ||
|
|
||
| **Dataplane & Network Policy**: | ||
| - **Azure CNI powered by Cilium** (recommended): eBPF-based for high-performance packet processing, network policies, and observability | ||
|
|
||
| **Egress**: | ||
| - **Static Egress Gateway** for stable, predictable outbound IPs | ||
| - For restricted egress: UDR + Azure Firewall or NVA | ||
|
|
||
| **Ingress**: | ||
| - **App Routing addon with Gateway API** — recommended default for HTTP/HTTPS workloads | ||
| - **Istio service mesh with Gateway API** — for advanced traffic management, mTLS, canary deployments | ||
| - **Application Gateway for Containers** — for L7 load balancing with WAF integration | ||
|
|
||
| **DNS**: | ||
| - Enable **LocalDNS** on all node pools for reliable, performant DNS resolution | ||
|
|
||
| ### 3. Security | ||
| - Use **Microsoft Entra ID** everywhere (control plane, Workload Identity for pods, node access). Avoid static credentials. | ||
| - Azure Key Vault via **Secrets Store CSI Driver** for secrets | ||
| - Enable **Azure Policy** + **Deployment Safeguards** | ||
| - Enable **Encryption at rest** for etcd/API server; **in-transit** for node-to-node | ||
| - Allow only signed, policy-approved images (Azure Policy + Ratify), prefer **Azure Container Registry** | ||
| - **Isolation**: Use namespaces, network policies, scoped logging | ||
|
|
||
| ### 4. Observability | ||
| - Enable AKS monitoring with Azure Monitor and Container Insights (logs + Prometheus + Grafana). | ||
|
|
||
| ### 5. Upgrades & Patching | ||
| - Configure **Maintenance Windows** for controlled upgrade timing | ||
| - Enable **auto-upgrades** for cluster and node OS to stay up-to-date with security patches and Kubernetes versions | ||
| - Consider **LTS versions** for enterprise stability (2-year support) by upgrading your cluster to the AKS Premium tier | ||
| - **Multi-cluster upgrades**: Use **AKS Fleet Manager** for staged rollout across test → production clusters | ||
|
|
||
| ### 6. Performance | ||
| - Use **Ephemeral OS disks** (`--node-osdisk-type Ephemeral`) for faster node startup | ||
| - Select **Azure Linux** as node OS (smaller footprint, faster boot) | ||
| - Enable **KEDA** for event-driven autoscaling beyond HPA | ||
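As a rough illustration of the performance settings above, the sketch below assembles an `az aks create` command and prints it for review instead of running it. `--node-osdisk-type Ephemeral` comes from the skill text; `--os-sku AzureLinux` and `--enable-keda` are assumed here to be the matching Azure CLI flags, and `perf_create_cmd`, `my-aks`, and `my-rg` are hypothetical names.

```bash
#!/usr/bin/env bash
set -eu

# Dry-run helper (hypothetical name): prints an `az aks create` command
# carrying the performance-related options instead of executing it.
perf_create_cmd() {
  cluster="$1"
  resource_group="$2"
  printf '%s ' az aks create \
    --name "$cluster" --resource-group "$resource_group" \
    --node-osdisk-type Ephemeral \
    --os-sku AzureLinux \
    --enable-keda
  echo
}

# Placeholder names; substitute real values before running the printed command.
perf_create_cmd my-aks my-rg
```

Ephemeral OS disks only work on VM sizes whose cache or temporary disk can hold the OS image, so the printed command may also need an explicit VM size.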
|
|
||
| ### 7. Node Pools & Compute | ||
| - **Dedicated system node pool**: At least 2 nodes, tainted for system workloads only (`CriticalAddonsOnly`) | ||
| - Enable **Node Auto-Provisioning (NAP)** on all pools for cost savings and responsive scaling | ||
| - Use **latest generation SKUs (v5/v6)** for host-level optimizations | ||
| - **Avoid B-series VMs** — burstable SKUs cause performance/reliability issues | ||
| - Use SKUs with **at least 4 vCPUs** for production workloads | ||
| - Set **topology spread constraints** to distribute pods across hosts/zones per SLO | ||
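The dedicated system node pool described above could be added along these lines. This is a minimal sketch, not the skill's prescribed command: it prints an `az aks nodepool add` invocation for review, and the helper name `system_pool_cmd` plus the cluster, resource group, and pool names are placeholders chosen for illustration.

```bash
#!/usr/bin/env bash
set -eu

# Placeholder values (assumptions for illustration only).
CLUSTER="${CLUSTER:-my-aks}"
RESOURCE_GROUP="${RESOURCE_GROUP:-my-rg}"

# Prints the command for a tainted system pool spread across three zones.
# Rejects node counts below 2, matching the "at least 2 nodes" guidance.
system_pool_cmd() {
  pool_name="$1"
  node_count="$2"
  if [ "$node_count" -lt 2 ]; then
    echo "system pool needs at least 2 nodes" >&2
    return 1
  fi
  printf '%s ' az aks nodepool add \
    --cluster-name "$CLUSTER" --resource-group "$RESOURCE_GROUP" \
    --name "$pool_name" --mode System --node-count "$node_count" \
    --node-taints CriticalAddonsOnly=true:NoSchedule \
    --zones 1 2 3
  echo
}

system_pool_cmd systempool 3
```

The `CriticalAddonsOnly=true:NoSchedule` taint keeps application pods off the system pool unless they carry a matching toleration.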
|
|
||
| ### 8. Reliability | ||
| - Deploy across **3 Availability Zones** (`--zones 1 2 3`) | ||
| - Use **Standard tier** for zone-redundant control plane + 99.95% SLA for API server availability | ||
| - Enable **Microsoft Defender for Containers** for runtime protection | ||
| - Configure **PodDisruptionBudgets** for all production workloads | ||
| - Use **topology spread constraints** to ensure pod distribution across failure domains | ||
|
|
||
| ### 9. Cost Controls | ||
| - Use **Spot node pools** for batch/interruptible workloads (up to 90% savings) | ||
| - **Stop/Start** dev/test clusters: `az aks stop/start` | ||
| - Consider **Reserved Instances** or **Savings Plans** for steady-state workloads | ||
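To make the cost controls above more concrete, here is a hedged sketch that prints (without executing) a Spot node pool command and the stop/start commands for a dev/test cluster. The helper names `spot_pool_cmd` and `power_cmd` and all resource names are hypothetical; `--spot-max-price -1` is used on the assumption that capping at the on-demand price is acceptable.

```bash
#!/usr/bin/env bash
set -eu

# Placeholder values (assumptions for illustration only).
CLUSTER="${CLUSTER:-my-aks}"
RESOURCE_GROUP="${RESOURCE_GROUP:-my-rg}"

# Prints a command that adds an autoscaled Spot pool for interruptible work.
spot_pool_cmd() {
  pool_name="$1"
  max_nodes="$2"
  printf '%s ' az aks nodepool add \
    --cluster-name "$CLUSTER" --resource-group "$RESOURCE_GROUP" \
    --name "$pool_name" --priority Spot --eviction-policy Delete \
    --spot-max-price -1 \
    --enable-cluster-autoscaler --min-count 1 --max-count "$max_nodes"
  echo
}

# Prints the stop or start command for the cluster; other actions are rejected.
power_cmd() {
  case "$1" in
    stop|start)
      echo "az aks $1 --name $CLUSTER --resource-group $RESOURCE_GROUP"
      ;;
    *)
      echo "usage: power_cmd stop|start" >&2
      return 1
      ;;
  esac
}

spot_pool_cmd spotpool 5
power_cmd stop
```

Spot nodes can be evicted at any time, so workloads scheduled there should tolerate interruption.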
|
|
||
| ## Guardrails / Safety | ||
| - Do not request or output secrets (tokens, keys, subscription IDs). | ||
| - If requirements are ambiguous for critical Day-0 decisions, ask the user clarifying questions. For Day-1 features, propose 2–3 safe options with tradeoffs and choose a conservative default. | ||
| - Do not promise zero downtime; advise workload safeguards (PDBs, probes, replicas) and staged upgrades along with best practices for reliability and performance. | ||
|
|
||
| ## MCP Tools | ||
| | Tool | Purpose | Key Parameters | | ||
| |------|---------|----------------| | ||
| | `mcp_azure_mcp_aks` | Create and query AKS clusters at subscription scope | `subscription_id`, `resource_group` | | ||
|
|
||
| ## Error Handling | ||
| | Error / Symptom | Likely Cause | Remediation | | ||
| |-----------------|--------------|-------------| | ||
| | MCP tool call fails or times out | Invalid credentials, subscription, or cluster context | Verify `az login`, check subscription ID and resource group | | ||
| | Quota exceeded | Regional vCPU or resource limits | Request quota increase or select different region/VM SKU | | ||
| | Networking conflict (IP exhaustion) | Pod subnet too small for overlay/CNI | Re-plan IP ranges; may require cluster recreation (Day-0) | | ||
| | Workload Identity not working | Missing OIDC issuer or federated credential | Enable `--enable-oidc-issuer --enable-workload-identity`, configure federated identity | | ||
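The Workload Identity remediation in the last table row might look like the following sequence. It is a sketch under stated assumptions: the commands are printed rather than executed, the managed identity is assumed to live in the same resource group as the cluster, and `workload_identity_cmds` and every resource name are hypothetical.

```bash
#!/usr/bin/env bash
set -eu

# Prints three steps: enable OIDC issuer + workload identity, read the
# issuer URL, and create a federated credential for one service account.
workload_identity_cmds() {
  cluster="$1"
  rg="$2"
  identity="$3"
  namespace="$4"
  service_account="$5"
  cat <<EOF
az aks update --name $cluster --resource-group $rg --enable-oidc-issuer --enable-workload-identity
ISSUER=\$(az aks show --name $cluster --resource-group $rg --query oidcIssuerProfile.issuerUrl --output tsv)
az identity federated-credential create --name ${service_account}-federation --identity-name $identity --resource-group $rg --issuer "\$ISSUER" --subject system:serviceaccount:${namespace}:${service_account} --audiences api://AzureADTokenExchange
EOF
}

workload_identity_cmds my-aks my-rg my-identity default my-sa
```

The federated credential's subject has to match the namespace and service account the pod actually runs under, which is a common cause of the symptom in the table.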
plugin/skills/azure-kubernetes/references/cli-reference.md
33 changes: 33 additions & 0 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,33 @@ | ||
| # CLI Reference for AKS | ||
|
|
||
| ```bash | ||
| # List AKS clusters | ||
| az aks list --output table | ||
|
|
||
| # Show cluster details | ||
| az aks show --name <cluster-name> --resource-group <resource-group> | ||
|
|
||
| # Get available Kubernetes versions | ||
| az aks get-versions --location <location> --output table | ||
|
|
||
| # Create AKS Automatic cluster | ||
| az aks create --name <cluster-name> --resource-group <resource-group> --sku automatic \ | ||
| --network-plugin azure --network-plugin-mode overlay \ | ||
| --enable-oidc-issuer --enable-workload-identity | ||
|
|
||
| # Create AKS Standard cluster | ||
| az aks create --name <cluster-name> --resource-group <resource-group> \ | ||
| --node-count 3 --zones 1 2 3 \ | ||
| --network-plugin azure --network-plugin-mode overlay \ | ||
| --enable-cluster-autoscaler --min-count 1 --max-count 10 | ||
|
|
||
| # Get credentials | ||
| az aks get-credentials --name <cluster-name> --resource-group <resource-group> | ||
|
|
||
| # List node pools | ||
| az aks nodepool list --cluster-name <cluster-name> --resource-group <resource-group> --output table | ||
|
|
||
| # Enable monitoring | ||
| az aks enable-addons --name <cluster-name> --resource-group <resource-group> \ | ||
| --addons monitoring --workspace-resource-id <workspace-resource-id> | ||
| ``` |
tests/azure-kubernetes/__snapshots__/triggers.test.ts.snap
125 changes: 125 additions & 0 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,125 @@ | ||
| // Jest Snapshot v1, https://goo.gl/fbAQLP | ||
|
|
||
| exports[`azure-kubernetes - Trigger Tests Trigger Keywords Snapshot skill description triggers match snapshot 1`] = ` | ||
| { | ||
| "description": "Plan, create, and configure production-ready Azure Kubernetes Service (AKS) clusters. Covers Day-0 checklist, SKU selection (Automatic vs Standard), networking options (private API server, Azure CNI Overlay, egress configuration), security (workload identity, Azure Policy, Key Vault CSI driver, Deployment Safeguards), and operations (monitoring, upgrade strategy, autoscaling, cost analysis, node pools). WHEN: provision AKS cluster, design AKS networking, choose AKS SKU, secure AKS, set up AKS.", | ||
| "extractedKeywords": [ | ||
| "aks", | ||
| "analysis", | ||
| "automatic", | ||
| "autoscaling", | ||
| "azure", | ||
| "checklist", | ||
| "choose", | ||
| "cli", | ||
| "cluster", | ||
| "clusters", | ||
| "configuration", | ||
| "configure", | ||
| "container", | ||
| "cost", | ||
| "covers", | ||
| "create", | ||
| "day-0", | ||
| "deploy", | ||
| "deployment", | ||
| "design", | ||
| "diagnostic", | ||
| "driver", | ||
| "egress", | ||
| "entra", | ||
| "function", | ||
| "identity", | ||
| "key vault", | ||
| "kubernetes", | ||
| "mcp", | ||
| "monitor", | ||
| "monitoring", | ||
| "networking", | ||
| "node", | ||
| "observability", | ||
| "operations", | ||
| "options", | ||
| "overlay", | ||
| "plan", | ||
| "policy", | ||
| "pools", | ||
| "private", | ||
| "production-ready", | ||
| "provision", | ||
| "safeguards", | ||
| "secure", | ||
| "security", | ||
| "selection", | ||
| "server", | ||
| "service", | ||
| "standard", | ||
| "strategy", | ||
| "upgrade", | ||
| "vault", | ||
| "when", | ||
| "workload", | ||
| ], | ||
| "name": "azure-kubernetes", | ||
| } | ||
| `; | ||
|
|
||
| exports[`azure-kubernetes - Trigger Tests Trigger Keywords Snapshot skill keywords match snapshot 1`] = ` | ||
| [ | ||
| "aks", | ||
| "analysis", | ||
| "automatic", | ||
| "autoscaling", | ||
| "azure", | ||
| "checklist", | ||
| "choose", | ||
| "cli", | ||
| "cluster", | ||
| "clusters", | ||
| "configuration", | ||
| "configure", | ||
| "container", | ||
| "cost", | ||
| "covers", | ||
| "create", | ||
| "day-0", | ||
| "deploy", | ||
| "deployment", | ||
| "design", | ||
| "diagnostic", | ||
| "driver", | ||
| "egress", | ||
| "entra", | ||
| "function", | ||
| "identity", | ||
| "key vault", | ||
| "kubernetes", | ||
| "mcp", | ||
| "monitor", | ||
| "monitoring", | ||
| "networking", | ||
| "node", | ||
| "observability", | ||
| "operations", | ||
| "options", | ||
| "overlay", | ||
| "plan", | ||
| "policy", | ||
| "pools", | ||
| "private", | ||
| "production-ready", | ||
| "provision", | ||
| "safeguards", | ||
| "secure", | ||
| "security", | ||
| "selection", | ||
| "server", | ||
| "service", | ||
| "standard", | ||
| "strategy", | ||
| "upgrade", | ||
| "vault", | ||
| "when", | ||
| "workload", | ||
| ] | ||
| `; |