43 commits
b10b649
Create AKS folder and SKILL.md
julia-yin Feb 25, 2026
8ff360f
Add azure-kubernetes to skill.json
julia-yin Feb 25, 2026
cc20821
Update skills.json
julia-yin Feb 25, 2026
769631d
Fix issue of postgres skill missing from skills.json
julia-yin Feb 25, 2026
c37988c
Fix skills.json
julia-yin Feb 25, 2026
0a6efd5
Add AKS to architecture.md and testing for AKS skill
julia-yin Feb 27, 2026
eb5c9a2
Update SKILL.md
julia-yin Feb 28, 2026
31523df
Update plugin/skills/azure-kubernetes/SKILL.md
julia-yin Feb 28, 2026
be258cd
Remove trailing empty lines
julia-yin Feb 28, 2026
295a9ed
Add AKS to integration test schedule
julia-yin Feb 28, 2026
2c1d3a2
Fix pr.yaml creating leading space
julia-yin Feb 28, 2026
7d79459
Update SKILL.md
julia-yin Feb 28, 2026
afc8b05
Update triggers.test.ts.snap
julia-yin Feb 28, 2026
0d6e6ef
Add in missing best practices (ephemeral disk, auto upgrades, reliabi…
julia-yin Mar 2, 2026
2e15f00
Add security best practices
julia-yin Mar 2, 2026
bd1e5e5
Streamline and reduce token count
julia-yin Mar 2, 2026
8d8b187
Add azure-kubernetes to skills.json
julia-yin Mar 2, 2026
cfc9cc0
Fix naming issues
julia-yin Mar 2, 2026
9d12083
Update trigger and unit tests
julia-yin Mar 2, 2026
14540ae
Bump azure-prepare version to 1.0.1
julia-yin Mar 3, 2026
f6fcda1
Fix metadata.version
julia-yin Mar 3, 2026
f6dc996
Add metadata to azure-kubernetes skill
julia-yin Mar 3, 2026
453c478
Apply suggestion from @Copilot
julia-yin Mar 3, 2026
0436b19
Apply suggestion from @Copilot
julia-yin Mar 3, 2026
c65c53f
Bump azure-prepare skill version
julia-yin Mar 4, 2026
1bf5c35
Revert pr.yml
julia-yin Mar 5, 2026
3e578e0
Add license to AKS skill
julia-yin Mar 5, 2026
9144b03
Add back azure-prepare description
julia-yin Mar 5, 2026
3294051
Bump azure-prepare version to 1.0.4
julia-yin Mar 5, 2026
4c6c56a
Add back license to azure-prepare
julia-yin Mar 5, 2026
02fa831
Fix description
julia-yin Mar 5, 2026
ad0d1f3
Merge branch 'main' into main
julia-yin Mar 9, 2026
6dbda36
Fix Copilot feedback
julia-yin Mar 9, 2026
55bd882
Merge branch 'main' of https://github.com/julia-yin/GitHub-Copilot-fo…
julia-yin Mar 9, 2026
078bd9d
Update description and tests
julia-yin Mar 10, 2026
244f5dd
Remove AKS MCP and add in kubectl commands to AKS skill
julia-yin Mar 11, 2026
91839ec
Merge branch 'main' into main
julia-yin Mar 11, 2026
c1b9455
Add skip reason + fix typo in integration tests
julia-yin Mar 11, 2026
0e69ae6
Merge branch 'main' of https://github.com/julia-yin/GitHub-Copilot-fo…
julia-yin Mar 11, 2026
b8d7141
Fixed Copilot comments + bumped up version for azure prepare skill
julia-yin Mar 11, 2026
b3e960b
Update integration.test.ts
julia-yin Mar 11, 2026
b7ce4d2
Update integration.test.ts
julia-yin Mar 12, 2026
fc839ba
Revert agent-runner.ts changes from c1b9455 (debug timing logs + Perm…
julia-yin Mar 12, 2026
140 changes: 140 additions & 0 deletions plugin/skills/azure-kubernetes/SKILL.md
@@ -0,0 +1,140 @@
---
name: azure-kubernetes
license: MIT
metadata:
  author: Microsoft
  version: "1.0.0"
description: "Plan, create, and configure production-ready Azure Kubernetes Service (AKS) clusters. Covers Day-0 checklist, SKU selection (Automatic vs Standard), networking options (private API server, Azure CNI Overlay, egress configuration), security (workload identity, Azure Policy, Key Vault CSI driver, Deployment Safeguards), and operations (monitoring, upgrade strategy, autoscaling, cost analysis, node pools). WHEN: provision AKS cluster, design AKS networking, choose AKS SKU, secure AKS, set up AKS."
---

# Azure Kubernetes Service

> **AUTHORITATIVE GUIDANCE — MANDATORY COMPLIANCE**
>
> This skill produces a **recommended AKS cluster configuration** based on user requirements, distinguishing **Day-0 decisions** (networking, API server — hard to change later) from **Day-1 features** (can enable post-creation). See [CLI reference](./references/cli-reference.md) for commands.

## Quick Reference
| Property | Value |
|----------|-------|
| Best for | AKS cluster planning and Day-0 decisions |
| MCP Tools | `mcp_azure_mcp_aks` |
| CLI | `az aks create`, `az aks show`, `kubectl get`, `kubectl describe` |
| Related skills | azure-diagnostics (troubleshooting AKS), azure-deploy (app deployment) |

## When to Use This Skill
Activate this skill when user wants to:
- Create a new AKS cluster
- Plan AKS cluster configuration for production workloads
- Design AKS networking (API server access, pod IP model, egress)
- Set up AKS identity and secrets management
- Configure AKS governance (Azure Policy, Deployment Safeguards)
- Enable AKS observability (monitoring, Prometheus, Grafana)
- Define AKS upgrade and patching strategy
- Enable AKS cost visibility and analysis
- Understand AKS Automatic vs Standard SKU differences
- Get a Day-0 checklist for AKS cluster setup and configuration

## Rules
1. Start with the user's requirements for provisioning compute, networking, security, and other settings.
2. Use the `azure` MCP server and its AKS-related MCP tools (`mcp_azure_mcp_aks`, `mcp_aks_mcp_az_aks_operations`) to invoke Azure APIs and perform AKS and kubectl operations; fall back to Azure CLI (`az aks`) only when required functionality is not available via MCP tools.
3. Determine if AKS Automatic or Standard SKU is more appropriate based on the user's need for control vs convenience. Default to AKS Automatic unless specific customizations are required.
4. Document decisions and rationale for cluster configuration choices, especially for Day-0 decisions that are hard to change later (networking, API server access).


## Required Inputs (Ask only what’s needed)
If the user is unsure, use safe defaults.
- Cluster environment: dev/test or production
- Region(s), availability zones, preferred node VM sizes
- Expected scale (node/cluster count, workload size)
- Networking requirements (API server access, pod IP model, ingress/egress control)
- Security and identity requirements, including image registry
- Upgrade and observability preferences
- Cost constraints

## Workflow

### 1. Cluster Type
- **AKS Automatic** (default): Best for most production workloads; provides a curated experience with pre-configured best practices for security, reliability, and performance. Use unless you have specific custom requirements for networking, autoscaling, or node pool configurations not supported by Node Auto-Provisioning (NAP).
- **AKS Standard**: Use if you need full control over cluster configuration; expect additional overhead to set up and manage.

### 2. Networking (Pod IP, Egress, Ingress, Dataplane)

**Pod IP Model** (Key Day-0 decision):
- **Azure CNI Overlay** (recommended): pod IPs come from a private overlay range that is not VNet-routable; scales to large clusters and suits most workloads
- **Azure CNI (VNet-routable)**: pod IPs come directly from the VNet (pod subnet or node subnet); use when pods must be directly addressable from the VNet or on-premises
- Docs: https://learn.microsoft.com/azure/aks/azure-cni-overlay
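
As a rough sizing aid for the overlay model above: Azure CNI Overlay assigns each node a /24 block carved out of the pod CIDR, so the pod CIDR prefix puts an upper bound on the node count. The sketch below is illustrative arithmetic only. The function name `overlay_max_nodes` is hypothetical, and real clusters are also subject to AKS node limits.

```shell
# Hypothetical helper: upper bound on nodes for an Azure CNI Overlay pod CIDR,
# assuming each node is assigned one /24 block from that CIDR.
overlay_max_nodes() {
  prefix="$1"
  if [ "$prefix" -lt 1 ] || [ "$prefix" -gt 24 ]; then
    echo "pod CIDR prefix must be between 1 and 24" >&2
    return 1
  fi
  # The number of /24 blocks inside a /prefix range is 2^(24 - prefix).
  echo $(( 1 << (24 - prefix) ))
}

overlay_max_nodes 16   # a /16 pod CIDR holds 256 /24 blocks
overlay_max_nodes 20   # a /20 pod CIDR holds 16 /24 blocks
```

Because the pod CIDR is a Day-0 choice, it is usually safer to size it for the largest node count the cluster might reach rather than the initial one.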

**Dataplane & Network Policy**:
- **Azure CNI powered by Cilium** (recommended): eBPF-based for high-performance packet processing, network policies, and observability

**Egress**:
- **Static Egress Gateway** for stable, predictable outbound IPs
- For restricted egress: UDR + Azure Firewall or NVA

**Ingress**:
- **App Routing addon with Gateway API** — recommended default for HTTP/HTTPS workloads
- **Istio service mesh with Gateway API** — for advanced traffic management, mTLS, canary deployments
- **Application Gateway for Containers** — for L7 load balancing with WAF integration

**DNS**:
- Enable **LocalDNS** on all node pools for reliable, performant DNS resolution

### 3. Security
- Use **Microsoft Entra ID** everywhere (control plane, Workload Identity for pods, node access). Avoid static credentials.
- Azure Key Vault via **Secrets Store CSI Driver** for secrets
- Enable **Azure Policy** + **Deployment Safeguards**
- Enable **Encryption at rest** for etcd/API server; **in-transit** for node-to-node
- Allow only signed, policy-approved images (Azure Policy + Ratify); prefer **Azure Container Registry**
- **Isolation**: Use namespaces, network policies, scoped logging

### 4. Observability
- Enable AKS monitoring with Azure Monitor and Container Insights (logs, Prometheus metrics, Grafana dashboards).

### 5. Upgrades & Patching
- Configure **Maintenance Windows** for controlled upgrade timing
- Enable **auto-upgrades** for cluster and node OS to stay up-to-date with security patches and Kubernetes versions
- Consider **LTS versions** (2-year support) for enterprise stability; LTS requires the AKS Premium tier
- **Multi-cluster upgrades**: Use **AKS Fleet Manager** for staged rollout across test → production clusters
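
A minimal sketch of how the auto-upgrade and maintenance window settings above might be applied with the Azure CLI. The cluster and resource group values are placeholders, the `run` helper is a hypothetical wrapper that prints commands instead of executing them, and the channel choices (`stable`, `NodeImage`) and the Saturday schedule are example assumptions, not recommendations from this skill.

```shell
# Placeholder values (assumptions); replace with your own.
CLUSTER="<cluster-name>"
RG="<resource-group>"

# Hypothetical helper: print commands by default so they can be reviewed;
# set DRY_RUN=0 to execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Opt the cluster and the node OS into automatic upgrades.
run az aks update --name "$CLUSTER" --resource-group "$RG" \
  --auto-upgrade-channel stable --node-os-upgrade-channel NodeImage

# Restrict automatic cluster upgrades to a weekly window (example schedule).
run az aks maintenanceconfiguration add --cluster-name "$CLUSTER" --resource-group "$RG" \
  --name aksManagedAutoUpgradeSchedule --schedule-type Weekly \
  --day-of-week Saturday --interval-weeks 1 --start-time 02:00 --duration 4
```

Printing first keeps the sketch safe to paste into a shell; the same commands run for real once `DRY_RUN=0` is set and the placeholders are filled in.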

### 6. Performance
- Use **Ephemeral OS disks** (`--node-osdisk-type Ephemeral`) for faster node startup
- Select **Azure Linux** as node OS (smaller footprint, faster boot)
- Enable **KEDA** for event-driven autoscaling beyond HPA

### 7. Node Pools & Compute
- **Dedicated system node pool**: At least 2 nodes, tainted with `CriticalAddonsOnly=true:NoSchedule` so that only system workloads run there
- Enable **Node Auto-Provisioning (NAP)** on all pools for cost savings and responsive scaling
- Use **latest generation SKUs (v5/v6)** for host-level optimizations
- **Avoid B-series VMs** — burstable SKUs cause performance/reliability issues
- Use SKUs with **at least 4 vCPUs** for production workloads
- Set **topology spread constraints** to distribute pods across hosts/zones per SLO

### 8. Reliability
- Deploy across **3 Availability Zones** (`--zones 1 2 3`)
- Use **Standard tier** for zone-redundant control plane + 99.95% SLA for API server availability
- Enable **Microsoft Defender for Containers** for runtime protection
- Configure **PodDisruptionBudgets** for all production workloads
- Use **topology spread constraints** to ensure pod distribution across failure domains
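
To make the PodDisruptionBudget item above concrete, here is a small sketch that writes a manifest to a file for review. The app name `web`, the `default` namespace, and `minAvailable: 2` are illustrative assumptions (the value presumes at least three replicas); adjust them to the workload.

```shell
# Write a minimal PodDisruptionBudget manifest for a hypothetical "web" app.
# The label, namespace, and minAvailable value are illustrative assumptions.
PDB_FILE="${TMPDIR:-/tmp}/web-pdb.yaml"
cat > "$PDB_FILE" <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
  namespace: default
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web
EOF

echo "Wrote $PDB_FILE; review it, then apply with: kubectl apply -f $PDB_FILE"
```

With this budget in place, voluntary disruptions such as node drains during upgrades are blocked whenever they would leave fewer than two matching pods available.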

### 9. Cost Controls
- Use **Spot node pools** for batch/interruptible workloads (up to 90% savings)
- **Stop/Start** dev/test clusters: `az aks stop` / `az aks start`
- Consider **Reserved Instances** or **Savings Plans** for steady-state workloads
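
The cost controls above could be sketched as follows. The script only prints the commands for review and does not call Azure. The helper name `aks_power_cmd`, the pool name `spotpool`, and the autoscaler bounds are hypothetical, and the cluster and resource group are placeholders.

```shell
# Placeholder values (assumptions); replace with your own.
CLUSTER="<cluster-name>"
RG="<resource-group>"

# Hypothetical helper: emit the command that stops or starts a dev/test cluster.
# Only "stop" and "start" are accepted.
aks_power_cmd() {
  case "$1" in
    stop|start)
      printf 'az aks %s --name %s --resource-group %s\n' "$1" "$CLUSTER" "$RG" ;;
    *)
      echo "usage: aks_power_cmd stop|start" >&2
      return 1 ;;
  esac
}

aks_power_cmd stop    # e.g. at the end of the working day
aks_power_cmd start   # e.g. before the next working session

# Command for a Spot user node pool that can scale to zero when idle.
printf '%s\n' "az aks nodepool add --cluster-name $CLUSTER --resource-group $RG \
--name spotpool --mode User --priority Spot --eviction-policy Delete \
--spot-max-price -1 --enable-cluster-autoscaler --min-count 0 --max-count 5"
```

Spot nodes can be evicted at any time, so this pool shape fits only the batch or interruptible workloads named in the list above.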

## Guardrails / Safety
- Do not request or output secrets (tokens, keys, subscription IDs).
- If requirements are ambiguous for Day-0 critical decisions, ask the user clarifying questions. For Day-1 features, propose 2–3 safe options with tradeoffs and choose a conservative default.
- Do not promise zero downtime; advise workload safeguards (PDBs, probes, replicas) and staged upgrades along with best practices for reliability and performance.

## MCP Tools
| Tool | Purpose | Key Parameters |
|------|---------|----------------|
| `mcp_azure_mcp_aks` | Create and query AKS clusters at subscription scope | `subscription_id`, `resource_group` |

## Error Handling
| Error / Symptom | Likely Cause | Remediation |
|-----------------|--------------|-------------|
| MCP tool call fails or times out | Invalid credentials, subscription, or cluster context | Verify `az login`, check subscription ID and resource group |
| Quota exceeded | Regional vCPU or resource limits | Request quota increase or select different region/VM SKU |
| Networking conflict (IP exhaustion) | Pod subnet too small for overlay/CNI | Re-plan IP ranges; may require cluster recreation (Day-0) |
| Workload Identity not working | Missing OIDC issuer or federated credential | Enable `--enable-oidc-issuer --enable-workload-identity`, configure federated identity |
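
For the Workload Identity row above, a hedged sketch of the remediation steps. A federated credential's subject must match the pod's service account in the form `system:serviceaccount:<namespace>:<name>`. The helper `wi_subject` and the `default` / `workload-sa` values are hypothetical; the Azure CLI commands are printed for review rather than executed, and all angle-bracket values are placeholders.

```shell
# Hypothetical helper: build the federated credential subject for a
# Kubernetes service account.
wi_subject() {
  namespace="$1"
  service_account="$2"
  printf 'system:serviceaccount:%s:%s\n' "$namespace" "$service_account"
}

# Example namespace and service account (assumptions).
SUBJECT="$(wi_subject default workload-sa)"
echo "$SUBJECT"

# Commands to run once the subject is known (printed for review, not executed).
cat <<EOF
az aks update --name <cluster-name> --resource-group <resource-group> --enable-oidc-issuer --enable-workload-identity
az aks show --name <cluster-name> --resource-group <resource-group> --query oidcIssuerProfile.issuerUrl --output tsv
az identity federated-credential create --name <credential-name> --identity-name <identity-name> --resource-group <resource-group> --issuer <issuer-url> --subject $SUBJECT --audiences api://AzureADTokenExchange
EOF
```

A subject that names the wrong namespace or service account is a common cause of token exchange failures, which is why the sketch derives it from the two names instead of typing it by hand.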
33 changes: 33 additions & 0 deletions plugin/skills/azure-kubernetes/references/cli-reference.md
@@ -0,0 +1,33 @@
# CLI Reference for AKS

```bash
# List AKS clusters
az aks list --output table

# Show cluster details
az aks show --name <cluster-name> --resource-group <resource-group>

# Get available Kubernetes versions
az aks get-versions --location <location> --output table

# Create AKS Automatic cluster
az aks create --name <cluster-name> --resource-group <resource-group> --sku automatic \
--network-plugin azure --network-plugin-mode overlay \
--enable-oidc-issuer --enable-workload-identity

# Create AKS Standard cluster
az aks create --name <cluster-name> --resource-group <resource-group> \
--node-count 3 --zones 1 2 3 \
--network-plugin azure --network-plugin-mode overlay \
--enable-cluster-autoscaler --min-count 1 --max-count 10

# Get credentials
az aks get-credentials --name <cluster-name> --resource-group <resource-group>

# List node pools
az aks nodepool list --cluster-name <cluster-name> --resource-group <resource-group> --output table

# Enable monitoring
az aks enable-addons --name <cluster-name> --resource-group <resource-group> \
--addons monitoring --workspace-resource-id <workspace-resource-id>
```
2 changes: 1 addition & 1 deletion plugin/skills/azure-prepare/SKILL.md
@@ -4,7 +4,7 @@ description: "Prepare Azure apps for deployment (infra Bicep/Terraform, azure.ya
license: MIT
metadata:
  author: Microsoft
-  version: "1.0.6"
+  version: "1.0.7"
---

# Azure Prepare
38 changes: 33 additions & 5 deletions plugin/skills/azure-prepare/references/architecture.md
@@ -22,18 +22,46 @@ Select hosting stack and map components to Azure services.
| Workflow / orchestration | | ✓✓ (Durable Functions + DTS) | |
| Minimal ops overhead | | ✓✓ | ✓ |

### Container Hosting: Container Apps vs AKS

| Factor | Container Apps | AKS |
|--------|:--------------:|:---:|
| **Scale to zero** | ✓✓ | |
| **Kubernetes API access** | | ✓✓ |
| **Custom operators/CRDs** | | ✓✓ |
| **Service mesh** | Dapr (built-in) | Istio, Cilium |
| **GPU workloads** | | ✓✓ |
| **Best for** | Microservices, event-driven | Full K8s control, complex workloads |

#### When to Use Container Apps
- Microservices without Kubernetes complexity
- Event-driven workloads (KEDA built-in)
- Need scale-to-zero for cost optimization
- Teams without Kubernetes expertise

#### When to Use AKS
- Need Kubernetes API/kubectl access
- Require custom operators or CRDs
- Service mesh requirements (Istio, Linkerd)
- GPU/ML workloads
- Complex networking or multi-tenant architectures

> **AKS Planning:** For AKS SKU selection (Automatic vs Standard), networking, identity, scaling, and security configuration, invoke the **azure-kubernetes** skill.

## Service Mapping

### Hosting

| Component Type | Primary Service | Alternatives |
|----------------|-----------------|--------------|
| SPA Frontend | Static Web Apps | Blob + CDN |
-| SSR Web App | Container Apps | App Service |
-| REST/GraphQL API | Container Apps | App Service, Functions |
-| Background Worker | Container Apps | Functions |
-| Scheduled Task | Functions (Timer) | Container Apps Jobs |
-| Event Processor | Functions | Container Apps |
+| SSR Web App | Container Apps | App Service, AKS |
+| REST/GraphQL API | Container Apps | App Service, Functions, AKS |
+| Background Worker | Container Apps | Functions, AKS |
+| Scheduled Task | Functions (Timer) | Container Apps Jobs, Kubernetes CronJob (on AKS) |
+| Event Processor | Functions | Container Apps, AKS + KEDA |
+| Microservices (full K8s) | AKS | Container Apps |
+| GPU/ML Workloads | AKS | Azure ML |

### Data

125 changes: 125 additions & 0 deletions tests/azure-kubernetes/__snapshots__/triggers.test.ts.snap
@@ -0,0 +1,125 @@
// Jest Snapshot v1, https://goo.gl/fbAQLP

exports[`azure-kubernetes - Trigger Tests Trigger Keywords Snapshot skill description triggers match snapshot 1`] = `
{
"description": "Plan, create, and configure production-ready Azure Kubernetes Service (AKS) clusters. Covers Day-0 checklist, SKU selection (Automatic vs Standard), networking options (private API server, Azure CNI Overlay, egress configuration), security (workload identity, Azure Policy, Key Vault CSI driver, Deployment Safeguards), and operations (monitoring, upgrade strategy, autoscaling, cost analysis, node pools). WHEN: provision AKS cluster, design AKS networking, choose AKS SKU, secure AKS, set up AKS.",
"extractedKeywords": [
"aks",
"analysis",
"automatic",
"autoscaling",
"azure",
"checklist",
"choose",
"cli",
"cluster",
"clusters",
"configuration",
"configure",
"container",
"cost",
"covers",
"create",
"day-0",
"deploy",
"deployment",
"design",
"diagnostic",
"driver",
"egress",
"entra",
"function",
"identity",
"key vault",
"kubernetes",
"mcp",
"monitor",
"monitoring",
"networking",
"node",
"observability",
"operations",
"options",
"overlay",
"plan",
"policy",
"pools",
"private",
"production-ready",
"provision",
"safeguards",
"secure",
"security",
"selection",
"server",
"service",
"standard",
"strategy",
"upgrade",
"vault",
"when",
"workload",
],
"name": "azure-kubernetes",
}
`;

exports[`azure-kubernetes - Trigger Tests Trigger Keywords Snapshot skill keywords match snapshot 1`] = `
[
"aks",
"analysis",
"automatic",
"autoscaling",
"azure",
"checklist",
"choose",
"cli",
"cluster",
"clusters",
"configuration",
"configure",
"container",
"cost",
"covers",
"create",
"day-0",
"deploy",
"deployment",
"design",
"diagnostic",
"driver",
"egress",
"entra",
"function",
"identity",
"key vault",
"kubernetes",
"mcp",
"monitor",
"monitoring",
"networking",
"node",
"observability",
"operations",
"options",
"overlay",
"plan",
"policy",
"pools",
"private",
"production-ready",
"provision",
"safeguards",
"secure",
"security",
"selection",
"server",
"service",
"standard",
"strategy",
"upgrade",
"vault",
"when",
"workload",
]
`;