feat(kubernetes): add service account name to KubernetesRunner configuration #64
Changes: AWS IRSA Support and Kubernetes Security Enhancements
Overview
This document describes two critical enhancements to Zoe's Kubernetes runner that enable secure access to AWS resources (like MSK) and compliance with modern Kubernetes security policies.
1. AWS IRSA (IAM Roles for Service Accounts) Support
What is IRSA?
IRSA is an AWS EKS feature that lets Kubernetes pods assume IAM roles without static AWS credentials being injected into the pods.
Instead, EKS maps an annotated Kubernetes service account to an IAM role, and pods that run under that service account automatically receive temporary AWS credentials from AWS STS (Security Token Service).
Changes Made
1.1. KubernetesRunner Configuration (zoe-service/src/runners/kubernetes.kt)
Added serviceAccountName field to the Config data class:
Modified pod generation to set the service account:
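The effect of the pod-generation change can be pictured with a small sketch. This is not Zoe's Kotlin implementation: the function name with_service_account and the plain-dict pod model are assumptions made for illustration. Only spec.serviceAccountName is an actual Kubernetes pod field.

```python
import copy
from typing import Any, Optional


def with_service_account(pod: dict[str, Any], service_account_name: Optional[str]) -> dict[str, Any]:
    """Return a copy of a pod manifest with spec.serviceAccountName set.

    Hypothetical helper: when no name is configured the manifest is left
    as it was, so Kubernetes falls back to the namespace's default
    service account.
    """
    result = copy.deepcopy(pod)
    if service_account_name:
        result.setdefault("spec", {})["serviceAccountName"] = service_account_name
    return result


pod = {"metadata": {"name": "zoe-example"}, "spec": {"containers": []}}
print(with_service_account(pod, "zoe-service-account")["spec"])
print(with_service_account(pod, None)["spec"])
```

Treating the name as optional keeps existing configurations working, since a manifest without the field is still valid.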
1.2. CLI Configuration (zoe-cli/src/config/config.kt)
Extended KubernetesRunnerConfig:
Usage Example
Example configuration file (examples/config/kubernetes/service-account-example.yml):
Required Kubernetes Service Account setup:
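As a rough sketch of that setup, the snippet below builds a ServiceAccount manifest carrying the eks.amazonaws.com/role-arn annotation, together with the subject string that the role's trust policy condition has to match. The namespace, service account name, and account ID are taken from the IAM trust policy in this section. The role name zoe-msk-access and both helper names are placeholders, not values from the PR.

```python
import json


def service_account_manifest(name: str, namespace: str, role_arn: str) -> dict:
    """Build a ServiceAccount manifest annotated for IRSA."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # EKS reads this annotation to decide which IAM role the pod may assume.
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


def trust_policy_subject(name: str, namespace: str) -> str:
    """Subject ("sub") value that the trust policy condition must equal."""
    return f"system:serviceaccount:{namespace}:{name}"


manifest = service_account_manifest(
    name="zoe-service-account",
    namespace="zoe-namespace",
    role_arn="arn:aws:iam::123456789012:role/zoe-msk-access",  # placeholder role name
)
print(json.dumps(manifest, indent=2))
print(trust_policy_subject("zoe-service-account", "zoe-namespace"))
```

kubectl accepts JSON manifests as well as YAML, so the printed document could be saved to a file and applied with kubectl apply -f.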
IAM Role Trust Policy:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.region.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
          },
          "Action": "sts:AssumeRoleWithWebIdentity",
          "Condition": {
            "StringEquals": {
              "oidc.eks.region.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub": "system:serviceaccount:zoe-namespace:zoe-service-account"
            }
          }
        }
      ]
    }
Benefits
✅ No credential management: No need to inject AWS credentials into pods
✅ Fine-grained access control: Each service account can have different IAM permissions
✅ Audit trail: AWS CloudTrail logs show which service account made each API call
✅ Automatic credential rotation: STS credentials are automatically refreshed
✅ Works with MSK IAM authentication: Enables secure access to AWS MSK clusters
2. Kubernetes Security Context Enhancements
The Problem
Modern Kubernetes clusters (especially EKS and GKE in production environments) often enforce strict security policies through admission control, for example the Gatekeeper admission webhook or the built-in Pod Security Admission controller. These policies reject pods that don't meet minimum security standards.
Original error encountered:
Root Cause
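The original pod.template.json had no securityContext definitions, which meant:
- allowPrivilegeEscalation was left unset and therefore effectively true, where the policy requires false
- no seccomp profile was requested, where the policy requires RuntimeDefault
Changes Made to pod.template.json
2.1. Pod-level Security Context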
Added to spec.securityContext:
    {
      "spec": {
        "securityContext": {
          "runAsNonRoot": true,          // Prevents running as root
          "runAsUser": 1000,             // Forces non-privileged UID
          "fsGroup": 1000,               // Ensures shared volume access
          "seccompProfile": {
            "type": "RuntimeDefault"     // Enables syscall filtering
          }
        },
        // ... rest of spec
      }
    }
Why this matters:
- fsGroup gives every container in the pod access to the shared /output volume
- The RuntimeDefault seccomp profile applies the container runtime's default syscall filter, restricting calls such as ptrace, reboot, mount, etc.
2.2. Container-level Security Contexts
Added to each container (init container create-output-file, main container zoe, and sidecar tailer):
    {
      "securityContext": {
        "allowPrivilegeEscalation": false,  // Blocks setuid/setgid exploits
        "capabilities": {
          "drop": ["ALL"]                   // Removes all Linux capabilities
        },
        "runAsNonRoot": true,
        "runAsUser": 1000,
        "seccompProfile": {
          "type": "RuntimeDefault"
        }
      }
    }
Linux Capabilities Dropped:
By dropping ALL capabilities, we remove privileges like:
- CAP_NET_RAW: Creating raw sockets (network sniffing)
- CAP_SYS_ADMIN: Mounting filesystems and a broad range of other administrative operations
- CAP_DAC_OVERRIDE: Bypassing file permissions
- CAP_KILL: Sending signals to arbitrary processes
- CAP_SETUID/CAP_SETGID: Changing user/group IDs
Why Zoe doesn't need these capabilities:
- Its containers work with files on the /output volume (for example, the init container runs touch /output/response.txt)
None of these operations require special Linux capabilities.
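One way to sanity-check a rendered pod manifest against the container-level settings described above is a small script like the following. It is only a sketch: the function name security_gaps and the list of checks are assumptions that mirror the fields in this section. They are not the full Pod Security Standards and not part of Zoe.

```python
from typing import Any

# Boolean container-level settings described in this section.
REQUIRED = {
    "allowPrivilegeEscalation": False,
    "runAsNonRoot": True,
}


def security_gaps(pod: dict[str, Any]) -> list[str]:
    """List the settings missing from each init and regular container."""
    spec = pod.get("spec", {})
    gaps = []
    for container in spec.get("initContainers", []) + spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext", {})
        for key, expected in REQUIRED.items():
            if ctx.get(key) is not expected:
                gaps.append(f"{name}: {key} must be {str(expected).lower()}")
        if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
            gaps.append(f"{name}: capabilities.drop must include ALL")
        if ctx.get("seccompProfile", {}).get("type") != "RuntimeDefault":
            gaps.append(f"{name}: seccompProfile.type must be RuntimeDefault")
    return gaps


hardened = {
    "allowPrivilegeEscalation": False,
    "capabilities": {"drop": ["ALL"]},
    "runAsNonRoot": True,
    "runAsUser": 1000,
    "seccompProfile": {"type": "RuntimeDefault"},
}
pod = {
    "spec": {
        # The init container has no securityContext, like the original template.
        "initContainers": [{"name": "create-output-file"}],
        "containers": [
            {"name": "zoe", "securityContext": hardened},
            {"name": "tailer", "securityContext": hardened},
        ],
    }
}
for gap in security_gaps(pod):
    print(gap)
```

The check looks only at container-level fields and ignores pod-level values that containers would inherit, so it is stricter than Kubernetes itself for runAsNonRoot and seccompProfile.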
Security Benefits
✅ Least privilege principle: Pods run with minimal permissions
✅ Defense in depth: Security at both pod and container levels
✅ Exploit mitigation: Even if a container is compromised, damage is limited
✅ Compliance: Supports the container-hardening controls commonly required under PCI-DSS, SOC 2, and similar security standards
✅ Production-ready: Compatible with hardened Kubernetes clusters
Before and After Comparison
Before (INSECURE):
    {
      "name": "create-output-file",
      "image": "alpine:3.9.5",
      "command": ["touch", "/output/response.txt"],
      "volumeMounts": [...]
      // ❌ No securityContext - runs with default privileges
    }
After (SECURE):
    {
      "name": "create-output-file",
      "image": "alpine:3.9.5",
      "command": ["touch", "/output/response.txt"],
      "volumeMounts": [...],
      "securityContext": {
        "allowPrivilegeEscalation": false,
        "capabilities": {
          "drop": ["ALL"]
        },
        "runAsNonRoot": true,
        "runAsUser": 1000,
        "seccompProfile": {
          "type": "RuntimeDefault"
        }
      }
    }
Testing
Test Coverage
Added comprehensive tests in zoe-service/test/runners/KubernetesRunnerTest.kt:
Manual Testing
Deployment Considerations
For IRSA Support
- The Kubernetes service account must carry the eks.amazonaws.com/role-arn annotation
For Security Contexts
✅ No additional setup required - these changes make Zoe compatible with secured clusters by default
- If an image needs a UID other than 1000, change runAsUser in the template.
Backward Compatibility
✅ Fully backward compatible:
- serviceAccountName is optional - if not specified, pods use the default service account