44 changes: 44 additions & 0 deletions docs/architecture.md
@@ -235,6 +235,50 @@ They can be set in the [conf](../conf/spark-defaults.conf) file.
Annotations should be in the format key=value, e.g. `nginx.ingress.kubernetes.io/rewrite-target=/`.
- `spark.armada.driver.ingress.certName` - The name of the TLS certificate to use for the Ingress resource.
This is used when `spark.armada.driver.ingress.tls.enabled` is set to true.
- `spark.armada.driver.ingress.port` - The port to expose via Ingress. If not set, defaults to OAuth proxy port (if enabled) or Spark UI port.
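
The fallback order for `spark.armada.driver.ingress.port` can be sketched as a small shell function. This only illustrates the documented precedence and is not the project's implementation. The function name is hypothetical, and the default ports 4180 (OAuth proxy) and 4040 (Spark UI) are assumptions taken from the examples in the UI access documentation.

```bash
# Hypothetical helper illustrating the documented precedence:
# explicit ingress port > OAuth proxy port (if OAuth is enabled) > Spark UI port.
resolve_ingress_port() {
  ingress_port="$1"        # spark.armada.driver.ingress.port, may be empty
  oauth_enabled="$2"       # spark.armada.oauth.enabled ("true" or "false")
  oauth_port="${3:-4180}"  # assumed OAuth proxy port
  ui_port="${4:-4040}"     # assumed Spark UI port

  if [ -n "$ingress_port" ]; then
    echo "$ingress_port"
  elif [ "$oauth_enabled" = "true" ]; then
    echo "$oauth_port"
  else
    echo "$ui_port"
  fi
}

resolve_ingress_port "" true     # no explicit port, OAuth enabled
resolve_ingress_port "" false    # no explicit port, OAuth disabled
resolve_ingress_port 8443 true   # explicit port wins
```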

### OAuth2 Authentication Configuration

`armada-spark` supports OAuth2-based authentication for the Spark Driver WebUI using OAuth2-Proxy as a native sidecar.
For detailed setup instructions and examples, see [UI Access Documentation](./ui.md).

- `spark.armada.oauth.enabled` - Enable OAuth2 authentication for Spark UI.
- `spark.armada.oauth.clientId` - OAuth2 client ID.
- `spark.armada.oauth.clientSecret` - OAuth2 client secret.
- `spark.armada.oauth.clientSecretK8s` - Name of Kubernetes secret containing client secret.
- `spark.armada.oauth.issuerUrl` - OIDC issuer URL.
- `spark.armada.oauth.redirectUrl` - OAuth redirect URL.
- `spark.armada.oauth.proxy.image` - OAuth2-proxy Docker image.
- `spark.armada.oauth.proxy.port` - Port for OAuth2-proxy to listen on.
- `spark.armada.oauth.providerDisplayName` - Provider name shown in OAuth UI.
- `spark.armada.oauth.skipProviderDiscovery` - Skip OIDC discovery and use explicit endpoints.
- `spark.armada.oauth.loginUrl` - OIDC authorization endpoint.
- `spark.armada.oauth.redeemUrl` - OIDC token endpoint.
- `spark.armada.oauth.validateUrl` - OIDC userinfo endpoint.
- `spark.armada.oauth.jwksUrl` - OIDC JWKS endpoint.
- `spark.armada.oauth.extraAudiences` - Comma-separated list of additional OIDC audiences.
- `spark.armada.oauth.emailDomain` - Allowed email domains.
- `spark.armada.oauth.skipJwtBearerTokens` - Skip JWT bearer token validation.
- `spark.armada.oauth.skipProviderButton` - Skip provider selection button.
- `spark.armada.oauth.skipAuthPreflight` - Skip authentication for OPTIONS requests.
- `spark.armada.oauth.passHostHeader` - Pass Host header to upstream.
- `spark.armada.oauth.whitelistDomain` - Whitelist redirect domains.
- `spark.armada.oauth.cookieName` - OAuth session cookie name.
- `spark.armada.oauth.cookiePath` - Cookie path.
- `spark.armada.oauth.cookieSecure` - Require HTTPS for cookies.
- `spark.armada.oauth.cookieSamesite` - SameSite cookie attribute.
- `spark.armada.oauth.cookieCsrfPerRequest` - Enable CSRF per request.
- `spark.armada.oauth.cookieCsrfExpire` - CSRF cookie expiration duration.
- `spark.armada.oauth.tls.caCertPath` - Path to CA certificate for custom TLS validation.
- `spark.armada.oauth.tls.caBundlePath` - Path to CA bundle for custom TLS validation.
- `spark.armada.oauth.skipVerify` - Skip TLS certificate verification.
- `spark.armada.oauth.insecureSkipIssuerVerification` - Skip OIDC issuer verification.
- `spark.armada.oauth.insecureAllowUnverifiedEmail` - Allow unverified email addresses.
- `spark.armada.oauth.codeChallengeMethod` - PKCE code challenge method.
- `spark.armada.oauth.resources.cpu` - CPU resource limit/request for OAuth proxy.
- `spark.armada.oauth.resources.memory` - Memory resource limit/request for OAuth proxy.

See [UI Access Documentation](./ui.md) for examples and troubleshooting.

# Building `armada-spark`

140 changes: 140 additions & 0 deletions docs/ui.md
@@ -0,0 +1,140 @@
# Spark Driver UI Access

## Direct Access

### Port-forward (simplest)

```bash
kubectl -n <namespace> port-forward <driver-pod-name> 4040:4040
```

Then open: `http://localhost:4040`

**Finding the pod:** Check Lookout UI for job details. Pod name is typically `armada-<job-id>-0`.
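
As a convenience, the naming pattern and the port-forward command can be wrapped in two small functions. This is a minimal sketch: the function names are hypothetical, and the pod name is derived from the "typical" pattern noted above, which may not hold in every cluster.

```bash
# Hypothetical helper: derives the typical driver pod name from an Armada job ID.
driver_pod_name() {
  # $1: Armada job ID as shown in Lookout
  echo "armada-$1-0"
}

# Hypothetical helper: forwards a local port to the Spark UI on the driver pod.
forward_spark_ui() {
  # $1: namespace, $2: Armada job ID, $3: local port (defaults to 4040)
  kubectl -n "$1" port-forward "$(driver_pod_name "$2")" "${3:-4040}:4040"
}

# Example (requires kubectl access to the cluster):
#   forward_spark_ui <namespace> <job-id>
```

If the derived name does not match a running pod, fall back to the pod name shown in Lookout.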

### Basic Ingress (no auth)

```bash
--conf spark.armada.driver.ingress.enabled=true
```

**Warning:** This exposes the Spark UI through the Ingress without any authentication. Anyone who can reach the Ingress host can view the UI.

---

## OAuth2-Protected Access

Uses [oauth2-proxy](https://oauth2-proxy.github.io/oauth2-proxy/) as a native sidecar for authentication.

### Quick Start

```bash
/opt/spark/bin/spark-class org.apache.spark.deploy.ArmadaSparkSubmit \
--master armada://localhost:50051 \
--deploy-mode cluster \
--name my-secure-job \
--class org.apache.spark.examples.SparkPi \
--conf spark.armada.container.image=armada-spark \
--conf spark.armada.oauth.enabled=true \
--conf spark.armada.oauth.clientId=spark-oauth-client \
--conf spark.armada.oauth.clientSecret=your-secret \
--conf spark.armada.oauth.issuerUrl=https://keycloak.example.com/realms/spark \
--conf spark.armada.driver.ingress.enabled=true \
--conf spark.armada.driver.ingress.tls.enabled=true \
--conf spark.armada.driver.ingress.certName=my-tls-cert \
local:///opt/spark/examples/jars/spark-examples.jar
```

**What happens:**
1. An `oauth` sidecar container is added to the driver pod.
2. Requests flow: Ingress → oauth2-proxy (port 4180) → user authentication → Spark UI (`localhost:4040`).
3. oauth2-proxy terminates when the driver completes.

### Configuration Examples

See [OAuth2 Authentication Configuration](./architecture.md#oauth2-authentication-configuration) for all parameters.

**Using OIDC discovery:**
```bash
--conf spark.armada.oauth.enabled=true \
--conf spark.armada.oauth.clientId=my-client \
--conf spark.armada.oauth.clientSecret=my-secret \
--conf spark.armada.oauth.issuerUrl=https://provider.com/realms/spark \
--conf spark.armada.driver.ingress.enabled=true \
--conf spark.armada.driver.ingress.tls.enabled=true
```

**Manual endpoints (no discovery):**
```bash
--conf spark.armada.oauth.enabled=true \
--conf spark.armada.oauth.skipProviderDiscovery=true \
--conf spark.armada.oauth.loginUrl=https://provider.com/auth \
--conf spark.armada.oauth.redeemUrl=http://provider.svc.cluster.local/token \
--conf spark.armada.oauth.validateUrl=http://provider.svc.cluster.local/userinfo \
--conf spark.armada.oauth.jwksUrl=http://provider.svc.cluster.local/certs
```

**Use cluster-internal URLs** for `redeemUrl`/`validateUrl`/`jwksUrl`, external URL for `loginUrl`.
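
The split follows from who calls each endpoint: the user's browser is redirected to `loginUrl`, so it must resolve outside the cluster, while the proxy sidecar itself calls the token, userinfo, and JWKS endpoints from inside the cluster. The sketch below prints the flags for that split. The function name is hypothetical, and the `/auth`, `/token`, `/userinfo`, and `/certs` paths simply mirror the placeholder example above; substitute the paths your provider exposes.

```bash
# Hypothetical helper: prints manual-endpoint flags for a provider reachable
# at an external base URL (browser) and an internal base URL (cluster).
# Endpoint paths are placeholders copied from the example above.
oauth_manual_endpoint_conf() {
  external_base="$1"   # e.g. https://provider.com
  internal_base="$2"   # e.g. http://provider.svc.cluster.local
  printf '%s\n' \
    "--conf spark.armada.oauth.skipProviderDiscovery=true" \
    "--conf spark.armada.oauth.loginUrl=${external_base}/auth" \
    "--conf spark.armada.oauth.redeemUrl=${internal_base}/token" \
    "--conf spark.armada.oauth.validateUrl=${internal_base}/userinfo" \
    "--conf spark.armada.oauth.jwksUrl=${internal_base}/certs"
}

oauth_manual_endpoint_conf https://provider.com http://provider.svc.cluster.local
```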

**Using K8s secrets (recommended):**
```bash
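# Create the secret that holds the client secret:
kubectl create secret generic spark-oauth-secret \
--from-literal=client-secret=your-secret -n spark-jobs

# Then reference the secret by name in the submit flags:
--conf spark.armada.oauth.clientId=my-client \
--conf spark.armada.oauth.clientSecretK8s=spark-oauth-secret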
```

---

## Troubleshooting

### 502 Bad Gateway after login

**Cause:** Spark UI not running (job finished too quickly or UI disabled)

**Check logs:**
```bash
kubectl logs -n <namespace> <driver-pod> -c oauth
```

Look for: `Error proxying to upstream server: dial tcp 127.0.0.1:4040: connect: connection refused`

**Solutions:**
- Use longer-running job (Spark Pi finishes in seconds)
- Spark UI has 90s delay after job completion by default
- Verify `spark.ui.enabled=true` (default)
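
To check for this condition without reading the logs by hand, the log line above can be matched with `grep`. This is a minimal sketch; the function name is hypothetical, and it assumes the Spark UI listens on `127.0.0.1:4040` as in the message shown.

```bash
# Hypothetical check: reads oauth sidecar logs on stdin and succeeds if the
# proxy reported that it could not connect to the Spark UI on 127.0.0.1:4040.
upstream_refused() {
  grep -qF 'dial tcp 127.0.0.1:4040: connect: connection refused'
}

# Example (requires kubectl access to the cluster):
#   kubectl logs -n <namespace> <driver-pod> -c oauth | upstream_refused \
#     && echo "Spark UI is not listening on port 4040"
```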

### Authentication keeps redirecting

**Cause:** Cookie config or OIDC provider issues

**Solutions:**
```bash
# For HTTP (dev only):
--conf spark.armada.oauth.cookieSecure=false

# Check SameSite:
--conf spark.armada.oauth.cookieSamesite=lax

# Verify redirect URL matches OIDC provider config:
--conf spark.armada.oauth.redirectUrl=https://your-host/oauth2/callback
```
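
Two of these settings can be sanity-checked before submitting. Browsers generally do not store cookies marked `Secure` when they are set over plain HTTP, so `cookieSecure=true` combined with an `http://` redirect URL tends to produce a redirect loop. The sketch below flags that combination and a callback path other than `/oauth2/callback`. The function name is hypothetical, and the path check assumes the proxy uses the `/oauth2` prefix shown in the example above.

```bash
# Hypothetical pre-flight check for two common causes of redirect loops.
# $1: value of spark.armada.oauth.redirectUrl
# $2: value of spark.armada.oauth.cookieSecure ("true" or "false")
check_redirect_config() {
  redirect_url="$1"
  cookie_secure="$2"
  status=0

  # Secure cookies are generally not stored when served over plain HTTP.
  case "$redirect_url" in
    http://*)
      if [ "$cookie_secure" = "true" ]; then
        echo "cookieSecure=true but redirectUrl uses plain HTTP"
        status=1
      fi
      ;;
  esac

  # Assumes the /oauth2 proxy prefix used in the example above.
  case "$redirect_url" in
    */oauth2/callback) ;;
    *)
      echo "redirectUrl does not end in /oauth2/callback"
      status=1
      ;;
  esac

  return "$status"
}

check_redirect_config https://your-host/oauth2/callback true && echo "looks consistent"
```

The check does not contact the OIDC provider, so the redirect URL registered with the provider still needs to be compared by hand.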

### Finding ingress URL

In Lookout, the Ingress URL appears under the Result tab as soon as the job is leased to a cluster and bound to a node.

Alternatively, list the Ingress resources in the namespace where the job is scheduled:
```bash
kubectl get ingress -n <namespace>
# Output: oauth-4180-armada-<job-id>-0.namespace.svc
```
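
To print only the host names rather than the full table, `kubectl`'s JSONPath output can be used. This is a sketch with hypothetical function names. It assumes each Ingress rule carries a `host` field, and that the scheme follows `spark.armada.driver.ingress.tls.enabled`.

```bash
# Hypothetical helper: prints one Ingress host per line for a namespace.
list_ingress_hosts() {
  # $1: namespace
  kubectl get ingress -n "$1" \
    -o jsonpath='{range .items[*].spec.rules[*]}{.host}{"\n"}{end}'
}

# Hypothetical helper: builds the URL to open from a host and the TLS setting.
ingress_url() {
  # $1: host, $2: "true" if spark.armada.driver.ingress.tls.enabled is set
  if [ "$2" = "true" ]; then
    echo "https://$1/"
  else
    echo "http://$1/"
  fi
}

# Example (requires kubectl access to the cluster):
#   list_ingress_hosts <namespace>
```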

---

## Resources

- [OAuth2 Configuration Reference](./architecture.md#oauth2-authentication-configuration)
- [oauth2-proxy docs](https://oauth2-proxy.github.io/oauth2-proxy/)
- [Spark UI docs](https://spark.apache.org/docs/latest/web-ui.html)