23 commits
b635db5
Start adding support for running against remote Armada cluster server.
richscott Nov 17, 2025
49aa0a7
Merge branch 'master' into rich/support-remote-server
richscott Nov 17, 2025
b14a8e8
Add script for getting TLS cert files for remote server
richscott Nov 19, 2025
16fc24e
Merge branch 'master' into rich/support-remote-server
richscott Nov 19, 2025
7d3a456
Move K8sClient class from src/test into src/main
richscott Nov 20, 2025
5bc5858
Merge branch 'master' into rich/support-remote-server
richscott Nov 21, 2025
aa1e144
Remove debugging messages; re-enable init-cluster on e2e script.
richscott Nov 21, 2025
c538b1b
In E2E script, if Armada server is not localhost, don't do certain st…
richscott Nov 21, 2025
0c06f07
Add more README.md content on setting up for using a remote Armada se…
richscott Nov 21, 2025
fee4552
Quote evaluated script vars, for CI checker
richscott Nov 24, 2025
e4e4a83
Disable building/testing for Spark 3.5.5 config for now.
richscott Nov 24, 2025
7e49435
Scala linter fixes; Bash lint fixes
richscott Nov 24, 2025
2b705ae
Add ARMADA_LOOKOUT_URL to init.sh
richscott Nov 24, 2025
cc7639a
Check if TLS cert vars are defined before referencing
richscott Nov 24, 2025
b3d4265
Conditionally add TLS cert properties to test invocation
richscott Nov 24, 2025
616dcac
Dynamically extract external IP addr for TLS cert setup
richscott Dec 1, 2025
8127bcb
Check for busybox image for init container
richscott Dec 2, 2025
5d51466
Add init container image name.
richscott Dec 2, 2025
1172550
Add diagnostic search log for .spark-* directory
richscott Dec 2, 2025
a0c0d8a
More GH Actions debugging
richscott Dec 2, 2025
233e413
More diagnostics for finding spark source dir
richscott Dec 2, 2025
9e0b309
Clone the Spark repo to get bin/spark-class and jars.
richscott Dec 2, 2025
6345155
Always rebuild the Spark jars before running E2E.
richscott Dec 2, 2025
14 changes: 7 additions & 7 deletions .github/workflows/build.yaml
@@ -13,16 +13,16 @@ jobs:
fail-fast: false
matrix:
include:
- scala_version: "2.12.15"
spark_version: "3.3.4"
java_version: "11"
# - scala_version: "2.12.15"
# spark_version: "3.3.4"
# java_version: "11"
- scala_version: "2.12.18"
spark_version: "3.5.5"
java_version: "17"

- scala_version: "2.13.8"
spark_version: "3.3.4"
java_version: "11"
# - scala_version: "2.13.8"
# spark_version: "3.3.4"
# java_version: "11"
- scala_version: "2.13.8"
spark_version: "3.5.5"
java_version: "17"
@@ -35,4 +35,4 @@ jobs:
with:
spark_version: ${{ matrix.spark_version }}
scala_version: ${{ matrix.scala_version }}
java_version: ${{ matrix.java_version }}
java_version: ${{ matrix.java_version }}
12 changes: 6 additions & 6 deletions .github/workflows/e2e.yaml
@@ -13,16 +13,16 @@ jobs:
fail-fast: false
matrix:
include:
- scala_version: "2.12.15"
spark_version: "3.3.4"
java_version: "11"
# - scala_version: "2.12.15"
# spark_version: "3.3.4"
# java_version: "11"
- scala_version: "2.12.18"
spark_version: "3.5.5"
java_version: "17"

- scala_version: "2.13.8"
spark_version: "3.3.4"
java_version: "11"
# - scala_version: "2.13.8"
# spark_version: "3.3.4"
# java_version: "11"
- scala_version: "2.13.8"
spark_version: "3.5.5"
java_version: "17"
45 changes: 45 additions & 0 deletions README.md
@@ -80,6 +80,51 @@ Run the following command to load the Armada Spark image into your local kind cl
kind load docker-image $IMAGE_NAME --name armada
```

### Running a remote Armada server using Armada Operator
The default Armada Operator setup allows only localhost access. You can quickly set up a local Armada server that
also accepts connections from other hosts, which is useful for client development and testing. To do so:

- Copy the file `e2e/kind-config-external-access.yaml` in this repository to `hack/kind-config.yaml`
in your `armada-operator` repository.

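- Edit the newly copied `hack/kind-config.yaml` as described in the comments in that file; in particular, replace
the `192.168.12.135` entry under `certSANs` with the IP address of the server's external network interface.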

- Run the armada-operator setup commands (usually `make kind-all`) to create and start your Armada instance.

- Copy `$HOME/.kube/config` (which the Armada Operator setup generates) from the Armada server host to the same
location on the client (local) host. Then edit the local `.kube/config`: on the line `server: https://0.0.0.0:6443`,
change `0.0.0.0` to the IP address or hostname of the remote Armada server system.

- Generate copies of the client TLS key, certificate, and CA certificate files by running `e2e/extract-kind-cert.sh`
(it requires `jq`). The script extracts them from the output of `kubectl config view` and writes `client.crt`,
`client.key`, and `ca.crt` into the `e2e` directory, where they can be left.

Review comment from a collaborator on lines +102 to +103: Should `e2e/*crt` be in `.gitignore`?

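- Copy `$HOME/.armadactl.yaml` from the Armada server host to your home directory on the client system. In the
copied file, change the `armadaUrl` value from `localhost` to the hostname or IP address of the Armada server, and
directly below that line, at the same indent level, add the entry `forceNoTls: true`.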

- You should then be able to run `kubectl get pods -A` to list the pods running on the remote Armada server, and
`armadactl get queues` to list its queues.

- Point armada-spark at the remote server by editing `scripts/config.sh` and changing the following line:
```
ARMADA_MASTER=armada://192.168.12.135:30002
```
to the IP address or hostname of your Armada server. You should not need to change the port number.

Also set the locations of the three TLS certificate files by adding (or updating) these lines:
```
CLIENT_CERT_FILE=e2e/client.crt
CLIENT_KEY_FILE=e2e/client.key
CLUSTER_CA_FILE=e2e/ca.crt
```

- You can now verify the armada-spark configuration by running the E2E tests:
```
$ ./scripts/dev-e2e.sh
```
This will save its output to `e2e-test.log` for further debugging.

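The kubeconfig edit from the steps above can also be scripted on the client host. The following is a minimal sketch, not part of this repository: the function name `point_kubeconfig_at_host` is hypothetical, and it assumes the copied file still contains the literal line `server: https://0.0.0.0:6443` described above.

```shell
# Hypothetical helper (not part of armada-spark): point a kubeconfig copied
# from the Armada server host at that host instead of 0.0.0.0.
point_kubeconfig_at_host() {
  local kubeconfig="$1"   # path to the copied kubeconfig, e.g. $HOME/.kube/config
  local host="$2"         # IP address or hostname of the Armada server host
  if ! grep -q 'server: https://0\.0\.0\.0:6443' "$kubeconfig"; then
    echo "no 'server: https://0.0.0.0:6443' line found in $kubeconfig" >&2
    return 1
  fi
  # -i.bak keeps a backup copy and is accepted by both GNU sed and BSD/macOS sed
  sed -i.bak -e "s#server: https://0\.0\.0\.0:6443#server: https://${host}:6443#" "$kubeconfig"
}
```

For example, `point_kubeconfig_at_host "$HOME/.kube/config" 192.168.12.135` would use the example server address shown above. The function refuses to run when the expected line is missing, so running it twice does not silently rewrite an already-edited file.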
---

## Development
25 changes: 25 additions & 0 deletions e2e/extract-kind-cert.sh
@@ -0,0 +1,25 @@
#!/bin/bash
# Extract the client certificate, client key, and cluster CA certificate for
# the Kind cluster from the current kubeconfig into this directory.
set -euo pipefail

CONTEXT="kind-armada"

E2E_DIR=$(realpath "$0" | xargs dirname)

cd "$E2E_DIR" || { echo "Error: could not cd to $E2E_DIR" >&2; exit 1; }

# What These Files Are
# - client.crt: Your user (client) certificate
# - client.key: The private key associated with the certificate
# - ca.crt: The CA certificate used by the Kubernetes API server (for verifying client and server certs)
#
# `jq -e` exits non-zero when the selected field is missing, so a wrong CONTEXT
# name makes this script fail instead of exiting successfully. The `.user["..."]`
# form (no dot before the bracket) also works with jq versions older than 1.7.

# Extract the client certificate
kubectl config view --raw -o json | jq -er \
".users[] | select(.name == \"${CONTEXT}\") | .user[\"client-certificate-data\"]" | base64 -d > client.crt

# Extract the client key
kubectl config view --raw -o json | jq -er \
".users[] | select(.name == \"${CONTEXT}\") | .user[\"client-key-data\"]" | base64 -d > client.key

# Extract the cluster CA certificate
kubectl config view --raw -o json | jq -er \
".clusters[] | select(.name == \"${CONTEXT}\") | .cluster[\"certificate-authority-data\"]" | base64 -d > ca.crt

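Since `extract-kind-cert.sh` writes whatever `jq` and `base64 -d` produce, a quick check before running the E2E tests can catch an empty or non-PEM output file. A minimal sketch; `check_pem_files` is a hypothetical helper, and it only inspects the PEM header line rather than validating the certificates themselves (a tool such as `openssl x509 -noout -in client.crt` would be needed for that):

```shell
# Hypothetical helper: check that each given file exists, is non-empty,
# and starts with a PEM "-----BEGIN ..." header line.
check_pem_files() {
  local f status=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    elif ! head -n 1 "$f" | grep -q '^-----BEGIN '; then
      echo "not PEM-encoded: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

From the repository root, `check_pem_files e2e/client.crt e2e/client.key e2e/ca.crt` reports every bad file rather than stopping at the first one, and returns non-zero if any check failed.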
52 changes: 52 additions & 0 deletions e2e/kind-config-external-access.yaml
@@ -0,0 +1,52 @@
# A kind configuration for running an Armada server that can be accessed
# outside the host system, for working/developing with remote clients,
# such as Armada-Spark clients.
#
# This configuration allows you to run kubectl and armadactl on a remote
# client host against the Armada instance on this system. To use it:
# - Copy $HOME/.kube/config from this system to the same location on the
#   remote client host, then edit the copied file so that the server
#   address (0.0.0.0) is the address of the external interface listed
#   under certSANs below.
# - Copy $HOME/.armadactl.yaml to your $HOME directory on the remote client
#   host. In the copied file, change the value of the 'armadaUrl' field
#   from 'localhost' to the hostname (or IP address) of this server, and
#   directly below that line, at the same indent level, add the entry
#     forceNoTls: true
# You should then be able to run `kubectl cluster-info` and
# `armadactl get queues` without errors on the remote client host.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
kubeadmConfigPatches:
- |
kind: ClusterConfiguration
apiServer:
certSANs:
- localhost
- 127.0.0.1
# replace the following line with the IP address
# of the external interface on this system
- 192.168.12.135
- 0.0.0.0
extraPortMappings:
# Lookout UI
- containerPort: 30000
hostPort: 30000
protocol: TCP
# Armada Server REST API
- containerPort: 30001
hostPort: 30001
protocol: TCP
# Armada Server gRPC API
- containerPort: 30002
hostPort: 30002
protocol: TCP
# Kubernetes API
- containerPort: 6443
hostPort: 6443
protocol: TCP
- role: worker
labels:
armada-spark: true
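The two `.armadactl.yaml` edits described in the comments at the top of this file can be applied with a small script on the client host. A minimal sketch, assuming only that the copied file has an `armadaUrl:` line whose value contains `localhost` (no other layout of the file is assumed); `point_armadactl_at_host` is a hypothetical name, not part of armadactl or this repository:

```shell
# Hypothetical helper: in a copied .armadactl.yaml, replace 'localhost' in the
# armadaUrl value with the server host and add 'forceNoTls: true' below it.
point_armadactl_at_host() {
  local cfg="$1" host="$2" tmp
  tmp="$(mktemp)"
  awk -v host="$host" '
    /^[ \t]*armadaUrl:.*localhost/ {
      indent = $0
      sub(/[^ \t].*$/, "", indent)       # keep only the leading whitespace
      sub(/localhost/, host)             # replace the host in the URL value
      print
      print indent "forceNoTls: true"    # new entry at the same indent level
      found = 1
      next
    }
    { print }
    END { exit(found ? 0 : 1) }          # fail if no matching armadaUrl line
  ' "$cfg" > "$tmp" || { echo "no armadaUrl with localhost in $cfg" >&2; rm -f "$tmp"; return 1; }
  mv "$tmp" "$cfg"
}
```

Because the match requires `localhost` in the `armadaUrl` line, a second run on an already-edited file fails without adding a duplicate `forceNoTls` entry.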
109 changes: 94 additions & 15 deletions scripts/dev-e2e.sh
@@ -7,7 +7,7 @@ source "$scripts"/init.sh

STATUSFILE="$(mktemp)"
AOREPO='https://github.com/armadaproject/armada-operator.git'
AOHOME="$scripts/../../armada-operator"
AOHOME=$(realpath "$scripts/../../armada-operator")
ARMADACTL_VERSION='0.19.1'

GREEN='\033[0;32m'
@@ -87,12 +87,37 @@ start-armada() {
fi
fi

echo "Running 'make kind-all' to install and start Armada; this may take up to 6 minutes"
kind_extern_cfg="$scripts/../e2e/kind-config-external-access.yaml"
if ! cp "$kind_extern_cfg" "$AOHOME/hack/kind-config.yaml"; then
err "There was an error copying $kind_extern_cfg to $AOHOME/hack/kind-config.yaml"
exit 1
fi

# Get the IP address of the first network interface that is not loopback or a K8S internal network interface
external_ip=$(ifconfig -a | grep -w 'inet' | grep -v 'inet 127\.0\.0' | grep -v 'inet 172\.' | awk '{print $2}' | sed -ne '1p')
if [ -z "$external_ip" ]; then
err "Could not determine an external IP address for this host"
exit 1
fi

# -i.bak (no space) is accepted by both GNU sed and BSD/macOS sed
if ! sed -i.bak -e "s/192\.168\.12\.135/$external_ip/" "$AOHOME/hack/kind-config.yaml"; then
err "There was an error modifying $AOHOME/hack/kind-config.yaml"
exit 1
fi

echo "Running 'make kind-all' to install and start Armada; this may take up to 6 minutes"
if ! (cd "$AOHOME"; make kind-all 2>&1) | tee armada-start.txt; then
echo ""
err "There was a problem starting Armada; exiting now"
exit 1
fi

echo "Extracting TLS client certificate files from Kind cluster"
if ! "$scripts/../e2e/extract-kind-cert.sh"; then
err "There was a problem extracting the certificates"
exit 1
fi
}

init-cluster() {
@@ -102,6 +127,12 @@ init-cluster() {
exit 1
fi

if ! (echo "$INIT_CONTAINER_IMAGE" | grep -Eq '^[[:alnum:]_./-]+:[[:alnum:]_.-]+$'); then
err "INIT_CONTAINER_IMAGE is not defined or is not of the form name:tag. Please set it in $scripts/config.sh, for example:"
err "INIT_CONTAINER_IMAGE=busybox:latest"
exit 1
fi

if [ -z "$ARMADA_QUEUE" ]; then
err "ARMADA_QUEUE is not defined. Please set it in $scripts/config.sh, for example:"
err "ARMADA_QUEUE=spark-test"
@@ -120,6 +151,17 @@ init-cluster() {
exit 1
fi

echo "Checking if image $INIT_CONTAINER_IMAGE is available"
if ! docker image inspect "$INIT_CONTAINER_IMAGE" > /dev/null 2>&1; then
echo "Image $INIT_CONTAINER_IMAGE not found in local Docker instance; pulling it from Docker Hub."
if ! docker pull "$INIT_CONTAINER_IMAGE"; then
err "Could not pull $INIT_CONTAINER_IMAGE; please try running"
err " docker pull $INIT_CONTAINER_IMAGE"
err "then run this script again"
exit 1
fi
fi

echo "Checking to see if Armada cluster is available ..."

if ! "$scripts"/armadactl get queues > "$STATUSFILE" 2>&1 ; then
@@ -140,43 +182,80 @@ init-cluster() {

mkdir -p "$scripts/.tmp"

TMPDIR="$scripts/.tmp" "$AOHOME/bin/tooling/kind" load docker-image "$IMAGE_NAME" --name armada 2>&1 \
| log_group "Loading Docker image $IMAGE_NAME into Armada cluster";
if [[ "$ARMADA_MASTER" == *"//localhost"* ]] ; then
for IMG in "$IMAGE_NAME" "$INIT_CONTAINER_IMAGE"; do
TMPDIR="$scripts/.tmp" "$AOHOME/bin/tooling/kind" load docker-image "$IMG" --name armada 2>&1 \
| log_group "Loading Docker image $IMG into Armada (Kind) cluster";
done
fi

# configure the defaults for the e2e test
cp $scripts/../e2e/spark-defaults.conf $scripts/../conf/spark-defaults.conf
cp "$scripts/../e2e/spark-defaults.conf" "$scripts/../conf/spark-defaults.conf"

log "Waiting 60 seconds for Armada to stabilize ..."
sleep 60
# If using a remote Armada server, assume it is already running and ready
if [[ "$ARMADA_MASTER" == *"//localhost"* ]] ; then
log "Waiting 60 seconds for Armada to stabilize ..."
sleep 60
fi
}

run-test() {
echo "Running Scala E2E test suite..."

if [[ ! -d ".spark-$SPARK_VERSION" ]]; then
echo "Checking out Spark sources for tag v$SPARK_VERSION."
git clone https://github.com/apache/spark --branch "v$SPARK_VERSION" --depth 1 --no-tags ".spark-$SPARK_VERSION"
fi

cd ".spark-$SPARK_VERSION"
# Spark 3.3.4 does not compile without this fix
if [[ "$SPARK_VERSION" == "3.3.4" ]]; then
sed -i -e "s%<scala.version>2.13.8</scala.version>%<scala.version>2.13.6</scala.version>%" pom.xml
# Fix deprecated openjdk base image - use eclipse-temurin:11-jammy instead.
spark_dockerfile="resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile"
if [ -f "$spark_dockerfile" ]; then
sed -i -e 's|FROM openjdk:|FROM eclipse-temurin:|g' "$spark_dockerfile"
sed -i -E 's/^ARG java_image_tag=11-jre-slim$/ARG java_image_tag=11-jammy/' "$spark_dockerfile"
fi
fi
./dev/change-scala-version.sh "$SCALA_BIN_VERSION"
# By packaging the assembly project specifically, the jars of all dependent Spark projects are fetched from Maven.
# The spark-examples jars are not released, so we need to build them from sources.
./build/mvn --batch-mode clean
./build/mvn --batch-mode package -pl examples
./build/mvn --batch-mode package -Pkubernetes -Pscala-"$SCALA_BIN_VERSION" -pl assembly
cd ..

# Add armadactl to PATH so the e2e framework can access it
PATH="$scripts:$AOHOME/bin/tooling/:$PATH"
export PATH

# Change to armada-spark directory
cd "$scripts/.."

tls_args=()
test -n "${CLIENT_CERT_FILE:-}" && tls_args+=( -Dclient_cert_file="$CLIENT_CERT_FILE" )
test -n "${CLIENT_KEY_FILE:-}" && tls_args+=( -Dclient_key_file="$CLIENT_KEY_FILE" )
test -n "${CLUSTER_CA_FILE:-}" && tls_args+=( -Dcluster_ca_file="$CLUSTER_CA_FILE" )

# Run the Scala E2E test suite
mvn scalatest:test -Dsuites="org.apache.spark.deploy.armada.e2e.ArmadaSparkE2E" \
# env MAVEN_OPTS='-Dcom.sun.net.ssl.checkRevocation=false'
env KUBERNETES_TRUST_CERTIFICATES=true \
mvn -e scalatest:test -Dsuites="org.apache.spark.deploy.armada.e2e.ArmadaSparkE2E" \
-Dcontainer.image="$IMAGE_NAME" \
-Dscala.version="$SCALA_VERSION" \
-Dscala.binary.version="$SCALA_BIN_VERSION" \
-Dspark.version="$SPARK_VERSION" \
-Darmada.queue="$ARMADA_QUEUE" \
-Darmada.master="armada://localhost:30002" \
-Darmada.lookout.url="http://localhost:30000" \
-Darmadactl.path="$scripts/armadactl" 2>&1 | \
tee e2e-test.log

-Darmada.master="$ARMADA_MASTER" \
-Darmada.lookout.url="$ARMADA_LOOKOUT_URL" \
-Darmadactl.path="$scripts/armadactl" \
${tls_args[@]+"${tls_args[@]}"} 2>&1 | tee e2e-test.log
TEST_EXIT_CODE=${PIPESTATUS[0]}

if [ "$TEST_EXIT_CODE" -ne 0 ]; then
err "E2E tests failed with exit code $TEST_EXIT_CODE"
exit $TEST_EXIT_CODE
exit "$TEST_EXIT_CODE"
fi

log "E2E tests completed successfully"
@@ -187,4 +266,4 @@ main() {
run-test
}

main
main
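`run-test` forwards only the TLS properties whose variables are set, by collecting them in the `tls_args` array. A standalone sketch of that pattern: `${tls_args[@]+"${tls_args[@]}"}` expands to nothing when the array is empty, which avoids the "unbound variable" error that `"${tls_args[@]}"` triggers under `set -u` in bash versions before 4.4 (such as the bash 3.2 shipped with macOS), while still passing a path that contains spaces as a single argument. `build_tls_args` and `print_args` are hypothetical names; `print_args` stands in for the `mvn` invocation.

```shell
set -u

# Collect -D flags only for the TLS variables that are set and non-empty.
build_tls_args() {
  tls_args=()
  if [ -n "${CLIENT_CERT_FILE:-}" ]; then tls_args+=( -Dclient_cert_file="$CLIENT_CERT_FILE" ); fi
  if [ -n "${CLIENT_KEY_FILE:-}" ]; then tls_args+=( -Dclient_key_file="$CLIENT_KEY_FILE" ); fi
  if [ -n "${CLUSTER_CA_FILE:-}" ]; then tls_args+=( -Dcluster_ca_file="$CLUSTER_CA_FILE" ); fi
}

# Hypothetical stand-in for the mvn command: print one argument per line.
print_args() {
  printf '%s\n' "$@"
}

build_tls_args
# Expands to nothing for an empty array, and to one quoted word per element otherwise.
print_args scalatest:test ${tls_args[@]+"${tls_args[@]}"}
```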
14 changes: 12 additions & 2 deletions scripts/init.sh
@@ -58,19 +58,29 @@ export USE_KIND="${USE_KIND:-false}"
export STATIC_MODE="${STATIC_MODE:-false}"
export IMAGE_NAME="${IMAGE_NAME:-spark:armada}"
export ARMADA_MASTER="${ARMADA_MASTER:-armada://localhost:30002}"
export ARMADA_LOOKOUT_URL="${ARMADA_LOOKOUT_URL:-http://localhost:30000}"
export ARMADA_QUEUE="${ARMADA_QUEUE:-test}"
export ARMADA_AUTH_TOKEN=${ARMADA_AUTH_TOKEN:-}
export SCALA_CLASS="${SCALA_CLASS:-org.apache.spark.examples.SparkPi}"
export RUNNING_E2E_TESTS="${RUNNING_E2E_TESTS:-false}"
export INIT_CONTAINER_IMAGE="${INIT_CONTAINER_IMAGE:-busybox:latest}"

if [ -n "${CLIENT_CERT_FILE:-}" ]; then
export CLIENT_CERT_FILE="${CLIENT_CERT_FILE}"
fi
if [ -n "${CLIENT_KEY_FILE:-}" ]; then
export CLIENT_KEY_FILE="${CLIENT_KEY_FILE}"
fi
if [ -n "${CLUSTER_CA_FILE:-}" ]; then
export CLUSTER_CA_FILE="${CLUSTER_CA_FILE}"
fi

if [ -z "${PYTHON_SCRIPT:-}" ]; then
PYTHON_SCRIPT="/opt/spark/examples/src/main/python/pi.py"
else
INCLUDE_PYTHON=true
fi



# derive Scala and Spark versions from pom.xml, set via ./scripts/set-version.sh
if [[ -z "${SCALA_VERSION:-}" ]]; then
export SCALA_VERSION=$(cd "$scripts/.."; mvn help:evaluate -Dexpression=scala.version -q -DforceStdout)