
driver.ports and executor.ports fields in SparkApplication not honoured #2113

Open · marcoboi opened this issue Aug 4, 2024 · 7 comments

marcoboi commented Aug 4, 2024

I've deployed the Spark operator using Helm chart version 1.2.14.
I'm trying to expose a port on the driver and executors.
According to the docs, this should be possible.
Yet when I try, the port is not actually exposed on the pods, and the following warnings are produced:

W0804 14:01:34.375459   32829 warnings.go:70] unknown field "spec.driver.ports"
W0804 14:01:34.375494   32829 warnings.go:70] unknown field "spec.executor.ports"
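
A quick way to check whether the installed CRD actually defines these fields is kubectl explain (a diagnostic sketch; the resource name accepted by your cluster may differ):

kubectl explain sparkapplication.spec.driver.ports
kubectl explain sparkapplication.spec.executor.ports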

Here is the configuration I'm using:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: my-namespace
spec:
  type: Scala
  mode: cluster
  image: "apache/spark:v3.3.1"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.3.1.jar"
  arguments: [ "1000000" ]
  sparkVersion: "3.3.1"
  sparkUIOptions:
    serviceLabels:
      test-label/v1: 'true'
  sparkConf:
    spark.ui.prometheus.enabled: "true"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  monitoring:
    exposeDriverMetrics: true
    exposeExecutorMetrics: true
  driver:
    cores: 1
    memory: "4g"
    memoryOverhead: "1g"
    labels:
      version: 3.3.1
    serviceAccount: {{ .Values.serviceAccount }}
    ports:
      - name: metrics
        containerPort: 8081
        protocol: TCP
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    ports:
      - name: metrics
        containerPort: 8081
        protocol: TCP
    cores: 1
    memory: "2g"
    memoryOverhead: "1g"
    instances: 2
    labels:
      version: 3.3.1
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
marcoboi changed the title from "driver.ports field in SparkApplication not honoured" to "driver.ports and executor.ports fields in SparkApplication not honoured" on Aug 4, 2024
marcoboi (Author) commented Aug 5, 2024

It seems the same issue has already been reported, and a PR to fix it has been open for over a year.

ChenYi015 (Contributor) commented

@marcoboi The CRD files have been updated in chart version 1.4.5, and ports can now be exposed via driver.ports and executor.ports. Your SparkApplication example should work, even though the port definition is not identical to corev1.ContainerPort.
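
For reference, the shape the chart's CRD accepts matches the example above (a sketch; the assumption here is that only name, containerPort, and protocol are supported, without corev1.ContainerPort extras such as hostPort or hostIP):

ports:
  - name: metrics        # arbitrary port name
    containerPort: 8081  # port exposed on the container
    protocol: TCP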

marcoboi (Author) commented Aug 5, 2024

Thank you @ChenYi015.
I've upgraded to version 1.4.6.
The warnings have disappeared, but the port still does not appear on the pods.
Here's the driver pod manifest generated by the deployment:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2024-08-05T13:39:27Z"
  labels:
    app.kubernetes.io/managed-by: Helm
    spark-app-name: spark-pi
    spark-app-selector: spark-fef2ba950ea143b993696409098e78ce
    spark-role: driver
    spark-version: 3.5.0
    sparkoperator.k8s.io/app-name: spark-pi
    sparkoperator.k8s.io/launched-by-spark-operator: "true"
    sparkoperator.k8s.io/submission-id: d0b05b47-2dc5-4e52-9d1b-3277675857f2
    version: 3.3.1
  name: spark-pi-driver
  namespace: spark-pi
  resourceVersion: "27610"
  uid: 203e82de-5fc7-4d68-adf1-0263d4403056
spec:
  containers:
  - args:
    - driver
    - --properties-file
    - /opt/spark/conf/spark.properties
    - --class
    - org.apache.spark.examples.SparkPi
    - local:///opt/spark/examples/jars/spark-examples_2.12-3.3.1.jar
    - "1000000"
    env:
    - name: SPARK_USER
      value: root
    - name: SPARK_APPLICATION_ID
      value: spark-fef2ba950ea143b993696409098e78ce
    - name: SPARK_DRIVER_BIND_ADDRESS
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
    - name: SPARK_LOCAL_DIRS
      value: /var/data/spark-166126c1-a5df-4e59-886b-cdca3202021f
    - name: SPARK_CONF_DIR
      value: /opt/spark/conf
    image: apache/spark:v3.3.1
    imagePullPolicy: Always
    name: spark-kubernetes-driver
    ports:
    - containerPort: 7078
      name: driver-rpc-port
      protocol: TCP
    - containerPort: 7079
      name: blockmanager
      protocol: TCP
    - containerPort: 4040
      name: spark-ui
      protocol: TCP
    resources:
      limits:
        memory: 5Gi
      requests:
        cpu: "1"
        memory: 5Gi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/data/spark-166126c1-a5df-4e59-886b-cdca3202021f
      name: spark-local-dir-1
    - mountPath: /opt/spark/conf
      name: spark-conf-volume-driver
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-7jb44
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ip-10-0-11-150.eu-west-1.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: spark-pi
  serviceAccountName: spark-pi
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir: {}
    name: spark-local-dir-1
  - configMap:
      defaultMode: 420
      items:
      - key: spark.properties
        mode: 420
        path: spark.properties
      name: spark-drv-f7d3679122c429ef-conf-map
    name: spark-conf-volume-driver
  - name: kube-api-access-7jb44
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-08-05T13:39:28Z"
    status: "True"
    type: PodReadyToStartContainers
  - lastProbeTime: null
    lastTransitionTime: "2024-08-05T13:39:27Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-08-05T13:39:28Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-08-05T13:39:28Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2024-08-05T13:39:27Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://dfa048b5f6f5f85da7911187380d34b48cb6d49eb16914377b4dd535f5c3f1ff
    image: docker.io/apache/spark:v3.3.1
    imageID: docker.io/apache/spark@sha256:b97ef98cd1456fc7aed6d6428f1a6f859e61b1491cf498eceed33c67da83dd2c
    lastState: {}
    name: spark-kubernetes-driver
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2024-08-05T13:39:28Z"
  hostIP: 10.0.11.150
  hostIPs:
  - ip: 10.0.11.150
  phase: Running
  podIP: 10.0.11.94
  podIPs:
  - ip: 10.0.11.94
  qosClass: Burstable
  startTime: "2024-08-05T13:39:27Z"

Same for the executors (showing only the ports section for brevity):

    ports:
    - containerPort: 7079
      name: blockmanager
      protocol: TCP

ChenYi015 (Contributor) commented

@marcoboi Did you enable the webhook? The ports are patched by the webhook server, so you need to enable it by adding --set webhook.enable=true to your helm install command when installing the chart.
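
For reference, a complete install command might look like this (release name, chart reference, and namespace are placeholders):

helm install spark-operator spark-operator/spark-operator \
  --namespace spark-operator \
  --create-namespace \
  --set webhook.enable=true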

marcoboi (Author) commented Aug 8, 2024

Thank you @ChenYi015.
I've re-deployed using the values you provided.
The ports are now present in the manifests of the driver and executor pods created from the SparkApplication, as expected.
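
One way to verify is to list the container ports on the running driver pod (a sketch using the pod name and namespace from the manifest above):

kubectl -n spark-pi get pod spark-pi-driver \
  -o jsonpath='{.spec.containers[0].ports}'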

It would be worth documenting this in more detail.
If you point me to the appropriate location, I can edit the documentation to clarify this point.

ChenYi015 (Contributor) commented

@marcoboi Thank you for the suggestion about improving the docs (Writing a SparkApplication | Kubeflow); they are indeed somewhat outdated. However, we are preparing to release Spark operator version 2.0.0-rc.0 shortly, and in that version the webhook will be enabled by default. Notably, there will no longer be an option such as webhook.enable to toggle the webhook's activation. So I think there will be no need to clarify how to enable the webhook anymore.
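
Under that plan, installing a 2.x chart would presumably need no webhook flag at all (a hypothetical sketch; the chart reference and version pin are placeholders):

helm install spark-operator spark-operator/spark-operator \
  --version 2.0.0-rc.0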

github-actions bot commented Nov 6, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
