[BUG] Service Account not working different namespace #2103

Open
devscheffer opened this issue Jul 26, 2024 · 4 comments

Comments

@devscheffer

Description

I use the Helm chart of the Spark operator. It is deployed in the namespace spark-operator, and in the HelmRelease I configure sparkJobNamespaces: spark-jobs, which is the namespace where I want to run the jobs.
However, I'm getting this error:

Name: "pyspark-pi", Namespace: "spark-jobs"
from server for: "STDIN": sparkapplications.sparkoperator.k8s.io "pyspark-pi" is forbidden: User "system:serviceaccount:spark-jobs:spark-sa" cannot get resource "sparkapplications" in API group "sparkoperator.k8s.io" in the namespace "spark-jobs"
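One way to confirm the RBAC denial outside of the submission flow is to impersonate the service account with kubectl (this assumes your own kubeconfig has impersonation rights; the names below match the error message above):

```shell
# Ask the API server whether the service account may get SparkApplications.
kubectl auth can-i get sparkapplications.sparkoperator.k8s.io \
  --as=system:serviceaccount:spark-jobs:spark-sa \
  --namespace spark-jobs
```

If this prints "no", the service account genuinely lacks a Role/RoleBinding for the sparkoperator.k8s.io API group in that namespace.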

@ChenYi015
Contributor

@devscheffer Could you provide detailed information about how you installed the Helm chart? Was the service account spark-sa created by Helm or by yourself?

@devscheffer
Author

devscheffer commented Jul 29, 2024

It is created by the Helm chart.

---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  labels:
    app: spark-operator
  name: spark-operator
  namespace: spark-operator
spec:
  chart:
    spec:
      chart: spark-operator
      reconcileStrategy: ChartVersion
      sourceRef:
        kind: HelmRepository
        name: spark-operator
      version: 1.4.0
  interval: 5m0s
  releaseName: spark-operator
  values:
    image:
      repository: docker.io/kubeflow/spark-operator
      pullPolicy: IfNotPresent
      tag: ""
    rbac:
      create: false
      createRole: true
      createClusterRole: true
      annotations: {}
    serviceAccounts:
      spark:
        create: true
        name: "spark-sa"
        annotations: {}
      sparkoperator:
        create: true
        name: "spark-operator-sa"
        annotations: {}
    sparkJobNamespaces:
      - spark-operator
      - team-1
    webhook:
      enable: true
      port: 443
      portName: webhook
      namespaceSelector: ""
      timeout: 30
    metrics:
      enable: true
      port: 10254
      portName: metrics
      endpoint: /metrics
      prefix: ""  
    tolerations:
      - key: "CriticalAddonsOnly"
        operator: "Exists"
        effect: "NoSchedule"

It works when I run it manually from the terminal, but when I execute it from Airflow I get this error:
from server for: "STDIN": sparkapplications.sparkoperator.k8s.io "pyspark-pi2" is forbidden: User "system:serviceaccount:team-1:spark-sa" cannot get resource "sparkapplications" in API group "sparkoperator.k8s.io" in the namespace "team-1"

Here is the task in Airflow:

spark_kpo = KubernetesPodOperator(
    task_id="kpo",
    name="spark-app-submission",
    namespace=namespace,
    image="bitnami/kubectl:1.28.11",
    cmds=["/bin/bash", "-c"],
    arguments=[f"echo '{spark_app_manifest_content}' | kubectl apply -f -"],
    in_cluster=True,
    get_logs=True,
    service_account_name=service_account_name,
    on_finish_action="keep_pod",
)

@ChenYi015
Contributor

@devscheffer The service account spark-sa does not actually have any permissions on SparkApplication resources; it is meant to be used by the Spark driver pods. If you want to submit a SparkApplication from Airflow, you can set the service account name in KubernetesPodOperator to spark-operator-sa instead, or you can create a ServiceAccount manually and grant it the necessary permissions on SparkApplication resources.
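For the second option, this is a minimal sketch of a namespaced Role and RoleBinding granting spark-sa full access to SparkApplications in team-1. The object name spark-submitter is an example, not something the chart generates:

```yaml
# Hypothetical Role granting access to SparkApplications in team-1.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-submitter
  namespace: team-1
rules:
  - apiGroups: ["sparkoperator.k8s.io"]
    resources: ["sparkapplications", "sparkapplications/status"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# Bind the Role to the service account the submitting pod runs as.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-submitter
  namespace: team-1
subjects:
  - kind: ServiceAccount
    name: spark-sa
    namespace: team-1
roleRef:
  kind: Role
  name: spark-submitter
  apiGroup: rbac.authorization.k8s.io
```

Scoping the rules to the sparkoperator.k8s.io group keeps the grant narrow; the subject must match the service account the KubernetesPodOperator pod actually runs as.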

@alexz0nder

Hello.
I'd like to add that I see the same result.
I deployed chart v2.0.2 like so:

helm install spark-operator ./spark-operator \
    --version 2.0.2 \
    --create-namespace \
    --namespace spark-operator \
    --set 'spark.jobNamespaces={,airflow}' \
    --values ./values.yaml

with the following values.yaml:

nameOverride: ""
fullnameOverride: ""
commonLabels: {}

image:
  registry: docker.io
  repository: kubeflow/spark-operator
  tag: ""
  pullPolicy: IfNotPresent
  pullSecrets: []

controller:
  replicas: 1
  workers: 10
  logLevel: info
  uiService:
    enable: true
  uiIngress:
    enable: false
    urlFormat: ""
  batchScheduler:
    enable: true
    kubeSchedulerNames:
      - volcano
    default: ""
  serviceAccount:
    create: true
    name: ""
    annotations: {}
  rbac:
    create: true
    annotations: {}
  labels: {}
  annotations: {}
  volumes: []
  nodeSelector: {}
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - test-node
  tolerations:
    - key: "airflow"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  priorityClassName: ""
  podSecurityContext: {}
  topologySpreadConstraints: []
  env: []
  envFrom: []
  volumeMounts: []
  resources: {}
  securityContext: {}
  sidecars: []
  podDisruptionBudget:
    enable: false
    minAvailable: 1
  pprof:
    enable: false
    port: 6060
    portName: pprof
  workqueueRateLimiter:
    bucketQPS: 50
    bucketSize: 500
    maxDelay:
      enable: true
      duration: 6h

webhook:
  enable: true
  replicas: 1
  logLevel: info
  port: 9443
  portName: webhook
  failurePolicy: Fail
  timeoutSeconds: 10
  resourceQuotaEnforcement:
    enable: false
  serviceAccount:
    create: true
    name: ""
    annotations: {}
  rbac:
    create: true
    annotations: {}
  labels: {}
  annotations: {}
  sidecars: []
  volumes: []
  nodeSelector: {}
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - test-node
  tolerations:
    - key: "airflow"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  priorityClassName: ""
  podSecurityContext: {}
  topologySpreadConstraints: []
  env: []
  envFrom: []
  volumeMounts: []
  resources: {}
  securityContext: {}
  podDisruptionBudget:
    enable: false
    minAvailable: 1

spark:
  jobNamespaces:
  - "airflow"
  serviceAccount:
    create: true
    name: ""
    annotations: {}
  rbac:
    create: true
    annotations: {}

prometheus:
  metrics:
    enable: true
    port: 8080
    portName: metrics
    endpoint: /metrics
    prefix: ""
  podMonitor:
    create: true
    labels: {}
    jobLabel: spark-operator-podmonitor
    podMetricsEndpoint:
      scheme: http
      interval: 5s
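To see which RBAC objects the chart actually created in the job namespace, and whether any of them reference the Airflow worker's service account, they can be listed with kubectl (assuming access to the cluster; the role name below is taken from the binding mentioned later in this thread and may differ per release):

```shell
# List Roles and RoleBindings in the job namespace.
kubectl get role,rolebinding -n airflow

# Inspect the rules of the chart's spark RBAC role.
kubectl describe role spark-operator-spark -n airflow
```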

And right after that, if I run a DAG from Airflow, the resulting spark-submit pod fails with the following error:

Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': '35324a3b-9f01-4c3b-bf56-445ea8746423', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '8bae74e0-9f4b-483f-8878-77b94fe77097', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'b1662841-0cf0-4ed4-8ade-b34262bca683', 'Date': 'Fri, 18 Oct 2024 08:05:50 GMT', 'Content-Length': '483'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"sparkapplications.sparkoperator.k8s.io \"spark-submit-soyzhqvo\" is forbidden: User \"system:serviceaccount:transgran-spreads:airflow-worker\" cannot get resource \"sparkapplications/status\" in API group \"sparkoperator.k8s.io\" in the namespace \"airflow\"","reason":"Forbidden","details":{"name":"spark-submit-soyzhqvo","group":"sparkoperator.k8s.io","kind":"sparkapplications"},"code":403}

This can be fixed by adding:

  • to airflow-pod-launcher-role (Role):

    - apiGroups:
      - sparkoperator.k8s.io
      resources:
      - '*'
      verbs:
      - '*'

  • to spark-operator-spark (RoleBinding):

    - kind: ServiceAccount
      name: default
      namespace: airflow

With all of the above, I'd like to ask: why weren't these fixes added by the Helm chart?
