
0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod. #2114

Open
kokibas opened this issue Aug 6, 2024 · 5 comments
Labels
question Further information is requested

Comments

kokibas commented Aug 6, 2024

  • ✋ I have searched the open/closed issues and my issue is not listed.

I am trying to deploy a Spark application on Kubernetes using the Spark Operator, but I'm encountering an issue related to CPU allocation for my executor pods. Below is my current SparkApplication YAML configuration:
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-test-workorderpnhz
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "10.123.13.133:8082/repository/spark-test:latest"
  imagePullSecrets: ["nexus-registry-secret"]
  mainClass: org.example.WorkorderPnhz
  mainApplicationFile: "local:///app/app.jar"
  sparkVersion: "3.1.1"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.1.1
    serviceAccount: spark-operator-spark
    volumeMounts:
      - name: spark-conf-volume-driver
        mountPath: /opt/spark/conf
    javaOptions: "-Dconfig.file=/opt/spark/conf/application.conf"
    podName: "spark-test-workorderpnhz"
    envVars:
      MY_ENV_VAR: "value"
  executor:
    cores: 1
    coreLimit: "1200m"
    instances: 1
    memory: "512m"
    labels:
      version: 3.1.1
    volumeMounts:
      - name: spark-conf-volume-exec
        mountPath: /opt/spark/conf
    envVars:
      SPARK_EXECUTOR_MEMORY: "512Mi"
      SPARK_EXECUTOR_CORES: "1"
      SPARK_EXECUTOR_INSTANCES: "2"
      SPARK_EXECUTOR_MEMORYOVERHEAD: "500m"
  dynamicAllocation:
    enabled: true
    initialExecutors: 1
    maxExecutors: 2
    minExecutors: 1
  sparkConf:
    "spark.dynamicAllocation.executorIdleTimeout": "60s"
    "spark.shuffle.service.enabled": "true"
  volumes:
    - name: spark-conf-volume-driver
      configMap:
        name: spark-drv-conf-map
    - name: spark-conf-volume-exec
      configMap:
        name: spark-exec-conf-map

When I try to deploy this configuration, I receive the following error message:
0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
I have specified coreRequest and coreLimit for both the driver and executor, but the pods still seem to require more CPU resources than available.
The executor is set to use 1 core and a coreLimit of 1200m, but the node can't accommodate this.
I am using Spark Operator version v1beta2-1.4.2-3.5.0 with Kubernetes.

How can I adjust my SparkApplication YAML to ensure the executor pods are scheduled successfully with the available CPU resources? Are there any best practices for configuring CPU requests and limits for Spark executors on Kubernetes to avoid resource insufficiency errors?

kokibas added the question label Aug 6, 2024
@ChenYi015
Contributor

@kokibas The error message indicates that your Kubernetes cluster doesn't have enough CPU resources to schedule the Spark pods. You can use kubectl top node to check the node capacity. You will need to scale up your Kubernetes cluster, or reduce the CPU requests for your Spark pods if possible.
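As an illustration of the second option, a trimmed executor spec might look like the fragment below. This is a sketch only: it assumes the operator version in use supports the coreRequest field, which maps to spark.kubernetes.executor.request.cores and accepts fractional values, unlike cores itself.

```yaml
# Sketch: lower the CPU the scheduler reserves per executor pod.
executor:
  cores: 1            # spark.executor.cores - must be an integer
  coreRequest: "500m" # pod-level CPU request seen by the scheduler (fractional OK)
  coreLimit: "800m"   # pod-level CPU limit
  memory: "512m"
```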


kokibas commented Aug 7, 2024

@ChenYi015 Thank you for the response. I ran kubectl top node, and here are the results:
NAME             CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
kuber.spark.io   1168m        14%    11189Mi         70%
As you can see, the node is using 1168m CPU (14%) and 11189Mi memory (70%).

In addition, my application logs show the following message:
Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources.
It seems like the cluster might not have enough resources allocated to run the application, especially regarding memory. Could this be causing the issue? What would be the best way to resolve this, given the current resource usage?

I have also tried to reduce the CPU consumption for the Spark executors by setting coreRequest and coreLimit to lower values (e.g., 400m and 800m), but they still seem to use 1 CPU each. I want to ensure that the executors are not over-allocating resources unnecessarily. Could there be an issue with how Spark Operator is interpreting these values?

@ChenYi015
Contributor

@kokibas Currently, driver.cores and executor.cores must be integers, which means each must be at least 1. So you will need at least 2 CPU cores (i.e. 2000m) to run the application, but your cluster only has 1168m of CPU, so you must scale up your cluster.
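In other words, with one driver and one executor under the integer-cores constraint, the smallest footprint the scheduler will accept looks roughly like this (a sketch, not a complete spec):

```yaml
# Minimum CPU footprint when cores must be integers:
driver:
  cores: 1       # requests 1 full CPU (1000m) for the driver pod
executor:
  cores: 1       # requests 1 full CPU (1000m) per executor pod
  instances: 1
# Total requested: 2000m, which must fit within the node's
# allocatable CPU (capacity minus what other pods already request).
```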

@ChenYi015
Contributor

@kokibas The Spark operator runs spark-submit internally, and if you run spark-submit --help you will see that only integer values are supported for specifying the number of driver and executor cores.


kokibas commented Aug 7, 2024

@ChenYi015 Thank you for the clarification. I understand now that the driver.cores and executor.cores must be set to integer values, requiring at least 1 CPU core each. Given my current cluster's available CPU resources (1168m), it seems I don't have enough to meet the 2 CPU core requirement for the Spark application.

I will work on scaling up my cluster to ensure that enough CPU resources are available. If you have any further advice on optimizing resource usage or scaling best practices, I'd appreciate it!
