Unable to recognize GPU address - Spark Distributor Tensorflow #171

ghost · 2020-09-22T11:50:58Z

I currently have a local Spark 3.0 cluster consisting of 3 machines. Two of the machines have 2 NVIDIA GPUs, and one machine is the Spark client/master, which has no NVIDIA GPU.
When I start the Spark cluster, the dashboard shows that it recognizes the GPUs as resources.
I'm trying to run the example posted on the Spark Distributor Tensorflow page.
When I create a spark context:

import pyspark

sc = pyspark.SparkContext(master="spark://192.168.1.113:7077",
                          appName="Spark GPU")

I see that the GPUs are listed as resources on the executors.
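
For reference, here is a minimal sketch of how GPUs could be requested explicitly when the context is built. The two `spark.*.resource.gpu.amount` keys and the executor `discoveryScript` key are standard Spark 3.0 resource-scheduling properties; the helper names `gpu_conf_pairs` and `make_context`, the amounts, and the script path are hypothetical and only for illustration, not values taken from this cluster.

```python
def gpu_conf_pairs(gpus_per_executor, discovery_script=None, gpus_per_task=1):
    """Return (key, value) pairs asking Spark 3.x to schedule GPUs."""
    pairs = [
        ("spark.executor.resource.gpu.amount", str(gpus_per_executor)),
        ("spark.task.resource.gpu.amount", str(gpus_per_task)),
    ]
    if discovery_script is not None:
        # Only set when the executor should run its own discovery script.
        pairs.append(
            ("spark.executor.resource.gpu.discoveryScript", discovery_script)
        )
    return pairs


def make_context(master, app_name, pairs):
    """Build a SparkContext with the given settings (requires pyspark)."""
    from pyspark import SparkConf, SparkContext  # imported lazily

    conf = SparkConf().setMaster(master).setAppName(app_name).setAll(pairs)
    return SparkContext(conf=conf)


# Hypothetical values: 2 GPUs per executor, one GPU per task.
pairs = gpu_conf_pairs(2)
```

With pyspark installed, `sc = make_context("spark://192.168.1.113:7077", "Spark GPU", pairs)` would create the context with these settings applied.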

However, when I run the following:

from spark_tensorflow_distributor import MirroredStrategyRunner

MirroredStrategyRunner(num_slots=8).run(train)

It results in the following errors:

raise ValueError(f'Found GPU addresses {addresses} which '
ValueError: Found GPU addresses [''] which are not all in the correct format for CUDA_VISIBLE_DEVICES, which requires integers with no zero padding.
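
To make the message concrete, here is a small sketch of the kind of check it describes: every address must be a non-negative integer written without zero padding. This is my own illustration of the rule stated in the error text, not the library's actual code, and the name `invalid_addresses` is hypothetical.

```python
import re

# Non-negative integer without zero padding: "0", "1", "12", but not "" or "01".
_CUDA_INDEX = re.compile(r"0|[1-9][0-9]*")


def invalid_addresses(addresses):
    """Return the addresses that are not plain, unpadded integers."""
    return [a for a in addresses if _CUDA_INDEX.fullmatch(a) is None]


print(invalid_addresses(["0", "1"]))  # nothing is rejected
print(invalid_addresses([""]))        # the empty string is rejected
```

Under this rule the single empty-string address in `['']` is what gets rejected, so the task apparently received a GPU resource entry whose address was blank.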

I'm not sure why it wasn't able to detect the GPUs on the remote machines.
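
One thing that could be checked on each worker is what the GPU discovery script prints. Spark expects a JSON object with a resource name and a list of addresses, for example `{"name": "gpu", "addresses": ["0", "1"]}`. If the script emits an empty address (for instance because its underlying GPU query returns nothing, which is an assumption here and not a confirmed diagnosis), the list would contain `''` as in the error above. The sketch below is a rough check of captured script output; `check_discovery_output` is a hypothetical helper.

```python
import json


def check_discovery_output(text):
    """Parse a discovery script's JSON output and list any problems found."""
    problems = []
    info = json.loads(text)
    if info.get("name") != "gpu":
        problems.append("name is not 'gpu'")
    addresses = info.get("addresses", [])
    if not addresses:
        problems.append("no addresses reported")
    for address in addresses:
        # Rough check: flags "" and non-numeric values (zero padding is not checked).
        if not str(address).isdigit():
            problems.append(f"bad address {address!r}")
    return problems


print(check_discovery_output('{"name": "gpu", "addresses": [""]}'))
```

The script's output can be captured by running it by hand on a GPU machine and passing the printed line to the function.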
