I currently have a local Spark 3.0 cluster that consists of 3 machines. Two machines have 2 NVIDIA GPUs, and one machine is the Spark client master, which has no NVIDIA GPU.
When I start the Spark cluster, the dashboard shows that it recognizes the GPUs as resources.
I'm trying to run the example posted on the Spark TensorFlow Distributor page.
When I create a Spark context:
I see that the GPUs are being used as executor resources.
However, when I run the following:
MirroredStrategyRunner(num_slots=8).run(train)
It results in the following error:
raise ValueError(f'Found GPU addresses {addresses} which '
ValueError: Found GPU addresses [''] which are not all in the correct format for CUDA_VISIBLE_DEVICES, which requires integers with no zero padding.
I'm not sure why it wasn't able to detect the GPUs on the remote machines.
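For reference, the format rule named in the error message (integers with no zero padding) can be approximated as below. This is a minimal sketch and not the library's actual code: the helper names `is_valid_cuda_address` and `invalid_addresses` are hypothetical, and the rule is inferred only from the message text. It shows why an address list of `['']` is rejected: the empty string is not an integer at all, which suggests the task received an empty GPU address rather than a badly formatted one.

```python
from typing import List


def is_valid_cuda_address(address: str) -> bool:
    """Return True if `address` is a non-negative integer string without zero padding.

    Hypothetical helper; the rule is inferred from the error message only.
    """
    # Rejects the empty string, signs, whitespace, and non-ASCII digits.
    if not (address.isascii() and address.isdigit()):
        return False
    # A leading zero is acceptable only when the whole address is "0".
    return address == "0" or not address.startswith("0")


def invalid_addresses(addresses: List[str]) -> List[str]:
    """Return the addresses that would not fit CUDA_VISIBLE_DEVICES."""
    return [a for a in addresses if not is_valid_cuda_address(a)]


if __name__ == "__main__":
    # The address list from the error above: a single empty string.
    print(invalid_addresses([""]))
    # A well-formed list for a machine with two GPUs.
    print(invalid_addresses(["0", "1"]))
```

Running the same kind of check on the addresses each task actually receives would show whether the executors are handing out empty GPU addresses.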