The Distributed view is not available, and I think this error is the cause:
E0618 17:14:15.276058 131298609845824 loader.py:150] Number of communication kernels don't match between workers in run: gpu_resnet50_cifar10_ddp_batch512_precision32_nodes3
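To narrow down which worker's trace is out of step, the per-worker counts can be compared directly. Below is a minimal sketch, not the plugin's own check. It assumes the run directory holds Chrome-trace JSON files named `*.pt.trace.json` (optionally gzipped), that kernel events carry a `cat` of `kernel`, and that communication kernels have `nccl` in their name. The helper names `count_comm_kernels` and `compare_workers` are hypothetical.

```python
import gzip
import json
from collections import Counter
from pathlib import Path


def count_comm_kernels(trace_path):
    """Count kernel events whose name mentions NCCL in one trace file.

    Assumption: the file is a Chrome-trace JSON, either a dict with a
    "traceEvents" list or a bare list of events.
    """
    path = Path(trace_path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as fh:
        trace = json.load(fh)

    events = trace if isinstance(trace, list) else trace.get("traceEvents", [])
    counts = Counter()
    for event in events:
        name = str(event.get("name", ""))
        category = str(event.get("cat", "")).lower()
        # Assumption: communication kernels are kernel events named *nccl*.
        if category == "kernel" and "nccl" in name.lower():
            counts[name] += 1
    return counts


def compare_workers(run_dir):
    """Return {trace file name: total communication kernel count}."""
    totals = {}
    for path in sorted(Path(run_dir).glob("*.pt.trace.json*")):
        totals[path.name] = sum(count_comm_kernels(path).values())
    return totals
```

Calling `compare_workers` on the run directory returns one total per trace file. If the totals differ between workers, that would be consistent with the mismatch the loader reports, although the plugin may classify communication kernels differently from this sketch.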
Data Visualization Env:
MacBook Air M2
OS: Version 14.5 (23F79)
tensorboard==2.17.0
tensorboard-data-server==0.7.2
tensorboard_plugin_profile==2.15.1
tensorboardX==2.6.2.2
torch-tb-profiler==0.4.3
oabuhamdan changed the title from "Number of communication kernels don't match between workers in run" to "[BUG] Number of communication kernels don't match between workers in run" on Jun 18, 2024
Data Collection Env:
Python version: 3.11.7
GCC (GCC) 12.2.0
Torch: '2.3.1+cu121'
PyTorch Lightning: '2.3.0'
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.8.2003 (Core)
Release: 7.8.2003
Codename: Core
SLURM environment
CUDA 12.4.1
DeepSpeed 0.14.3