Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! #4151
Unanswered
RamakrishnaChaitanya asked this question in General Q&A
Replies: 1 comment · 5 replies
Reply:
Can you check whether it still occurs in the fork?
Hi, I'm trying to fine-tune the FastPitch model (a custom model from IndicTTS) on the OpenSLR dataset. However, after precomputation, training fails with the error "Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!". I'm running the following command on the console:
CUDA_VISIBLE_DEVICES="7" python TTS/bin/train_tts_rk.py --config_path /data/Ramakrishna/Projects/TTS/Github_repos/en+hi/fastpitch/config_2.json --restore_path /data/Ramakrishna/Projects/TTS/Github_repos/en+hi/fastpitch/best_model.pth
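One general PyTorch/CUDA point worth noting here (not specific to this repo): with CUDA_VISIBLE_DEVICES="7", the process sees exactly one GPU and PyTorch renumbers it to cuda:0, so "cuda:0" in the error message still refers to physical GPU 7. A minimal sketch to check this:

```python
import os

# CUDA_VISIBLE_DEVICES must be set before the process makes its first
# CUDA call; exporting it on the command line (as above) is the safe way.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "7")

import torch

# With a single visible GPU, PyTorch renumbers it to cuda:0, so seeing
# "cuda:0" does not mean the wrong GPU is being used.
print(torch.cuda.device_count())  # 1 if GPU 7 is visible and usable, else 0
```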
The error is raised inside the embedding() function defined in torch/nn/functional.py:
Traceback (most recent call last):
  File "/data/Condaenvs/coqui-tts/lib/python3.10/site-packages/trainer/trainer.py", line 1833, in fit
    self._fit()
  File "/data/Condaenvs/coqui-tts/lib/python3.10/site-packages/trainer/trainer.py", line 1785, in _fit
    self.train_epoch()
  File "/data/Condaenvs/coqui-tts/lib/python3.10/site-packages/trainer/trainer.py", line 1504, in train_epoch
    outputs, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/data/Condaenvs/coqui-tts/lib/python3.10/site-packages/trainer/trainer.py", line 1360, in train_step
    outputs, loss_dict_new, step_time = self.optimize(
  File "/data/Condaenvs/coqui-tts/lib/python3.10/site-packages/trainer/trainer.py", line 1226, in optimize
    outputs, loss_dict = self._compute_loss(
  File "/data/Condaenvs/coqui-tts/lib/python3.10/site-packages/trainer/trainer.py", line 1157, in _compute_loss
    outputs, loss_dict = self._model_train_step(batch, model, criterion)
  File "/data/Condaenvs/coqui-tts/lib/python3.10/site-packages/trainer/trainer.py", line 1116, in _model_train_step
    return model.train_step(*input_args)
  File "/data/Condaenvs/coqui-tts/lib/python3.10/site-packages/TTS/tts/models/forward_tts.py", line 729, in train_step
    outputs = self.forward(
  File "/data/Condaenvs/coqui-tts/lib/python3.10/site-packages/TTS/tts/models/forward_tts.py", line 616, in forward
    o_en, x_mask, g, x_emb = self._forward_encoder(x, x_mask, g)
  File "/data/Condaenvs/coqui-tts/lib/python3.10/site-packages/TTS/tts/models/forward_tts.py", line 401, in _forward_encoder
    g = self.emb_g(g)  # [B, C, 1]
  File "/data/Condaenvs/coqui-tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/Condaenvs/coqui-tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/Condaenvs/coqui-tts/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 190, in forward
    return F.embedding(
  File "/data/Condaenvs/coqui-tts/lib/python3.10/site-packages/torch/nn/functional.py", line 2559, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
When I dug into each of the arguments of torch.embedding(), I saw something like:
weight is on GPU device 0, input is on GPU device -1, and padding_idx, scale_grad_by_freq, and sparse are on the CPU.
Even after explicitly setting CUDA_VISIBLE_DEVICES="7" (out of 8 GPUs), why am I seeing one tensor on GPU 0 and another on GPU -1? How do I address this multiple-devices issue? If anyone has faced a similar issue before, I'd appreciate your help.
Just FYI, I'm using the original, unmaintained repository :(
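For what it's worth, PyTorch's Tensor.get_device() returns -1 for CPU tensors, so "GPU device -1" for input most likely means the speaker-id tensor g stayed on the CPU (padding_idx, scale_grad_by_freq, and sparse are plain Python scalars, so "CPU" is expected for them). A hedged sketch of the usual fix, moving the index tensor onto the embedding weight's device before the lookup (the names emb_g/g come from the traceback; the exact code in forward_tts.py may differ):

```python
import torch
import torch.nn as nn

# Stand-ins for the objects in the traceback: emb plays the role of
# self.emb_g (on cuda:0 in the report), g is the speaker-id tensor
# that apparently stayed on the CPU.
emb = nn.Embedding(4, 8)
g = torch.tensor([1, 3])

print(g.get_device())  # -1, which is how get_device() reports a CPU tensor

# The usual fix: align the index tensor with the embedding weight's
# device before the lookup, e.g. g = g.to(self.emb_g.weight.device)
g = g.to(emb.weight.device)
out = emb(g)
print(out.shape)  # torch.Size([2, 8])
```

In the training code itself, the equivalent one-liner would go just before the `g = self.emb_g(g)` call in `_forward_encoder`, or the batch could be moved to the model's device as a whole before the forward pass.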