The error occurs in the Controllable Text Generation section: after the model trained for 6 epochs and started evaluating, it raised a KeyError: 'eval_loss'.
#65 · Open · Markkk111 opened this issue on Apr 19, 2023 · 2 comments
Hello! I would really appreciate some help with this problem.
The error message is as follows:
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
(this warning is printed 4 times)
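(Aside: as the warning text itself says, it is harmless and can be silenced by setting the named environment variable before any tokenizer is used. A minimal sketch; placing it at the top of run_clm.py is an assumption, any point before the first tokenizer use should work:)

```python
import os

# Must run before tokenizers is first used, or the fork warning persists.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```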
{'loss': 0.0185, 'learning_rate': 7.973102785782901e-06, 'epoch': 5.04}
{'loss': 0.0185, 'learning_rate': 6.9724623759205894e-06, 'epoch': 5.16}
{'loss': 0.0188, 'learning_rate': 5.971821966058277e-06, 'epoch': 5.28}
{'loss': 0.0178, 'learning_rate': 4.971181556195966e-06, 'epoch': 5.4}
wandb: Network error (ReadTimeout), entering retry loop.
{'loss': 0.0183, 'learning_rate': 3.970541146333654e-06, 'epoch': 5.52}
{'loss': 0.018, 'learning_rate': 2.9699007364713415e-06, 'epoch': 5.64}
{'loss': 0.0179, 'learning_rate': 1.96926032660903e-06, 'epoch': 5.76}
{'loss': 0.0174, 'learning_rate': 9.68619916746718e-07, 'epoch': 5.88}
[INFO|trainer.py:1901] 2023-04-19 17:43:28,689 >>
Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 1749.401, 'train_samples_per_second': 142.815, 'train_steps_per_second': 14.281, 'train_loss': 0.023130248267032992, 'epoch': 6.0}
[INFO|trainer.py:2709] 2023-04-19 17:43:28,693 >> Saving model checkpoint to classifier_models/e2e-tgt-tree_e=6_b=10_m=bert-base-uncased_wikitext-103-raw-v1_101_wp_None
[INFO|configuration_utils.py:453] 2023-04-19 17:43:28,694 >> Configuration saved in classifier_models/e2e-tgt-tree_e=6_b=10_m=bert-base-uncased_wikitext-103-raw-v1_101_wp_None/config.json
[INFO|modeling_utils.py:1704] 2023-04-19 17:43:29,841 >> Model weights saved in classifier_models/e2e-tgt-tree_e=6_b=10_m=bert-base-uncased_wikitext-103-raw-v1_101_wp_None/pytorch_model.bin
***** train metrics *****
epoch = 6.0
train_loss = 0.0231
train_runtime = 0:29:09.40
train_samples = 41640
train_samples_per_second = 142.815
train_steps_per_second = 14.281
04/19/2023 17:43:29 - INFO - __main__ - *** Evaluate ***
[INFO|trainer.py:710] 2023-04-19 17:43:29,848 >> The following columns in the evaluation set don't have a corresponding argument in Classifier_Tree.forward and have been ignored: chart_lst. If chart_lst are not expected by Classifier_Tree.forward, you can safely ignore this message.
[INFO|trainer.py:2964] 2023-04-19 17:43:29,850 >> ***** Running Evaluation *****
[INFO|trainer.py:2966] 2023-04-19 17:43:29,850 >> Num examples = 421
[INFO|trainer.py:2969] 2023-04-19 17:43:29,851 >> Batch size = 10
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
(again printed 4 times)
{'eval_runtime': 1.4868, 'eval_samples_per_second': 283.16, 'eval_steps_per_second': 28.921, 'epoch': 6.0}
Traceback (most recent call last):
  File "/home/name/diffusion-LM/transformers/examples/pytorch/language-modeling/run_clm.py", line 1704, in <module>
    main()
  File "/home/name/diffusion-LM/transformers/examples/pytorch/language-modeling/run_clm.py", line 1675, in main
    perplexity = math.exp(metrics["eval_loss"])
KeyError: 'eval_loss'
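(Note what the eval metrics line above contains: eval_runtime and throughput keys, but no eval_loss. Hugging Face's Trainer only adds eval_loss to the returned metrics when the evaluation loop actually gathered losses, i.e. when the model returned a loss for the eval batches; run_clm.py then indexes metrics["eval_loss"] unconditionally and crashes. A minimal defensive sketch for that line, assuming the goal is just to let the script finish — this is a workaround, not the repository's official fix; logger is the module logger run_clm.py already uses:)

```python
import math

metrics = trainer.evaluate()

# 'eval_loss' is only present when the model returned a loss during evaluation.
if "eval_loss" in metrics:
    try:
        perplexity = math.exp(metrics["eval_loss"])
    except OverflowError:
        perplexity = float("inf")
    metrics["perplexity"] = perplexity
else:
    logger.warning("No 'eval_loss' in eval metrics; skipping perplexity.")
```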
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
Exception ignored in atexit callback: <function _Manager._atexit_setup.<locals>.<lambda> at 0x7f2f280f1fc0>
Traceback (most recent call last):
  File "/home/name/anaconda3/lib/python3.10/site-packages/wandb/sdk/wandb_manager.py", line 166, in <lambda>
    self._atexit_lambda = lambda: self._atexit_teardown()
  File "/home/name/anaconda3/lib/python3.10/site-packages/wandb/sdk/wandb_manager.py", line 175, in _atexit_teardown
    self._teardown(exit_code)
  File "/home/name/anaconda3/lib/python3.10/site-packages/wandb/sdk/wandb_manager.py", line 186, in _teardown
    result = self._service.join()
  File "/home/name/anaconda3/lib/python3.10/site-packages/wandb/sdk/service/service.py", line 216, in join
    ret = self._internal_proc.wait()
  File "/home/name/anaconda3/lib/python3.10/subprocess.py", line 1204, in wait
    return self._wait(timeout=timeout)
  File "/home/name/anaconda3/lib/python3.10/subprocess.py", line 1938, in _wait
    (pid, sts) = self._try_wait(0)
  File "/home/name/anaconda3/lib/python3.10/subprocess.py", line 1896, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
KeyboardInterrupt:
(diffusion-LM) name@taizun-SYS-4029GP-TRT:~/diffusion-LM$ wandb: 0.010 MB of 0.010 MB uploaded (0.000 MB deduped)
The relevant code that caused the error is as follows:
# Training
if training_args.do_train:
    checkpoint = None
    if training_args.resume_from_checkpoint is not None:
        checkpoint = training_args.resume_from_checkpoint
    elif last_checkpoint is not None:
        checkpoint = last_checkpoint
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
    trainer.save_model()  # Saves the tokenizer too for easy upload

    metrics = train_result.metrics
    max_train_samples = (
        data_args.max_train_samples if data_args.max_train_samples is not None else len(train_dataset)
    )
    metrics["train_samples"] = min(max_train_samples, len(train_dataset))

    trainer.log_metrics("train", metrics)
    trainer.save_metrics("train", metrics)
    trainer.save_state()

# Evaluation
if training_args.do_eval:
    logger.info("*** Evaluate ***")

kwargs = {"finetuned_from": model_args.model_name_or_path, "tasks": "text-generation"}
if data_args.dataset_name is not None:
    kwargs["dataset_tags"] = data_args.dataset_name
    if data_args.dataset_config_name is not None:
        kwargs["dataset_args"] = data_args.dataset_config_name
        kwargs["dataset"] = f"{data_args.dataset_name} {data_args.dataset_config_name}"
    else:
        kwargs["dataset"] = data_args.dataset_name

if training_args.push_to_hub:
    trainer.push_to_hub(**kwargs)
else:
    trainer.create_model_card(**kwargs)
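(Note that this paste skips the body of the do_eval block: per the traceback, the failing line 1675 — perplexity = math.exp(metrics["eval_loss"]) — sits inside it, after the evaluation run that produced the "Running Evaluation" log above. Whether eval_loss appears depends on the model computing a loss for the eval batches, and the INFO line earlier shows Trainer dropping the chart_lst column because Classifier_Tree.forward does not accept it; if the loss computation depends on a dropped column, no loss is returned and eval_loss never materializes. A hypothetical diagnostic sketch — the Trainer call is real, the expectation of a label-like key is an assumption about this custom model:)

```python
# Hypothetical diagnostic: inspect one eval batch after Trainer's column
# filtering to see whether the inputs needed for a loss are still present.
batch = next(iter(trainer.get_eval_dataloader()))
print(list(batch.keys()))  # if the label field is missing, no loss -> no 'eval_loss'
```

If the label field does survive, the next thing to check is that Classifier_Tree.forward actually returns a loss when it is passed.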
Hello! I'm running into the same problem as you. Did you solve it? If so, how? Looking forward to your answer, I'm very anxious! @Markkk111