ZeroDivisionError: division by zero #1568

Open

thangnv02 opened this issue Jan 21, 2025 · 3 comments
Comments

thangnv02 commented Jan 21, 2025

I'm trying to train by following this notebook: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma2_(9B)-Alpaca.ipynb#scrollTo=LjY75GoYUCB8
but I'm running into this error:

triton_bmm_14 9.8560 ms 45.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, B_PROLOGUE_CAST_TYPE=None, EVEN_K=False, GROUP_M=8, num_stages=4, num_warps=8
  triton_bmm_10 10.7090 ms 41.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, B_PROLOGUE_CAST_TYPE=None, EVEN_K=False, GROUP_M=8, num_stages=4, num_warps=8
  triton_bmm_5 11.5794 ms 38.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, B_PROLOGUE_CAST_TYPE=None, EVEN_K=False, GROUP_M=8, num_stages=2, num_warps=4
  triton_bmm_13 11.7135 ms 38.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, B_PROLOGUE_CAST_TYPE=None, EVEN_K=False, GROUP_M=8, num_stages=3, num_warps=4
  triton_bmm_15 11.8303 ms 38.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, B_PROLOGUE_CAST_TYPE=None, EVEN_K=False, GROUP_M=8, num_stages=2, num_warps=8
  triton_bmm_18 11.9009 ms 37.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, B_PROLOGUE_CAST_TYPE=None, EVEN_K=False, GROUP_M=8, num_stages=5, num_warps=8
  triton_bmm_6 12.9382 ms 34.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, B_PROLOGUE_CAST_TYPE=None, EVEN_K=False, GROUP_M=8, num_stages=2, num_warps=4
  triton_bmm_9 13.1103 ms 34.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, B_PROLOGUE_CAST_TYPE=None, EVEN_K=False, GROUP_M=8, num_stages=3, num_warps=4
  triton_bmm_3 15.4839 ms 29.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, B_PROLOGUE_CAST_TYPE=None, EVEN_K=False, GROUP_M=8, num_stages=5, num_warps=8
SingleProcess AUTOTUNE benchmarking takes 7.0911 seconds and 0.0009 seconds precompiling
AUTOTUNE bmm(256x2048x2048, 256x2048x256

File "/work/20010985/rag/train_llm/training_gemma.py", line 164, in <module>
    trainner_result = trainer.train()
  File "<string>", line 157, in train
  File "<string>", line 382, in _fast_inner_training_loop
  File "<string>", line 64, in _unsloth_training_step
  File "/home/20010985/anaconda3/envs/thang/lib/python3.10/site-packages/accelerate/accelerator.py", line 2248, in backward
    loss.backward(**kwargs)
  File "/home/20010985/anaconda3/envs/thang/lib/python3.10/site-packages/torch/_tensor.py", line 581, in backward
    torch.autograd.backward(
  File "/home/20010985/anaconda3/envs/thang/lib/python3.10/site-packages/torch/autograd/__init__.py", line 347, in backward
    _engine_run_backward(
  File "/home/20010985/anaconda3/envs/thang/lib/python3.10/site-packages/torch/autograd/graph.py", line 825, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/20010985/anaconda3/envs/thang/lib/python3.10/site-packages/torch/autograd/function.py", line 307, in apply
    return user_fn(self, *args)
  File "/home/20010985/anaconda3/envs/thang/lib/python3.10/site-packages/cut_cross_entropy/cce.py", line 94, in backward
    grad_scale = 1 / lse.numel()
ZeroDivisionError: division by zero

PS: training works fine when I train other models.
Can someone help me?
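
For context on the traceback: `lse.numel()` returns 0 when every token in the batch is masked out with the -100 label, so nothing contributes to the loss. A minimal sketch of that failure mode (not the library's actual code path, just an illustration of the arithmetic):

```python
import torch

# Sketch only: when every label is -100, no positions are kept for the loss,
# so the log-sum-exp tensor that cut_cross_entropy reduces over is empty.
labels = torch.full((1, 16), -100)       # every token masked out
valid = labels != -100                   # -> all False
lse = torch.zeros(int(valid.sum()))      # stands in for the real lse tensor
print(lse.numel())                       # 0
grad_scale = 1 / lse.numel()             # ZeroDivisionError: division by zero
```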

@thangnv02
Author

Additionally, Unsloth raised:

Unsloth: Most labels in your dataset are -100. Training losses will be all 0.
For example, are you sure you used `train_on_responses_only` correctly?
Or did you mask our tokens incorrectly? Maybe this is intended?
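
One way to confirm what this warning is reporting is to count how many label tokens actually survive masking in a training batch. A hedged sketch, assuming `trainer` is the SFTTrainer object built in the notebook:

```python
# Hypothetical sanity check: inspect one batch from the trainer's dataloader.
batch = next(iter(trainer.get_train_dataloader()))
labels = batch["labels"]
masked = (labels == -100).sum().item()
print(f"{masked} of {labels.numel()} label tokens are -100")
# If every token is -100, the loss sees nothing to train on and lse ends up
# empty, which is exactly what produces the ZeroDivisionError above.
```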

@danielhanchen
Contributor

@thangnv02 Apologies for the delay. Did you happen to use train_on_responses_only? If so, the correct chat template needs to be provided; otherwise all labels will be -100, which causes this error. I will also add a check that errors out when lse.numel() is zero.
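
For reference, the Unsloth notebooks pair train_on_responses_only with the chat template's own turn markers. Below is a sketch for the Gemma-2 template; the instruction_part/response_part strings are an assumption based on Gemma's <start_of_turn> format and should be checked against the notebook being followed:

```python
from unsloth.chat_templates import train_on_responses_only

# Assumed Gemma-2 turn markers; if these strings do not match what the chat
# template actually emits, every label stays -100 and training fails as above.
trainer = train_on_responses_only(
    trainer,
    instruction_part="<start_of_turn>user\n",
    response_part="<start_of_turn>model\n",
)
```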

@thangnv02
Author

@danielhanchen so what should I do?
Turn off train_on_responses_only?
Why does this happen with the Gemma LLM?
