Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error splitting the input into NAL units. #7087

Open
1 task done
MengHao666 opened this issue Feb 26, 2025 · 0 comments
Open
1 task done

Error splitting the input into NAL units. #7087

MengHao666 opened this issue Feb 26, 2025 · 0 comments
Labels
bug Something isn't working pending This problem is yet to be addressed

Comments

@MengHao666
Copy link

MengHao666 commented Feb 26, 2025

Reminder

  • I have read the above rules and searched the existing issues.

System Info

latest llamafactory version

Reproduction

I am trying to finetune qwen2.5-vl on 16 * 80G GPUS, and I followed the official script in examplefolder and set preprocessing_num_workers=16. However, I met the following error and the program seem to got crush.

The error logging is like following:

Converting format of dataset (num_proc=16): 100%|█████████▉| 19265/19267 [11:44<00:00,  5.88 examples/s]
Converting format of dataset (num_proc=16): 100%|█████████▉| 19266/19267 [11:44<00:00,  5.02 examples/s]
Converting format of dataset (num_proc=16): 100%|██████████| 19267/19267 [11:44<00:00,  5.44 examples/s]
Converting format of dataset (num_proc=16): 100%|██████████| 19267/19267 [11:44<00:00, 27.34 examples/s]

Running tokenizer on dataset (num_proc=16):   0%|          | 0/19267 [00:00<?, ? examples/s]
Invalid NAL unit size (45405 > 35540).
Invalid NAL unit size (86720 > 54856).
Invalid NAL unit size (7131 > 3225).
missing picture in access unit with size 54860
Invalid NAL unit size (48042 > 33645).
missing picture in access unit with size 3229
missing picture in access unit with size 33649
Invalid NAL unit size (86720 > 54856).
Invalid NAL unit size (48042 > 33645).
Error splitting the input into NAL units.
missing picture in access unit with size 35544
Invalid NAL unit size (45405 > 35540).
Error splitting the input into NAL units.
Error splitting the input into NAL units.
Invalid NAL unit size (8187 > 7069).
missing picture in access unit with size 7073
Invalid NAL unit size (8187 > 7069).
Error splitting the input into NAL units.
Invalid NAL unit size (7131 > 3225).
Error splitting the input into NAL units.
Invalid NAL unit size (14013 > 5998).
missing picture in access unit with size 6002
Invalid NAL unit size (14013 > 5998).
Error splitting the input into NAL units.
Invalid NAL unit size (17173 > 7231).
missing picture in access unit with size 7235
Invalid NAL unit size (17173 > 7231).
Error splitting the input into NAL units.
Invalid NAL unit size (16964 > 6055).
missing picture in access unit with size 6059
Invalid NAL unit size (16964 > 6055).
Exception in thread Thread-9 (accepter)Error splitting the input into NAL units.
:
Traceback (most recent call last):
  File "/opt/conda/envs/python3.10.13/lib/python3.10/threading.py", line 1016, in _bootstrap_inner

Running tokenizer on dataset (num_proc=16):   0%|          | 0/19267 [13:22<?, ? examples/s]    self.run()
  File "/opt/conda/envs/python3.10.13/lib/python3.10/threading.py", line 953, in run
    Invalid NAL unit size (7032 > 2927).
missing picture in access unit with size 2931
self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/python3.10.13/lib/python3.10/site-packages/multiprocess/managers.py", line 194, in accepter
Invalid NAL unit size (7032 > 2927).
Error splitting the input into NAL units.
    t.start()
  File "/opt/conda/envs/python3.10.13/lib/python3.10/threading.py", line 935, in start
    Invalid NAL unit size (28973 > 6121).
missing picture in access unit with size 6125
_start_new_thread(self._bootstrap, ())Invalid NAL unit size (28973 > 6121).

RuntimeError: can't start new threadError splitting the input into NAL units.

Invalid NAL unit size (4411 > 296).
missing picture in access unit with size 300
Invalid NAL unit size (4411 > 296).
Error splitting the input into NAL units.
Invalid NAL unit size (14414 > 1471).
missing picture in access unit with size 1475
Invalid NAL unit size (14414 > 1471).
Error splitting the input into NAL units.
Invalid NAL unit size (5283 > 1792).
missing picture in access unit with size 1796
Invalid NAL unit size (5283 > 1792).
Error splitting the input into NAL units.
Invalid NAL unit size (79147 > 10042).
missing picture in access unit with size 10046
Invalid NAL unit size (79147 > 10042).
Error splitting the input into NAL units.
Invalid NAL unit size (45405 > 35540).
Invalid NAL unit size (86720 > 54856).
Invalid NAL unit size (7131 > 3225).
missing picture in access unit with size 54860
Invalid NAL unit size (48042 > 33645).
missing picture in access unit with size 3229
missing picture in access unit with size 33649
Invalid NAL unit size (86720 > 54856).
Invalid NAL unit size (48042 > 33645).
Error splitting the input into NAL units.
missing picture in access unit with size 35544
Invalid NAL unit size (45405 > 35540).
Error splitting the input into NAL units.
Error splitting the input into NAL units.
Invalid NAL unit size (8187 > 7069).
missing picture in access unit with size 7073
Invalid NAL unit size (8187 > 7069).
Error splitting the input into NAL units.
Invalid NAL unit size (7131 > 3225).
Error splitting the input into NAL units.
Invalid NAL unit size (14013 > 5998).
missing picture in access unit with size 6002
Invalid NAL unit size (14013 > 5998).
Error splitting the input into NAL units.
Invalid NAL unit size (17173 > 7231).
missing picture in access unit with size 7235
Invalid NAL unit size (17173 > 7231).
Error splitting the input into NAL units.
Invalid NAL unit size (16964 > 6055).
missing picture in access unit with size 6059
Invalid NAL unit size (16964 > 6055).
Exception in thread Thread-9 (accepter)Error splitting the input into NAL units.
:
Traceback (most recent call last):
  File "/opt/conda/envs/python3.10.13/lib/python3.10/threading.py", line 1016, in _bootstrap_inner

Running tokenizer on dataset (num_proc=16):   0%|          | 0/19267 [13:22<?, ? examples/s]    self.run()
  File "/opt/conda/envs/python3.10.13/lib/python3.10/threading.py", line 953, in run
    Invalid NAL unit size (7032 > 2927).
missing picture in access unit with size 2931
self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/python3.10.13/lib/python3.10/site-packages/multiprocess/managers.py", line 194, in accepter
Invalid NAL unit size (7032 > 2927).
Error splitting the input into NAL units.
    t.start()
  File "/opt/conda/envs/python3.10.13/lib/python3.10/threading.py", line 935, in start
    Invalid NAL unit size (28973 > 6121).
missing picture in access unit with size 6125
_start_new_thread(self._bootstrap, ())Invalid NAL unit size (28973 > 6121).

RuntimeError: can't start new threadError splitting the input into NAL units.

Invalid NAL unit size (4411 > 296).
missing picture in access unit with size 300
Invalid NAL unit size (4411 > 296).
Error splitting the input into NAL units.
Invalid NAL unit size (14414 > 1471).
missing picture in access unit with size 1475
Invalid NAL unit size (14414 > 1471).
Error splitting the input into NAL units.
Invalid NAL unit size (5283 > 1792).
missing picture in access unit with size 1796
Invalid NAL unit size (5283 > 1792).
Error splitting the input into NAL units.
Invalid NAL unit size (79147 > 10042).
missing picture in access unit with size 10046
Invalid NAL unit size (79147 > 10042).
Error splitting the input into NAL units.
Invalid NAL unit size (45405 > 35540).
Invalid NAL unit size (86720 > 54856).
Invalid NAL unit size (7131 > 3225).
missing picture in access unit with size 54860
Invalid NAL unit size (48042 > 33645).
missing picture in access unit with size 3229
missing picture in access unit with size 33649
Invalid NAL unit size (86720 > 54856).
Invalid NAL unit size (48042 > 33645).
Error splitting the input into NAL units.
missing picture in access unit with size 35544
Invalid NAL unit size (45405 > 35540).
Error splitting the input into NAL units.
Error splitting the input into NAL units.
Invalid NAL unit size (8187 > 7069).
missing picture in access unit with size 7073
Invalid NAL unit size (8187 > 7069).
Error splitting the input into NAL units.
Invalid NAL unit size (7131 > 3225).
Error splitting the input into NAL units.
Invalid NAL unit size (14013 > 5998).
missing picture in access unit with size 6002
Invalid NAL unit size (14013 > 5998).
Error splitting the input into NAL units.
Invalid NAL unit size (17173 > 7231).
missing picture in access unit with size 7235
Invalid NAL unit size (17173 > 7231).
Error splitting the input into NAL units.
Invalid NAL unit size (16964 > 6055).
missing picture in access unit with size 6059
Invalid NAL unit size (16964 > 6055).
Exception in thread Thread-9 (accepter)Error splitting the input into NAL units.
:
Traceback (most recent call last):
  File "/opt/conda/envs/python3.10.13/lib/python3.10/threading.py", line 1016, in _bootstrap_inner

Running tokenizer on dataset (num_proc=16):   0%|          | 0/19267 [13:22<?, ? examples/s]    self.run()
  File "/opt/conda/envs/python3.10.13/lib/python3.10/threading.py", line 953, in run
    Invalid NAL unit size (7032 > 2927).
missing picture in access unit with size 2931
self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/python3.10.13/lib/python3.10/site-packages/multiprocess/managers.py", line 194, in accepter
Invalid NAL unit size (7032 > 2927).
Error splitting the input into NAL units.
    t.start()
  File "/opt/conda/envs/python3.10.13/lib/python3.10/threading.py", line 935, in start
    Invalid NAL unit size (28973 > 6121).
missing picture in access unit with size 6125
_start_new_thread(self._bootstrap, ())Invalid NAL unit size (28973 > 6121).

RuntimeError: can't start new threadError splitting the input into NAL units.

Invalid NAL unit size (4411 > 296).
missing picture in access unit with size 300
Invalid NAL unit size (4411 > 296).
Error splitting the input into NAL units.
Invalid NAL unit size (14414 > 1471).
missing picture in access unit with size 1475
Invalid NAL unit size (14414 > 1471).
Error splitting the input into NAL units.
Invalid NAL unit size (5283 > 1792).
missing picture in access unit with size 1796
Invalid NAL unit size (5283 > 1792).
Error splitting the input into NAL units.
Invalid NAL unit size (79147 > 10042).
missing picture in access unit with size 10046
Invalid NAL unit size (79147 > 10042).
Error splitting the input into NAL units.
Invalid NAL unit size (45405 > 35540).
Invalid NAL unit size (86720 > 54856).
Invalid NAL unit size (7131 > 3225).
missing picture in access unit with size 54860
Invalid NAL unit size (48042 > 33645).
missing picture in access unit with size 3229
missing picture in access unit with size 33649
Invalid NAL unit size (86720 > 54856).
Invalid NAL unit size (48042 > 33645).
Error splitting the input into NAL units.
missing picture in access unit with size 35544
Invalid NAL unit size (45405 > 35540).
Error splitting the input into NAL units.
Error splitting the input into NAL units.
Invalid NAL unit size (8187 > 7069).
missing picture in access unit with size 7073
Invalid NAL unit size (8187 > 7069).
Error splitting the input into NAL units.
Invalid NAL unit size (7131 > 3225).
Error splitting the input into NAL units.
Invalid NAL unit size (14013 > 5998).
missing picture in access unit with size 6002
Invalid NAL unit size (14013 > 5998).
Error splitting the input into NAL units.
Invalid NAL unit size (17173 > 7231).
missing picture in access unit with size 7235
Invalid NAL unit size (17173 > 7231).
Error splitting the input into NAL units.
Invalid NAL unit size (16964 > 6055).
missing picture in access unit with size 6059
Invalid NAL unit size (16964 > 6055).
Exception in thread Thread-9 (accepter)Error splitting the input into NAL units.
:
Traceback (most recent call last):
  File "/opt/conda/envs/python3.10.13/lib/python3.10/threading.py", line 1016, in _bootstrap_inner

Running tokenizer on dataset (num_proc=16):   0%|          | 0/19267 [13:22<?, ? examples/s]    self.run()
  File "/opt/conda/envs/python3.10.13/lib/python3.10/threading.py", line 953, in run
    Invalid NAL unit size (7032 > 2927).
missing picture in access unit with size 2931
self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/python3.10.13/lib/python3.10/site-packages/multiprocess/managers.py", line 194, in accepter
Invalid NAL unit size (7032 > 2927).
Error splitting the input into NAL units.
    t.start()
  File "/opt/conda/envs/python3.10.13/lib/python3.10/threading.py", line 935, in start
    Invalid NAL unit size (28973 > 6121).
missing picture in access unit with size 6125
_start_new_thread(self._bootstrap, ())Invalid NAL unit size (28973 > 6121).

RuntimeError: can't start new threadError splitting the input into NAL units.

Invalid NAL unit size (4411 > 296).
missing picture in access unit with size 300
Invalid NAL unit size (4411 > 296).
Error splitting the input into NAL units.
Invalid NAL unit size (14414 > 1471).
missing picture in access unit with size 1475
Invalid NAL unit size (14414 > 1471).
Error splitting the input into NAL units.
Invalid NAL unit size (5283 > 1792).
missing picture in access unit with size 1796
Invalid NAL unit size (5283 > 1792).
Error splitting the input into NAL units.
Invalid NAL unit size (79147 > 10042).
missing picture in access unit with size 10046
Invalid NAL unit size (79147 > 10042).
Error splitting the input into NAL units.

Others

No response

@MengHao666 MengHao666 added bug Something isn't working pending This problem is yet to be addressed labels Feb 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working pending This problem is yet to be addressed
Projects
None yet
Development

No branches or pull requests

1 participant