Body:
Hello! I encountered an error while trying to convert my ZeRO-3 fine-tuned checkpoint using the zero_to_fp32.py script. The error message is:
PytorchStreamReader failed reading zip archive: not a ZIP archive
Steps to Reproduce:
1. Fine-tuned a model with DeepSpeed ZeRO-3 (config attached).
2. Generated checkpoint files in the checkpoint-40/pytorch_model/ directory.
3. Ran the conversion script:
python zero_to_fp32.py ./checkpoint-40/ ./output/ --safe_serialization
4. Received the error about the ZIP archive format.
[2025-03-18 15:42:32,765] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Processing zero checkpoint '/home/dell/hzy/my_video/checkpoint-40/pytorch_model'
Loading checkpoint shards: 0%| | 0/8 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/dell/hzy/my_video/checkpoint-40/zero_to_fp32.py", line 755, in <module>
convert_zero_checkpoint_to_fp32_state_dict(args.checkpoint_dir,
File "/home/dell/hzy/my_video/checkpoint-40/zero_to_fp32.py", line 632, in convert_zero_checkpoint_to_fp32_state_dict
state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dell/hzy/my_video/checkpoint-40/zero_to_fp32.py", line 591, in get_fp32_state_dict_from_zero_checkpoint
state_dict = _get_fp32_state_dict_from_zero_checkpoint(ds_checkpoint_dir, exclude_frozen_parameters)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dell/hzy/my_video/checkpoint-40/zero_to_fp32.py", line 199, in _get_fp32_state_dict_from_zero_checkpoint
zero_stage, world_size, fp32_flat_groups = parse_optim_states(optim_files, ds_checkpoint_dir)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dell/hzy/my_video/checkpoint-40/zero_to_fp32.py", line 152, in parse_optim_states
state_dict = torch.load(f, map_location=device, mmap=True, weights_only=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dell/anaconda3/envs/CogVideo-main/lib/python3.11/site-packages/torch/serialization.py", line 1326, in load
with _open_zipfile_reader(opened_file) as opened_zipfile:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dell/anaconda3/envs/CogVideo-main/lib/python3.11/site-packages/torch/serialization.py", line 671, in __init__
super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: PytorchStreamReader failed reading zip archive: not a ZIP archive
This seems a bit odd. I don't know why it would try to read a ZIP archive, and your parameters appear to be fine. I suggest checking the DeepSpeed repo to see if there are any related issues.
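For context, torch.save has written a ZIP-based container by default since PyTorch 1.6, which is why torch.load opens a ZIP reader here. One possible cause of this error (an assumption, not confirmed in this thread) is a shard that was truncated or only partially written. A minimal sketch for checking that, using only the standard library; the helper name check_shards and the "*.pt" pattern are hypothetical choices for illustration:

```python
import zipfile
from pathlib import Path


def check_shards(checkpoint_dir, pattern="*.pt"):
    """Return a list of (path, size_in_bytes, is_zip) for matching files.

    `pattern` is an assumption; adjust it to the files in your checkpoint.
    """
    results = []
    for path in sorted(Path(checkpoint_dir).rglob(pattern)):
        size = path.stat().st_size
        # zipfile.is_zipfile looks for the end-of-central-directory record,
        # which sits at the end of the file and is lost if a write was cut short.
        results.append((path, size, zipfile.is_zipfile(path)))
    return results
```

Calling check_shards("./checkpoint-40/pytorch_model") and looking for entries whose last field is False, or whose size is far smaller than the other shards, may narrow down which file fails to load. A False result only says the file is not a complete ZIP archive; it does not say why.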