Error when converting Zero-3 checkpoint: PytorchStreamReader failed reading zip archive: not a ZIP archive #740

Open
12344143213 opened this issue Mar 18, 2025 · 1 comment

Body:
Hello! I encountered an error while trying to convert my ZeRO-3 fine-tuned checkpoint using the zero_to_fp32.py script. The error message is:

PytorchStreamReader failed reading zip archive: not a ZIP archive
Steps to Reproduce:

1. Fine-tuned a model with DeepSpeed ZeRO-3 (config attached).
2. Generated checkpoint files in the checkpoint-40/pytorch_model/ directory.
3. Ran the conversion script:
   python zero_to_fp32.py ./checkpoint-40/ ./output/ --safe_serialization
4. Received the error about the ZIP archive format.


(CogVideo-main) dell@dell-DSS8440:~/hzy/my_video/checkpoint-40$ python zero_to_fp32.py \
    "/home/dell/hzy/my_video/checkpoint-40/" \
    "/home/dell/hzy/my_video/output/" \
    --tag "pytorch_model" \
    --safe_serialization

[2025-03-18 15:42:32,765] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Processing zero checkpoint '/home/dell/hzy/my_video/checkpoint-40/pytorch_model'
Loading checkpoint shards: 0%| | 0/8 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/dell/hzy/my_video/checkpoint-40/zero_to_fp32.py", line 755, in <module>
convert_zero_checkpoint_to_fp32_state_dict(args.checkpoint_dir,
File "/home/dell/hzy/my_video/checkpoint-40/zero_to_fp32.py", line 632, in convert_zero_checkpoint_to_fp32_state_dict
state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dell/hzy/my_video/checkpoint-40/zero_to_fp32.py", line 591, in get_fp32_state_dict_from_zero_checkpoint
state_dict = _get_fp32_state_dict_from_zero_checkpoint(ds_checkpoint_dir, exclude_frozen_parameters)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dell/hzy/my_video/checkpoint-40/zero_to_fp32.py", line 199, in _get_fp32_state_dict_from_zero_checkpoint
zero_stage, world_size, fp32_flat_groups = parse_optim_states(optim_files, ds_checkpoint_dir)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dell/hzy/my_video/checkpoint-40/zero_to_fp32.py", line 152, in parse_optim_states
state_dict = torch.load(f, map_location=device, mmap=True, weights_only=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dell/anaconda3/envs/CogVideo-main/lib/python3.11/site-packages/torch/serialization.py", line 1326, in load
with _open_zipfile_reader(opened_file) as opened_zipfile:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dell/anaconda3/envs/CogVideo-main/lib/python3.11/site-packages/torch/serialization.py", line 671, in __init__
super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: PytorchStreamReader failed reading zip archive: not a ZIP archive
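
One possible way to narrow this down, offered as a sketch and not a confirmed diagnosis: since PyTorch 1.6, torch.save writes a ZIP-based container by default, so torch.load opens recent .pt files as ZIP archives. The traceback fails inside that ZIP reader, which is often a sign that one of the shard files is truncated, partially written, or otherwise not a complete archive. The script below checks each file with Python's standard zipfile module. The helper names (check_checkpoint_file, scan_checkpoint_dir), the status labels, and the "*.pt" pattern are assumptions made for illustration; they are not part of DeepSpeed or zero_to_fp32.py.

```python
import zipfile
from pathlib import Path

# Signature of a ZIP local file header, found at the very start of an archive.
ZIP_MAGIC = b"PK\x03\x04"


def check_checkpoint_file(path):
    """Classify one file as 'ok', 'empty', 'truncated-or-corrupt', or 'not-zip'.

    Hypothetical helper: it only inspects the ZIP container, not the tensors.
    """
    path = Path(path)
    if path.stat().st_size == 0:
        return "empty"
    with path.open("rb") as fh:
        head = fh.read(len(ZIP_MAGIC))
    # is_zipfile looks for the end-of-central-directory record, which is
    # written last, so a file cut off mid-write usually fails this check.
    if zipfile.is_zipfile(str(path)):
        return "ok"
    if head == ZIP_MAGIC:
        # Starts like a ZIP archive but has no valid end record.
        return "truncated-or-corrupt"
    return "not-zip"


def scan_checkpoint_dir(ckpt_dir, pattern="*.pt"):
    """Map each matching file name in ckpt_dir to its status string."""
    files = sorted(Path(ckpt_dir).glob(pattern))
    return {p.name: check_checkpoint_file(p) for p in files}
```

For the run above, calling scan_checkpoint_dir("/home/dell/hzy/my_video/checkpoint-40/pytorch_model") and printing the result would show which shard files, if any, are not complete archives. A file reported as "ok" can still fail to load for other reasons; this only rules the container format in or out.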

OleehyO (Collaborator) commented Mar 21, 2025

This seems a bit odd. I don't know why it would try to read a ZIP archive, and your parameters appear to be fine. I suggest you check the DeepSpeed repo to see if there are any related issues.

@OleehyO OleehyO self-assigned this Mar 21, 2025