RuntimeError when training GRPO with LoRA and PtEngine #5653

@chenjianhuii

Description

Describe the bug

[Screenshot of the RuntimeError traceback; full text not transcribed.]

Your hardware and system info
OS: CentOS 7
CPU: x86
Python: 3.10.6
ms_swift: 3.8.0.dev0
torch: 2.6.0
transformers: 4.55.4
trl: 0.20.0
peft: 0.17.1
CUDA driver version: 12.4

Additional context

My script

export CUDA_VISIBLE_DEVICES=0

torchrun \
    --nproc_per_node=1 \
    --nnodes=1 \
    --node_rank=0 \
    swift/cli/rlhf.py \
    --rlhf_type grpo \
    --do_train \
    --model xxx \
    --model_type qwen2_5 \
    --train_type lora \
    --dataset xxx \
    --torch_dtype bfloat16 \
    --num_train_epochs 2 \
    --max_length 8192 \
    --use_vllm false \
    --per_device_train_batch_size 2 \
    --learning_rate 2e-5 \
    --save_total_limit 1 \
    --logging_steps 5 \
    --output_dir xxx \
    --gradient_accumulation_steps 4 \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 8 \
    --max_completion_length 2048 \
    --reward_funcs turn_repetition,soft_length,heuristic,repetition \
    --soft_max_length 120 \
    --soft_cache_length 20 \
    --num_generations 8 \
    --temperature 1.0 \
    --top_p 0.85 \
    --deepspeed zero3_offload \
    --log_completions true \
    --ignore_args_error \
    --report_to tensorboard \
    --ds3_gather_for_generation false \
    --save_strategy epoch
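
Since the full traceback is only available as a screenshot, the sketch below restates the same GRPO + LoRA configuration directly against the trl 0.20 / peft 0.17 APIs that ms-swift builds on here. It is an assumption-laden illustration of the settings, not an exact reproduction: the model id, the dataset, and the toy length-penalty reward are placeholders standing in for `--model xxx`, `--dataset xxx`, and the custom turn_repetition / soft_length / heuristic / repetition rewards.

```python
# Minimal sketch of the GRPO + LoRA setup (assumptions: trl 0.20 / peft 0.17 APIs;
# placeholder model, dataset, and reward function instead of the custom ms-swift rewards).
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Toy reward: penalize long completions (placeholder for the custom
# turn_repetition / soft_length / heuristic / repetition rewards).
def length_penalty(completions, **kwargs):
    return [-float(len(c)) / 2048.0 for c in completions]

peft_config = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=32)

args = GRPOConfig(
    output_dir="grpo-lora-repro",        # placeholder for --output_dir xxx
    num_train_epochs=2,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    bf16=True,
    num_generations=8,
    max_completion_length=2048,
    temperature=1.0,
    top_p=0.85,
    use_vllm=False,                      # mirrors --use_vllm false (PtEngine generation path)
    logging_steps=5,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",    # placeholder for --model xxx
    reward_funcs=length_penalty,
    args=args,
    train_dataset=load_dataset("trl-lib/tldr", split="train"),  # placeholder for --dataset xxx
    peft_config=peft_config,
)
trainer.train()
```

As in the original script, the single-GPU generation batch (per_device_train_batch_size × gradient_accumulation_steps = 2 × 4 = 8) should stay evenly divisible by num_generations = 8.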
