Description
Describe the bug

Your hardware and system info
OS CentOS 7
CPU x86
python 3.10.6
ms_swift 3.8.0.dev0
torch 2.6.0
transformers 4.55.4
trl 0.20.0
peft 0.17.1
cuda driver version 12.4
Additional context
My script
export CUDA_VISIBLE_DEVICES=0
torchrun \
    --nproc_per_node=1 \
    --nnodes=1 \
    --node_rank=0 \
    swift/cli/rlhf.py \
    --rlhf_type grpo \
    --do_train \
    --model xxx \
    --model_type qwen2_5 \
    --train_type lora \
    --dataset xxx \
    --torch_dtype bfloat16 \
    --num_train_epochs 2 \
    --max_length 8192 \
    --use_vllm false \
    --per_device_train_batch_size 2 \
    --learning_rate 2e-5 \
    --save_total_limit 1 \
    --logging_steps 5 \
    --output_dir xxx \
    --gradient_accumulation_steps 4 \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 8 \
    --max_completion_length 2048 \
    --reward_funcs turn_repetition,soft_length,heuristic,repetition \
    --soft_max_length 120 \
    --soft_cache_length 20 \
    --num_generations 8 \
    --temperature 1.0 \
    --top_p 0.85 \
    --deepspeed zero3_offload \
    --log_completions true \
    --ignore_args_error \
    --report_to tensorboard \
    --ds3_gather_for_generation false \
    --save_strategy epoch
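For reference, the batch arithmetic implied by the flags in the script can be sketched as follows. This is only an illustration under two assumptions that the report itself does not confirm: that the number of completions consumed per optimizer step is per-device batch size × number of processes × gradient accumulation steps, and that each prompt receives `num_generations` completions. The variable names mirror the flags; nothing here calls ms-swift or TRL.

```python
# Values copied from the flags in the script above.
nproc_per_node = 1
nnodes = 1
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
num_generations = 8

# Assumption: one process per GPU, so world size = nproc_per_node * nnodes.
world_size = nproc_per_node * nnodes

# Assumption: completions consumed by one optimizer step.
completions_per_step = (
    per_device_train_batch_size * world_size * gradient_accumulation_steps
)

# How many distinct prompts that covers, and whether it divides evenly.
prompts_per_step, remainder = divmod(completions_per_step, num_generations)

print(f"completions per optimizer step: {completions_per_step}")
print(f"prompts per optimizer step:     {prompts_per_step}")
print(f"remainder:                      {remainder}")
```

Under these assumptions the configuration yields 8 completions per optimizer step, which is exactly one prompt with 8 generations and no remainder.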