Merged
51 commits
6301476
support only sync lora weight
hjh0119 Sep 8, 2025
c7be012
fix wip
hjh0119 Sep 8, 2025
22042fc
wip
hjh0119 Sep 8, 2025
1081caa
fix colocate lora
hjh0119 Sep 9, 2025
4c04d36
add lora for server wip
hjh0119 Sep 9, 2025
1982e9e
Merge branch 'lora+' of github.com:hjh0119/swift into lora+
hjh0119 Sep 9, 2025
5fe3690
fix import
hjh0119 Sep 9, 2025
161dac8
update extension path
hjh0119 Sep 9, 2025
0a14d20
override enable_lora for rollout
hjh0119 Sep 9, 2025
4574665
Merge branch 'lora+' of github.com:hjh0119/swift into lora+
hjh0119 Sep 9, 2025
efae3b2
catch rollout exception
hjh0119 Sep 9, 2025
f454598
fix lora request
hjh0119 Sep 9, 2025
d46bc1f
server wip
hjh0119 Sep 10, 2025
986ac8d
server add_lora wip
hjh0119 Sep 11, 2025
0dc8c6e
fix server tp
hjh0119 Sep 12, 2025
ba284ba
merge main
hjh0119 Sep 8, 2025
0f7ca2a
Merge branch 'lora+' of github.com:hjh0119/swift into lora+
hjh0119 Sep 12, 2025
849696f
doc wip
hjh0119 Sep 12, 2025
f70e827
doc
hjh0119 Sep 12, 2025
274f6db
check lora
hjh0119 Sep 12, 2025
f0b4de8
support only sync lora weight
hjh0119 Sep 12, 2025
6069888
add args for lora script
hjh0119 Sep 12, 2025
0cf5a62
update script
hjh0119 Sep 12, 2025
691a5df
fix
hjh0119 Sep 12, 2025
5cab78d
remove unused import
hjh0119 Sep 12, 2025
e43a0da
fix
hjh0119 Sep 12, 2025
688bf64
fix typo
hjh0119 Sep 12, 2025
4fa2d2f
fix unmerge
hjh0119 Sep 12, 2025
78f9473
wip
hjh0119 Sep 28, 2025
a50f756
Merge branch 'lora+' of github.com:hjh0119/swift into lora+
hjh0119 Sep 28, 2025
be00cf4
Merge remote-tracking branch 'origin' into lora+
hjh0119 Oct 9, 2025
6745666
bucket for full training in server mode
hjh0119 Oct 9, 2025
dfccf15
remove circle import
hjh0119 Oct 10, 2025
4652f77
fix TokenizerGroup removed in vllm 0.11.0
hjh0119 Oct 10, 2025
8c73590
rm comments
hjh0119 Oct 10, 2025
4389268
Merge branch 'lora+' of github.com:hjh0119/swift into lora+
hjh0119 Oct 10, 2025
cc56588
move model batches for full parameters
hjh0119 Oct 13, 2025
1360752
Merge branch 'lora+' of github.com:hjh0119/swift into lora+
hjh0119 Oct 13, 2025
1561e8f
fix lora training with rollout enable_lora
hjh0119 Oct 13, 2025
0f81605
Merge remote-tracking branch 'origin' into lora+
hjh0119 Oct 13, 2025
24220bf
check should merge adapter
hjh0119 Oct 14, 2025
c66a801
update accuracy & test accuracy & moe script
hjh0119 Oct 15, 2025
190c201
add test cases
hjh0119 Oct 16, 2025
c186507
colocate moe script & fix moe colocate lora training
hjh0119 Oct 16, 2025
b9a1827
add script
hjh0119 Oct 17, 2025
eb3e230
doc update
hjh0119 Oct 17, 2025
9b503a7
rm sricpt & update doc
hjh0119 Oct 17, 2025
67f36a6
streamline weight sync
hjh0119 Oct 17, 2025
a406f6d
clean comments and import
hjh0119 Oct 17, 2025
0380c08
add vllm_enable_lora for colocate
hjh0119 Oct 17, 2025
9f98473
fix zh comment in en doc
hjh0119 Oct 17, 2025
6 changes: 5 additions & 1 deletion docs/source/BestPractices/GRPO代码训练.md
@@ -42,7 +42,9 @@
```bash
CUDA_VISIBLE_DEVICES=7 \
swift rollout \
--model Qwen/Qwen2.5-7B-Instruct
--model Qwen/Qwen2.5-7B-Instruct \
--vllm_enable_lora true \
--vllm_max_lora_rank 16
```

```bash
@@ -61,6 +63,8 @@ swift rlhf \
--vllm_server_host 127.0.0.1 \
--vllm_server_port 8000 \
--train_type lora \
--lora_rank 16 \
--lora_alpha 32 \
--torch_dtype bfloat16 \
--dataset 'open-r1/verifiable-coding-problems-python-10k' \
--load_from_cache_file true \
26 changes: 25 additions & 1 deletion docs/source/Instruction/GRPO/GetStarted/GRPO.md
@@ -185,7 +185,7 @@ swift rollout \

For more rollout parameters, refer to [vLLM arguments](../../../Instruction/命令行参数.md#vllm参数) and [rollout arguments](../../../Instruction/命令行参数.md#rollout参数).

Note: When using use_async_engine, enabling only DP may cause errors; see this [vllm issue](https://github.com/vllm-project/vllm/issues/18567). If errors occur, try enabling both TP and DP.
Note: When using use_async_engine, enabling only DP may cause errors; see this [vllm issue](https://github.com/vllm-project/vllm/issues/18567). If errors occur, try enabling both TP and DP, or upgrade vLLM.


Configure the external vLLM server for training with the following parameters:
@@ -196,6 +196,30 @@ swift rollout \
--vllm_server_port <service_port> \
--vllm_server_timeout <timeout> \
```
#### Weight Sync Acceleration
swift 3.10 optimizes weight synchronization; setting the following parameters can further speed up weight synchronization for LoRA training.

```bash
# rollout(server mode)
swift rollout \
--vllm_enable_lora true \
--vllm_max_lora_rank xxx # must match lora_rank in the training script
...

# grpo(colocate mode)
swift rlhf \
--rlhf_type grpo \
--vllm_mode colocate \
--vllm_enable_lora true \
...
```

Note: this optimization cannot be used in the following cases:

- Training the ViT layers of multimodal models (freeze_vit false)
- MoE models

For implementation details of the optimization, refer to this [PR](https://github.com/modelscope/ms-swift/pull/5773).
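
As a concrete illustration, a minimal colocate-mode LoRA command with this optimization could look like the sketch below (model, dataset, and rank values are placeholders, not a recommended configuration):

```bash
# Sketch: colocate-mode GRPO LoRA training with LoRA-only weight sync enabled.
# Model, dataset, and LoRA rank are illustrative placeholders.
swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen2.5-7B-Instruct \
    --reward_funcs accuracy \
    --use_vllm true \
    --vllm_mode colocate \
    --vllm_enable_lora true \
    --train_type lora \
    --lora_rank 16 \
    --lora_alpha 32 \
    --dataset AI-MO/NuminaMath-TIR#1000
```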

## logged metrics
- completions/mean_length: The average length of generated completions.
11 changes: 8 additions & 3 deletions docs/source/Instruction/命令行参数.md
@@ -526,17 +529,20 @@ Reward model parameters are used in PPO and GRPO.
- vllm_mode: vLLM integration mode; options are `server` and `colocate`. In server mode, sampling uses a vLLM server launched with `swift rollout`; in colocate mode, vLLM is deployed inside the training process.
- vllm_mode server parameters
- vllm_server_base_url: Base URL of the vLLM server (e.g. http://local_host:8000). Default is None. When set, the host and port settings are ignored.
- vllm_server_host: The host address of the vLLM server. Default is None. This is used when connecting to an external vLLM server.
- vllm_server_host: The host address of the vLLM server. Default is None.
- vllm_server_port: The service port of the vLLM server. Default is 8000.
- vllm_server_timeout: The connection timeout for the vLLM server. Default is 240 s.
- vllm_server_pass_dataset: Pass additional dataset information through to the vLLM server, used for multi-turn training.
- async_generate: Use async rollout to improve training speed. Note that when enabled, sampling uses the model from the previous update; multi-turn scenarios are not supported. Default is `false`.
- SWIFT_UPDATE_WEIGHTS_BUCKET_SIZE: Environment variable controlling the transfer bucket size for weight synchronization, applicable to full-parameter training in server mode. Unit is MB; default is 512 MB.
- vllm_mode colocate parameters (for more supported parameters, see [vLLM arguments](#vLLM参数))
- vllm_gpu_memory_utilization: vLLM passthrough parameter. Default is 0.9.
- vllm_max_model_len: vLLM passthrough parameter. Default is None.
- vllm_enforce_eager: vLLM passthrough parameter. Default is False.
- vllm_limit_mm_per_prompt: vLLM passthrough parameter. Default is None.
- vllm_enable_prefix_caching: vLLM passthrough parameter. Default is True.
- vllm_tensor_parallel_size: Tensor-parallel size. Default is `1`.
- vllm_enable_lora: Enable the vLLM engine to load LoRA adapters. Default is False. Used to accelerate weight synchronization during LoRA training; see the [documentation](./GRPO/GetStarted/GRPO.md#权重同步加速) for details.
- sleep_level: Release vLLM GPU memory while the model is training. Options are [0, 1]; default is 0 (no release).
- offload_optimizer: Whether to offload optimizer parameters during vLLM inference. Default is False.
- offload_model: Whether to offload the model during vLLM inference. Default is False.
@@ -549,7 +552,7 @@ Reward model parameters are used in PPO and GRPO.
- sync_ref_model: Whether to periodically synchronize the ref_model. Default is False.
- ref_model_mixup_alpha: Controls the mix between the model and the previous ref_model during updates. The update formula is $π_{ref} = α * π_θ + (1 - α) * π_{ref_{prev}}$. Default is 0.6.
- ref_model_sync_steps: Synchronization frequency. Default is 512.
- move_model_batches: When moving model parameters to fast inference frameworks such as vLLM, how many batches to split the layers into. Default is None, meaning the whole model is not split; otherwise it is split into move_model_batches + 1 (non-layer parameters) + 1 (multimodal parameters) batches. Note: this parameter is only meaningful for LoRA (PEFT) training.
- move_model_batches: When moving model parameters to fast inference frameworks such as vLLM, how many batches to split the layers into. Default is None, meaning the whole model is not split; otherwise it is split into move_model_batches + 1 (non-layer parameters) + 1 (multimodal parameters) batches.
- multi_turn_scheduler: Multi-turn GRPO parameter; pass the corresponding plugin name and add the implementation in plugin/multi_turn.py.
- max_turns: Maximum number of turns for multi-turn GRPO. Default is None (no limit).
- dynamic_sample: Filter out data whose reward standard deviation within the group is 0 and sample additional new data. Default is False.
@@ -604,8 +607,10 @@ Soft overlong reward parameters

### Rollout Arguments
Rollout arguments inherit from the [deployment arguments](#部署参数).
- multi_turn_scheduler: Scheduler for multi-turn GRPO training; pass the corresponding plugin name and add the implementation in plugin/multi_turn.py. Default is None; see the [documentation](./GRPO/DeveloperGuide/多轮训练.md) for details.
- multi_turn_scheduler: Scheduler for multi-turn GRPO training; pass the corresponding plugin name and add the implementation in plugin/multi_turn.py. Default is None; see the [documentation](./GRPO/DeveloperGuide/多轮训练.md) for details.
- max_turns: Maximum number of turns for multi-turn GRPO training. Default is None (no limit).
- vllm_enable_lora: Enable the vLLM engine to load LoRA adapters. Default is False. Used to accelerate weight synchronization during LoRA training; see the [documentation](./GRPO/GetStarted/GRPO.md#权重同步加速) for details.
- vllm_max_lora_rank: LoRA parameter of the vLLM engine. Must be greater than or equal to the training lora_rank; setting them equal is recommended. Default is 16.

### Web-UI Arguments
- server_name: Host of the web UI. Default is '0.0.0.0'.
6 changes: 5 additions & 1 deletion docs/source_en/BestPractices/GRPO-Code-Training.md
@@ -46,7 +46,9 @@ launch external vLLM server using following script
```bash
CUDA_VISIBLE_DEVICES=7 \
swift rollout \
--model Qwen/Qwen2.5-7B-Instruct
--model Qwen/Qwen2.5-7B-Instruct \
--vllm_enable_lora true \
--vllm_max_lora_rank 16
```

```bash
Expand All @@ -65,6 +67,8 @@ swift rlhf \
--vllm_server_host 127.0.0.1 \
--vllm_server_port 8000 \
--train_type lora \
--lora_rank 16 \
--lora_alpha 32 \
--torch_dtype bfloat16 \
--dataset 'open-r1/verifiable-coding-problems-python-10k' \
--load_from_cache_file true \
9 changes: 7 additions & 2 deletions docs/source_en/Instruction/Command-line-parameters.md
@@ -535,17 +538,20 @@ The meanings of the following parameters can be referenced [here](https://huggin
- vllm_mode: Mode to use for vLLM integration when `use_vllm` is set to `True`. Must be one of `server` or `colocate`.
- vllm_mode server parameter
- vllm_server_base_url: Base URL for the vLLM server (e.g., 'http://localhost:8000'). If provided, `vllm_server_host` and `vllm_server_port` are ignored. Default is None.
- vllm_server_host: The host address of the vLLM server. Default is None. This is used when connecting to an external vLLM server.
- vllm_server_host: The host address of the vLLM server. Default is None.
- vllm_server_port: The service port of the vLLM server. Default is 8000.
- vllm_server_timeout: The connection timeout for the vLLM server. Default is 240 seconds.
- vllm_server_pass_dataset: pass additional dataset information through to the vLLM server for multi-turn training.
- async_generate: Use async rollout to improve train speed. Note that rollout will use the model updated in the previous round when enabled. Multi-turn scenarios are not supported. Default is `false`.
- SWIFT_UPDATE_WEIGHTS_BUCKET_SIZE: An environment variable that controls the bucket size (in MB) for weight synchronization during full-parameter training in Server Mode. Default is 512 MB.
- vllm_mode colocate parameter (For more parameter support, refer to the [vLLM Arguments](#vLLM-Arguments).)
- vllm_gpu_memory_utilization: vLLM passthrough parameter, default is 0.9.
- vllm_max_model_len: vLLM passthrough parameter, the total length limit of model, default is None.
- vllm_enforce_eager: vLLM passthrough parameter, default is False.
- vllm_limit_mm_per_prompt: vLLM passthrough parameter, default is None.
- vllm_enable_prefix_caching: A pass-through parameter for vLLM, default is True.
- vllm_tensor_parallel_size: the tensor parallel size of vLLM engine, default is 1.
- vllm_enable_lora: Enable the vLLM engine to load LoRA adapters; defaults to False. Used to accelerate weight synchronization during LoRA training. See the [documentation](./GRPO/GetStarted/GRPO.md#weight-sync-acceleration) for details.
- sleep_level: Make vLLM sleep while the model is training. Options are 0 or 1; default is 0 (no sleep).
- offload_optimizer: Whether to offload optimizer parameters during inference with vLLM. The default is `False`.
- offload_model: Whether to offload the model during inference with vLLM. The default is `False`.
@@ -563,7 +566,7 @@ The meanings of the following parameters can be referenced [here](https://huggin
- sync_ref_model: Whether to synchronize the reference model. Default is False.
- ref_model_mixup_alpha: This parameter controls the mix between the current policy and the previous reference policy during updates. The reference policy is updated according to the equation: $π_{ref} = α * π_θ + (1 - α) * π_{ref_{prev}}$. Default is 0.6.
- ref_model_sync_steps: The parameter determines how frequently the current policy is synchronized with the reference policy. Default is 512.
- move_model_batches: When moving model parameters to fast inference frameworks such as vLLM/LMDeploy, determines how many batches to divide the layers into. The default is `None`, which means the entire model is not split. Otherwise, the model is split into `move_model_batches + 1` (non-layer parameters) + `1` (multi-modal component parameters) batches. This parameter is only meaningful for LoRA (PEFT).
- move_model_batches: When moving model parameters to fast inference frameworks such as vLLM/LMDeploy, determines how many batches to divide the layers into. The default is `None`, which means the entire model is not split. Otherwise, the model is split into `move_model_batches + 1` (non-layer parameters) + `1` (multi-modal component parameters) batches.
- multi_turn_scheduler: Multi-turn GRPO parameter; pass the corresponding plugin name, and make sure to implement it in plugin/multi_turn.py.
- max_turns: Maximum number of rounds for multi-turn GRPO. The default is None, which means there is no limit.
- dynamic_sample: Exclude data within the group where the reward standard deviation is 0, and additionally sample new data. Default is False.
Expand Down Expand Up @@ -623,6 +626,8 @@ Deployment Arguments inherit from the [inference arguments](#inference-arguments
The rollout parameters inherit from the [deployment parameters](#deployment-arguments).
- multi_turn_scheduler: The scheduler for multi-turn GRPO training. Pass the corresponding plugin name, and ensure the implementation is added in `plugin/multi_turn.py`. Default is `None`. See [documentation](./GRPO/DeveloperGuide/multi_turn.md) for details.
- max_turns: Maximum number of turns in multi-turn GRPO training. Default is `None`, meaning no limit.
- vllm_enable_lora: Enable the vLLM engine to load LoRA adapters; defaults to False. Used to accelerate weight synchronization during LoRA training. See the [documentation](./GRPO/GetStarted/GRPO.md#weight-sync-acceleration) for details.
- vllm_max_lora_rank: LoRA parameter for the vLLM engine. Must be greater than or equal to the training lora_rank; it is recommended to set them equal. Defaults to 16.

### Web-UI Arguments
- server_name: Host for the web UI, default is '0.0.0.0'.
27 changes: 26 additions & 1 deletion docs/source_en/Instruction/GRPO/GetStarted/GRPO.md
@@ -183,7 +183,7 @@ swift rollout \
```
For more rollout parameters, refer to the [vllm arguments](../../../Instruction/Command-line-parameters.md#vllm-arguments) and [rollout arguments](../../../Instruction/Command-line-parameters.md#rollout-arguments)

Note: When `use_async_engine` is set, enabling only DP (Data Parallelism) may cause errors. [Related issue](https://github.com/vllm-project/vllm/issues/18567). If errors occur, try enabling both TP (Tensor Parallelism) and DP.
Note: When `use_async_engine` is set, enabling only DP (Data Parallelism) may cause errors. [Related issue](https://github.com/vllm-project/vllm/issues/18567). If errors occur, try enabling both TP (Tensor Parallelism) and DP, or upgrade vLLM.

To configure the external vLLM server during training, use the following parameters:

@@ -194,6 +194,31 @@ To configure the external vLLM server during training, use the following paramet
--vllm_server_port <service_port> \
--vllm_server_timeout <timeout> \
```

#### Weight-Sync Acceleration
Swift 3.10 optimizes weight synchronization, and setting the following parameters can further improve the weight synchronization speed for LoRA training:

```bash
# rollout(server mode)
swift rollout \
--vllm_enable_lora true \
--vllm_max_lora_rank xxx # match the lora_rank in the training script
...

# grpo(colocate mode)
swift rlhf \
--rlhf_type grpo \
--vllm_mode colocate \
--vllm_enable_lora true \
...
```
Note: This optimization cannot be used in the following cases:

- Training the ViT layers of multimodal models (freeze_vit set to false)
- MoE models

For implementation details, please refer to the [PR](https://github.com/modelscope/ms-swift/pull/5773)
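
As a concrete pairing, a server-mode LoRA setup with matching ranks could look like this sketch (the model name and rank value are illustrative; remaining training arguments follow your usual configuration):

```bash
# Sketch: rollout server plus GRPO training command with LoRA-only weight sync.
# --vllm_max_lora_rank on the rollout side must equal --lora_rank on the training side (16 here).
swift rollout \
    --model Qwen/Qwen2.5-7B-Instruct \
    --vllm_enable_lora true \
    --vllm_max_lora_rank 16

swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen2.5-7B-Instruct \
    --use_vllm true \
    --vllm_mode server \
    --vllm_server_host 127.0.0.1 \
    --vllm_server_port 8000 \
    --train_type lora \
    --lora_rank 16 \
    --lora_alpha 32
```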

## logged metrics
- completions/mean_length: The average length of generated completions.
- completions/min_length: The minimum length among generated completions.
6 changes: 6 additions & 0 deletions examples/train/grpo/external/README.md
@@ -7,6 +7,12 @@
1. vLLM version 0.8.3 or higher.
2. trl version 0.17.0 or higher
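
One way to satisfy these requirements is to install pinned minimum versions from PyPI, for example:

```bash
# Install the minimum vLLM and trl versions required for external (server-mode) rollout.
pip install "vllm>=0.8.3" "trl>=0.17.0"
```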

For LoRA training, set the following parameters to speed up weight updates:
```bash
--vllm_enable_lora true
--vllm_max_lora_rank xxx # same as lora_rank in training script
```
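
On the training side, these rollout flags pair with a LoRA configuration whose rank matches `vllm_max_lora_rank`. For example, with an assumed rank of 16, the corresponding flags added to the usual `swift rlhf --rlhf_type grpo --vllm_mode server ...` command would be:

```bash
--train_type lora
--lora_rank 16 # must equal vllm_max_lora_rank on the rollout server
--lora_alpha 32
```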

## **Introduction**

The GRPO (Group Relative Policy Optimization) training framework supports high-performance inference engines like vLLM to accelerate the sampling process. The **External Mode** allows you to connect to an external vLLM inference server, separating the inference service from the training process. This mode is ideal for scenarios where you want to offload inference to dedicated hardware or servers, improving resource utilization and scalability.
44 changes: 44 additions & 0 deletions examples/train/grpo/external/moe_full.sh
@@ -0,0 +1,44 @@
# 8*80G

# CUDA_VISIBLE_DEVICES=0 \
# swift rollout \
# --model Qwen/Qwen3-30B-A3B-Instruct-2507 \
# --vllm_max_model_len 16384 \
# --vllm_enable_prefix_caching true

CUDA_VISIBLE_DEVICES=1,2,3,4,5,6,7 \
NPROC_PER_NODE=7 \
swift rlhf \
--rlhf_type grpo \
--model Qwen/Qwen3-30B-A3B-Instruct-2507 \
--reward_funcs accuracy \
--use_vllm true \
--vllm_mode server \
--vllm_server_host 127.0.0.1 \
--vllm_server_port 8000 \
--train_type full \
--torch_dtype bfloat16 \
--dataset AI-MO/NuminaMath-TIR#1000 \
--max_length 12000 \
--max_completion_length 8192 \
--overlong_filter true \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--learning_rate 1e-6 \
--gradient_accumulation_steps 4 \
--save_strategy 'steps' \
--eval_strategy 'steps' \
--eval_steps 1000 \
--save_steps 1000 \
--save_total_limit 10 \
--logging_steps 1 \
--warmup_ratio 0.01 \
--dataloader_num_workers 4 \
--num_generations 14 \
--temperature 1.0 \
--deepspeed zero3_offload \
--log_completions true \
--report_to tensorboard swanlab \
--num_iterations 1 \
--beta 0.001 \
--move_model_batches 5
44 changes: 44 additions & 0 deletions examples/train/grpo/external/moe_lora.sh
@@ -0,0 +1,44 @@
# 8*80G

# CUDA_VISIBLE_DEVICES=0 \
# swift rollout \
# --model Qwen/Qwen3-30B-A3B-Instruct-2507 \
# --vllm_max_model_len 16384 \
# --vllm_enable_prefix_caching true

CUDA_VISIBLE_DEVICES=1,2,3,4,5,6,7 \
NPROC_PER_NODE=7 \
swift rlhf \
--rlhf_type grpo \
--model Qwen/Qwen3-30B-A3B-Instruct-2507 \
--reward_funcs accuracy \
--use_vllm true \
--vllm_mode server \
--vllm_server_host 127.0.0.1 \
--vllm_server_port 8000 \
--train_type lora \
--torch_dtype bfloat16 \
--dataset AI-MO/NuminaMath-TIR#1000 \
--max_length 12000 \
--max_completion_length 8192 \
--overlong_filter true \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--learning_rate 1e-6 \
--gradient_accumulation_steps 4 \
--save_strategy 'steps' \
--eval_strategy 'steps' \
--eval_steps 1000 \
--save_steps 1000 \
--save_total_limit 10 \
--logging_steps 1 \
--warmup_ratio 0.01 \
--dataloader_num_workers 4 \
--num_generations 14 \
--temperature 1.0 \
--deepspeed zero3 \
--log_completions true \
--report_to tensorboard swanlab \
--num_iterations 1 \
--beta 0.001 \
--move_model_batches 5
40 changes: 40 additions & 0 deletions examples/train/grpo/internal/moe_full.sh
@@ -0,0 +1,40 @@
# 8*80G

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \
swift rlhf \
--rlhf_type grpo \
--model Qwen/Qwen3-30B-A3B-Instruct-2507 \
--reward_funcs accuracy \
--use_vllm true \
--vllm_mode colocate \
--vllm_gpu_memory_utilization 0.4 \
--vllm_tensor_parallel_size 2 \
--vllm_max_model_len 16384 \
--train_type full \
--torch_dtype bfloat16 \
--dataset AI-MO/NuminaMath-TIR#1000 \
--max_length 12000 \
--max_completion_length 8192 \
--overlong_filter true \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--learning_rate 1e-6 \
--gradient_accumulation_steps 4 \
--save_strategy 'steps' \
--eval_strategy 'steps' \
--eval_steps 1000 \
--save_steps 1000 \
--save_total_limit 10 \
--logging_steps 1 \
--warmup_ratio 0.01 \
--dataloader_num_workers 4 \
--num_generations 16 \
--temperature 1.0 \
--deepspeed zero3_offload \
--log_completions true \
--sleep_level 1 \
--report_to tensorboard swanlab \
--num_iterations 1 \
--beta 0.001 \
--move_model_batches 10
42 changes: 42 additions & 0 deletions examples/train/grpo/internal/moe_lora.sh
@@ -0,0 +1,42 @@
# 8*80G

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \
swift rlhf \
--rlhf_type grpo \
--model Qwen/Qwen3-30B-A3B-Instruct-2507 \
--reward_funcs accuracy \
--use_vllm true \
--vllm_mode colocate \
--vllm_gpu_memory_utilization 0.4 \
--vllm_tensor_parallel_size 2 \
--vllm_max_model_len 16384 \
--train_type lora \
--torch_dtype bfloat16 \
--dataset AI-MO/NuminaMath-TIR#1000 \
--max_length 12000 \
--max_completion_length 8192 \
--overlong_filter true \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--learning_rate 1e-6 \
--gradient_accumulation_steps 4 \
--save_strategy 'steps' \
--eval_strategy 'steps' \
--eval_steps 1000 \
--save_steps 1000 \
--save_total_limit 10 \
--logging_steps 1 \
--warmup_ratio 0.01 \
--dataloader_num_workers 4 \
--num_generations 16 \
--temperature 1.0 \
--deepspeed zero3 \
--log_completions true \
--sleep_level 1 \
--offload_model true \
--offload_optimizer true \
--report_to tensorboard swanlab \
--num_iterations 1 \
--beta 0.001 \
--move_model_batches 10
1 change: 0 additions & 1 deletion examples/train/grpo/internal/vllm_72b_4gpu.sh
@@ -36,7 +36,6 @@ swift rlhf \
--top_p 1.0 \
--top_k 80 \
--log_completions true \
--async_generate false \
--move_model_batches 16 \
--offload_optimizer true \
--offload_model true \