Releases: modelscope/ms-swift

v3.12.1

08 Jan 02:29

Full Changelog: v3.12.0...v3.12.1

v3.12.0

30 Dec 03:24

New Features

  1. Megatron-SWIFT
    a. GKD algorithm supports Megatron training. Documentation reference: https://swift.readthedocs.io/en/latest/Megatron-SWIFT/GKD.html
    b. New model support: GLM4 Dense, GLM4.7, GLM4.6V-Flash, GLM-4.1V.
    c. save_safetensors supports resuming training from checkpoints; loading and saving weights via Mcore-Bridge is now the recommended approach.
    d. Non-padding-free training mode supports more training stages: GRPO/DPO/KTO/RM/sequence classification.
    e. Added group_by_length parameter, which groups dataset samples of roughly similar length together (with some randomness) to speed up training in non-packing mode.
    f. Added --report_to parameter to log and visualize training runs in wandb/swanlab.
    g. Qwen3-Next uses Zero-Centered RMSNorm, aligned with transformers.
    h. Added train_dataloader_shuffle parameter to control whether the training dataset is shuffled.
    i. Added a retry mechanism to template.encode so Megatron training no longer stalls when fetching images/videos fails due to network issues.
  2. RL
    a. Added Off-Policy Sequence Masking (from DeepSeek-V3.2). Documentation reference: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/training_inference_mismatch.html#off-policy-sequence-masking
    b. GRPO adds the num_generations_eval parameter to set the number of generations during the eval stage.
    c. Reduced peak GPU memory usage in GKD loss calculation.
    d. GRPO/GKD server mode supports IPv6 addresses.
    e. Support for structured output sampling via structured_outputs_regex.
  3. Training
    a. Embedding/reranker/sequence classification tasks support sequence packing and sequence parallelism. Training script reference: https://github.com/modelscope/ms-swift/tree/main/examples/train/sequence_parallel
    b. Support for --fsdp fsdp2 to use the FSDP2 configuration file built into ms-swift.
    c. loss_scale supports three basic strategies ('default', 'last_round', 'all'), which can be combined with other strategies, e.g. 'last_round+ignore_empty_think'; see the example after this list.
    d. cached_dataset supports embedding/reranker/sequence classification training tasks. Training script reference: https://github.com/modelscope/ms-swift/tree/main/examples/train/cached_dataset
    e. Thinking template refactored: ThinkingTemplate functionality merged into Template; added enable_thinking and add_non_thinking_prefix parameters.
    f. Added SWIFT_PATCH_CONV3D environment variable to work around slow conv3d execution in torch 2.9 environments.
    g. Added swanlab_notification_method parameter to specify how swanlab sends notifications when training completes or errors occur.
    h. Changed the default value of dataloader_prefetch_factor from 10 to 2.
  4. Domestic Hardware (Thanks to the Ascend and China Merchants Bank (CMB) technical teams)
    a. Added more training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/ascend
    b. Qwen3-VL hybrid operator support, see this PR: #7079
    c. Updated Megatron-SWIFT NPU performance collection/accuracy collection documentation, reference: https://swift.readthedocs.io/en/latest/Megatron-SWIFT/Ascend.html
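
A minimal sketch of the hybrid loss_scale usage in 3c above; the model, dataset, and other flags here are illustrative placeholders, not part of this release:

```bash
# Hypothetical example: combine the 'last_round' loss_scale strategy
# with 'ignore_empty_think', as described in 3c.
swift sft \
    --model Qwen/Qwen2.5-7B-Instruct \
    --dataset swift/self-cognition \
    --train_type lora \
    --loss_scale last_round+ignore_empty_think
```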

New Models

  1. Text-only models:
    a. ZhipuAI/GLM-4.7 series
    b. iic/QwenLong-L1.5-30B-A3B
    c. gongjy/MiniMind2 (Thanks to @PiggerZZM's contribution)
  2. Multimodal models:
    a. ZhipuAI/GLM-4.6V; ZhipuAI/GLM-4.6V-Flash series
    b. Tencent-Hunyuan/HunyuanOCR

Patch release v3.11.3

28 Dec 12:54

Patch release v3.11.2

21 Dec 02:59

Patch release v3.11.1

15 Dec 01:10

v3.11.0

09 Dec 02:44

New Features

  1. Megatron-SWIFT
    a. GRPO training support on Megatron, documentation: https://swift.readthedocs.io/en/latest/Megatron-SWIFT/GRPO.html
    b. FP8 blockwise training support, including FP8 weight loading and exporting. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/megatron/fp8
    c. MTP training support, training script: https://github.com/modelscope/ms-swift/blob/main/examples/megatron/lora/mtp.sh
    d. New model support: GPT-OSS, Llama4, InternVL3.5-GPT-OSS, etc.
    e. Support for saving checkpoints with --save_strategy epoch.
    f. Compatible with megatron-core versions 0.12–0.15.
  2. RL
    a. New algorithm SAPO supported, documentation: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/SAPO.html
    b. New algorithm CISPO supported, documentation: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/CISPO.html
    c. Algorithms for mitigating training–inference mismatch, including TIS/MIS and rollout off-policy metrics. Docs: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/training_inference_mismatch.html
    d. Tree-rollout support, docs: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/treepo.html (Thanks to CMB team @li2zhi for the contribution)
    e. GKD training supports liger_kernel loss (--use_liger_kernel true).
    f. New GRPO loss types added, docs: https://swift.readthedocs.io/en/latest/Instruction/GRPO/DeveloperGuide/loss_types.html
  3. Training
    a. Cached dataset refactoring for better offline tokenization of large datasets. Scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/cached_dataset
    b. Added --truncation_strategy split for pretraining, which splits long texts into multiple samples instead of truncating them, avoiding wasted tokens; see the example after this list.
    c. Added packing_num_proc parameter support.
    d. Qwen2.5-VL series models compatible with "qwen_vl_utils>=0.14".
    e. MFU logging plugin support (Thanks to @y2logic).
  4. Domestic Hardware Support (Thanks to the Ascend and China Merchants Bank (CMB) technical teams)
    a. Megatron-SWIFT supports Ascend NPU, documentation: https://swift.readthedocs.io/en/latest/BestPractices/NPU-support.html
    b. Ascend NPU hybrid operators support the Qwen2, Qwen3, and Qwen3-MoE model series, accelerating training.
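
A minimal sketch of the pretraining truncation behavior in 3b above; the model and dataset are illustrative placeholders:

```bash
# Hypothetical example: split over-long pretraining texts into multiple
# samples (--truncation_strategy split) rather than truncating them.
swift pt \
    --model Qwen/Qwen2.5-7B \
    --dataset <your-pretraining-dataset> \
    --max_length 8192 \
    --truncation_strategy split
```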

New Models

  1. Text-only models:
    a. moonshotai/Kimi-K2-Thinking
  2. Multimodal models:
    a. SenseNova/SenseNova-SI-InternVL3-2B series
    b. mistralai/Ministral-3-3B-Instruct-2512 series
    c. mistralai/Mistral-Small-3.2-24B-Instruct-2506

Patch release v3.10.3

30 Nov 06:35

Patch release v3.10.2

23 Nov 09:58

Patch release v3.10.1

16 Nov 16:50

v3.10.0

11 Nov 12:14

New Features

  1. Megatron-SWIFT
    a. Mcore-Bridge released. Supports direct loading and saving of model weights in safetensors format, bidirectional conversion of LoRA incremental weights, and multi-node conversion. Documentation: https://swift.readthedocs.io/en/latest/Megatron-SWIFT/Mcore-Bridge.html. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/megatron/mcore_bridge
    b. Upgraded megatron-core version to 0.14.0.
    c. Added vit_lr and aligner_lr parameter support for multimodal model training.
    d. Added storage optimization parameters: async_save, save_retain_interval, etc.
    e. Support for batched mrope, accelerating training of Qwen3-VL, Qwen2.5-VL, and other models.
  2. RL
    a. Optimized weight-synchronization speed for GRPO LoRA training. Details: https://swift.readthedocs.io/en/latest/Instruction/GRPO/GetStarted/GRPO.html#memory-optimization-solutions-in-colocate-mode
    b. Optimized GRPO training memory usage to reduce peak memory consumption.
    c. New RLVR algorithms supported: RLOO (documentation: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/RLOO.html) and REINFORCE++ Baseline (documentation: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/REINFORCEPP.html)
    d. GKD supports using vLLM to accelerate policy model rollout, with new parameter teacher_deepspeed for additional control of teacher model sharding strategy. Documentation: https://swift.readthedocs.io/en/latest/Instruction/GKD.html
    e. GSPO supports using liger_kernel to reduce memory usage.
  3. Training
    a. Ray support added for PT/SFT/sampling/data distillation, documentation: https://swift.readthedocs.io/en/latest/Instruction/Ray.html
    b. Qwen3-VL and Qwen3-Omni support mixed modality data training; Qwen3-VL supports Ulysses sequence parallelism. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/models/qwen3_vl
    c. Support for YAML-based training parameter configuration, scripts: https://github.com/modelscope/ms-swift/tree/main/examples/yaml
    d. Added FSDP2 training launch example, scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/multi-gpu/fsdp2_lora
    e. Added best practice for custom multimodal model registration: https://swift.readthedocs.io/en/latest/BestPractices/MLLM-Registration.html
    f. InfoNCE loss in embedding training aligned with Qwen3-Embedding paper description. Documentation: https://swift.readthedocs.io/en/latest/BestPractices/Embedding.html
    g. Added multi-label classification training example, scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/seq_cls/multi_label
    h. agent_template supports seed-oss. Thanks to @hpsun1109 for the contribution.
  4. Full Pipeline
    a. swift export supports GPTQ-v2 quantization, scripts: https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize/gptq_v2.sh. Thanks to @zzc0430 for the contribution.
    b. The swift deploy vLLM inference backend supports data-parallel (DP) deployment via the --vllm_data_parallel_size parameter; see the example after this list. Thanks to @YushunXiang for the contribution.
    c. swift deploy now exposes health/ping endpoints.
    d. vLLM deployment adds the vllm_mm_processor_cache_gb and vllm_engine_kwargs parameters.
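
A minimal sketch of the DP deployment in 4b and the health endpoint in 4c above; the model and port are illustrative placeholders:

```bash
# Hypothetical example: deploy with the vLLM backend using two
# data-parallel replicas.
swift deploy \
    --model Qwen/Qwen2.5-7B-Instruct \
    --infer_backend vllm \
    --vllm_data_parallel_size 2 \
    --port 8000

# Probe the new health endpoint once the server is up.
curl http://127.0.0.1:8000/health
```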

New Models

  1. Text-only models:
    a. Qwen/Qwen3Guard-Gen-0.6B series
    b. MiniMax/MiniMax-M2
  2. Multimodal models:
    a. Qwen/Qwen3-VL-2B-Instruct series
    b. deepseek-ai/DeepSeek-OCR, training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/models/deepseek_ocr
    c. PaddlePaddle/PaddleOCR-VL
    d. ZhipuAI/Glyph
    e. PaddlePaddle/ERNIE-4.5-VL-28B-A3B-Thinking series
    f. lmms-lab/LLaVA-OneVision-1.5-4B-Instruct series
