Release v3.1.1 · modelscope/ms-swift

New Features

Support for GRPO training of large models, multimodal models, and Agents, including multi-node training. Refer to this documentation.
Support for Embedding model training. Refer to this script.
swift sample supports MCTS and distillation data sampling, as well as multimodal model sampling.
Support for custom dataset evaluation. Refer to this documentation.
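
For readers new to GRPO, the sketch below illustrates the group-relative advantage that gives the method its name: several completions are sampled for the same prompt, each is scored by a reward function, and the rewards are normalized within that group. This is a minimal illustration, not the ms-swift implementation. The function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper for illustration only; `eps` guards against a zero
    standard deviation when every completion receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; libraries may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, scored 1.0 if correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group with identical rewards yields zero advantage for every completion.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Completions scoring above the group mean receive a positive advantage and those below receive a negative one, so no separate value model is needed to form a baseline.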

New Models

AIDC-AI/Ovis2-2B series
Qwen/Qwen2.5-VL-72B-Instruct-AWQ series
stepfun-ai/GOT-OCR-2.0-hf
stepfun-ai/Step-Audio-Chat
mistralai/Mistral-Small-24B-Instruct-2501

New Datasets

Related to GRPO
- AI-ModelScope/MATH-lighteval
- LLM-Research/xlam-function-calling-60k
- AI-MO/NuminaMath-TIR
Related to R1
- liucong/Chinese-DeepSeek-R1-Distill-data-110k-SFT
- modelscope/MathR, modelscope/MathR-32B-Distill

What's Changed

Add evalscope native backend by @Yunnglin in #2981
support mistralai/Mistral-Small-24B-Instruct-2501 by @Jintao-Huang in #3030
MCTS Sampler by @lxline in #2967
fix windows url by @Jintao-Huang in #3041
Support sample multi modal models by @tastelikefeet in #3048
Support sft embedding model by @tastelikefeet in #3039
support GRPO by @hjh0119 in #3022
fix grpo by @hjh0119 in #3050
fix grpo by @Jintao-Huang in #3051
update docs (fine-tuning) by @Jintao-Huang in #3052
bump version by @Jintao-Huang in #3053
fix grpo model_type by @Jintao-Huang in #3057
update rlhf documents by @hjh0119 in #3055
add grpo multinode scripts by @hjh0119 in #3059
Fix orm env by @tastelikefeet in #3065
Support external plugins by @tastelikefeet in #3066
update docs by @Jintao-Huang in #3070
fix grpo nan by @Jintao-Huang in #3075
fix grpo metric_for_best_model by @Jintao-Huang in #3077
register MathR by @mi804 in #3078
fix accuracy reward by @hjh0119 in #3080
fix SwiftModel by @Jintao-Huang in #3071
Fix grpo vlm (internvl2.5) by @Jintao-Huang in #3081
Refactor orm prm by @Jintao-Huang in #3085
fix competition math by @tastelikefeet in #3086
support cuda operations to npu by @tastelikefeet in #3087
fix grpo temperature 0.7->0.9 by @Jintao-Huang in #3091
support grpo vllm lora by @Jintao-Huang in #3095
Feat: Eval custom dataset by @Yunnglin in #3093
cosine and repetition reward for GRPO by @hjh0119 in #3079
fix get_device by @Jintao-Huang in #3097
Fix/grpo by @MrToy in #3101
fix unsloth by @tastelikefeet in #3100
support grpo npu by @Jintao-Huang in #3102
fix grpo zero3 by @Jintao-Huang in #3104
support log completions by @Jintao-Huang in #3110
Fix typos by @co63oc in #3111
update trl version by @Jintao-Huang in #3117
fix eval docs by @Jintao-Huang in #3118
Support llamapro for grpo by @tastelikefeet in #3119
fix grpo trainer by @Jintao-Huang in #3120
fix cleanup error by @Jintao-Huang in #3121
Fix typos by @co63oc in #3123
refactor patcher by @Jintao-Huang in #3124
Support lmdeploy in GRPO by @tastelikefeet in #3126
support stepfun-ai/Step-Audio-Chat by @Jintao-Huang in #3127
update docs by @Jintao-Huang in #3131
fix grpo pt infer generation_config by @Jintao-Huang in #3135
support_local_path by @Jintao-Huang in #3140
Support swanlab by @tastelikefeet in #3142
fix grpo sample by @MrToy in #3144
fix grpo vllm lora by @Jintao-Huang in #3134
fix create_repo by @tastelikefeet in #3147
fix grpo zero3 by @Jintao-Huang in #3149
docs: report_to add swanlab by @Zeyi-Lin in #3158
Support Ovis2 models by @DaozeZhang in #3163
support grpo metric_for_best_model by @Jintao-Huang in #3155
Fix ovis2 by @Jintao-Huang in #3169
Support Agent GRPO by @tastelikefeet in #3170
fix max_length error by @Jintao-Huang in #3173
fix streaming by @Jintao-Huang in #3176
Fix/agent grpo by @tastelikefeet in #3172
Fix lmdeploy branch by @tastelikefeet in #3145
fix internvl-4b by @Jintao-Huang in #3178
refactor cosine orm by @Jintao-Huang in #3179
fix sampler reaches max_length by @tastelikefeet in #3180
Fix prm in sampler by @tastelikefeet in #3184
Support GOT_OCR2_hf by @DaozeZhang in #3182
Knowledge Distillation sampling by @mi804 in #3185
compat vllm==0.7.2 by @Jintao-Huang in #3083
support r1 dataset by @Jintao-Huang in #3191
Refactor grpo dataset by @Jintao-Huang in #3192
Add links to agent grpo by @tastelikefeet in #3193

New Contributors

@MrToy made their first contribution in #3101
@co63oc made their first contribution in #3111
@Zeyi-Lin made their first contribution in #3158

Full Changelog: v3.1.0...v3.1.1
