v3.1.1
New Features
- GRPO training now supports large models, multimodal models, Agents, and multi-node setups. Refer to this documentation.
- Support for Embedding model training. Refer to this script.
- swift sample supports MCTS and distillation data sampling, as well as multimodal model sampling.
- Support for custom dataset evaluation. Refer to this documentation.
New Models
- AIDC-AI/Ovis2-2B series
- Qwen/Qwen2.5-VL-72B-Instruct-AWQ series
- stepfun-ai/GOT-OCR-2.0-hf
- stepfun-ai/Step-Audio-Chat
- mistralai/Mistral-Small-24B-Instruct-2501
New Datasets
- Related to GRPO
  - AI-ModelScope/MATH-lighteval
  - LLM-Research/xlam-function-calling-60k
  - AI-MO/NuminaMath-TIR
- Related to R1
  - liucong/Chinese-DeepSeek-R1-Distill-data-110k-SFT
  - modelscope/MathR, modelscope/MathR-32B-Distill
What's Changed
- Add evalscope native backend by @Yunnglin in #2981
- support mistralai/Mistral-Small-24B-Instruct-2501 by @Jintao-Huang in #3030
- MCTS Sampler by @lxline in #2967
- fix windows url by @Jintao-Huang in #3041
- Support sample multi modal models by @tastelikefeet in #3048
- Support sft embedding model by @tastelikefeet in #3039
- support GRPO by @hjh0119 in #3022
- fix grpo by @hjh0119 in #3050
- fix grpo by @Jintao-Huang in #3051
- update docs (fine-tuning) by @Jintao-Huang in #3052
- bump version by @Jintao-Huang in #3053
- fix grpo model_type by @Jintao-Huang in #3057
- update rlhf documents by @hjh0119 in #3055
- add grpo multinode scripts by @hjh0119 in #3059
- Fix orm env by @tastelikefeet in #3065
- Support external plugins by @tastelikefeet in #3066
- update docs by @Jintao-Huang in #3070
- fix grpo nan by @Jintao-Huang in #3075
- fix grpo metric_for_best_model by @Jintao-Huang in #3077
- register MathR by @mi804 in #3078
- fix accuracy reward by @hjh0119 in #3080
- fix SwiftModel by @Jintao-Huang in #3071
- Fix grpo vlm (internvl2.5) by @Jintao-Huang in #3081
- Refactor orm prm by @Jintao-Huang in #3085
- fix competition math by @tastelikefeet in #3086
- support cuda operations to npu by @tastelikefeet in #3087
- fix grpo temperature 0.7->0.9 by @Jintao-Huang in #3091
- support grpo vllm lora by @Jintao-Huang in #3095
- Feat: Eval custom dataset by @Yunnglin in #3093
- cosine and repetition reward for GRPO by @hjh0119 in #3079
- fix get_device by @Jintao-Huang in #3097
- Fix/grpo by @MrToy in #3101
- fix unsloth by @tastelikefeet in #3100
- support grpo npu by @Jintao-Huang in #3102
- fix grpo zero3 by @Jintao-Huang in #3104
- support log completions by @Jintao-Huang in #3110
- Fix typos by @co63oc in #3111
- update trl version by @Jintao-Huang in #3117
- fix eval docs by @Jintao-Huang in #3118
- Support llamapro for grpo by @tastelikefeet in #3119
- fix grpo trainer by @Jintao-Huang in #3120
- fix cleanup error by @Jintao-Huang in #3121
- Fix typos by @co63oc in #3123
- refactor patcher by @Jintao-Huang in #3124
- Support lmdeploy in GRPO by @tastelikefeet in #3126
- support stepfun-ai/Step-Audio-Chat by @Jintao-Huang in #3127
- update docs by @Jintao-Huang in #3131
- fix grpo pt infer generation_config by @Jintao-Huang in #3135
- support_local_path by @Jintao-Huang in #3140
- Support swanlab by @tastelikefeet in #3142
- fix grpo sample by @MrToy in #3144
- fix grpo vllm lora by @Jintao-Huang in #3134
- fix create_repo by @tastelikefeet in #3147
- fix grpo zero3 by @Jintao-Huang in #3149
- docs: report_to add swanlab by @Zeyi-Lin in #3158
- Support Ovis2 models by @DaozeZhang in #3163
- support grpo metric_for_best_model by @Jintao-Huang in #3155
- Fix ovis2 by @Jintao-Huang in #3169
- Support Agent GRPO by @tastelikefeet in #3170
- fix max_length error by @Jintao-Huang in #3173
- fix streaming by @Jintao-Huang in #3176
- Fix/agent grpo by @tastelikefeet in #3172
- Fix lmdeploy branch by @tastelikefeet in #3145
- fix internvl-4b by @Jintao-Huang in #3178
- refactor cosine orm by @Jintao-Huang in #3179
- fix sampler reaches max_length by @tastelikefeet in #3180
- Fix prm in sampler by @tastelikefeet in #3184
- Support GOT_OCR2_hf by @DaozeZhang in #3182
- Knowledge Distillation sampling by @mi804 in #3185
- compat vllm==0.7.2 by @Jintao-Huang in #3083
- support r1 dataset by @Jintao-Huang in #3191
- Refactor grpo dataset by @Jintao-Huang in #3192
- Add links to agent grpo by @tastelikefeet in #3193
New Contributors
- @MrToy made their first contribution in #3101
- @co63oc made their first contribution in #3111
- @Zeyi-Lin made their first contribution in #3158
Full Changelog: v3.1.0...v3.1.1