
Conversation

@aleien95 (Collaborator)

What does this PR do?
Extends the existing text-only sequence parallelism to support multimodal (image + text) training for SFT and DPO, with a padding-removal optimization.

Key Changes
Multimodal sequence parallelism for SFT/DPO: adds support for vision-language models in both Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) training while keeping backward compatibility with text-only training
Padding-removal optimization: dynamically removes padding tokens to reduce memory usage and improve training efficiency (see the sketch after this list)
Enhanced parallel communication: updates the tensor-distribution patterns so multimodal data is sharded correctly across sequence-parallel ranks
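
A minimal sketch of the padding-removal plus sequence-sharding idea, assuming PyTorch and a `torch.distributed` sequence-parallel process group. All names here are illustrative, not the PR's actual API:

```python
import torch
import torch.distributed as dist

def unpad_and_shard(input_ids: torch.Tensor,
                    attention_mask: torch.Tensor,
                    sp_group: dist.ProcessGroup):
    """Pack a padded (batch, seq_len) batch into one flat token stream,
    then give each sequence-parallel rank an equal contiguous slice."""
    # 1) Padding removal: keep only positions the attention mask marks real.
    indices = torch.nonzero(attention_mask.flatten()).flatten()
    packed = input_ids.flatten()[indices]

    # Cumulative lengths let varlen attention kernels recover sample
    # boundaries inside the packed stream.
    seqlens = attention_mask.sum(dim=1, dtype=torch.int32)
    cu_seqlens = torch.nn.functional.pad(
        torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0))

    # 2) Round the packed length up to a multiple of the SP world size so
    #    every rank receives a slice of identical length.
    world = dist.get_world_size(sp_group)
    rank = dist.get_rank(sp_group)
    pad = (-packed.numel()) % world
    if pad:
        packed = torch.nn.functional.pad(packed, (0, pad))

    # 3) This rank's local shard of the sequence dimension.
    shard = packed.chunk(world)[rank]
    return shard, cu_seqlens, indices  # indices allow re-padding the output
```

After the backbone runs on the local shard, an all-gather along the SP group followed by a scatter back through `indices` restores the padded layout for the loss.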

lilin3 added 3 commits July 30, 2025 16:35
- Remove data printing functions from SFT and DPO trainers for better performance
- Replace 360-example-vl.sh with separate SFT and DPO training scripts
- Add SFT visual-language demo dataset (data/sft-vl-demo/)
- Update dataset configuration to support new data structure
… code style

- Add a multimodal_forwards module to centrally manage multimodal model forward logic (see the sketch after this list)
- Extract and optimize the forward implementations for Qwen2 VL and Qwen2.5 VL
- Improve the structure of the sequence_parallel code
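
For context, here is a rough sketch of what such a centralized forward registry can look like. Everything below is an assumption for illustration (`self.visual`, `self.language_model`, and `config.image_token_id` roughly mirror the Hugging Face Qwen2-VL layout; the PR's actual `multimodal_forwards` module may differ):

```python
from typing import Callable, Dict

import torch

# Registry mapping a model type to its sequence-parallel-aware forward.
MULTIMODAL_FORWARDS: Dict[str, Callable] = {}

def register_forward(model_type: str) -> Callable:
    """Decorator: register a patched forward for one VLM family."""
    def wrapper(fn: Callable) -> Callable:
        MULTIMODAL_FORWARDS[model_type] = fn
        return fn
    return wrapper

@register_forward("qwen2_vl")
def qwen2_vl_sp_forward(self, input_ids, pixel_values=None, **kwargs):
    """Splice vision embeddings into the text stream at the image
    placeholder positions, then run the (sharded) language model."""
    inputs_embeds = self.get_input_embeddings()(input_ids)
    if pixel_values is not None:
        image_embeds = self.visual(pixel_values)              # vision tower (assumed attr)
        image_mask = input_ids == self.config.image_token_id  # assumed config field
        inputs_embeds[image_mask] = image_embeds.to(inputs_embeds.dtype)
    return self.language_model(inputs_embeds=inputs_embeds, **kwargs)

def patch_forward(model: torch.nn.Module, model_type: str) -> None:
    """Swap in the registered forward; text-only models are left untouched."""
    fn = MULTIMODAL_FORWARDS.get(model_type)
    if fn is not None:
        model.forward = fn.__get__(model)
```

Keeping the Qwen2 VL and Qwen2.5 VL forwards in one module means the SP sharding and padding-removal hooks live in one place instead of being duplicated per trainer.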
@HaoshengZou HaoshengZou merged commit 5f64acf into Qihoo360:sp Oct 8, 2025
@HaoshengZou HaoshengZou changed the title Sp vl SP on VLMs Oct 8, 2025