[Cherry-Pick][Optimization] Enable text-only deployment for multimodal models #7234

Merged
Jiang-Jia-Jun merged 1 commit into PaddlePaddle:release/2.5 from
K11OntheBoat:R25_pick_mmFix
Apr 8, 2026

Conversation

@K11OntheBoat
Collaborator

Motivation

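When a multimodal model is deployed with the --deploy-modality 'text' switch enabled, the service gets a clean text-only runtime, with no leftover multimodal components competing for service resources or slowing inference. Benefit: with this switch, the xx multimodal model reaches 2.5x the QPS on a text-only benchmark.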

Modifications

enable_mm indicates that the model has multimodal capability. enable_mm_runtime indicates whether the multimodal runtime is active; enable_mm_runtime=false means a text-only runtime.
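
The relationship between the two flags could be sketched roughly as follows. This is an illustration only: `DeployConfig` is a hypothetical class rather than FastDeploy's actual config object, and the `"mixed"` default is an assumed placeholder. Only the names `enable_mm`, `enable_mm_runtime`, and the `'text'` modality value come from this PR.

```python
from dataclasses import dataclass


@dataclass
class DeployConfig:
    """Hypothetical stand-in for the real config object (illustration only)."""

    enable_mm: bool                 # the model itself is multimodal-capable
    deploy_modality: str = "mixed"  # assumed default; "text" is the switch from this PR

    @property
    def enable_mm_runtime(self) -> bool:
        # The multimodal runtime is active only when the model supports it
        # AND the deployment was not restricted to text.
        return self.enable_mm and self.deploy_modality != "text"


text_only = DeployConfig(enable_mm=True, deploy_modality="text")
full = DeployConfig(enable_mm=True)
print(text_only.enable_mm, text_only.enable_mm_runtime)  # True False
print(full.enable_mm, full.enable_mm_runtime)            # True True
```

Keeping `enable_mm` unchanged means code that asks "what kind of model is this?" still gets the right answer, while code that allocates multimodal resources can consult `enable_mm_runtime` instead.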

Usage or Command

Start the multimodal model service with the --deploy-modality 'text' switch.

Accuracy Tests

On the base model, text-only requests produce identical input tokens and output tokens with --deploy-modality 'text' turned on and turned off.
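
One way such a token-level comparison might be automated is sketched below. The helper `first_mismatch` and the token IDs are made up for illustration; the PR does not describe how the comparison was actually run.

```python
from typing import Optional, Sequence


def first_mismatch(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Return the index of the first differing token, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One sequence is a strict prefix of the other: they differ at the shorter length.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Made-up token IDs standing in for one request's output under each mode.
with_flag = [101, 2023, 2003, 102]
without_flag = [101, 2023, 2003, 102]
assert first_mismatch(with_flag, without_flag) is None
```

Reporting the first differing position, rather than a plain equality check, makes a divergence easier to locate if the two modes ever disagree.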

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot bot commented Apr 8, 2026

Thanks for your contribution!

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


liuruian seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@paddle-bot paddle-bot bot added the contributor External developers label Apr 8, 2026

@fastdeploy-bot fastdeploy-bot left a comment

🤖 AI Code Review | 2026-04-08 12:17 CST

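📋 Review Summary

PR overview: Adds a text-only deployment mode for multimodal models. The --deploy-modality 'text' switch yields a clean text-only runtime and improves QPS by roughly 2.5x.

Scope of changes: config, engine, worker, attention backend

Impact tags: [Optimization] [Engine] [Models]

Issues

| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | fastdeploy/worker/input_batch.py:831 | Duplicated tensor initialization in ProposerInputBatch |
| ❓ Question | fastdeploy/config.py:1941 | Design choice of modifying model_config attributes directly |

Overall assessment

The PR's core logic is correct. Introducing the enable_mm_runtime and enable_rope_3d_runtime properties cleanly separates "whether the model supports multimodality" from "whether the multimodal runtime is enabled", and the test files update their mock configs correctly. Recommend removing the duplicated initialization code in ProposerInputBatch.init_share_inputs.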

-1,
dtype="int32",
)
self.attn_mask_offsets = paddle.full(

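🟡 Suggestion: The code contains duplicated tensor initialization. attn_mask_offsets, attn_mask_offsets_full, and attn_mask_offsets_decoder are each initialized twice in the same method (lines 817-821 and lines 831-839). The arguments are identical, so behavior is unaffected, but the repeat wastes memory allocation and compute.

Recommend deleting the duplicated initialization at lines 831-839.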

logger.info(
"Deploy modality is text; forcing the multimodal-capable model onto the 2D RoPE runtime path."
)
setattr(self.model_config, "rope_3d", False)

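❓ Question: This directly modifies the rope_3d and use_3d_rope attributes of model_config. It is not a bug, and the code logs the change, but mutating the model config object in place may not be best practice.

A better approach would be for the enable_rope_3d_runtime property to check deploy_modality directly instead of modifying model_config. That leaves the original config object untouched and makes the logic clearer.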
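
A minimal sketch of that alternative, under stated assumptions: `RuntimeConfig` is a hypothetical wrapper rather than the project's actual config class, and the model config is mocked with `SimpleNamespace`. Only the names `enable_rope_3d_runtime`, `deploy_modality`, `model_config`, and `rope_3d` come from the review.

```python
from types import SimpleNamespace


class RuntimeConfig:
    """Hypothetical wrapper for illustration; not FastDeploy's actual config class."""

    def __init__(self, model_config, deploy_modality: str):
        self.model_config = model_config
        self.deploy_modality = deploy_modality

    @property
    def enable_rope_3d_runtime(self) -> bool:
        # Read the model's declared capability without writing to it.
        model_uses_3d = bool(getattr(self.model_config, "rope_3d", False))
        # A text-only deployment takes the 2D RoPE path regardless.
        return model_uses_3d and self.deploy_modality != "text"


model_config = SimpleNamespace(rope_3d=True)  # mocked model config
cfg = RuntimeConfig(model_config, deploy_modality="text")
print(cfg.enable_rope_3d_runtime)  # False
print(model_config.rope_3d)        # True: the original config is untouched
```

Because the decision is computed on read, the model's own declaration stays available to any code that needs it, and there is no ordering dependency on when the override runs.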

@codecov-commenter

Codecov Report

❌ Patch coverage is 60.46512% with 17 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/2.5@2d6fa35). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| fastdeploy/config.py | 57.14% | 4 Missing and 2 partials ⚠️ |
| fastdeploy/worker/input_batch.py | 50.00% | 5 Missing and 1 partial ⚠️ |
| fastdeploy/engine/common_engine.py | 0.00% | 1 Missing and 2 partials ⚠️ |
| fastdeploy/engine/async_llm.py | 0.00% | 0 Missing and 1 partial ⚠️ |
| ...ecutor/layers/attention/flash_mask_attn_backend.py | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@              Coverage Diff               @@
##             release/2.5    #7234   +/-   ##
==============================================
  Coverage               ?   69.48%           
==============================================
  Files                  ?      390           
  Lines                  ?    54382           
  Branches               ?     8574           
==============================================
  Hits                   ?    37788           
  Misses                 ?    13866           
  Partials               ?     2728           
Flag Coverage Δ
GPU 69.48% <60.46%> (?)

@Jiang-Jia-Jun Jiang-Jia-Jun merged commit 46ad25d into PaddlePaddle:release/2.5 Apr 8, 2026
30 of 37 checks passed
Labels

contributor External developers

7 participants