
[Cherry-Pick][Refactor] Replace --skip-mm-profiling with --deploy-modality text (#7048) #7068

Closed
EmmonsCurse wants to merge 1 commit into PaddlePaddle:release/2.5 from EmmonsCurse:cherry-pick/7048/release/2.5


@EmmonsCurse
Collaborator

Cherry-pick of #7048 (authored by @kevincheng2) to release/2.5.

Dev PR: #7048


Motivation

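The --skip-mm-profiling flag overlaps semantically with the existing --deploy-modality flag:
when deploying in text-only mode (--deploy-modality text), there is no need to reserve GPU memory for multimodal tokens in the first place.
A separate flag adds configuration complexity; reusing deploy_modality is more intuitive and consistent.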

Modifications

  • fastdeploy/engine/args_utils.py: remove the EngineArgs.skip_mm_profiling field and the
    --skip-mm-profiling launch flag
  • fastdeploy/config.py: remove self.skip_mm_profiling = False from ModelConfig.__init__;
    in FDConfig.get_max_chunk_tokens, change the condition to
    self.deploy_modality != DeployModality.TEXT,
    so that when deploy_modality is text the method returns max_num_batched_tokens directly and skips adding the mm token count
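
The condition change described above can be sketched roughly as follows. This is a minimal illustration and not the actual FastDeploy code: `ChunkTokenConfig`, `enable_mm`, `mm_max_tokens`, and the `MIXED` enum member are hypothetical stand-ins, and the real `FDConfig.get_max_chunk_tokens` may compute the multimodal token budget differently.

```python
from dataclasses import dataclass
from enum import Enum


class DeployModality(Enum):
    # Hypothetical stand-in for FastDeploy's DeployModality enum; only the
    # members needed for this illustration are modelled.
    MIXED = "mixed"
    TEXT = "text"


@dataclass
class ChunkTokenConfig:
    # Hypothetical, simplified stand-in for FDConfig.
    max_num_batched_tokens: int
    enable_mm: bool      # assumed: whether the loaded model is multimodal
    mm_max_tokens: int   # assumed: extra token budget reserved for mm inputs
    deploy_modality: DeployModality = DeployModality.MIXED

    def get_max_chunk_tokens(self) -> int:
        num_tokens = self.max_num_batched_tokens
        # Add the multimodal budget only when the deployment is not text-only.
        if self.enable_mm and self.deploy_modality != DeployModality.TEXT:
            num_tokens += self.mm_max_tokens
        return num_tokens
```

With these assumed numbers, a multimodal model deployed with the default modality profiles with `max_num_batched_tokens + mm_max_tokens`, while the same model deployed with `DeployModality.TEXT` profiles with `max_num_batched_tokens` only.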

Usage or Command

# Deploy in text mode, skipping the mm token profiling overhead (replaces the former --skip-mm-profiling)
python -m fastdeploy.entrypoints.openai.api_server \
--deploy-modality text \
--model /path/to/model \
...

Checklist

  • Add at least a tag in the PR title.
  • Format your code, run pre-commit before commit.
  • Add unit tests. This change is a flag refactor with logically equivalent behavior; the existing config unit tests cover it.

…addlePaddle#7048)

* [Feature] Support --skip-mm-profiling to skip multimodal token overhead in profiling

## Motivation

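When deploying multimodal models (such as Qwen2.5-VL or ERNIE4.5-VL), `get_max_chunk_tokens`
adds an mm token count on top of the base token count, so that GPU memory is reserved during the profiling stage.

In some scenarios (for example, when the image token count is known to be small, or when saving GPU memory is a priority), users want to skip
this extra multimodal token overhead and profile with the text token count only.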

## Modifications

- `fastdeploy/engine/args_utils.py`: add a `skip_mm_profiling: bool = False` field to `EngineArgs`
  and a `--skip-mm-profiling` launch flag to the parser
- `fastdeploy/config.py`: add `self.skip_mm_profiling = False` to `ModelConfig.__init__`;
  add a `not self.model_config.skip_mm_profiling` check in `FDConfig.get_max_chunk_tokens`,
  so that when the flag is enabled the mm token addition is skipped and the base `num_tokens` is returned directly
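
As a rough sketch of how such a boolean flag could travel from the command line into the token calculation: `EngineArgsSketch`, `build_parser`, and the standalone `get_max_chunk_tokens` function below are hypothetical simplifications built on the standard `argparse` module, not FastDeploy's actual classes or parser.

```python
import argparse
from dataclasses import dataclass
from typing import List


@dataclass
class EngineArgsSketch:
    # Hypothetical stand-in for EngineArgs; only the new field is modelled.
    skip_mm_profiling: bool = False


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # store_true makes the flag default to False and become True when present.
    parser.add_argument(
        "--skip-mm-profiling",
        action="store_true",
        help="Skip the extra multimodal token overhead during profiling.",
    )
    return parser


def parse_engine_args(argv: List[str]) -> EngineArgsSketch:
    namespace = build_parser().parse_args(argv)
    return EngineArgsSketch(skip_mm_profiling=namespace.skip_mm_profiling)


def get_max_chunk_tokens(num_tokens: int, mm_tokens: int, skip_mm_profiling: bool) -> int:
    # Assumed shape of the check: add the mm budget unless the flag is set.
    if not skip_mm_profiling:
        num_tokens += mm_tokens
    return num_tokens
```

Note that `argparse` maps `--skip-mm-profiling` to the attribute name `skip_mm_profiling` automatically, which is why the dataclass field and the flag line up without an explicit `dest`.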

## Usage or Command

Add the flag when starting the service:
```bash
--skip-mm-profiling
```

## Checklist

- [x] Add at least a tag in the PR title.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. This feature only passes a configuration flag through; the logic is simple and the existing config unit tests cover it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [Refactor] Replace skip_mm_profiling with deploy_modality=text to skip mm profiling

## Motivation

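The original `--skip-mm-profiling` flag overlaps semantically with the existing `deploy_modality` parameter:
when deploying in text-only mode (`deploy_modality=text`), there is no need to reserve GPU memory for multimodal tokens in the first place.
A separate flag adds configuration complexity; reusing `deploy_modality` is more intuitive and consistent.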

## Modifications

- `fastdeploy/engine/args_utils.py`: remove the `EngineArgs.skip_mm_profiling` field and the
  `--skip-mm-profiling` launch flag
- `fastdeploy/config.py`: remove `self.skip_mm_profiling = False` from `ModelConfig.__init__`;
  in `FDConfig.get_max_chunk_tokens`, change the condition to
  `self.deploy_modality != DeployModality.TEXT`,
  so that when `deploy_modality` is text the method returns `max_num_batched_tokens` directly and skips adding the mm token count

## Usage or Command

```bash
# Deploy in text mode, skipping the mm token profiling overhead (replaces the former --skip-mm-profiling)
python -m fastdeploy.entrypoints.openai.api_server \
  --deploy-modality text \
  --model /path/to/model \
  ...
```
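
For launch scripts that still pass the removed flag, a small argument-rewriting helper along these lines could ease the transition. This is only a sketch: `migrate_args` is a hypothetical helper that is not part of FastDeploy, and it assumes that text-only deployment is an acceptable substitute wherever `--skip-mm-profiling` was used, which may not hold if the service still needs to accept multimodal requests.

```python
from typing import List

OLD_FLAG = "--skip-mm-profiling"
NEW_FLAG = "--deploy-modality"


def migrate_args(argv: List[str]) -> List[str]:
    """Rewrite the removed --skip-mm-profiling flag to --deploy-modality text."""
    migrated = [arg for arg in argv if arg != OLD_FLAG]
    had_old_flag = len(migrated) != len(argv)
    # Respect an explicit modality if the caller already set one.
    has_modality = any(
        arg == NEW_FLAG or arg.startswith(NEW_FLAG + "=") for arg in migrated
    )
    if had_old_flag and not has_modality:
        migrated += [NEW_FLAG, "text"]
    return migrated
```

Arguments that never used the old flag pass through unchanged, and an explicit `--deploy-modality` value is left as the caller wrote it.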

## Checklist

- [x] Add at least a tag in the PR title.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. This change is a flag refactor with logically equivalent behavior; the existing config unit tests cover it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
@paddle-bot

paddle-bot bot commented Mar 30, 2026

Thanks for your contribution!

