
[Feature] Added the /v1/abort_requests endpoint#6992

Open

qwes5s5 wants to merge 2 commits into PaddlePaddle:develop from qwes5s5:abort_requests

Conversation

@qwes5s5
Collaborator

@qwes5s5 qwes5s5 commented Mar 24, 2026

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)


Currently, the logic for interrupting requests and stopping inference can only be triggered by the client disconnecting. Since there is no interface for active triggering, this new endpoint is required to encapsulate and expose the existing internal capabilities.

Modifications

The /v1/abort_requests endpoint has been added to both the api_server and the router.

Usage or Command

The /v1/abort_requests endpoint accepts two parameters: abort_all and req_ids. At least one of these must be provided.

Abort all current requests:

curl -X POST http://0.0.0.0:8180/v1/abort_requests \
  -H "Content-Type: application/json" \
  -d '{"abort_all": true}'

Abort specific requests:

curl -X POST http://0.0.0.0:8180/v1/abort_requests \
  -H "Content-Type: application/json" \
  -d '{"req_ids": ["chatcmpl-abc123", "chatcmpl-def456"]}'

Output Example:

{
    "request_id": "control-...",
    "status": "success",
    "error_message": null,
    "result": {
        "aborted": [
            {"request_id": "chatcmpl-abc123_0", "output_token_count": 15}
        ],
        "not_found": ["chatcmpl-notexist"]
    }
}
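For programmatic use, the curl calls above can be wrapped in a small client helper. This is a minimal sketch assuming the request and response shapes shown in the examples; the helper names `build_abort_payload` and `abort_requests` are illustrative, not part of the PR.

```python
import json
import urllib.request


def build_abort_payload(abort_all=None, req_ids=None):
    """Build the JSON body for /v1/abort_requests.

    Mirrors the endpoint's documented contract: at least one of
    abort_all and req_ids must be provided.
    """
    if not abort_all and not req_ids:
        raise ValueError("Provide at least one of abort_all or req_ids")
    payload = {}
    if abort_all:
        payload["abort_all"] = True
    if req_ids:
        payload["req_ids"] = list(req_ids)
    return payload


def abort_requests(base_url, abort_all=None, req_ids=None):
    """POST the payload to the /v1/abort_requests endpoint."""
    body = json.dumps(build_abort_payload(abort_all, req_ids)).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/abort_requests",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For example, `abort_requests("http://0.0.0.0:8180", req_ids=["chatcmpl-abc123"])` would send the same request as the second curl example above.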

Accuracy Tests

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot bot commented Mar 24, 2026

Thanks for your contribution!

@codecov-commenter

codecov-commenter commented Mar 24, 2026

Codecov Report

❌ Patch coverage is 9.09091% with 110 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@5c60e2f). Learn more about missing BASE report.

Files with missing lines Patch % Lines
fastdeploy/engine/common_engine.py 2.73% 71 Missing ⚠️
fastdeploy/router/router.py 14.81% 23 Missing ⚠️
fastdeploy/entrypoints/openai/api_server.py 20.00% 8 Missing ⚠️
fastdeploy/entrypoints/openai/serving_chat.py 0.00% 2 Missing and 2 partials ⚠️
...astdeploy/entrypoints/openai/serving_completion.py 0.00% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #6992   +/-   ##
==========================================
  Coverage           ?   73.49%           
==========================================
  Files              ?      402           
  Lines              ?    56569           
  Branches           ?     8935           
==========================================
  Hits               ?    41573           
  Misses             ?    12060           
  Partials           ?     2936           
Flag Coverage Δ
GPU 73.49% <9.09%> (?)


@qwes5s5 qwes5s5 requested a review from Jiang-Jia-Jun March 24, 2026 12:50
Contributor

Copilot AI left a comment


Pull request overview

This PR adds the ability to actively abort inference requests to FastDeploy's OpenAI-compatible serving path. The new /v1/abort_requests control endpoint lets clients explicitly terminate running or queued requests, instead of relying solely on client disconnection to trigger the abort.

Changes:

  • Added the /v1/abort_requests route to both api_server and router, which dispatches a control method to the engine.
  • Added an abort_requests control method in the engine that triggers the scheduler/resource_manager to abort requests and backfill partial results.
  • In chat/completion response handling, attempts to mark the finish_reason of aborted requests as abort.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
fastdeploy/router/router.py Added the router-side /v1/abort_requests, which forwards to each instance and aggregates the results
fastdeploy/entrypoints/openai/api_server.py Added the api_server-side /v1/abort_requests, which wraps the call as a ControlRequest and dispatches it to the engine
fastdeploy/engine/common_engine.py Added the engine control method _control_abort_requests and the wait-for-cleanup logic
fastdeploy/engine/sched/resource_manager_v1.py Updates metrics after abort reclamation; adds a query method for the aborting set
fastdeploy/entrypoints/openai/serving_chat.py Detects the "Aborted" error and sets finish_reason
fastdeploy/entrypoints/openai/serving_completion.py Detects the "Aborted" error and sets finish_reason
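The router-side change forwards the abort call to each instance and aggregates the results. What that aggregation might look like can be sketched with the response shape from the output example above; the `merge_abort_results` helper below is illustrative, not the PR's actual implementation.

```python
def merge_abort_results(instance_results):
    """Merge per-instance /v1/abort_requests results into one response.

    instance_results: iterable of dicts shaped like the endpoint's
    "result" field, e.g. {"aborted": [...], "not_found": [...]}.
    A request id counts as not_found only if no instance aborted it.
    """
    aborted = []
    aborted_ids = set()
    not_found = set()
    for result in instance_results:
        for item in result.get("aborted", []):
            aborted.append(item)
            aborted_ids.add(item["request_id"])
        not_found.update(result.get("not_found", []))
    # Drop ids that some other instance did manage to abort.
    not_found -= aborted_ids
    return {"aborted": aborted, "not_found": sorted(not_found)}
```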
Comments suppressed due to low confidence (1)

fastdeploy/entrypoints/openai/serving_chat.py:817

  • This sets finish_reason to "abort", but the Literal for ChatCompletionResponseChoice.finish_reason in fastdeploy/entrypoints/openai/protocol.py does not currently include "abort". The value will trigger a Pydantic validation error when ChatCompletionResponseChoice is constructed, causing the request to fail. Consider extending the allowed finish_reason values in protocol.py accordingly, or reusing an existing legal finish_reason and distinguishing aborts via error_msg or a custom field.
        finish_reason = "stop"
        if previous_num_tokens != max_tokens:
            finish_reason = "stop"
            if output.get("tool_calls"):
                finish_reason = "tool_calls"
        else:
            finish_reason = "length"
        if data.get("error_msg", None) is not None and "Recover" in data["error_msg"]:
            finish_reason = "recover_stop"

        if data.get("error_msg", None) is not None and "Aborted" in data["error_msg"]:
            finish_reason = "abort"
        return ChatCompletionResponseChoice(
            index=idx,
            message=message,
            logprobs=logprobs_full_res,
            draft_logprobs=draft_logprobs_full_res,
            prompt_logprobs=prompt_logprobs_full_res,
            finish_reason=finish_reason,
            speculate_metrics=speculate_metrics,
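The reviewer's concern can be checked without running the server: a Literal-typed field rejects any value outside its declared set at model construction. A minimal stdlib sketch of the suggested fix, assuming the value set visible in the snippet above (protocol.py's actual Literal may differ):

```python
from typing import Literal, Optional, get_args

# Illustrative: the extended set of allowed values, adding "abort"
# alongside the values used in the snippet above.
FinishReason = Literal["stop", "length", "tool_calls", "recover_stop", "abort"]


def validate_finish_reason(value: Optional[str]) -> Optional[str]:
    """Mimic the membership check Pydantic applies to a Literal field."""
    if value is not None and value not in get_args(FinishReason):
        raise ValueError(f"invalid finish_reason: {value!r}")
    return value
```

With the unextended Literal, the same check would raise for "abort", which is the failure mode the review describes.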

@qwes5s5
Collaborator Author

qwes5s5 commented Mar 30, 2026

/re-run ci_xpu

Jiang-Jia-Jun
Jiang-Jia-Jun previously approved these changes Mar 30, 2026
