fix: fix return double first token #3241
Conversation
/bot run --add-multi-gpu-test
PR_Github #1022 [ run ] triggered by Bot
PR_Github #1022 [ run ] completed with state
Force-pushed from eb8ebe0 to 399bc94 (compare)
/bot run
PR_Github #1336 [ run ] triggered by Bot
Force-pushed from df5c8c7 to 8fdb7a8 (compare)
/bot run
PR_Github #1340 [ run ] triggered by Bot
PR_Github #1336 [ run ] completed with state
PR_Github #1340 [ run ] completed with state
if request.state == LlmRequestState.GENERATION_IN_PROGRESS:
    if request.py_decoding_iter == 1:
        new_active_requests.append(request)
        continue
Can we modify the condition here to return a nullptr when the request is generation only but has created only one token? I think that might be cleaner.
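A rough sketch of what that could look like on the Python side (the helper and attribute names below are illustrative stand-ins, not the actual API): the check moves into the response-creation path and simply suppresses the response instead of filtering requests in the scheduler loop.

# Sketch only: suppress the response for a generation-only request that has
# produced just its first token, since the context phase already returned it.
# is_generation_only and create_response are hypothetical names.
def maybe_create_response(request):
    if request.is_generation_only and request.py_decoding_iter == 1:
        return None  # analogous to returning nullptr on the C++ side
    return request.create_response()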
Iman, I also think this would be cleaner, but we might need to change the C++ disaggregated examples so that we extract tokens from the context response. It doesn't look like we're doing that right now:
https://github.com/NVIDIA/TensorRT-LLM/blob/main/benchmarks/cpp/disaggServerBenchmark.cpp
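For context, a rough Python illustration of what "extracting tokens from the context response" could mean for a disaggregated client, assuming the context response carries the first generated token (field names are assumptions, not the benchmark's actual C++ API):

# Illustrative only: if the generation response stopped repeating the first
# token, the client would prepend it from the context response instead.
def merge_disagg_tokens(ctx_response, gen_response, beam=0):
    first_token = ctx_response.output_token_ids[beam][-1]
    return [first_token] + list(gen_response.output_token_ids[beam])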
Thanks for the suggestion. I will modify this.
Force-pushed from f1bd653 to ed97ba3 (compare)
/bot run
PR_Github #1405 [ run ] triggered by Bot
PR_Github #1405 [ run ] completed with state
Force-pushed from ed97ba3 to 7331397 (compare)
/bot run --add-multi-gpu-test
PR_Github #1409 [ run ] triggered by Bot
PR_Github #1409 [ run ] completed with state
/bot run --add-multi-gpu-test
PR_Github #1544 [ run ] triggered by Bot
PR_Github #1544 [ run ] completed with state
/bot run --add-multi-gpu-test
PR_Github #1591 [ run ] triggered by Bot
PR_Github #1591 [ run ] completed with state
/bot run --add-multi-gpu-test
Signed-off-by: Shunkang <[email protected]>
Add double first token check
Signed-off-by: Shunkang <[email protected]>
Adapt for py_decoding_iter
Signed-off-by: Shunkang <[email protected]>
Roll back CI change
Signed-off-by: Shunkang <[email protected]>
Add check
Signed-off-by: Shunkang <[email protected]>
Signed-off-by: Shunkang <[email protected]>
Force-pushed from cc12e50 to 1df2674 (compare)
/bot run --add-multi-gpu-test
PR_Github #1611 [ run ] triggered by Bot
PR_Github #1612 [ run ] triggered by Bot
PR_Github #1611 [ run ] completed with state
PR_Github #1612 [ run ] completed with state
/bot run --add-multi-gpu-test
PR_Github #1613 [ run ] triggered by Bot
PR_Github #1613 [ run ] completed with state
if self.disaggregated_params is not None and \
        len(response_tensors.output_token_ids[src_idx]) == 2:
    output._last_token_ids_len = 1
I don't think we should rely on the len of output_token_ids, since for spec decoding we could have 2 tokens even after the first gen token. Can you have a look at #3427? I think it's a more general fix.
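A small illustration of why the length check is fragile under speculative decoding (the values are made up):

# One decoding step can accept several draft tokens, so a later response can
# also carry exactly 2 tokens; the len(...) == 2 heuristic then misfires.
tokens_in_later_response = [17, 42]  # two draft tokens accepted in one step
looks_like_first_gen_response = len(tokens_in_later_response) == 2  # True, wrongly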
Thank you. I also think that should be a good solution. I will close this PR.
In PD, the overlap and non-overlap schedulers behave differently. With the non-overlap scheduler, we always return the first two generated tokens together. With the overlap scheduler, the request might return a response before the second generated token has been computed. This MR fixes the change from #2986 for the overlap-scheduler case.
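A minimal sketch of the difference described above, using made-up token ids rather than the real response objects:

first_gen_token, second_gen_token = 101, 202  # made-up token ids

# Non-overlap scheduler: the first generation response is emitted only after
# the second token is computed, so both tokens arrive together.
non_overlap_first_response = [first_gen_token, second_gen_token]

# Overlap scheduler: the response can be emitted before the second token
# exists, so it carries only the first token, which the context phase has
# already returned (hence the "double first token" this PR addresses).
overlap_first_response = [first_gen_token]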