fix: fix return double first token #3241
Changes from all commits
```diff
@@ -179,7 +179,7 @@ def _handle_sequence(self, finish_reasons, response_tensors,
         # Skip output the first generated token in generation response
         # TODO: We should have a better way to handle this when enable
         # beam search with PD.
-        if not self.sampling_params.use_beam_search and \
+        if self.disaggregated_params is not None and \
                 len(response_tensors.output_token_ids[src_idx]) == 2:
             output._last_token_ids_len = 1
```
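The hunk above replaces the beam-search check with a check on `disaggregated_params`, while still keying the skip off the length of the returned token-id list. A minimal sketch of that skip logic, using hypothetical stand-in classes (`Output`, `DisaggregatedParams`, and `handle_sequence` here are simplified placeholders, not the project's actual types):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DisaggregatedParams:
    """Placeholder for PD (prefill/decode) disaggregation settings."""
    ctx_request_id: Optional[int] = None


@dataclass
class Output:
    token_ids: List[int] = field(default_factory=list)
    _last_token_ids_len: Optional[int] = None


def handle_sequence(output: Output, all_token_ids: List[int],
                    disaggregated_params: Optional[DisaggregatedParams]) -> Output:
    # In disaggregated serving, the context (prefill) server has already
    # returned the first generated token, so the generation server should
    # not emit it a second time. The PR detects this case by checking
    # len(...) == 2 -- the fragile part the reviewer objects to.
    if disaggregated_params is not None and len(all_token_ids) == 2:
        output._last_token_ids_len = 1  # keep only the newest token
    keep = output._last_token_ids_len or len(all_token_ids)
    output.token_ids = all_token_ids[-keep:]
    return output


out = handle_sequence(Output(), [101, 102], DisaggregatedParams(ctx_request_id=7))
print(out.token_ids)  # [102] -- the duplicated first token 101 is dropped
```

With `disaggregated_params=None`, the condition never fires and both tokens are kept, which preserves the non-PD path.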
Comment on lines +182 to 184

Reviewer:

> I don't think we should rely on the len of …

Author:

> Thank you. I also think that should be a good solution. I will close this PR.
```diff
@@ -352,10 +352,12 @@ class GenerationResult(GenerationResultBase):
         executor (GenerationExecutor, optional): The executor that created this result. Defaults to None.
     '''

-    def __init__(self,
-                 generation_request: "GenerationRequest",
-                 background_error_handler: Optional[Callable] = None,
-                 executor: Optional["GenerationExecutor"] = None) -> None:
+    def __init__(
+            self,
+            generation_request: "GenerationRequest",
+            background_error_handler: Optional[Callable] = None,
+            executor: Optional["GenerationExecutor"] = None,
+            disaggregated_params: Optional[DisaggregatedParams] = None) -> None:
         super().__init__(
             generation_request.id,
             generation_request.sampling_params,
```
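The constructor change threads `disaggregated_params` through with a `None` default, so existing call sites keep working unchanged. A self-contained sketch of that pattern (the classes below are simplified stand-ins for the project's real `GenerationRequest`/`GenerationResult`, not their actual definitions):

```python
from typing import Callable, Optional


class DisaggregatedParams:
    """Stand-in for the PR's disaggregation parameter object."""


class GenerationRequest:
    def __init__(self, request_id: int, streaming: bool = False) -> None:
        self.id = request_id
        self.streaming = streaming


class GenerationResultBase:
    def __init__(self, request_id: int) -> None:
        self.id = request_id


class GenerationResult(GenerationResultBase):
    # Adding the new keyword with a None default keeps the constructor
    # backward compatible while letting the executor tag PD-routed requests.
    def __init__(
            self,
            generation_request: GenerationRequest,
            background_error_handler: Optional[Callable] = None,
            disaggregated_params: Optional[DisaggregatedParams] = None) -> None:
        super().__init__(generation_request.id)
        self._generation_request = generation_request
        self._streaming = generation_request.streaming
        self.disaggregated_params = disaggregated_params


# Old call sites keep working; new ones can pass the extra argument.
r_old = GenerationResult(GenerationRequest(1))
r_new = GenerationResult(GenerationRequest(2),
                         disaggregated_params=DisaggregatedParams())
print(r_old.disaggregated_params is None)      # True
print(r_new.disaggregated_params is not None)  # True
```

Storing the object on the result (rather than re-deriving it per token) is what lets the `_handle_sequence` check above test `self.disaggregated_params is not None`.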
```diff
@@ -364,6 +366,7 @@ def __init__(self,
         )
         self._generation_request = generation_request
         self._streaming = generation_request.streaming
+        self.disaggregated_params = disaggregated_params

         # for aborting the request
         self._executor: Optional[weakref.ReferenceType[
```