How to pass parameter in ensemble model? #71
Comments
I don't think I fully understand your question.
For ensemble, you should pass
Is there any guide for this, please? REQUEST_INPUT_LEN is an intermediate result.
I get your point. As far as I know, the tensorrt_llm backend cannot support this feature directly, because it would require changing the source code of batch_manager. You can try mapping the output of preprocess directly to the input of postprocess. I am not sure whether that is doable. You can ask in the tritonserver repo.
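To make the mapping idea concrete, here is a toy sketch in plain Python rather than Triton's config format. It only models the data flow: every step reads from and writes to a shared pool of named tensors, so a tensor produced by preprocessing can be mapped into the generation step and again into postprocessing. The `run_ensemble` helper, the step functions, the tokenizer, and every tensor name except REQUEST_INPUT_LEN are hypothetical, and this is not the Triton API.

```python
from typing import Callable, Dict, List, Tuple

Tensors = Dict[str, object]
# (model function, input_map, output_map); maps go model-side name -> pool name
Step = Tuple[Callable[[Tensors], Tensors], Dict[str, str], Dict[str, str]]


def run_ensemble(steps: List[Step], request: Tensors) -> Tensors:
    """Run the steps in order over a shared pool of named tensors."""
    pool: Tensors = dict(request)
    for model, input_map, output_map in steps:
        inputs = {name: pool[src] for name, src in input_map.items()}
        outputs = model(inputs)
        for name, dst in output_map.items():
            pool[dst] = outputs[name]
    return pool


def preprocess(t: Tensors) -> Tensors:
    # Hypothetical tokenizer: one token id per whitespace-separated word.
    ids = [len(word) for word in t["QUERY"].split()]
    return {"INPUT_ID": ids, "REQUEST_INPUT_LEN": len(ids)}


def generate(t: Tensors) -> Tensors:
    # Stand-in for the generation step: echoes the prompt, then adds tokens.
    return {"output_ids": t["input_ids"] + [101, 102]}


def postprocess(t: Tensors) -> Tensors:
    # Receives the prompt length that bypassed the generation step.
    return {"OUTPUT": t["TOKENS_BATCH"], "PROMPT_LEN": t["REQUEST_INPUT_LEN"]}


steps: List[Step] = [
    (preprocess, {"QUERY": "text_input"},
     {"INPUT_ID": "_INPUT_ID", "REQUEST_INPUT_LEN": "_REQUEST_INPUT_LEN"}),
    (generate, {"input_ids": "_INPUT_ID"}, {"output_ids": "_TOKENS_BATCH"}),
    # The same intermediate tensor is mapped a second time here.
    (postprocess,
     {"TOKENS_BATCH": "_TOKENS_BATCH",
      "REQUEST_INPUT_LEN": "_REQUEST_INPUT_LEN"},
     {"OUTPUT": "text_output", "PROMPT_LEN": "prompt_len"}),
]

result = run_ensemble(steps, {"text_input": "how are you"})
```

The point of the sketch is the third step: the generation step never sees or forwards the length, yet postprocessing still receives it because the intermediate tensor stays in the pool. Whether a real ensemble configuration accepts the equivalent mapping is worth confirming in the tritonserver repo.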
Okay, thanks.
Is there any way for the ensemble model to know the request token length? I want to cut off the original tokens.
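For reference, the trimming itself is simple once the length reaches postprocessing. A minimal sketch, assuming each output sequence starts with the prompt tokens and that one request input length is available per sequence; `strip_prompt_tokens` is a hypothetical helper, not part of tensorrtllm_backend.

```python
from typing import List, Sequence


def strip_prompt_tokens(output_ids: Sequence[Sequence[int]],
                        request_input_lens: Sequence[int]) -> List[List[int]]:
    """Drop the first request_input_lens[i] tokens from sequence i."""
    if len(output_ids) != len(request_input_lens):
        raise ValueError("one input length is required per sequence")
    trimmed = []
    for ids, n in zip(output_ids, request_input_lens):
        if n < 0 or n > len(ids):
            raise ValueError(
                f"input length {n} out of range for {len(ids)} tokens")
        # Assumption: the first n tokens are the echoed prompt.
        trimmed.append(list(ids[n:]))
    return trimmed
```

For example, `strip_prompt_tokens([[1, 2, 3, 4, 5]], [2])` keeps only the tokens after the first two.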
Maybe you can refer to #95.
@GooVincent Hi,
Here is a demo; you can pass REQUEST_INPUT_LEN out in the middle,
Since the issue is solved, I am closing this bug. If you still have issues or questions, feel free to ask and we will reopen it.
I don't get your point about "set model_transaction_policy to be True".
Could you try the scripts here https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/llama.md#end-to-end-workflow-to-run-llama to run streaming?
The normal procedure for tensorrtllm_backend is preprocessing -> (tensorrt_llm) process -> postprocessing. How can I pass a custom parameter from the request, such as the request token length, through this pipeline?
As I understand it, the tensorrt_llm backend runs the inference itself, so adding extra input and output parameters to it won't work. That raises the question: in the ensemble pipeline, how do I pass a parameter from the preprocess module to the postprocess module?
Is there any way to solve this?