refactor: add support for outputs key in inference request inputs #3405

Open

wants to merge 1 commit into master
Conversation

gabrielmscampos

Description

According to the KServe documentation on the Open Inference Protocol, the server can optionally accept an outputs key in the inference request body that specifies which output tensors the inference response should contain.

For a model that takes an NxM tensor as input and returns multiple output values, the current implementation only returns as many outputs as there were inputs in the original request.

Example:

Consider the following inference request body:

{
  "id" : "42",
  "inputs" : [
    {
      "name" : "input-0",
      "shape" : [ 2, 2 ],
      "datatype" : "UINT32",
      "data" : [ 1, 2, 3, 4 ]
    }
  ]
}

If the deployed model outputs multiple values, such as:

with torch.inference_mode():
    # the forward pass returns a tuple of three tensors
    output_1, output_2, output_3 = self.model(data)

KServev2Envelope will raise an IndexError in _batch_to_json when the postprocess function returns more than one value:

def postprocess(self, data):
    # data is the tuple of tensors returned by inference
    output_1, output_2 = data
    return [output_1, output_2]
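For illustration, a minimal sketch of the failure mode (a hypothetical simplification, not the actual TorchServe source; the function signature and variable names here are made up):

# Hypothetical simplification: the envelope builds one response entry per
# postprocess result, reusing the request's input metadata by index.
# With one input but two results, input_meta[1] raises IndexError.
def _batch_to_json(results, input_meta):
    entries = []
    for i, result in enumerate(results):
        meta = input_meta[i]  # IndexError when len(results) > len(input_meta)
        entries.append({
            "name": meta["name"],
            "datatype": meta["datatype"],
            "shape": list(result.shape),
            "data": result.flatten().tolist(),
        })
    return entries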

This pull request introduces an optional outputs key in the inference request so that the user can specify multiple outputs. For example:

{
  "id" : "42",
  "inputs" : [
    {
      "name" : "input-0",
      "shape" : [ 2, 2 ],
      "datatype" : "UINT32",
      "data" : [ 1, 2, 3, 4 ]
    }
  ],
  "outputs":[
    {
      "name": "output-0",
    },
    {
      "name": "output-1",
    }
  ]
}
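When the outputs key is honored, each returned tensor can be named after the corresponding outputs entry. An illustrative response is sketched below (the model name, shapes, datatypes, and values are placeholders, not actual output of this model):

{
  "id" : "42",
  "model_name" : "my-model",
  "model_version" : "1",
  "outputs" : [
    {
      "name" : "output-0",
      "shape" : [ 2 ],
      "datatype" : "FP32",
      "data" : [ 0.1, 0.9 ]
    },
    {
      "name" : "output-1",
      "shape" : [ 2 ],
      "datatype" : "FP32",
      "data" : [ 0.4, 0.6 ]
    }
  ]
}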

These output names are used in _batch_to_json to format the response, producing as many output entries as were requested.

If no outputs key is sent, _batch_to_json falls back to the original behavior of formatting the response based on the number of inputs.
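A minimal sketch of this name-selection logic (simplified and hypothetical; the actual helper in the envelope may be structured differently):

# Prefer the names from the request's "outputs" key when present;
# otherwise fall back to the input names (the original behavior).
def _get_output_names(request):
    requested = request.get("outputs") or []
    if requested:
        return [entry["name"] for entry in requested]
    return [entry["name"] for entry in request["inputs"]]

With the example request above, this yields ["output-0", "output-1"]; without the outputs key it yields ["input-0"].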

Type of change

  • New feature (non-breaking change which adds functionality)

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Test A
    Logs for Test A

  • Test B
    Logs for Test B

Checklist:

  • [x] Did you have fun?
  • [ ] Have you added tests that prove your fix is effective or that this feature works?
  • [x] Has code been commented, particularly in hard-to-understand areas?
  • [ ] Have you made corresponding changes to the documentation?

gabrielmscampos added a commit to gabrielmscampos/ml_examples that referenced this pull request Mar 26, 2025
Although PyTorch supports the KServe v2 protocol, its implementation doesn't fully adhere to the v2 protocol, and models that output more than one array cause the KServev2Envelope to crash. More at: pytorch/serve#3405