refactor: add support for outputs key in inference request inputs #3405

Open

wants to merge 1 commit into master
Conversation

gabrielmscampos

Description

According to the KServe documentation on the Open Inference Protocol, the server can optionally accept an outputs key in the inference request body that specifies which output tensors the inference response should contain.

For a model that takes an NxM tensor as input and returns multiple output values, the current implementation only returns as many outputs as there were inputs in the original request.

Example:

Consider the following inference request body:

{
  "id" : "42",
  "inputs" : [
    {
      "name" : "input-0",
      "shape" : [ 2, 2 ],
      "datatype" : "UINT32",
      "data" : [ 1, 2, 3, 4 ]
    }
  ]
}

If the deployed model outputs multiple values, such as:

with torch.inference_mode():
    # the forward pass returns a tuple of three tensors
    output_1, output_2, output_3 = self.model(data)

KServev2Envelope will raise an IndexError in _batch_to_json when the postprocess function returns more than one value:

def postprocess(self, data):
    # data is the tuple of tensors returned by inference
    output_1, output_2 = data
    return [output_1, output_2]
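For illustration, a minimal sketch of the failure mode (a hypothetical simplification, not the actual TorchServe source; the function signature and variable names here are made up):

# Hypothetical simplification: the envelope builds one response entry per
# postprocess result, reusing the request's input metadata by index.
# With one input but two results, input_meta[1] raises IndexError.
def _batch_to_json(results, input_meta):
    entries = []
    for i, result in enumerate(results):
        meta = input_meta[i]  # IndexError when len(results) > len(input_meta)
        entries.append({
            "name": meta["name"],
            "datatype": meta["datatype"],
            "shape": list(result.shape),
            "data": result.flatten().tolist(),
        })
    return entries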

This pull request introduces an optional outputs key in the inference request so that the user can specify multiple outputs. For example:

{
  "id" : "42",
  "inputs" : [
    {
      "name" : "input-0",
      "shape" : [ 2, 2 ],
      "datatype" : "UINT32",
      "data" : [ 1, 2, 3, 4 ]
    }
  ],
  "outputs":[
    {
      "name": "output-0",
    },
    {
      "name": "output-1",
    }
  ]
}
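When the outputs key is honored, each returned tensor can be named after the corresponding outputs entry. An illustrative response is sketched below (the model name, shapes, datatypes, and values are placeholders, not actual output of this model):

{
  "id" : "42",
  "model_name" : "my-model",
  "model_version" : "1",
  "outputs" : [
    {
      "name" : "output-0",
      "shape" : [ 2 ],
      "datatype" : "FP32",
      "data" : [ 0.1, 0.9 ]
    },
    {
      "name" : "output-1",
      "shape" : [ 2 ],
      "datatype" : "FP32",
      "data" : [ 0.4, 0.6 ]
    }
  ]
}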

These output names are used in _batch_to_json to format the response, producing as many output entries as were requested.

If no outputs key is sent, _batch_to_json falls back to the original behavior of formatting the response based on the number of inputs.
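A minimal sketch of this name-selection logic (simplified and hypothetical; the actual helper in the envelope may be structured differently):

# Prefer the names from the request's "outputs" key when present;
# otherwise fall back to the input names (the original behavior).
def _get_output_names(request):
    requested = request.get("outputs") or []
    if requested:
        return [entry["name"] for entry in requested]
    return [entry["name"] for entry in request["inputs"]]

With the example request above, this yields ["output-0", "output-1"]; without the outputs key it yields ["input-0"].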

Type of change

  • New feature (non-breaking change which adds functionality)

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Test A
    Logs for Test A

  • Test B
    Logs for Test B

Checklist:

  • [x] Did you have fun?
  • [ ] Have you added tests that prove your fix is effective or that this feature works?
  • [x] Has code been commented, particularly in hard-to-understand areas?
  • [ ] Have you made corresponding changes to the documentation?

gabrielmscampos added a commit to gabrielmscampos/ml_examples that referenced this pull request Mar 26, 2025
Although PyTorch supports the KServe v2 protocol, its implementation doesn't fully adhere to the v2 protocol, and models that output more than one array cause the KServev2Envelope to crash. More at: pytorch/serve#3405