Commit: updating 0.9 to 0.10

Signed-off-by: agriffith50 <[email protected]>

alexagriffith committed Dec 21, 2022
1 parent 9979ea4 commit cb5484d
Showing 6 changed files with 72 additions and 82 deletions.
2 changes: 1 addition & 1 deletion docs/admin/serverless/kourier_networking/README.md
@@ -1,5 +1,5 @@
# Deploy InferenceService with Alternative Networking Layer
KServe v0.9 and prior versions create the top level `Istio Virtual Service` for routing to `InferenceService` components based on the virtual host or path based routing.
KServe creates the top level `Istio Virtual Service` for routing to `InferenceService` components based on virtual host or path based routing.
Now KServe provides an option to disable the top level virtual service, allowing you to configure other networking layers that Knative supports.
For example, [Kourier](https://developers.redhat.com/blog/2020/06/30/kourier-a-lightweight-knative-serving-ingress) is an alternative networking layer and
the following steps show how you can deploy KServe with `Kourier`.
2 changes: 1 addition & 1 deletion docs/get_started/README.md
@@ -19,6 +19,6 @@ The [Kubernetes CLI (`kubectl`)](https://kubernetes.io/docs/tasks/tools/install-
You can get started with a local deployment of KServe by using the _KServe Quick installation script on Kind_:

```bash
curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.9/hack/quick_install.sh" | bash
curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.10/hack/quick_install.sh" | bash
```

21 changes: 10 additions & 11 deletions docs/modelserving/data_plane/data_plane.md
@@ -8,8 +8,8 @@ By implementing this protocol both inference clients and servers will increase their utility and
portability by operating seamlessly on platforms that have standardized around this API. KServe's inference protocol is endorsed by NVIDIA
Triton Inference Server, TensorFlow Serving, and ONNX Runtime Server.

![Data Plane](../images/dataplane.jpg)
Note: Protocol V2 uses /infer instead of :predict
![Data Plane](../../images/dataplane.jpg)
<br> Note: Protocol V2 uses /infer instead of :predict

### Concepts
**Component**: Each endpoint is composed of multiple components: "predictor", "explainer", and "transformer". The only required component is the predictor, which is the core of the system. As KServe evolves, we plan to increase the number of supported components to enable use cases like Outlier Detection.
@@ -25,7 +25,7 @@ Note: Protocol V2 uses /infer instead of :predict

KServe supports two versions of its data plane, V1 and V2. The V1 protocol offers a standard prediction workflow with HTTP/REST. The second version of the data plane protocol addresses several issues found with V1, including performance and generality across a large number of model frameworks and servers. Protocol V2 expands the capabilities of V1 by adding gRPC APIs.

### Main changes between V1 & V2 dataplane
### Main changes

* V2 does not currently support the explain endpoint
* V2 added Server Readiness/Liveness/Metadata endpoints
@@ -39,23 +39,22 @@ Kserve supports two versions of its data plane, V1 and V2. V1 protocol offers a
| API | Verb | Path |
| ------------- | ------------- | ------------- |
| List Models | GET | /v1/models |
| Model Ready | GET | /v1/models/<model_name> |
| Predict | POST | /v1/models/<model_name>:predict |
| Explain | POST | /v1/models/<model_name>:explain |
| Model Ready | GET | /v1/models/\<model_name\> |
| Predict | POST | /v1/models/\<model_name\>:predict |
| Explain | POST | /v1/models/\<model_name\>:explain |
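
For illustration, a V1 prediction request can be issued with `curl`; the service hostname, model name, and input values below are placeholders rather than values taken from this documentation:

```bash
# Hypothetical V1 predict call against an InferenceService named "sklearn-iris"
# (placeholder); INGRESS_HOST and INGRESS_PORT are assumed to point at the KServe ingress.
curl -v \
  -H "Host: sklearn-iris.default.example.com" \
  -H "Content-Type: application/json" \
  -d '{"instances": [[6.8, 2.8, 4.8, 1.4]]}' \
  "http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/sklearn-iris:predict"
```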

### V2 APIs

| API | Verb | Path |
| ------------- | ------------- | ------------- |
| Inference | POST | v2/models/<model_name>[/versions/<model_version>]/infer |
<!-- TODO: uncomment when implemented | Model Readiness| GET | v2/models/<model_name>[/versions/<model_version>]/ready | -->
| Model Metadata | GET | v2/models/<model_name>[/versions/<model_version>] |
| Inference | POST | v2/models/\<model_name\>[/versions/\<model_version\>]/infer |
| Model Metadata | GET | v2/models/\<model_name\>[/versions/\<model_version\>] |
| Server Readiness | GET | v2/health/ready |
| Server Liveness | GET | v2/health/live |
| Server Metadata | GET | v2 |
| Inference | POST | v2/models/<model_name>[/versions/<model_version>]/infer |
<!-- TODO: uncomment when implemented | Model Readiness| GET | v2/models/\<model_name\>[/versions/<model_version>]/ready | -->

** path contents in `[]` are optional
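
As a sketch, the corresponding V2 call uses the `/infer` suffix in place of `:predict`; the host, model name, and tensor values are again illustrative:

```bash
# Hypothetical V2 inference call; "my-model" and the input tensor are placeholders.
curl -v \
  -H "Host: my-model.default.example.com" \
  -H "Content-Type: application/json" \
  -d '{"inputs": [{"name": "input-0", "shape": [1, 4], "datatype": "FP32", "data": [6.8, 2.8, 4.8, 1.4]}]}' \
  "http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/my-model/infer"
```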

Please see [V1 Protocol](/docs/modelserving/data_plane/v1_protocol.md) and [V2 Protocol](/docs/modelserving/data_plane/v2_protocol.md) documentation for more information.
Please see [V1 Protocol](./v1_protocol.md) and [V2 Protocol](./v2_protocol.md) documentation for more information.

12 changes: 6 additions & 6 deletions docs/modelserving/data_plane/v1_protocol.md
@@ -3,10 +3,10 @@ KServe's V1 protocol offers a standardized prediction workflow across all model

| API | Verb | Path | Request Payload | Response Payload |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| List Models | GET | /v1/models | | {"models": [<model_name>]} |
| Model Ready| GET | /v1/models/<model_name> | | {"name":<model_name>,"ready": $bool} |
| Predict | POST | /v1/models/<model_name>:predict | {"instances": []} ** | {"predictions": []} |
| Explain | POST | /v1/models/<model_name>:explain | {"instances": []} **| {"predictions": [], "explanations": []} | |
| List Models | GET | /v1/models | | {"models": \[\<model_name\>\]} |
| Model Ready | GET | /v1/models/\<model_name\> | | {"name": \<model_name\>, "ready": $bool} |
| Predict | POST | /v1/models/\<model_name\>:predict | {"instances": []} ** | {"predictions": []} |
| Explain | POST | /v1/models/\<model_name\>:explain | {"instances": []} **| {"predictions": [], "explanations": []} | |

** = payload is optional
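
As a minimal sketch, the list-models and model-ready calls map to plain HTTP GETs; the hostname and model name below are placeholders:

```bash
# Hypothetical checks against an InferenceService named "sklearn-iris" (placeholder).
# List the models served by the predictor.
curl -H "Host: sklearn-iris.default.example.com" \
  "http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models"
# Expected shape: {"models": ["sklearn-iris"]}

# Check whether a specific model is ready to serve requests.
curl -H "Host: sklearn-iris.default.example.com" \
  "http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/sklearn-iris"
# Expected shape: {"name": "sklearn-iris", "ready": true}
```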

@@ -21,6 +21,6 @@ TODO: make sure list models/model ready is correct.
| Predict | The "predict" API performs inference on a model. The response is the prediction result. All InferenceServices speak the [Tensorflow V1 HTTP API](https://www.tensorflow.org/tfx/serving/api_rest#predict_api). |
| Explain | The "explain" API is an optional component that provides model explanations in addition to predictions. The standardized explainer interface is identical to the Tensorflow V1 HTTP API with the addition of an ":explain" verb.|
| Model Ready | The “model ready” health API indicates if a specific model is ready for inferencing. If the model(s) is downloaded and ready to serve requests, the model ready endpoint returns the list of accessible <model_name>(s). |
| List Models | The "models" API exposes a list of models in the model registry. |

## Examples
TODO
<!-- TODO: ## Examples -->
115 changes: 53 additions & 62 deletions docs/modelserving/data_plane/v2_protocol.md
@@ -1,10 +1,10 @@
## Open Inference Protocol / V2 Protocol

**For an inference server to be compliant with this protocol the server must implement the health, metadata, and inference V2 APIs**. Optional features that are explicitly noted are not required. A compliant inference server may choose to implement the HTTP/REST API and/or the GRPC API.
**For an inference server to be compliant with this protocol the server must implement the health, metadata, and inference V2 APIs**. Optional features that are explicitly noted are not required. A compliant inference server may choose to implement the [HTTP/REST API](#httprest) and/or the [GRPC API](#grpc).

The V2 protocol supports an extension mechanism as a required part of the API, but this document does not propose any specific extensions. Any specific extensions will be proposed separately.

Note: For the below API descriptions, all strings in all contexts are case-sensitive.
Note: For all API descriptions on this page, all strings in all contexts are case-sensitive.

### Note on changes between V1 & V2

@@ -18,15 +18,16 @@ language independent. In all JSON schemas shown in this document
$number, $string, $boolean, $object and $array refer to the
fundamental JSON types. #optional indicates an optional JSON field.

See also: The HTTP/REST endpoints are defined in [rest_predict_v2.yaml](https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/rest_predict_v2.yaml)

| API | Verb | Path | Request Payload | Response Payload |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| Inference | POST | v2/models/<model_name>[/versions/<model_version>]/infer | [$inference_request](#inference-request-json-object) | [$inference_response](#inference-response-json-object) |
<!-- TODO: uncomment when implemented | Model Ready| GET | | | -->
| Model Metadata | GET | v2/models/<model_name>[/versions/<model_version>] | | [$metadata_model_response](#model-metadata-response-json-object) |
| Server Ready | GET | v2/health/ready | [$ready_server_response](#server-ready-response-json-object) |
| Server Live | GET | v2/health/live | [$live_server_response](#server-live-response-json-objet)|
| Inference | POST | v2/models/\<model_name\>[/versions/\<model_version\>]/infer | [$inference_request](#inference-request-json-object) | [$inference_response](#inference-response-json-object) |
| Model Metadata | GET | v2/models/\<model_name\>[/versions/\<model_version\>] | | [$metadata_model_response](#model-metadata-response-json-object) |
| Server Ready | GET | v2/health/ready | | [$ready_server_response](#server-ready-response-json-object) |
| Server Live | GET | v2/health/live | | [$live_server_response](#server-live-response-json-objet)|
| Server Metadata | GET | v2 | | [$metadata_server_response](#server-metadata-response-json-object) |
<!-- TODO: uncomment when implemented | Model Ready| GET | | | -->

** path contents in `[]` are optional

@@ -37,17 +38,16 @@ For example, if a model does not implement a version, the Model Metadata request

<!-- // TODO: add example with -d inputs. -->
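
Tying the table above together, an illustrative inference request with `-d` inputs is sketched below; the host, model name, and tensor contents are placeholders rather than values defined by this protocol:

```bash
# Hypothetical V2 REST inference request; the model "my-model" and its single
# FP32 input named "input-0" are assumptions made for this example only.
curl -s -X POST \
  -H "Host: my-model.default.example.com" \
  -H "Content-Type: application/json" \
  -d '{
        "id": "test-request-1",
        "inputs": [
          {
            "name": "input-0",
            "shape": [2, 2],
            "datatype": "FP32",
            "data": [1.0, 2.0, 3.0, 4.0]
          }
        ]
      }' \
  "http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/my-model/infer"
```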

<details>
<summary> API Definitions </summary>
### **API Definitions**

| API | Definition |
| --- | --- |
| Inference | The `/infer` endpoint performs inference on a model. The response is the prediction result.|
<!-- TODO: uncomment when implemented | Model Ready | The “model ready” health API indicates if a specific model is ready for inferencing. The model name and (optionally) version must be available in the URL. | -->
| Model Metadata | The "model metadata" API is a per-model endpoint that returns details about the model passed in the path. |
| Server Ready | The “server ready” health API indicates if all the models are ready for inferencing. The “server ready” health API can be used directly to implement the Kubernetes readinessProbe |
| Server Live | The “server live” health API indicates if the inference server is able to receive and respond to metadata and inference requests. The “server live” API can be used directly to implement the Kubernetes livenessProbe. |
| Server Metadata | The "server metadata" API returns details describing the server. |
<!-- TODO: uncomment when implemented | Model Ready | The “model ready” health API indicates if a specific model is ready for inferencing. The model name and (optionally) version must be available in the URL. | -->

### Health/Readiness/Liveness Probes

@@ -56,11 +56,7 @@ The Model Readiness probe answers the question "Did the model download and is it able to
To read more about liveness and readiness probe concepts, visit the [Configure Liveness, Readiness and Startup Probes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/)
Kubernetes documentation.
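
A minimal sketch of hitting these endpoints directly is shown below; the host and port are placeholders, and in a cluster these URLs would typically back a readinessProbe/livenessProbe rather than be called by hand:

```bash
# Hypothetical liveness and readiness checks against a V2-compliant server.
curl -s "http://${INGRESS_HOST}:${INGRESS_PORT}/v2/health/live"   # server liveness
curl -s "http://${INGRESS_HOST}:${INGRESS_PORT}/v2/health/ready"  # server readiness
```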

</details>

<details>
<summary> Payload Contents </summary>

### **Payload Contents**

<!-- TODO: uncomment when implemented ### **Model Ready**
@@ -73,7 +69,7 @@ The model ready endpoint returns the readiness probe response for the server alo
} -->


### **Server Ready**
### Server Ready

The server ready endpoint returns the readiness probe response for the server.

@@ -86,7 +82,7 @@

---

### **Server Live**
### Server Live

The server live endpoint returns the liveness probe response for the server.

@@ -99,7 +95,7 @@

---

### **Server Metadata**
### Server Metadata

The server metadata endpoint provides information about the server. A
server metadata request is made with an HTTP GET to a server metadata
@@ -152,7 +148,7 @@ based on its own policies or return an error.
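
A hedged sketch of such a request is shown below (host and port are placeholders); the response carries the server name, version, and supported extensions:

```bash
# Hypothetical server metadata request against a V2-compliant server.
curl -s "http://${INGRESS_HOST}:${INGRESS_PORT}/v2"
# Expected shape: {"name": "...", "version": "...", "extensions": [...]}
```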

---

### **Model Metadata**
### Model Metadata

The per-model metadata endpoint provides information about a model. A
model metadata request is made with an HTTP GET to a model metadata
@@ -221,7 +217,7 @@ status (typically 400). The HTTP body must contain the
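
For example (the model name and version below are placeholders; the response follows the model metadata response object described in this section):

```bash
# Hypothetical model metadata requests; "my-model" and version "1" are placeholders.
curl -s "http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/my-model"
curl -s "http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/my-model/versions/1"
# Expected shape: {"name": "my-model", "versions": [...], "platform": "...",
#                  "inputs": [...], "outputs": [...]}
```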

---

### **Inference**
### Inference

An inference request is made with an HTTP POST to an inference
endpoint. In the request the HTTP body contains the
@@ -332,11 +328,10 @@ code. The inference response object, identified as

---

</details>

<details>
<summary> Inference Request Examples </summary>

### Inference Request Examples

The following example shows an inference request to a model with two
inputs and one output. The HTTP Content-Length header gives the size
@@ -388,22 +383,18 @@ type FP32 the following response would be returned.
]
}

</details>


## gRPC

The GRPC API closely follows the concepts defined in the
[HTTP/REST](#httprest) API. A compliant server must implement the
health, metadata, and inference APIs described in this section.

All strings in all contexts are case-sensitive.


| API | rpc Endpoint | Request Message | Response Message |
| --- | --- | --- | ---|
| Inference | [ModelInfer](#inference) | ModelInferRequest | ModelInferResponse |
| Model Ready | [ModelReady](#model-ready) | ModelReadyRequest | ModelReadyResponse |
| Model Ready | [ModelReady](#model-ready) | ModelReadyRequest | ModelReadyResponse |
| Model Metadata | [ModelMetadata](#model-metadata)| ModelMetadataRequest | ModelMetadataResponse |
| Server Ready | [ServerReady](#server-ready) | ServerReadyRequest | ServerReadyResponse |
| Server Live | [ServerLive](#server-live) | ServerLiveRequest | ServerLiveResponse |
@@ -413,8 +404,7 @@ For more detailed information on each endpoint and its contents, see `API Defini
See also: The gRPC endpoints, request/response messages and contents are defined in [grpc_predict_v2.proto](https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/grpc_predict_v2.proto)
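
As a sketch, the gRPC endpoints can also be exercised with `grpcurl` against that proto file; the host, port, and model name below are placeholders, and TLS/virtual-host handling is omitted:

```bash
# Hypothetical gRPC health checks, assuming grpc_predict_v2.proto has been
# downloaded to the current directory and the server listens on ${GRPC_PORT}.
grpcurl -plaintext -proto grpc_predict_v2.proto \
  "${INGRESS_HOST}:${GRPC_PORT}" inference.GRPCInferenceService/ServerReady

grpcurl -plaintext -proto grpc_predict_v2.proto \
  -d '{"name": "my-model"}' \
  "${INGRESS_HOST}:${GRPC_PORT}" inference.GRPCInferenceService/ModelReady
```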


<details>
<summary> API Definitions </summary>
### **API Definitions**


The GRPC definition of the service is:
@@ -443,12 +433,9 @@
rpc ModelInfer(ModelInferRequest) returns (ModelInferResponse) {}
}

</details>
### **Message Contents**

<details>
<summary> Message Contents </summary>

### **Health**
### Health

A health request is made using the ServerLive, ServerReady, or
ModelReady endpoint. For each of these endpoints errors are indicated by the google.rpc.Status returned for the request. The OK code indicates success and other codes indicate failure.
@@ -501,7 +488,11 @@ inferencing. The request and response messages for ModelReady are:
bool ready = 1;
}

#### Server Metadata
---

### Metadata

#### Server Metadata

The ServerMetadata API provides information about the server. Errors are indicated by the google.rpc.Status returned for the request. The OK code indicates success and other codes indicate failure. The request and response messages for ServerMetadata are:

@@ -568,7 +559,27 @@ request and response messages for ModelMetadata are:
repeated TensorMetadata outputs = 5;
}

### **Inference**
#### Platforms

A platform is a string indicating a DL/ML framework or
backend. Platform is returned as part of the response to a
[Model Metadata](#model_metadata) request but is information only. The
proposed inference APIs are generic relative to the DL/ML framework
used by a model and so a client does not need to know the platform of
a given model to use the API. Platform names use the format
“\<project\>_\<format\>”. The following platform names are allowed:

* tensorrt_plan : A TensorRT model encoded as a serialized engine or “plan”.
* tensorflow_graphdef : A TensorFlow model encoded as a GraphDef.
* tensorflow_savedmodel : A TensorFlow model encoded as a SavedModel.
* onnx_onnxv1 : An ONNX model encoded for ONNX Runtime.
* pytorch_torchscript : A PyTorch model encoded as TorchScript.
* mxnet_mxnet : An MXNet model.
* caffe2_netdef : A Caffe2 model encoded as a NetDef.

---

### Inference

The ModelInfer API performs inference using the specified
model. Errors are indicated by the google.rpc.Status returned for the request. The OK code indicates success and other codes indicate
@@ -700,7 +711,7 @@ failure. The request and response messages for ModelInfer are:
repeated bytes raw_output_contents = 6;
}

### **Parameters**
#### Parameters

The Parameters message describes a “name”/”value” pair, where the
“name” is the name of the parameter and the “value” is a boolean,
@@ -728,8 +739,9 @@ Currently no parameters are defined. As required a future proposal may define on
}
}

---

### **Tensor Data**
### Tensor Data

In all representations tensor data must be flattened to a
one-dimensional, row-major order of the tensor elements. Element
@@ -818,25 +830,4 @@ of each type, in bytes.
| FP64 | 8 |
| BYTES | Variable (max 2<sup>32</sup>) |
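
To make the row-major flattening concrete, the sketch below shows how a 2x3 FP32 tensor would be carried in a REST request; the model and tensor names are illustrative:

```bash
# Hypothetical input: the 2x3 matrix [[1, 2, 3], [4, 5, 6]] is flattened in
# row-major order, so "shape" [2, 3] pairs with the six values listed below.
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d '{"inputs": [{"name": "input-0", "shape": [2, 3], "datatype": "FP32",
                   "data": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}]}' \
  "http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/my-model/infer"
```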



### **Platforms**

A platform is a string indicating a DL/ML framework or
backend. Platform is returned as part of the response to a
[Model Metadata](#model_metadata) request but is information only. The
proposed inference APIs are generic relative to the DL/ML framework
used by a model and so a client does not need to know the platform of
a given model to use the API. Platform names use the format
“<project>_<format>”. The following platform names are allowed:

* tensorrt_plan : A TensorRT model encoded as a serialized engine or “plan”.
* tensorflow_graphdef : A TensorFlow model encoded as a GraphDef.
* tensorflow_savedmodel : A TensorFlow model encoded as a SavedModel.
* onnx_onnxv1 : A ONNX model encoded for ONNX Runtime.
* pytorch_torchscript : A PyTorch model encoded as TorchScript.
* mxnet_mxnet: An MXNet model
* caffe2_netdef : A Caffe2 model encoded as a NetDef.


</details>
---
2 changes: 1 addition & 1 deletion docs/modelserving/v1beta1/triton/torchscript/README.md
@@ -376,7 +376,7 @@ class ImageTransformerV2(kserve.Model):
return {output["name"]: np.array(output["data"]).reshape(output["shape"]).tolist()
for output in results["outputs"]}
```
Please find [the code example](https://github.com/kserve/kserve/tree/release-0.9/docs/samples/v1beta1/triton/torchscript/image_transformer_v2) and [Dockerfile](https://github.com/kserve/kserve/blob/release-0.9/docs/samples/v1beta1/triton/torchscript/transformer.Dockerfile).
Please find [the code example](https://github.com/kserve/kserve/tree/release-0.10/docs/samples/v1beta1/triton/torchscript/image_transformer_v2) and [Dockerfile](https://github.com/kserve/kserve/blob/release-0.10/docs/samples/v1beta1/triton/torchscript/transformer.Dockerfile).
### Build Transformer docker image
```
