Commit: updating 0.9 to 0.10

Signed-off-by: agriffith50 <[email protected]>

alexagriffith committed Dec 21, 2022
1 parent 9979ea4 commit cb5484d
Showing 6 changed files with 72 additions and 82 deletions.
2 changes: 1 addition & 1 deletion docs/admin/serverless/kourier_networking/README.md
@@ -1,5 +1,5 @@
# Deploy InferenceService with Alternative Networking Layer
KServe v0.9 and prior versions create the top level `Istio Virtual Service` for routing to `InferenceService` components based on the virtual host or path based routing.
KServe creates the top level `Istio Virtual Service` for routing to `InferenceService` components based on virtual host or path based routing.
Now KServe provides an option to disable the top level virtual service, allowing you to configure other networking layers that Knative supports.
For example, [Kourier](https://developers.redhat.com/blog/2020/06/30/kourier-a-lightweight-knative-serving-ingress) is an alternative networking layer and
the following steps show how you can deploy KServe with `Kourier`.
2 changes: 1 addition & 1 deletion docs/get_started/README.md
@@ -19,6 +19,6 @@ The [Kubernetes CLI (`kubectl`)](https://kubernetes.io/docs/tasks/tools/install-
You can get started with a local deployment of KServe by using the _KServe Quick installation script on Kind_:

```bash
curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.9/hack/quick_install.sh" | bash
curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.10/hack/quick_install.sh" | bash
```

21 changes: 10 additions & 11 deletions docs/modelserving/data_plane/data_plane.md
@@ -8,8 +8,8 @@ By implementing this protocol both inference clients and servers will increase their utility and
portability by operating seamlessly on platforms that have standardized around this API. KServe's inference protocol is endorsed by NVIDIA
Triton Inference Server, TensorFlow Serving, and ONNX Runtime Server.

![Data Plane](../images/dataplane.jpg)
Note: Protocol V2 uses /infer instead of :predict
![Data Plane](../../images/dataplane.jpg)
<br> Note: Protocol V2 uses /infer instead of :predict

### Concepts
**Component**: Each endpoint is composed of multiple components: "predictor", "explainer", and "transformer". The only required component is the predictor, which is the core of the system. As KServe evolves, we plan to increase the number of supported components to enable use cases like Outlier Detection.
@@ -25,7 +25,7 @@ Note: Protocol V2 uses /infer instead of :predict

KServe supports two versions of its data plane, V1 and V2. The V1 protocol offers a standard prediction workflow with HTTP/REST. The second version of the data plane protocol addresses several issues found with V1, including performance and generality across a large number of model frameworks and servers. Protocol V2 expands the capabilities of V1 by adding gRPC APIs.

### Main changes between V1 & V2 dataplane
### Main changes

* V2 does not currently support the explain endpoint
* V2 added Server Readiness/Liveness/Metadata endpoints
@@ -39,23 +39,22 @@ Kserve supports two versions of its data plane, V1 and V2. V1 protocol offers a
| API | Verb | Path |
| ------------- | ------------- | ------------- |
| List Models | GET | /v1/models |
| Model Ready | GET | /v1/models/<model_name> |
| Predict | POST | /v1/models/<model_name>:predict |
| Explain | POST | /v1/models/<model_name>:explain |
| Model Ready | GET | /v1/models/\<model_name\> |
| Predict | POST | /v1/models/\<model_name\>:predict |
| Explain | POST | /v1/models/\<model_name\>:explain |
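
For illustration, a V1 prediction request can be issued with `curl`; the service hostname, model name, and input values below are placeholders rather than values taken from this documentation:

```bash
# Hypothetical V1 predict call against an InferenceService named "sklearn-iris"
# (placeholder); INGRESS_HOST and INGRESS_PORT are assumed to point at the KServe ingress.
curl -v \
  -H "Host: sklearn-iris.default.example.com" \
  -H "Content-Type: application/json" \
  -d '{"instances": [[6.8, 2.8, 4.8, 1.4]]}' \
  "http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/sklearn-iris:predict"
```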

### V2 APIs

| API | Verb | Path |
| ------------- | ------------- | ------------- |
| Inference | POST | v2/models/<model_name>[/versions/<model_version>]/infer |
<!-- TODO: uncomment when implemented | Model Readiness| GET | v2/models/<model_name>[/versions/<model_version>]/ready | -->
| Model Metadata | GET | v2/models/<model_name>[/versions/<model_version>] |
| Inference | POST | v2/models/\<model_name\>[/versions/\<model_version\>]/infer |
| Model Metadata | GET | v2/models/\<model_name\>[/versions/\<model_version\>] |
| Server Readiness | GET | v2/health/ready |
| Server Liveness | GET | v2/health/live |
| Server Metadata | GET | v2 |
| Inference | POST | v2/models/<model_name>[/versions/<model_version>]/infer |
<!-- TODO: uncomment when implemented | Model Readiness| GET | v2/models/\<model_name\>[/versions/<model_version>]/ready | -->

** path contents in `[]` are optional
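
As a sketch, the corresponding V2 call uses the `/infer` suffix in place of `:predict`; the host, model name, and tensor values are again illustrative:

```bash
# Hypothetical V2 inference call; "my-model" and the input tensor are placeholders.
curl -v \
  -H "Host: my-model.default.example.com" \
  -H "Content-Type: application/json" \
  -d '{"inputs": [{"name": "input-0", "shape": [1, 4], "datatype": "FP32", "data": [6.8, 2.8, 4.8, 1.4]}]}' \
  "http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/my-model/infer"
```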

Please see [V1 Protocol](/docs/modelserving/data_plane/v1_protocol.md) and [V2 Protocol](/docs/modelserving/data_plane/v2_protocol.md) documentation for more information.
Please see [V1 Protocol](./v1_protocol.md) and [V2 Protocol](./v2_protocol.md) documentation for more information.

12 changes: 6 additions & 6 deletions docs/modelserving/data_plane/v1_protocol.md
@@ -3,10 +3,10 @@ KServe's V1 protocol offers a standardized prediction workflow across all model

| API | Verb | Path | Request Payload | Response Payload |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| List Models | GET | /v1/models | | {"models": [<model_name>]} |
| Model Ready| GET | /v1/models/<model_name> | | {"name":<model_name>,"ready": $bool} |
| Predict | POST | /v1/models/<model_name>:predict | {"instances": []} ** | {"predictions": []} |
| Explain | POST | /v1/models/<model_name>:explain | {"instances": []} **| {"predictions": [], "explanations": []} | |
| List Models | GET | /v1/models | | {"models": \[\<model_name\>\]} |
| Model Ready | GET | /v1/models/\<model_name\> | | {"name": \<model_name\>, "ready": $bool} |
| Predict | POST | /v1/models/\<model_name\>:predict | {"instances": []} ** | {"predictions": []} |
| Explain | POST | /v1/models/\<model_name\>:explain | {"instances": []} **| {"predictions": [], "explanations": []} | |

** = payload is optional
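
As a minimal sketch, the list-models and model-ready calls map to plain HTTP GETs; the hostname and model name below are placeholders:

```bash
# Hypothetical checks against an InferenceService named "sklearn-iris" (placeholder).
# List the models served by the predictor.
curl -H "Host: sklearn-iris.default.example.com" \
  "http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models"
# Expected shape: {"models": ["sklearn-iris"]}

# Check whether a specific model is ready to serve requests.
curl -H "Host: sklearn-iris.default.example.com" \
  "http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/sklearn-iris"
# Expected shape: {"name": "sklearn-iris", "ready": true}
```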

@@ -21,6 +21,6 @@ TODO: make sure list models/model ready is correct.
| Predict | The "predict" API performs inference on a model. The response is the prediction result. All InferenceServices speak the [Tensorflow V1 HTTP API](https://www.tensorflow.org/tfx/serving/api_rest#predict_api). |
| Explain | The "explain" API is an optional component that provides model explanations in addition to predictions. The standardized explainer interface is identical to the Tensorflow V1 HTTP API with the addition of an ":explain" verb.|
| Model Ready | The “model ready” health API indicates if a specific model is ready for inferencing. If the model(s) is downloaded and ready to serve requests, the model ready endpoint returns the list of accessible <model_name>(s). |
| List Models | The "models" API exposes a list of models in the model registry. |

## Examples
TODO
<!-- TODO: ## Examples -->
115 changes: 53 additions & 62 deletions docs/modelserving/data_plane/v2_protocol.md
@@ -1,10 +1,10 @@
## Open Inference Protocol / V2 Protocol

**For an inference server to be compliant with this protocol the server must implement the health, metadata, and inference V2 APIs**. Optional features that are explicitly noted are not required. A compliant inference server may choose to implement the HTTP/REST API and/or the GRPC API.
**For an inference server to be compliant with this protocol the server must implement the health, metadata, and inference V2 APIs**. Optional features that are explicitly noted are not required. A compliant inference server may choose to implement the [HTTP/REST API](#httprest) and/or the [GRPC API](#grpc).

The V2 protocol supports an extension mechanism as a required part of the API, but this document does not propose any specific extensions. Any specific extensions will be proposed separately.

Note: For the below API descriptions, all strings in all contexts are case-sensitive.
Note: For all API descriptions on this page, all strings in all contexts are case-sensitive.

### Note on changes between V1 & V2

@@ -18,15 +18,16 @@ language independent. In all JSON schemas shown in this document
$number, $string, $boolean, $object and $array refer to the
fundamental JSON types. #optional indicates an optional JSON field.

See also: The HTTP/REST endpoints are defined in [rest_predict_v2.yaml](https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/rest_predict_v2.yaml)

| API | Verb | Path | Request Payload | Response Payload |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| Inference | POST | v2/models/<model_name>[/versions/<model_version>]/infer | [$inference_request](#inference-request-json-object) | [$inference_response](#inference-response-json-object) |
<!-- TODO: uncomment when implemented | Model Ready| GET | | | -->
| Model Metadata | GET | v2/models/<model_name>[/versions/<model_version>] | | [$metadata_model_response](#model-metadata-response-json-object) |
| Server Ready | GET | v2/health/ready | [$ready_server_response](#server-ready-response-json-object) |
| Server Live | GET | v2/health/live | [$live_server_response](#server-live-response-json-objet)|
| Inference | POST | v2/models/\<model_name\>[/versions/\<model_version\>]/infer | [$inference_request](#inference-request-json-object) | [$inference_response](#inference-response-json-object) |
| Model Metadata | GET | v2/models/\<model_name\>[/versions/\<model_version\>] | | [$metadata_model_response](#model-metadata-response-json-object) |
| Server Ready | GET | v2/health/ready | | [$ready_server_response](#server-ready-response-json-object) |
| Server Live | GET | v2/health/live | | [$live_server_response](#server-live-response-json-objet)|
| Server Metadata | GET | v2 | | [$metadata_server_response](#server-metadata-response-json-object) |
<!-- TODO: uncomment when implemented | Model Ready| GET | | | -->

** path contents in `[]` are optional

@@ -37,17 +38,16 @@ For example, if a model does not implement a version, the Model Metadata request

<!-- // TODO: add example with -d inputs. -->
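
Tying the table above together, an illustrative inference request with `-d` inputs is sketched below; the host, model name, and tensor contents are placeholders rather than values defined by this protocol:

```bash
# Hypothetical V2 REST inference request; the model "my-model" and its single
# FP32 input named "input-0" are assumptions made for this example only.
curl -s -X POST \
  -H "Host: my-model.default.example.com" \
  -H "Content-Type: application/json" \
  -d '{
        "id": "test-request-1",
        "inputs": [
          {
            "name": "input-0",
            "shape": [2, 2],
            "datatype": "FP32",
            "data": [1.0, 2.0, 3.0, 4.0]
          }
        ]
      }' \
  "http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/my-model/infer"
```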

<details>
<summary> API Definitions </summary>
### **API Definitions**

| API | Definition |
| --- | --- |
| Inference | The `/infer` endpoint performs inference on a model. The response is the prediction result.|
<!-- TODO: uncomment when implemented | Model Ready | The “model ready” health API indicates if a specific model is ready for inferencing. The model name and (optionally) version must be available in the URL. | -->
| Model Metadata | The "model metadata" API is a per-model endpoint that returns details about the model passed in the path. |
| Server Ready | The “server ready” health API indicates if all the models are ready for inferencing. The “server ready” health API can be used directly to implement the Kubernetes readinessProbe |
| Server Live | The “server live” health API indicates if the inference server is able to receive and respond to metadata and inference requests. The “server live” API can be used directly to implement the Kubernetes livenessProbe. |
| Server Metadata | The "server metadata" API returns details describing the server. |
<!-- TODO: uncomment when implemented | Model Ready | The “model ready” health API indicates if a specific model is ready for inferencing. The model name and (optionally) version must be available in the URL. | -->

### Health/Readiness/Liveness Probes

@@ -56,11 +56,7 @@ The Model Readiness probe answers the question "Did the model download and is it able to
To read more about liveness and readiness probe concepts, visit the [Configure Liveness, Readiness and Startup Probes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/)
Kubernetes documentation.
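
A minimal sketch of hitting these endpoints directly is shown below; the host and port are placeholders, and in a cluster these URLs would typically back a readinessProbe/livenessProbe rather than be called by hand:

```bash
# Hypothetical liveness and readiness checks against a V2-compliant server.
curl -s "http://${INGRESS_HOST}:${INGRESS_PORT}/v2/health/live"   # server liveness
curl -s "http://${INGRESS_HOST}:${INGRESS_PORT}/v2/health/ready"  # server readiness
```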

</details>

<details>
<summary> Payload Contents </summary>

### **Payload Contents**

<!-- TODO: uncomment when implemented ### **Model Ready**
@@ -73,7 +69,7 @@ The model ready endpoint returns the readiness probe response for the server alo
} -->


### **Server Ready**
### Server Ready

The server ready endpoint returns the readiness probe response for the server.

@@ -86,7 +82,7 @@

---

### **Server Live**
### Server Live

The server live endpoint returns the liveness probe response for the server.

@@ -99,7 +95,7 @@

---

### **Server Metadata**
### Server Metadata

The server metadata endpoint provides information about the server. A
server metadata request is made with an HTTP GET to a server metadata
@@ -152,7 +148,7 @@ based on its own policies or return an error.
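
A hedged sketch of such a request is shown below (host and port are placeholders); the response carries the server name, version, and supported extensions:

```bash
# Hypothetical server metadata request against a V2-compliant server.
curl -s "http://${INGRESS_HOST}:${INGRESS_PORT}/v2"
# Expected shape: {"name": "...", "version": "...", "extensions": [...]}
```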

---

### **Model Metadata**
### Model Metadata

The per-model metadata endpoint provides information about a model. A
model metadata request is made with an HTTP GET to a model metadata
@@ -221,7 +217,7 @@ status (typically 400). The HTTP body must contain the
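
For example (the model name and version below are placeholders; the response follows the model metadata response object described in this section):

```bash
# Hypothetical model metadata requests; "my-model" and version "1" are placeholders.
curl -s "http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/my-model"
curl -s "http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/my-model/versions/1"
# Expected shape: {"name": "my-model", "versions": [...], "platform": "...",
#                  "inputs": [...], "outputs": [...]}
```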

---

### **Inference**
### Inference

An inference request is made with an HTTP POST to an inference
endpoint. In the request the HTTP body contains the
@@ -332,11 +328,10 @@ code. The inference response object, identified as

---

</details>

<details>
<summary> Inference Request Examples </summary>

### Inference Request Examples

The following example shows an inference request to a model with two
inputs and one output. The HTTP Content-Length header gives the size
@@ -388,22 +383,18 @@ type FP32 the following response would be returned.
]
}

</details>


## gRPC

The GRPC API closely follows the concepts defined in the
[HTTP/REST](#httprest) API. A compliant server must implement the
health, metadata, and inference APIs described in this section.

All strings in all contexts are case-sensitive.


| API | rpc Endpoint | Request Message | Response Message |
| --- | --- | --- | ---|
| Inference | [ModelInfer](#inference) | ModelInferRequest | ModelInferResponse |
| Model Ready | [ModelReady](#model-ready) | ModelReadyRequest | ModelReadyResponse |
| Model Ready | [ModelReady](#model-ready) | ModelReadyRequest | ModelReadyResponse |
| Model Metadata | [ModelMetadata](#model-metadata)| ModelMetadataRequest | ModelMetadataResponse |
| Server Ready | [ServerReady](#server-ready) | ServerReadyRequest | ServerReadyResponse |
| Server Live | [ServerLive](#server-live) | ServerLiveRequest | ServerLiveResponse |
@@ -413,8 +404,7 @@ For more detailed information on each endpoint and its contents, see `API Defini
See also: The gRPC endpoints, request/response messages and contents are defined in [grpc_predict_v2.proto](https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/grpc_predict_v2.proto)
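
As a sketch, the gRPC endpoints can also be exercised with `grpcurl` against that proto file; the host, port, and model name below are placeholders, and TLS/virtual-host handling is omitted:

```bash
# Hypothetical gRPC health checks, assuming grpc_predict_v2.proto has been
# downloaded to the current directory and the server listens on ${GRPC_PORT}.
grpcurl -plaintext -proto grpc_predict_v2.proto \
  "${INGRESS_HOST}:${GRPC_PORT}" inference.GRPCInferenceService/ServerReady

grpcurl -plaintext -proto grpc_predict_v2.proto \
  -d '{"name": "my-model"}' \
  "${INGRESS_HOST}:${GRPC_PORT}" inference.GRPCInferenceService/ModelReady
```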


<details>
<summary> API Definitions </summary>
### **API Definitions**


The GRPC definition of the service is:
@@ -443,12 +433,9 @@
rpc ModelInfer(ModelInferRequest) returns (ModelInferResponse) {}
}

</details>
### **Message Contents**

<details>
<summary> Message Contents </summary>

### **Health**
### Health

A health request is made using the ServerLive, ServerReady, or
ModelReady endpoint. For each of these endpoints errors are indicated by the google.rpc.Status returned for the request. The OK code indicates success and other codes indicate failure.
@@ -501,7 +488,11 @@ inferencing. The request and response messages for ModelReady are:
bool ready = 1;
}

#### Server Metadata
---

### Metadata

#### Server Metadata

The ServerMetadata API provides information about the server. Errors are indicated by the google.rpc.Status returned for the request. The OK code indicates success and other codes indicate failure. The request and response messages for ServerMetadata are:

@@ -568,7 +559,27 @@ request and response messages for ModelMetadata are:
repeated TensorMetadata outputs = 5;
}

### **Inference**
#### Platforms

A platform is a string indicating a DL/ML framework or
backend. Platform is returned as part of the response to a
[Model Metadata](#model_metadata) request but is information only. The
proposed inference APIs are generic relative to the DL/ML framework
used by a model and so a client does not need to know the platform of
a given model to use the API. Platform names use the format
“\<project\>_\<format\>”. The following platform names are allowed:

* tensorrt_plan : A TensorRT model encoded as a serialized engine or “plan”.
* tensorflow_graphdef : A TensorFlow model encoded as a GraphDef.
* tensorflow_savedmodel : A TensorFlow model encoded as a SavedModel.
* onnx_onnxv1 : An ONNX model encoded for ONNX Runtime.
* pytorch_torchscript : A PyTorch model encoded as TorchScript.
* mxnet_mxnet : An MXNet model.
* caffe2_netdef : A Caffe2 model encoded as a NetDef.

---

### Inference

The ModelInfer API performs inference using the specified
model. Errors are indicated by the google.rpc.Status returned for the request. The OK code indicates success and other codes indicate
@@ -700,7 +711,7 @@ failure. The request and response messages for ModelInfer are:
repeated bytes raw_output_contents = 6;
}

### **Parameters**
#### Parameters

The Parameters message describes a “name”/”value” pair, where the
“name” is the name of the parameter and the “value” is a boolean,
@@ -728,8 +739,9 @@ Currently no parameters are defined. As required a future proposal may define on
}
}

---

### **Tensor Data**
### Tensor Data

In all representations tensor data must be flattened to a
one-dimensional, row-major order of the tensor elements. Element
@@ -818,25 +830,4 @@ of each type, in bytes.
| FP64 | 8 |
| BYTES | Variable (max 2<sup>32</sup>) |
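
To make the row-major flattening concrete, the sketch below shows how a 2x3 FP32 tensor would be carried in a REST request; the model and tensor names are illustrative:

```bash
# Hypothetical input: the 2x3 matrix [[1, 2, 3], [4, 5, 6]] is flattened in
# row-major order, so "shape" [2, 3] pairs with the six values listed below.
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d '{"inputs": [{"name": "input-0", "shape": [2, 3], "datatype": "FP32",
                   "data": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}]}' \
  "http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/my-model/infer"
```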



### **Platforms**

A platform is a string indicating a DL/ML framework or
backend. Platform is returned as part of the response to a
[Model Metadata](#model_metadata) request but is information only. The
proposed inference APIs are generic relative to the DL/ML framework
used by a model and so a client does not need to know the platform of
a given model to use the API. Platform names use the format
“<project>_<format>”. The following platform names are allowed:

* tensorrt_plan : A TensorRT model encoded as a serialized engine or “plan”.
* tensorflow_graphdef : A TensorFlow model encoded as a GraphDef.
* tensorflow_savedmodel : A TensorFlow model encoded as a SavedModel.
* onnx_onnxv1 : A ONNX model encoded for ONNX Runtime.
* pytorch_torchscript : A PyTorch model encoded as TorchScript.
* mxnet_mxnet: An MXNet model
* caffe2_netdef : A Caffe2 model encoded as a NetDef.


</details>
---
2 changes: 1 addition & 1 deletion docs/modelserving/v1beta1/triton/torchscript/README.md
@@ -376,7 +376,7 @@ class ImageTransformerV2(kserve.Model):
return {output["name"]: np.array(output["data"]).reshape(output["shape"]).tolist()
for output in results["outputs"]}
```
Please find [the code example](https://github.com/kserve/kserve/tree/release-0.9/docs/samples/v1beta1/triton/torchscript/image_transformer_v2) and [Dockerfile](https://github.com/kserve/kserve/blob/release-0.9/docs/samples/v1beta1/triton/torchscript/transformer.Dockerfile).
Please find [the code example](https://github.com/kserve/kserve/tree/release-0.10/docs/samples/v1beta1/triton/torchscript/image_transformer_v2) and [Dockerfile](https://github.com/kserve/kserve/blob/release-0.10/docs/samples/v1beta1/triton/torchscript/transformer.Dockerfile).
### Build Transformer docker image
```
