update docs for v0.10.0 (#205)
* clarify runtimes.txt usage in custom model

#180
Signed-off-by: agriffith50 <[email protected]>

* updating docs to reflect v0.10
Signed-off-by: agriffith50 <[email protected]>

Signed-off-by: agriffith50 <[email protected]>

* update wording ab model version

Signed-off-by: agriffith50 <[email protected]>

* add model ready

Signed-off-by: agriffith50 <[email protected]>

* update serving runtime table

Signed-off-by: agriffith50 <[email protected]>

* add mlfow to runtime table

Signed-off-by: agriffith50 <[email protected]>

* update dataplane md

Signed-off-by: agriffith50 <[email protected]>

* update v1

Signed-off-by: agriffith50 <[email protected]>

* fix links

Signed-off-by: agriffith50 <[email protected]>

* resolving pr comments

Signed-off-by: agriffith50 <[email protected]>

Signed-off-by: agriffith50 <[email protected]>
alexagriffith committed Jan 21, 2023
1 parent 1fe7698 commit 34ddb0a
Showing 11 changed files with 334 additions and 243 deletions.
2 changes: 1 addition & 1 deletion docs/admin/serverless/kourier_networking/README.md
@@ -1,5 +1,5 @@
# Deploy InferenceService with Alternative Networking Layer
KServe v0.9 and prior versions create the top level `Istio Virtual Service` for routing to `InferenceService` components based on the virtual host or path based routing.
KServe creates the top level `Istio Virtual Service` for routing to `InferenceService` components based on the virtual host or path based routing.
Now KServe provides an option for disabling the top level virtual service to allow configuring other networking layers Knative supports.
For example, [Kourier](https://developers.redhat.com/blog/2020/06/30/kourier-a-lightweight-knative-serving-ingress) is an alternative networking layer and
the following steps show how you can deploy KServe with `Kourier`.
2 changes: 1 addition & 1 deletion docs/get_started/README.md
@@ -19,6 +19,6 @@ The [Kubernetes CLI (`kubectl`)](https://kubernetes.io/docs/tasks/tools/install-
You can get started with a local deployment of KServe by using _KServe Quick installation script on Kind_:

```bash
curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.9/hack/quick_install.sh" | bash
curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.10/hack/quick_install.sh" | bash
```

36 changes: 0 additions & 36 deletions docs/modelserving/data_plane.md

This file was deleted.

60 changes: 60 additions & 0 deletions docs/modelserving/data_plane/data_plane.md
@@ -0,0 +1,60 @@
# Data Plane
The InferenceService Data Plane architecture consists of a static graph of components which coordinate requests for a single model. Advanced features such as Ensembling, A/B testing, and Multi-Armed Bandits should compose InferenceServices together.

## Introduction
KServe's data plane protocol introduces an inference API that is independent of any specific ML/DL framework and model server. This allows for quick iterations and consistency across Inference Services and supports both easy-to-use and high-performance use cases.

By implementing this protocol, both inference clients and servers will increase their utility and
portability by operating seamlessly on platforms that have standardized around this API. KServe's inference protocol is endorsed by NVIDIA
Triton Inference Server, TensorFlow Serving, and TorchServe.

![Data Plane](../../images/dataplane.jpg)
<br> Note: Protocol V2 uses /infer instead of :predict

### Concepts
**Component**: Each endpoint is composed of multiple components: "predictor", "explainer", and "transformer". The only required component is the predictor, which is the core of the system. As KServe evolves, we plan to increase the number of supported components to enable use cases like Outlier Detection.

**Predictor**: The predictor is the workhorse of the InferenceService. It is simply a model and a model server that makes it available at a network endpoint.

**Explainer**: The explainer enables an optional alternate data plane that provides model explanations in addition to predictions. Users may define their own explanation container, which is configured with relevant environment variables such as the prediction endpoint. For common use cases, KServe provides out-of-the-box explainers like Alibi.

**Transformer**: The transformer enables users to define pre- and post-processing steps around the prediction and explanation workflows. Like the explainer, it is configured with relevant environment variables. For common use cases, KServe provides out-of-the-box transformers like Feast.


## Data Plane V1 & V2

KServe supports two versions of its data plane, V1 and V2. V1 protocol offers a standard prediction workflow with HTTP/REST. The second version of the data-plane protocol addresses several issues found with the V1 data-plane protocol, including performance and generality across a large number of model frameworks and servers. Protocol V2 expands the capabilities of V1 by adding gRPC APIs.

### Main changes

* V2 does not currently support the explain endpoint
* V2 added Server Readiness/Liveness/Metadata endpoints
* V2 endpoint paths contain `/` instead of `:`
* V2 renamed `:predict` endpoint to `/infer`
* V2 allows for model versions in the request path (optional)


### V1 APIs

| API | Verb | Path |
| ------------- | ------------- | ------------- |
| List Models | GET | /v1/models |
| Model Ready | GET | /v1/models/\<model_name\> |
| Predict | POST | /v1/models/\<model_name\>:predict |
| Explain | POST | /v1/models/\<model_name\>:explain |

### V2 APIs

| API | Verb | Path |
| ------------- | ------------- | ------------- |
| Inference | POST | /v2/models/\<model_name\>[/versions/\<model_version\>]/infer |
| Model Metadata | GET | /v2/models/\<model_name\>[/versions/\<model_version\>] |
| Server Readiness | GET | /v2/health/ready |
| Server Liveness | GET | /v2/health/live |
| Server Metadata | GET | /v2 |
| Model Readiness | GET | /v2/models/\<model_name\>[/versions/\<model_version\>]/ready |

** path contents in `[]` are optional

Please see [V1 Protocol](./v1_protocol.md) and [V2 Protocol](./v2_protocol.md) documentation for more information.
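
The path layouts in the two tables above can be sketched as small Python helpers. This is only an illustration, not part of KServe: the function names `v1_path` and `v2_path` are hypothetical, and a leading `/` is assumed on every path.

```python
from typing import Optional


def v1_path(model_name: str, verb: Optional[str] = None) -> str:
    """Build a V1 path (hypothetical helper).

    verb=None gives the Model Ready path; "predict" or "explain"
    is appended with a ":" separator, as in the V1 table.
    """
    if verb not in (None, "predict", "explain"):
        raise ValueError(f"unsupported V1 verb: {verb!r}")
    base = f"/v1/models/{model_name}"
    return base if verb is None else f"{base}:{verb}"


def v2_path(
    model_name: str,
    action: Optional[str] = None,
    model_version: Optional[str] = None,
) -> str:
    """Build a V2 path (hypothetical helper).

    action=None gives the Model Metadata path; "infer" or "ready"
    is appended with a "/" separator. The version segment is optional.
    """
    if action not in (None, "infer", "ready"):
        raise ValueError(f"unsupported V2 action: {action!r}")
    path = f"/v2/models/{model_name}"
    if model_version is not None:
        path += f"/versions/{model_version}"
    return path if action is None else f"{path}/{action}"
```

For example, `v1_path("iris", "predict")` and `v2_path("iris", "infer")` build the two inference paths for a model named `iris`, which differ in the `:` versus `/` separator and the `predict` versus `infer` name.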

25 changes: 25 additions & 0 deletions docs/modelserving/data_plane/v1_protocol.md
@@ -0,0 +1,25 @@
# Data Plane (V1)
KServe's V1 protocol offers a standardized prediction workflow across all model frameworks. This protocol version is still supported, but it is recommended that users migrate to the [V2 protocol](./v2_protocol.md) for better performance and standardization among serving runtimes. However, if a use case requires a more flexible schema than protocol v2 provides, v1 protocol is still an option.

| API | Verb | Path | Request Payload | Response Payload |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| List Models | GET | /v1/models | | {"models": \[\<model_name\>\]} |
| Model Ready | GET | /v1/models/\<model_name\> | | {"name": \<model_name\>,"ready": $bool} |
| Predict | POST | /v1/models/\<model_name\>:predict | {"instances": []} ** | {"predictions": []} |
| Explain | POST | /v1/models/\<model_name\>:explain | {"instances": []} ** | {"predictions": [], "explanations": []} |

** = payload is optional

Note: The response payload in V1 protocol is not strictly enforced. A custom server can define and return its own response payload. We encourage using the KServe defined response payload for consistency.
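
As a rough illustration of the Predict payload shapes in the table above, the sketch below builds a request body and reads the KServe-defined response using only the Python standard library. The helper names `build_predict_request` and `parse_predict_response` are hypothetical, and the parser assumes the server returns the `{"predictions": []}` shape, which a custom server is not required to do.

```python
import json
from typing import Any, List


def build_predict_request(instances: List[Any]) -> bytes:
    """Serialize a V1 predict request body: {"instances": [...]}."""
    return json.dumps({"instances": instances}).encode("utf-8")


def parse_predict_response(body: bytes) -> List[Any]:
    """Extract "predictions" from a V1 response body.

    Assumes the KServe-defined response shape; raises KeyError when the
    field is absent, since a custom server may return its own schema.
    """
    payload = json.loads(body)
    if "predictions" not in payload:
        raise KeyError("response has no 'predictions' field")
    return payload["predictions"]
```

The request bytes would be sent with a POST to the Predict path from the table, using whichever HTTP client is convenient.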


## API Definitions

| API | Definition |
| --- | --- |
| Predict | The "predict" API performs inference on a model. The response is the prediction result. All InferenceServices speak the [Tensorflow V1 HTTP API](https://www.tensorflow.org/tfx/serving/api_rest#predict_api). |
| Explain | The "explain" API is an optional component that provides model explanations in addition to predictions. The standardized explainer interface is identical to the Tensorflow V1 HTTP API with the addition of an ":explain" verb.|
| Model Ready | The "model ready" health API indicates if a specific model is ready for inferencing. If the model(s) is downloaded and ready to serve requests, the model ready endpoint returns the list of accessible \<model_name\>(s). |
| List Models | The "models" API exposes a list of models in the model registry. |

<!-- TODO: ## Examples -->