From 47b73996176ea5b60fadabce449eb481b4290cdf Mon Sep 17 00:00:00 2001 From: agriffith50 Date: Tue, 10 Jan 2023 16:23:17 -0500 Subject: [PATCH] update serving runtime table --- docs/modelserving/data_plane/v2_protocol.md | 7 ++-- docs/modelserving/v1beta1/serving_runtime.md | 34 +++++++++++++------- 2 files changed, 27 insertions(+), 14 deletions(-) diff --git a/docs/modelserving/data_plane/v2_protocol.md b/docs/modelserving/data_plane/v2_protocol.md index 29b0e7e08..190732f19 100644 --- a/docs/modelserving/data_plane/v2_protocol.md +++ b/docs/modelserving/data_plane/v2_protocol.md @@ -1,10 +1,11 @@ ## Open Inference Protocol (V2 Inference Protocol) -**For an inference server to be compliant with this protocol the server must implement the health, metadata, and inference V2 APIs**. Optional features that are explicitly noted are not required. A compliant inference server may choose to implement the [HTTP/REST API](#httprest) and/or the [GRPC API](#grpc). +**For an inference server to be compliant with this protocol the server must implement the health, metadata, and inference V2 APIs**. +Optional features that are explicitly noted are not required. A compliant inference server may choose to implement the [HTTP/REST API](#httprest) and/or the [GRPC API](#grpc). -The V2 protocol supports an extension mechanism as a required part of the API, but this document does not propose any specific extensions. Any specific extensions will be proposed separately. +Check the [model serving runtime table](../v1beta1/serving_runtime.md) or the `protocolVersion` field in the [runtime YAML](https://github.com/kserve/kserve/tree/master/config/runtimes) to ensure the V2 protocol is supported by the model serving runtime that you are using. -Note: For all API descriptions on this page, all strings in all contexts are case-sensitive. +Note: For all API descriptions on this page, all strings in all contexts are case-sensitive. 
The V2 protocol supports an extension mechanism as a required part of the API, but this document does not propose any specific extensions. Any specific extensions will be proposed separately. ### Note on changes between V1 & V2 diff --git a/docs/modelserving/v1beta1/serving_runtime.md b/docs/modelserving/v1beta1/serving_runtime.md index bce165a7c..a4e554bcf 100644 --- a/docs/modelserving/v1beta1/serving_runtime.md +++ b/docs/modelserving/v1beta1/serving_runtime.md @@ -21,18 +21,30 @@ After models are deployed with InferenceService, you get all the following serve - Out-of-the-box metrics - Ingress/Egress control -| Model Serving Runtime | Exported model| Prediction Protocol | HTTP | gRPC | Versions | Examples | + +--- + +The table below identifies each of the model serving runtimes supported by KServe. The HTTP and gRPC columns indicate the prediction protocol version that the serving runtime supports. The KServe prediction protocol is noted as either "v1" or "v2". Some serving runtimes also support their own prediction protocol; these are noted with an `*`. The default serving runtime version column defines the source and version of the serving runtime - MLServer, KServe or its own. These versions can also be found in the [runtime kustomization YAML](https://github.com/alexagriffith/kserve/blob/master/config/runtimes/kustomization.yaml). All KServe native model serving runtimes use the current KServe release version (v0.10). The supported framework version column lists the **major** version of the model that is supported. These can also be found in the respective [runtime YAML](https://github.com/alexagriffith/kserve/tree/master/config/runtimes) under the `supportedModelFormats` field. For model frameworks using the KServe serving runtime, the specific default version can be found in [kserve/python](https://github.com/alexagriffith/kserve/tree/master/python). In a given serving runtime directory, the setup.py file contains the exact model framework version used. 
For example, in [kserve/python/lgbserver](https://github.com/alexagriffith/kserve/tree/master/python/lgbserver) the [setup.py](https://github.com/alexagriffith/kserve/blob/master/python/lgbserver/setup.py) file sets the model framework version to 3.3.2, `lightgbm == 3.3.2`. + +| Model Serving Runtime | Exported model | HTTP | gRPC | Default Serving Runtime Version | Supported Framework (Major) Version(s) | Examples | | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |--------------------------------------| -| [Triton Inference Server](https://github.com/triton-inference-server/server) | [TensorFlow,TorchScript,ONNX](https://github.com/triton-inference-server/server/blob/r21.09/docs/model_repository.md)| v2 | :heavy_check_mark: | :heavy_check_mark: | [Compatibility Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html)| [Torchscript cifar](triton/torchscript) | -| [TFServing](https://www.tensorflow.org/tfx/guide/serving) | [TensorFlow SavedModel](https://www.tensorflow.org/guide/saved_model) | v1 | :heavy_check_mark: | :heavy_check_mark: | [TFServing Versions](https://github.com/tensorflow/serving/releases) | [TensorFlow flower](./tensorflow) | -| [TorchServe](https://pytorch.org/serve/server.html) | [Eager Model/TorchScript](https://pytorch.org/docs/master/generated/torch.save.html) | v1/v2 REST | :heavy_check_mark: | :heavy_check_mark: | 0.5.3 | [TorchServe mnist](./torchserve) | -| [SKLearn MLServer](https://github.com/SeldonIO/MLServer) | [Pickled Model](https://scikit-learn.org/stable/modules/model_persistence.html) | v2 | :heavy_check_mark: | :heavy_check_mark: | 1.0.1 | [SKLearn Iris V2](./sklearn/v2) | -| [XGBoost MLServer](https://github.com/SeldonIO/MLServer) | [Saved Model](https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html) | v2 | :heavy_check_mark: | :heavy_check_mark: | 1.5.0 | [XGBoost Iris V2](./xgboost) | -| [SKLearn 
ModelServer](https://github.com/kserve/kserve/tree/master/python/sklearnserver) | [Pickled Model](https://scikit-learn.org/stable/modules/model_persistence.html) | v1 | :heavy_check_mark: | -- | 1.0.1 | [SKLearn Iris](./sklearn/v2) | -| [XGBoost ModelServer](https://github.com/kserve/kserve/tree/master/python/xgbserver) | [Saved Model](https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html) | v1 | :heavy_check_mark: | -- | 1.5.0 | [XGBoost Iris](./xgboost) | -| [PMML ModelServer](https://github.com/kserve/kserve/tree/master/python/pmmlserver) | [PMML](http://dmg.org/pmml/v4-4-1/GeneralStructure.html) | v1 | :heavy_check_mark: | -- | [PMML4.4.1](https://github.com/autodeployai/pypmml) | [SKLearn PMML](./pmml) | -| [LightGBM ModelServer](https://github.com/kserve/kserve/tree/master/python/lightgbm) | [Saved LightGBM Model](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.Booster.html#lightgbm.Booster.save_model) | v1 | :heavy_check_mark: | -- | 3.2.0 | [LightGBM Iris](./lightgbm) | -| [Custom ModelServer](https://github.com/kserve/kserve/tree/master/python/kserve/kserve) | -- | v1 | :heavy_check_mark: | -- | -- | [Custom Model](custom/custom_model) | +| [Custom ModelServer](https://github.com/kserve/kserve/tree/master/python/kserve/kserve) | -- | v1, v2 | v2 | -- | -- | [Custom Model](custom/custom_model) | +| [LightGBM MLServer](https://mlserver.readthedocs.io/en/latest/runtimes/lightgbm.html) | [Saved LightGBM Model](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.Booster.html#lightgbm.Booster.save_model) | v2 | v2 | v1.0.0 (MLServer) | 3 | [LightGBM Iris V2](./lightgbm) | +| [LightGBM ModelServer](https://github.com/kserve/kserve/tree/master/python/lgbserver) | [Saved LightGBM Model](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.Booster.html#lightgbm.Booster.save_model) | v1 | -- | v0.10.0 (KServe) | 3 | [LightGBM Iris](./lightgbm) | +| [PMML 
ModelServer](https://github.com/kserve/kserve/tree/master/python/pmmlserver) | [PMML](http://dmg.org/pmml/v4-4-1/GeneralStructure.html) | v1 | -- | v0.10.0 (KServe) | 3, 4 ([PMML4.4.1](https://github.com/autodeployai/pypmml)) | [SKLearn PMML](./pmml) | +| [SKLearn MLServer](https://github.com/SeldonIO/MLServer) | [Pickled Model](https://scikit-learn.org/stable/modules/model_persistence.html) | v2 | v2 | v1.0.0 (MLServer) | 1 | [SKLearn Iris V2](./sklearn/v2) | +| [SKLearn ModelServer](https://github.com/kserve/kserve/tree/master/python/sklearnserver) | [Pickled Model](https://scikit-learn.org/stable/modules/model_persistence.html) | v1 | -- | v0.10.0 (KServe) | 1 | [SKLearn Iris](./sklearn/v2) | +| [TFServing](https://www.tensorflow.org/tfx/guide/serving) | [TensorFlow SavedModel](https://www.tensorflow.org/guide/saved_model) | v1 | *tensorflow | 2.6.2 ([TFServing Versions](https://github.com/tensorflow/serving/releases)) | 2 | [TensorFlow flower](./tensorflow) | +| [TorchServe](https://pytorch.org/serve/server.html) | [Eager Model/TorchScript](https://pytorch.org/docs/master/generated/torch.save.html) | v1, v2, *torchserve | *torchserve | 0.7.0 (TorchServe) | 1 | [TorchServe mnist](./torchserve) | +| [Triton Inference Server](https://github.com/triton-inference-server/server) | [TensorFlow,TorchScript,ONNX](https://github.com/triton-inference-server/server/blob/r21.09/docs/model_repository.md)| v2 | v2 | 21.09-py3 (Triton) | 8 (TensorRT), 1, 2 (TensorFlow), 1 (PyTorch), 2 (Triton) [Compatibility Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html)| [Torchscript cifar](triton/torchscript) | +| [XGBoost MLServer](https://github.com/SeldonIO/MLServer) | [Saved Model](https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html) | v2 | v2 | v1.0.0 (MLServer) | 1 | [XGBoost Iris V2](./xgboost) | +| [XGBoost ModelServer](https://github.com/kserve/kserve/tree/master/python/xgbserver) | [Saved 
Model](https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html) | v1 | -- | v0.10.0 (KServe) | 1 | [XGBoost Iris](./xgboost) | + + + +*tensorflow - Tensorflow implements its own prediction protocol in addition to KServe's. See: [Tensorflow Serving Prediction API](https://github.com/tensorflow/serving/blob/master/tensorflow_serving/apis/prediction_service.proto) documentation + +*torchserve - PyTorch implements its own prediction protocol in addition to KServe's. See: [Torchserve gRPC API](https://pytorch.org/serve/grpc_api.html#) documentation !!! Note The model serving runtime version can be overwritten with the `runtimeVersion` field on InferenceService yaml and we highly recommend