❗ Important Please use the InferenceService CR for deploying on ModelMesh.
Here is a complete example of a Predictor
spec:
apiVersion: serving.kserve.io/v1alpha1
kind: Predictor
metadata:
name: my-mnist-predictor
spec:
modelType:
name: tensorflow
version: "1.15" # Optional
runtime: # Optional
name: triton-2.x
path: my_models/mnist-tf
schemaPath: my_schemas/mnist-schema.json # Optional
storage:
s3:
secretKey: my_storage
bucket: my_bucket # Optional
Note
runtime
is optional. If included, the model will be loaded/served using theServingRuntime
with the specified name, and the predictorsmodelType
must match an entry in that runtime'ssupportedModels
list (see runtimes)- The CRD contains additional fields but they have been omitted here for now since they are not yet fully supported
The Status section of the Predictor
custom resource reflects details about its current state and comprises the following fields.
grpcEndpoint
- The gRPC target endpoint (host/port) corresponding to this predictor. Note that this will currently be the same for all predictors owned by a given ModelMesh Serving installation.
httpEndpoint
- The HTTP (REST) endpoint corresponding to this predictor. This endpoint is provided through a REST proxy sidecar (if enabled), and this will also be the same for all predictors owned by a given ModelMesh Serving installation.
available
- A boolean value indicating whether the predictor's endpoints are ready to serve inferencing requests. Note that this does not necessarily mean requests will respond immediately, the corresponding model may or may not be loaded in memory. In the case that it isn't there may be some delay before the response comes back.
activeModelState
- The state of the model currently being served by the predictor's endpoints. It may be one of:
Pending
- The ModelMesh Serving controller has not yet acknowledged/registered this (new) predictor/model.Standby
- The model is currently not loaded in memory anywhere, but will be automatically upon first usage. This means the first requests to this predictor will likely take longer to respond.Loading
- The model is in the process of loading. Requests may take longer to respond since they will be blocked until the loading completes.Loaded
- The model is loaded in at least one pod and ready to respond immediately to inferencing requests.FailedToLoad
- The model could not be loaded for some reason. See thelastFailureInfo
field for more details.
targetModelState
- This will be set only when transitionStatus
is not UpToDate
, meaning that the target model differs from the currently-active model. The target model always corresponds to the predictor's current spec. The possible values are the same as activeModelState
but should generally only be either Loading
or FailedToLoad
.
transitionStatus
- Indicates state of the predictor relative to its current spec. It may be one of:
UpToDate
- The predictor's current model reflects its spec, that is, its active model matches its target model.InProgress
- The predictor's currently active model configuration is older than its current spec reflects. This is usually the case immediately after the spec changes, while a new target model is loading (targetModelState
should beLoading
). Once the target model finishes loading successfully, the active model will become the target model and thetransitionStatus
will return toUpToDate
.BlockedByFailedLoad
- The predictor's currently active model configuration is older than its current spec reflects, because there was a problem loading the corresponding model. See thelastFailureInfo
field for more details of the failure.InvalidSpec
- The predictor's currently active model configuration was not transitioned to match its current spec because the current spec is invalid. There may be more details of the error in thelastFailureInfo
field.
failedCopies
- The number of copies of the active or target model that failed to load recently (there will be at most one of each per pod)
lastFailureInfo
- Details about the most recent error associated with this predictor. Not all of the contained fields will necessarily have a value.
reason
- A high level code indicating the nature of the failure, may be one of:ModelLoadFailed
- The model failed to load within a serving runtime container. Loading is automatically retried in other pods of the same runtime if they exist, thefailedCopies
field indicates how many different pods the model recently failed to load in.RuntimeUnhealthy
- CorrespondingServingRuntime
pods failed to start or are unhealthy.NoSupportingRuntime
- There are noServingRuntime
s which support the specified model type.RuntimeNotRecognized
- There is noServingRuntime
defined with the specified runtime name.InvalidPredictorSpec
- The currentPredictor
spec is invalid or unsupported.
location
- Indication of the pod in which a loading failure most recently occurred, if applicable. Its value will be the last 12 digits of the pod's full name.message
- A message containing more detail about the error/failure.modelId
- The internal id of the model in question. This includes a hash of thePredictor
's spec.time
- The time at which the failure occurred, if applicable.
Upon creation, Predictors will always transition to Loaded
state (unless the loading fails), but later if unused it is possible that they end up in a Standby
state which means they are still available to serve requests but the first request could incur a loading delay. Whether this happens is a function of the available capacity and usage pattern of other models. It's possible that models will transition from Standby
back to Loaded
"by themselves" if more capacity becomes available.
Model loading will be retried immediately in other pods if it fails, after which it will be re-attempted periodically (every ten minutes or so).