Is your feature request related to a problem? Please describe.
When using Triton Inference Server with --model-control-mode=explicit and the Python backend, a Python model version whose repository directory is missing model.py can remain stuck in LOADING.
In our test, the model repository contained config.pbtxt and a version directory, but the version directory did not contain model.py. After calling the repository load API:
POST /v2/repository/models/<model_name>/load
Triton logged:
Failed to preinitialize Python stub: Python model file not found in '<model_repo>/<model_name>/<version>/model.py'
However, POST /v2/repository/index continued to report the model version as LOADING instead of transitioning to UNAVAILABLE/error with a reason. Retrying load after adding model.py did not recover the model, and unload also did not clear the stuck state. The practical recovery was restarting the Triton server/pod.
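For reference, the repro above can be sketched with stdlib-only HTTP calls against the KServe repository extension endpoints. The host, port, and model name here are placeholders, not the actual values from our deployment:

```python
import json
import urllib.error
import urllib.request

TRITON = "http://localhost:8000"  # assumption: Triton's default HTTP endpoint
MODEL = "my_python_model"         # hypothetical model name for illustration

def repo_request(path: str) -> urllib.request.Request:
    # Repository-extension calls are POSTs with a (possibly empty) JSON body.
    return urllib.request.Request(
        TRITON + path,
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def load_model(name: str) -> int:
    # POST /v2/repository/models/<name>/load; raises urllib.error.HTTPError
    # (e.g. 400) when Triton rejects the load outright.
    with urllib.request.urlopen(repo_request(f"/v2/repository/models/{name}/load")) as resp:
        return resp.status

def model_states(name: str) -> list:
    # POST /v2/repository/index; each entry carries name/version/state/reason.
    with urllib.request.urlopen(repo_request("/v2/repository/index")) as resp:
        index = json.loads(resp.read())
    return [e for e in index if e.get("name") == name]
```

In our tests, calling load_model() for a model whose version directory lacked model.py left every entry returned by model_states() at state "LOADING" indefinitely, with no reason field populated.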
This is different from other Python backend failures we tested, such as empty model.py, missing imports, syntax errors, or exceptions in initialize(). Those returned HTTP 400 from /load and showed UNAVAILABLE with a useful reason in the repository index.
Describe the solution you'd like
For missing model.py and similar Python backend preinitialization failures, Triton should transition the model version from LOADING to UNAVAILABLE or another terminal error state, with the failure reason visible in repository/index.
It would also help to have a supported recovery API for a model stuck in LOADING, such as:
- allowing unload to clear a stuck LOADING model/version, or
- exposing an admin/repository API to force a failed terminal state for a stuck load.
Describe alternatives you've considered
Our current workaround is to add preflight validation before calling Triton /load, specifically checking that Python backend model artifacts contain model.py.
If Triton still enters this stuck LOADING state, the only reliable recovery we found is restarting the affected Triton server/pod. This is operationally expensive because it can affect other loaded models on the same server.
Additional context
Environment tested:
- Triton image: nvcr.io/nvidia/tritonserver:24.10-pyt-python-py3
- Triton server version: 2.51.0
- Python backend stub linked to Python 3.10
- Model control mode: explicit
- Backend: python
Observed behavior:
- Missing model.py: /load did not produce a clean terminal UNAVAILABLE state; repository index stayed LOADING.
- Empty model.py: /load returned HTTP 400; repository index showed UNAVAILABLE with AttributeError.
- Missing import: /load returned HTTP 400; repository index showed UNAVAILABLE with ModuleNotFoundError.
- Syntax error: /load returned HTTP 400; repository index showed UNAVAILABLE with SyntaxError.
- initialize() exception: /load returned HTTP 400; repository index showed UNAVAILABLE with the exception reason.