Helm: add Liveness/Readiness for stac-proxy deployment #140

@thenav56

Description

Problem

If an error occurs during application startup or at runtime, the container logs the error but the pod is not restarted automatically.
This leaves the pod in a broken state until it is restarted manually.

Error log

INFO:     Will watch for changes in these directories: ['/app']
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [1] using StatReload
DEBUG:    Mounted app at /stac
DEBUG:    CORS: credentials enabled with wildcard origins, using origin reflection
INFO:     CORS: handling locally (allow_origins=['*'], allow_methods=['*'], allow_credentials=True)
INFO:     Started server process [8]
INFO:     Waiting for application startup.
DEBUG:    Appending required conformance for collections filter
DEBUG:    Appending required conformance for items filter
INFO:     Running upstream server health checks...
INFO:     Upstream API 'http://montandon-eoapi-stac:8080/' is healthy
ERROR:    Traceback (most recent call last):
  File "/usr/local/lib/python3.13/site-packages/starlette/routing.py", line 694, in lifespan
    async with self.lifespan_context(app) as maybe_state:
               ~~~~~~~~~~~~~~~~~~~~~^^^^^
  File "/usr/local/lib/python3.13/contextlib.py", line 214, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/fastapi/routing.py", line 201, in merged_lifespan
    async with original_context(app) as maybe_original_state:
               ~~~~~~~~~~~~~~~~^^^^^
  File "/usr/local/lib/python3.13/contextlib.py", line 214, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/src/stac_auth_proxy/lifespan.py", line 141, in lifespan
    await check_server_healths(
        settings.upstream_url, settings.oidc_discovery_internal_url
    )
  File "/app/src/stac_auth_proxy/lifespan.py", line 24, in check_server_healths
    await check_server_health(url)
  File "/app/src/stac_auth_proxy/lifespan.py", line 47, in check_server_health
    response.raise_for_status()
  File "/usr/local/lib/python3.13/site-packages/httpx/_models.py", line 829, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)

httpx.HTTPStatusError: Server error '502 Bad Gateway' for url 'https://goadmin-stage.ifrc.org/o/.well-known/openid-configuration'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/502

ERROR:    Application startup failed. Exiting.

Note

The logs show "ERROR: Application startup failed. Exiting.", but the container does not actually exit: the pod keeps running and appears healthy when checked through the Kubernetes API. A likely cause, judging from the log above, is that uvicorn is started with --reload: the reloader process (PID 1) stays alive even after the server process ([8]) fails during startup, so the container's main process never terminates.
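If the reloader is indeed the reason the container survives a startup failure, one fix is to run uvicorn directly (without --reload) in the container, so a failed lifespan kills PID 1 and Kubernetes restarts the pod. A minimal sketch of the container spec; the module path, container name, and image value are assumptions, not taken from the actual chart:

```yaml
# Hypothetical container spec for the stac-proxy deployment.
# Without --reload, uvicorn itself is PID 1; a startup (lifespan)
# failure exits the process and the pod is restarted per restartPolicy.
containers:
  - name: stac-proxy                     # assumed container name
    image: "{{ .Values.stacProxy.image }}"  # assumed values key
    command:
      - uvicorn
      - stac_auth_proxy.app:app          # assumed ASGI entry point
      - --host
      - "0.0.0.0"
      - --port
      - "8000"
```

With the default restartPolicy: Always, this alone gives crash-loop-with-backoff behavior on the 502 scenario described above, even before probes are added.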

Scenario

stac-auth-proxy runs in Azure AKS. When a node scales up or down, both the stac-auth-proxy pod and its dependency, the go-api pod, are recreated simultaneously if they were running on the node that was scaled down.

Because the dependent service goadmin-stage.ifrc.org is temporarily unavailable during that window, the startup health check fails and stac-auth-proxy logs an error but does not exit.

The pod then stays in this failed state and does not restart automatically. Recovery only happens if:

  • the node scales again, or
  • someone manually deletes the pod.

Possible Solution

  • Exit the container when a startup error occurs so Kubernetes can restart the pod automatically.
  • Add proper liveness and readiness probes to ensure Kubernetes can detect unhealthy pods and restart them when needed.
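For the second bullet, a sketch of what the probes could look like in the Helm deployment template. This is illustrative only: the health endpoint path is an assumption (stac-auth-proxy may expose a different one, or none), and the timings would need tuning for the AKS scale-up window:

```yaml
# Sketch: probes for the stac-proxy container (port 8000 from the log above).
# /healthz is a hypothetical endpoint, not confirmed from the app.
livenessProbe:
  httpGet:
    path: /healthz          # assumed health endpoint
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 15
  failureThreshold: 3       # pod restarted after ~45s of failures
readinessProbe:
  httpGet:
    path: /healthz          # assumed health endpoint
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 10
startupProbe:
  httpGet:
    path: /healthz          # assumed health endpoint
    port: 8000
  failureThreshold: 30      # tolerate up to ~5 min of upstream unavailability
  periodSeconds: 10
```

A startupProbe is worth considering here in addition to liveness/readiness: it gives the upstream (goadmin-stage.ifrc.org) time to come back during node scaling without the liveness probe killing the pod prematurely, while still restarting the pod if startup never succeeds.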

@batpad @alukach @pantierra

Metadata


Assignees: no one assigned

Labels

bug (Something isn't working)
