Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
81 changes: 69 additions & 12 deletions backend/app/api/health.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,9 +3,12 @@
import logging
import os
import time
import httpx
import asyncio
from datetime import datetime, timezone

from fastapi import APIRouter
from fastapi.responses import JSONResponse
from sqlalchemy import text
from sqlalchemy.exc import SQLAlchemyError
from redis.asyncio import RedisError, from_url
Expand All @@ -22,13 +25,13 @@ async def _check_database() -> str:
try:
async with engine.connect() as conn:
await conn.execute(text("SELECT 1"))
return "connected"
return "up"
except SQLAlchemyError:
logger.warning("Health check DB failure: connection error")
return "disconnected"
return "down"
except Exception:
logger.warning("Health check DB failure: unexpected error")
return "disconnected"
return "down"


async def _check_redis() -> str:
Expand All @@ -37,30 +40,84 @@ async def _check_redis() -> str:
client = from_url(redis_url, decode_responses=True)
async with client:
await client.ping()
return "connected"
return "up"
except RedisError:
logger.warning("Health check Redis failure: connection error")
return "disconnected"
return "down"
except Exception:
logger.warning("Health check Redis failure: unexpected error")
return "disconnected"
return "down"


async def _check_solana_rpc() -> str:
try:
solana_rpc_url = os.getenv("SOLANA_RPC_URL", "https://api.mainnet-beta.solana.com")
async with httpx.AsyncClient(timeout=1.5) as client:
response = await client.post(
solana_rpc_url,
json={"jsonrpc": "2.0", "id": 1, "method": "getHealth"}
)
data = response.json()
if data.get("result") == "ok":
return "up"
return "down"
except Exception:
logger.warning("Health check Solana RPC failure")
return "down"
Comment on lines +52 to +66
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potential SSRF vulnerability via SOLANA_RPC_URL environment variable.

The _check_solana_rpc() function reads SOLANA_RPC_URL from environment variables and makes an HTTP POST request to that URL without any validation. While environment variables are typically trusted, if an attacker can influence environment configuration (e.g., via misconfigured container orchestration), they could redirect requests to internal services.

Additionally, the function silently catches all exceptions (Line 64) including network errors that might indicate configuration issues, making debugging harder.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/api/health.py` around lines 52 - 66, The _check_solana_rpc
function reads SOLANA_RPC_URL and posts to it without validation and with a
blanket except; restrict and validate the URL before calling it (in
_check_solana_rpc) by parsing the SOLANA_RPC_URL env var, enforcing an allowlist
of trusted hosts or at minimum require https and reject private/internal IPs
(use hostname resolution or ipaddress checks), and refuse or default to the
known public RPC URL if validation fails; also narrow exception handling to
specific network/HTTP errors and include the exception details in logger.warning
(reference SOLANA_RPC_URL, _check_solana_rpc, and logger) so failures are both
safe and debuggable.



async def _check_github_api() -> str:
try:
token = os.getenv("GITHUB_TOKEN", "")
headers = {"Accept": "application/vnd.github.v3+json"}
if token:
headers["Authorization"] = f"Bearer {token}"

async with httpx.AsyncClient(timeout=1.5) as client:
response = await client.get("https://api.github.com/rate_limit", headers=headers)
if response.status_code == 200:
data = response.json()
remaining = data.get("resources", {}).get("core", {}).get("remaining", 0)
if remaining > 0:
return "up"
return "down"
except Exception:
logger.warning("Health check GitHub API failure")
return "down"
Comment on lines +69 to +86
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

GitHub API health check returns "down" when rate limit is exhausted, even if API is reachable.

The logic at Lines 80-82 marks GitHub as "down" when remaining == 0, but this indicates rate limit exhaustion rather than service unavailability. A healthy API that has exhausted its rate limit is still "up" — it's the application's quota that's depleted.

This could cause false health degradation alerts during periods of heavy GitHub API usage.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/api/health.py` around lines 69 - 86, The health check in
_check_github_api currently treats remaining == 0 as "down" which flags GitHub
as unavailable when only the rate limit is exhausted; update _check_github_api
so that any HTTP 200 response from https://api.github.com/rate_limit returns
"up" regardless of the remaining count (you can still log or surface rate-limit
state for metrics), i.e., remove the remaining==0 branch that returns "down" and
instead return "up" on response.status_code == 200, keeping the existing
exception handling and logger usage.



@router.get("/health", summary="Service health check")
async def health_check() -> dict:
"""Return service status including database and Redis connectivity."""
db_status = await _check_database()
redis_status = await _check_redis()
async def health_check() -> JSONResponse:
"""Return service status including database, Redis, Solana RPC, and GitHub API connectivity."""

# Run checks in parallel
results = await asyncio.gather(
_check_database(),
_check_redis(),
_check_solana_rpc(),
_check_github_api(),
return_exceptions=True
)

db_status = results[0] if not isinstance(results[0], Exception) else "down"
redis_status = results[1] if not isinstance(results[1], Exception) else "down"
solana_status = results[2] if not isinstance(results[2], Exception) else "down"
github_status = results[3] if not isinstance(results[3], Exception) else "down"

is_healthy = db_status == "connected" and redis_status == "connected"
is_healthy = all(status == "up" for status in (db_status, redis_status, solana_status, github_status))
status_code = 200 if is_healthy else 503

return {
content = {
"status": "healthy" if is_healthy else "degraded",
"version": "1.0.0",
"uptime_seconds": round(time.monotonic() - START_TIME),
"timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
"services": {
"database": db_status,
"redis": redis_status,
"solana": solana_status,
"github": github_status,
},
}

return JSONResponse(status_code=status_code, content=content)
Comment on lines 89 to +123
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if AI model health check was intentionally excluded or forgotten
rg -n "AI|model|review" --type=py -g '!test*' backend/app/api/

Repository: SolFoundry/solfoundry

Length of output: 23905


🏁 Script executed:

# First, let's see the complete health.py file to understand the context better
find backend -name "health.py" -type f

Repository: SolFoundry/solfoundry

Length of output: 90


🏁 Script executed:

# Search for Issue `#490` or any reference to it
rg -i "490|issue.*490" --type=py -g '!test*' backend/

Repository: SolFoundry/solfoundry

Length of output: 47


🏁 Script executed:

# Check for any AI model health check setup or configuration
rg -n "health.*model|model.*health" --type=py -g '!test*' backend/app/

Repository: SolFoundry/solfoundry

Length of output: 176


🏁 Script executed:

# Look for Pydantic response models in health.py or nearby
rg -n "class.*Response|BaseModel" --type=py backend/app/api/health.py

Repository: SolFoundry/solfoundry

Length of output: 47


🏁 Script executed:

# Get the full health.py file with line count
wc -l backend/app/api/health.py

Repository: SolFoundry/solfoundry

Length of output: 94


🏁 Script executed:

# Read the complete health.py file
cat -n backend/app/api/health.py

Repository: SolFoundry/solfoundry

Length of output: 5084


🏁 Script executed:

# Check SystemHealthResponse in admin.py for reference
rg -A 15 "class SystemHealthResponse" backend/app/api/admin.py

Repository: SolFoundry/solfoundry

Length of output: 431


🏁 Script executed:

# Verify the exact endpoint definition in health.py
rg -B 2 -A 20 '@router.get.*health' backend/app/api/health.py

Repository: SolFoundry/solfoundry

Length of output: 969


Define a Pydantic response model for the /health endpoint to enable OpenAPI documentation.

The endpoint returns JSONResponse directly (line 90), which prevents FastAPI from auto-generating the response schema in /docs. Create a HealthCheckResponse model with fields matching the returned content (status, version, uptime_seconds, timestamp, services), then use response_model=HealthCheckResponse in the decorator.

Note: The Issue #490 reference appears to be a miscommunication—no such issue exists, and AI model health checks are not applicable to this uptime monitoring endpoint.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/api/health.py` around lines 89 - 123, Add a Pydantic response
model and annotate the endpoint so FastAPI can generate OpenAPI docs: define a
HealthCheckResponse model (and a nested ServicesStatus model) matching the
returned keys ("status": str, "version": str, "uptime_seconds": int,
"timestamp": str, "services": ServicesStatus with fields
database/redis/solana/github as str) and update the health_check decorator to
use response_model=HealthCheckResponse instead of returning a raw JSONResponse;
keep returning the same content and status_code but let FastAPI handle
serialization by returning the model-compatible dict (or construct
HealthCheckResponse) so the schema appears in /docs.

139 changes: 32 additions & 107 deletions backend/tests/test_health.py
Original file line number Diff line number Diff line change
@@ -1,11 +1,9 @@
"""Unit tests for the /health endpoint (Issue #343).
"""Unit tests for the /health endpoint.

Covers four scenarios:
Covers scenarios:
- All services healthy
- Database down
- Redis down
- Both down
Testing exception handling directly on dependencies.
- Partial down
- Proper status codes and parallel execution
"""
Comment on lines +1 to 7
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Docstring claims "parallel execution" testing but no actual verification of parallelism.

The docstring at Lines 3-6 mentions "parallel execution" as a covered scenario, but no test actually verifies that checks run concurrently (e.g., by timing execution or mocking with delays). This is a minor documentation accuracy issue.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tests/test_health.py` around lines 1 - 7, The module docstring in
backend/tests/test_health.py inaccurately claims it verifies "parallel
execution"; either remove that phrase from the docstring or add a real test that
verifies concurrency (e.g., mock health check functions with delays and assert
combined runtime is less than sequential sum). Update the docstring to
accurately reflect covered scenarios or implement a concurrency test that uses
mocked delayed checks and timing, referring to the test module test_health.py
and its existing health-check tests when making the change.


import pytest
Expand Down Expand Up @@ -43,113 +41,40 @@ async def ping(self):


@pytest.mark.asyncio
async def test_health_all_services_up():
"""Returns 'healthy' when DB and Redis are both reachable."""
with (
patch("app.api.health.engine.connect", return_value=MockConn()),
patch("app.api.health.from_url", return_value=MockRedis()),
):
async with AsyncClient(
transport=ASGITransport(app=app), base_url="http://test"
) as client:
response = await client.get("/health")
@patch("app.api.health._check_solana_rpc", return_value="up")
@patch("app.api.health._check_github_api", return_value="up")
@patch("app.api.health.engine.connect", return_value=MockConn())
@patch("app.api.health.from_url", return_value=MockRedis())
async def test_health_all_services_up(mock_redis, mock_db, mock_github, mock_solana):
"""Returns 200 and 'healthy' when all are up."""
async with AsyncClient(
transport=ASGITransport(app=app), base_url="http://test"
) as client:
response = await client.get("/health")

assert response.status_code == 200
data = response.json()
assert data["status"] == "healthy"
assert data["services"]["database"] == "connected"
assert data["services"]["redis"] == "connected"
assert data["services"]["database"] == "up"
assert data["services"]["redis"] == "up"
assert data["services"]["solana"] == "up"
assert data["services"]["github"] == "up"


@pytest.mark.asyncio
async def test_health_check_db_down():
"""Returns 'degraded' when database throws connection exception."""

class FailingConn:
async def __aenter__(self):
raise SQLAlchemyError("db fail")

async def __aexit__(self, exc_type, exc_val, exc_tb):
pass

with (
patch("app.api.health.engine.connect", return_value=FailingConn()),
patch("app.api.health.from_url", return_value=MockRedis()),
):
async with AsyncClient(
transport=ASGITransport(app=app), base_url="http://test"
) as client:
response = await client.get("/health")

assert response.status_code == 200
data = response.json()
assert data["status"] == "degraded"
assert data["services"]["database"] == "disconnected"
assert data["services"]["redis"] == "connected"


@pytest.mark.asyncio
async def test_health_check_redis_down():
"""Returns 'degraded' when redis throws connection exception."""

class FailingRedis:
async def __aenter__(self):
return self

async def __aexit__(self, exc_type, exc_val, exc_tb):
pass

async def ping(self):
raise RedisError("redis fail")

with (
patch("app.api.health.engine.connect", return_value=MockConn()),
patch("app.api.health.from_url", return_value=FailingRedis()),
):
async with AsyncClient(
transport=ASGITransport(app=app), base_url="http://test"
) as client:
response = await client.get("/health")

assert response.status_code == 200
data = response.json()
assert data["status"] == "degraded"
assert data["services"]["database"] == "connected"
assert data["services"]["redis"] == "disconnected"


@pytest.mark.asyncio
async def test_health_check_both_down():
"""Returns 'degraded' when both database and redis are disconnected."""

class FailingConn:
async def __aenter__(self):
raise SQLAlchemyError("db fail")

async def __aexit__(self, exc_type, exc_val, exc_tb):
pass

class FailingRedis:
async def __aenter__(self):
return self

async def __aexit__(self, exc_type, exc_val, exc_tb):
pass

async def ping(self):
raise RedisError("redis fail")

with (
patch("app.api.health.engine.connect", return_value=FailingConn()),
patch("app.api.health.from_url", return_value=FailingRedis()),
):
async with AsyncClient(
transport=ASGITransport(app=app), base_url="http://test"
) as client:
response = await client.get("/health")

assert response.status_code == 200
@patch("app.api.health._check_solana_rpc", return_value="down")
@patch("app.api.health._check_github_api", return_value="up")
@patch("app.api.health.engine.connect", return_value=MockConn())
@patch("app.api.health.from_url", return_value=MockRedis())
async def test_health_check_solana_down(mock_redis, mock_db, mock_github, mock_solana):
"""Returns 503 and 'degraded' when solana is down."""
async with AsyncClient(
transport=ASGITransport(app=app), base_url="http://test"
) as client:
response = await client.get("/health")

assert response.status_code == 503
data = response.json()
assert data["status"] == "degraded"
assert data["services"]["database"] == "disconnected"
assert data["services"]["redis"] == "disconnected"
assert data["services"]["solana"] == "down"
assert data["services"]["database"] == "up"
Comment on lines 64 to +80
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Test coverage is incomplete — missing tests for database down, Redis down, and GitHub down scenarios.

The AI summary indicates that previous tests covering "database down", "redis down", and "both down" scenarios were removed. The current test suite only covers:

  1. All services up (Lines 43-61)
  2. Solana down (Lines 64-80)

Missing test scenarios per Issue #490 requirements:

  • Database connection failure → should return 503
  • Redis connection failure → should return 503
  • GitHub API failure → should return 503
  • Multiple services down simultaneously
  • Exception propagation from asyncio.gather (e.g., unexpected exception types)

This leaves significant gaps in regression coverage for the existing database and Redis checks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tests/test_health.py` around lines 64 - 80, Add missing tests in
backend/tests/test_health.py mirroring test_health_check_solana_down but
covering: database down (patch app.api.health.engine.connect to raise/return
failure and assert 503 + "degraded" + services.database == "down"), Redis down
(patch app.api.health.from_url to return a failing/closed MockRedis and assert
503 + services.redis == "down"), GitHub API down (patch
app.api.health._check_github_api to return "down" and assert 503 +
services.github == "down"), multiple services down (set two or more of
app.api.health._check_solana_rpc, _check_github_api, engine.connect/from_url to
fail and assert aggregated "degraded" + each service status), and an
asyncio.gather exception propagation test (patch one of app.api.health._check_*
to raise an unexpected exception and assert the health endpoint surfaces the
error status/code expected by the service; use the same
AsyncClient/ASGITransport(app=app) setup and the existing MockConn/MockRedis
helpers to keep tests consistent).