Description
I've been up for two days trying to fix this. Very little sleep, very frustrated, even used AI to try to fix this... nothing works.
I have two machines, and I'm trying to run distributed inference with Ollama in WSL.
olol server --host 0.0.0.0 --port 50051 --ollama-host http://localhost:11434
(works on both machines: "Ollama gRPC server started on port 50051")
olol rpc-server --host 0.0.0.0 --port 50052 --device auto
(fails on both machines: "WARNING - Failed to get Ollama status: HTTP 404")
curl against /api/tags returns the models, and ollama ls lists the models, but olol rpc-server does not see them. So the problem must be with olol, since I can use Ollama just fine and pull from other places as well.
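To make the symptom reproducible, here is a minimal probe (a sketch; the helper is illustrative and not part of olol) that reports the HTTP status for a few Ollama paths. /api/tags and /api/version are real Ollama endpoints; whatever status path olol is hitting is the one returning 404:

```python
# Sketch: report the HTTP status for Ollama API paths. The probe()
# helper is illustrative, not part of olol or Ollama.
import urllib.error
import urllib.request

def probe(base_url, path, opener=urllib.request.urlopen):
    """Return the HTTP status code for base_url + path, or None if unreachable."""
    try:
        with opener(base_url + path, timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code   # e.g. 404 for a path Ollama does not serve
    except OSError:
        return None       # Ollama not running / host unreachable

if __name__ == "__main__":
    base = "http://localhost:11434"
    for path in ("/api/tags", "/api/version"):
        print(path, "->", probe(base, path))
```

If both documented paths print 200 while olol still logs the 404 warning, that confirms the failing request is to a path Ollama doesn't serve.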
olol proxy --host 0.0.0.0 --port 8000 --servers "192.168.x.x:50051,192.168.y.y:50051" --distributed --rpc-servers "192.168.x.x:50052,192.168.y.y:50052"
Servers: 2/2 healthy
Distributed Inference: ENABLED
Active Requests: 0
Total Requests: 0
Uptime: 00:00:02
Generate: 0  Chat: 0  Embeddings: 0
BUT it says 0 models on both!
What's the deal?
Tried again:
olol rpc-server --host 0.0.0.0 --port 50052 --device cuda --quantize q5_0 --flash-attention --context-window 16384
2025-08-14 23:17:02,870 - INFO - Ollama already running at http://localhost:11434
2025-08-14 23:17:02,870 - INFO - Connected to Ollama at http://localhost:11434
2025-08-14 23:17:03,904 - WARNING - Failed to get Ollama status: HTTP 404
2025-08-14 23:17:05,119 - INFO - Server initialized with device: cuda:0
2025-08-14 23:17:05,119 - INFO - Device capabilities: {'backend_type': 'cuda', 'device_id': 0, 'memory': 25756696576, 'compute_capability': '8.9', 'name': 'NVIDIA GeForce RTX 4090'}
2025-08-14 23:17:05,119 - INFO - Starting Ollama health check thread (interval: 30s)
2025-08-14 23:17:05,125 - INFO - RPC server started on 0.0.0.0:50052 with device cuda:0
2025-08-14 23:17:07,199 - INFO - Discovery service started for server with ID 93bed723-b704-44a6-ab49-8b6f4f46cd75 (IPv6: Supported)
2025-08-14 23:17:07,199 - INFO - Auto-discovery service started
---- UPDATE FROM AI RESEARCH ----
To fix the issue where the olol RPC server is making a status check call that results in a 404 from Ollama, you will need to modify or disable the health/status check request that olol sends to the Ollama HTTP server.
Currently, olol is attempting to fetch a status endpoint that Ollama does not provide, which leads to the HTTP 404. Since Ollama's API doesn't officially have a status endpoint (beyond version checks), the 404 is expected unless olol is updated to use a valid endpoint.
Options to fix this:

1. Modify olol to skip or change the status check:
   - Edit the code in olol that performs the health or status check on the Ollama HTTP server.
   - Change or remove the call to the HTTP path causing the 404.
   - Use /api/version or another valid endpoint for health checks instead.

2. Patch the olol source, if it is open source:
   - Find the section in olol's RPC server code that queries Ollama's HTTP status endpoint during startup or health checks.
   - Replace it with a call to /api/version, or a no-op health check that doesn't cause a 404.

3. Request a feature or fix from the olol maintainers:
   - If this behavior can't be configured, consider opening an issue or feature request on olol's repository to support a proper health check or skip invalid calls.

4. Suppress the warning if a code change is not feasible:
   - If you cannot modify olol's code immediately, you can ignore the 404 warning, because it doesn't block RPC server startup or operation.
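The suggested workaround of checking /api/version (a real Ollama endpoint) instead of a nonexistent status path could look like the sketch below. The function name and signature are illustrative, not olol's actual code:

```python
# Sketch of a health check against /api/version, which Ollama does serve.
# ollama_healthy() is an illustrative name, not olol's real API.
import json
import urllib.request

def ollama_healthy(base_url="http://localhost:11434",
                   opener=urllib.request.urlopen):
    """Return (healthy, version) based on GET {base_url}/api/version."""
    try:
        with opener(base_url + "/api/version", timeout=2) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
        return True, payload.get("version")
    except (OSError, ValueError):
        # OSError covers connection errors and HTTP errors (HTTPError
        # subclasses it); ValueError covers malformed JSON.
        return False, None
```

A patched olol health-check thread could call this instead of the failing status request and log the returned version on success.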
In summary:
- The root cause is olol calling an unsupported Ollama API endpoint.
- The fix involves modifying olol's code or config to avoid that invalid call.
- /api/version works as a simple alternative endpoint for health checks.
If needed, I can help identify the piece of code or config in olol that causes this and assist with changing it.