
refactor(experimental): reuse HTTP clients, add response models, and parallelize ops #1253

Merged
garrett4wade merged 1 commit into inclusionAI:main from guozhihao-224:refactor/inference-service-http-perf
Apr 27, 2026

Conversation

@guozhihao-224 (Collaborator) commented Apr 24, 2026

Description

Refactors the inference service HTTP layer to reuse long-lived httpx clients instead of creating a new client per request. Also adds Pydantic response models across all services for type safety, and parallelizes previously sequential operations (health checks, proxy registrations, broadcasts) for better throughput.

Related Issue

Fixes #1217

Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • 💥 Breaking change
  • 📝 Documentation update
  • ♻️ Refactoring
  • ⚡ Performance improvement
  • ✅ Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • Pre-commit hooks pass (pre-commit run --all-files)
  • Relevant tests pass; new tests added for new functionality
  • Documentation updated (if applicable; built with ./docs/build_all.sh)
  • Branch is up to date with main
  • Self-reviewed via /review-pr command
  • This PR was created by a coding agent via /create-pr
  • This PR is a breaking change

Breaking Change Details (if applicable):

N/A

Additional Context

Key changes:

  • Controller: shared httpx.Client/AsyncClient, idempotent destroy(), parallel proxy registration via ThreadPoolExecutor, parallel set_version/pause/continue via asyncio.gather
  • Gateway: shared AsyncClient via lifespan, _use_client() context manager in streaming module, parallel data proxy registration
  • Router: shared AsyncClient, parallel health checks via asyncio.gather, proper lifespan cleanup with try/finally
  • Data proxy: Pydantic response models, shared client for non-streaming requests, parallel callback delivery, proper InfBridge cleanup on shutdown and backend reconfiguration
  • InfBridge: shared AsyncClient with aclose() lifecycle method
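The broadcast parallelization above (set_version/pause/continue via asyncio.gather) can be sketched as follows. This is a minimal stdlib-only sketch: the worker addresses are made up, and a sleeping coroutine stands in for the real httpx call.

```python
import asyncio

# Stand-in for the real per-worker HTTP call (the actual code would POST
# to each worker through a shared httpx.AsyncClient); the sleep simulates
# one network round trip.
async def set_version_on_worker(worker_addr: str, version: int) -> str:
    await asyncio.sleep(0.05)
    return f"{worker_addr}: v{version}"

async def broadcast_set_version(workers: list[str], version: int) -> list[str]:
    # asyncio.gather runs all per-worker calls concurrently, so total
    # latency is roughly one round trip instead of len(workers) round trips.
    return await asyncio.gather(
        *(set_version_on_worker(w, version) for w in workers)
    )

results = asyncio.run(broadcast_set_version(["w0", "w1", "w2"], 7))
print(results)  # → ['w0: v7', 'w1: v7', 'w2: v7']
```

asyncio.gather preserves input order in its result list, so responses can still be matched back to workers.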

Files changed:

  • areal/experimental/inference_service/controller/controller.py
  • areal/experimental/inference_service/data_proxy/app.py
  • areal/experimental/inference_service/data_proxy/pause.py
  • areal/experimental/inference_service/gateway/app.py
  • areal/experimental/inference_service/gateway/streaming.py
  • areal/experimental/inference_service/inf_bridge.py
  • areal/experimental/inference_service/router/app.py
  • tests/experimental/inference_service/test_data_proxy_chat.py

…parallelize ops in inference service

Replace per-request httpx/requests client creation with shared long-lived
clients across the inference service stack (controller, gateway, router,
data proxy, InfBridge). This eliminates repeated TCP connection setup and
TLS handshake overhead on every API call.

Key changes:
- Controller: shared httpx.Client/AsyncClient, idempotent destroy()
- Gateway: shared AsyncClient via lifespan, _use_client() helper in streaming
- Router: shared AsyncClient, parallel health checks via asyncio.gather
- Data proxy: Pydantic response models, shared client, parallel callbacks
- InfBridge: shared AsyncClient with proper aclose() lifecycle
- Parallelize: proxy registration, set_version, pause/continue broadcasts
- Add Pydantic BaseModel response types across all services for type safety

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@guozhihao-224 force-pushed the refactor/inference-service-http-perf branch from 4215028 to e5e9e94 on April 24, 2026, 10:46
@gemini-code-assist (Bot) left a comment


Code Review

This pull request refactors the inference service components to use shared httpx clients instead of creating new ones per request or using the requests library. It introduces Pydantic response models for better API documentation and validation across the controller, data proxy, gateway, and router. Additionally, several operations have been parallelized using asyncio.gather or ThreadPoolExecutor to improve performance, including worker health checks, callback deliveries, and version updates. Feedback includes a recommendation to avoid catching BaseException to prevent interfering with task cancellation, and suggestions to reuse shared httpx clients in contexts where they were incorrectly assumed to be thread-unsafe or incompatible with streaming.

config.request_timeout,
client=http_client,
)
except BaseException as exc:

Severity: high

Catching BaseException is generally discouraged as it includes system-level exceptions like KeyboardInterrupt and SystemExit. More importantly, in an async context, it will catch asyncio.CancelledError, which can prevent tasks from being cancelled properly and lead to unexpected behavior or hangs during shutdown. Use Exception instead.

Suggested change
except BaseException as exc:
except Exception as exc:
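The reviewer's point can be checked directly: since Python 3.8, asyncio.CancelledError derives from BaseException rather than Exception, precisely so that broad `except Exception` handlers do not swallow task cancellation. A minimal stdlib-only sketch:

```python
import asyncio

# CancelledError sits outside the Exception hierarchy (Python 3.8+), so
# `except Exception` lets cancellation propagate while `except BaseException`
# would swallow it.
print(issubclass(asyncio.CancelledError, BaseException))  # True
print(issubclass(asyncio.CancelledError, Exception))      # False

async def cancellable() -> str:
    try:
        await asyncio.sleep(10)
    except Exception:
        # Never reached on cancellation: CancelledError is not an Exception.
        return "swallowed"
    return "done"

async def main() -> bool:
    task = asyncio.create_task(cancellable())
    await asyncio.sleep(0)  # let the task start and suspend in sleep()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        return True          # cancellation propagated as intended
    return False

cancelled = asyncio.run(main())
print(cancelled)  # → True
```

Had `cancellable` used `except BaseException`, the task would have returned "swallowed" instead of being cancelled, which is exactly the shutdown-hang hazard the review describes.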

Comment on lines +687 to +695
# Each thread gets its own httpx.Client because httpx.Client
# is not thread-safe and must not be shared across threads.
with httpx.Client() as client:
resp = client.post(
f"{router_addr}/register",
json={"worker_addr": data_proxy_addr},
headers={"Authorization": f"Bearer {admin_key}"},
timeout=5,
)
The reason will be displayed to describe this comment to others. Learn more.

Severity: medium

The comment stating that httpx.Client is not thread-safe is incorrect. According to the httpx documentation, both Client and AsyncClient are thread-safe and intended to be shared across threads and tasks. Creating a new client per thread is unnecessary and inefficient. You should use the existing self._sync_client instance instead.

            # httpx.Client is thread-safe and can be shared across threads.
            resp = self._sync_client.post(
                f"{router_addr}/register",
                json={"worker_addr": data_proxy_addr},
                headers={"Authorization": f"Bearer {admin_key}"},
                timeout=5,
            )
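The registration fan-out the reviewer describes can be sketched with a stand-in client object; the real code would share a single httpx.Client, which the httpx docs describe as safe to share across threads. FakeClient and the router addresses here are illustrative, not from the PR.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a shared, thread-safe HTTP client such as httpx.Client.
# It just records which URLs were "posted" to, under a lock.
class FakeClient:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.posted: list[str] = []

    def post(self, url: str, json: dict, timeout: int) -> int:
        with self._lock:
            self.posted.append(url)
        return 200  # pretend HTTP status code

shared_client = FakeClient()
routers = [f"http://router-{i}" for i in range(4)]  # hypothetical addresses

def register(router_addr: str) -> int:
    # One shared client for every worker thread: no per-thread client
    # construction or connection setup.
    return shared_client.post(
        f"{router_addr}/register",
        json={"worker_addr": "http://data-proxy:8000"},
        timeout=5,
    )

with ThreadPoolExecutor(max_workers=4) as pool:
    statuses = list(pool.map(register, routers))

print(statuses)  # → [200, 200, 200, 200]
```

With a real httpx.Client the benefit is connection pooling: all threads draw from one keep-alive pool instead of each opening and tearing down its own TCP connection.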

try:
    async with httpx.AsyncClient(timeout=httpx.Timeout(timeout)) as client:
        async with client.stream(

async with httpx.AsyncClient(timeout=httpx.Timeout(timeout)) as c:

Severity: medium

The note above (lines 452-454) claiming that streaming requires a per-request client is incorrect. httpx.AsyncClient supports multiple concurrent streams using the same client instance. Reusing the shared client via the client parameter (using the _use_client helper already defined in this file) would be more efficient and consistent with the PR's objectives.

Suggested change
async with httpx.AsyncClient(timeout=httpx.Timeout(timeout)) as c:
async with _use_client(client, timeout) as c:
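A plausible shape for such a helper, sketched with a dummy client so the example stays dependency-free. The actual _use_client in streaming.py is not shown in this PR excerpt and may differ (for instance, it likely also handles the timeout); only the reuse-or-create pattern is illustrated here.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Optional

# Minimal stand-in for httpx.AsyncClient; only the async-close part
# matters for this sketch.
class DummyAsyncClient:
    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True

@asynccontextmanager
async def _use_client(
    client: Optional[DummyAsyncClient],
) -> AsyncIterator[DummyAsyncClient]:
    # Reuse the shared client when one is provided (the caller owns its
    # lifecycle); otherwise fall back to a temporary client that is
    # closed on exit.
    if client is not None:
        yield client
    else:
        temp = DummyAsyncClient()
        try:
            yield temp
        finally:
            await temp.aclose()

async def demo() -> tuple[bool, bool]:
    shared = DummyAsyncClient()
    async with _use_client(shared) as c:
        assert c is shared
    async with _use_client(None) as c:
        temp = c
    # Shared client stays open; the temporary one is closed.
    return shared.closed, temp.closed

shared_closed, temp_closed = asyncio.run(demo())
print(shared_closed, temp_closed)  # → False True
```

This shape gives call sites a single code path whether or not a long-lived client is available, which is what makes the reviewer's suggested one-line change possible.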

@garrett4wade (Collaborator) left a comment


LGTM

@garrett4wade garrett4wade merged commit 8cc52ba into inclusionAI:main Apr 27, 2026
6 checks passed
Successfully merging this pull request may close these issues:

refactor(inference_service): HTTP client reuse, parallelization, and response models