Conversation

@Zenulous Zenulous commented Apr 3, 2025

Summary

This pull request addresses a rare edge case that causes a thread deadlock on _optional_thread_lock in ConnectionPool. The fix changes ThreadLock to ThreadRLock, allowing reentrant locking and resolving the deadlock.

We have encountered a rare deadlock in our production environment while using HTTP Core in a multithreaded setup. In rare cases the issue manifests as threads waiting indefinitely to acquire a lock, causing the entire worker to hang. The deadlock occurs when the same thread attempts to acquire the lock a second time without having released it.

Fixing this is important for the library's goal of being thread-safe.
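
To make the difference concrete, here is a minimal, standalone sketch using plain threading primitives (not httpcore's ThreadLock/ThreadRLock wrappers): a plain Lock blocks when the same thread tries to acquire it a second time, while an RLock lets the owning thread re-enter. The timeout is only there so the script prints instead of hanging.

  import threading

  plain = threading.Lock()
  reentrant = threading.RLock()

  def reacquire_rlock() -> None:
      with reentrant:
          with reentrant:  # same thread re-enters; an RLock permits this
              print("re-entered RLock without blocking")

  def reacquire_plain_lock() -> None:
      with plain:
          # Without the timeout this acquire would block forever: the thread
          # is waiting on a lock that it is itself holding.
          print("re-acquired plain Lock:", plain.acquire(timeout=1))

  reacquire_rlock()       # prints: re-entered RLock without blocking
  reacquire_plain_lock()  # prints: re-acquired plain Lock: False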

For more details, please refer to the discussion here.
Other users of the library have also hit this issue, though it is described less clearly, for example in #997.

Checklist

  • I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.

@Zenulous Zenulous (Author) commented Jul 4, 2025

@tomchristie Is the team aware of this PR? I'd love to get an opinion on it, because we still see this issue pop up (#990).

@xiexianbin

I tested it, and this pull request doesn't seem to work for #1029?

@Zenulous Zenulous (Author) commented Sep 1, 2025

> I tested it, and this pull request doesn't seem to work for #1029?

Could be; the issue we experience is not really related to large files, so your problem likely has a different root cause.

@lovelydinosaur (Contributor)

> Unfortunately, after quite some time trying, we ourselves do not know the exact condition causing this.

I wouldn't make this change without a clear understanding of why a re-entrant lock would be required here.

Incidentally: the httpx 1.0 prerelease has a simpler stack here.
I'd be more inclined to put my time into pushing that forward: https://www.encode.io/httpnext/

@basilisk487

> I wouldn't make this change without a clear understanding of why a re-entrant lock would be required here.

We are seeing this issue sporadically as well. Specifically, it seems to occur when an exception is thrown in the middle of HTTP response streaming, followed by an immediate retry. We haven't been able to reproduce it reliably from these factors alone, unfortunately, but I managed to grab a few thread dumps, and they clearly show an attempt to acquire the lock from within _assign_requests_to_connections, which already holds that lock.

  File "/opt/venv/lib/python3.12/site-packages/openai/resources/chat/completions/completions.py", line 182, in parse
    return self._post(
  File "/opt/venv/lib/python3.12/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/opt/venv/lib/python3.12/site-packages/openai/_base_client.py", line 982, in request
    response = self._client.send(
  File "/opt/venv/lib/python3.12/site-packages/httpx/_client.py", line 914, in send
    response = self._send_handling_auth(
  File "/opt/venv/lib/python3.12/site-packages/httpx/_client.py", line 942, in _send_handling_auth
    response = self._send_handling_redirects(
  File "/opt/venv/lib/python3.12/site-packages/httpx/_client.py", line 979, in _send_handling_redirects
    response = self._send_single_request(request)
  File "/opt/venv/lib/python3.12/site-packages/httpx/_client.py", line 1014, in _send_single_request
    response = transport.handle_request(request)
  File "/opt/venv/lib/python3.12/site-packages/httpx/_transports/default.py", line 250, in handle_request
    resp = self._pool.handle_request(req)
  File "/opt/venv/lib/python3.12/site-packages/httpcore/_sync/connection_pool.py", line 228, in handle_request
    closing = self._assign_requests_to_connections()
  File "/opt/venv/lib/python3.12/site-packages/httpcore/_sync/connection_pool.py", line 294, in _assign_requests_to_connections
    and len([connection.is_idle() for connection in self._connections])
  File "/opt/venv/lib/python3.12/site-packages/httpcore/_sync/connection.py", line 192, in is_idle
    def is_idle(self) -> bool:
  File "/opt/venv/lib/python3.12/site-packages/httpx/_models.py", line 900, in iter_bytes
    yield chunk
  File "/opt/venv/lib/python3.12/site-packages/httpx/_models.py", line 954, in iter_raw
    yield chunk
  File "/opt/venv/lib/python3.12/site-packages/httpx/_client.py", line 154, in __iter__
    yield chunk
  File "/opt/venv/lib/python3.12/site-packages/httpx/_transports/default.py", line 128, in __iter__
    yield part
  File "/opt/venv/lib/python3.12/site-packages/httpcore/_sync/connection_pool.py", line 406, in __iter__
    self.close()
  File "/opt/venv/lib/python3.12/site-packages/httpcore/_sync/connection_pool.py", line 416, in close
    with self._pool._optional_thread_lock:
  File "/opt/venv/lib/python3.12/site-packages/httpcore/_synchronization.py", line 268, in __enter__
    self._lock.acquire()
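
The shape of that dump reduces to the following sketch (names are illustrative, not httpcore's actual API): a streaming-response generator whose close() takes the pool lock gets finalized while the pool already holds that lock on the same thread. With a plain Lock the close blocks forever; swap in an RLock and it re-enters and completes. The timeout exists only so the demo terminates.

  import threading

  pool_lock = threading.Lock()  # stands in for _optional_thread_lock; try threading.RLock()

  def stream_response():
      try:
          yield b"chunk"
      finally:
          # Stands in for the stream close() path, which takes the pool lock
          # (connection_pool.py line 416 in the dump above).
          acquired = pool_lock.acquire(timeout=1)  # timeout only so the demo terminates
          print("close() acquired the pool lock:", acquired)
          if acquired:
              pool_lock.release()

  stream = stream_response()
  next(stream)  # start streaming so the generator's finally block is pending

  with pool_lock:     # stands in for _assign_requests_to_connections() holding the lock
      stream.close()  # finalizing the generator re-acquires the same lock on this thread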
