Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock While Opening Database Connection with ddtrace 2.10.3 #10504

Open
jlmorton opened this issue Sep 4, 2024 · 1 comment
Open

Deadlock While Opening Database Connection with ddtrace 2.10.3 #10504

jlmorton opened this issue Sep 4, 2024 · 1 comment
Assignees
Labels
Profiling Continous Profling stale

Comments

@jlmorton
Copy link

jlmorton commented Sep 4, 2024

We're seeing some sort of a deadlock, or some other type of locking problem while opening new Postgres DB connections using ddtrace:2.10.4. The problem seems to have been introduced between 2.9.2 and 2.10.3. (I know that the problem does not occur on 2.9.2, but that is occurs on both 2.10.3 and 2.10.4, but haven't tried other versions. Unfortunately, it's tough for me to determine the specific release it was introduced in better, as I don't have a reproducible test case).

The problem manifests in an application where:

  1. We're running async code under ASGI
  2. We're calling some database code using asgiref SyncToAsync. Critically, SyncToAsync is implemented by executing the synchronous code in a ThreadPoolWorker

When this ThreadPoolWorker has all of its active threads exhausted (32 workers, in our case), then anything in our application that awaits something SyncToAsync will hang, ultimately crashing our app.

With some debug logging to examine that executor, we've managed to capture some stacks from the callables stuck in that executor:

Ultimately, there are various examples, but all of these end up stuck in:

  File "/usr/local/lib/python3.10/site-packages/dj_db_conn_pool/core/mixins/core.py", line 58, in get_new_connection
    with pool_container.lock:
  File "/usr/local/lib/python3.10/site-packages/ddtrace/profiling/collector/_lock.py", line 241, in __enter__
    return self._acquire(self.__wrapped__.__enter__, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/ddtrace/profiling/collector/_lock.py", line 105, in _acquire
    return inner_func(*args, **kwargs)

I've noticed there've been several commits to collector/_lock.py in the past few releases since 2.9.2, but haven't caught the breaking change.

Here's the full stack:

ile "/usr/local/lib/python3.10/threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.10/site-packages/sentry_sdk/integrations/threading.py", line 99, in run
    return _run_old_run_func()
  File "/usr/local/lib/python3.10/site-packages/sentry_sdk/integrations/threading.py", line 92, in _run_old_run_func
    return old_run_func(self, *a, **kw)
  File "/usr/local/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 83, in _worker
    work_item.run()
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/site-packages/ddtrace/contrib/futures/threading.py", line 44, in _wrap_execution
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/channels/db.py", line 13, in thread_handler
    return super().thread_handler(loop, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/asgiref/sync.py", line 522, in thread_handler
    return func(*args, **kwargs)
  File "/usr/src/odm/papi/authentication.py", line 23, in authenticate_token
    return User.objects.filter(id=int(access_token['user_id'])).first(), token
  File "/usr/local/lib/python3.10/site-packages/cacheops/query.py", line 350, in first
    return self._no_monkey.first(self._clone().cache())
  File "/usr/local/lib/python3.10/site-packages/django/db/models/query.py", line 1057, in first
    for obj in queryset[:1]:
  File "/usr/local/lib/python3.10/site-packages/django/db/models/query.py", line 398, in __iter__
    self._fetch_all()
  File "/usr/local/lib/python3.10/site-packages/cacheops/query.py", line 260, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/usr/local/lib/python3.10/site-packages/django/db/models/query.py", line 91, in __iter__
    results = compiler.execute_sql(
  File "/usr/local/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1560, in execute_sql
    cursor = self.connection.cursor()
  File "/usr/local/lib/python3.10/site-packages/ddtrace/contrib/trace_utils.py", line 335, in wrapper
    return func(mod, pin, wrapped, instance, args, kwargs)
  File "/usr/local/lib/python3.10/site-packages/ddtrace/contrib/django/patch.py", line 144, in cursor
    cursor = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/django/db/backends/base/base.py", line 330, in cursor
    return self._cursor()
  File "/usr/local/lib/python3.10/site-packages/django/db/backends/base/base.py", line 306, in _cursor
    self.ensure_connection()
  File "/usr/local/lib/python3.10/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/django/db/backends/base/base.py", line 289, in ensure_connection
    self.connect()
  File "/usr/local/lib/python3.10/site-packages/sentry_sdk/utils.py", line 1713, in runner
    return sentry_patched_function(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/sentry_sdk/integrations/django/__init__.py", line 689, in connect
    return real_connect(self)
  File "/usr/local/lib/python3.10/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/django/db/backends/base/base.py", line 270, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/usr/local/lib/python3.10/site-packages/dj_db_conn_pool/backends/postgresql/mixins.py", line 18, in get_new_connection
    connection = super().get_new_connection(conn_params)
  File "/usr/local/lib/python3.10/site-packages/dj_db_conn_pool/core/mixins/core.py", line 58, in get_new_connection
    with pool_container.lock:
  File "/usr/local/lib/python3.10/site-packages/ddtrace/profiling/collector/_lock.py", line 241, in __enter__
    return self._acquire(self.__wrapped__.__enter__, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/ddtrace/profiling/collector/_lock.py", line 105, in _acquire
    return inner_func(*args, **kwargs)
@mabdinur mabdinur added the Profiling Continous Profling label Sep 5, 2024
@taegyunkim
Copy link
Contributor

Hi James, thanks for reporting would you mind running your application with few different environment variables?

  • DD_PROFILING_LOCK_ENABLED=False this turns off the lock profiler
  • DD_PROFILING_ENABLED=False turns off the profiler (stack, memory and lock)

@github-actions github-actions bot added the stale label Nov 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Profiling Continous Profling stale
Projects
None yet
Development

No branches or pull requests

3 participants