
Support for Async Django #1394

Open · wants to merge 38 commits into main
Conversation

@jaw9c (Collaborator) commented Apr 2, 2023

Feedback requested.

This PR adds support for Django async views. The general strategy follows the same principle as the base graphene library's async support. It implements the following things:

  • Django-based resolvers now check whether they are being executed in an async context and, if so, wrap any ORM queries in sync_to_async (see the sketch after this list)
  • Resolvers now follow the same principles as the base graphene lib to handle resolving async and sync code agnostically.
  • A new async view - credit to this PR
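
For readers skimming the thread, here is a minimal sketch of that strategy (illustrative only; the app, model, and exact helper used by this branch are assumptions, not the PR's actual code):

```python
from asgiref.sync import sync_to_async
import graphene
from graphene_django import DjangoObjectType

from myapp.models import Reporter  # hypothetical app and model


class ReporterType(DjangoObjectType):
    class Meta:
        model = Reporter
        fields = ("id", "first_name", "last_name")


class Query(graphene.ObjectType):
    reporters = graphene.List(ReporterType)

    async def resolve_reporters(root, info):
        # The ORM is sync-only, so evaluate the queryset in a worker thread
        # when the schema is executed by an async executor.
        return await sync_to_async(list)(Reporter.objects.all())
```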

The current progress is as follows:

  • Get the library working for all custom fields in graphene-django when running under async
  • Duplicate the existing tests and edit them to have an async version

Other thoughts:

  • It's annoying to wrap all resolvers in sync_to_async when they perform ORM operations. It would be nice to detect this and raise a warning for existing codebases (see the sketch below).
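
One possible way to surface that, sketched as a graphene middleware (an idea for discussion, not something this PR implements):

```python
import warnings

from django.core.exceptions import SynchronousOnlyOperation


class SyncORMWarningMiddleware:
    """Hypothetical graphene middleware: turn Django's SynchronousOnlyOperation
    into a hint that the resolver still needs sync_to_async. This only catches
    sync resolvers; async resolvers raise later, when their coroutine is awaited."""

    def resolve(self, next, root, info, **kwargs):
        try:
            return next(root, info, **kwargs)
        except SynchronousOnlyOperation:
            warnings.warn(
                f"Resolver for '{info.field_name}' touched the ORM in an async "
                "context; wrap it in asgiref.sync.sync_to_async.",
                RuntimeWarning,
            )
            raise
```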

@jaw9c (Collaborator, Author) commented May 5, 2023

Progress update: all the field resolvers now support both being executed in the async context and wrapping Django sync code into a sync context. I've made the assumption that any resolving of DjangoXType/Field fields will need to be executed in a sync context, which I think is pretty safe. I've started running through all the tests and adding an extra assertion that running in an async executor produces the same result as the sync one.

Currently, people switching to the async view will need to wrap all their top-level resolvers (those directly on the Query class passed into the schema) in @sync_to_async if they run sync code. If someone has an idea of how to prevent this that would be great, but I can't seem to wrap my head around it!
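
To illustrate the previous paragraph, a sketch of what that wrapping looks like for a top-level resolver (hypothetical model and field names; the @sync_to_async decorator form amounts to the same thing):

```python
from asgiref.sync import sync_to_async
import graphene

from myapp.models import Pet  # hypothetical model


def _pet_names():
    # Plain, blocking ORM code.
    return [pet.name for pet in Pet.objects.all()]


class Query(graphene.ObjectType):
    pet_names = graphene.List(graphene.String)

    async def resolve_pet_names(root, info):
        # Calling the ORM directly here would raise SynchronousOnlyOperation,
        # so the sync code is pushed into a worker thread instead.
        return await sync_to_async(_pet_names)()
```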

Next steps are to run through all the test files and add the helper.

It would be great if people could test this on their production projects (specifically when running under ASGI), as I'm interested in how swapping in/out of the sync context affects the performance of the execution.

@firaskafri (Collaborator) commented


This is great news! Would discuss with my team to try this on production once its ready!

@alexandrubese commented Jan 9, 2024

Migrating to the v3 version without making sure dataloaders and everything else work means people need to adopt new libraries (check @superlevure's comments above).

@dima-kov commented Jan 9, 2024

@alexandrubese so what's the way forward?

@alexandrubese commented Jan 9, 2024

@dima-kov
Use an older version of graphene-django where things work?

Which might not be ideal, because you're basically just adding "legacy code" that you will need to update once they make v3 work (although this has been discussed for almost a year, and I don't know when this will work and when they will fix dataloaders).

I don't understand the intricacies of the v3 update; maybe they had to do it, and that's why it was forced?

Or use a different library.

PS: I don't know why the Django team didn't work on adding proper GraphQL support.

Spring, .NET MVC and many other web frameworks did it.

@dima-kov commented Feb 1, 2024

Regarding the comment from @superlevure:

I agree with releasing AsyncGraphQLView.

We are almost there; the only remaining tasks are performance benchmarks and documentation updates. That implies most of the work is already done and only a small fraction remains.

@jaw9c, it seems you might be short on time for this. I'm willing to contribute efforts at this point.

@superlevure, to be transparent, I haven't conducted performance testing before, but I'm attempting it now. Any suggestions on how to approach this appropriately would be highly appreciated.
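
For what it's worth, one low-tech way to approach it is a small asyncio load generator; a sketch, assuming httpx and a locally running endpoint (URL and query are placeholders):

```python
import asyncio
import time

import httpx

URL = "http://localhost:8000/graphql/"       # placeholder endpoint
QUERY = {"query": "{ reporters { id } }"}    # placeholder query
CONCURRENCY = 10
TOTAL = 1000


async def worker(client: httpx.AsyncClient, requests: int) -> None:
    for _ in range(requests):
        response = await client.post(URL, json=QUERY)
        response.raise_for_status()


async def main() -> None:
    async with httpx.AsyncClient(timeout=30) as client:
        started = time.perf_counter()
        await asyncio.gather(
            *(worker(client, TOTAL // CONCURRENCY) for _ in range(CONCURRENCY))
        )
        print(f"{TOTAL} requests in {time.perf_counter() - started:.2f}s")


if __name__ == "__main__":
    asyncio.run(main())
```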

charn added a commit to City-of-Helsinki/open-city-profile that referenced this pull request Feb 2, 2024
Data loaders that exist are not fully compatible with new versions of
graphene and graphene-django. DjangoConnectionField doesn't seem to handle
loaders correctly and instead returns errors like:

"Cannot return null for non-nullable field EmailNodeConnection.edges."

So for now, data loaders will be disabled for this field type.

Use graphql-sync-dataloaders to make other types of fields work with
data loaders.

Some GitHub issues for reference:
- graphql-python/graphene-django#1394
- graphql-python/graphene-django#1263
- graphql-python/graphene-django#1425

Refs: HP-2082
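
For anyone landing here from that commit, a rough sketch of the graphql-sync-dataloaders approach it mentions (hedged: the model and field names are made up, and the exact wiring depends on your view and schema setup):

```python
from graphql_sync_dataloaders import DeferredExecutionContext, SyncDataLoader

from myapp.models import Email  # hypothetical model


def load_emails(keys):
    # One query for the whole batch of keys collected during execution.
    by_pk = {e.pk: e for e in Email.objects.filter(pk__in=keys)}
    return [by_pk.get(key) for key in keys]


# Create one loader per request (e.g. on your context object), then in a resolver:
#   return info.context.email_loader.load(parent.email_id)
email_loader = SyncDataLoader(load_emails)

# The deferred execution context is what batches the .load() calls, e.g.:
#   result = schema.execute(query, execution_context_class=DeferredExecutionContext)
```
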
@dima-kov commented Feb 2, 2024

Benchmarks:
16.547 seconds vs 83.194 seconds for I/O-bound queries. 5x faster; async wins.

Details are here: https://github.com/dima-kov/django-graphene-benchmarks?tab=readme-ov-file#tldr

Async:

Concurrency Level:      10
Time taken for tests:   16.547 seconds
Complete requests:      1000

vs

Sync

Concurrency Level:      10
Time taken for tests:   83.194 seconds
Complete requests:      1000

@superlevure are we fine now? What should the next steps be to make this public?

@dima-kov left a comment


🔥

@kamilglod commented

@dima-kov shouldn't you run the sync benchmark with 10 workers instead of 4? I know measuring sync vs async is hard, but with 10 workers we would have a consistent number of requests handled at the same time by the server, so we would get an actual comparison. I guess we would get similar results, which is fine, as the biggest benefit of using async is fewer resources used.

@dima-kov commented Feb 3, 2024

Hmm, that would eat lots of memory.
From the Gunicorn docs:

> Gunicorn should only need 4-12 worker processes to handle hundreds or thousands of requests per second. Gunicorn relies on the operating system to provide all of the load balancing when handling requests. Generally we recommend (2 x $num_cores) + 1 as the number of workers to start off with.

I'm running on a 4-core machine, so, following (2 x 4) + 1 = 9, I tried Gunicorn with 9 workers and got only a slightly better result:

Concurrency Level:      10
Time taken for tests:   80.785 seconds
Complete requests:      1000
Failed requests:        0

and memory usage was 9 processes × ~80mb each = ~720mb. The bottleneck here might be the sqlite3 database.


vs async:

Concurrency Level:      10
Time taken for tests:   17.586 seconds
Complete requests:      1000
Failed requests:        0

and memory usage was 1 process at ~60mb.

@kamilglod commented

The SQLite docs say that concurrent writes are locked but reads should be fine, so I don't think the problem is SQLite itself.

Maybe it's because of the differences between the sync and async resolvers? In async you're fetching the related octopus_type; in sync you're not.
https://github.com/dima-kov/django-graphene-benchmarks/blob/main/project/api/schema/queries.py#L43-L51
https://github.com/dima-kov/django-graphene-benchmarks/blob/main/project/api/schema_async/queries.py#L48-L64

@dima-kov commented Feb 7, 2024

Oh, shame on me. So here is the fixed comparison:

Comparison

|                  | Sync    | Async   | Sync    | Async   |
|------------------|---------|---------|---------|---------|
| Requests         | 1000    | 1000    | 1000    | 1000    |
| Concurrency      | 100     | 100     | 100     | 100     |
| Processes        | 9       | 1       | 4       | 1       |
| Threads per proc | 1       | 100     | 1       | 100     |
| Mem              | ~720mb  | ~80mb   | ~320mb  | ~80mb   |
| Time             | 23.384s | 13.411s | 24.465s | 13.670s |

@dima-kov commented Feb 7, 2024

Now it can be asserted that releasing the async version will roughly double the speed of I/O-bound endpoints.

Moreover, take a look at this fairer (same resources) comparison:

|                     | Sync    | Async  |
|---------------------|---------|--------|
| Requests            | 1000    | 1000   |
| Concurrency         | 100     | 100    |
| Processes           | 9       | 9      |
| Threads per process | 1       | 1-30   |
| Mem                 | ~720mb  | ~720mb |
| Time                | 23.384s | 4.719s |

That's roughly a 5x speedup (23.384s / 4.719s ≈ 5.0) with identical resource utilization, aside from threads!

@dima-kov commented Feb 8, 2024

Guys, we really need this (async resolvers). Please tell us how we can help make it public; I don't want to start using it without being sure it will be merged into main.

Or at least let us know that you are going to release this, but that we are missing: 1, 2, 3...

@superlevure (Collaborator) commented

Thanks for the benchmarks, I'll review the PR again tomorrow. Note that I don't have merge rights on this repo, we'll need a review from one of @firaskafri, @sjdemartini, @kiendang (or others)

@firaskafri (Collaborator) commented Feb 9, 2024

@superlevure I think it is good to go as soon as we clarify the docs.
What do you think @kiendang @jaw9c @sjdemartini?

@superlevure (Collaborator) commented

@dima-kov I had a look at your benchmarks and I have a few remarks.

First, it looks like you're comparing the sync and async resolvers on the same branch of graphene-django (this PR's branch). I believe it would be fairer to compare the async and sync versions of this branch against the sync version of graphene-django's main branch, since this PR also affects sync resolvers and the point of the benchmarks is also to make sure no performance penalty is introduced for already existing code.

Second, I noticed the sync resolver is returning 500 objects while the async version is only returning 10 objects.

I took the liberty of pushing a PR to your repo that covers those points, as well as setting up a Docker-based environment and Postgres as the DB to be closer to a real-life use case.

I obtain the following results:

# Sync version [main]
Concurrency Level:      100
Time taken for tests:   45.517 seconds
Complete requests:      1000
Failed requests:        0
Non-2xx responses:      1000
Total transferred:      60232000 bytes
Total body sent:        238000
HTML transferred:       59963000 bytes
Requests per second:    21.97 [#/sec] (mean)
Time per request:       4551.739 [ms] (mean)
Time per request:       45.517 [ms] (mean, across all concurrent requests)

# Sync version [this branch]
Concurrency Level:      100
Time taken for tests:   203.300 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      45805000 bytes
Total body sent:        244000
HTML transferred:       45382000 bytes
Requests per second:    4.92 [#/sec] (mean)
Time per request:       20330.019 [ms] (mean)
Time per request:       203.300 [ms] (mean, across all concurrent requests)

# Async version [this branch]
# didn't run till the end, see below

The data tend to show a huge perf hit for the sync version of this PR. Concerning the async version, I ran into the following DB error, which makes all requests fail until I restart the DB:

2024-02-11 16:31:39.310 UTC [4579] FATAL:  sorry, too many clients already

I suspect the async version is leaving DB connections open somewhere (the Postgres max_connections setting is set to 100, which is the concurrency level used in the benchmark).
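
If it helps debugging, a small diagnostic sketch (not part of this PR) to watch whether backends pile up on the Postgres side while the benchmark runs:

```python
from django.db import connection


def count_pg_connections() -> int:
    """Ask Postgres how many backends are currently connected to this
    database, so connection growth can be observed during a benchmark."""
    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT count(*) FROM pg_stat_activity "
            "WHERE datname = current_database();"
        )
        return cursor.fetchone()[0]
```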

I'll continue to play with the PR a bit tonight to make sure there's nothing wrong with my setup, but I'm curious if others can reproduce similar results.

@dima-kov commented

@superlevure thank you for looking into it! I missed that part

@dima-kov commented Feb 11, 2024

Regarding the results you've got: hmm, that's really strange. Both sync and async (100 concurrent requests and 1k requests in total) showed me roughly 20s; 100+s is something unbelievable to me.
Maybe the issue is with the dockerization.

I'm playing with the main version now.
Main:

Concurrency Level:      100
Time taken for tests:   23.706 seconds
Complete requests:      1000
Failed requests:        0

@dolgidmi commented Mar 4, 2024

> The data tend to show a huge perf hit for the sync version of this PR. Concerning the async version, I ran into the following DB error, which makes all requests fail until I restart the DB:
>
> 2024-02-11 16:31:39.310 UTC [4579] FATAL:  sorry, too many clients already
>
> I suspect the async version is leaving DB connections open somewhere (the Postgres max_connections setting is set to 100, which is the concurrency level used in the benchmark).

Django currently doesn't support persistent connections for ASGI https://code.djangoproject.com/ticket/33497
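
A settings sketch for that situation (placeholder credentials): with persistent connections unsupported under ASGI, making sure CONN_MAX_AGE stays at its default of 0 at least avoids intentionally keeping connections alive, though it does not by itself fix connections opened in sync_to_async worker threads.

```python
# settings.py (sketch)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "app",          # placeholder credentials
        "USER": "app",
        "PASSWORD": "app",
        "HOST": "127.0.0.1",
        "CONN_MAX_AGE": 0,      # the default: no persistent connections
    }
}
```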

@henadzit commented

Thank you for all your work! How can we help to get it merged?

@dima-kov commented Jul 17, 2024

> The data tend to show a huge perf hit for the sync version of this PR. Concerning the async version, I ran into the following DB error, which makes all requests fail until I restart the DB:
>
> 2024-02-11 16:31:39.310 UTC [4579] FATAL:  sorry, too many clients already
>
> I suspect the async version is leaving DB connections open somewhere (the Postgres max_connections setting is set to 100, which is the concurrency level used in the benchmark).
>
> Django currently doesn't support persistent connections for ASGI https://code.djangoproject.com/ticket/33497

For those who use any external connection pooler like pgbouncer or AWS RDS Connection Pooling, this will not be a problem.

https://docs.djangoproject.com/en/5.1/releases/5.1/#postgresql-connection-pools

Django 5.1 will support connection pooling, and this might be an opportunity to merge the PR.
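
For reference, the opt-in looks roughly like this in Django 5.1 per the release notes linked above (placeholder credentials; it requires the psycopg 3 pool extra and shouldn't be combined with persistent connections):

```python
# settings.py (sketch, Django >= 5.1)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "app",               # placeholder credentials
        "USER": "app",
        "PASSWORD": "app",
        "OPTIONS": {
            "pool": True,            # or a dict of psycopg_pool options
        },
    }
}
```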

@pcraciunoiu (Contributor) commented

I was getting an error that the view is not async on the latest Django release.

Based on this, there must be an async method, or the property returns False. As a quick fix, I pushed a change that sets this classproperty to True:

UpliftAgency@cc665df

Feel free to add that in.
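
For anyone hitting the same error before that lands, the override in question looks roughly like this (a sketch; the class name is made up and the base class should be whatever async view this branch ships):

```python
from django.utils.functional import classproperty

from graphene_django.views import GraphQLView  # stand-in base class


class PatchedAsyncGraphQLView(GraphQLView):
    """Hypothetical stop-gap mirroring the linked commit: tell Django's view
    machinery to treat this view as async so ASGI stops rejecting it."""

    @classproperty
    def view_is_async(cls):
        return True
```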

@jaw9c (Collaborator, Author) commented Aug 1, 2024

Apologies for the radio silence on this thread. Appreciate the feedback and efforts from everyone on this PR - in particular the benchmarking.

A fair few months ago, we rolled this into production at my work, utilising the new AsyncGraphQLView, but found that we very quickly maxed out the DB connection pool. Since then we had to put the effort on ice in favour of making sure the startup was default alive... We've been using this branch in production since my last post, served under ASGI, but not using the AsyncGraphQLView.

I think this is safe to merge, but with an experimental warning for those willing to test and use the AsyncGraphQLView. (Perhaps we rename it to `experimental_AsyncGraphQLView` or log a warning? Just to make sure people understand the DB connections point!)
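
If the warning route is preferred, a minimal sketch of what that could look like (illustrative; the view name and wording are placeholders, and the base class stands in for the async view in this branch):

```python
import warnings

from graphene_django.views import GraphQLView  # stand-in for the async view


class ExperimentalAsyncGraphQLView(GraphQLView):
    def __init__(self, *args, **kwargs):
        warnings.warn(
            "AsyncGraphQLView is experimental: every sync_to_async hop can hold "
            "its own DB connection under ASGI, so keep an eye on your pool size.",
            UserWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)
```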

I've been carefully watching the Django team's movements on making the ORM fully async; this PR is the latest, which is really encouraging!

@henadzit commented

> Django 5.1 will support connection pooling, and this might be an opportunity to merge the PR.

Looks like Django 5.1 got released!

@chris-stetter commented

Are there plans to publish a tagged release? Happy to support if anything is missing.

@superlevure (Collaborator) commented

> [..]
> I think this is safe to merge, but with an experimental warning for those willing to test and use the AsyncGraphQLView. (Perhaps we rename it to `experimental_AsyncGraphQLView` or log a warning? Just to make sure people understand the DB connections point!)
> [..]

@jaw9c, do you think it's safe to merge even though my benchmarks showed this PR degrades performance for sync resolvers?

@vt-rc commented Nov 7, 2024

any luck with this?
