Speed up serialization and deserialization using Rust #258

Draft
wants to merge 40 commits into
base: main
from

Conversation

thedrow
Member

@thedrow thedrow commented Feb 26, 2019

The current performance advantage is unclear because I'm aiming for feature parity at the moment.
I have added some benchmarks which we will use later on to measure how we're doing.

@matusvalo
Member

@thedrow this is interesting. I recently sped up serialization using Cython in pure Python mode. I haven't had time to measure the performance, but I can create a WIP pull request to compare with your approach if you want.

@auvipy
Member

auvipy commented Feb 27, 2019

@matusvalo this is what I have been looking for on the internet lately. Please go ahead with that.

@matusvalo
Member

@auvipy I will create a WIP PR for that so you can review it, but I need to rebase it on master first. Stay tuned.

@thedrow
Member Author

thedrow commented Feb 27, 2019

@matusvalo If you'd like to collaborate, let's chat sometime soon.
Your profile does not include an email or Twitter or anything.
Is there a way to reach you?

@thedrow
Member Author

thedrow commented Feb 27, 2019

It seems like a better approach would be to avoid crossing the Python barrier when deserializing.
We can do so using structview but that will require a more extensive rewrite.
The current code is far from optimal and there are many things to complete before considering performance.
In any case, this is not a doomed effort since we're all learning how PyO3 works and what its limitations are.
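
As a rough illustration of why fewer boundary crossings help, here is an analogy in Python rather than the PR's Rust code. It assumes the standard AMQP 0-9-1 frame header layout (one-octet type, two-octet channel, four-octet payload size, big-endian) and compares decoding it field by field with decoding it through one precompiled `struct.Struct`, which is loosely similar to viewing the bytes as a fixed-layout struct. The function names are hypothetical and do not come from py-amqp.

```python
import struct

# AMQP 0-9-1 frame header: type (octet), channel (short), size (long), big-endian.
FRAME_HEADER = struct.Struct(">BHI")


def parse_header_per_field(buf, offset=0):
    """Decode each field with its own call into the C-level struct module."""
    (frame_type,) = struct.unpack_from(">B", buf, offset)
    (channel,) = struct.unpack_from(">H", buf, offset + 1)
    (size,) = struct.unpack_from(">I", buf, offset + 3)
    return frame_type, channel, size


def parse_header_single_call(buf, offset=0):
    """Decode the whole fixed-layout header in a single call."""
    return FRAME_HEADER.unpack_from(buf, offset)


# Hypothetical sample header: frame type 1, channel 7, payload size 4096.
header = FRAME_HEADER.pack(1, 7, 4096)
assert parse_header_per_field(header) == parse_header_single_call(header)
```

Both functions return the same tuple; the second one crosses from Python into C once instead of three times, which is the kind of saving the comment above is after on the Rust side.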

@matusvalo
Member

@thedrow I have sent you an e-mail directly. I am not very social on the internet, so I don't have many accounts on social networks, but if you want, pick a chat service and I can create an account if needed.

@thedrow thedrow force-pushed the rust-serialization branch 2 times, most recently from ba49c9e to 7a0b59e Compare March 2, 2019 11:51
@thedrow
Member Author

thedrow commented Mar 2, 2019

As of now, the Rust extension is compiled in CI!
I've also figured out how to raise exceptions.

I have a problem with strings inside arrays and tables.
@matusvalo Can you take a look?

@thedrow
Member Author

thedrow commented Mar 2, 2019

Seems like there's a problem with the tox plugin I made for Python 3.4 and Python 3.5.

@thedrow
Member Author

thedrow commented Mar 4, 2019

The only thing left, besides fixing tox-pyo3, is a failing integration test.
Then we can proceed to optimizing performance.

@thedrow
Member Author

thedrow commented Mar 4, 2019

Oh wait. That's not it.

@auvipy
Member

auvipy commented Mar 4, 2019

Well, what about RustPython?

@thedrow
Member Author

thedrow commented Mar 4, 2019

That's completely different. It's a Python interpreter like PyPy and CPython.
It's also experimental.

@thedrow
Member Author

thedrow commented Mar 4, 2019

This doesn't happen to me locally:
https://travis-ci.org/celery/py-amqp/jobs/501359009#L672
Strange.

@michael-k
Contributor

Sentry also uses Rust in combination with Python. Here are a few resources if you're not aware of that:

I'm not affiliated with them. I just saw one of the talks once and thought it could help. :)

@thedrow
Member Author

thedrow commented Mar 4, 2019

So far this is a HUGE win:

--------------------------------------------------------------------------------------------------------- benchmark 'bitmaps': 8 tests ---------------------------------------------------------------------------------------------------------
Name (time in ns)                                             Min                    Max                  Mean                StdDev                Median                 IQR              Outliers  OPS (Kops/s)            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_deserialize_bitmap[8 bits | Rust Extension]         546.4499 (1.0)       2,927.8002 (1.0)        604.6150 (1.0)        169.6272 (1.0)        561.3001 (1.0)       20.1999 (1.81)      4063;6090    1,653.9450 (1.0)       82974          20
test_deserialize_bitmap[16 bits | Rust Extension]        607.7000 (1.11)      4,030.8001 (1.38)       672.2640 (1.11)       183.1508 (1.08)       624.9502 (1.11)      11.1748 (1.0)      4372;13697    1,487.5109 (0.90)      78260          20
test_deserialize_bitmap[4 bits | Rust Extension]         642.0014 (1.17)     20,974.0028 (7.16)       771.1376 (1.28)       847.4251 (5.00)       717.0020 (1.28)      27.9979 (2.51)       370;4615    1,296.7854 (0.78)     107748           1
test_deserialize_bitmap[128 bits | Rust Extension]     1,887.9946 (3.46)     26,020.9999 (8.89)     2,197.7211 (3.63)     1,563.1523 (9.22)     2,013.9996 (3.59)      97.9999 (8.77)      1589;3346      455.0168 (0.28)     146564           1
test_deserialize_bitmap[4 bits | Pure Python]          2,407.0032 (4.40)     29,316.0047 (10.01)    2,783.5932 (4.60)     1,663.3030 (9.81)     2,585.9990 (4.61)     138.0031 (12.35)     1166;2053      359.2479 (0.22)      97590           1
test_deserialize_bitmap[8 bits | Pure Python]          2,474.0002 (4.53)     28,490.9984 (9.73)     2,880.6455 (4.76)     1,793.1283 (10.57)    2,638.9971 (4.70)     127.9986 (11.45)     1807;2901      347.1444 (0.21)      98737           1
test_deserialize_bitmap[16 bits | Pure Python]         2,483.9974 (4.55)     28,920.0034 (9.88)     2,930.1679 (4.85)     1,893.7634 (11.16)    2,678.0035 (4.77)     104.9993 (9.40)      1596;4228      341.2774 (0.21)     104548           1
test_deserialize_bitmap[128 bits | Pure Python]        3,743.0000 (6.85)     54,964.0026 (18.77)    4,288.9784 (7.09)     1,997.7935 (11.78)    3,988.0033 (7.10)     123.0001 (11.01)     1709;5869      233.1558 (0.14)      80084           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------- benchmark 'mixed': 6 tests --------------------------------------------------------------------------------------------
Name (time in us)                                     Min                Max              Mean            StdDev            Median               IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_deserialize[11 elements | Rust Extension]     1.1860 (1.0)      31.6080 (1.02)     1.4840 (1.0)      1.6157 (1.0)      1.3290 (1.0)      0.0450 (1.0)     1400;9540      673.8390 (1.0)      160540           1
test_deserialize[22 elements | Rust Extension]     1.8380 (1.55)     31.1340 (1.0)      2.1830 (1.47)     1.7499 (1.08)     2.0190 (1.52)     0.1200 (2.67)      361;903      458.0775 (0.68)      33836           1
test_deserialize[33 elements | Rust Extension]     2.2360 (1.89)     37.7780 (1.21)     2.6835 (1.81)     2.0558 (1.27)     2.3960 (1.80)     0.1310 (2.91)    2320;3559      372.6512 (0.55)      85441           1
test_deserialize[11 elements | Pure Python]        3.1410 (2.65)     37.7710 (1.21)     3.7259 (2.51)     2.4016 (1.49)     3.3580 (2.53)     0.1790 (3.98)    1875;2821      268.3907 (0.40)      80704           1
test_deserialize[22 elements | Pure Python]        3.7990 (3.20)     39.7260 (1.28)     4.5275 (3.05)     2.6540 (1.64)     4.0650 (3.06)     0.2320 (5.16)    2707;3602      220.8706 (0.33)      75251           1
test_deserialize[33 elements | Pure Python]        4.1780 (3.52)     31.6110 (1.02)     4.9647 (3.35)     2.7812 (1.72)     4.4180 (3.32)     0.2470 (5.49)    3302;4117      201.4235 (0.30)      75341           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------------- benchmark 'timestamps': 6 tests ---------------------------------------------------------------------------------------------------------
Name (time in ns)                                                  Min                    Max                  Mean                StdDev                Median                 IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_deserialize_timestamp[1 elements | Rust Extension]       972.0025 (1.0)      23,231.0049 (1.0)      1,186.1733 (1.0)      1,131.3413 (1.0)      1,104.9997 (1.0)       82.0000 (1.0)      558;1761      843.0471 (1.0)      100382           1
test_deserialize_timestamp[2 elements | Rust Extension]     1,278.0001 (1.31)     32,967.0002 (1.42)     1,699.2989 (1.43)     1,747.8630 (1.54)     1,503.9986 (1.36)      92.0045 (1.12)    1558;7754      588.4780 (0.70)     143164           1
test_deserialize_timestamp[4 elements | Rust Extension]     1,906.0026 (1.96)     52,118.0045 (2.24)     2,294.1612 (1.93)     1,985.9009 (1.76)     2,036.0021 (1.84)     123.9969 (1.51)    1276;4580      435.8892 (0.52)      99632           1
test_deserialize_timestamp[1 elements | Pure Python]        2,889.9958 (2.97)     53,932.9994 (2.32)     3,492.7481 (2.94)     2,174.4417 (1.92)     3,118.0025 (2.82)     148.9971 (1.82)    2009;3158      286.3075 (0.34)      61539           1
test_deserialize_timestamp[2 elements | Pure Python]        3,257.9992 (3.35)     53,288.0003 (2.29)     4,074.1629 (3.43)     2,796.9133 (2.47)     3,610.0028 (3.27)     236.9998 (2.89)    2089;6657      245.4492 (0.29)      81064           1
test_deserialize_timestamp[4 elements | Pure Python]        3,825.0000 (3.94)     54,976.0007 (2.37)     4,660.7399 (3.93)     2,979.5412 (2.63)     4,109.9993 (3.72)     264.0081 (3.22)    2375;7673      214.5582 (0.25)      82516           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean

This is a release build. A development build may be slower.
You can check the numbers in CI since we run the benchmarks there to ensure we haven't broken them.
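
The OPS column and the parenthesised ratios can be re-derived from the Mean column. A small sketch using two rows of the 'bitmaps' table above (the helper names are made up for illustration):

```python
def kops_per_second(mean_ns):
    """OPS is 1 / Mean; with Mean in nanoseconds, Kops/s is 1e6 / mean_ns."""
    return 1e6 / mean_ns


def slowdown(mean_ns, baseline_mean_ns):
    """Ratio shown in parentheses: this row's mean over the fastest row's mean."""
    return mean_ns / baseline_mean_ns


rust_8_bits = 604.6150     # test_deserialize_bitmap[8 bits | Rust Extension]
python_8_bits = 2880.6455  # test_deserialize_bitmap[8 bits | Pure Python]

print(f"{kops_per_second(rust_8_bits):.3f} Kops/s")
print(f"{slowdown(python_8_bits, rust_8_bits):.2f}x the mean time of the Rust extension")
```

For the 8-bit case this reproduces the table's 1,653.9450 Kops/s and its 4.76 ratio.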

@matusvalo
Member

@thedrow this is a great result. Which test did you run? I would also like to repeat it against the Cythonized version, just out of curiosity. Can it be built on Windows? I saw you had issues with the integration tests. Are they fixed now? Do you need any help?

@thedrow
Member Author

thedrow commented Mar 5, 2019

No, the integration tests are working correctly. The only thing left is the exception message formatting error on Python 2.
At this point, I'd rather avoid Cython and C altogether, since Rust and PyO3 have proven themselves and Rust has significant advantages over Cython.
We can reuse parts of the Rust implementation in CFFI for PyPy, something we can't do in Cython.
Rust also prevents data races and segfaults as it is memory safe and PyO3 exposes a safe API to CPython which won't cause crashes in case of errors.
I did not try Windows and I'm not sure if PyO3 supports it officially.
Go ahead and try. Just create a new branch out of mine and write the relevant commands to run them in AppVeyor.

@auvipy auvipy self-requested a review February 2, 2020 00:59
@matusvalo matusvalo mentioned this pull request Mar 30, 2020
5 tasks