Make list slightly less embarrassingly slow #515
Wow, thank you! Excellent idea and fantastic results! I'm really looking forward to this. I recently played a lot with #73 and that is going to become so much faster. Can we move … I plan to do a more detailed review soon.
Done ✅
Thank you for the updates and explanations.
Fantastic work!
Make list faster in some cases.
Part of #448
On my computer, running:
Takes 42s on main vs 0.4s on my branch.
I don't think this approach is general enough, though. Having pattern matching would make performant numbat code easier to write.
How does it work?
I replaced the `NumbatList` type with a custom `ArcListView` type that shares its allocation with every other list referring to the same allocation. It holds a reference-counted pointer to the allocation plus the bounds of the allocation that this list can access:
Then, doing a `tail` is a matter of advancing the start of the view by one, and `head` simply returns the first element.
`push_front` checks whether it is the only owner of the allocation. If so, it pushes in place without copying the list (this is always the case while building a list, for example, so it's important). If it's not the only owner, it clones the allocation entirely. We could probably do something smarter here with a linked list or something, but since it yields pretty good results, I didn't try harder.
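To make the description above concrete, here is a minimal sketch of the idea. It is not the code from this PR: the field names, the half-open `[start, end)` view, and the exact handling of the sole-owner case are my assumptions. It only illustrates a shared `Arc<VecDeque<T>>` with a per-list view, where `tail` moves the view and `push_front` uses `Arc::get_mut` to detect sole ownership.

```rust
use std::collections::VecDeque;
use std::sync::Arc;

/// Hypothetical stand-in for the list type described above (not the PR's code).
struct ArcListView<T> {
    /// Backing storage, shared by every list derived from the same allocation.
    alloc: Arc<VecDeque<T>>,
    /// Half-open range `[start, end)` of `alloc` that this list can see.
    start: usize,
    end: usize,
}

impl<T: Clone> ArcListView<T> {
    fn new() -> Self {
        ArcListView { alloc: Arc::new(VecDeque::new()), start: 0, end: 0 }
    }

    fn len(&self) -> usize {
        self.end - self.start
    }

    /// First visible element, if any.
    fn head(&self) -> Option<&T> {
        if self.len() == 0 { None } else { self.alloc.get(self.start) }
    }

    /// Everything but the first element: same allocation, start moved by one.
    fn tail(&self) -> Self {
        ArcListView {
            alloc: Arc::clone(&self.alloc),
            start: (self.start + 1).min(self.end),
            end: self.end,
        }
    }

    /// Prepend `value`, copying the list only if the allocation is shared.
    fn push_front(&mut self, value: T) {
        match Arc::get_mut(&mut self.alloc) {
            Some(buf) => {
                // Sole owner: drop the elements hidden outside the view,
                // then push in place.
                buf.truncate(self.end);
                buf.drain(..self.start);
                buf.push_front(value);
                self.start = 0;
                self.end = buf.len();
            }
            None => {
                // Shared: copy the visible elements into a fresh allocation.
                let mut buf: VecDeque<T> =
                    self.alloc.range(self.start..self.end).cloned().collect();
                buf.push_front(value);
                self.start = 0;
                self.end = buf.len();
                self.alloc = Arc::new(buf);
            }
        }
    }
}

fn demo() {
    let mut a = ArcListView::new();
    for i in (1..=3).rev() {
        a.push_front(i); // sole owner: pushes in place, list becomes [1, 2, 3]
    }
    let mut b = a.tail(); // shares the allocation with `a`, sees [2, 3]
    assert_eq!(a.head(), Some(&1));
    assert_eq!(b.head(), Some(&2));

    b.push_front(10); // allocation is shared, so `b` gets its own copy: [10, 2, 3]
    assert_eq!(b.head(), Some(&10));
    assert_eq!(a.len(), 3); // `a` is untouched
}
```

Calling `demo()` from a `main` function exercises both paths. The point of the design is that building a list by repeated `push_front` always hits the sole-owner branch, so the copy only happens when two live lists actually share storage.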
We now need to take extra care while implementing `PartialEq`.
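The PR text does not spell out why `PartialEq` needs care. My reading is that two lists can share an allocation while seeing different elements, and two lists with separate allocations can hold the same elements, so equality has to compare only what each view exposes. A sketch under that assumption, with a hypothetical struct layout and a hypothetical `from_vec` helper:

```rust
use std::collections::VecDeque;
use std::sync::Arc;

/// Hypothetical stand-in for the list type (not the PR's code).
struct ArcListView<T> {
    alloc: Arc<VecDeque<T>>,
    /// Half-open range `[start, end)` of `alloc` that this list can see.
    start: usize,
    end: usize,
}

impl<T> ArcListView<T> {
    /// Illustrative constructor, only here to keep the example short.
    fn from_vec(items: Vec<T>) -> Self {
        let end = items.len();
        ArcListView { alloc: Arc::new(VecDeque::from(items)), start: 0, end }
    }

    /// Same allocation, start of the view moved by one.
    fn tail(&self) -> Self {
        ArcListView {
            alloc: Arc::clone(&self.alloc),
            start: (self.start + 1).min(self.end),
            end: self.end,
        }
    }
}

impl<T: PartialEq> PartialEq for ArcListView<T> {
    fn eq(&self, other: &Self) -> bool {
        // Compare the visible elements only. Deriving `PartialEq` would also
        // compare hidden elements and the raw bounds, which is not list equality.
        self.alloc
            .range(self.start..self.end)
            .eq(other.alloc.range(other.start..other.end))
    }
}

fn demo() {
    let a = ArcListView::from_vec(vec![1, 2, 3]);
    let b = a.tail();                          // shares a's allocation, sees [2, 3]
    let c = ArcListView::from_vec(vec![2, 3]); // separate allocation, same contents
    assert!(a != b); // same allocation, different views
    assert!(b == c); // different allocations, same visible elements
}
```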
EDIT: While playing with lists a bit, I noticed that the following code was still taking around 15s:
That is expected, considering the number of operations we do to push something at the end of a list, even though we internally use a `VecDeque`.
By providing an FFI `cons_end` function, I went from 15s to 0.002s.
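As a rough illustration of why a native append helps, here is a sketch of what the Rust side of such a function could look like. The method name `push_back`, the struct layout, and the `to_vec` helper are hypothetical, and how the function is registered with Numbat's FFI is not shown. When the list is the sole owner of its allocation, appending is a single amortized O(1) `VecDeque::push_back` rather than a series of list operations.

```rust
use std::collections::VecDeque;
use std::sync::Arc;

/// Hypothetical stand-in for the list type (not the PR's code).
#[derive(Clone)]
struct ArcListView<T> {
    alloc: Arc<VecDeque<T>>,
    /// Half-open range `[start, end)` of `alloc` that this list can see.
    start: usize,
    end: usize,
}

impl<T: Clone> ArcListView<T> {
    fn new() -> Self {
        ArcListView { alloc: Arc::new(VecDeque::new()), start: 0, end: 0 }
    }

    fn len(&self) -> usize {
        self.end - self.start
    }

    /// Copy of the visible elements, for inspection.
    fn to_vec(&self) -> Vec<T> {
        self.alloc.range(self.start..self.end).cloned().collect()
    }

    /// Append `value`, copying the list only if the allocation is shared.
    fn push_back(&mut self, value: T) {
        match Arc::get_mut(&mut self.alloc) {
            Some(buf) => {
                // Sole owner: drop anything hidden past the view, append in place.
                buf.truncate(self.end);
                buf.push_back(value);
                self.end += 1;
            }
            None => {
                // Shared: copy the visible elements into a fresh allocation.
                let mut buf: VecDeque<T> =
                    self.alloc.range(self.start..self.end).cloned().collect();
                buf.push_back(value);
                self.start = 0;
                self.end = buf.len();
                self.alloc = Arc::new(buf);
            }
        }
    }
}

fn demo() {
    let mut list = ArcListView::new();
    for i in 0..1000 {
        list.push_back(i); // sole owner every time: no copy
    }
    assert_eq!(list.len(), 1000);

    let snapshot = list.clone(); // shares the allocation
    list.push_back(1000);        // shared: copies once, `snapshot` is unaffected
    assert_eq!(snapshot.len(), 1000);
    assert_eq!(list.to_vec().last(), Some(&1000));
}
```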