Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Begin of atomic append facilities #10

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
Open

Begin of atomic append facilities #10

wants to merge 11 commits into from

Conversation

cehteh
Copy link
Contributor

@cehteh cehteh commented Sep 17, 2024

this is WIP I am still fleshing out some things. posting this early that you can see my intentions. Everything is still in flux.

You proposed 2 types one for non atomic and one for atomic access. Actually I need to do this in one type because i have some hybrid access. A non shared Vec can be mutated with &mut self, a shared Vec can be appended by a single thread from &self. When a &mut self is not available then .len() doing Relaxed ordering. This would be racy vs another thread that appends to the Vec but always memory safe when the append follows the contracts (only once thread append, length must be set after appending data). Clone does acquire semantic, since clone is already a expensive operation the cost is negligible and its the correct way when the atomic append is used.

A push_atomic/push_atomic_slice comes next.

Note: no rush: i said i want to use this in cowstr but that transition is a bit down in my queue.

@cehteh
Copy link
Contributor Author

cehteh commented Sep 17, 2024

so, I leave this now for review, probably for some weeks. This is still not completely finished, i will add a push_slice() and push_slice_atomic() method at least.

Whatever is finished is open for some bikeshedding about method names and whenever the atomic_append feature should be default or not. I don't have ARM hardware where it would make a measurable difference in performance. On x86 atomics are pretty free/nil and differences between with/without atomics are below the noise floor.
Because of this I decided to include the atomic_append in default. Even in embedded the majority of platforms nowadays support atomics.

On my side there is no urge to merge this yet. If this needs to be finished ping me.

@vadixidav
Copy link
Member

I see, so essentially len still has the same performance characteristics as the previous implementation? It looks like this still preserves the old behavior, just adding the new behavior for synchronization. I think we can stick to one type then. Thanks for the draft. I am looking forward to the final version being published and I can give a review then.

@cehteh
Copy link
Contributor Author

cehteh commented Sep 23, 2024

added reserve/reserve_exact/shrink_to/shrink_to_fit, reworked the resize_insert fn for that. The resize that can reservation of multiple elements is vital for adding slices efficiently (and i need it for other things). I see that i find time to add the slice things next when i find time.

* introduces a helper union to fix simplify alignment calculations
  no need for cmp and phantom data anymore
* add simple testcase that triggered the miri issue
* change end_ptr_atomic_mut(), using the rewritten start_ptr()
@cehteh cehteh marked this pull request as ready for review September 23, 2024 17:32
@cehteh
Copy link
Contributor Author

cehteh commented Sep 23, 2024

So, this concludes my work for now. Eventually other std::Vec mehtods/Traits (like Extend) could be added. Currently i have no urgent need for those. Integration into cowstr is kindof later in the queue i'll see if i miss some things then.

@cehteh
Copy link
Contributor Author

cehteh commented Dec 17, 2024

ping --- any news on my PR, do you intend to merge it?

@vadixidav
Copy link
Member

Sorry, I didn't realize it was out of draft. I will take a look at it later today. Thanks for the ping.

@vadixidav
Copy link
Member

I haven't forgotten, but yesterday was a bit busier than expected. I should be able to take care of this tonight.

@cehteh
Copy link
Contributor Author

cehteh commented Dec 18, 2024 via email

@cehteh cehteh mentioned this pull request Jan 10, 2025
// correct the len
let len_again = self.len_atomic_add_release(slice.len());
// in debug builds we check for races, the chance to catch these are still pretty minimal
#[cfg(debug_assertions)]
Copy link
Contributor Author

@cehteh cehteh Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#[cfg(debug_assertions)] can be removed
may need #[allow(unused_variable)]

@vadixidav
Copy link
Member

Okay, today I will be getting to this. I will start reviewing now, but I will have to finish later today as I have something else.

/// Before incrementing the length of the vector, you must ensure that new elements are
/// properly initialized.
#[inline(always)]
unsafe fn len_atomic_add_release(&self, n: usize) -> usize {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused by this method. I added the line about what it returns, but the Safety section is the same. From what it says, it is necessary for new elements to be initialized properly prior to this length being updated. Due to this, the only correct usage scenario where the original length could be different than the one you expected it to be is if two threads somehow coordinated the allocation of two new items in the vector, added their items, hit a barrier that blocks both of them from continuing until the items were added, then each subsequently atomically adds to the length in each thread. This is because if two threads are adding items, they must both synchronize the process of adding items to the vector, and therefore must already pre-agree upon what the length will ultimately be and what slot each thread will allocate into prior to incrementing the length to uphold the guarantee that looking inside the vector always results in valid initialized values. In this case there is no need for this function to return the size. In fact, it would make more sense for this to return nothing at all and instead be passed in the old size as an argument, panicking in the case that the returned value isn't equal to the expected old length.

My understanding might be incomplete though, but I am having a hard time understanding how this API would be used currently since the person adding the length must already be absolutely sure that all the items were added by all threads prior to it incrementing the length. Basically, two threads cant increment the length until some kind of synchronization has occurred to ensure the values were already added by the other threads into unique slots in the vector. The most likely use case would probably just be that there are N observers, but only 1 owner that can actually add items to the vector. In that case, I would suggest that we create an API to reflect that reality.

Let me know if I am misunderstanding how this is used.

Since we always point to a valid allocation we can use NonNull here.
This will benefit from the niche optimization:
 size_of::<HeaderVec<H,T>>() == size_of::<Option<HeaderVec<H,T>>>

Also adds a test to verfiy that HeaderVec are always lean and niche optimized.
@cehteh
Copy link
Contributor Author

cehteh commented Jan 20, 2025

Note: I am not happy with the name atomic_append this was prolly a bit rushed. Since the API it introduces only handles the length atomically. Pushes to immutable HeaderVecs need to properly synchronized still (thats why the atomic_push ops are unsafe). Any idea for a better name?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants