Add a "shrink-if-excessive" API to Vec
#553
Comments
Just concerning implementation: should it not instead shrink to the smallest possible power-of-two capacity? I think "shrink to …
I don't think so. If I happen to have a capacity-257 vector, I don't want it to shrink to 256. That's a waste of cycles. That said, I'd certainly like to -- like in …
The initial implementation I'd propose here is something along the lines of `if (v.cap() - v.len()) > cmp::max(v.len(), RawVec::<T>::MIN_NON_ZERO_CAP)`, which, yes, is an inverse of the growth factor. (The growth factor is doubling, not rounding up to a power of two.)
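As a rough illustration of that condition (a sketch only: `MIN_NON_ZERO_CAP` is private to std's `RawVec`, so the constant below is an assumption; std's actual value depends on the element size):

```rust
use std::cmp;

// Stand-in for RawVec's private MIN_NON_ZERO_CAP. The value 8 is an
// assumption for illustration; std's real constant varies with the
// element size.
const MIN_NON_ZERO_CAP: usize = 8;

// The proposed predicate: shrink only when the excess capacity exceeds
// both the length and the minimum non-zero capacity, i.e. the inverse
// of the doubling growth factor.
fn should_shrink(len: usize, cap: usize) -> bool {
    (cap - len) > cmp::max(len, MIN_NON_ZERO_CAP)
}

fn main() {
    assert!(!should_shrink(129, 258)); // excess 129 is not > 129
    assert!(should_shrink(129, 259));  // excess 130 > 129: shrink
    assert!(!should_shrink(0, 8));     // tiny capacities never shrink
}
```

Note that the predicate alone costs one subtraction and one comparison, so it is cheap enough to run after every removal.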
I think users interested in using this function would want to control the strategy or load factor rather than trust the implementation choice. Something like this, with a float, integer ratio, or strategy enum argument type: `fn shrink_with_factor(&mut self, load_factor: f32) -> bool;`. Users may want to know if the resize operation was executed, so the function could return …
I agree with the problem statement and motivation. But it would be unfortunate to introduce a new easy way to be accidentally quadratic. The “beware of calling this in a loop that adds items” advice rules out some valid use cases, and is not essential for bounding excess capacity. I think it's desirable and easily possible to keep the same amortized time complexity for basically any usage pattern, even if you call …

Basically, just choose the threshold (function) for when to shrink the capacity so that there's always Θ(cap) distance between the length right after “normal” (not requested via `reserve`) growth and the length where it'll shrink again. For a fixed growth factor X, that means don't shrink until …
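To make that concrete, here is a sketch under assumed thresholds (these numbers are illustrative, not from the comment): with growth factor X = 2, shrink only when len ≤ cap/4 and shrink down to 2·len, so both the next growth and the next shrink are Θ(cap) operations away. Running the check after every single operation then costs only a logarithmic number of reallocations:

```rust
// Illustrative thresholds only (an assumption, not the author's exact
// rule): shrink when len <= cap/4, down to capacity 2*len, so the
// capacity stays within a constant factor of the length while the
// next realloc in either direction remains Θ(cap) operations away.
fn shrink_amortized<T>(v: &mut Vec<T>) {
    if v.capacity() >= 16 && v.len() <= v.capacity() / 4 {
        v.shrink_to(2 * v.len());
    }
}

fn main() {
    let mut v: Vec<u32> = Vec::new();
    let mut reallocs = 0usize;
    let mut last_cap = v.capacity();
    // Grow phase: run the check after every push.
    for i in 0..10_000 {
        v.push(i);
        shrink_amortized(&mut v);
        if v.capacity() != last_cap {
            reallocs += 1;
            last_cap = v.capacity();
        }
    }
    // Drain phase: run the check after every pop.
    while v.pop().is_some() {
        shrink_amortized(&mut v);
        if v.capacity() != last_cap {
            reallocs += 1;
            last_cap = v.capacity();
        }
    }
    // Roughly 2 * log2(10_000) capacity changes, not one per operation.
    assert!(reallocs < 40);
}
```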
I don't think so, because if you want to control it exactly then you already have all the functionality you need, between …

To me, the point of this is that it's the obvious "just do this after a `collect` or `retain` that you're not going to …
I think we have to be careful here and implement some hysteresis, making the shrinking a bit less aggressive than the growth logic, so that the capacity won't oscillate between two states around each breaking point.

TC noted in today's libs meeting that he'd call this after each work item to make the memory footprint more tightly follow what's needed, to avoid lingering bloat. So that would essentially be calling it in a loop, albeit an outer loop.
The wording doesn't seem quite right. If you only add items to a vec and don't remove anything, then shrinking should be a no-op and provide the same amortized O(1) behavior. So the effective behavior depends on the insert/remove/shrink patterns, and I don't think any of those would be O(n²), just decaying from O(1) to O(n)?
Agreed; can someone produce an example case where this would become accidentally quadratic?
By "quadratic" in the description I meant that the overall algorithm would be quadratic, not …

So something like a loop that does …
But in that case, it's not quadratic overall: the shrinking would happen once at most; the rest of the time the "capacity is double the length" condition is not met for the actual shrink. You would have to be pretty creative, maybe something like "pop length/2+1 times, shrink, push" repeatedly. If we want to defend against something like that, we'd want to leave some extra space behind sometimes. One option is to shrink to the smallest possible power of two. In many cases the growth of capacity follows powers of two thanks to the 2x growth factor, so the shrinking would reflect the growth.
what happens when you have

```rust
for _ in 0..1000 {
    v.shrink(); // now capacity == 16
    v.push(1);  // has to realloc so capacity == 32
    v.pop();    // back where we started
}
```

I think it might be better to shrink only if the capacity is more than twice the length, and not shrink if it's less than or equal to twice the length; that way you avoid bad cases like the above.
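A quick check of that suggestion (`shrink_if_excessive` here is a hypothetical free function standing in for the proposed method): with a strictly-more-than-twice rule, the loop above settles after at most one reallocation instead of reallocating every iteration.

```rust
// Hypothetical helper: shrink only when capacity strictly exceeds
// twice the length, so a push right after shrinking cannot force a
// reallocation.
fn shrink_if_excessive<T>(v: &mut Vec<T>) {
    if v.capacity() > 2 * v.len() {
        v.shrink_to_fit();
    }
}

fn main() {
    let mut v: Vec<u32> = Vec::with_capacity(16);
    v.extend(0..16);
    let mut reallocs = 0;
    for _ in 0..1000 {
        let before = v.capacity();
        shrink_if_excessive(&mut v); // never fires: cap <= 2 * len here
        v.push(1);
        v.pop();
        if v.capacity() != before {
            reallocs += 1;
        }
    }
    assert!(reallocs <= 1); // only the single initial doubling to 32
}
```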
for comparison: …
I don't disagree. To be clear, I would propose something like this:

```rust
if (cap - len) > cmp::max(len, MIN_NON_ZERO_CAP) {
    self.shrink_to(len.next_power_of_two());
}
```
I really don't see why a power of two is special here. And that proposed implementation has the problem that len=129, cap=259 shrinks to 256, which is a waste -- it's silly to reallocate just to drop the capacity by about 1%, especially in an API that exists to avoid such unhelpful reallocations. We can add slack (to use the word finger trees use for why they have 1..=4 element digits instead of just 2..=3) without needing to privilege any particular thresholds. (Like the BigUint example that was given.)
Yes, that's why I used …
In the “it's a collect that might have reduced an allocation” case, why not just use …

Edit: to be clear, the approach I'm emphasizing will also result in shrinking to 0 excess capacity whenever it does shrink. It just schedules the shrinks to avoid quadratic complexity in cases like the …
We discussed this during today's libs-API meeting. There was some skepticism about whether we can provide a method with the right heuristics to satisfy everyone's needs, since those needs may be workload-dependent. There was also the question of how common the problem is. So we'd like to know how often this comes up, what it is used for, and which heuristics were used when this was implemented manually.
one use case that may eventually switch to using this method:
|
After some more discussion, we'd like to know whether the goal of adding the shrink method is (a) to have some "just do the right thing" sort of convenience, or (b) about the issue that user code shouldn't rely on, or try to match, implementation details (the growth strategy), which we could also expose as something like …
It's not clear to me that “limit excess capacity without becoming accidentally quadratic” can be implemented with just …

Also, keep in mind that some use cases want to do the “shrink if excessive” check very frequently, e.g., after every operation that may remove elements from the vector. A built-in implementation can make this quite efficient because it can combine knowledge of the growth strategy with the leeway it has in picking the exact threshold for shrinking. For example, if the growth strategy is “double capacity but grow by at least 4 elements”, the shrink check could be something like …
Proposal
Problem statement
Stealing from steffahn in https://internals.rust-lang.org/t/vec-is-asymmetric-with-memory-handling/22449/31?u=scottmcm:
Motivating examples or use cases
After a `.collect()` or `.retain(…)`, where you know you're not going to be pushing for a while, so you'd like to not be wasting too much capacity, but aren't worried about it being exactly anything (since reallocating to save one or two items is probably a waste of time, and doesn't actually reduce memory usage).

Having something to write after the warning in https://doc.rust-lang.org/std/vec/struct.Vec.html#impl-FromIterator%3CT%3E-for-Vec%3CT%3E, as something like "You can call the ______ method after doing the operation if you want to avoid excessive remaining capacity".
Solution sketch
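A minimal sketch, assuming the hysteresis condition discussed in the comments (the method name, the extension-trait form, and the constant 8 are all assumptions; std would implement this on `Vec` directly with its real `MIN_NON_ZERO_CAP`):

```rust
use std::cmp;

// Hypothetical name and trait, for illustration only.
trait ShrinkIfExcessive {
    fn shrink_if_excessive(&mut self);
}

impl<T> ShrinkIfExcessive for Vec<T> {
    fn shrink_if_excessive(&mut self) {
        // Stand-in for RawVec's private MIN_NON_ZERO_CAP (assumed 8).
        const MIN_NON_ZERO_CAP: usize = 8;
        let (len, cap) = (self.len(), self.capacity());
        // Inverse of the doubling growth factor: reallocate only when
        // more than half the buffer (and more than the minimum
        // allocation) is unused.
        if cap - len > cmp::max(len, MIN_NON_ZERO_CAP) {
            self.shrink_to_fit();
        }
    }
}

fn main() {
    let mut v: Vec<u32> = (0..100).collect();
    v.truncate(10);
    v.shrink_if_excessive(); // 90 unused > max(10, 8): shrinks
    assert!(v.capacity() < 100);
    let cap = v.capacity();
    v.shrink_if_excessive(); // little excess now: no-op
    assert_eq!(v.capacity(), cap);
}
```

Because the guard makes a shrink a no-op until half the buffer is unused, a push immediately after a shrink cannot trigger the oscillation discussed above.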
Alternatives
`shrink` might be too short a name for something that can still be very costly if misused, so `shrink_to_reasonable` or `shrink_if_profitable` or something might be better. But at the same time, since it's less risky than a full `shrink_to_fit`, giving it a more convenient name might be better.

Links and related work
rust-lang/rust#120091
https://internals.rust-lang.org/t/vec-is-asymmetric-with-memory-handling/22449/31?u=scottmcm
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
Second, if there's a concrete solution: