fix: path mapping lru cache grows indefinitely #239
Merged



Fixes: #230
What was the problem/requirement? (What/Why)
We haven't seen this in the wild, and it's a fairly minimal risk for our usage (path mapping rules run on machines that already require lots of memory), but unbounded lru_caches aren't generally great and will increase memory usage over time.
What was the solution? (How)
Add a maxsize to at least bound the growth. 100,000 is a LOT of path mapping rule applications; the first entry is only evicted after the 100,001st unique path is mapped. As far as I understand, the actual memory footprint varies with the paths passed to and returned from the function, since the arguments are hashed into a key. Back-of-napkin math for a ~256 character path (i.e. reasonable) works out to around 30 MB (Python doesn't make this easy to measure).
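A minimal sketch of the shape of the change, assuming a hypothetical map_source_path helper standing in for the adaptor's real path-mapping function (the name and signature here are illustrative, not the actual API):

```python
import functools
import sys


@functools.lru_cache(maxsize=100_000)
def map_source_path(path: str) -> str:
    """Hypothetical stand-in for the adaptor's path-mapping function.

    With maxsize set, functools evicts the least-recently-used entry once a
    100,001st unique path arrives, so the cache can no longer grow unbounded.
    """
    # ... apply the configured path mapping rules here ...
    return path


# Rough back-of-napkin estimate: one ~256-character path string per cache entry.
per_entry = sys.getsizeof("x" * 256)                  # ~305 bytes in CPython
print(f"{per_entry * 100_000 / 1_000_000:.0f} MB")    # ~30 MB
```

The estimate only counts the raw string storage; the cache's own dict and linked-list bookkeeping adds some overhead on top of that.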
path_mapping_rules could probably just be capped at something like 128 (the default when maxsize is not specified), since there shouldn't be a reason to continually request that info. An ideal follow-up would track cache usage to see how much is actually being used and adjust; see the sketch after this paragraph. But it's super easy to add this large upper bound and iterate from there.
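As a sketch of that follow-up: functools.lru_cache already exposes cache_info() counters that could be logged to right-size both caches. The path_mapping_rules stand-in below is illustrative, not the real signature:

```python
import functools


@functools.lru_cache(maxsize=128)  # lru_cache's default size, the suggested cap for rule lookups
def path_mapping_rules() -> list:
    """Hypothetical stand-in for the rule-lookup function; real signature may differ."""
    return []


path_mapping_rules()
path_mapping_rules()

# cache_info() gives the hit/miss/size counters a follow-up could track over time.
info = path_mapping_rules.cache_info()
print(f"hits={info.hits} misses={info.misses} currsize={info.currsize}/{info.maxsize}")
```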
What is the impact of this change?
The memory footprint of the adaptor will no longer grow indefinitely due to the lru_cache.
How was this change tested?
hatch run test
Was this change documented?
N/A
Is this a breaking change?
I'd consider it not breaking. Everything continues to work. If a user saw a massive increase in cache misses from this change, they'd see some slowdown while the client requests the path from the background adaptor via IPC. But at that point they were already missing the cache a lot, unless the workload then went back to files it had previously mapped.
Does this change impact security?
Nope
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.