perf: optimize memory usage in denoised-rows query by conditionally including processedRows #1177
base: main
Conversation
🦋 Changeset detected. Latest commit: 10560c9. The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages.
Stably Runner - Test Suite - 'Smoke Test': 🔴 Failure (1/4 tests failed). This comment was generated from stably-runner-action.
```ts
  }
  return undefined;
},
gcTime: isLive ? ms('30s') : ms('5m'), // more aggressive gc for live data, since it can end up holding lots of data
```
Oh, this is neat. Theoretically we don't need to keep the old page if live tail is enabled.
```ts
denoiseResults,
// Only include processed rows if denoising is enabled.
// This helps prevent the queryKey from getting extremely large
// and causing memory issues when it's not used.
...(denoiseResults ? [processedRows] : []),
```
I'm scratching my head as to why this is relevant. Even if denoise is disabled, react-query still tries to cache the key that blows up the memory?
Yeah, exactly! You'd think the `enabled` flag being false would turn all this off. I can file a bug in the TanStack Query library when I get back from vacation (and see if one already exists) if we want to help provide feedback.
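For concreteness, a minimal sketch of the pattern under discussion, relying on TanStack Query's behavior of hashing the queryKey (a stable `JSON.stringify` by default) to index the cache even when `enabled` is false. The hook wrapper, `Row` type, and `denoiseRows` helper are illustrative stand-ins, not names from the diff:

```ts
import { useQuery } from '@tanstack/react-query';

// Illustrative stand-ins; not names from the PR.
type Row = Record<string, unknown>;
declare function denoiseRows(rows: Row[]): Row[];

function useDenoisedRows(processedRows: Row[], denoiseResults: boolean) {
  return useQuery({
    // Even with `enabled: false`, the queryKey is still hashed to index the
    // cache entry, so a huge `processedRows` array in the key gets
    // re-stringified every time it changes.
    queryKey: [
      'denoised-rows',
      denoiseResults,
      // The conditional spread keeps the large array out of the key (and
      // out of the stringified hash) whenever denoising is off.
      ...(denoiseResults ? [processedRows] : []),
    ],
    queryFn: () => denoiseRows(processedRows),
    enabled: denoiseResults,
  });
}
```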
Perf nit: instead of passing all the rows, one idea is to generate an ID for `processedRows`. For example, we can sample the rows at a fixed step and then compute a hash from those samples.
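A hedged sketch of what that could look like; the step size and the FNV-1a hash are illustrative choices, not anything specified in this thread:

```ts
// Sample every `step`-th row and compute a cheap synchronous hash (FNV-1a)
// over the samples, so the queryKey carries a short string instead of the
// full array.
function sampleRowsHash(rows: unknown[], step = 1000): string {
  let hash = 2166136261; // FNV-1a 32-bit offset basis
  for (let i = 0; i < rows.length; i += step) {
    const sample = JSON.stringify(rows[i]);
    for (let j = 0; j < sample.length; j++) {
      hash ^= sample.charCodeAt(j);
      hash = Math.imul(hash, 16777619); // FNV-1a 32-bit prime
    }
  }
  // Fold in the length so inserts between samples still change the key.
  return `${rows.length}:${(hash >>> 0).toString(16)}`;
}
```

The queryKey entry would then be something like `sampleRowsHash(processedRows)` instead of `processedRows` itself; the trade-off is that a change to an unsampled row (with the count unchanged) won't invalidate the key.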
Ah, good idea. I spent a large portion of time messing around with their `queryKeyHashFn`, but I couldn't get a clean & fast solution on the full `processedRows` dataset. If we're sampling the results and generating a hash on those, then I can definitely improve the perf on that front; this should reduce memory when denoising is enabled, too. Let me take a look at this next week when I get back from vacation!
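For reference, a sketch of the constraint being described, using TanStack Query's `queryKeyHashFn` option; the query body is a placeholder and the digest helper is illustrative:

```ts
import { useQuery, type QueryKey } from '@tanstack/react-query';

function useDenoisedRowsQuery(processedRows: unknown[]) {
  return useQuery({
    queryKey: ['denoised-rows', processedRows],
    queryFn: async () => null, // placeholder
    // queryKeyHashFn must return a string synchronously; stringifying the
    // full row set here is what proved too slow in practice.
    queryKeyHashFn: (key: QueryKey) => JSON.stringify(key),
  });
}

// The fast browser-native digest can't be dropped in, because it's async
// and queryKeyHashFn doesn't accept a Promise:
async function sha256(s: string): Promise<string> {
  const buf = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(s));
  return Array.from(new Uint8Array(buf), (b) => b.toString(16).padStart(2, '0')).join('');
}
```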
I think we don't need to go too deep optimizing this path. My preference: if scaling issues continue to come up for this feature, we re-evaluate pushing denoising down into the ClickHouse query itself, as opposed to making incremental improvements to the current implementation.
When the search row limit is set very high (e.g. the max of 100k), the app quickly consumes all available memory and crashes.

This adds some improvements to help mitigate the problem:

- `queryKey`: the key is generating a ton of extra entries every time `processedRows` changes (which is every 5s when in live mode). The queryKey and result are cached regardless of whether `enabled` is true or false. The base hashFn strategy is to stringify the objects, which creates a very large string to be stored in memory. I tried to fix this by providing a custom `queryKeyHashFn` to `useQuery`, but it was too slow, and the faster browser-based hashing fns return a promise, which isn't supported by `useQuery` at this time. The easiest solution I found was to short-circuit the hash generation when we are not denoising.
- `gcTime`: we already set `gcTime` in `useOffsetPaginatedQuery`, so I added that field here too. This helps keep memory usage lower while denoising rows (but memory is still much higher).

The app still uses very high memory, just from the sheer number of rows being captured and processed, but it doesn't crash anymore. There are definitely further optimizations we could make to reduce this. One solution that comes to mind is storing a hash/unique id of each row server-side before sending it to the client; our app could then leverage this key instead of a stringified object.
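A rough sketch of that last idea; the `id` field and helper name are hypothetical, and nothing here is implemented in this PR:

```ts
// Assumes the server stamps a stable `id` on each row before sending it.
type RowWithId = { id: string } & Record<string, unknown>;

// Derive a tiny queryKey component from the ids instead of the rows.
// (count + first/last id is a heuristic; a rolling hash over all ids
// would be stricter about detecting changes in the middle of the page.)
function rowsCacheKey(rows: RowWithId[]): string {
  const first = rows[0]?.id ?? 'empty';
  const last = rows[rows.length - 1]?.id ?? 'empty';
  return `${rows.length}:${first}:${last}`;
}
```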
Before (after 1 min):

After (after 5 mins):

Fixes: HDX-2409