
Conversation

brandon-pereira (Contributor) commented Sep 17, 2025

When the search row limit is set very high (e.g. the max of 100k), the app quickly consumes all available memory and crashes.

This adds some improvements to help mitigate the problem:

  1. QueryKey issues - The queryKey was generating a ton of extra entries every time processedRows changed (which is every 5s in live mode), and the queryKey and result are cached regardless of whether enabled is true or false. The default hashFn strategy is to stringify the key objects, which creates a very large string held in memory. I tried to fix this by providing a custom queryKeyHashFn to useQuery, but it was too slow, and the faster browser-based hashing functions return a promise, which useQuery doesn't support at this time. The easiest solution I found was to short-circuit the hash generation when we are not denoising (see the sketch after this list).
  2. Sync gcTime - We already set gcTime in useOffsetPaginatedQuery, so I added that field here too. This helps keep memory usage lower while denoising rows (though memory usage is still much higher).
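
A minimal, condensed sketch of both changes (the queryKey label and fetchResults are placeholders, not the exact code from this PR; denoiseResults, processedRows, and isLive come from the surrounding component scope; the conditional spread and gcTime values mirror the diff further down):

```ts
import { useQuery } from '@tanstack/react-query';
import ms from 'ms';

const results = useQuery({
  queryKey: [
    'search-results', // placeholder label
    denoiseResults,
    // Short-circuit: only feed the (huge) processedRows into the key when
    // denoising actually needs them, so the default stringify-based hashFn
    // never has to serialize them otherwise.
    ...(denoiseResults ? [processedRows] : []),
  ],
  queryFn: fetchResults, // placeholder
  // Match useOffsetPaginatedQuery: gc cached data aggressively in live
  // mode, since live tail can accumulate a lot of rows.
  gcTime: isLive ? ms('30s') : ms('5m'),
});
```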

The app still uses a lot of memory, simply from the sheer number of rows being captured and processed, but it no longer crashes. There are definitely further optimizations we could make to reduce this. One solution that comes to mind is storing a hash/unique id of each row server-side before sending it to the client; the client could then key off that id instead of a stringified object (a rough sketch follows).
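
A rough, hypothetical sketch of that server-side idea (the function name withRowIds and the _rowId field are invented for illustration, not part of this PR):

```ts
import { createHash } from 'node:crypto';

// Attach a stable id to each row before it is sent to the client, so the
// client can build cache keys from small ids instead of stringified rows.
function withRowIds<T extends object>(rows: T[]): Array<T & { _rowId: string }> {
  return rows.map(row => ({
    ...row,
    // A truncated sha1 of the serialized row; 16 hex chars is plenty to
    // disambiguate rows for cache-key purposes.
    _rowId: createHash('sha1')
      .update(JSON.stringify(row))
      .digest('hex')
      .slice(0, 16),
  }));
}
```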

Before (after 1 min): [screenshot]

After (after 5 mins): [screenshot]

Fixes: HDX-2409

changeset-bot (bot) commented Sep 17, 2025

🦋 Changeset detected

Latest commit: 10560c9

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 2 packages:

| Name         | Type  |
| ------------ | ----- |
| @hyperdx/app | Patch |
| @hyperdx/api | Patch |


vercel (bot) commented Sep 17, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project            | Deployment | Preview | Comments | Updated (UTC)        |
| ------------------ | ---------- | ------- | -------- | -------------------- |
| hyperdx-v2-oss-app | Ready      | Preview | Comment  | Sep 17, 2025 10:12pm |

github-actions (bot) commented Sep 17, 2025

Stably Runner - Test Suite - 'Smoke Test'

Test Suite Run Result: 🔴 Failure (1/4 tests failed) [dashboard]

Failed Tests:


This comment was generated from stably-runner-action

```ts
  }
  return undefined;
},
gcTime: isLive ? ms('30s') : ms('5m'), // more aggressive gc for live data, since it can end up holding lots of data
```
wrn14897 (Member) commented Sep 17, 2025

oh this is neat. theoretically we don't need to keep the old page if live tail is enabled

Comment on lines +1271 to +1275

```ts
denoiseResults,
// Only include processed rows if denoising is enabled
// This helps prevent the queryKey from getting extremely large
// and causing memory issues, when it's not used.
...(denoiseResults ? [processedRows] : []),
```
Member commented:

I'm scratching my head as to why this is relevant. Even if denoise is disabled, react-query still tries to cache the key that blows up the memory?

brandon-pereira (Contributor, author) replied:

Yeah, exactly!

You'd think the enabled flag being false would turn all this off. When I get back from vacation I can check whether a bug already exists in the TanStack Query repo and file one if not, if we want to provide that feedback upstream.
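
For context, a minimal repro sketch of that behavior (variable names are placeholders): React Query still hashes the queryKey to identify the query even when enabled is false, so the stringified rows end up in memory regardless.

```ts
import { useQuery } from '@tanstack/react-query';

// `hugeProcessedRows` is stringified by the default hashFn on every
// render to identify this query, even though `enabled: false` means
// fetchDenoisedRows (a placeholder) is never actually called.
const q = useQuery({
  queryKey: ['denoise', hugeProcessedRows],
  queryFn: fetchDenoisedRows,
  enabled: false,
});
```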

Comment on lines +1271 to +1275 (same snippet as above)
Member commented:

perf nit: instead of passing all the rows, one idea is to generate an ID for processedRows. For example, we could take a fixed step to sample the rows and then compute a hash from those samples (see the sketch below).
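
For illustration, a minimal sketch of that sampling idea (sampledRowsKey, the Row type, the step size, and the FNV-1a hash are assumptions, not code from this PR; the hash is synchronous, which matters because async hashes like crypto.subtle.digest can't serve as a queryKeyHashFn):

```ts
type Row = Record<string, unknown>; // illustrative row shape

// Sample every `step`-th row and fold the samples into a small 32-bit
// FNV-1a hash, so the queryKey carries a short string instead of the
// entire processedRows array.
function sampledRowsKey(rows: Row[], step = 100): string {
  let hash = 0x811c9dc5; // FNV-1a offset basis
  for (let i = 0; i < rows.length; i += step) {
    const s = JSON.stringify(rows[i]);
    for (let j = 0; j < s.length; j++) {
      hash ^= s.charCodeAt(j);
      hash = Math.imul(hash, 0x01000193); // FNV-1a prime
    }
  }
  // Fold in the total row count so rows added between samples still
  // change the key.
  return `${rows.length}:${(hash >>> 0).toString(16)}`;
}

// Usage in the queryKey (sketch):
//   ...(denoiseResults ? [sampledRowsKey(processedRows)] : []),
```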

brandon-pereira (Contributor, author) replied Sep 17, 2025:

Ah, good idea. I spent a large portion of my time messing around with their queryKeyHashFn and couldn't get a clean, fast solution on the full processedRows dataset, but if we sample the results and hash just those, I can definitely improve the perf on that front. This should reduce memory when denoising is enabled.

Let me take a look at this next week when I get back from vacation!

Contributor commented:

I don't think we need to go too deep optimizing this path. My preference is that if scaling issues continue to come up for this feature, we re-evaluate pushing denoising down into the ClickHouse query itself, as opposed to making incremental improvements to the current implementation.
