fix: remove bank_id from OTel metric attributes to prevent unbounded memory growth #857
Open
octo-patch wants to merge 1 commit into vectorize-io:main from
…memory growth
bank_id is a per-user high-cardinality label that caused unbounded OTel histogram growth in-process. Each unique bank_id creates a permanent time series in OTel's in-memory aggregation buffers, leading to ~3 GB memory growth after 15 days with ~50 users.
Fixes vectorize-io#850
Fixes #850
Problem
MetricsCollector.record_operation() includes bank_id as an attribute in OpenTelemetry histogram and counter recordings. Since bank_id is a per-user value (e.g., user-123), every unique user creates a permanent, never-evicted time series in the OTel SDK's in-memory aggregation buffers, causing unbounded memory growth proportional to unique_users × operations × budgets × statuses.
After 15 days of normal usage with ~50 users, this was observed to cause a ~3 GB physical memory footprint due to millions of never-freed allocations in MALLOC_SMALL.
Solution
Remove bank_id from metric attributes in record_operation(). The bank_id parameter is kept in the method signature for API compatibility, but is no longer added to the OTel attributes dict. bank_id belongs in tracing spans (which are exported and evicted), not in metrics (which accumulate in process for the lifetime of the SDK).
Updated the test assertion to verify bank_id is NOT present in the attributes.
Testing
Updated test_metrics.py to assert "bank_id" not in attributes
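
To make the cardinality argument concrete, here is a minimal, stdlib-only sketch. It is not the project's code: build_attributes, count_series, the attribute keys ("operation", "budget", "status"), and the sample operation, budget, and status values are all hypothetical stand-ins. It assumes only what the description states, namely that each distinct attribute set becomes one retained time series, and it does not model the memory used per series.

```python
from itertools import product


def build_attributes(operation, bank_id, budget, status, include_bank_id=False):
    """Build the attribute dict for one metric recording (hypothetical helper).

    bank_id stays in the signature, mirroring the API-compatibility choice in
    this PR, but is only added when include_bank_id is True (the old behaviour).
    """
    attributes = {"operation": operation, "budget": budget, "status": status}
    if include_bank_id:
        attributes["bank_id"] = bank_id
    return attributes


def count_series(include_bank_id, users=50):
    """Count distinct attribute sets, i.e. the series an aggregator would retain."""
    # Hypothetical label values, chosen only to illustrate the product.
    operations = ["retain", "recall"]
    budgets = ["low", "mid", "high"]
    statuses = ["success", "error"]
    bank_ids = [f"user-{i}" for i in range(users)]

    seen = set()
    for bank_id, op, budget, status in product(bank_ids, operations, budgets, statuses):
        attrs = build_attributes(op, bank_id, budget, status, include_bank_id)
        # A frozenset of items is hashable, so it can identify one series.
        seen.add(frozenset(attrs.items()))
    return len(seen)


with_bank_id = count_series(include_bank_id=True)      # grows with every new user
without_bank_id = count_series(include_bank_id=False)  # fixed, independent of users
print(with_bank_id, without_bank_id)
```

With bank_id included, the series count scales with the number of users; without it, the count depends only on the bounded label sets, which is the property the updated test assertion guards.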