Memory leak: bank_id as metric label causes unbounded OTel histogram growth (~3 GB after 15 days)

### Bug Description

`MetricsCollector.record_operation()` in `hindsight_api/metrics.py` (line ~333) includes `bank_id` as an attribute in OpenTelemetry histogram and counter recordings. Because `bank_id` is a per-user value (e.g., `user-123`), every unique user creates a time series in the OTel SDK's in-memory aggregation buffers that is never evicted.

Over time, this causes unbounded memory growth proportional to `unique_users × operations × budgets × statuses`.

### Steps to Reproduce

1. Run `hindsight-api` with default configuration (metrics enabled)
2. Issue recall/retain/reflect requests for multiple distinct `bank_id` values
3. Observe memory growth via `vmmap --summary <pid>` (macOS) or `/proc/<pid>/smaps` (Linux)

After 15 days of normal usage on a dev instance with ~50 users, we observed:

- Physical footprint: 3.1 GB (peak 3.2 GB)
- `MALLOC_SMALL`: 1.7 GB virtual, 1.6 GB dirty, 15.2 million allocations
- RSS reported by `ps` was only ~15 MB because macOS compressed the allocations; as a result, PM2's `max_memory_restart` (RSS-based) never triggered
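
For step 3 on Linux, the per-mapping entries in `/proc/<pid>/smaps` can be totalled with a few lines of Python. This is a minimal sketch, not part of `hindsight-api`; the function names `summarize_smaps` and `read_smaps` are made up for illustration. It also sums `Swap`, since memory that has been swapped out does not show up in RSS.

```python
import re
from pathlib import Path

# Matches lines such as "Rss:                 44 kB" (but not "SwapPss:").
_FIELD = re.compile(r"^(Rss|Pss|Swap):\s+(\d+) kB$")


def summarize_smaps(text: str) -> dict[str, int]:
    """Sum the Rss, Pss, and Swap fields (in kB) across all mappings."""
    totals = {"Rss": 0, "Pss": 0, "Swap": 0}
    for line in text.splitlines():
        match = _FIELD.match(line)
        if match:
            totals[match.group(1)] += int(match.group(2))
    return totals


def read_smaps(pid: int) -> dict[str, int]:
    """Read and summarize /proc/<pid>/smaps (Linux only)."""
    return summarize_smaps(Path(f"/proc/{pid}/smaps").read_text())
```

Calling `read_smaps(pid)` at intervals while sending requests for new `bank_id` values should show whether the totals keep climbing.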

**Root cause**

```python
# hindsight_api/metrics.py, MetricsCollector.record_operation()
attributes = {
    "operation": operation,
    "bank_id": bank_id,      # <-- HIGH CARDINALITY: one series per user
    "source": source,
    "tenant": _get_tenant(),
}
```

The OTel SDK's `ExplicitBucketHistogramAggregation` stores a full bucket array per unique attribute set. With the default 16 histogram buckets, each unique `{operation, bank_id, source, tenant, budget, max_tokens, success}` tuple allocates ~400 bytes that are never freed. The combinatorial explosion creates millions of allocations.
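
The scaling can be illustrated with a toy model. The sketch below is not the OTel SDK's code: `SeriesStore` and `simulate` are hypothetical names, and the store only mimics the idea of keeping one bucket array per unique attribute set. It compares the number of stored series with and without `bank_id` as a label.

```python
from itertools import product


class SeriesStore:
    """Toy stand-in for per-attribute-set aggregation state (not the OTel SDK)."""

    def __init__(self, bucket_count: int = 16) -> None:
        self.bucket_count = bucket_count
        self._series: dict[frozenset, list[int]] = {}

    def record(self, attributes: dict[str, str]) -> None:
        # One bucket array is created per unique attribute set and never removed.
        key = frozenset(attributes.items())
        buckets = self._series.setdefault(key, [0] * self.bucket_count)
        buckets[0] += 1  # which bucket is hit does not matter for cardinality

    def series_count(self) -> int:
        return len(self._series)


def simulate(bank_ids, operations, statuses, include_bank_id: bool) -> int:
    """Record one measurement per combination and return the series count."""
    store = SeriesStore()
    for bank_id, operation, status in product(bank_ids, operations, statuses):
        attrs = {"operation": operation, "success": status}
        if include_bank_id:
            attrs["bank_id"] = bank_id
        store.record(attrs)
    return store.series_count()


banks = [f"user-{i}" for i in range(1000)]
ops = ["recall", "retain", "reflect"]
statuses = ["true", "false"]

with_label = simulate(banks, ops, statuses, include_bank_id=True)
without_label = simulate(banks, ops, statuses, include_bank_id=False)
print(with_label, without_label)  # 6000 6
```

With the label, the series count scales with the number of distinct `bank_id` values; without it, the count stays fixed at the product of the low-cardinality dimensions.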

**Suggested fix**

Remove `bank_id` from metric attributes. It belongs in tracing spans (which are exported and evicted), not in metrics (which accumulate in-process for the lifetime of the SDK).

```python
attributes = {
    "operation": operation,
    # bank_id removed: high-cardinality label causes unbounded memory growth
    "source": source,
    "tenant": _get_tenant(),
}
```
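
One way to keep `bank_id` available for tracing while excluding it from metrics is to split the attributes before recording. This is a sketch under assumptions: `HIGH_CARDINALITY_KEYS` and `split_attributes` are hypothetical names, not existing `hindsight-api` code. The span half could then be attached to the active span with the OpenTelemetry API's `trace.get_current_span().set_attribute(key, value)`, which is not called here so that the example runs without the SDK installed.

```python
# Hypothetical denylist of labels that must never reach the metrics pipeline.
HIGH_CARDINALITY_KEYS = frozenset({"bank_id"})


def split_attributes(attributes: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Return (metric_attrs, span_attrs) with high-cardinality keys separated."""
    metric_attrs = {
        key: value
        for key, value in attributes.items()
        if key not in HIGH_CARDINALITY_KEYS
    }
    span_attrs = {
        key: value
        for key, value in attributes.items()
        if key in HIGH_CARDINALITY_KEYS
    }
    return metric_attrs, span_attrs


metric_attrs, span_attrs = split_attributes(
    {"operation": "recall", "bank_id": "user-123", "source": "api"}
)
# metric_attrs is what would be passed to the histogram/counter;
# span_attrs is what would be set on the current tracing span.
```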

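If per-bank observability is needed in metrics, consider a bounded approach such as hashing `bank_id` into a small, fixed number of buckets (for example, 64). Note that Python's built-in `hash()` is salted per process for strings by default, so `str(hash(bank_id) % 64)` would assign the same `bank_id` to a different bucket after each restart; a stable digest (for example, from `zlib.crc32` or `hashlib`) keeps the label consistent.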
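
A minimal sketch of such a bucketing function, assuming a SHA-256 digest and 64 buckets. The function name `bank_id_bucket` and the bucket count are illustrative choices, not existing `hindsight-api` code.

```python
import hashlib


def bank_id_bucket(bank_id: str, buckets: int = 64) -> str:
    """Map a bank_id to one of `buckets` stable label values.

    Uses SHA-256 instead of the built-in hash() so the result is the same
    across processes and restarts.
    """
    digest = hashlib.sha256(bank_id.encode("utf-8")).digest()
    # The first 8 bytes are plenty to spread IDs across a few dozen buckets.
    return str(int.from_bytes(digest[:8], "big") % buckets)


# The attribute would then be added as, for example:
#   attributes["bank_id_bucket"] = bank_id_bucket(bank_id)
```

This caps the label at 64 distinct values regardless of how many users exist. The trade-off is that several users share each bucket, so the metric can point to a slice of users but not to a single one.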

**Environment**

- hindsight-api-slim v0.4.18
- Python 3.13.3
- macOS 26.1 (ARM64)
- OpenTelemetry SDK (via opentelemetry-sdk)

**Workaround**

Patch `hindsight_api/metrics.py` locally to remove `bank_id` from the attributes dict in `record_operation()`.

### Expected Behavior

`hindsight-api` should maintain stable memory usage over time when serving a fixed number of users. The OTel metrics subsystem should use bounded, low-cardinality labels so that memory consumption is proportional to the number of distinct metric dimensions (operation types, sources, statuses) rather than the number of distinct users.

### Actual Behavior

`MetricsCollector.record_operation()` includes `bank_id` (a per-user value such as `user-123`) as an OpenTelemetry metric attribute. Every unique user creates time series in the OTel SDK's in-memory histogram aggregation buffers that are never evicted, so memory grows linearly with the number of distinct users seen over the process lifetime.

### Version

0.4.18

### LLM Provider

None
