
fix: correct KV cache memory stats for K/V metadata and fp16 baseline #53

Open
dipeshbabu wants to merge 1 commit into TheTom:main from dipeshbabu:fix/kv-cache-memory-stats-accounting

Conversation

@dipeshbabu

Fix KVCacheCompressor.memory_stats() so it matches the actual storage layout used by compress().

The previous implementation undercounted memory in two ways:

  • it treated the original fp16 baseline as a single tensor instead of combined K+V
  • it omitted stored norms from the compressed accounting, including the V-side norm and one of the K-side norms

What changed

  • count original KV cache size as fp16 K + fp16 V
  • count K compressed storage as:
    • d * k_bits
    • vector_norm
    • residual_norm
  • count V compressed storage as:
    • d * v_bits
    • vector_norm
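The accounting above can be sketched as follows. This is an illustrative reconstruction, not the repository's actual `memory_stats()` code: the function name, the assumption that quantized codes are bit-packed, and the assumption that each stored norm is an fp16 scalar (2 bytes) are all mine, not confirmed by the PR.

```python
FP16_BYTES = 2  # assumed size of the fp16 baseline elements and stored norms

def kv_cache_memory_stats(head_dim, k_bits, v_bits,
                          seq_len=1, num_layers=1, num_heads=1):
    """Return (original_bytes, compressed_bytes) per the PR's accounting.

    Hypothetical helper for illustration; not the repository's API.
    """
    tokens = seq_len * num_layers * num_heads

    # Original baseline: fp16 K + fp16 V (the old code counted only one tensor).
    original = tokens * 2 * head_dim * FP16_BYTES

    # K compressed storage: d * k_bits of packed codes + vector_norm + residual_norm.
    k_bytes = head_dim * k_bits / 8 + 2 * FP16_BYTES
    # V compressed storage: d * v_bits of packed codes + vector_norm only.
    v_bytes = head_dim * v_bits / 8 + 1 * FP16_BYTES

    compressed = tokens * (k_bytes + v_bytes)
    return original, compressed
```

Under these assumptions, the regression-test configuration (head_dim=128, k_bits=3, v_bits=3, everything else 1) gives an original size of 2 * 128 * 2 = 512 bytes against 52 + 50 = 102 compressed bytes.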

Tests

Added an exact regression test in tests/test_kv_cache.py that checks the byte math for:

  • head_dim=128
  • k_bits=3
  • v_bits=3
  • seq_len=1
  • num_layers=1
  • num_heads=1

Also verified the full KV cache test file passes.

@TheTom
Owner

TheTom commented Apr 2, 2026

Hey there. Thank you for the contribution. I'll be getting to them. I apologize for the delay.

