Releases: AutoCookies/pomaicache
Pomai Cache V1.3.1 Release
Remove Palloc
Pomai Cache V1.3.0
Add token cache
Pomai Cache V1.2.0 Release
Uses custom HTTP handling and palloc for memory allocation; single-threaded
Pomai Cache V1.1.0 Release
Pomai Cache v1.1.0 Release Notes
Release Date: February 28, 2026
Tag: pomaicache-v1.1.0
Codename: AI-First
Summary
Pomai Cache v1.1.0 transforms the project from a Redis-compatible KV cache into a purpose-built AI-first cache for inference pipelines. This release adds semantic similarity search, pipeline-aware cascade invalidation, token-economics-aware eviction, streaming response caching, a compression/quantization engine, a Python SDK, and comprehensive benchmarks -- all implemented as zero-dependency native C++.
New Features
1. Semantic Similarity Cache
- `VectorIndex` with brute-force ANN search supporting Cosine, L2, and DotProduct distance metrics
- SSE2 SIMD-accelerated distance computation (~0.7 GFLOP/s single-thread)
- Int8 vector quantization with 4x memory reduction (MAE < 0.0005/dim)
- Float16 quantization with 2x memory reduction (MAE < 0.00003/dim)
- New commands: `AI.SIM.PUT`, `AI.SIM.GET`
- Enables cache hits on paraphrased prompts -- benchmarks show 87-100% hit rates where exact-match caches get 0%
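The brute-force cosine scan that powers this kind of similarity lookup can be sketched in a few lines of NumPy. This is illustrative only -- the actual `VectorIndex` is SSE2-accelerated C++ with optional quantization:

```python
import numpy as np

def cosine_search(index: np.ndarray, query: np.ndarray, top_k: int = 1):
    """Brute-force ANN scan: normalize rows once, then a single
    matrix-vector product yields every cosine similarity at once."""
    norms = np.linalg.norm(index, axis=1, keepdims=True)
    sims = (index / norms) @ (query / np.linalg.norm(query))
    order = np.argsort(-sims)[:top_k]          # best matches first
    return [(int(i), float(sims[i])) for i in order]

# Toy index of three 2-d vectors; the query is closest to row 0.
index = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
print(cosine_search(index, np.array([0.9, 0.1])))  # top hit is row 0
```

A similarity cache accepts the top hit only when its score clears a threshold; below it, the request falls through to real inference.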
2. Pipeline-Aware Cascade Invalidation
- `DepGraph` (directed acyclic graph) tracks parent-child artifact dependencies
- Invalidating a parent embedding automatically cascades to all downstream responses, rerank buffers, and RAG chunks
- New commands: `AI.PUT ... DEPENDS_ON parent1 parent2`, `AI.INVALIDATE CASCADE <key>`
- Prevents stale derived artifacts from surviving source invalidation
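The cascade semantics amount to a breadth-first walk over the dependency DAG. A toy sketch (not the native `DepGraph` code; the key names are hypothetical, following the `emb`/`rsp`/`rrk` prefixes mentioned later in these notes):

```python
from collections import defaultdict, deque

class DepGraph:
    """Toy parent -> children DAG: invalidating a key dooms every
    artifact transitively derived from it."""
    def __init__(self):
        self.children = defaultdict(set)

    def depends_on(self, child: str, *parents: str) -> None:
        for p in parents:
            self.children[p].add(child)

    def invalidate_cascade(self, key: str) -> set:
        doomed, queue = {key}, deque([key])
        while queue:                       # BFS over downstream edges
            for child in self.children[queue.popleft()]:
                if child not in doomed:
                    doomed.add(child)
                    queue.append(child)
        return doomed

g = DepGraph()
g.depends_on("rsp:answer", "emb:doc1")     # response derived from embedding
g.depends_on("rrk:buffer", "rsp:answer")   # rerank buffer derived from response
print(sorted(g.invalidate_cascade("emb:doc1")))
# ['emb:doc1', 'rrk:buffer', 'rsp:answer']
```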
3. Token-Economics-Aware Eviction
- Extended `ArtifactMeta` with `inference_tokens`, `inference_latency_ms`, `dollar_cost`
- `PomaiCostPolicy` now factors inference cost into eviction benefit scoring
- Budget management: set a $/hour spend cap and the cache prioritizes high-cost artifacts
- New commands: `AI.COST.REPORT`, `AI.BUDGET <dollars_per_hour>`
- Cost tracking reports total dollars saved, tokens saved, and latency saved
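The notes don't publish the actual scoring formula, so the following is a hypothetical illustration of how inference cost might fold into a per-byte eviction benefit. The `$0.01` per second of saved latency is an invented weight, not a project constant:

```python
def eviction_benefit(dollar_cost: float, inference_latency_ms: float,
                     p_reuse: float, size_bytes: int) -> float:
    """Hypothetical benefit score: expected dollars (inference cost plus
    a latency penalty) saved per byte retained. Higher scores survive
    eviction longer."""
    latency_dollars = inference_latency_ms / 1000.0 * 0.01  # assumed weight
    return p_reuse * (dollar_cost + latency_dollars) / max(size_bytes, 1)

# A cheap, rarely reused artifact scores far below a costly, hot one
# of the same size, so the expensive one is kept.
cold = eviction_benefit(0.0001, 5, 0.05, 4096)
hot = eviction_benefit(0.02, 800, 0.6, 4096)
print(hot > cold)  # True
```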
4. Streaming & Chunked Response Caching
- Incrementally store LLM streaming responses as they arrive token-by-token
- Retrieve partial or completed streams
- New commands: `AI.STREAM.BEGIN`, `AI.STREAM.APPEND`, `AI.STREAM.END`, `AI.STREAM.GET`
- Prevents re-inference when identical streaming requests arrive concurrently
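The begin/append/end/get lifecycle can be illustrated with a minimal in-memory sketch (not the server implementation):

```python
class StreamCache:
    """Toy model of token-by-token response caching: chunks accumulate
    under a key, and readers may fetch a partial stream before it ends."""
    def __init__(self):
        self._chunks: dict[str, list[str]] = {}
        self._done: set[str] = set()

    def begin(self, key: str) -> None:
        self._chunks[key] = []

    def append(self, key: str, token: str) -> None:
        self._chunks[key].append(token)

    def end(self, key: str) -> None:
        self._done.add(key)

    def get(self, key: str) -> tuple[str, bool]:
        # Returns the stream so far and whether it is complete.
        return "".join(self._chunks.get(key, [])), key in self._done

cache = StreamCache()
cache.begin("rsp:abc")
cache.append("rsp:abc", "Hello, ")
print(cache.get("rsp:abc"))    # ('Hello, ', False) -- partial read
cache.append("rsp:abc", "world")
cache.end("rsp:abc")
print(cache.get("rsp:abc"))    # ('Hello, world', True)
```

A second identical request arriving mid-stream can attach to the partial entry instead of triggering a duplicate inference.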
5. Compression & Quantization Engine
- `CompressionEngine` with Run-Length Encoding (RLE) and Delta compression
- Auto-selects the best compression strategy per payload
- Float32-to-Float16 and Float32-to-Int8 embedding quantization
- Prefix deduplication for prompt families
- Compression ratio tracked in `AI.STATS`
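Two of the techniques above are simple enough to sketch: RLE and symmetric Int8 quantization. These are illustrative only -- the native `CompressionEngine` also handles Delta, Float16, and prefix deduplication:

```python
import numpy as np

def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Run-length encode as (byte, count) pairs -- wins on repetitive payloads."""
    out, i = [], 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1
        out.append((data[i], j - i))
        i = j
    return out

def quantize_int8(v: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric float32 -> int8 quantization: 4x smaller; the scale
    factor is kept alongside the payload for dequantization."""
    scale = float(np.abs(v).max()) / 127.0 or 1.0
    return np.round(v / scale).astype(np.int8), scale

print(rle_encode(b"aaaabbc"))  # [(97, 4), (98, 2), (99, 1)]
q, s = quantize_int8(np.array([0.5, -1.0, 0.25], dtype=np.float32))
print(np.allclose(q * s, [0.5, -1.0, 0.25], atol=0.01))  # True
```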
6. Python SDK (pomai-cache)
- Synchronous (`PomaiCache`) and async (`AsyncPomaiCache`) clients
- Full coverage of all AI-first commands: `sim_put`, `sim_get`, `stream_begin/append/end/get`, `cost_report`, `budget`, `invalidate_cascade`
- Native NumPy and PyTorch vector conversion
- `@memoize` decorator for transparent function-level caching
- OpenTelemetry integration for distributed tracing
- Installable via pip: `pip install pomai-cache`
Ported & Rebranded Data Structures (from Dragonfly)
These were rewritten from scratch as zero-dependency Pomai Cache native code:
- `BloomFilter` / `ScalableBloomFilter` -- probabilistic membership testing for content-based deduplication fast paths
- `CountMinSketch` / `MultiWindowSketch` -- space-efficient frequency estimation integrated into the eviction policy's `p_reuse` calculation
- FNV-1a hashing -- custom hash function replacing all external hash dependencies
The third_party/dragonfly directory has been removed. All functionality is native.
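FNV-1a is small enough to show in full. A 64-bit Python version, for reference (the native implementation is C++, but the algorithm is identical):

```python
def fnv1a_64(data: bytes) -> int:
    """FNV-1a 64-bit: XOR each byte into the hash, then multiply by the
    FNV prime, keeping the result in 64 bits."""
    h = 0xcbf29ce484222325          # FNV-1a 64-bit offset basis
    for b in data:
        h ^= b
        h = (h * 0x100000001b3) & 0xFFFFFFFFFFFFFFFF  # FNV prime, mod 2^64
    return h

print(hex(fnv1a_64(b"")))   # 0xcbf29ce484222325 (empty input = offset basis)
print(hex(fnv1a_64(b"a")))  # 0xaf63dc4c8601ec8c (standard test vector)
```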
Benchmarks
New: Vector Cache Benchmark (pomai_cache_vector_bench)
Comprehensive benchmark suite comparing Pomai Cache against Redis+RediSearch, Milvus, Qdrant, Weaviate, and Pinecone across:
| Metric | dim128 (1K) | dim768 (10K) | dim1536 (10K) |
|---|---|---|---|
| Insert throughput | 28K vec/s | 3K vec/s | 3K vec/s |
| Search p50 latency | 410 us | 22 ms | 45 ms |
| Memory per vector | 615 B | 3.2 KB | 6.2 KB |
End-to-end similarity cache vs exact-match cache:
| Scenario | Exact-Match Hit% | Similarity Hit% | Boost |
|---|---|---|---|
| Identical prompts | 0% | 100% | +100% |
| Slight rephrase | 0% | 100% | +100% |
| Moderate rephrase | 0% | 100% | +100% |
| Heavy rephrase | 0% | 87.8% | +87.8% |
Outputs JSON results to vector_bench_results.json for CI integration.
Server Improvements
- `INFO` command now aggregates engine stats from all shards as a proper RESP bulk string
- `SET` command correctly parses `EX` (seconds) and `PX` (milliseconds) TTL options
- `SET` propagates engine errors (e.g., oversized values) instead of always returning `+OK`
- `CONFIG GET POLICY` returns the active policy name
- `AiArtifactCache` is now a persistent member of `EngineShard` (shared-nothing per-core), fixing statelessness bugs with `AI.STATS` and `AI.INVALIDATE`
- `AI.STATS` reports all new metrics: similarity queries/hits, cascade invalidations, stream counts, compression ratio, cost savings
Bug Fixes
- Fixed `sim_put` hardcoding the artifact type to "embedding", causing type-mismatch rejection when storing other artifact types via the similarity API
- Fixed integration test failures from missing/incorrect RESP formatting in the `INFO`, `CONFIG GET POLICY`, and `AI.GET` handlers
- Fixed `AI.INVALIDATE` not cleaning up vector index and dependency graph entries for removed keys
Test Coverage
- 35+ test cases across 4 test suites, all passing
- 21 new test cases for AI-first features covering:
- VectorIndex insert/search/remove, cosine/L2/dot-product metrics, Int8 quantization
- DepGraph edges, cascade traversal, node removal
- CompressionEngine RLE, Delta, Float16/Int8 quantization round-trips
- Streaming begin/append/end/get lifecycle
- Token economics cost tracking and budget enforcement
- Pipeline cascade invalidation end-to-end
- AI.STATS metric reporting completeness
File Summary
New files (17):
- `include/pomai_cache/bloom_filter.hpp`, `src/ds/bloom_filter.cpp`
- `include/pomai_cache/count_min_sketch.hpp`, `src/ds/count_min_sketch.cpp`
- `include/pomai_cache/vector_index.hpp`, `src/ds/vector_index.cpp`
- `include/pomai_cache/dep_graph.hpp`, `src/ds/dep_graph.cpp`
- `include/pomai_cache/compression.hpp`, `src/ds/compression.cpp`
- `bench/vector_cache_bench.cpp`
- `sdk/python/pyproject.toml`
- `sdk/python/pomai_cache/__init__.py`
- `sdk/python/pomai_cache/client.py`
- `sdk/python/pomai_cache/resp.py`
- `sdk/python/pomai_cache/decorators.py`
Modified files (7):
- `CMakeLists.txt` -- added new source files and vector benchmark target
- `include/pomai_cache/ai_cache.hpp` -- extended with similarity, streaming, cost, cascade APIs
- `include/pomai_cache/engine_shard.hpp` -- persistent `AiArtifactCache` per shard
- `include/pomai_cache/policy.hpp` -- frequency estimate in `CandidateView`
- `src/server/ai_cache.cpp` -- implemented all new AI-first methods
- `src/server/server_main.cpp` -- new command handlers, fixed RESP formatting
- `src/policy/policies.cpp` -- CMS frequency in benefit scoring
- `tests/test_ai_cache.cpp` -- 21 new test cases
Removed:
- `third_party/dragonfly/` -- entire directory removed; all useful code rewritten natively
Upgrade Notes
- Backward compatible -- all v1.0.0 RESP commands and AI commands work unchanged
- New commands are additive; existing clients require no changes
- Python SDK is a new optional component; install with `pip install pomai-cache`
- The `pomai_cost` eviction policy now uses frequency estimates from CountMinSketch; behavior is improved but slightly different from v1.0.0's pure heuristic scoring
Pomai Cache V1.0.0
Motivation
Provide a local, best-effort AI Artifact Cache layer for embeddings, prompts, RAG chunks, rerank buffers and responses with cache semantics (lossy, TTL/capacity-driven, warm restart).
Use deterministic canonical keys and a content-addressed blob indirection to enable deduplication and compact metadata handling.
Expose AI-native commands over RESP so existing clients like redis-cli can store, fetch, invalidate, and introspect AI artifacts.
Description
Added a new AiArtifactCache layer (include/pomai_cache/ai_cache.hpp, src/server/ai_cache.cpp) that stores typed ArtifactMeta + payloads with a blob:<content_hash> indirection, best-effort refcounts, per-epoch/model/prefix bounded indexes, and introspection (stats, top_hot, top_costly, explain).
Extended the RESP server (src/server/server_main.cpp) with AI commands: AI.PUT, AI.GET, AI.MGET, AI.EMB.PUT, AI.EMB.GET, AI.INVALIDATE EPOCH|MODEL|PREFIX, AI.STATS, AI.TOP HOT|COSTLY, and AI.EXPLAIN, and instantiated AiArtifactCache in the server loop.
Implemented deterministic canonical key helpers (emb/prm/rag/rrk/rsp) and owner TTL defaults + type-based miss_cost guidance; updated engine owner miss-cost defaults in src/engine/engine.cpp to bias policy for AI artifact owners.
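A hypothetical sketch of how deterministic canonical keys and content-addressed blob indirection fit together. The `emb:` and `blob:` prefixes come from these notes; the exact key layout and hash truncation below are illustrative assumptions:

```python
import hashlib

def canonical_emb_key(model: str, epoch: int, text: str) -> str:
    """Deterministic inputs -> deterministic key, so identical requests
    always resolve to the same cache entry. Layout is illustrative."""
    digest = hashlib.sha256(text.encode()).hexdigest()[:16]
    return f"emb:{model}:e{epoch}:{digest}"

def blob_key(payload: bytes) -> str:
    """Content-addressed indirection: identical payloads hash to the
    same blob key, so duplicate blobs are stored once and refcounted."""
    return "blob:" + hashlib.sha256(payload).hexdigest()[:16]

k1 = canonical_emb_key("minilm", 3, "hello world")
k2 = canonical_emb_key("minilm", 3, "hello world")
print(k1 == k2)  # True -- deterministic key
print(blob_key(b"\x00" * 8) == blob_key(b"\x00" * 8))  # True -- dedup
```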
Added AI benchmark bench/ai_artifact_bench.cpp and wired up CMake (CMakeLists.txt) plus tests (tests/test_ai_cache.cpp, extended tests/test_integration.cpp) and docs (docs/AI_CACHE.md, docs/AI_COMMANDS.md, docs/INVALIDATION.md, docs/BLOB_DEDUP.md), and updated README.md with quickstart examples and recommended AI config.
Testing
Built the project (cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug && cmake --build build -j) and all test targets compiled successfully. ✅
Ran the full test suite (ctest --test-dir build --output-on-failure), and all tests including the new AI unit and integration tests passed. ✅
Built and executed the AI benchmark target (pomai_cache_ai_bench) which produced a JSON summary (ops/s, p50/p95/p99/p999, hit_rate) for the embedding workload; the bench target compiles and runs but longer runs may require tuning in constrained CI environments (a longer timeout was observed during an extended run).
What's Changed
- Codex-generated pull request by @AutoCookies in #1
- Phase 2 hardening: netbench, robust tests, latency controls, and CI upgrades by @AutoCookies in #2
- Codex-generated pull request by @AutoCookies in #4
- Add AI artifact cache layer and AI.* RESP commands (embeddings/prompts/RAG/dedup/invalidation) by @AutoCookies in #5
New Contributors
- @AutoCookies made their first contribution in #1
Full Changelog: https://github.com/AutoCookies/pomaicache/commits/pomaicache-v1.0.0