Releases: AutoCookies/pomaicache

Pomai Cache V1.3.1 Release

05 Mar 14:29

Pomai Cache V1.3.0

04 Mar 08:37

Adds a token cache.

Pomai Cache V1.2.0 Release

04 Mar 06:46

Uses a custom HTTP implementation, palloc for memory allocation, and a single-threaded design.

Pomai Cache V1.1.0 Release

28 Feb 02:40
044cb65

Pomai Cache v1.1.0 Release Notes

Release Date: February 28, 2026
Tag: pomaicache-v1.1.0
Codename: AI-First


Summary

Pomai Cache v1.1.0 transforms the project from a Redis-compatible KV cache into a purpose-built AI-first cache for inference pipelines. This release adds semantic similarity search, pipeline-aware cascade invalidation, token-economics-aware eviction, streaming response caching, a compression/quantization engine, a Python SDK, and comprehensive benchmarks -- all implemented in zero-dependency native C++.


New Features

1. Semantic Similarity Cache

  • VectorIndex with brute-force (exhaustive) nearest-neighbor search supporting Cosine, L2, and DotProduct distance metrics
  • SSE2 SIMD-accelerated distance computation (~0.7 GFLOP/s single-thread)
  • Int8 vector quantization with 4x memory reduction (MAE < 0.0005/dim)
  • Float16 quantization with 2x memory reduction (MAE < 0.00003/dim)
  • New commands: AI.SIM.PUT, AI.SIM.GET
  • Enables cache hits on paraphrased prompts -- benchmarks show 87-100% hit rate where exact-match caches get 0%
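
To illustrate why a similarity lookup can hit on a paraphrased prompt, here is a minimal Python sketch of a brute-force cosine search with a hit threshold. It is not the VectorIndex implementation: the class name, the `threshold` parameter, and the toy 3-dimensional vectors are assumptions made for the example, and real embeddings would come from a model.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

class BruteForceSimilarityCache:
    """Hypothetical stand-in for a similarity cache: linear scan over all entries."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed minimum similarity that counts as a hit
        self.entries = []           # list of (vector, cached value)

    def put(self, vector, value):
        self.entries.append((list(vector), value))

    def get(self, query):
        """Return (value, score) for the best match at or above the threshold, else None."""
        best_value, best_score = None, -1.0
        for vector, value in self.entries:
            score = cosine_similarity(query, vector)
            if score > best_score:
                best_value, best_score = value, score
        if best_score >= self.threshold:
            return best_value, best_score
        return None

cache = BruteForceSimilarityCache(threshold=0.9)
cache.put([1.0, 0.0, 0.0], "answer about cats")
cache.put([0.0, 1.0, 0.0], "answer about dogs")
near_hit = cache.get([0.9, 0.1, 0.0])  # close to the first stored vector
miss = cache.get([0.5, 0.5, 0.7])      # not close enough to either entry
```

An exact-match cache would miss both queries because neither vector equals a stored key; the threshold decides how much drift from a stored prompt is still served from cache.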

2. Pipeline-Aware Cascade Invalidation

  • DepGraph (directed acyclic graph) tracks parent-child artifact dependencies
  • Invalidating a parent embedding automatically cascades to all downstream responses, rerank buffers, and RAG chunks
  • New commands: AI.PUT ... DEPENDS_ON parent1 parent2, AI.INVALIDATE CASCADE <key>
  • Prevents stale derived artifacts from surviving source invalidation
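
The cascade described above can be sketched as a breadth-first walk over parent-to-child edges. This is an illustration only: the `DepGraph` class below is a hypothetical Python stand-in, and the key names (`emb:doc1`, `rag:chunk1`, and so on) are made up for the example.

```python
from collections import defaultdict, deque

class DepGraph:
    """Hypothetical dependency graph: maps each parent key to its direct children."""

    def __init__(self):
        self.children = defaultdict(set)

    def add(self, key, depends_on=()):
        # Record an edge parent -> key for every declared parent.
        for parent in depends_on:
            self.children[parent].add(key)

    def cascade(self, root):
        """Return root plus every key reachable through child edges (BFS)."""
        seen = {root}
        queue = deque([root])
        while queue:
            node = queue.popleft()
            for child in self.children.get(node, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return seen

store = {
    "emb:doc1": b"vector-1",
    "rag:chunk1": b"chunk derived from doc1",
    "rsp:answer1": b"response derived from chunk1",
    "emb:doc2": b"vector-2",
}
graph = DepGraph()
graph.add("rag:chunk1", depends_on=["emb:doc1"])
graph.add("rsp:answer1", depends_on=["rag:chunk1"])

# Invalidate the source embedding and everything derived from it.
removed = graph.cascade("emb:doc1")
for key in removed:
    store.pop(key, None)
```

After the cascade only `emb:doc2` remains, since it has no dependency path from `emb:doc1`.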

3. Token-Economics-Aware Eviction

  • Extended ArtifactMeta with inference_tokens, inference_latency_ms, dollar_cost
  • PomaiCostPolicy now factors inference cost into eviction benefit scoring
  • Budget management: set a $/hour spend cap and the cache prioritizes high-cost artifacts
  • New commands: AI.COST.REPORT, AI.BUDGET <dollars_per_hour>
  • Cost tracking reports total dollars saved, tokens saved, and latency saved
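
The release notes do not spell out the scoring formula used by PomaiCostPolicy, so the following Python sketch only illustrates the general idea of cost-aware eviction. The score (reuse probability times dollar cost, divided by size) and the `p_reuse` field are assumptions for the example; the other field names mirror those listed above.

```python
from dataclasses import dataclass

@dataclass
class ArtifactMeta:
    key: str
    size_bytes: int
    inference_tokens: int
    inference_latency_ms: float
    dollar_cost: float
    p_reuse: float  # assumed estimate of reuse probability in [0, 1]

def benefit(meta):
    """Assumed score: expected dollars saved per byte kept in the cache."""
    return meta.p_reuse * meta.dollar_cost / max(meta.size_bytes, 1)

def eviction_order(artifacts):
    """Keys sorted from lowest benefit (evict first) to highest (evict last)."""
    return [m.key for m in sorted(artifacts, key=benefit)]

artifacts = [
    ArtifactMeta("rsp:cheap", 4096, 50, 120.0, 0.0001, 0.5),
    ArtifactMeta("rsp:costly", 4096, 4000, 9000.0, 0.02, 0.5),
    ArtifactMeta("emb:rare", 4096, 800, 300.0, 0.02, 0.01),
]
order = eviction_order(artifacts)
```

With these made-up numbers the cheap response is evicted first and the expensive, frequently reused response last, which is the behavior the "prioritizes high-cost artifacts" bullet describes.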

4. Streaming & Chunked Response Caching

  • Incrementally store LLM streaming responses as they arrive token-by-token
  • Retrieve partial or completed streams
  • New commands: AI.STREAM.BEGIN, AI.STREAM.APPEND, AI.STREAM.END, AI.STREAM.GET
  • Prevents re-inference when identical streaming requests arrive concurrently
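
The begin/append/end/get lifecycle can be pictured with a small in-memory model. This Python sketch is hypothetical and says nothing about how the server stores streams; the class name and the `(text, complete)` return shape are assumptions.

```python
class StreamCache:
    """Hypothetical in-memory model of the streaming lifecycle."""

    def __init__(self):
        self._chunks = {}   # key -> list of chunks received so far
        self._done = set()  # keys whose stream has been closed

    def begin(self, key):
        self._chunks[key] = []
        self._done.discard(key)

    def append(self, key, chunk):
        if key not in self._chunks:
            raise KeyError(f"stream {key!r} was never begun")
        if key in self._done:
            raise ValueError(f"stream {key!r} is already complete")
        self._chunks[key].append(chunk)

    def end(self, key):
        if key not in self._chunks:
            raise KeyError(f"stream {key!r} was never begun")
        self._done.add(key)

    def get(self, key):
        """Return (text_so_far, complete), or None for an unknown stream."""
        if key not in self._chunks:
            return None
        return "".join(self._chunks[key]), key in self._done

streams = StreamCache()
streams.begin("rsp:q1")
streams.append("rsp:q1", "Hello")
partial = streams.get("rsp:q1")   # a second reader can see the partial stream
streams.append("rsp:q1", ", world")
streams.end("rsp:q1")
final = streams.get("rsp:q1")
```

A concurrent request for the same key can read the partial text and the completion flag instead of triggering a second inference.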

5. Compression & Quantization Engine

  • CompressionEngine with Run-Length Encoding (RLE) and Delta compression
  • Auto-selects best compression strategy per payload
  • Float32-to-Float16 and Float32-to-Int8 embedding quantization
  • Prefix deduplication for prompt families
  • Compression ratio tracked in AI.STATS
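
As a rough illustration of per-payload strategy selection, the Python sketch below tries raw storage, RLE, and delta-then-RLE, and keeps whichever is smallest. The one-byte tag, the (count, byte) pair layout, and the choice to chain delta with RLE are assumptions for the example, not the CompressionEngine wire format.

```python
RAW, RLE, DELTA_RLE = 0, 1, 2  # hypothetical strategy tags

def rle_encode(data: bytes) -> bytes:
    """Encode as (run_length, byte) pairs with runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)

def rle_decode(blob: bytes) -> bytes:
    out = bytearray()
    for j in range(0, len(blob), 2):
        out += bytes([blob[j + 1]]) * blob[j]
    return bytes(out)

def delta_encode(data: bytes) -> bytes:
    """Store each byte as its difference from the previous byte (mod 256)."""
    return bytes((data[i] - (data[i - 1] if i else 0)) & 0xFF for i in range(len(data)))

def delta_decode(blob: bytes) -> bytes:
    out = bytearray()
    prev = 0
    for d in blob:
        prev = (prev + d) & 0xFF
        out.append(prev)
    return bytes(out)

def compress(data: bytes) -> bytes:
    """Pick the smallest candidate and prefix it with its strategy tag."""
    candidates = {
        RAW: data,
        RLE: rle_encode(data),
        DELTA_RLE: rle_encode(delta_encode(data)),
    }
    tag = min(candidates, key=lambda t: len(candidates[t]))
    return bytes([tag]) + candidates[tag]

def decompress(blob: bytes) -> bytes:
    tag, body = blob[0], blob[1:]
    if tag == RAW:
        return body
    if tag == RLE:
        return rle_decode(body)
    if tag == DELTA_RLE:
        return delta_decode(rle_decode(body))
    raise ValueError(f"unknown strategy tag {tag}")

repeated = b"\x00" * 1000      # long runs favor RLE
ramp = bytes(range(200))       # constant step favors delta + RLE
text = b"pomai cache"          # no structure, stays raw
```

Runs of identical bytes compress well with plain RLE, while steadily increasing values only become runs after delta encoding, which is why trying both per payload can pay off.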

6. Python SDK (pomai-cache)

  • Synchronous (PomaiCache) and async (AsyncPomaiCache) clients
  • Full coverage of all AI-first commands: sim_put, sim_get, stream_begin/append/end/get, cost_report, budget, invalidate_cascade
  • Native NumPy and PyTorch vector conversion
  • @memoize decorator for transparent function-level caching
  • OpenTelemetry integration for distributed tracing
  • Installable via pip: pip install pomai-cache
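
The SDK's actual `@memoize` signature is not documented in these notes, so the following Python sketch only shows the general pattern behind function-level caching. It uses a plain dict as a stand-in for a cache client; the `memoize(cache)` signature, the `memo:` key prefix, and the SHA-256 key derivation are all assumptions.

```python
import functools
import hashlib
import json

def memoize(cache):
    """Hypothetical decorator factory: cache results in any dict-like backend."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Derive a deterministic key from the function name and its arguments.
            payload = json.dumps([fn.__qualname__, args, kwargs],
                                 sort_keys=True, default=repr)
            key = "memo:" + hashlib.sha256(payload.encode()).hexdigest()
            if key in cache:
                return cache[key]
            result = fn(*args, **kwargs)
            cache[key] = result
            return result
        return wrapper
    return decorator

backend = {}   # stand-in for a real cache client
calls = []     # records how often the wrapped function really runs

@memoize(backend)
def embed(text):
    calls.append(text)
    return [float(len(text))]  # placeholder for an expensive model call

first = embed("hello")
second = embed("hello")   # served from the backend, function body not re-run
third = embed("other")
```

The decorated function runs once per distinct argument set; repeated calls return the stored result.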

Ported & Rebranded Data Structures (from Dragonfly)

These were rewritten from scratch as zero-dependency Pomai Cache native code:

  • BloomFilter / ScalableBloomFilter -- probabilistic membership testing for content-based deduplication fast-paths
  • CountMinSketch / MultiWindowSketch -- space-efficient frequency estimation integrated into the eviction policy's p_reuse calculation
  • FNV-1a hashing -- in-house implementation of the standard FNV-1a hash, replacing all external hash dependencies

The third_party/dragonfly directory has been removed. All functionality is native.
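
For readers unfamiliar with these structures, here is a minimal Python sketch of 64-bit FNV-1a and a Bloom filter built on top of it. The FNV constants are the standard published ones; the filter's sizing, the double-hashing scheme, and the method names are assumptions for the example rather than a description of the Pomai Cache classes.

```python
FNV64_OFFSET = 0xCBF29CE484222325  # standard FNV-1a 64-bit offset basis
FNV64_PRIME = 0x100000001B3        # standard FNV 64-bit prime
MASK64 = 0xFFFFFFFFFFFFFFFF

def fnv1a64(data: bytes, seed: int = FNV64_OFFSET) -> int:
    """64-bit FNV-1a: XOR each byte into the state, then multiply by the prime."""
    h = seed
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h

class BloomFilter:
    """Hypothetical fixed-size Bloom filter using double hashing over FNV-1a."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        h1 = fnv1a64(item)
        h2 = fnv1a64(item, seed=h1) | 1  # odd step so probes differ
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

seen = BloomFilter()
empty_check = seen.might_contain(b"blob:abc")  # nothing added yet
for name in (b"blob:abc", b"blob:def", b"blob:123"):
    seen.add(name)
```

A negative answer lets a deduplication path skip the full lookup, while a positive answer still needs confirmation because false positives are possible.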


Benchmarks

New: Vector Cache Benchmark (pomai_cache_vector_bench)

Comprehensive benchmark suite comparing Pomai Cache against Redis+RediSearch, Milvus, Qdrant, Weaviate, and Pinecone across:

| Metric | dim128 (1K) | dim768 (10K) | dim1536 (10K) |
|---|---|---|---|
| Insert throughput | 28K vec/s | 3K vec/s | 3K vec/s |
| Search p50 latency | 410 us | 22 ms | 45 ms |
| Memory per vector | 615 B | 3.2 KB | 6.2 KB |

End-to-end similarity cache vs exact-match cache:

| Scenario | Exact-Match Hit% | Similarity Hit% | Boost |
|---|---|---|---|
| Identical prompts | 0% | 100% | +100% |
| Slight rephrase | 0% | 100% | +100% |
| Moderate rephrase | 0% | 100% | +100% |
| Heavy rephrase | 0% | 87.8% | +87.8% |

Outputs JSON results to vector_bench_results.json for CI integration.


Server Improvements

  • INFO command now aggregates engine stats from all shards as a proper RESP bulk string
  • SET command correctly parses EX (seconds) and PX (milliseconds) TTL options
  • SET propagates engine errors (e.g., oversized values) instead of always returning +OK
  • CONFIG GET POLICY returns the active policy name
  • AiArtifactCache is now a persistent member of EngineShard (share-nothing per-core), fixing statelessness bugs with AI.STATS and AI.INVALIDATE
  • AI.STATS reports all new metrics: similarity queries/hits, cascade invalidations, stream counts, compression ratio, cost savings
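
The RESP framing and TTL handling mentioned above can be sketched in a few lines of Python. The encoders follow the standard RESP array and bulk-string layout; the helper names, the stat field names in the sample INFO body, and the decision to reject unknown options are assumptions for illustration and do not describe the server's parser.

```python
def encode_command(*parts) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(parts)]
    for part in parts:
        data = part if isinstance(part, bytes) else str(part).encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)

def encode_bulk_string(text: str) -> bytes:
    """Encode a reply body as a single RESP bulk string."""
    data = text.encode()
    return b"$%d\r\n%s\r\n" % (len(data), data)

def parse_set_ttl_ms(options):
    """Return the TTL in milliseconds from SET options, or None if absent."""
    ttl_ms = None
    i = 0
    while i < len(options):
        name = options[i].upper()
        if name not in ("EX", "PX"):
            raise ValueError(f"unsupported option {options[i]!r}")
        if i + 1 >= len(options):
            raise ValueError(f"{name} requires a value")
        amount = int(options[i + 1])
        if amount <= 0:
            raise ValueError("TTL must be positive")
        ttl_ms = amount * 1000 if name == "EX" else amount  # EX is in seconds
        i += 2
    return ttl_ms

request = encode_command("SET", "k", "v", "EX", 10)
info_body = "hits:10\r\nmisses:2\r\n"  # made-up stat names
info_reply = encode_bulk_string(info_body)
```

Sending INFO output as one bulk string lets the payload contain its own line breaks, because the client reads exactly the declared number of bytes.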

Bug Fixes

  • Fixed sim_put hardcoding artifact type to "embedding", causing type mismatch rejection when storing other artifact types via similarity API
  • Fixed integration test failures from missing/incorrect RESP formatting in INFO, CONFIG GET POLICY, and AI.GET handlers
  • Fixed AI.INVALIDATE not cleaning up vector index and dependency graph entries for removed keys

Test Coverage

  • 35+ test cases across 4 test suites, all passing
  • 21 new test cases for AI-first features covering:
    • VectorIndex insert/search/remove, cosine/L2/dot-product metrics, Int8 quantization
    • DepGraph edges, cascade traversal, node removal
    • CompressionEngine RLE, Delta, Float16/Int8 quantization round-trips
    • Streaming begin/append/end/get lifecycle
    • Token economics cost tracking and budget enforcement
    • Pipeline cascade invalidation end-to-end
    • AI.STATS metric reporting completeness
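
A quantization round-trip check of the kind listed above might look like the following Python sketch. It uses symmetric per-vector Int8 scaling and the standard library's half-precision `struct` format; the scaling scheme and function names are assumptions, and the sample vector is made up, so the error figures here are not the ones quoted in these notes.

```python
import struct

def quantize_int8(vector):
    """Symmetric Int8 quantization: one signed byte per dimension plus a scale."""
    peak = max((abs(x) for x in vector), default=0.0)
    scale = peak / 127.0 if peak > 0 else 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in vector]
    return struct.pack("<%db" % len(codes), *codes), scale

def dequantize_int8(blob, scale):
    return [c * scale for c in struct.unpack("<%db" % len(blob), blob)]

def quantize_f16(vector):
    """Pack each value as an IEEE 754 half-precision float (2 bytes)."""
    return struct.pack("<%de" % len(vector), *vector)

def dequantize_f16(blob):
    return list(struct.unpack("<%de" % (len(blob) // 2), blob))

vector = [0.12, -0.5, 0.33, 0.0, 0.49]
float32_bytes = 4 * len(vector)

int8_blob, scale = quantize_int8(vector)
int8_error = max(abs(a - b) for a, b in zip(vector, dequantize_int8(int8_blob, scale)))

f16_blob = quantize_f16(vector)
f16_error = max(abs(a - b) for a, b in zip(vector, dequantize_f16(f16_blob)))
```

The Int8 payload is a quarter of the Float32 size and the Float16 payload is half, matching the 4x and 2x reductions stated for the vector index; the Int8 round-trip error is bounded by half the scale step.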

File Summary

New files (17):

  • include/pomai_cache/bloom_filter.hpp, src/ds/bloom_filter.cpp
  • include/pomai_cache/count_min_sketch.hpp, src/ds/count_min_sketch.cpp
  • include/pomai_cache/vector_index.hpp, src/ds/vector_index.cpp
  • include/pomai_cache/dep_graph.hpp, src/ds/dep_graph.cpp
  • include/pomai_cache/compression.hpp, src/ds/compression.cpp
  • bench/vector_cache_bench.cpp
  • sdk/python/pyproject.toml
  • sdk/python/pomai_cache/__init__.py
  • sdk/python/pomai_cache/client.py
  • sdk/python/pomai_cache/resp.py
  • sdk/python/pomai_cache/decorators.py

Modified files (8):

  • CMakeLists.txt -- added new source files and vector benchmark target
  • include/pomai_cache/ai_cache.hpp -- extended with similarity, streaming, cost, cascade APIs
  • include/pomai_cache/engine_shard.hpp -- persistent AiArtifactCache per shard
  • include/pomai_cache/policy.hpp -- frequency estimate in CandidateView
  • src/server/ai_cache.cpp -- implemented all new AI-first methods
  • src/server/server_main.cpp -- new command handlers, fixed RESP formatting
  • src/policy/policies.cpp -- CMS frequency in benefit scoring
  • tests/test_ai_cache.cpp -- 21 new test cases

Removed:

  • third_party/dragonfly/ -- entire directory removed; all useful code rewritten natively

Upgrade Notes

  • Backward compatible -- all v1.0.0 RESP commands and AI commands work unchanged
  • New commands are additive; existing clients require no changes
  • Python SDK is a new optional component; install with pip install pomai-cache
  • The pomai_cost eviction policy now uses frequency estimates from CountMinSketch; behavior is improved but slightly different from v1.0.0's pure heuristic scoring
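
To show what a CountMinSketch frequency estimate looks like in practice, here is a minimal Python sketch. The width, depth, and use of salted BLAKE2b for the row hashes are assumptions chosen for a self-contained example; they are not the parameters or hash used by Pomai Cache.

```python
import hashlib

class CountMinSketch:
    """Hypothetical Count-Min Sketch: depth rows of width counters each."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row: int, item: bytes) -> int:
        # A per-row salt gives each row an independent hash function.
        digest = hashlib.blake2b(item, digest_size=8,
                                 salt=row.to_bytes(8, "little")).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: bytes, count: int = 1):
        for row in range(self.depth):
            self.rows[row][self._index(row, item)] += count

    def estimate(self, item: bytes) -> int:
        """Minimum over rows: may overestimate on collisions, never underestimates."""
        return min(self.rows[row][self._index(row, item)]
                   for row in range(self.depth))

sketch = CountMinSketch()
for _ in range(5):
    sketch.add(b"emb:hot")
sketch.add(b"emb:cold")

hot_estimate = sketch.estimate(b"emb:hot")
cold_estimate = sketch.estimate(b"emb:cold")
```

Because the estimate can only err upward, a policy that feeds it into a reuse score may occasionally keep a cold key slightly longer, which is one way scoring can differ from a purely heuristic approach.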

Pomai Cache V1.0.0

11 Feb 01:03

Motivation

Provide a local, best-effort AI Artifact Cache layer for embeddings, prompts, RAG chunks, rerank buffers and responses with cache semantics (lossy, TTL/capacity-driven, warm restart).
Use deterministic canonical keys and a content-addressed blob indirection to enable deduplication and compact metadata handling.
Expose AI-native commands over RESP so existing clients like redis-cli can store, fetch, invalidate, and introspect AI artifacts.

Description

Added a new AiArtifactCache layer (include/pomai_cache/ai_cache.hpp, src/server/ai_cache.cpp) that stores typed ArtifactMeta + payloads with a blob:<content_hash> indirection, best-effort refcounts, per-epoch/model/prefix bounded indexes, and introspection (stats, top_hot, top_costly, explain).
Extended the RESP server (src/server/server_main.cpp) with AI commands: AI.PUT, AI.GET, AI.MGET, AI.EMB.PUT, AI.EMB.GET, AI.INVALIDATE EPOCH|MODEL|PREFIX, AI.STATS, AI.TOP HOT|COSTLY, and AI.EXPLAIN, and instantiated AiArtifactCache in the server loop.
Implemented deterministic canonical key helpers (emb/prm/rag/rrk/rsp) and owner TTL defaults + type-based miss_cost guidance; updated engine owner miss-cost defaults in src/engine/engine.cpp to bias policy for AI artifact owners.
Added AI benchmark bench/ai_artifact_bench.cpp and wired up CMake (CMakeLists.txt) plus tests (tests/test_ai_cache.cpp, extended tests/test_integration.cpp) and docs (docs/AI_CACHE.md, docs/AI_COMMANDS.md, docs/INVALIDATION.md, docs/BLOB_DEDUP.md), and updated README.md with quickstart examples and recommended AI config.
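
The blob:<content_hash> indirection with refcounts described above can be illustrated with a small Python model. The use of SHA-256, the class and method names, and the exact release behavior are assumptions for the example and are not taken from AiArtifactCache.

```python
import hashlib

class BlobStore:
    """Hypothetical content-addressed store: artifact keys point at shared blobs."""

    def __init__(self):
        self.blobs = {}      # "blob:<hash>" -> payload bytes
        self.refcounts = {}  # "blob:<hash>" -> number of artifact keys using it
        self.index = {}      # artifact key -> "blob:<hash>"

    def put(self, key: str, payload: bytes) -> str:
        blob_key = "blob:" + hashlib.sha256(payload).hexdigest()
        old = self.index.get(key)
        if old == blob_key:
            return blob_key          # same content, nothing to do
        if old is not None:
            self._release(old)       # key now points at different content
        self.blobs.setdefault(blob_key, payload)
        self.refcounts[blob_key] = self.refcounts.get(blob_key, 0) + 1
        self.index[key] = blob_key
        return blob_key

    def get(self, key: str):
        blob_key = self.index.get(key)
        return None if blob_key is None else self.blobs.get(blob_key)

    def remove(self, key: str):
        blob_key = self.index.pop(key, None)
        if blob_key is not None:
            self._release(blob_key)

    def _release(self, blob_key: str):
        self.refcounts[blob_key] -= 1
        if self.refcounts[blob_key] == 0:
            del self.refcounts[blob_key]
            del self.blobs[blob_key]

store = BlobStore()
ref_a = store.put("rsp:q1", b"same answer")
ref_b = store.put("rsp:q2", b"same answer")  # identical payload, stored once
blobs_after_put = len(store.blobs)
store.remove("rsp:q1")
still_there = store.get("rsp:q2")
store.remove("rsp:q2")
blobs_after_remove = len(store.blobs)
```

Two keys with identical payloads share one blob, and the payload is dropped only when the last referencing key goes away.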

Testing

Built the project (cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug && cmake --build build -j) and all test targets compiled successfully. ✅
Ran the full test suite (ctest --test-dir build --output-on-failure), and all tests including the new AI unit and integration tests passed. ✅
Built and executed the AI benchmark target (pomai_cache_ai_bench), which produced a JSON summary (ops/s, p50/p95/p99/p999, hit_rate) for the embedding workload. The bench target compiles and runs, but an extended run hit a timeout, so longer runs may need tuning in constrained CI environments. ⚠️

What's Changed

  • Codex-generated pull request by @AutoCookies in #1
  • Phase 2 hardening: netbench, robust tests, latency controls, and CI upgrades by @AutoCookies in #2
  • Codex-generated pull request by @AutoCookies in #4
  • Add AI artifact cache layer and AI.* RESP commands (embeddings/prompts/RAG/dedup/invalidation) by @AutoCookies in #5

Full Changelog: https://github.com/AutoCookies/pomaicache/commits/pomaicache-v1.0.0