This app leverages Semantic Caching to minimize inference latency and reduce API costs by reusing responses to semantically similar prompts.
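A minimal sketch of the general idea (not this repo's code): embed each prompt, and on a new request return a cached response when the closest stored prompt clears a similarity threshold. The embedding model and the 0.9 threshold below are illustrative assumptions.

```python
# Toy semantic cache: reuse a stored response when a new prompt is
# semantically close to an already-answered one. Illustrative sketch only;
# the embedding model and threshold are assumptions, not this repo's code.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding backend

class SemanticCache:
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.embeddings: list[np.ndarray] = []
        self.responses: list[str] = []

    def lookup(self, prompt: str) -> str | None:
        """Return a cached response if a stored prompt is similar enough."""
        if not self.embeddings:
            return None
        q = model.encode(prompt, normalize_embeddings=True)
        sims = np.stack(self.embeddings) @ q  # cosine similarity (unit vectors)
        best = int(np.argmax(sims))
        return self.responses[best] if sims[best] >= self.threshold else None

    def store(self, prompt: str, response: str) -> None:
        self.embeddings.append(model.encode(prompt, normalize_embeddings=True))
        self.responses.append(response)

cache = SemanticCache()
cache.store("What is semantic caching?", "It reuses answers for similar prompts.")
print(cache.lookup("Explain semantic caching"))  # cache hit if similarity >= 0.9
```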
Official implementation of "SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching" (COLM 2025). A novel KV cache compression method that organizes the cache at the sentence level using semantic similarity.
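A toy illustration of the general idea described above (not the paper's implementation): group cached key/value vectors by sentence, summarize each sentence with its mean key, and keep only the KV entries of the sentences most similar to the current query. The function name, top-k selection, and shapes below are all illustrative assumptions.

```python
# Toy sketch of sentence-level KV selection (not the SentenceKV code):
# retain only the KV entries of sentences most relevant to the current query.
import numpy as np

def select_sentence_kv(keys, values, sentence_ids, query, top_k=2):
    """keys, values: (seq_len, d); sentence_ids: (seq_len,) int labels;
    query: (d,). Returns keys/values of the top_k most similar sentences."""
    scores = {}
    for sid in np.unique(sentence_ids):
        centroid = keys[sentence_ids == sid].mean(axis=0)  # sentence-level key summary
        scores[sid] = float(centroid @ query /
                            (np.linalg.norm(centroid) * np.linalg.norm(query) + 1e-9))
    keep = sorted(scores, key=scores.get, reverse=True)[:top_k]  # most relevant sentences
    mask = np.isin(sentence_ids, keep)
    return keys[mask], values[mask]

# Example: 6 cached tokens spread over 3 sentences, 4-dim vectors.
rng = np.random.default_rng(0)
keys = rng.normal(size=(6, 4))
values = rng.normal(size=(6, 4))
sentence_ids = np.array([0, 0, 1, 1, 2, 2])
query = rng.normal(size=4)
k_sel, v_sel = select_sentence_kv(keys, values, sentence_ids, query)
print(k_sel.shape, v_sel.shape)  # only tokens from the 2 most similar sentences remain
```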
A hands-on workshop for building the Redis Movies Searcher application.