juhun32/bucket

A distributed caching layer designed to optimize LLM inference. By moving vector embedding generation to the client side and synchronizing a shared semantic cache via AWS S3, it eliminates redundant cloud computation and reduces latency and API costs for repeated queries.
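
The core of a semantic cache is a similarity lookup: an incoming query embedding is compared against stored embeddings, and a cached response is returned when the cosine similarity clears a threshold. Below is a minimal sketch of that lookup in Go; the `CacheEntry` type, function names, and threshold handling are illustrative assumptions, not the repository's actual API.

```go
package cache

import "math"

// CacheEntry pairs a stored query embedding with the LLM response it
// produced. (Hypothetical type; the repository's layout may differ.)
type CacheEntry struct {
	Embedding []float64
	Response  string
}

// cosineSimilarity returns the cosine of the angle between two vectors,
// in [-1, 1]. Assumes a and b have the same length.
func cosineSimilarity(a, b []float64) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// Lookup scans the cache for the entry most similar to the query
// embedding and returns its response if the score clears the threshold.
func Lookup(entries []CacheEntry, query []float64, threshold float64) (string, bool) {
	best, bestScore := "", -1.0
	for _, e := range entries {
		if s := cosineSimilarity(e.Embedding, query); s > bestScore {
			best, bestScore = e.Response, s
		}
	}
	if bestScore >= threshold {
		return best, true // cache hit: skip the LLM call entirely
	}
	return "", false // cache miss: forward the query to the provider
}
```

On a hit the backend can answer without contacting the LLM provider at all, which is where the latency and cost savings come from.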

About

A semantic caching system for LLMs. It performs edge-based vectorization in the browser and uses a Go backend synchronized with AWS S3 to serve cached AI responses, cutting API costs, latency, and potential carbon emissions.
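
Synchronizing the shared cache through S3 can be as simple as pulling and pushing a serialized snapshot of the entries. The sketch below uses the AWS SDK for Go v2; the bucket name, object key, and JSON snapshot format are assumptions for illustration, not the project's confirmed wire format.

```go
package cache

import (
	"bytes"
	"context"
	"encoding/json"
	"io"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// Assumed names, for illustration only.
const (
	bucketName = "bucket-semantic-cache"
	objectKey  = "cache-snapshot.json"
)

// PullSnapshot downloads the shared cache snapshot from S3 and decodes it.
func PullSnapshot(ctx context.Context) ([]CacheEntry, error) {
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		return nil, err
	}
	client := s3.NewFromConfig(cfg)

	out, err := client.GetObject(ctx, &s3.GetObjectInput{
		Bucket: aws.String(bucketName),
		Key:    aws.String(objectKey),
	})
	if err != nil {
		return nil, err
	}
	defer out.Body.Close()

	data, err := io.ReadAll(out.Body)
	if err != nil {
		return nil, err
	}
	var entries []CacheEntry
	return entries, json.Unmarshal(data, &entries)
}

// PushSnapshot serializes the local cache and uploads it, making new
// entries visible to every other node sharing the bucket.
func PushSnapshot(ctx context.Context, entries []CacheEntry) error {
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		return err
	}
	client := s3.NewFromConfig(cfg)

	data, err := json.Marshal(entries)
	if err != nil {
		return err
	}
	_, err = client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String(bucketName),
		Key:    aws.String(objectKey),
		Body:   bytes.NewReader(data),
	})
	return err
}
```

Because every node pulls from and pushes to the same bucket, a query answered once by any client benefits all of them on subsequent lookups.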
