Artifact for "Marconi: Prefix Caching for the Era of Hybrid LLMs" [MLSys '25 Outstanding Paper Award, Honorable Mention]
Edge-optimized OpenCUA-7B computer-use agent evaluated on OSWorld, exploring systematic vLLM inference optimizations across CPU and GPU, including precision tuning, image history management, speculative decoding, and prefix caching.
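Since both repositories center on prefix caching, here is a minimal sketch (not taken from either repo) of how vLLM's automatic prefix caching is typically switched on via the `enable_prefix_caching` engine flag. The model name and prompts are illustrative assumptions only.

```python
# Minimal sketch: enabling automatic prefix caching in vLLM.
# The model name below is a placeholder, not one used by these repos.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m", enable_prefix_caching=True)

# Prompts sharing a common prefix; with prefix caching enabled,
# KV-cache blocks for the shared prefix are computed once and reused.
shared_prefix = "You are a helpful assistant. Answer concisely.\n"
prompts = [
    shared_prefix + "Q: What is prefix caching?\nA:",
    shared_prefix + "Q: Why does it help multi-turn agents?\nA:",
]

outputs = llm.generate(prompts, SamplingParams(max_tokens=32))
for out in outputs:
    print(out.outputs[0].text)
```

The same flag is exposed on the server CLI as `--enable-prefix-caching`; the benefit grows with the length of the shared prefix, which is why agent workloads with long system prompts and image histories (as in the OpenCUA repo above) are a natural fit.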