Artifact for "Marconi: Prefix Caching for the Era of Hybrid LLMs" [MLSys '25 Outstanding Paper Award, Honorable Mention]
Edge-optimized OpenCUA-7B computer-use agent evaluated on OSWorld, exploring systematic vLLM inference optimizations across CPU and GPU, including precision tuning, image history management, speculative decoding, and prefix caching.
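Since both repositories center on prefix caching, here is a minimal sketch (not taken from either repo) of how vLLM's automatic prefix caching is typically switched on via the `enable_prefix_caching` engine flag. The model name and prompts are illustrative assumptions only.

```python
# Minimal sketch: enabling automatic prefix caching in vLLM.
# The model name below is a placeholder, not one used by these repos.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m", enable_prefix_caching=True)

# Prompts sharing a common prefix; with prefix caching enabled,
# KV-cache blocks for the shared prefix are computed once and reused.
shared_prefix = "You are a helpful assistant. Answer concisely.\n"
prompts = [
    shared_prefix + "Q: What is prefix caching?\nA:",
    shared_prefix + "Q: Why does it help multi-turn agents?\nA:",
]

outputs = llm.generate(prompts, SamplingParams(max_tokens=32))
for out in outputs:
    print(out.outputs[0].text)
```

The same flag is exposed on the server CLI as `--enable-prefix-caching`; the benefit grows with the length of the shared prefix, which is why agent workloads with long system prompts and image histories (as in the OpenCUA repo above) are a natural fit.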