GhostCacher is a distributed Key-Value (KV) prompt caching orchestrator that dramatically reduces LLM inference latency and cost by storing and reusing the computed attention states of frequently used prompt prefixes across a distributed GPU cluster.
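The core idea of prefix KV caching can be illustrated with a minimal in-memory sketch: key stored attention states by a hash of the token prefix, and on each request reuse the longest cached prefix so only the remaining tokens need fresh attention computation. This is a simplified illustration, not GhostCacher's actual API; the class name `PrefixKVCache` and the string stand-in for GPU tensors are hypothetical, and a real distributed deployment would route these lookups across cluster nodes.

```python
import hashlib


class PrefixKVCache:
    """Minimal in-memory sketch of prompt-prefix KV caching.

    A production orchestrator would store real GPU attention tensors and
    shard lookups across a cluster; here simulated KV states are keyed by
    a hash of the token-ID prefix.
    """

    def __init__(self):
        self._store = {}  # prefix hash -> (prefix_len, kv_state)

    @staticmethod
    def _key(tokens):
        return hashlib.sha256(" ".join(map(str, tokens)).encode()).hexdigest()

    def put(self, tokens, kv_state):
        """Cache the KV state computed for this exact token prefix."""
        self._store[self._key(tokens)] = (len(tokens), kv_state)

    def longest_prefix(self, tokens):
        """Return (matched_len, kv_state) for the longest cached prefix, else (0, None)."""
        for n in range(len(tokens), 0, -1):
            hit = self._store.get(self._key(tokens[:n]))
            if hit is not None:
                return hit
        return 0, None


cache = PrefixKVCache()
system_prompt = [101, 7, 7, 42]            # token IDs of a shared system prompt
cache.put(system_prompt, kv_state="kv-for-system-prompt")

request = system_prompt + [9, 9]           # new request reusing the cached prefix
matched, kv = cache.longest_prefix(request)
# only len(request) - matched tokens need fresh attention computation
```

With the shared prefix cached, the second request pays attention cost for only its two new tokens instead of all six, which is where the latency and cost savings come from.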
Comparison of GPU inference latency for the Whisper large-v3 automatic speech recognition (ASR) model before and after fine-tuning on a German corpus.