Dear mlx_lm developers,
Thank you for your wonderful work on this project. I especially want to thank @angeloskath for implementing "Better caching in the server" (I call it the "prompt_checkpoint" feature) in the 0.31.0 update. This is a fantastic improvement.
Since Qwen-Next models were released, I have been struggling with how to handle KV caches that cannot be trimmed. After reading the prompt_checkpoint implementation, I understood that it's possible to stop KV cache updates at the prefill stage. This is exactly what I needed.
So, how about adding a standalone function to mlx_lm that performs prefill only, without text generation? In other words, a function that just runs the prompt through the model to build the KV cache and returns that cache, generating no text.
I think this would be useful in several cases. Beyond handling models with non-trimmable KV caches, it would help applications that need to process large system prompts or document contexts ahead of time: they could pre-build caches and reuse them across multiple sessions.
In my project, I have currently forked parts of the `generate_step` function into a simpler function that does exactly the above. While this seems to work well, I believe the functionality would be valuable as an official API. (I have no experience manipulating LLM tensors directly, so I worry whether I'm doing it correctly. lol) (Just for your information, here is my code: https://github.com/gitkaz/mlx_gguf_server/blob/experimental_feature/prompt_checkpoint_based_stream_generate/worker/task/completions_stream/fork_from_mlx_lm.py)
Thank you,