[Vulkan] Add native decode hot path for attention and KV cache update #16
Description
Problem
Decode still pays too much generic-primitive overhead around attention and KV cache maintenance. Queue and scheduler fixes alone will not close the decode gap while the backend keeps routing these operations through generic hot-path kernels and generic KV update flows.
This is not a duplicate of #6. Issue #6 focuses on matmul tuning tables and vendor-specific kernel selection. This issue focuses on the decode hot path itself: attention plus KV cache update/append behavior.
Why This Matters
The reference runtimes both invest heavily in inference-shaped hot paths:
- ggml-vulkan has substantial attention specialization and decode-oriented KV behavior
- Zinc hand-codes the token loop around attention, KV write, and immediate consumption
MLX needs a native decode hot path rather than paying repeated generic primitive overhead around these operations.
Tasks
- Add a native or fused decode attention path for the common autoregressive token case
- Add a native KV cache append/update path optimized for decode
- Keep small latency-sensitive decode updates on the compute queue unless bulk transfer overlap is measurably better
- Benchmark Qwen3 decode before/after and validate generation correctness
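To make the target behavior concrete, here is a minimal sketch of what a fused decode step computes: append the new token's K/V entry into a preallocated cache in place, then attend the single query token over the causal prefix, with GQA head grouping. This is illustrative numpy, not the MLX/Vulkan implementation; `decode_step` and all shapes are hypothetical names chosen for this example.

```python
import numpy as np

def decode_step(q, k_new, v_new, k_cache, v_cache, pos):
    """One autoregressive decode step (illustrative, single batch).

    q:               (n_q_heads, head_dim)          query for the current token
    k_new / v_new:   (n_kv_heads, head_dim)         K/V for the current token
    k_cache/v_cache: (n_kv_heads, max_seq, head_dim) preallocated caches
    pos:             current token index; cache already holds `pos` entries
    """
    n_q_heads, head_dim = q.shape
    n_kv_heads = k_cache.shape[0]
    group = n_q_heads // n_kv_heads  # GQA: query heads per KV head

    # In-place KV append -- the "native KV cache update" that a fused path
    # performs directly instead of round-tripping through generic copy primitives.
    k_cache[:, pos] = k_new
    v_cache[:, pos] = v_new

    out = np.empty_like(q)
    scale = 1.0 / np.sqrt(head_dim)
    for h in range(n_q_heads):
        kv_h = h // group                    # map query head to its KV head
        keys = k_cache[kv_h, : pos + 1]      # causal prefix, (pos+1, head_dim)
        vals = v_cache[kv_h, : pos + 1]
        scores = keys @ q[h] * scale         # (pos+1,)
        scores -= scores.max()               # numerically stable softmax
        w = np.exp(scores)
        w /= w.sum()
        out[h] = w @ vals                    # (head_dim,)
    return out
```

A fused kernel would do the append and the attention in one dispatch on the compute queue, which is what removes the copy/sync boundaries the acceptance criteria call out.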
Acceptance Criteria
- Qwen3 decode throughput improves materially
- Decode traces show fewer copy/sync boundaries around attention + KV work
- No correctness regressions on causal/GQA decode shapes
References
- mlx-vulkan-reference-conclusions.md
- references/ggml-vulkan-findings.md
- references/zinc-findings.md
- references/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp
- references/llama.cpp/ggml/src/ggml-vulkan/vulkan-shaders/flash_attn*.comp
- references/zinc/src/compute/attention.zig
- references/zinc/src/compute/forward.zig (attention + KV write sequencing)