[Vulkan] Add multi-queue decode correctness and queue-ownership stress suite #18
Description
Problem
As the Vulkan backend becomes more asynchronous and decode-oriented, generation correctness needs dedicated stress coverage for queue ownership, timeline sequencing, and cross-queue handoff. The current multi-queue work is not sufficiently protected by targeted decode-focused correctness tests.
This is not a duplicate of #1. Issue #1 added the multi-queue submission model. This issue is about proving it stays correct under real decode-like workloads and future performance changes.
Why This Matters
The reference review reinforced that MLX should keep a stronger correctness model than the reference runtimes, especially for cross-queue synchronization and queue-family ownership. If we reduce synchronization to improve performance, we need tests that catch subtle generation regressions early.
Tasks
- Add targeted tests for same-family dual-queue setups and separate-family compute/transfer setups
- Stress upload -> compute -> KV append -> readback ordering in decode-like workloads
- Add assertions or trace validation around timeline values and queue-affinity transitions where practical
- Add short-timeout reproduction harnesses for generation correctness under multi-queue execution
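The ordering and ownership checks in the tasks above could start as an offline validator over a recorded submission trace. This is a minimal sketch built on several assumptions. The trace format is hypothetical: the `Submit` record, its fields and the queue labels are illustrative and are not an existing MLX API. It assumes each decode step runs upload, compute, KV append and readback as separate submissions, and that timelines start at 0. The validator flags four kinds of problems:

- waits on timeline values that no earlier submission signals. Vulkan permits wait-before-signal on timeline semaphores, so this is a deliberately stricter policy.
- signal values that do not strictly increase.
- cross-queue stage handoffs without a timeline wait.
- buffer uses that skip a queue-family release/acquire pair.

Same-queue ordering through pipeline barriers is not modeled.

```python
from dataclasses import dataclass, field

# Hypothetical trace model: one record per queue submission, in host submission order.
STAGES = ("upload", "compute", "kv_append", "readback")


@dataclass
class Submit:
    step: int                                      # decode step (token index)
    stage: str                                     # one of STAGES
    queue: str                                     # hypothetical queue label, e.g. "comp", "xfer"
    family: int                                    # queue family index of `queue`
    waits: dict = field(default_factory=dict)      # timeline name -> value waited on
    signals: dict = field(default_factory=dict)    # timeline name -> value signaled
    uses: set = field(default_factory=set)         # EXCLUSIVE buffers read or written
    releases: dict = field(default_factory=dict)   # buffer -> destination family
    acquires: set = field(default_factory=set)     # buffers acquired by this family


def predecessors(step, stage):
    """Stages that must complete before (step, stage) may start (assumed decode DAG)."""
    k = STAGES.index(stage)
    preds = [(step, STAGES[k - 1])] if k > 0 else []
    if stage == "compute" and step > 0:
        preds.append((step - 1, "kv_append"))  # attention reads the previous step's KV append
    return preds


def validate_trace(trace):
    """Return a list of violations; an empty list means the trace passed."""
    errors, high, done, owner, in_flight = [], {}, {}, {}, {}
    for i, s in enumerate(trace):
        # Policy check: wait only on values an earlier submission already signals.
        for tl, v in s.waits.items():
            if high.get(tl, 0) < v:
                errors.append(f"#{i} waits {tl}>={v} but max signaled is {high.get(tl, 0)}")
        # Cross-queue handoffs need a timeline wait; same-queue barriers are not modeled.
        for pred in predecessors(s.step, s.stage):
            if pred not in done:
                errors.append(f"#{i} {s.stage}@{s.step} submitted before {pred[1]}@{pred[0]}")
                continue
            queue, tl, v = done[pred]
            if queue == s.queue:
                continue
            if tl is None:
                errors.append(f"#{i} {pred[1]}@{pred[0]} signals nothing for a cross-queue consumer")
            elif s.waits.get(tl, 0) < v:
                errors.append(f"#{i} {s.stage}@{s.step} missing wait on {tl}>={v}")
        # Queue-family ownership: acquire at the start, use, then release at the end.
        for b in s.acquires:
            if in_flight.pop(b, None) != s.family:
                errors.append(f"#{i} acquires {b} without a matching release")
            owner[b] = s.family
        for b in s.uses:
            if owner.setdefault(b, s.family) != s.family:  # first use claims ownership
                errors.append(f"#{i} uses {b} on family {s.family}, owned by {owner[b]}")
        for b, dst in s.releases.items():
            if owner.get(b) != s.family:
                errors.append(f"#{i} releases {b} without owning it")
            in_flight[b], owner[b] = dst, None  # None: in transit between families
        # Signal values must strictly increase per timeline.
        for tl, v in s.signals.items():
            if v <= high.get(tl, 0):
                errors.append(f"#{i} signals {tl}={v}, not above {high.get(tl, 0)}")
            high[tl] = max(v, high.get(tl, 0))
        # Only the first signaled timeline is tracked for handoff checks in this sketch.
        done[(s.step, s.stage)] = (s.queue, *next(iter(s.signals.items()), (None, 0)))
    return errors
```

Feeding it traces from short, fixed-length generations keeps each reproduction fast. Each returned string names the index of the offending submission, so a failing CI run points directly at the broken handoff.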
Acceptance Criteria
- Dedicated multi-queue decode correctness coverage exists in CI or reproducible local tests
- Same-family and separate-family queue topologies are both exercised where available
- Future sync/perf changes can be validated without relying only on large manual benchmark runs
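Exercising both topologies "where available" requires knowing which ones a device offers. The sketch below works on the queue-family list that a backend would read from `vkGetPhysicalDeviceQueueFamilyProperties`. The `QueueFamily` record and the `available_topologies` helper are hypothetical. The flag values are the `VkQueueFlagBits` constants from the Vulkan specification. The sketch relies on the Vulkan rule that graphics- and compute-capable families also accept transfer commands, even when they do not report `VK_QUEUE_TRANSFER_BIT`.

```python
from dataclasses import dataclass

# VkQueueFlagBits values from the Vulkan specification.
VK_QUEUE_GRAPHICS_BIT = 0x1
VK_QUEUE_COMPUTE_BIT = 0x2
VK_QUEUE_TRANSFER_BIT = 0x4


@dataclass(frozen=True)
class QueueFamily:
    """Hypothetical mirror of the VkQueueFamilyProperties fields this sketch needs."""
    index: int
    flags: int
    count: int


def can_transfer(f):
    # Graphics and compute families implicitly support transfer commands.
    return bool(f.flags & (VK_QUEUE_GRAPHICS_BIT | VK_QUEUE_COMPUTE_BIT | VK_QUEUE_TRANSFER_BIT))


def is_dedicated_transfer(f):
    return not f.flags & (VK_QUEUE_GRAPHICS_BIT | VK_QUEUE_COMPUTE_BIT)


def available_topologies(families):
    """Map topology name -> ((family, queue), (family, queue)) for supported topologies."""
    topologies = {}
    compute = [f for f in families if f.flags & VK_QUEUE_COMPUTE_BIT]
    # Same-family: two queues from one compute-capable family.
    for f in compute:
        if f.count >= 2:
            topologies["same_family"] = ((f.index, 0), (f.index, 1))
            break
    # Separate-family: a compute family plus a different transfer-capable family,
    # preferring a dedicated transfer family when one exists.
    for c in compute:
        others = [f for f in families if f.index != c.index and can_transfer(f)]
        if others:
            t = min(others, key=lambda f: not is_dedicated_transfer(f))
            topologies["separate_family"] = ((c.index, 0), (t.index, 0))
            break
    return topologies
```

A test runner can report any topology missing from the returned dict as skipped rather than failed. This keeps single-queue devices usable in CI while still covering both topologies on hardware that exposes them.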
References
- mlx-vulkan-reference-conclusions.md
- references/ggml-vulkan-findings.md
- references/zinc-findings.md
- references/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp
- references/zinc/src/vulkan/command.zig
- references/zinc/src/compute/forward.zig