[Vulkan] Add multi-queue decode correctness and queue-ownership stress suite #18

@goniz

Description

Problem

As the Vulkan backend becomes more asynchronous and decode-oriented, generation correctness needs dedicated stress coverage for queue ownership, timeline sequencing, and cross-queue handoff. The current multi-queue work is not adequately covered by targeted, decode-focused correctness tests.

This is not a duplicate of #1. Issue #1 added the multi-queue submission model. This issue is about proving it stays correct under real decode-like workloads and future performance changes.

Why This Matters

The reference review reinforced that MLX should keep a stronger correctness model than the reference runtimes, especially for cross-queue synchronization and queue-family ownership. If we reduce synchronization to improve performance, we need tests that catch subtle generation regressions early.

Tasks

  • Add targeted tests for same-family dual-queue setups and separate-family compute/transfer setups
  • Stress upload -> compute -> KV append -> readback ordering in decode-like workloads
  • Add assertions or trace validation around timeline values and queue-affinity transitions where practical
  • Add short-timeout reproduction harnesses for generation correctness under multi-queue execution
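
As a rough illustration of the trace-validation task, here is a minimal sketch of an offline checker. It assumes the backend can emit a submission-ordered trace of timeline signals and waits and of queue-family release/acquire operations. `TraceEvent`, `EventKind`, and `validate_trace` are hypothetical names invented for this sketch, not existing MLX or Vulkan types, and a single timeline semaphore starting at 0 is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical trace record (assumption: not an MLX or Vulkan type).
enum class EventKind { Signal, Wait, Release, Acquire };

struct TraceEvent {
  EventKind kind;
  uint32_t queue_family;    // family of the queue that recorded the event
  uint64_t timeline_value;  // used by Signal and Wait
  uint32_t resource;        // used by Release and Acquire
  uint32_t peer_family;     // Release: destination family; Acquire: source family
};

// Returns an empty string when the trace is consistent, otherwise a
// description of the first violation found.
std::string validate_trace(const std::vector<TraceEvent>& trace) {
  uint64_t last_signaled = 0;  // assumed initial timeline value
  uint64_t max_waited = 0;
  std::map<uint32_t, uint32_t> owner;  // resource -> owning family
  std::map<uint32_t, std::pair<uint32_t, uint32_t>> pending;  // resource -> (src, dst)

  for (size_t i = 0; i < trace.size(); ++i) {
    const TraceEvent& e = trace[i];
    const std::string at = "event " + std::to_string(i) + ": ";
    switch (e.kind) {
      case EventKind::Signal:
        // Timeline semaphore signal values must strictly increase.
        if (e.timeline_value <= last_signaled) return at + "non-increasing signal";
        last_signaled = e.timeline_value;
        break;
      case EventKind::Wait:
        // Waiting before the signal is submitted is legal, so only record it.
        max_waited = std::max(max_waited, e.timeline_value);
        break;
      case EventKind::Release: {
        auto it = owner.find(e.resource);
        if (it != owner.end() && it->second != e.queue_family)
          return at + "release by non-owner";
        if (pending.count(e.resource)) return at + "double release";
        pending[e.resource] = std::make_pair(e.queue_family, e.peer_family);
        break;
      }
      case EventKind::Acquire: {
        auto it = pending.find(e.resource);
        if (it == pending.end()) return at + "acquire without release";
        if (it->second.first != e.peer_family || it->second.second != e.queue_family)
          return at + "acquire does not match release";
        owner[e.resource] = e.queue_family;
        pending.erase(it);
        break;
      }
    }
  }
  if (max_waited > last_signaled) return "wait on a value that is never signaled";
  if (!pending.empty()) return "release without matching acquire";
  return "";
}
```

A test could record such a trace during a short decode loop (upload, compute, KV append, readback) and assert that `validate_trace` returns an empty string. The checker only covers signal monotonicity, unsatisfiable waits, and release/acquire pairing; it does not prove that each acquire is actually ordered after its release on the device.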

Acceptance Criteria

  • Dedicated multi-queue decode correctness coverage exists in CI or reproducible local tests
  • Same-family and separate-family queue topologies are both exercised where available
  • Future sync/perf changes can be validated without relying only on large manual benchmark runs
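
The "where available" qualifier implies that tests need to detect which topologies a device offers and skip the rest. The sketch below shows one way that detection might look. It assumes the queue families have already been queried (for example via `vkGetPhysicalDeviceQueueFamilyProperties`) and copied into a plain struct. `QueueFamily`, `Topologies`, and `find_topologies` are hypothetical names for this sketch; the flag constants mirror the values of `VK_QUEUE_GRAPHICS_BIT`, `VK_QUEUE_COMPUTE_BIT`, and `VK_QUEUE_TRANSFER_BIT` so that the example compiles without Vulkan headers.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Mirror VK_QUEUE_GRAPHICS_BIT, VK_QUEUE_COMPUTE_BIT, VK_QUEUE_TRANSFER_BIT.
constexpr uint32_t kGraphics = 0x1;
constexpr uint32_t kCompute = 0x2;
constexpr uint32_t kTransfer = 0x4;

// Hypothetical copy of the relevant VkQueueFamilyProperties fields.
struct QueueFamily {
  uint32_t flags;
  uint32_t queue_count;
};

struct Topologies {
  // Index of a compute-capable family exposing at least two queues.
  std::optional<uint32_t> same_family;
  // (compute family index, transfer family index), with distinct indices.
  std::optional<std::pair<uint32_t, uint32_t>> separate_families;
};

Topologies find_topologies(const std::vector<QueueFamily>& families) {
  Topologies out;
  std::optional<uint32_t> compute;
  for (uint32_t i = 0; i < families.size(); ++i) {
    const QueueFamily& f = families[i];
    if (f.queue_count == 0 || !(f.flags & kCompute)) continue;
    if (!compute) compute = i;
    if (!out.same_family && f.queue_count >= 2) out.same_family = i;
  }
  if (!compute) return out;

  std::optional<uint32_t> fallback;
  for (uint32_t i = 0; i < families.size(); ++i) {
    if (i == *compute || families[i].queue_count == 0) continue;
    const uint32_t flags = families[i].flags;
    // Prefer a transfer-only family; graphics and compute families also
    // support transfer operations, so they serve as a fallback.
    const bool dedicated = (flags & kTransfer) && !(flags & (kCompute | kGraphics));
    const bool capable = (flags & (kTransfer | kCompute | kGraphics)) != 0;
    if (dedicated) {
      out.separate_families = std::make_pair(*compute, i);
      return out;
    }
    if (capable && !fallback) fallback = i;
  }
  if (fallback) out.separate_families = std::make_pair(*compute, *fallback);
  return out;
}
```

A test fixture could call `find_topologies` once per device and mark the same-family or separate-family cases as skipped when the corresponding field is empty, so that CI results show which topologies actually ran.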

References

  • mlx-vulkan-reference-conclusions.md
  • references/ggml-vulkan-findings.md
  • references/zinc-findings.md
  • references/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp
  • references/zinc/src/vulkan/command.zig
  • references/zinc/src/compute/forward.zig
