[Vulkan] Implement vendor/architecture-specific matmul tuning #6

## Problem
MLX uses a single default `MLX_VULKAN_MATMUL_SPEC` tuple on every device unless it is manually overridden by an environment variable. ggml, by contrast, has a much richer architecture/tuning matrix: vendor detection, subgroup characteristics, cooperative matrix probing, and many precompiled matmul and flash-attention (FA) variants.
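
As a starting point for discussion, the vendor detection and capability probing mentioned above could look roughly like the sketch below. It is deliberately free of Vulkan headers so it can be compiled anywhere: the caller is assumed to pass in `VkPhysicalDeviceProperties::vendorID` and the extension names returned by `vkEnumerateDeviceExtensionProperties`. `GpuVendor`, `DeviceCaps`, `classify_vendor`, and `probe_caps` are hypothetical names, not existing MLX or ggml symbols.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical vendor classes; not an existing MLX type.
enum class GpuVendor { AMD, Apple, Intel, Nvidia, Other };

// Map a PCI vendor ID (the value Vulkan reports in
// VkPhysicalDeviceProperties::vendorID) to a coarse vendor class.
inline GpuVendor classify_vendor(std::uint32_t vendor_id) {
  switch (vendor_id) {
    case 0x1002: return GpuVendor::AMD;
    case 0x106B: return GpuVendor::Apple;
    case 0x8086: return GpuVendor::Intel;
    case 0x10DE: return GpuVendor::Nvidia;
    default:     return GpuVendor::Other;
  }
}

// Hypothetical capability summary that the tuning code could key on.
struct DeviceCaps {
  GpuVendor vendor = GpuVendor::Other;
  bool subgroup_size_control = false;
  bool cooperative_matrix = false;
  bool integer_dot_product = false;
};

// True if `name` appears in the device extension list.
inline bool has_ext(const std::vector<std::string>& exts,
                    const std::string& name) {
  return std::find(exts.begin(), exts.end(), name) != exts.end();
}

// Build DeviceCaps from the vendor ID and the device extension names.
// Assumption: extension presence is used as a first filter only; a real
// probe would also query the matching feature structs and API version.
inline DeviceCaps probe_caps(std::uint32_t vendor_id,
                             const std::vector<std::string>& exts) {
  DeviceCaps caps;
  caps.vendor = classify_vendor(vendor_id);
  caps.subgroup_size_control = has_ext(exts, "VK_EXT_subgroup_size_control");
  caps.cooperative_matrix = has_ext(exts, "VK_KHR_cooperative_matrix");
  caps.integer_dot_product =
      has_ext(exts, "VK_KHR_shader_integer_dot_product");
  return caps;
}
```

Subgroup size control and integer dot product were promoted to core in Vulkan 1.3, so checking extension strings alone would likely under-report on newer drivers; the string check here is only meant to show the shape of the probe.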

## Tasks
- [ ] Build a vendor/architecture tuning table with:
  - matmul tile sizes
  - subgroup sizes
  - aligned vs unaligned variants
  - f16acc vs f32acc
  - split-K thresholds
  - small/medium/large kernel families
- [ ] Add subgroup-size control and cooperative matrix capability detection
- [ ] Add integer-dot-product support detection
- [ ] Add architecture-specific kernel selection
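
To make the task list above more concrete, here is a minimal sketch of what a tuning table and a selection function might look like. Everything in it is an assumption for illustration: the type names (`VendorTuning`, `MatmulSpec`, `KernelChoice`), the selection heuristic, and above all the numbers, which are placeholders rather than measured or ggml-derived values.

```cpp
#include <cstdint>

enum class Vendor { AMD, Intel, Nvidia, Other };
enum class KernelSize { Small, Medium, Large };

// One matmul kernel configuration (hypothetical layout).
struct MatmulSpec {
  std::uint32_t tile_m, tile_n, tile_k;
  std::uint32_t subgroup_size;
};

// Per-vendor row of the tuning table.
struct VendorTuning {
  Vendor vendor;
  MatmulSpec small, medium, large;
  std::uint32_t split_k_min_k;  // consider split-K only when K >= this
  bool prefer_f16_acc;          // use f16 accumulation when the caller allows
};

// Placeholder values only; real entries would come from benchmarking.
constexpr VendorTuning kTuning[] = {
    {Vendor::AMD,    {32, 32, 16, 64}, {64, 64, 16, 64}, {128, 128, 16, 64}, 2048, false},
    {Vendor::Intel,  {32, 32, 16, 16}, {64, 64, 16, 16}, {128, 128, 16, 16}, 4096, false},
    {Vendor::Nvidia, {32, 32, 16, 32}, {64, 64, 16, 32}, {128, 128, 16, 32}, 2048, true},
    {Vendor::Other,  {32, 32, 16, 32}, {64, 64, 16, 32}, {128, 128, 16, 32}, 4096, false},
};

// Assumed cutoff: split-K only pays off when few output tiles exist.
constexpr std::uint32_t kSplitKMaxTiles = 16;

struct KernelChoice {
  MatmulSpec spec;
  KernelSize size;
  bool aligned;  // all of M, N, K are multiples of the tile dimensions
  bool f16_acc;
  bool split_k;
};

inline const VendorTuning& tuning_for(Vendor v) {
  for (const auto& t : kTuning) {
    if (t.vendor == v) return t;
  }
  return kTuning[3];  // the Vendor::Other row
}

inline KernelChoice select_kernel(Vendor v, std::uint32_t m, std::uint32_t n,
                                  std::uint32_t k, bool allow_f16_acc) {
  const VendorTuning& t = tuning_for(v);
  KernelChoice c{};
  // Pick the largest family whose output tile fits inside the M x N result.
  if (m >= t.large.tile_m && n >= t.large.tile_n) {
    c.size = KernelSize::Large;
    c.spec = t.large;
  } else if (m >= t.medium.tile_m && n >= t.medium.tile_n) {
    c.size = KernelSize::Medium;
    c.spec = t.medium;
  } else {
    c.size = KernelSize::Small;
    c.spec = t.small;
  }
  c.aligned = m % c.spec.tile_m == 0 && n % c.spec.tile_n == 0 &&
              k % c.spec.tile_k == 0;
  c.f16_acc = allow_f16_acc && t.prefer_f16_acc;
  // Count output tiles (rounding up) to decide whether split-K is useful.
  const std::uint32_t tiles = ((m + c.spec.tile_m - 1) / c.spec.tile_m) *
                              ((n + c.spec.tile_n - 1) / c.spec.tile_n);
  c.split_k = k >= t.split_k_min_k && tiles < kSplitKMaxTiles;
  return c;
}
```

Under these placeholder values, `select_kernel(Vendor::AMD, 16, 16, 8192, true)` would pick the small, unaligned, f32-accumulating kernel with split-K enabled, since a single output tile leaves most of the GPU idle unless the K dimension is partitioned. A real implementation would probably also key the table on `DeviceCaps`-style capability flags (cooperative matrix, integer dot product) rather than on the vendor alone.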

## Related
See Tier 2, item 1 in the performance analysis report.
