Skip to content

Fix conv_general output offset overflow in Metal writeback#3320

Closed
Aristide021 wants to merge 2 commits intoml-explore:mainfrom
Aristide021:conv-general-offset-overflow
Closed

Fix conv_general output offset overflow in Metal writeback#3320
Aristide021 wants to merge 2 commits intoml-explore:mainfrom
Aristide021:conv-general-offset-overflow

Conversation

@Aristide021
Copy link
Copy Markdown
Contributor

@Aristide021 Aristide021 commented Mar 25, 2026

Fixes incorrect output writes in Metal conv_general when output offsets exceed signed 32-bit range.

Root cause

In Metal steel conv writeback paths, output pointer offset arithmetic used 32-bit int intermediates. For large outputs, this can overflow and corrupt writes.

What this changes

  • Switches output offset arithmetic to 64-bit (size_t) in:
    • mlx/backend/metal/kernels/steel/conv/kernels/steel_conv.h
    • mlx/backend/metal/kernels/steel/conv/kernels/steel_conv_general.h
  • Keeps the change scoped to kernel writeback offset calculation (no API/semantic changes)

Tests

  • Adds C++ regression test in tests/ops_tests.cpp:
    • test conv2d output offset uses 64-bit arithmetic
    • Verifies offset math past int32 max
    • Verifies mismatch vs prior int32 wraparound behavior

Validation run

  • cmake --build build --target mlx-metallib
  • cmake --build build --target tests
  • build/tests/tests --test-case "test conv2d output offset uses 64-bit arithmetic"
  • Reproducer threshold checks (pointwise conv_general vs matmul) now pass for cases above 2^31

Closes #3248.

@Aristide021 Aristide021 marked this pull request as draft March 25, 2026 14:04
@Aristide021 Aristide021 force-pushed the conv-general-offset-overflow branch from 0085a3b to 0b12f2c Compare March 25, 2026 14:43
@Aristide021 Aristide021 marked this pull request as ready for review March 25, 2026 14:44
Copy link
Copy Markdown
Collaborator

@zcbenz zcbenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The added test passes without the change.

@Aristide021 Aristide021 marked this pull request as draft March 26, 2026 16:10
@Aristide021
Copy link
Copy Markdown
Contributor Author

The added test passes without the change.

I tried to build the e2e test but found a blocker unrelated to this fix. I've filed a separate issue: #3327.

I'm closing this PR for now since I can't meaningfully verify the fix with a reliable regression test. Once #3327 is resolved, this can be revisited with proper coverage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] mx.conv_general produces wrong results when total output elements exceed 2^31 (~2.15 billion).

2 participants