@Coco58323
Fixed illegal memory access in _rms_norm_fwd_fused, _layer_norm_param_fwd_fused, and _layer_norm_noparam_fwd_fused kernels.

The kernels were missing row boundary checks (rows < M). When the number of rows M is not divisible by BLOCK_M (32), the last block of rows extends past M, so its loads and stores accessed memory out of bounds.
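To make the failure case concrete, here is a small plain-Python sketch (not the kernel code) of which row indices a launch would touch past M. It assumes the grid launches ceil(M / BLOCK_M) programs along the row dimension, which the description implies but does not state; `out_of_bounds_rows` is a hypothetical helper name used only for illustration.

```python
def out_of_bounds_rows(M, block_m=32):
    """Return the row indices >= M that the row blocks would cover.

    Assumption: one program per block of `block_m` rows, with
    ceil(M / block_m) programs in total.
    """
    num_blocks = (M + block_m - 1) // block_m   # ceil division
    touched = range(num_blocks * block_m)       # every row index any block covers
    return [r for r in touched if r >= M]


# M = 100 is not a multiple of 32: the 4th block covers rows 96..127,
# so rows 100..127 lie past the end of the tensor.
print(len(out_of_bounds_rows(100)))
# M = 64 is a multiple of 32: no block reaches past M.
print(out_of_bounds_rows(64))
```

Without a `rows < M` check, every one of those extra rows is read and written by the tail block.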

Changes:

  • Added M parameter to all three kernels
  • Added row_mask = rows < M
  • Changed mask from 1D (cols only) to 2D (rows & cols)
  • Applied proper masking to all tl.load and tl.store operations
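The changes above can be sketched in plain Python as an emulation of one program instance with a 2D mask. This is not the PR's Triton code: it is a simplified RMS norm without a weight vector, and the names `rms_norm_block` and `rms_norm`, the block sizes, and `eps` are assumptions made for illustration. The conditional read and conditional write stand in for `tl.load(..., mask=mask, other=0.0)` and `tl.store(..., mask=mask)`.

```python
import math


def rms_norm_block(x, y, M, N, pid, block_m=2, block_n=4, eps=1e-6):
    """Emulate one program over a flat row-major buffer `x` of M * N floats.

    Assumes block_n >= N, so a single column block covers each row.
    """
    rows = [pid * block_m + i for i in range(block_m)]
    cols = list(range(block_n))
    row_mask = [r < M for r in rows]   # the check the kernels were missing
    col_mask = [c < N for c in cols]   # the 1D mask that already existed
    for i, r in enumerate(rows):
        # 2D mask: a lane is active only if both its row and column are valid.
        mask = [row_mask[i] and col_mask[j] for j in range(block_n)]
        # Masked load: inactive lanes never index the buffer and read as 0.0.
        vals = [x[r * N + c] if m else 0.0 for c, m in zip(cols, mask)]
        rstd = 1.0 / math.sqrt(sum(v * v for v in vals) / N + eps)
        # Masked store: inactive lanes write nothing.
        for c, m, v in zip(cols, mask, vals):
            if m:
                y[r * N + c] = v * rstd


def rms_norm(x, M, N, block_m=2, block_n=4):
    """Run every program in the (emulated) 1D grid over row blocks."""
    y = [0.0] * (M * N)
    num_programs = (M + block_m - 1) // block_m
    for pid in range(num_programs):
        rms_norm_block(x, y, M, N, pid, block_m, block_n)
    return y


# M = 3 rows with block_m = 2: the second program covers rows 2 and 3,
# and row 3 does not exist. The row mask keeps it from being touched.
x = [float(i) for i in range(1, 10)]
y = rms_norm(x, M=3, N=3)
```

In this emulation, dropping `row_mask[i]` from the mask makes the second program index `x[9]` and raise an `IndexError`, which is the list-level analogue of the illegal memory access on the GPU.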
