Modify tile size for DMA operations to use Y_SIZE#1319
Modify tile size for DMA operations to use Y_SIZE#1319wypku wants to merge 1 commit intoaws-neuron:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the tiling overview documentation example to size SBUF tiles and DMA slices based on Y_SIZE (512) instead of a hard-coded 256, matching the stated expected tensor shape (128, 512).
Changes:
- Update
in_tile/out_tileallocations to use(P_DIM, Y_SIZE). - Update DMA copy source/destination slicing to copy
0:Y_SIZEinstead of0:256.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| nki.isa.dma_copy(dst=in_tile, src=in_tensor[0:P_DIM, 0:Y_SIZE]) | ||
| nki.isa.reciprocal(dst=out_tile, data=in_tile) | ||
| nki.isa.dma_copy(dst=out_tensor[0:P_DIM, 0:256], src=out_tile) | ||
| nki.isa.dma_copy(dst=out_tensor[0:P_DIM, 0:Y_SIZE], src=out_tile) |
There was a problem hiding this comment.
The surrounding prose/output describe an exponential example (e.g., output values ~2.7188), but this code calls nki.isa.reciprocal. Please align the example so the operation matches the narrative/output (either switch the ISA op to an exponential variant, or update the text/output to describe reciprocal).
|
|
||
| # Process first tile | ||
| nki.isa.dma_copy(dst=in_tile, src=in_tensor[0:P_DIM, 0:256]) | ||
| nki.isa.dma_copy(dst=in_tile, src=in_tensor[0:P_DIM, 0:Y_SIZE]) |
There was a problem hiding this comment.
This snippet imports nki.isa as nisa earlier in the same code block, but uses nki.isa.* here. To avoid confusing readers (and an otherwise-unused alias), either use the nisa alias consistently in these calls or change/remove the alias import.
IMPORTANT! If this is a documentation PR for a specific release, this PR must go the corresponding release branch (
release-X.XX.X). If it is an "out-of-band" doc update, the PR must go to themasterbranch.Required PR information
To expedite approvals and merges for releases, provide the following information (select the
...button to the right at the top of your PR message to edit it):Before you request approvals
Approvers
We require 3-4 approvers to merge for non-trivial content changes (where a "trivial" change is a typo/grammar fix or a minor update to the format syntax):
Make sure you get a commitment from these reviewers in advance! It's hard to get good quality doc reviews in order in the 11th hour of a release.
Note: For trivial changes, you only need @erickson-doug's approval. He will merge your content once he's confirmed the fixes on staging.
Doc review checklist
Engineering reviewer checklist
For PRs that include code or notebook examples
MANDATORY: PR must include test run output
Provide this information for the QA reviewer in order to expedite their review.
Test run output:
Specify the release version, instance size and type, OS type and test output.
For Training tutorials:
{Convergence graph for training tutorials}
{Performance metrics
average_throughput,latency_p50,latency_p99and MFU% if available}Make sure this PR contains correct classification terms (Alpha, Beta, and Stable).
If possible, provide your results or a link to them for the reviewer to check your work.
Code example/notebook content PR checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.