-
Notifications
You must be signed in to change notification settings - Fork 879
fix: guard short token datasets in train_large_ane and dynamic pipeline #48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
log-wade
wants to merge
1
commit into
maderix:main
Choose a base branch
from
log-wade:contribution/benchmark-m5-and-fixes
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,95 @@ | ||
| # Contribution submission guide | ||
|
|
||
| This file summarizes what was done on branch `contribution/benchmark-m5-and-fixes` and how to submit it. | ||
|
|
||
| --- | ||
|
|
||
| ## 1. Benchmark (submit to Issue #3) | ||
|
|
||
| **Link:** https://github.com/maderix/ANE/issues/3 | ||
|
|
||
| **Post this as a new comment:** | ||
|
|
||
| ``` | ||
| ## M5 MacBook Pro benchmark (static pipeline, 20 steps) | ||
|
|
||
| - **Chip:** Apple M5, 10-core (4P+6E) | ||
| - **RAM:** 24 GB | ||
| - **macOS:** 26.3 (Build 25D125) | ||
| - **Run:** `./train_large --data ./tinystories_data00.bin --steps 20 --lr 1e-4` | ||
|
|
||
| ### Efficiency report | ||
| - Total steps: 20 | ||
| - Wall time: 10423 ms (10.4 s) | ||
| - Compile time: 7187 ms (69.0%) | ||
| - Train time: 2542 ms (24.4%) | ||
| - **Avg train: 127.1 ms/step** | ||
| - ANE TFLOPS: 0.73 sustained | ||
| - ANE utilization: 4.6% of 15.8 TFLOPS | ||
|
|
||
| Full output with JSON lines is in `benchmarks/my_m5_benchmark_output.txt` (or paste the contents below). | ||
| ``` | ||
|
|
||
| Then paste the contents of `benchmarks/my_m5_benchmark_output.txt` in the same comment, or attach it. | ||
|
|
||
| --- | ||
|
|
||
| ## 2. Bug fix (PR) | ||
|
|
||
| **Fix:** Guard short token datasets in `train_large_ane.m` and `training/training_dynamic/train.m`. | ||
|
|
||
| **Why:** When `n_tokens <= SEQ + 1`, the expression `max_pos = n_tokens - SEQ - 1` underflows (unsigned), leading to a huge random range and possible out-of-bounds reads. `train_large.m` already had this guard; the other two pipelines did not. | ||
|
|
||
| **Changes:** | ||
| - `training/train_large_ane.m`: After `n_tokens = data_len / 2`, add a check that fails early with a clear error, munmap and close the fd, and return 1. | ||
| - `training/training_dynamic/train.m`: Same guard added. | ||
|
|
||
| **Suggested PR title:** `fix: guard short token datasets in train_large_ane and dynamic pipeline` | ||
|
|
||
| **Suggested PR description:** | ||
|
|
||
| ```markdown | ||
| ## Summary | ||
| - Add a token dataset length guard in `training/train_large_ane.m` | ||
| - Add the same guard in `training/training_dynamic/train.m` | ||
| - Fail early with a clear error when the dataset is too short for one (input, target) window | ||
|
|
||
| ## Why | ||
| Both paths use `max_pos = n_tokens - SEQ - 1`. When `n_tokens <= SEQ + 1`, this unsigned subtraction underflows, producing a huge range and potentially out-of-bounds reads. `train_large.m` already had this guard (lines 299–304); this PR aligns the other two pipelines. | ||
|
|
||
| ## Validation | ||
| - `make -C training train_large_ane` — builds | ||
| - `make -C training/training_dynamic train` — builds | ||
| - With a too-short data file, both binaries exit with the new error message. | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## 3. Optional: benchmark data in repo | ||
|
|
||
| Branch also adds: | ||
| - `benchmarks/my_m5_benchmark_output.txt` — full benchmark log | ||
| - One new entry in `benchmarks/community_results.json` for this M5 run (contributor: `log-wade`) | ||
|
|
||
| You can either: | ||
| - Include the `community_results.json` update in the same PR as the bug fix, or | ||
| - Omit it and only post the benchmark to Issue #3 (maintainer may update the report from the issue). | ||
|
|
||
| --- | ||
|
|
||
| ## 4. Before opening the PR | ||
|
|
||
| 1. **Fork the repo** on GitHub (if you haven’t): https://github.com/maderix/ANE → Fork. | ||
| 2. **Add your fork as a remote and push:** | ||
| ```bash | ||
| git remote add myfork git@github.com:YOUR_USERNAME/ANE.git | ||
| git push myfork contribution/benchmark-m5-and-fixes | ||
| ``` | ||
| 3. Open a PR from `myfork/contribution/benchmark-m5-and-fixes` to `maderix/ANE` main. | ||
| 4. Post the benchmark comment to Issue #3 (link above). | ||
|
|
||
| --- | ||
|
|
||
| ## 5. Replace contributor name | ||
|
|
||
| In `benchmarks/community_results.json`, the new entry uses `"contributor": "log-wade"`. Change that to your GitHub username if different. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new guard rejects datasets with exactly
SEQ + 1tokens, but that case is still valid for one(input,target)window and does not underflowmax_pos = n_tokens - SEQ - 1(it becomes 0, soposis 0). As written, both this file andtraining/training_dynamic/train.mnow fail valid minimal datasets and smoke tests with the misleading message “need at leastSEQ + 2tokens.”Useful? React with 👍 / 👎.