Align training recipe and add validation logging #15
FrankHui wants to merge 1 commit into kyegomez:main from
Add a float16 GradScaler path for single-GPU training, include periodic validation loss/perplexity estimation, and update README training notes to match the current optimizer and precision behavior.

Made-with: Cursor
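For readers unfamiliar with what a float16 GradScaler path involves, here is a minimal pure-Python sketch of dynamic loss scaling, the idea behind PyTorch's `torch.cuda.amp.GradScaler`. The class `ToyLossScaler` and its method names are hypothetical and are not the code in this PR; the default constants are assumed to mirror GradScaler's documented defaults.

```python
import math


class ToyLossScaler:
    """Illustrative dynamic loss scaler (hypothetical, not the PR's code)."""

    def __init__(self, init_scale=65536.0, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0  # consecutive steps without overflow

    def scale_loss(self, loss):
        # Multiply the loss so small float16 gradients do not underflow.
        return loss * self.scale

    def unscale(self, grads):
        # Divide gradients by the scale before the optimizer uses them.
        return [g / self.scale for g in grads]

    def update(self, scaled_grads):
        """Return True if the optimizer step should be applied."""
        if any(not math.isfinite(g) for g in scaled_grads):
            # Overflow: skip this step and shrink the scale.
            self.scale *= self.backoff_factor
            self._good_steps = 0
            return False
        self._good_steps += 1
        if self._good_steps >= self.growth_interval:
            # A long run of finite gradients: try a larger scale.
            self.scale *= self.growth_factor
            self._good_steps = 0
        return True
```

bf16 has the same exponent range as float32, so it normally does not need this step; that is why a scaler is only relevant on the float16 fallback.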
Hi,
Validation data sharding depends on
Severity: action required | Category: correctness
How to fix: Decouple sharding from workers
Agent prompt to fix - you can give this to your LLM of choice:
We noticed a couple of other issues in this PR as well - happy to share if helpful. Spotted by Qodo code review - free for open-source projects.
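One way to read "decouple sharding from workers" is sketched below. It assumes the truncated finding refers to the DataLoader worker count, which the comment does not confirm. Shard ownership is decided from `rank` and `world_size` only, and workers then split what their rank already owns. All function names are hypothetical and not taken from the PR.

```python
def shards_for_rank(num_shards, rank, world_size):
    """Shards owned by a rank; depends only on rank and world_size."""
    return list(range(rank, num_shards, world_size))


def shards_for_worker(rank_shards, worker_id, num_workers):
    """Split a rank's own shards among its workers (num_workers >= 1)."""
    return rank_shards[worker_id::num_workers]


def rank_view(num_shards, rank, world_size, num_workers):
    """Sorted union of the shards read by all workers of one rank."""
    owned = shards_for_rank(num_shards, rank, world_size)
    seen = []
    for worker_id in range(num_workers):
        seen.extend(shards_for_worker(owned, worker_id, num_workers))
    return sorted(seen)
```

With this layout, changing the number of workers changes only how a rank's shards are divided internally, not which validation shards the rank evaluates.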
Summary
GradScaler path for single-GPU training when bf16 is unavailable
Test plan
python3 -m py_compile training/3b_fine_web_edu.py
val_every steps
Made with Cursor
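As a rough illustration of periodic validation logging, the sketch below shows how a validation loss and perplexity could be estimated every `val_every` steps. The helper names and the `(mean_loss, num_tokens)` input shape are assumptions for illustration, not the script's actual interface.

```python
import math


def should_validate(step, val_every):
    """True on every val_every-th training step (never at step 0)."""
    return val_every > 0 and step > 0 and step % val_every == 0


def estimate_val_metrics(batch_losses):
    """Token-weighted mean loss and perplexity.

    batch_losses: iterable of (mean_token_loss, num_tokens) pairs,
    where the loss is cross-entropy in nats.
    """
    total_loss = 0.0
    total_tokens = 0
    for mean_loss, num_tokens in batch_losses:
        total_loss += mean_loss * num_tokens
        total_tokens += num_tokens
    if total_tokens == 0:
        raise ValueError("no validation tokens were evaluated")
    val_loss = total_loss / total_tokens
    # Perplexity is the exponential of the mean per-token loss.
    return val_loss, math.exp(val_loss)
```

Weighting by token count keeps the estimate stable when validation batches differ in size.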