feat(training): add military strikes forecasting notebook #45
Conversation
cc @bartolomej — training is working end to end, but crashed at step 52. You can use this to test. Dataset: dataset = lr.datasets.get("8841b3cb-35c5-496f-a217-a3ec50f0bdca")

Also parallelized the eval cell — Tinker RL, Tinker base, and the GPT-5.4 benchmark now run concurrently instead of sequentially.
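The concurrency change described above can be sketched with a thread pool. This is a minimal illustration of the pattern, not the actual notebook cell: the three eval functions below are hypothetical stand-ins for the real SDK calls (Tinker RL, Tinker base, and the GPT-5.4 benchmark).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stubs for the three eval calls in the notebook;
# the real cell calls the Lightning Rod SDK. Each returns a small
# result dict so the concurrency pattern is easy to follow.
def eval_tinker_rl(dataset_id):
    return {"model": "tinker-rl", "dataset": dataset_id}

def eval_tinker_base(dataset_id):
    return {"model": "tinker-base", "dataset": dataset_id}

def eval_gpt_benchmark(dataset_id):
    return {"model": "gpt-5.4", "dataset": dataset_id}

def run_evals_concurrently(dataset_id):
    """Submit all three evals at once instead of running them sequentially."""
    evals = [eval_tinker_rl, eval_tinker_base, eval_gpt_benchmark]
    with ThreadPoolExecutor(max_workers=len(evals)) as pool:
        futures = [pool.submit(fn, dataset_id) for fn in evals]
        # Results come back in submission order.
        return [f.result() for f in futures]

results = run_evals_concurrently("8841b3cb-35c5-496f-a217-a3ec50f0bdca")
```

Since the three evals are I/O-bound API calls, threads (rather than processes) are enough to overlap their wall-clock time.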
Training Complete — military-strikes-v2 (111/111 steps)

cc @bartolomej v1 crashed at step 52. Restarted with the same dataset and config — v2 completed all 111 steps.

IDs & Checkpoints
Reward Progress
Eval Results (993 test samples)

Evaluated via
Cost
Notebook update

Cleared all outputs and removed cells 17-21 (the direct Tinker workaround from the v1 crash). The notebook now runs end-to-end through the SDK.
E2E SDK example: data gen, training (gpt-oss-120b), eval vs GPT-5.4. Uses client-approved prompts covering global strike types and actors.
# Conflicts:
#	notebooks/fine_tuning/04_military_strikes.ipynb
Training completed successfully (111/111 steps), so the direct Tinker evaluation cells (17-21) are no longer needed. Also fix GPT-5.1 typo to GPT-5.4 in eval section header.
force-pushed from 0db6bec to e4b3d3d
bartolomej left a comment
Awesome stuff!
I've opened this PR with eval API changes (to support evals on multiple models and intermediate checkpoints): https://app.graphite.com/github/pr/lightning-rod-labs/lightningrod-python-sdk/49
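The eval API changes in that PR (evals on multiple models and intermediate checkpoints) could look roughly like the sketch below. All names here are illustrative assumptions, not the real SDK surface: evaluate() is a stub scorer over a fake test set, and the checkpoint refs are invented.

```python
# Hypothetical sketch: score several model refs (including
# intermediate checkpoints) against one shared test set.
def evaluate(model_ref, samples):
    # Stub scorer: fraction of samples where this model's stored
    # prediction matches the label. The real API would run inference.
    correct = sum(1 for s in samples if s["preds"][model_ref] == s["label"])
    return correct / len(samples)

def eval_many(model_refs, samples):
    # One pass per model ref; a real multi-model eval API would
    # presumably batch this server-side.
    return {ref: evaluate(ref, samples) for ref in model_refs}

# Tiny fake test set with per-model predictions baked in.
samples = [
    {"label": 1, "preds": {"ckpt-052": 0, "ckpt-111": 1, "gpt-5.4": 1}},
    {"label": 0, "preds": {"ckpt-052": 0, "ckpt-111": 0, "gpt-5.4": 1}},
]
scores = eval_many(["ckpt-052", "ckpt-111", "gpt-5.4"], samples)
```

Returning a dict keyed by model ref makes it easy to compare a final checkpoint against intermediate ones and an external benchmark in a single call.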

Summary
notebooks/fine_tuning/04_military_strikes.ipynb

cc @bartolomej — FYI, not ready for review yet