feat(training): add military strikes forecasting notebook#45

Merged
bturtel merged 5 commits into main from feat/military-strikes-notebook on Mar 27, 2026

@bturtel bturtel commented Mar 24, 2026

Summary

  • New SDK example notebook: notebooks/fine_tuning/04_military_strikes.ipynb
  • E2E flow: data gen → train (gpt-oss-120b, RL) → eval vs GPT-5.4
  • Uses client-approved prompts covering global strike types (air, missile, drone, naval) and state/non-state actors
  • Training params match golf/WWTD experiments (lora_rank=32, batch_size=32, lr=4e-5)
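
For quick reference, the hyperparameters listed above can be written down as a small config object. This is only a sketch: the `TrainingParams` class and its field names are hypothetical and are not taken from the SDK; only the values (and the base model name reported later in this thread) come from the PR.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainingParams:
    """Hypothetical container for the hyperparameters quoted in the summary."""

    base_model: str = "openai/gpt-oss-120b"  # base model named in this PR
    lora_rank: int = 32                      # lora_rank=32
    batch_size: int = 32                     # batch_size=32
    learning_rate: float = 4e-5              # lr=4e-5


params = TrainingParams()
print(asdict(params))
```

Freezing the dataclass keeps the values from being changed by accident between the training and eval cells.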

cc @bartolomej — FYI, not ready for review yet

bturtel commented Mar 25, 2026

cc @bartolomej — training is working end to end, but crashed at step 52. You can use this to test.

Dataset to use:

dataset = lr.datasets.get("8841b3cb-35c5-496f-a217-a3ec50f0bdca")

bturtel commented Mar 25, 2026

cc @bartolomej — training is working end to end but crashed at step 52. You can use this to test.

Dataset:

dataset = lr.datasets.get("8841b3cb-35c5-496f-a217-a3ec50f0bdca")

Also parallelized the eval cell — Tinker RL, Tinker base, and GPT-5.4 benchmark now run concurrently instead of sequentially.
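
A minimal sketch of what running the three evals concurrently could look like, assuming each eval is a blocking, I/O-bound call. `run_eval` and the model labels are placeholders for illustration and are not the notebook's actual code or an SDK API.

```python
from concurrent.futures import ThreadPoolExecutor


def run_eval(model_name: str) -> dict:
    """Placeholder for a blocking eval call; returns a dummy result."""
    return {"model": model_name, "status": "done"}


# Illustrative labels for the three runs mentioned above.
MODELS = ["tinker-rl", "tinker-base", "gpt-5.4"]


def run_all(models):
    """Submit one eval per model and wait for all of them to finish."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(run_eval, name) for name in models}
        # .result() blocks until that eval finishes and re-raises any exception.
        return {name: fut.result() for name, fut in futures.items()}


results = run_all(MODELS)
print(results)
```

Threads are a reasonable fit here because the work is mostly waiting on remote calls; total wall time is then roughly that of the slowest eval rather than the sum of all three.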

bturtel commented Mar 27, 2026

Training Complete — military-strikes-v2 (111/111 steps)

cc @bartolomej

v1 crashed at step 52. Restarted with the same dataset and config — v2 completed all 111 steps.


IDs & Checkpoints

| Item | Value |
| --- | --- |
| Dataset ID | 8841b3cb-35c5-496f-a217-a3ec50f0bdca |
| Training Job ID | ee725279-f982-4ec5-bcb3-45318db1bb43 |
| LR Model ID | checkpoint:ee725279-f982-4ec5-bcb3-45318db1bb43 |
| Eval Job ID | b4a2edb6-ee56-4e48-9d13-fc066495ca6a |
| Base Model | openai/gpt-oss-120b |

Note: lr.predict() is currently returning 500 on the /openai endpoint (affects all models, not just checkpoints). Use lr.evals.run() for batch inference. Investigate /openai endpoint separately.


Reward Progress

| Step | Reward |
| --- | --- |
| 1 | -0.9344 |
| 52 (v1 crash point) | -0.6380 |
| 99 (best) | -0.4879 |
| 111 (final) | -0.5644 |
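
A small arithmetic check on the reported rewards, assuming only the four steps listed here (the full per-step reward log is not included in this thread). Variable names are illustrative.

```python
# Rewards copied from the Reward Progress figures above (step -> reward).
rewards = {1: -0.9344, 52: -0.6380, 99: -0.4879, 111: -0.5644}

best_step = max(rewards, key=rewards.get)  # step with the highest reward
final_step = max(rewards)                  # last reported step

gain_to_best = rewards[best_step] - rewards[1]
drop_after_best = rewards[best_step] - rewards[final_step]

print(f"best step: {best_step}")
print(f"gain from step 1 to best: {gain_to_best:.4f}")
print(f"final step is {drop_after_best:.4f} below best")
```

The final checkpoint sits slightly below the step-99 peak, which is one reason evaluating intermediate checkpoints can be worthwhile.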

Eval Results (993 test samples)

Evaluated via lr.evals.run() — all models get identical prompts and parsing, temperature=0.0.

| Model | Brier Score | ECE | BSS |
| --- | --- | --- | --- |
| gpt-oss-120b RL (v2, 111 steps) | 0.2205 | 0.0991 | +11.8% |
| GPT-5.4 (benchmark) | 0.2466 | 0.2055 | +1.4% |
| gpt-oss-120b (base) | 0.2580 | 0.2130 | -3.2% |

  • RL vs GPT-5.4: 10.6% lower Brier score (0.2205 vs 0.2466), about 2× lower calibration error (ECE 0.0991 vs 0.2055)
  • RL vs base: 14.5% lower Brier score (0.2205 vs 0.2580)
  • All models: 100% parse rate, 993/993 valid
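
For readers unfamiliar with the metrics, here is a rough sketch of how Brier score, ECE, and BSS are commonly computed for binary forecasts, followed by a check of the headline numbers. The implementations below are generic textbook versions and are not necessarily what lr.evals.run() does; the 10 equal-width bins for ECE are an assumption. The reference Brier score of 0.25 (a constant 0.5 forecast) is inferred from the reported figures, not stated in this thread.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted mean |avg forecast - observed frequency| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_prob = sum(p for p, _ in b) / len(b)
        observed = sum(y for _, y in b) / len(b)
        ece += len(b) / len(probs) * abs(mean_prob - observed)
    return ece


def brier_skill_score(brier, reference=0.25):
    """BSS relative to a reference forecast (0.25 is an inferred assumption)."""
    return 1.0 - brier / reference


# Brier scores copied from the eval table above.
reported = {"rl": 0.2205, "gpt54": 0.2466, "base": 0.2580}

rl_vs_gpt = (reported["gpt54"] - reported["rl"]) / reported["gpt54"]
rl_vs_base = (reported["base"] - reported["rl"]) / reported["base"]

print(f"RL vs GPT-5.4: {rl_vs_gpt:.1%} lower Brier")
print(f"RL vs base: {rl_vs_base:.1%} lower Brier")
for name, score in reported.items():
    print(f"{name}: BSS = {brier_skill_score(score):+.1%}")
```

With a 0.25 reference, the computed BSS values line up with the reported +11.8%, +1.4%, and -3.2%, which suggests (but does not confirm) that the baseline is an uninformed 50% forecast.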

Cost

  • Training: $35.22
  • Eval: $0.89

Notebook update

Cleared all outputs and removed cells 17-21 (the direct Tinker workaround from the v1 crash). The notebook now runs end-to-end through the SDK.

bturtel added 5 commits March 27, 2026 17:42
E2E SDK example: data gen, training (gpt-oss-120b), eval vs GPT-5.4.
Uses client-approved prompts covering global strike types and actors.
# Conflicts:
#	notebooks/fine_tuning/04_military_strikes.ipynb
Training completed successfully (111/111 steps), so the direct
Tinker evaluation cells (17-21) are no longer needed. Also fix
GPT-5.1 typo to GPT-5.4 in eval section header.
@bartolomej bartolomej force-pushed the feat/military-strikes-notebook branch from 0db6bec to e4b3d3d Compare March 27, 2026 16:50

@bartolomej bartolomej changed the title from "Add military strikes forecasting notebook" to "feat(training): add military strikes forecasting notebook" Mar 27, 2026

@bartolomej bartolomej left a comment

Awesome stuff!

I've opened this PR with eval API changes (to support evals on multiple models and intermediate checkpoints): https://app.graphite.com/github/pr/lightning-rod-labs/lightningrod-python-sdk/49

@bturtel bturtel merged commit 4cdc12f into main Mar 27, 2026
1 check passed