Add Tinker SDK LLM router tutorial for Nemotron Nano 30B#155

Open
vikalluru wants to merge 1 commit into NVIDIA-NeMo:main from vikalluru:llm-router-tinker-tutorial

@vikalluru

Summary

  • Adds a Jupyter notebook tutorial: DPO + RLVR fine-tuning of Nemotron Nano 30B as a production LLM router using Tinker SDK
  • Based on Glean's real production deployment: LoRA rank 32, RouterSearchEnv, reward = 0.7·R_search + 0.3·R_termination, vLLM on Vertex AI
  • Uses open-source training data: Salesforce/xlam-function-calling-60k
  • Covers full workflow: architecture overview → DPO fine-tuning → RLVR fine-tuning → vLLM serving config → offline eval
  • Production latency achieved: P50 = 250 ms, P95 = 2.5 s on a single-replica H100
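
As a rough illustration of the reward weighting quoted above, the combination can be sketched as a plain weighted sum. This is not the notebook's `RouterSearchEnv` code: the function name `combined_reward` and its parameter names are hypothetical, and how `R_search` and `R_termination` are computed is not specified in this summary.

```python
def combined_reward(
    r_search: float,
    r_termination: float,
    w_search: float = 0.7,
    w_termination: float = 0.3,
) -> float:
    """Weighted sum of the two reward components named in the summary.

    Hypothetical helper for illustration; the component rewards are
    assumed to be precomputed floats.
    """
    return w_search * r_search + w_termination * r_termination
```

With these weights, an episode that scores well on search but terminates poorly still keeps most of its reward, which matches the 0.7 / 0.3 split stated in the summary.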

Location

usage-cookbook/Nemotron-3-Nano/tinker_llm_router_tutorial.ipynb

Test plan

  • Notebook runs end-to-end on H100 with Tinker SDK installed
  • DPO training cell completes with LoRA rank 32 config
  • RLVR training cell completes with RouterSearchEnv reward function
  • vLLM serving config verified against production params
  • Offline eval cell produces router accuracy metrics
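
For reviewers who want to sanity-check the offline eval and latency figures, a minimal sketch of the two metrics might look like the following. It assumes routing decisions are compared as exact-match labels and that percentiles use the nearest-rank method; the helper names `router_accuracy` and `latency_percentile` are hypothetical and do not come from the notebook or the Tinker SDK.

```python
import math
from typing import Sequence


def router_accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of routing decisions that exactly match the expected label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one example is required")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def latency_percentile(samples_ms: Sequence[float], q: float) -> float:
    """Nearest-rank percentile of latency samples, with q in (0, 100]."""
    if not samples_ms:
        raise ValueError("at least one latency sample is required")
    if not 0 < q <= 100:
        raise ValueError("q must be in (0, 100]")
    ordered = sorted(samples_ms)
    # Nearest-rank: smallest value with at least q% of samples at or below it.
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]
```

Calling `latency_percentile(samples, 50)` and `latency_percentile(samples, 95)` on measured request latencies would give figures comparable to the P50 and P95 values reported in the summary, provided the same percentile definition is used.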

🤖 Generated with Claude Code

Joint NVIDIA × Glean tutorial demonstrating two-phase DPO + RLVR
fine-tuning of Nemotron Nano 30B as a production LLM router using
Tinker SDK. Based on Glean's production deployment (Salesforce/xlam
dataset, vLLM on Vertex AI, 250ms P50 / 2.5s P95 latency).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
