feat: Adds intelligent tiered model routing #47

Merged: veerareddyvishal144 merged 17 commits into main from model-registry (Feb 22, 2026)
Conversation

vishalveerareddy123 (Collaborator) commented on Feb 12, 2026

Summary

This PR adds intelligent tiered model routing, new provider integrations (Moonshot AI), and significant improvements to routing infrastructure, documentation, and DevOps tooling.

New Providers

  • Moonshot AI (Kimi) — Full provider support via OpenAI-compatible API (invokeMoonshot). Includes model mapping, native system role support, tool calling, thinking model support (kimi-k2-thinking), and non-streaming mode.
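
Since Moonshot is driven through an OpenAI-compatible API, the request shape can be sketched as follows. The helper name and base URL are illustrative assumptions, not the project's actual `invokeMoonshot` code; the model name, `MOONSHOT_API_KEY`, and the forced non-streaming mode come from this PR:

```javascript
// Sketch: build a non-streaming OpenAI-format chat request for Moonshot.
// buildMoonshotRequest and the default baseUrl are hypothetical.
function buildMoonshotRequest(messages, apiKey, baseUrl = 'https://api.moonshot.ai/v1') {
  return {
    url: `${baseUrl}/chat/completions`,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: 'kimi-k2-thinking',
        messages,
        stream: false, // OpenAI SSE -> Anthropic SSE conversion is not implemented
      }),
    },
  };
}
```

The returned object can be passed straight to `fetch(req.url, req.options)`.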

4-Tier Intelligent Routing System

  • Complexity scoring (0-100) with 4-phase analysis: basic scoring, advanced classification, metrics tracking, optional embeddings
  • 4 tiers: SIMPLE (0-25), MEDIUM (26-50), COMPLEX (51-75), REASONING (76-100)
  • TIER_* env vars (TIER_SIMPLE, TIER_MEDIUM, TIER_COMPLEX, TIER_REASONING) in provider:model format override MODEL_PROVIDER for routing
  • Agentic workflow detection — identifies SINGLE_SHOT, TOOL_CHAIN, ITERATIVE, AUTONOMOUS patterns with automatic tier upgrades
  • Cost optimization — multi-source pricing (LiteLLM, models.dev, Databricks fallback) with automatic cheaper-model selection
  • 15-dimension weighted scoring mode (optional) for fine-grained complexity analysis
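
The score-to-tier mapping above can be sketched in a few lines (the function name is illustrative; the thresholds are the ones documented in this PR):

```javascript
// Map a 0-100 complexity score to a routing tier.
// Thresholds mirror the documented ranges: SIMPLE 0-25, MEDIUM 26-50,
// COMPLEX 51-75, REASONING 76-100.
function scoreToTier(score) {
  if (score <= 25) return 'SIMPLE';
  if (score <= 50) return 'MEDIUM';
  if (score <= 75) return 'COMPLEX';
  return 'REASONING';
}
```

For example, a trivial greeting scoring 18 routes to SIMPLE, while a multi-file refactor scoring 62 routes to COMPLEX.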

Bug Fixes

  • stop_reason detection — Check for actual tool_calls array presence instead of finish_reason string. Fixes tool calls not executing with Moonshot (and potentially other providers that return finish_reason: "stop" with tool_calls)
  • Streaming format mismatch — Force stream: false for OpenAI-format providers (Moonshot, Azure OpenAI) since OpenAI SSE to Anthropic SSE conversion is not implemented
  • Reasoning content handling — Use content field directly, fall back to reasoning_content only when content is empty. Fixes thinking model chain-of-thought leaking into CLI output
  • Orchestrator double-conversion — Add dedicated Moonshot/Z.AI cases in orchestrator to prevent re-converting already-converted Anthropic responses
  • Force-local routing — Respect TIER_SIMPLE config instead of hardcoding Ollama when force-local pattern matches
  • Duplicate tool calls — Fix duplicate tool call handling in message processing
  • Tier config crash — Fix crash when tier configuration is missing
  • IDE client tool filtering — Fix tool filtering for Codex CLI and IDE clients
  • Null-safety — Fix debug logging crash on null response fields
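
The `stop_reason` fix in the first bullet can be sketched like this (the helper name and the `length` to `max_tokens` mapping are illustrative assumptions; the core idea of trusting the `tool_calls` array over the `finish_reason` string comes from this PR):

```javascript
// Sketch: derive an Anthropic-style stop_reason from an OpenAI-format
// choice. Some providers (e.g. Moonshot) return finish_reason: "stop"
// even when tool_calls are present, so check the array first.
function resolveStopReason(choice) {
  const toolCalls = choice?.message?.tool_calls;
  if (Array.isArray(toolCalls) && toolCalls.length > 0) {
    return 'tool_use';
  }
  // Only trust finish_reason when no tool calls are present.
  return choice?.finish_reason === 'length' ? 'max_tokens' : 'end_turn';
}
```

With this, a Moonshot response carrying `finish_reason: "stop"` plus a populated `tool_calls` array still yields `tool_use`, so tools execute.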

Routing Precedence (Documented)

| Configuration | Behavior |
| --- | --- |
| All 4 TIER_* set | Tier routing active. MODEL_PROVIDER ignored for routing. |
| 1-3 TIER_* set | Tier routing disabled. MODEL_PROVIDER used. |
| No TIER_* set | Static routing via MODEL_PROVIDER. |
| PREFER_OLLAMA | Deprecated, no effect. Use TIER_SIMPLE=ollama:<model>. |
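
The precedence rule reduces to a single check (function name is illustrative, not the actual routing code):

```javascript
// Sketch of the documented precedence: tier routing activates only when
// all four TIER_* env vars are set; any partial config falls back to
// static MODEL_PROVIDER routing instead of crashing.
const TIER_VARS = ['TIER_SIMPLE', 'TIER_MEDIUM', 'TIER_COMPLEX', 'TIER_REASONING'];

function routingMode(env) {
  const setCount = TIER_VARS.filter((name) => Boolean(env[name])).length;
  return setCount === TIER_VARS.length ? 'tiered' : 'static';
}
```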

Code Cleanup

  • Removed determineProviderSync() — dead code, no call sites
  • Deprecated PREFER_OLLAMA with runtime warning pointing to TIER_* vars

New Routing Modules

| File | Purpose |
| --- | --- |
| src/routing/model-tiers.js | Tier definitions, TIER_* env var parsing, model selection |
| src/routing/agentic-detector.js | Agentic workflow detection and classification |
| src/routing/cost-optimizer.js | Cost tracking, cheapest-model finder, savings calculation |
| src/routing/model-registry.js | Multi-source pricing (LiteLLM, models.dev, Databricks) |
| src/routing/complexity-analyzer.js | 4-phase complexity analysis, 15-dimension weighted scoring |
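
The provider:model parsing that model-tiers.js performs on TIER_* values can be sketched as follows (the function name is illustrative; splitting on only the first colon is an assumption to tolerate model names that contain colons, as Ollama tags do):

```javascript
// Sketch: parse a TIER_* value such as "moonshot:kimi-k2-thinking".
// Split on the first colon only, since the model part may itself
// contain colons (e.g. "ollama:llama3:8b").
function parseTierValue(value) {
  const idx = value.indexOf(':');
  if (idx === -1) return null; // malformed; caller should warn, not crash
  return { provider: value.slice(0, idx), model: value.slice(idx + 1) };
}
```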

Documentation

  • routing.md — New comprehensive routing docs with precedence hierarchy, scoring algorithm, agentic detection, cost optimization, decision flow
  • providers.md — Added Moonshot section (#10: "claude code >= 2.1.9 no longer works"), updated configuration methods with a clear TIER_* vs MODEL_PROVIDER explanation
  • troubleshooting.md — Added Moonshot troubleshooting (rate limits, auth, reasoning content)
  • installation.md — Added Moonshot quick start
  • faq.md — Updated provider counts, added Moonshot recommendations
  • .env.example — Added Moonshot config, expanded MODEL_PROVIDER comments explaining its role with tier routing
  • All docs — Updated provider counts from 9+ to 12+

DevOps

  • Synced Dockerfile and docker-compose.yml with all env vars
  • Added pino-roll file logging
  • Moved dockerode to optionalDependencies

Config Files

  • config/model-tiers.json — Tier preferences for all providers including Moonshot (kimi-k2-thinking for REASONING)
  • .env.example — Full Moonshot section, expanded routing documentation in comments

Test Plan

  • Server starts with MODEL_PROVIDER=moonshot and valid MOONSHOT_API_KEY
  • Moonshot handles simple text requests ("Hi", "23+45")
  • Moonshot tool calls execute correctly (Bash, Read, Search, Glob, Grep)
  • stop_reason: "tool_use" set correctly when tool_calls present
  • No streaming format mismatch (garbled terminal output)
  • Thinking model (kimi-k2-thinking) returns clean output without chain-of-thought
  • Tier routing overrides MODEL_PROVIDER when all 4 TIER_* set
  • Force-local patterns use TIER_SIMPLE config
  • Existing providers (Ollama, OpenRouter, Azure, etc.) unaffected

vishal veerareddy and others added 13 commits February 11, 2026 15:47
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix Codex Bash mapping: shell_command → shell (array format for command)
- Add missing Codex mappings: TodoWrite → update_plan, WebSearch → web_search
- Add two-layer tool filtering for IDE clients:
  Layer 1: IDE_SAFE_TOOLS removes AskUserQuestion (can't work through proxy)
  Layer 2: CLIENT_TOOL_MAPPINGS per-client filter ensures each client only
  sees tools it supports (e.g. Codex gets 8, Claude Code gets 14)
- Add tool name mapping to chat/completions response paths (streaming + non-streaming)
- Add missing Claude Code tools: MultiEdit, LS, NotebookRead
- Inject filtered tools in openai-router.js before orchestrator call to
  prevent providers from injecting full STANDARD_TOOLS

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Demote 22 info→debug in openai-router.js (request previews, tool injection, streaming chunks, intermediate conversions)
- Demote 39 info→debug in databricks.js (tool injection, request construction, response parsing across all providers)
- Clean up orchestrator/index.js: consolidate Ollama conversational check (6→1 log), headroom compression (4→1), tool execution mode (4→1); remove 4 console.log artifacts and [CONTEXT_FLOW] scaffolding
- Fix tier config: change hard throw to graceful warn when TIER_* env vars missing (was crashing CI)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds optional persistent log file rotation via pino-roll (LOG_FILE_ENABLED=true)
and expands the Structured Logging section in production.md with file logging
config, log level philosophy, and querying examples.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds missing sections to all config files: file logging (LOG_FILE_*),
rate limiting, policy, agents, token optimization, smart tool selection,
prompt/semantic cache, tiered routing, and provider configs (LM Studio,
Z.AI, Vertex AI). Adds /app/logs volume for persistent log rotation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
vishal veerareddy and others added 2 commits February 22, 2026 19:23
- Add Moonshot AI as first-class provider (invokeMoonshot, config, orchestrator, provider discovery)
- Fix stop_reason detection: check tool_calls presence instead of finish_reason string
- Fix streaming format mismatch: force non-streaming for OpenAI-format providers
- Fix reasoning content handling: use content field, fallback to reasoning_content
- Fix orchestrator double-conversion for Moonshot responses
- Fix force-local routing to respect TIER_SIMPLE config instead of hardcoding Ollama
- Remove dead code: determineProviderSync (unused sync routing fallback)
- Update routing docs: clear precedence hierarchy for TIER_* vs MODEL_PROVIDER vs PREFER_OLLAMA
- Add comprehensive Moonshot documentation across all doc files
- Add Moonshot to model-tiers.json (kimi-k2-thinking for REASONING tier)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ineProviderSync

Replace all determineProviderSync() calls in tests with async
determineProviderSmart() since the sync function was removed as dead code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@veerareddyvishal144 veerareddyvishal144 changed the title Model registry (feat) Adds intelligent tiered model routing Feb 22, 2026
@veerareddyvishal144 veerareddyvishal144 changed the title (feat) Adds intelligent tiered model routing feat: Adds intelligent tiered model routing Feb 22, 2026
vishal veerareddy and others added 2 commits February 22, 2026 20:01
…chestrator)

- Add missing logger require in src/api/router.js (used in streaming error handling)
- Fix clean.model → cleanPayload.model in orchestrator hybrid mode response

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@veerareddyvishal144 veerareddyvishal144 merged commit 2f6319b into main Feb 22, 2026
6 checks passed
@veerareddyvishal144 veerareddyvishal144 deleted the model-registry branch February 22, 2026 14:45
