feat: Adds intelligent tiered model routing #47

Merged: veerareddyvishal144 merged 17 commits into main from model-registry (Feb 22, 2026)
Conversation

vishalveerareddy123 (Collaborator) commented on Feb 12, 2026

Summary

This PR adds intelligent tiered model routing, new provider integrations (Moonshot AI), and significant improvements to routing infrastructure, documentation, and DevOps tooling.

New Providers

  • Moonshot AI (Kimi) — Full provider support via OpenAI-compatible API (invokeMoonshot). Includes model mapping, native system role support, tool calling, thinking model support (kimi-k2-thinking), and non-streaming mode.
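
Since Moonshot is driven through an OpenAI-compatible API, the request shape can be sketched as follows. The helper name and base URL are illustrative assumptions, not the project's actual `invokeMoonshot` code; the model name, `MOONSHOT_API_KEY`, and the forced non-streaming mode come from this PR:

```javascript
// Sketch: build a non-streaming OpenAI-format chat request for Moonshot.
// buildMoonshotRequest and the default baseUrl are hypothetical.
function buildMoonshotRequest(messages, apiKey, baseUrl = 'https://api.moonshot.ai/v1') {
  return {
    url: `${baseUrl}/chat/completions`,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: 'kimi-k2-thinking',
        messages,
        stream: false, // OpenAI SSE -> Anthropic SSE conversion is not implemented
      }),
    },
  };
}
```

The returned object can be passed straight to `fetch(req.url, req.options)`.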

4-Tier Intelligent Routing System

  • Complexity scoring (0-100) with 4-phase analysis: basic scoring, advanced classification, metrics tracking, optional embeddings
  • 4 tiers: SIMPLE (0-25), MEDIUM (26-50), COMPLEX (51-75), REASONING (76-100)
  • TIER_* env vars (TIER_SIMPLE, TIER_MEDIUM, TIER_COMPLEX, TIER_REASONING) in provider:model format override MODEL_PROVIDER for routing
  • Agentic workflow detection — identifies SINGLE_SHOT, TOOL_CHAIN, ITERATIVE, AUTONOMOUS patterns with automatic tier upgrades
  • Cost optimization — multi-source pricing (LiteLLM, models.dev, Databricks fallback) with automatic cheaper-model selection
  • 15-dimension weighted scoring mode (optional) for fine-grained complexity analysis
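
The score-to-tier mapping above can be sketched in a few lines (the function name is illustrative; the thresholds are the ones documented in this PR):

```javascript
// Map a 0-100 complexity score to a routing tier.
// Thresholds mirror the documented ranges: SIMPLE 0-25, MEDIUM 26-50,
// COMPLEX 51-75, REASONING 76-100.
function scoreToTier(score) {
  if (score <= 25) return 'SIMPLE';
  if (score <= 50) return 'MEDIUM';
  if (score <= 75) return 'COMPLEX';
  return 'REASONING';
}
```

For example, a trivial greeting scoring 18 routes to SIMPLE, while a multi-file refactor scoring 62 routes to COMPLEX.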

Bug Fixes

  • stop_reason detection — Check for actual tool_calls array presence instead of finish_reason string. Fixes tool calls not executing with Moonshot (and potentially other providers that return finish_reason: "stop" with tool_calls)
  • Streaming format mismatch — Force stream: false for OpenAI-format providers (Moonshot, Azure OpenAI) since OpenAI SSE to Anthropic SSE conversion is not implemented
  • Reasoning content handling — Use content field directly, fall back to reasoning_content only when content is empty. Fixes thinking model chain-of-thought leaking into CLI output
  • Orchestrator double-conversion — Add dedicated Moonshot/Z.AI cases in orchestrator to prevent re-converting already-converted Anthropic responses
  • Force-local routing — Respect TIER_SIMPLE config instead of hardcoding Ollama when force-local pattern matches
  • Duplicate tool calls — Fix duplicate tool call handling in message processing
  • Tier config crash — Fix crash when tier configuration is missing
  • IDE client tool filtering — Fix tool filtering for Codex CLI and IDE clients
  • Null-safety — Fix debug logging crash on null response fields
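
The `stop_reason` fix in the first bullet can be sketched like this (the helper name and the `length` to `max_tokens` mapping are illustrative assumptions; the core idea of trusting the `tool_calls` array over the `finish_reason` string comes from this PR):

```javascript
// Sketch: derive an Anthropic-style stop_reason from an OpenAI-format
// choice. Some providers (e.g. Moonshot) return finish_reason: "stop"
// even when tool_calls are present, so check the array first.
function resolveStopReason(choice) {
  const toolCalls = choice?.message?.tool_calls;
  if (Array.isArray(toolCalls) && toolCalls.length > 0) {
    return 'tool_use';
  }
  // Only trust finish_reason when no tool calls are present.
  return choice?.finish_reason === 'length' ? 'max_tokens' : 'end_turn';
}
```

With this, a Moonshot response carrying `finish_reason: "stop"` plus a populated `tool_calls` array still yields `tool_use`, so tools execute.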

Routing Precedence (Documented)

| Configuration | Behavior |
| --- | --- |
| All 4 TIER_* set | Tier routing active. MODEL_PROVIDER ignored for routing. |
| 1-3 TIER_* set | Tier routing disabled. MODEL_PROVIDER used. |
| No TIER_* set | Static routing via MODEL_PROVIDER. |
| PREFER_OLLAMA | Deprecated, no effect. Use TIER_SIMPLE=ollama:<model>. |
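
The precedence rule reduces to a single check (function name is illustrative, not the actual routing code):

```javascript
// Sketch of the documented precedence: tier routing activates only when
// all four TIER_* env vars are set; any partial config falls back to
// static MODEL_PROVIDER routing instead of crashing.
const TIER_VARS = ['TIER_SIMPLE', 'TIER_MEDIUM', 'TIER_COMPLEX', 'TIER_REASONING'];

function routingMode(env) {
  const setCount = TIER_VARS.filter((name) => Boolean(env[name])).length;
  return setCount === TIER_VARS.length ? 'tiered' : 'static';
}
```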

Code Cleanup

  • Removed determineProviderSync() — dead code, no call sites
  • Deprecated PREFER_OLLAMA with runtime warning pointing to TIER_* vars

New Routing Modules

| File | Purpose |
| --- | --- |
| src/routing/model-tiers.js | Tier definitions, TIER_* env var parsing, model selection |
| src/routing/agentic-detector.js | Agentic workflow detection and classification |
| src/routing/cost-optimizer.js | Cost tracking, cheapest-model finder, savings calculation |
| src/routing/model-registry.js | Multi-source pricing (LiteLLM, models.dev, Databricks) |
| src/routing/complexity-analyzer.js | 4-phase complexity analysis, 15-dimension weighted scoring |
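
The provider:model parsing that model-tiers.js performs on TIER_* values can be sketched as follows (the function name is illustrative; splitting on only the first colon is an assumption to tolerate model names that contain colons, as Ollama tags do):

```javascript
// Sketch: parse a TIER_* value such as "moonshot:kimi-k2-thinking".
// Split on the first colon only, since the model part may itself
// contain colons (e.g. "ollama:llama3:8b").
function parseTierValue(value) {
  const idx = value.indexOf(':');
  if (idx === -1) return null; // malformed; caller should warn, not crash
  return { provider: value.slice(0, idx), model: value.slice(idx + 1) };
}
```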

Documentation

  • routing.md — New comprehensive routing docs with precedence hierarchy, scoring algorithm, agentic detection, cost optimization, decision flow
  • providers.md — Added Moonshot section (#10: "claude code >= 2.1.9 no longer works"), updated configuration methods with a clear TIER_* vs MODEL_PROVIDER explanation
  • troubleshooting.md — Added Moonshot troubleshooting (rate limits, auth, reasoning content)
  • installation.md — Added Moonshot quick start
  • faq.md — Updated provider counts, added Moonshot recommendations
  • .env.example — Added Moonshot config, expanded MODEL_PROVIDER comments explaining its role with tier routing
  • All docs — Updated provider counts from 9+ to 12+

DevOps

  • Synced Dockerfile and docker-compose.yml with all env vars
  • Added pino-roll file logging
  • Moved dockerode to optionalDependencies

Config Files

  • config/model-tiers.json — Tier preferences for all providers including Moonshot (kimi-k2-thinking for REASONING)
  • .env.example — Full Moonshot section, expanded routing documentation in comments

Test Plan

  • Server starts with MODEL_PROVIDER=moonshot and valid MOONSHOT_API_KEY
  • Moonshot handles simple text requests ("Hi", "23+45")
  • Moonshot tool calls execute correctly (Bash, Read, Search, Glob, Grep)
  • stop_reason: "tool_use" set correctly when tool_calls present
  • No streaming format mismatch (garbled terminal output)
  • Thinking model (kimi-k2-thinking) returns clean output without chain-of-thought
  • Tier routing overrides MODEL_PROVIDER when all 4 TIER_* set
  • Force-local patterns use TIER_SIMPLE config
  • Existing providers (Ollama, OpenRouter, Azure, etc.) unaffected

vishal veerareddy and others added 13 commits February 11, 2026 15:47
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix Codex Bash mapping: shell_command → shell (array format for command)
- Add missing Codex mappings: TodoWrite → update_plan, WebSearch → web_search
- Add two-layer tool filtering for IDE clients:
  Layer 1: IDE_SAFE_TOOLS removes AskUserQuestion (can't work through proxy)
  Layer 2: CLIENT_TOOL_MAPPINGS per-client filter ensures each client only
  sees tools it supports (e.g. Codex gets 8, Claude Code gets 14)
- Add tool name mapping to chat/completions response paths (streaming + non-streaming)
- Add missing Claude Code tools: MultiEdit, LS, NotebookRead
- Inject filtered tools in openai-router.js before orchestrator call to
  prevent providers from injecting full STANDARD_TOOLS

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Demote 22 info→debug in openai-router.js (request previews, tool injection, streaming chunks, intermediate conversions)
- Demote 39 info→debug in databricks.js (tool injection, request construction, response parsing across all providers)
- Clean up orchestrator/index.js: consolidate Ollama conversational check (6→1 log), headroom compression (4→1), tool execution mode (4→1); remove 4 console.log artifacts and [CONTEXT_FLOW] scaffolding
- Fix tier config: change hard throw to graceful warn when TIER_* env vars missing (was crashing CI)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds optional persistent log file rotation via pino-roll (LOG_FILE_ENABLED=true)
and expands the Structured Logging section in production.md with file logging
config, log level philosophy, and querying examples.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds missing sections to all config files: file logging (LOG_FILE_*),
rate limiting, policy, agents, token optimization, smart tool selection,
prompt/semantic cache, tiered routing, and provider configs (LM Studio,
Z.AI, Vertex AI). Adds /app/logs volume for persistent log rotation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
vishal veerareddy and others added 2 commits February 22, 2026 19:23
- Add Moonshot AI as first-class provider (invokeMoonshot, config, orchestrator, provider discovery)
- Fix stop_reason detection: check tool_calls presence instead of finish_reason string
- Fix streaming format mismatch: force non-streaming for OpenAI-format providers
- Fix reasoning content handling: use content field, fallback to reasoning_content
- Fix orchestrator double-conversion for Moonshot responses
- Fix force-local routing to respect TIER_SIMPLE config instead of hardcoding Ollama
- Remove dead code: determineProviderSync (unused sync routing fallback)
- Update routing docs: clear precedence hierarchy for TIER_* vs MODEL_PROVIDER vs PREFER_OLLAMA
- Add comprehensive Moonshot documentation across all doc files
- Add Moonshot to model-tiers.json (kimi-k2-thinking for REASONING tier)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ineProviderSync

Replace all determineProviderSync() calls in tests with async
determineProviderSmart() since the sync function was removed as dead code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@veerareddyvishal144 veerareddyvishal144 changed the title Model registry (feat) Adds intelligent tiered model routing Feb 22, 2026
@veerareddyvishal144 veerareddyvishal144 changed the title (feat) Adds intelligent tiered model routing feat: Adds intelligent tiered model routing Feb 22, 2026
vishal veerareddy and others added 2 commits February 22, 2026 20:01
…chestrator)

- Add missing logger require in src/api/router.js (used in streaming error handling)
- Fix clean.model → cleanPayload.model in orchestrator hybrid mode response

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@veerareddyvishal144 veerareddyvishal144 merged commit 2f6319b into main Feb 22, 2026
6 checks passed
@veerareddyvishal144 veerareddyvishal144 deleted the model-registry branch February 22, 2026 14:45
