fix(copilot): reduce premium request inflation, enable thinking, and use dynamic API limits#485

Merged
luispater merged 3 commits into router-for-me:main from
kunish:fix/copilot-premium-request-inflation
Apr 3, 2026
Conversation


@kunish kunish commented Apr 3, 2026

Summary

  • Reduce premium request inflation by correctly detecting agent-initiated requests (tool loops, continuations) vs user-initiated requests, setting X-Initiator: agent to avoid consuming premium quota
  • Enable thinking/reasoning for Copilot Claude models (Opus 4.5/4.6, Sonnet 4/4.5/4.6) with low/medium/high effort levels
  • Use dynamic API limits from the Copilot /models endpoint to prevent "prompt token count exceeds the limit of 128000" errors — the proxy now reads max_prompt_tokens per account type (128K individual, 168K business) instead of hardcoding 200K
  • Fix GitLab Duo context-1m beta header not being applied when routing through the Anthropic gateway
  • Fix flaky parallel tests that shared global model registry state

Key changes

Premium request inflation (X-Initiator)

  • Replace naive last-message role check with isAgentInitiated() that detects tool_result content, preceding assistant tool_use, and Responses API function calls
  • Correctly marks tool loop continuations as agent even when Claude Code wraps tool results in role: "user" messages
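The detection described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the `message`/`block` types and the `isAgentInitiated` signature are assumptions standing in for the proxy's real request types.

```go
package main

import "fmt"

// message is a simplified stand-in for a chat message.
type message struct {
	Role    string
	Content []block
}

// block is a content block; Type may be "text", "tool_result", or "tool_use".
type block struct {
	Type string
}

// isAgentInitiated reports whether the request continues a tool loop:
// either the last message carries tool_result content (Claude Code wraps
// these in role:"user" messages), or the preceding assistant turn issued
// a tool_use call. A naive last-message role check would miss both cases.
func isAgentInitiated(msgs []message) bool {
	if len(msgs) == 0 {
		return false
	}
	last := msgs[len(msgs)-1]
	for _, b := range last.Content {
		if b.Type == "tool_result" {
			return true
		}
	}
	if len(msgs) >= 2 && msgs[len(msgs)-2].Role == "assistant" {
		for _, b := range msgs[len(msgs)-2].Content {
			if b.Type == "tool_use" {
				return true
			}
		}
	}
	return false
}

func main() {
	toolLoop := []message{
		{Role: "assistant", Content: []block{{Type: "tool_use"}}},
		{Role: "user", Content: []block{{Type: "tool_result"}}},
	}
	fresh := []message{{Role: "user", Content: []block{{Type: "text"}}}}
	fmt.Println(isAgentInitiated(toolLoop), isAgentInitiated(fresh))
}
```

A request classified this way would get `X-Initiator: agent` instead of `user`, so the continuation is not billed as a fresh premium request.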

Thinking support

  • Add Thinking: &ThinkingSupport{Levels: []string{"low", "medium", "high"}} to Copilot Claude model definitions
  • Set reasoning.summary: "auto" default when reasoning.effort is present (matches vscode-copilot-chat behavior)
  • Include reasoning.encrypted_content for reasoning content reuse across turns
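The `reasoning.summary` default can be sketched like this. The map-based payload shape and function name are assumptions for illustration; the real executor works on its own request types.

```go
package main

import "fmt"

// applyReasoningDefaults mirrors the described behavior: when the request
// carries reasoning.effort but no reasoning.summary, default the summary
// to "auto" (matching vscode-copilot-chat). An explicit summary is kept.
func applyReasoningDefaults(body map[string]any) {
	reasoning, ok := body["reasoning"].(map[string]any)
	if !ok {
		return
	}
	if _, hasEffort := reasoning["effort"]; !hasEffort {
		return
	}
	if _, hasSummary := reasoning["summary"]; !hasSummary {
		reasoning["summary"] = "auto"
	}
}

func main() {
	body := map[string]any{"reasoning": map[string]any{"effort": "high"}}
	applyReasoningDefaults(body)
	fmt.Println(body["reasoning"].(map[string]any)["summary"])
}
```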

Dynamic context limits

  • Add CopilotModelLimits struct and Limits() method to extract capabilities.limits from the Copilot /models API response
  • Override static ContextLength with max_prompt_tokens from the API — this is the hard limit that triggers 400 errors
  • Override static MaxCompletionTokens with max_output_tokens from the API
  • Fallback to static values when API limits are unavailable
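The override-with-fallback logic can be sketched as below. The `CopilotModelLimits` field names follow the PR description, but the exact struct and helper signature are assumptions.

```go
package main

import "fmt"

// CopilotModelLimits mirrors capabilities.limits from the Copilot
// /models API response.
type CopilotModelLimits struct {
	MaxPromptTokens int
	MaxOutputTokens int
}

// effectiveLimits prefers API-provided limits and falls back to the
// static model definition when the API omits a value. max_prompt_tokens
// is the hard limit that triggers 400 errors upstream.
func effectiveLimits(api *CopilotModelLimits, staticCtx, staticOut int) (ctx, out int) {
	ctx, out = staticCtx, staticOut
	if api == nil {
		return
	}
	if api.MaxPromptTokens > 0 {
		ctx = api.MaxPromptTokens
	}
	if api.MaxOutputTokens > 0 {
		out = api.MaxOutputTokens
	}
	return
}

func main() {
	// Individual account: API reports 128K prompt tokens, overriding the
	// static 200K context length.
	ctx, out := effectiveLimits(&CopilotModelLimits{MaxPromptTokens: 128000, MaxOutputTokens: 16384}, 200000, 8192)
	fmt.Println(ctx, out)
	// API limits unavailable: static values win.
	ctx, out = effectiveLimits(nil, 200000, 8192)
	fmt.Println(ctx, out)
}
```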

Other fixes

  • Strip context-1m-2025-08-07 beta (unsupported by Copilot)
  • Normalize reasoning_text → reasoning_content field mapping for streaming/non-streaming
  • Local token counting via tiktoken for CountTokens endpoint
  • Set the Openai-Intent header to conversation-edits
  • Fix GitLab Duo gitlab_duo_force_context_1m attribute not being read by Claude executor

Test plan

  • go build ./... compiles successfully
  • go test ./internal/auth/copilot/ ./internal/runtime/executor/ ./internal/registry/ — all pass
  • TestCopilotModelEntry_Limits — 7 scenarios covering nil/missing/partial/individual/business limits
  • TestGitLabExecutorExecuteStreamUsesAnthropicGateway — previously failing, now fixed
  • TestUseGitHubCopilotResponsesEndpoint_RegistryResponsesOnlyModel — flaky test fixed

Copilot AI review requested due to automatic review settings April 3, 2026 11:24

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request enhances the GitHub Copilot executor by implementing local token counting using tiktoken, refining the detection of agent-initiated requests for accurate billing headers, and adding support for reasoning models. It also introduces a new management endpoint for quotas, strips unsupported Anthropic beta headers, and ensures default settings for the Responses API. Feedback was provided regarding the use of the passed context in the CountTokens method to ensure proper cancellation and tracing propagation.


Copilot AI left a comment


Pull request overview

This PR adjusts the GitHub Copilot integration to reduce premium request inflation (via corrected headers/defaults), prevent context overflow confusion from unsupported Anthropic betas, and enable “thinking” support for Copilot-hosted Claude models.

Changes:

  • Add Responses API defaults, forward upstream response headers for non-streaming calls, and implement local tiktoken-based token counting for Copilot.
  • Strip unsupported Anthropic betas (notably context-1m-2025-08-07) from translated request bodies and refine X-Initiator detection logic.
  • Enable level-based thinking support for Copilot Claude models and add reasoning_text fallback in the OpenAI→Claude response translator.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
internal/translator/openai/claude/openai_claude_response.go Adds fallback to reasoning_text when reasoning_content is absent/empty during translation.
internal/runtime/executor/github_copilot_executor.go Applies Responses defaults, strips unsupported betas, forwards non-stream response headers, adds local token counting, and changes initiator detection.
internal/runtime/executor/github_copilot_executor_test.go Expands/adjusts tests for initiator behavior, defaults, token counting, and beta stripping.
internal/registry/model_definitions.go Enables level-based thinking support (low/medium/high) for Copilot Claude models.
internal/api/server.go Registers GET /copilot-quota management route.
.github/workflows/fork-docker.yml Adds a branch-scoped Docker build/push workflow.


@kunish kunish force-pushed the fix/copilot-premium-request-inflation branch 2 times, most recently from b078e5d to 0cbd45e on April 3, 2026 11:48
This commit addresses three issues with Claude Code through GitHub
Copilot:

1. **Premium request inflation**: Responses API requests were missing
   Openai-Intent headers and proper defaults, causing Copilot to bill
   each tool-loop continuation as a new premium request. Fixed by adding
   isAgentInitiated() heuristic (checks for tool_result content or
   preceding assistant tool_use), applying Responses API defaults
   (store, include, reasoning.summary), and local tiktoken-based token
   counting to avoid extra API calls.

2. **Context overflow**: Claude Code's modelSupports1M() hardcodes
   opus-4-6 as 1M-capable, but Copilot only supports ~128K-200K.
   Fixed by stripping the context-1m-2025-08-07 beta from translated
   request bodies. Also forwards response headers in non-streaming
   Execute() and registers the GET /copilot-quota management API route.

3. **Thinking not working**: Add ThinkingSupport with level-based
   reasoning to Claude models in the static definitions. Normalize
   Copilot's non-standard 'reasoning_text' response field to
   'reasoning_content' before passing to the SDK translator. Use
   caller-provided context in CountTokens instead of Background().
@kunish kunish force-pushed the fix/copilot-premium-request-inflation branch from 0cbd45e to 59af2c5 on April 3, 2026 12:24
@kunish kunish requested a review from Copilot April 3, 2026 12:31

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.



kunish added 2 commits April 3, 2026 20:51
…t detection

- Strip SSE `data:` prefix before normalizing reasoning_text→reasoning_content
  in streaming mode; re-wrap afterward for the translator
- Iterate all choices in normalizeGitHubCopilotReasoningField (not just
  choices[0]) to support n>1 requests
- Remove over-broad tool-role fallback in isAgentInitiated that scanned
  all messages for role:"tool", aligning with opencode's approach of only
  detecting active tool loops — genuine user follow-ups after tool use are
  no longer mis-classified as agent-initiated
- Add 5 reasoning normalization tests; update 2 X-Initiator tests to match
  refined semantics
The Copilot API enforces per-account prompt token limits (128K individual,
168K business) that differ from the static 200K context length advertised
by the proxy. This mismatch caused Claude Code to accumulate context
beyond the actual limit, triggering "prompt token count exceeds the limit
of 128000" errors.

Changes:
- Extract max_prompt_tokens and max_output_tokens from the Copilot
  /models API response (capabilities.limits) and use them as the
  authoritative ContextLength and MaxCompletionTokens values
- Add CopilotModelLimits struct and Limits() helper to parse limits
  from the existing Capabilities map
- Fix GitLab Duo context-1m beta header not being set when routing
  through the Anthropic gateway (gitlab_duo_force_context_1m attr
  was set but only gin headers were checked)
- Fix flaky parallel tests that shared global model registry state
@kunish kunish changed the title fix(copilot): reduce premium request inflation and enable thinking fix(copilot): reduce premium request inflation, enable thinking, and use dynamic API limits Apr 3, 2026
@luispater luispater merged commit 98509f6 into router-for-me:main Apr 3, 2026
2 checks passed