fix(copilot): reduce premium request inflation, enable thinking, and use dynamic API limits #485
Conversation
Code Review
This pull request enhances the GitHub Copilot executor by implementing local token counting using tiktoken, refining the detection of agent-initiated requests for accurate billing headers, and adding support for reasoning models. It also introduces a new management endpoint for quotas, strips unsupported Anthropic beta headers, and ensures default settings for the Responses API. Feedback was provided regarding the use of the passed context in the CountTokens method to ensure proper cancellation and tracing propagation.
Pull request overview
This PR adjusts the GitHub Copilot integration to reduce premium request inflation (via corrected headers/defaults), prevent context overflow confusion from unsupported Anthropic betas, and enable “thinking” support for Copilot-hosted Claude models.
Changes:
- Add Responses API defaults, forward upstream response headers for non-streaming calls, and implement local tiktoken-based token counting for Copilot.
- Strip unsupported Anthropic betas (notably `context-1m-2025-08-07`) from translated request bodies and refine X-Initiator detection logic.
- Enable level-based thinking support for Copilot Claude models and add a `reasoning_text` fallback in the OpenAI→Claude response translator.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| internal/translator/openai/claude/openai_claude_response.go | Adds fallback to reasoning_text when reasoning_content is absent/empty during translation. |
| internal/runtime/executor/github_copilot_executor.go | Applies Responses defaults, strips unsupported betas, forwards non-stream response headers, adds local token counting, and changes initiator detection. |
| internal/runtime/executor/github_copilot_executor_test.go | Expands/adjusts tests for initiator behavior, defaults, token counting, and beta stripping. |
| internal/registry/model_definitions.go | Enables level-based thinking support (low/medium/high) for Copilot Claude models. |
| internal/api/server.go | Registers GET /copilot-quota management route. |
| .github/workflows/fork-docker.yml | Adds a branch-scoped Docker build/push workflow. |
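The "Responses defaults" change in `github_copilot_executor.go` can be sketched as below. The default values (`store`, `include`, and a `reasoning.summary: "auto"` default only when `reasoning.effort` is present) come from the PR description; the function name, the concrete values for `store` and `include`, and the only-if-unset behavior are assumptions, not the executor's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// applyResponsesDefaults sketches the Responses API defaults step: fill in
// store, include, and reasoning.summary only when the caller has not already
// set them. The specific values here are assumptions based on the PR text
// (encrypted reasoning content is requested so it can be reused across turns;
// summary defaults to "auto" only when an effort level is present).
func applyResponsesDefaults(body map[string]any) {
	if _, ok := body["store"]; !ok {
		body["store"] = false
	}
	if _, ok := body["include"]; !ok {
		body["include"] = []string{"reasoning.encrypted_content"}
	}
	if reasoning, ok := body["reasoning"].(map[string]any); ok {
		_, hasEffort := reasoning["effort"]
		if _, hasSummary := reasoning["summary"]; hasEffort && !hasSummary {
			reasoning["summary"] = "auto"
		}
	}
}

func main() {
	body := map[string]any{"reasoning": map[string]any{"effort": "high"}}
	applyResponsesDefaults(body)
	out, _ := json.Marshal(body["reasoning"])
	fmt.Println(body["store"], string(out))
}
```

Filling defaults only when absent keeps caller-supplied values authoritative, which matters for clients that already manage their own `store`/`include` settings.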
Force-pushed from b078e5d to 0cbd45e
This commit addresses three issues with Claude Code through GitHub Copilot:

1. **Premium request inflation**: Responses API requests were missing `Openai-Intent` headers and proper defaults, causing Copilot to bill each tool-loop continuation as a new premium request. Fixed by adding an `isAgentInitiated()` heuristic (checks for `tool_result` content or a preceding assistant `tool_use`), applying Responses API defaults (`store`, `include`, `reasoning.summary`), and local tiktoken-based token counting to avoid extra API calls.

2. **Context overflow**: Claude Code's `modelSupports1M()` hardcodes opus-4-6 as 1M-capable, but Copilot only supports ~128K-200K. Fixed by stripping the `context-1m-2025-08-07` beta from translated request bodies. Also forwards response headers in non-streaming `Execute()` and registers the GET `/copilot-quota` management API route.

3. **Thinking not working**: Add `ThinkingSupport` with level-based reasoning to Claude models in the static definitions. Normalize Copilot's non-standard `reasoning_text` response field to `reasoning_content` before passing to the SDK translator.

Use the caller-provided context in `CountTokens` instead of `Background()`.
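The `isAgentInitiated()` heuristic described above can be sketched as follows. The `Message` type and its fields are illustrative stand-ins for the proxy's actual request types, and the Responses API function-call case is omitted; only the two checks named in the commit message are modeled.

```go
package main

import "fmt"

// Message is a simplified view of a chat message; the real types differ.
type Message struct {
	Role         string
	ContentTypes []string // e.g. "text", "tool_result", "tool_use"
}

// isAgentInitiated sketches the heuristic: a request is agent-initiated when
// the latest user message carries tool_result content (Claude Code wraps tool
// results in role: "user" messages), or when the preceding assistant turn
// contains tool_use. Plain user text stays user-initiated, so genuine
// follow-ups after a tool loop are not mis-classified.
func isAgentInitiated(msgs []Message) bool {
	for i := len(msgs) - 1; i >= 0; i-- {
		if msgs[i].Role != "user" {
			continue
		}
		for _, ct := range msgs[i].ContentTypes {
			if ct == "tool_result" {
				return true
			}
		}
		if i > 0 && msgs[i-1].Role == "assistant" {
			for _, ct := range msgs[i-1].ContentTypes {
				if ct == "tool_use" {
					return true
				}
			}
		}
		return false // latest user turn is plain text: user-initiated
	}
	return false
}

func main() {
	toolLoop := []Message{
		{Role: "user", ContentTypes: []string{"text"}},
		{Role: "assistant", ContentTypes: []string{"tool_use"}},
		{Role: "user", ContentTypes: []string{"tool_result"}},
	}
	fresh := []Message{{Role: "user", ContentTypes: []string{"text"}}}
	fmt.Println(isAgentInitiated(toolLoop), isAgentInitiated(fresh))
}
```

The result would feed the `X-Initiator` header: `agent` for tool-loop continuations (not billed as new premium requests), `user` otherwise.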
Force-pushed from 0cbd45e to 59af2c5
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
…t detection

- Strip the SSE `data:` prefix before normalizing `reasoning_text`→`reasoning_content` in streaming mode; re-wrap afterward for the translator
- Iterate all choices in `normalizeGitHubCopilotReasoningField` (not just `choices[0]`) to support n>1 requests
- Remove the over-broad tool-role fallback in `isAgentInitiated` that scanned all messages for `role: "tool"`, aligning with opencode's approach of only detecting active tool loops; genuine user follow-ups after tool use are no longer mis-classified as agent-initiated
- Add 5 reasoning normalization tests; update 2 X-Initiator tests to match the refined semantics
The Copilot API enforces per-account prompt token limits (128K individual, 168K business) that differ from the static 200K context length advertised by the proxy. This mismatch caused Claude Code to accumulate context beyond the actual limit, triggering "prompt token count exceeds the limit of 128000" errors.

Changes:
- Extract `max_prompt_tokens` and `max_output_tokens` from the Copilot `/models` API response (`capabilities.limits`) and use them as the authoritative `ContextLength` and `MaxCompletionTokens` values
- Add a `CopilotModelLimits` struct and a `Limits()` helper to parse limits from the existing `Capabilities` map
- Fix the GitLab Duo `context-1m` beta header not being set when routing through the Anthropic gateway (the `gitlab_duo_force_context_1m` attr was set but only gin headers were checked)
- Fix flaky parallel tests that shared global model registry state
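The limits extraction could look like the sketch below. The JSON field names follow the `capabilities.limits` object named in the commit message; the re-marshal approach and the `(limits, ok)` return shape are assumptions about how the `Limits()` helper reads a generic `Capabilities` map.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CopilotModelLimits mirrors capabilities.limits from the /models response.
type CopilotModelLimits struct {
	MaxPromptTokens int `json:"max_prompt_tokens"`
	MaxOutputTokens int `json:"max_output_tokens"`
}

// Limits sketches the helper: pull the nested "limits" object out of a
// generic capabilities map by re-marshaling it into the typed struct.
// Returns ok=false when the key is absent or malformed, so callers can fall
// back to the static defaults.
func Limits(capabilities map[string]any) (CopilotModelLimits, bool) {
	raw, ok := capabilities["limits"]
	if !ok {
		return CopilotModelLimits{}, false
	}
	b, err := json.Marshal(raw)
	if err != nil {
		return CopilotModelLimits{}, false
	}
	var l CopilotModelLimits
	if err := json.Unmarshal(b, &l); err != nil {
		return CopilotModelLimits{}, false
	}
	return l, true
}

func main() {
	caps := map[string]any{
		"limits": map[string]any{
			"max_prompt_tokens": 128000, // individual plan
			"max_output_tokens": 16384,
		},
	}
	if l, ok := Limits(caps); ok {
		// API values are authoritative, overriding the static 200K.
		fmt.Println(l.MaxPromptTokens, l.MaxOutputTokens)
	}
}
```

Using the API's `max_prompt_tokens` as `ContextLength` makes the advertised limit match the one Copilot actually enforces with 400 errors.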
Summary
- Send `X-Initiator: agent` for tool-loop continuations to avoid consuming premium quota
- Read dynamic limits from the `/models` endpoint to prevent "prompt token count exceeds the limit of 128000" errors: the proxy now reads `max_prompt_tokens` per account type (128K individual, 168K business) instead of hardcoding 200K
- Fix the `context-1m` beta header not being applied when routing through the Anthropic gateway

Key changes
Premium request inflation (`X-Initiator`)
- Add `isAgentInitiated()`, which detects `tool_result` content, a preceding assistant `tool_use`, and Responses API function calls
- Classify the request as `agent` even when Claude Code wraps tool results in `role: "user"` messages

Thinking support
- Add `Thinking: &ThinkingSupport{Levels: []string{"low", "medium", "high"}}` to Copilot Claude model definitions
- Apply a `reasoning.summary: "auto"` default when `reasoning.effort` is present (matches vscode-copilot-chat behavior)
- Include `reasoning.encrypted_content` for reasoning content reuse across turns

Dynamic context limits
- Add a `CopilotModelLimits` struct and a `Limits()` method to extract `capabilities.limits` from the Copilot `/models` API response
- Override `ContextLength` with `max_prompt_tokens` from the API; this is the hard limit that triggers 400 errors
- Override `MaxCompletionTokens` with `max_output_tokens` from the API

Other fixes
- Strip the `context-1m-2025-08-07` beta (unsupported by Copilot)
- Normalize the `reasoning_text`→`reasoning_content` field mapping for streaming and non-streaming responses
- Add local tiktoken-based token counting for the `CountTokens` endpoint
- Set the `openai-intent` header to `conversation-edits`
- Fix the `gitlab_duo_force_context_1m` attribute not being read by the Claude executor

Test plan
- `go build ./...` compiles successfully
- `go test ./internal/auth/copilot/ ./internal/runtime/executor/ ./internal/registry/`: all pass
- `TestCopilotModelEntry_Limits`: 7 scenarios covering nil/missing/partial/individual/business limits
- `TestGitLabExecutorExecuteStreamUsesAnthropicGateway`: previously failing, now fixed
- `TestUseGitHubCopilotResponsesEndpoint_RegistryResponsesOnlyModel`: flaky test fixed
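The beta-stripping fix mentioned throughout can be sketched as a simple filter. Whether the list of betas lives in an `anthropic-beta` header or in the translated request body is an implementation detail the PR text does not fully show; this models only the filtering itself.

```go
package main

import "fmt"

// filterBetas sketches the beta-stripping step: drop betas Copilot does not
// support (notably context-1m-2025-08-07, which Claude Code sends because its
// modelSupports1M() hardcodes some models as 1M-capable) before forwarding
// the translated request.
func filterBetas(betas []string) []string {
	unsupported := map[string]bool{"context-1m-2025-08-07": true}
	kept := make([]string, 0, len(betas))
	for _, b := range betas {
		if !unsupported[b] {
			kept = append(kept, b)
		}
	}
	return kept
}

func main() {
	fmt.Println(filterBetas([]string{
		"context-1m-2025-08-07",
		"interleaved-thinking-2025-05-14",
	}))
}
```

Dropping only the known-bad beta leaves any other betas intact, so features Copilot does support keep working.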