refactor: Move Anthropic translation layer to library #47
base: dev
Conversation
…atibility
- Add /v1/messages endpoint with Anthropic-format request/response
- Support both x-api-key and Bearer token authentication (see the sketch below)
- Implement Anthropic <-> OpenAI format translation for messages, tools, and responses
- Add streaming wrapper converting OpenAI SSE to Anthropic SSE events
- Handle tool_use blocks with proper stop_reason detection
- Fix NoneType iteration bug in tool_calls handling
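A minimal sketch of the dual-header authentication described above. The function name `verify_anthropic_api_key` appears elsewhere in this PR, but the body shown here is an assumption, not the actual code:

```python
# Hedged sketch: dual-header auth for /v1/messages. verify_anthropic_api_key
# exists in proxy_app/main.py per this PR; this body is illustrative only.
from typing import Optional

from fastapi import Header, HTTPException


async def verify_anthropic_api_key(
    x_api_key: Optional[str] = Header(None),
    authorization: Optional[str] = Header(None),
) -> str:
    # Anthropic SDKs send x-api-key; OpenAI-style clients send
    # "Authorization: Bearer <key>". Accept either.
    key = x_api_key
    if key is None and authorization and authorization.startswith("Bearer "):
        key = authorization[len("Bearer "):].strip()
    if not key:
        raise HTTPException(status_code=401, detail="Missing API key")
    return key
```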
- Add AnthropicThinkingConfig model and thinking parameter to request
- Translate Anthropic thinking config to reasoning_effort for providers
- Handle reasoning_content in streaming wrapper (thinking_delta events)
- Convert reasoning_content to thinking blocks in non-streaming responses
When no thinking config is provided in the request, Opus models now automatically use reasoning_effort=high with custom_reasoning_budget=True. This ensures Opus 4.5 uses the full 32768 token thinking budget instead of the backend's auto mode (thinkingBudget: -1) which may use less. Opus always uses the -thinking variant regardless, but this change guarantees maximum thinking capacity for better reasoning quality.
…ling
- Add validation to ensure maxOutputTokens > thinkingBudget for Claude extended thinking, preventing 400 INVALID_ARGUMENT API errors (see the sketch below)
- Improve streaming error handling to send proper message_start and content blocks before the error event for better client compatibility
- Minor code formatting improvements
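A minimal sketch of the kind of guard the first bullet describes; the names mirror the commit message, but the exact function in antigravity_provider.py may look different:

```python
# Hedged sketch of the maxOutputTokens > thinkingBudget guard; illustrative
# signature, not the actual antigravity_provider.py code.
def validate_thinking_budget(max_output_tokens: int, thinking_budget: int) -> None:
    """Claude extended thinking requires maxOutputTokens > thinkingBudget,
    otherwise the backend rejects the request with 400 INVALID_ARGUMENT."""
    if max_output_tokens <= thinking_budget:
        raise ValueError(
            f"maxOutputTokens ({max_output_tokens}) must exceed "
            f"thinkingBudget ({thinking_budget}) for extended thinking"
        )
```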
Track each tool_use block index separately and emit content_block_stop for all blocks (thinking, text, and each tool_use) when stream ends. Fixes Claude Code stopping mid-action due to malformed streaming events.
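A hedged sketch of that flush-on-end behavior: every open content block is tracked by index and closed before the final message events. The helper name and bookkeeping below are illustrative, not the wrapper's actual internals:

```python
# Hedged sketch: close every open content block (thinking, text, and each
# tool_use) before emitting message_delta/message_stop.
import json


def close_open_blocks(open_block_indices: set[int]) -> list[str]:
    """Emit a content_block_stop SSE event for each still-open block index."""
    events = []
    for index in sorted(open_block_indices):
        payload = {"type": "content_block_stop", "index": index}
        events.append(f"event: content_block_stop\ndata: {json.dumps(payload)}\n\n")
    open_block_indices.clear()
    return events
```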
…nabled
- Fixed bug where budget_tokens between 10000-32000 would get the ÷4 reduction
- Now any explicit thinking request sets custom_reasoning_budget=True
- Added logging to show thinking budget, effort level, and custom_budget flag
- Simplified budget tier logic (removed redundant >= 32000 check)

Before: 31999 tokens requested → 8192 tokens actual (÷4 applied)
After: 31999 tokens requested → 32768 tokens actual (full "high" budget)
When using /v1/chat/completions with Opus and reasoning_effort="high" or "medium", automatically set custom_reasoning_budget=true to get full thinking tokens instead of the ÷4 reduced default. This makes the OpenAI endpoint behave consistently with the Anthropic endpoint for Opus models - if you're using Opus with high reasoning, you want the full thinking budget. Adds logging: "🧠 Thinking: auto-enabled custom_reasoning_budget for Opus"
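A minimal sketch of that consistency rule, with illustrative names; the actual main.py code may differ:

```python
# Hedged sketch: auto-enable the full thinking budget for Opus on the OpenAI
# endpoint. The function and variable names here are illustrative.
import logging


def maybe_enable_custom_budget(model: str, request: dict) -> None:
    if "opus" in model.lower() and request.get("reasoning_effort") in ("high", "medium"):
        # Match the Anthropic endpoint: Opus with high/medium reasoning gets
        # the full thinking budget instead of the ÷4 reduced default.
        request["custom_reasoning_budget"] = True
        logging.info("🧠 Thinking: auto-enabled custom_reasoning_budget for Opus")
```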
…treaming
Claude Code and other Anthropic SDK clients require message_start to be sent before any other SSE events. When a stream completed quickly without content chunks, the wrapper would send message_stop without message_start, causing clients to silently discard all output.
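A minimal sketch of the guard this fix implies, assuming a simple sent-flag on the wrapper's state; the real implementation may be structured differently:

```python
# Hedged sketch: emit message_start exactly once, before any other SSE event.
# The state dict and payload fields here are illustrative placeholders.
import json
from typing import Optional


def ensure_message_start(state: dict, model: str) -> Optional[str]:
    if state.get("message_start_sent"):
        return None
    state["message_start_sent"] = True
    payload = {
        "type": "message_start",
        "message": {
            "id": "msg_stream",  # placeholder id
            "type": "message",
            "role": "assistant",
            "model": model,
            "content": [],
            "stop_reason": None,
            "usage": {"input_tokens": 0, "output_tokens": 0},
        },
    }
    return f"event: message_start\ndata: {json.dumps(payload)}\n\n"
```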
Signed-off-by: Moeeze Hassan <fammas.maz@gmail.com>
This reverts commit e80645e.
…ing is enabled" This reverts commit 2ee549d.
Extract Anthropic API models and format translation functions from main.py into reusable library module:
- models.py: Pydantic models for Anthropic Messages API (request/response)
- translator.py: Functions to convert between Anthropic and OpenAI formats
  - anthropic_to_openai_messages()
  - anthropic_to_openai_tools()
  - anthropic_to_openai_tool_choice()
  - openai_to_anthropic_response()
  - translate_anthropic_request() - High-level request translation

This is part of the refactoring to make Anthropic compatibility a proper library feature.
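A hedged usage sketch of the message translator. The (messages, system) signature matches the endpoint code quoted later in this PR; the output shape in the comment is an assumption:

```python
# Hedged usage sketch for the translator functions listed above.
from rotator_library.anthropic_compat import anthropic_to_openai_messages

openai_messages = anthropic_to_openai_messages(
    [{"role": "user", "content": "What is 2 + 2?"}],
    "You are a calculator.",  # Anthropic's top-level system prompt
)
# Expected shape (assumed): the system prompt becomes a leading system message:
# [{"role": "system", "content": "You are a calculator."},
#  {"role": "user", "content": "What is 2 + 2?"}]
```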
Add framework-agnostic streaming wrapper for Anthropic format:
- streaming.py: Converts OpenAI SSE format to Anthropic SSE format
- Handles message_start, content_block_start/delta/stop, message_delta, message_stop
- Supports text, thinking, and tool_use content blocks
- Uses callback-based disconnect detection instead of FastAPI Request
- Proper error handling with client-visible error blocks
- __init__.py: Export all models, translator functions, and streaming wrapper

The streaming wrapper is now reusable outside of FastAPI.
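A hypothetical usage sketch of the callback-based design; the wrapper's name and keyword argument below are assumptions based on the description above, not quotes from streaming.py:

```python
# Hypothetical sketch: the wrapper name and is_disconnected parameter are
# assumed, not taken from streaming.py itself.
from rotator_library.anthropic_compat import anthropic_streaming_wrapper  # name assumed


async def stream_anthropic_events(openai_stream, is_disconnected):
    # is_disconnected: an async callable returning True once the client is gone,
    # replacing the FastAPI Request object and keeping the wrapper framework-agnostic.
    async for sse_event in anthropic_streaming_wrapper(
        openai_stream, is_disconnected=is_disconnected
    ):
        yield sse_event
```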
…ethods
Add high-level Anthropic API methods to RotatingClient:
- anthropic_messages(): Handle Anthropic Messages API requests
  - Accepts AnthropicMessagesRequest, translates to OpenAI format
  - Routes through existing acompletion() with full retry/rotation logic
  - Returns response in Anthropic format (streaming or non-streaming)
- anthropic_count_tokens(): Handle token counting for Anthropic requests
  - Counts tokens for messages and tools
  - Returns count in Anthropic format

These methods enable any application using rotator_library to support Anthropic clients without needing to implement format translation.
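A hedged usage sketch. The class, method, and model names come from this PR; the constructor arguments and model string are illustrative placeholders:

```python
# Hedged usage sketch; construction details and the model name are illustrative.
import asyncio

from rotator_library import AnthropicMessagesRequest
from rotator_library.client import RotatingClient  # import path assumed


async def main() -> None:
    client = RotatingClient()  # constructor arguments assumed
    request = AnthropicMessagesRequest(
        model="claude-opus-4-5",  # placeholder model name
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}],
    )
    # Routed through acompletion() with the library's retry/rotation logic,
    # returned in Anthropic format (streaming or non-streaming).
    response = await client.anthropic_messages(request)
    print(response)


asyncio.run(main())
```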
…rary
Add lazy-loaded exports for Anthropic API models:
- AnthropicMessagesRequest
- AnthropicMessagesResponse
- AnthropicCountTokensRequest
- AnthropicCountTokensResponse

These can now be imported directly from rotator_library:
from rotator_library import AnthropicMessagesRequest
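One common way to implement such lazy exports is a PEP 562 module-level `__getattr__`; a minimal sketch follows, assuming this pattern rather than quoting the library's actual `__init__.py`:

```python
# Hedged sketch of lazy exports via PEP 562 __getattr__; the library's actual
# __init__.py may be structured differently.
import importlib

_LAZY_EXPORTS = {
    "AnthropicMessagesRequest": "rotator_library.anthropic_compat.models",
    "AnthropicMessagesResponse": "rotator_library.anthropic_compat.models",
    "AnthropicCountTokensRequest": "rotator_library.anthropic_compat.models",
    "AnthropicCountTokensResponse": "rotator_library.anthropic_compat.models",
}


def __getattr__(name: str):
    # Import the heavy pydantic models only on first attribute access,
    # keeping `import rotator_library` fast.
    if name in _LAZY_EXPORTS:
        module = importlib.import_module(_LAZY_EXPORTS[name])
        return getattr(module, name)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```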
Refactor main.py to use the anthropic_compat library module:
- Import models from rotator_library.anthropic_compat instead of local definitions
- Import translation functions from the library
- Remove ~130 lines of local Anthropic model definitions
- Keep endpoints functional by importing needed functions

The endpoints still work the same way, but now use the library's reusable components. This is a stepping stone - the full client methods (anthropic_messages, anthropic_count_tokens) can be used for even simpler endpoints in the future.

Net reduction: ~130 lines removed from main.py
Time to examine my own handiwork! Let's see what past-me was thinking when moving 1,300+ lines of Anthropic compatibility code into the library. This should be interesting... 🔍
Self-Review Assessment
Well, well, well... time to grade my own homework! 📝
Overall, this is a solid refactoring that moves the Anthropic compatibility layer from proxy_app into rotator_library as a reusable module. The architecture follows the existing library patterns, and the separation of concerns is clear. Past-me did a decent job here... mostly.
Architectural Reflections
The module structure (models.py, translator.py, streaming.py) is clean and intuitive. I particularly like that:
- The streaming wrapper is framework-agnostic (accepts an optional disconnect callback rather than a FastAPI Request)
- The lazy-loading in __init__.py maintains fast import times
- The new RotatingClient methods provide a clean interface for Anthropic requests
Key Fixes I Should Make
- Code duplication in main.py: I created a nice translate_anthropic_request() function but then... didn't use it in the endpoint? Classic past-me.
- Type validation in models: The AnthropicThinkingConfig.type field accepts any string. Should use Literal["enabled", "disabled"] for proper validation.
- Silent JSON failures: When parsing tool call arguments fails, I silently return an empty dict. Adding a debug log would make troubleshooting much easier.
Things That Actually Look Good
- The antigravity_provider.py addition for validating max_tokens > thinkingBudget is a good defensive check
- Error handling in the streaming wrapper properly ensures message_start is sent before any error events
- The token counting endpoint correctly handles both messages and tool definitions
Overall, this PR is ready to merge once the minor improvements above are addressed. Future-me thanks past-me for the decent documentation and clean module structure!
This self-review was generated by an AI assistant.
```python
# Convert Anthropic request to OpenAI format
anthropic_request = body.model_dump(exclude_none=True)

openai_messages = anthropic_to_openai_messages(
    anthropic_request.get("messages", []), anthropic_request.get("system")
)

openai_tools = anthropic_to_openai_tools(anthropic_request.get("tools"))
openai_tool_choice = anthropic_to_openai_tool_choice(
    anthropic_request.get("tool_choice")
)

# Build OpenAI-compatible request
openai_request = {
    "model": body.model,
    "messages": openai_messages,
    "max_tokens": body.max_tokens,
    "stream": body.stream or False,
}

if body.temperature is not None:
    openai_request["temperature"] = body.temperature
if body.top_p is not None:
    openai_request["top_p"] = body.top_p
if body.stop_sequences:
    openai_request["stop"] = body.stop_sequences
if openai_tools:
    openai_request["tools"] = openai_tools
if openai_tool_choice:
    openai_request["tool_choice"] = openai_tool_choice

# Handle Anthropic thinking config -> reasoning_effort translation
if body.thinking:
    if body.thinking.type == "enabled":
        # Map budget_tokens to reasoning_effort level
        # Default to "medium" if enabled but budget not specified
        budget = body.thinking.budget_tokens or 10000
        if budget >= 32000:
            openai_request["reasoning_effort"] = "high"
            openai_request["custom_reasoning_budget"] = True
        elif budget >= 10000:
            openai_request["reasoning_effort"] = "high"
        elif budget >= 5000:
            openai_request["reasoning_effort"] = "medium"
        else:
            openai_request["reasoning_effort"] = "low"
    elif body.thinking.type == "disabled":
        openai_request["reasoning_effort"] = "disable"
elif "opus" in body.model.lower():
    # Force high thinking for Opus models when no thinking config is provided
    # Opus 4.5 always uses the -thinking variant, so we want maximum thinking budget
    # Without this, the backend defaults to thinkingBudget: -1 (auto) instead of high
    openai_request["reasoning_effort"] = "high"
    openai_request["custom_reasoning_budget"] = True
```
Ah, it seems past-me got a bit ahead of himself! I specifically created translate_anthropic_request() in the library to handle this translation, but then proceeded to duplicate the logic here manually.
The thinking budget mapping logic (lines 1067-1089) is nearly identical to what I put in translator.py (lines 283-304). This could lead to drift if one is updated without the other.
I should either:
- Use translate_anthropic_request() directly, or
- Document why the endpoint needs different handling than the library method
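A sketch of the first option; translate_anthropic_request() exists in translator.py per this PR, but the exact signature shown here is an assumption:

```python
# Hedged sketch: replace the inline mapping with the library function.
from rotator_library.anthropic_compat import (
    AnthropicMessagesRequest,
    translate_anthropic_request,
)


async def handle_messages(body: AnthropicMessagesRequest) -> dict:
    # One call replaces the ~50 lines of inline translation in the endpoint,
    # keeping the thinking-budget mapping in a single place (translator.py).
    return translate_anthropic_request(body)
```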
```python
class AnthropicThinkingConfig(BaseModel):
    """Anthropic thinking configuration."""

    type: str  # "enabled" or "disabled"
    budget_tokens: Optional[int] = None
```
Past-me was feeling a bit lazy here. The type field should be constrained to valid values ("enabled" or "disabled"). Using a plain str allows invalid values to slip through validation.
Consider using:
```python
from typing import Literal
# ...
type: Literal["enabled", "disabled"]
```

This gives proper IDE autocomplete and catches typos at validation time.
```python
try:
    input_data = json.loads(func.get("arguments", "{}"))
except json.JSONDecodeError:
    input_data = {}
```
My past-self used a silent fallback here - while pragmatic, it could mask issues where tool arguments are malformed. When tools mysteriously receive empty inputs, this will be hard to debug.
A quick logging statement would help future-me:
```python
except json.JSONDecodeError as e:
    logging.debug(f"Failed to parse tool arguments: {e}")
    input_data = {}
```
Summary
This PR refactors the Anthropic endpoint support from PR #45 by moving the translation layer into the rotator_library as a proper, reusable module.

Related to: #45
Changes
New Library Module: rotator_library/anthropic_compat/
- models.py: Pydantic models for Anthropic API (requests, responses, content blocks)
- translator.py: Format translation functions between Anthropic and OpenAI formats
- streaming.py: Framework-agnostic streaming wrapper that converts OpenAI SSE to Anthropic SSE
- __init__.py: Public exports

Updated rotator_library/client.py
Added two new methods to RotatingClient:
- anthropic_messages() - Handle Anthropic Messages API requests
- anthropic_count_tokens() - Handle token counting

Simplified proxy_app/main.py
- Endpoints now use rotator_library.anthropic_compat
- Updated verify_anthropic_api_key to support open access mode

Benefits
- Any application using rotator_library can now support Anthropic clients without implementing its own format translation

Files Changed
Important
Refactor Anthropic translation layer into a reusable library module for improved code reusability and maintainability.
- rotator_library/anthropic_compat/:
  - models.py: Pydantic models for Anthropic API requests and responses.
  - translator.py: Functions for translating between Anthropic and OpenAI formats.
  - streaming.py: Converts OpenAI SSE to Anthropic SSE.
- client.py: Adds anthropic_messages() and anthropic_count_tokens() methods to RotatingClient.
- main.py: Uses rotator_library.anthropic_compat for Anthropic functionality; updates verify_anthropic_api_key() to support open access mode.

This description was created by
for 9d30ea6. You can customize this summary. It will automatically update as commits are pushed.