
Conversation


@mirrobot-agent (bot) commented Dec 19, 2025

Summary

This PR refactors the Anthropic endpoint support from PR #45 by moving the translation layer into the rotator_library as a proper, reusable module.

Related to: #45

Changes

New Library Module: rotator_library/anthropic_compat/

  • models.py: Pydantic models for Anthropic API (requests, responses, content blocks)
  • translator.py: Format translation functions between Anthropic and OpenAI formats
  • streaming.py: Framework-agnostic streaming wrapper that converts OpenAI SSE to Anthropic SSE
  • __init__.py: Public exports
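
For downstream consumers, the public surface (names drawn from the commit messages later in this thread) can be imported directly:

from rotator_library.anthropic_compat import (
    AnthropicMessagesRequest,
    anthropic_to_openai_messages,
    translate_anthropic_request,
)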

Updated rotator_library/client.py

Added two new methods to RotatingClient (a usage sketch follows the list):

  • anthropic_messages() - Handle Anthropic Messages API requests
  • anthropic_count_tokens() - Handle token counting
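
For illustration, a rough usage sketch; the exact signatures are not shown in this description, so the async shape and import path here are assumptions:

# Sketch only: assumes anthropic_messages() is async, accepts the parsed
# Pydantic request, and returns an Anthropic-format response (or a stream
# of SSE events when stream=True).
from rotator_library import RotatingClient, AnthropicMessagesRequest

async def serve_messages(raw_body: dict, client: RotatingClient):
    request = AnthropicMessagesRequest(**raw_body)
    return await client.anthropic_messages(request)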

Simplified proxy_app/main.py

  • Removed ~663 lines of local Anthropic code
  • Now imports models and functions from rotator_library.anthropic_compat
  • Endpoints work the same way but use library components
  • Fixed verify_anthropic_api_key to support open access mode

Benefits

  1. Reusability: The Anthropic translation layer can now be used by other applications using rotator_library
  2. Maintainability: Clear separation between library code and application code
  3. Testability: Library components can be unit tested independently
  4. Consistency: Follows the existing library architecture patterns

Files Changed

src/rotator_library/
├── anthropic_compat/
│   ├── __init__.py          (NEW)
│   ├── models.py            (NEW)
│   ├── translator.py        (NEW)
│   └── streaming.py         (NEW)
├── client.py                (MODIFIED)
└── __init__.py              (MODIFIED)

src/proxy_app/
└── main.py                  (MODIFIED)

Important

Refactor Anthropic translation layer into a reusable library module for improved code reusability and maintainability.

  • New Library Module: rotator_library/anthropic_compat/
    • models.py: Pydantic models for Anthropic API requests and responses.
    • translator.py: Functions for translating between Anthropic and OpenAI formats.
    • streaming.py: Converts OpenAI SSE to Anthropic SSE.
  • Client Updates in client.py:
    • Added anthropic_messages() and anthropic_count_tokens() methods to RotatingClient.
  • Main Application Simplification in main.py:
    • Removed ~663 lines of Anthropic-specific code.
    • Now uses rotator_library.anthropic_compat for Anthropic functionality.
    • Updated verify_anthropic_api_key() to support open access mode.

This description was created by Ellipsis for 9d30ea6.

FammasMaz and others added 18 commits December 19, 2025 15:03
…atibility

- Add /v1/messages endpoint with Anthropic-format request/response
- Support both x-api-key and Bearer token authentication
- Implement Anthropic <-> OpenAI format translation for messages, tools, and responses
- Add streaming wrapper converting OpenAI SSE to Anthropic SSE events
- Handle tool_use blocks with proper stop_reason detection
- Fix NoneType iteration bug in tool_calls handling
- Add AnthropicThinkingConfig model and thinking parameter to request
- Translate Anthropic thinking config to reasoning_effort for providers
- Handle reasoning_content in streaming wrapper (thinking_delta events)
- Convert reasoning_content to thinking blocks in non-streaming responses
When no thinking config is provided in the request, Opus models now
automatically use reasoning_effort=high with custom_reasoning_budget=True.

This ensures Opus 4.5 uses the full 32768 token thinking budget instead
of the backend's auto mode (thinkingBudget: -1) which may use less.

Opus always uses the -thinking variant regardless, but this change
guarantees maximum thinking capacity for better reasoning quality.
…ling

- Add validation to ensure maxOutputTokens > thinkingBudget for Claude
  extended thinking (prevents 400 INVALID_ARGUMENT API errors)
- Improve streaming error handling to send proper message_start and
  content blocks before error event for better client compatibility
- Minor code formatting improvements
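
As a rough illustration of that defensive check (the field names here are hypothetical; the actual provider code may differ):

# Hypothetical sketch of the guard described above
def validate_thinking_budget(max_output_tokens: int, thinking_budget: int) -> None:
    # Claude extended thinking needs headroom for the visible answer,
    # so the output cap must exceed the thinking budget.
    if max_output_tokens <= thinking_budget:
        raise ValueError(
            f"maxOutputTokens ({max_output_tokens}) must be greater than "
            f"thinkingBudget ({thinking_budget})"
        )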
Track each tool_use block index separately and emit content_block_stop
for all blocks (thinking, text, and each tool_use) when the stream ends.
Fixes Claude Code stopping mid-action due to malformed streaming events.
…nabled

- Fixed a bug where budget_tokens between 10000 and 32000 would get the ÷4 reduction
- Now any explicit thinking request sets custom_reasoning_budget=True
- Added logging to show thinking budget, effort level, and custom_budget flag
- Simplified budget tier logic (removed redundant >= 32000 check)

Before: 31999 tokens requested → 8192 tokens actual (÷4 applied)
After:  31999 tokens requested → 32768 tokens actual (full "high" budget)
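
A sketch of the simplified tier logic described here (variable names are assumptions; compare the pre-fix mapping quoted in the review further down):

# Assumed shape of the simplified mapping: any explicit thinking request
# now sets custom_reasoning_budget, and the redundant >= 32000 tier is gone.
if thinking.type == "enabled":
    openai_request["custom_reasoning_budget"] = True
    budget = thinking.budget_tokens or 10000
    if budget >= 10000:
        openai_request["reasoning_effort"] = "high"
    elif budget >= 5000:
        openai_request["reasoning_effort"] = "medium"
    else:
        openai_request["reasoning_effort"] = "low"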
When using /v1/chat/completions with Opus and reasoning_effort="high" or
"medium", automatically set custom_reasoning_budget=true to get full
thinking tokens instead of the ÷4 reduced default.

This makes the OpenAI endpoint behave consistently with the Anthropic
endpoint for Opus models: if you're using Opus with high reasoning,
you want the full thinking budget.

Adds logging: "🧠 Thinking: auto-enabled custom_reasoning_budget for Opus"
…treaming

Claude Code and other Anthropic SDK clients require message_start to be
sent before any other SSE events. When a stream completed quickly without
content chunks, the wrapper would send message_stop without message_start,
causing clients to silently discard all output.
Signed-off-by: Moeeze Hassan <fammas.maz@gmail.com>
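A minimal sketch of that fix (the wrapper's actual structure is not shown in this thread, so the helper names are hypothetical):

# Hypothetical sketch: guarantee message_start precedes every other event
async def anthropic_sse(openai_stream):
    message_start_sent = False

    async for chunk in openai_stream:
        if not message_start_sent:
            yield format_sse("message_start", build_message_start())  # hypothetical helpers
            message_start_sent = True
        yield translate_chunk(chunk)  # hypothetical per-chunk translation

    if not message_start_sent:
        # Stream ended without content chunks: still open the message so
        # clients like Claude Code don't silently discard the output.
        yield format_sse("message_start", build_message_start())
    yield format_sse("message_stop", {"type": "message_stop"})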
Extract Anthropic API models and format translation functions from main.py into reusable library module:

- models.py: Pydantic models for Anthropic Messages API (request/response)
- translator.py: Functions to convert between Anthropic and OpenAI formats
  - anthropic_to_openai_messages()
  - anthropic_to_openai_tools()
  - anthropic_to_openai_tool_choice()
  - openai_to_anthropic_response()
  - translate_anthropic_request() - High-level request translation

This is part of the refactoring to make Anthropic compatibility a proper library feature.
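To make the translation concrete, a small example of what anthropic_to_openai_messages() is described as doing (the output shape is inferred from the two APIs, so treat it as illustrative):

# Anthropic carries the system prompt in a separate field; OpenAI inlines
# it as a leading "system" role message.
anthropic_messages = [{"role": "user", "content": "Hello"}]
system_prompt = "You are terse."

openai_messages = anthropic_to_openai_messages(anthropic_messages, system_prompt)
# Expected, roughly:
# [
#     {"role": "system", "content": "You are terse."},
#     {"role": "user", "content": "Hello"},
# ]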
Add framework-agnostic streaming wrapper for Anthropic format:

- streaming.py: Converts OpenAI SSE format to Anthropic SSE format
  - Handles message_start, content_block_start/delta/stop, message_delta, message_stop
  - Supports text, thinking, and tool_use content blocks
  - Uses callback-based disconnect detection instead of FastAPI Request
  - Proper error handling with client-visible error blocks

- __init__.py: Export all models, translator functions, and streaming wrapper

The streaming wrapper is now reusable outside of FastAPI.
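
For orientation, a simple text-only response in Anthropic SSE form looks roughly like this (payloads abbreviated; event names follow Anthropic's documented stream):

event: message_start
data: {"type": "message_start", "message": {...}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {...}}

event: message_stop
data: {"type": "message_stop"}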
…ethods

Add high-level Anthropic API methods to RotatingClient:

- anthropic_messages(): Handle Anthropic Messages API requests
  - Accepts AnthropicMessagesRequest, translates to OpenAI format
  - Routes through existing acompletion() with full retry/rotation logic
  - Returns response in Anthropic format (streaming or non-streaming)

- anthropic_count_tokens(): Handle token counting for Anthropic requests
  - Counts tokens for messages and tools
  - Returns count in Anthropic format

These methods enable any application using rotator_library to support
Anthropic clients without needing to implement format translation.
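
For token counting, a matching sketch (the model id is hypothetical and the signature is assumed; Anthropic-format counts carry an input_tokens field):

request = AnthropicCountTokensRequest(
    model="claude-example-model",  # hypothetical model id
    messages=[{"role": "user", "content": "How many tokens is this?"}],
)
result = await client.anthropic_count_tokens(request)
# -> Anthropic-format count, e.g. {"input_tokens": 12}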
…rary

Add lazy-loaded exports for Anthropic API models:

- AnthropicMessagesRequest
- AnthropicMessagesResponse
- AnthropicCountTokensRequest
- AnthropicCountTokensResponse

These can now be imported directly from rotator_library:
  from rotator_library import AnthropicMessagesRequest
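
The lazy loading is presumably the PEP 562 module-level __getattr__ pattern; a generic sketch, not the library's actual code:

# rotator_library/__init__.py -- generic lazy-export sketch (PEP 562)
import importlib

_LAZY_EXPORTS = {
    "AnthropicMessagesRequest": "rotator_library.anthropic_compat",
    "AnthropicMessagesResponse": "rotator_library.anthropic_compat",
    "AnthropicCountTokensRequest": "rotator_library.anthropic_compat",
    "AnthropicCountTokensResponse": "rotator_library.anthropic_compat",
}

def __getattr__(name: str):
    if name in _LAZY_EXPORTS:
        module = importlib.import_module(_LAZY_EXPORTS[name])
        return getattr(module, name)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")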
Refactor main.py to use the anthropic_compat library module:

- Import models from rotator_library.anthropic_compat instead of local definitions
- Import translation functions from the library
- Remove ~130 lines of local Anthropic model definitions
- Keep endpoints functional by importing needed functions

The endpoints still work the same way, but now use the library's
reusable components. This is a stepping stone - the full client
methods (anthropic_messages, anthropic_count_tokens) can be used
for even simpler endpoints in the future.

Net reduction: ~130 lines removed from main.py
@mirrobot-agent (Author)

Time to examine my own handiwork! Let's see what past-me was thinking when moving 1,300+ lines of Anthropic compatibility code into the library. This should be interesting... 🔍


@mirrobot-agent (bot) left a comment


Self-Review Assessment

Well, well, well... time to grade my own homework! 📝

Overall, this is a solid refactoring that moves the Anthropic compatibility layer from proxy_app into rotator_library as a reusable module. The architecture follows the existing library patterns, and the separation of concerns is clear. Past-me did a decent job here... mostly.

Architectural Reflections

The module structure (models.py, translator.py, streaming.py) is clean and intuitive. I particularly like that:

  • The streaming wrapper is framework-agnostic (accepts an optional disconnect callback rather than a FastAPI Request)
  • The lazy-loading in __init__.py maintains fast import times
  • The new RotatingClient methods provide a clean interface for Anthropic requests

Key Fixes I Should Make

  1. Code duplication in main.py: I created a nice translate_anthropic_request() function but then... didn't use it in the endpoint? Classic past-me.

  2. Type validation in models: The AnthropicThinkingConfig.type field accepts any string. Should use Literal["enabled", "disabled"] for proper validation.

  3. Silent JSON failures: When parsing tool call arguments fails, I silently return an empty dict. Adding a debug log would make troubleshooting much easier.

Things That Actually Look Good

  • The antigravity_provider.py addition for validating max_tokens > thinkingBudget is a good defensive check
  • Error handling in the streaming wrapper properly ensures message_start is sent before any error events
  • The token counting endpoint correctly handles both messages and tool definitions

Overall, this PR is ready to merge once the minor improvements above are addressed. Future-me thanks past-me for the decent documentation and clean module structure!

This self-review was generated by an AI assistant.

Comment on lines +1036 to +1089
# Convert Anthropic request to OpenAI format
anthropic_request = body.model_dump(exclude_none=True)

openai_messages = anthropic_to_openai_messages(
    anthropic_request.get("messages", []), anthropic_request.get("system")
)

openai_tools = anthropic_to_openai_tools(anthropic_request.get("tools"))
openai_tool_choice = anthropic_to_openai_tool_choice(
    anthropic_request.get("tool_choice")
)

# Build OpenAI-compatible request
openai_request = {
    "model": body.model,
    "messages": openai_messages,
    "max_tokens": body.max_tokens,
    "stream": body.stream or False,
}

if body.temperature is not None:
    openai_request["temperature"] = body.temperature
if body.top_p is not None:
    openai_request["top_p"] = body.top_p
if body.stop_sequences:
    openai_request["stop"] = body.stop_sequences
if openai_tools:
    openai_request["tools"] = openai_tools
if openai_tool_choice:
    openai_request["tool_choice"] = openai_tool_choice

# Handle Anthropic thinking config -> reasoning_effort translation
if body.thinking:
    if body.thinking.type == "enabled":
        # Map budget_tokens to reasoning_effort level
        # Default to "medium" if enabled but budget not specified
        budget = body.thinking.budget_tokens or 10000
        if budget >= 32000:
            openai_request["reasoning_effort"] = "high"
            openai_request["custom_reasoning_budget"] = True
        elif budget >= 10000:
            openai_request["reasoning_effort"] = "high"
        elif budget >= 5000:
            openai_request["reasoning_effort"] = "medium"
        else:
            openai_request["reasoning_effort"] = "low"
    elif body.thinking.type == "disabled":
        openai_request["reasoning_effort"] = "disable"
elif "opus" in body.model.lower():
    # Force high thinking for Opus models when no thinking config is provided
    # Opus 4.5 always uses the -thinking variant, so we want maximum thinking budget
    # Without this, the backend defaults to thinkingBudget: -1 (auto) instead of high
    openai_request["reasoning_effort"] = "high"
    openai_request["custom_reasoning_budget"] = True

Ah, it seems past-me got a bit ahead of himself! I specifically created translate_anthropic_request() in the library to handle this translation, but then proceeded to duplicate the logic here manually.

The thinking budget mapping logic (lines 1067-1089) is nearly identical to what I put in translator.py (lines 283-304). This could lead to drift if one is updated without the other.

I should either:

  1. Use translate_anthropic_request() directly (sketched below), or
  2. Document why the endpoint needs different handling than the library method
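
Option 1 would collapse the duplicated block to something like this (a sketch; the function's exact signature is assumed):

# Assumes translate_anthropic_request() accepts the parsed request model and
# returns the OpenAI-format request dict, thinking mapping included.
openai_request = translate_anthropic_request(body)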

Comment on lines +80 to +84
class AnthropicThinkingConfig(BaseModel):
    """Anthropic thinking configuration."""

    type: str  # "enabled" or "disabled"
    budget_tokens: Optional[int] = None

Past-me was feeling a bit lazy here. The type field should be constrained to valid values ("enabled" or "disabled"). Using a plain str allows invalid values to slip through validation.

Consider using:

from typing import Literal

class AnthropicThinkingConfig(BaseModel):
    type: Literal["enabled", "disabled"]
    budget_tokens: Optional[int] = None

This gives proper IDE autocomplete and catches typos at validation time.

Comment on lines +246 to +249
try:
    input_data = json.loads(func.get("arguments", "{}"))
except json.JSONDecodeError:
    input_data = {}

My past-self used a silent fallback here; while pragmatic, it could mask issues where tool arguments are malformed. When tools mysteriously receive empty inputs, this will be hard to debug.

A quick logging statement would help future-me:

except json.JSONDecodeError as e:
    logging.debug(f"Failed to parse tool arguments: {e}")
    input_data = {}

@Mirrowel added the enhancement, Agent Monitored, and Priority labels on Dec 19, 2025