This project provides a self-hosted FastAPI proxy server designed to integrate with any OpenAI-compatible API service. It implements a dual-model critique pipeline: an initial response is generated by a primary LLM, and then a second LLM (the critique model) refines or critiques this response. The proxy provides full OpenAI API compatibility for seamless integration with various IDEs and tools, allowing you to use local models (via Ollama), OpenAI's API, or any other OpenAI-compatible service.
- Universal OpenAI API Compatibility: Exposes a `/v1/chat/completions` endpoint that fully mirrors the OpenAI API, allowing easy integration with tools that expect this format. Works with Ollama, OpenAI, Azure OpenAI, Anthropic, and any other OpenAI-compatible service. Tested with Roo Cline, Continue.dev, and Open WebUI (see the client sketch after this list).
- Dual-Model Critique Pipeline: An initial response is generated by a primary LLM. A second LLM (the critique model) then refines this response. The critique model is guided by a detailed system prompt and the full context of the original request, enabling it to improve accuracy, adhere to formatting, and produce output suitable for direct IDE/tool consumption.
- Official OpenAI Pydantic Models: Leverages Pydantic models from the official `openai` Python library (v1.0+) for request validation and response structuring, ensuring type safety and compatibility.
- Flexible Backend Support: Configure any OpenAI-compatible API endpoint: local Ollama, OpenAI's API, Azure OpenAI, or custom LLM services.
- Dockerized: Runs as a Docker container using Docker Compose, with optional Ollama service for managing local LLMs.
- Configurable: Uses environment variables for easy configuration of models, prompts, and logging.
- Streaming Responses: Supports the OpenAI `stream` parameter and returns Server-Sent Events (SSE), enabling real-time token streaming in Continue.dev, Open WebUI, and other clients.
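As a quick illustration of the compatibility claim, any OpenAI client library can be pointed at the proxy. Below is a minimal sketch using the official `openai` Python package; it assumes the default port `3101`, that the proxy does not validate the API key, and reuses the model name from the curl example later in this README:

```python
# Sketch: any OpenAI-compatible client can talk to the proxy.
# Assumes the default port 3101; the api_key is arbitrary (assumed unchecked).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3101/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="hydracodedone",  # model name taken from the curl example below
    messages=[{"role": "user", "content": "What is Ruby?"}],
)
print(response.choices[0].message.content)
```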
It's important to note that the critique model (Model 2), guided by the CRITIQUE_SYSTEM_PROMPT, typically provides the core refined content. For instance, if Model 1's response includes conversational preamble or specific structural formatting (like markdown code blocks with file paths as seen in some IDE prompts), Model 2's refined output will often be the essential content itself (e.g., just the code), having stripped away the surrounding elements. This is by design to provide a clean, direct output for tools that consume the API.
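Conceptually, the pipeline reduces to two chained chat-completion calls. The sketch below is illustrative only: the function name, argument names, and the exact placement of the critique system prompt are assumptions, not the project's actual internals.

```python
# Illustrative sketch only: names and prompt placement are assumptions,
# not the project's actual internals.
from openai import AsyncOpenAI

async def critique_pipeline(
    client: AsyncOpenAI,
    messages: list[dict],
    primary_model: str,
    critique_model: str | None,
    critique_system_prompt: str,
) -> str:
    # Step 1: the primary model answers the original request.
    first = await client.chat.completions.create(
        model=primary_model, messages=messages
    )
    draft = first.choices[0].message.content or ""

    # If no critique model is configured, the draft is the final answer.
    if not critique_model:
        return draft

    # Step 2: the critique model sees the full conversation plus the draft,
    # guided by CRITIQUE_SYSTEM_PROMPT, and returns the refined response.
    refined = await client.chat.completions.create(
        model=critique_model,
        messages=[
            *messages,
            {"role": "assistant", "content": draft},
            {"role": "system", "content": critique_system_prompt},
        ],
    )
    return refined.choices[0].message.content or ""
```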
- Python 3.11+ (pytest for testing)
- Docker
- Docker Compose (V2 recommended, i.e., the `docker compose` command)
Create a `.env` file in the project root by copying `.env.example` (if provided) or creating it from scratch. Populate it with the following variables:
- `OPENAI_BASE_URL`: The base URL for any OpenAI-compatible service. This should be `http://your_openai_service:11434/v1` (for Ollama) or `https://api.openai.com/v1` (for OpenAI).
- `PRIMARY_MODEL_NAME`: The name of the model used to generate the initial response (e.g., `llama3.2`, `gpt-4`).
- `CRITIQUE_MODEL_NAME`: The name of the model used to critique the initial response (e.g., `deepseek-r1:8b`). If commented out or empty, the critique step is skipped.
- `CRITIQUE_SYSTEM_PROMPT`: The detailed system prompt used to guide the critique model. This prompt instructs Model 2 on how to analyze Model 1's response in the context of the entire original user request, focusing on correctness, completeness, and adherence to any implicit or explicit formatting requirements from the original request. The goal is for Model 2 to produce a polished, final response. The actual prompt is multi-line and should be defined in your `.env` file (see example below).
- `LOG_LEVEL`: The logging level for the application (e.g., `INFO`, `DEBUG`). Defaults to `INFO`.
Example `.env` file:

```env
OPENAI_BASE_URL="http://localhost:11434/v1"
PRIMARY_MODEL_NAME="llama3.2"
CRITIQUE_MODEL_NAME="deepseek-r1:8b"
CRITIQUE_SYSTEM_PROMPT="You are now in a critique and refinement phase.\nBased on the entire preceding conversation, including the user's original request and the last AI's response:\n1. Identify areas for improvement in the LAST AI's response. Focus on:\n - Correcting bugs, syntax errors, and typos.\n - Addressing logic issues.\n - Enhancing clarity, conciseness, and overall quality.\n - Ensuring the response fully addresses the user's original query.\n2. Provide a revised and improved response.\n3. CRUCIAL: Your revised response MUST strictly adhere to any output formatting, structural requirements, or specific instructions implied by the user's original request(s) earlier in the conversation.\nYour goal is to produce a polished version suitable for direct use by the user's IDE/tool.\nPlease provide ONLY the final, refined response according to these instructions."
LOG_LEVEL="INFO"
```
OPENAI_BASE_URL="http://localhost:11434/v1"
PRIMARY_MODEL_NAME="llama3.2"
CRITIQUE_MODEL_NAME="deepseek-r1:8b"
OPENAI_BASE_URL="https://api.openai.com/v1"
PRIMARY_MODEL_NAME="gpt-4"
CRITIQUE_MODEL_NAME="gpt-4"
OPENAI_BASE_URL="https://your-resource.openai.azure.com/openai/deployments/your-deployment/chat/completions?api-version=2023-12-01-preview"
PRIMARY_MODEL_NAME="gpt-4"
CRITIQUE_MODEL_NAME="gpt-4"
OPENAI_BASE_URL="https://api.anthropic-proxy.com/v1"
PRIMARY_MODEL_NAME="claude-3-sonnet-20240229"
CRITIQUE_MODEL_NAME="claude-3-haiku-20240307"
- Ensure Docker is running.
- Navigate to the project root directory in your terminal.
- Build and start the services using Docker Compose:

```bash
docker compose up -d --build
```

This command will build the `llm_proxy_service` image and start it.
- The LLM proxy service will be available at `http://localhost:3101` (or the port configured in `docker-compose.yaml`).
To stop the services:

```bash
docker compose down
```
- `GET /health`: Basic liveness probe.
- `GET /v1/health`: OpenAI-style health probe used by Continue.dev (returns the same `{"status":"ok"}`).
- `GET /v1/models`: Lists available model IDs in OpenAI List Models format.
- `POST /v1/chat/completions`: OpenAI-compatible chat completions endpoint.
  - Request Body: Follows the OpenAI ChatCompletion API schema (e.g., `model`, `messages` array).
  - Response Body: Follows the OpenAI ChatCompletion API schema, including choices and usage (usage stats are currently placeholders).
- `POST /chat/completions`: Alias without the `/v1` prefix for clients that call it directly.
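To sanity-check these endpoints, here is a short sketch using `httpx`; it assumes the proxy is reachable on the default port `3101` and reuses the model name from the curl example below:

```python
# Sketch: probing the endpoints above; assumes the default port 3101.
import httpx

BASE = "http://localhost:3101"

print(httpx.get(f"{BASE}/health").json())     # liveness probe -> {"status": "ok"}
print(httpx.get(f"{BASE}/v1/models").json())  # OpenAI List Models format

resp = httpx.post(
    f"{BASE}/v1/chat/completions",
    json={
        "model": "hydracodedone",  # model name from the curl example below
        "messages": [{"role": "user", "content": "What is Ruby?"}],
    },
    timeout=120.0,  # local models can be slow to respond
)
print(resp.json()["choices"][0]["message"]["content"])
```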
The proxy fully supports streaming in the same way as the OpenAI API. Set `"stream": true` in your request and the response will be delivered as a `text/event-stream` where each line begins with `data: ` followed by a JSON chunk. The stream terminates with `data: [DONE]`.
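Streaming also works through the official `openai` Python client, which parses the `data:` chunks transparently. A minimal sketch, assuming the default port `3101` and that the proxy ignores the API key:

```python
# Sketch: consuming the SSE stream via the official openai client,
# which parses the "data: ..." chunks for you.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3101/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="hydracodedone",
    messages=[{"role": "user", "content": "What is Ruby?"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```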
Example curl request:
```bash
curl -X POST "http://localhost:3101/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hydracodedone",
    "stream": true,
    "messages": [
      {"role": "user", "content": "What is Ruby?"}
    ]
  }'
```

Unit and integration tests (covering both non-streaming and streaming code paths) are written using pytest. The suite mocks Ollama HTTP calls with respx and patches streaming generators for fast, isolated testing.
- Ensure your virtual environment is activated and development dependencies are installed:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

- Run tests from the project root:

```bash
.venv/bin/python -m pytest
```

Or, if your `PATH` is set up correctly after activating the venv:

```bash
pytest
```
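For orientation, the sketch below shows roughly what a respx-mocked test can look like. The import path, backend URL, and response payload are assumptions, not the project's actual test code:

```python
# Hedged sketch: import path, backend URL, and payload are assumptions.
import httpx
import respx
from fastapi.testclient import TestClient

from app.main import app  # hypothetical module path for the FastAPI app

client = TestClient(app)

@respx.mock
def test_chat_completion_uses_mocked_backend():
    # Mock the upstream OpenAI-compatible endpoint (assumes the app was
    # configured with OPENAI_BASE_URL=http://localhost:11434/v1).
    respx.post("http://localhost:11434/v1/chat/completions").mock(
        return_value=httpx.Response(
            200,
            json={
                "id": "chatcmpl-test",
                "object": "chat.completion",
                "created": 0,
                "model": "llama3.2",
                "choices": [
                    {
                        "index": 0,
                        "message": {"role": "assistant", "content": "mocked"},
                        "finish_reason": "stop",
                    }
                ],
            },
        )
    )
    resp = client.post(
        "/v1/chat/completions",
        json={
            "model": "llama3.2",
            "messages": [{"role": "user", "content": "hi"}],
        },
    )
    assert resp.status_code == 200
```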