HydraCodeDone: LLM Critique Proxy

This project provides a self-hosted FastAPI proxy server designed to integrate with any OpenAI-compatible API service. It implements a dual-model critique pipeline: an initial response is generated by a primary LLM, and then a second LLM (the critique model) refines or critiques this response. The proxy provides full OpenAI API compatibility for seamless integration with various IDEs and tools, allowing you to use local models (via Ollama), OpenAI's API, or any other OpenAI-compatible service.

Features

  • Universal OpenAI API Compatibility: Exposes a /v1/chat/completions endpoint that mirrors the OpenAI API, allowing easy integration with tools expecting this format. Works with Ollama, OpenAI, Azure OpenAI, Anthropic (via an OpenAI-compatible wrapper), and any other OpenAI-compatible service. Tested with Roo Cline, Continue.dev, and Open WebUI.
  • Dual-Model Critique Pipeline: An initial response is generated by a primary LLM. A second LLM (the critique model) then refines this response. The critique model is guided by a detailed system prompt and the full context of the original request, enabling it to improve accuracy, adhere to formatting, and produce output suitable for direct IDE/tool consumption.
  • Official OpenAI Pydantic Models: Leverages Pydantic models from the official openai Python library (v1.0+) for request validation and response structuring, ensuring type safety and compatibility.
  • Flexible Backend Support: Configure any OpenAI-compatible API endpoint - local Ollama, OpenAI's API, Azure OpenAI, or custom LLM services.
  • Dockerized: Runs as a Docker container using Docker Compose, with optional Ollama service for managing local LLMs.
  • Configurable: Uses environment variables for easy configuration of models, prompts, and logging.
  • Streaming Responses: Supports the OpenAI stream parameter and returns Server-Sent Events (SSE), enabling real-time token streaming in Continue.dev, OpenWebUI, and other clients.

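The dual-model pipeline above can be sketched in a few lines of Python. This is an illustrative sketch, not the project's actual code; the function and parameter names are hypothetical, and the completion call is injected so the control flow is visible on its own:

```python
def critique_pipeline(complete, messages, primary_model, critique_model, critique_prompt):
    """Two-pass critique flow (illustrative sketch, not the project's code).

    complete(model, messages) -> str is any chat-completion callable,
    e.g. a thin wrapper over the official openai client.
    """
    # Pass 1: the primary model answers the original request.
    draft = complete(primary_model, messages)

    # An empty/unset critique model name skips the critique step entirely.
    if not critique_model:
        return draft

    # Pass 2: the critique model sees the full original conversation plus the
    # draft, guided by the critique system prompt, and returns a refined answer.
    refined = complete(
        critique_model,
        messages
        + [
            {"role": "assistant", "content": draft},
            {"role": "system", "content": critique_prompt},
        ],
    )
    return refined
```

The key design point is that Model 2 receives the entire original request context, not just Model 1's output, so it can enforce formatting requirements stated earlier in the conversation.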
Critique Model Behavior Note

The critique model (Model 2), guided by the CRITIQUE_SYSTEM_PROMPT, typically returns only the core refined content. For instance, if Model 1's response includes conversational preamble or structural formatting (such as markdown code blocks with file paths, as seen in some IDE prompts), Model 2's output will often be the essential content itself (e.g., just the code), with the surrounding elements stripped away. This is by design, so tools consuming the API receive clean, direct output.

Prerequisites

  • Python 3.11+ (with pytest for running tests)
  • Docker
  • Docker Compose (V2 recommended, i.e., docker compose command)

Environment Variables

Create a .env file in the project root by copying .env.example (if provided) or creating it from scratch. Populate it with the following variables:

  • OPENAI_BASE_URL: The base URL of any OpenAI-compatible service, e.g. http://your_openai_service:11434/v1 for Ollama or https://api.openai.com/v1 for OpenAI.
  • PRIMARY_MODEL_NAME: The name of the model to be used for generating the initial response (e.g., llama3.2, gpt-4).
  • CRITIQUE_MODEL_NAME: The name of the model to be used for critiquing the initial response (e.g., deepseek-r1:8b). If commented out or empty, the critique step will be skipped.
  • CRITIQUE_SYSTEM_PROMPT: The detailed system prompt used to guide the critique model. This prompt instructs Model 2 on how to analyze Model 1's response in the context of the entire original user request, focusing on correctness, completeness, and adherence to any implicit or explicit formatting requirements from the original request. The goal is for Model 2 to produce a polished, final response. The actual prompt is multi-line and should be defined in your .env file (see example below).
  • LOG_LEVEL: The logging level for the application (e.g., INFO, DEBUG). Defaults to INFO.

Example .env file:

OPENAI_BASE_URL="http://localhost:11434/v1"
PRIMARY_MODEL_NAME="llama3.2"
CRITIQUE_MODEL_NAME="deepseek-r1:8b"
CRITIQUE_SYSTEM_PROMPT="You are now in a critique and refinement phase.\nBased on the entire preceding conversation, including the user's original request and the last AI's response:\n1. Identify areas for improvement in the LAST AI's response. Focus on:\n   - Correcting bugs, syntax errors, and typos.\n   - Addressing logic issues.\n   - Enhancing clarity, conciseness, and overall quality.\n   - Ensuring the response fully addresses the user's original query.\n2. Provide a revised and improved response.\n3. CRUCIAL: Your revised response MUST strictly adhere to any output formatting, structural requirements, or specific instructions implied by the user's original request(s) earlier in the conversation.\nYour goal is to produce a polished version suitable for direct use by the user's IDE/tool.\nPlease provide ONLY the final, refined response according to these instructions."
LOG_LEVEL="INFO"

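These variables might be read in the application roughly as follows. This is a sketch, not the project's actual settings code; the variable names match the README, but the defaults shown are assumptions:

```python
# Illustrative sketch of reading the proxy's configuration from the
# environment. Variable names follow the README; defaults are assumptions.
import os

OPENAI_BASE_URL = os.getenv("OPENAI_BASE_URL", "http://localhost:11434/v1")
PRIMARY_MODEL_NAME = os.getenv("PRIMARY_MODEL_NAME", "llama3.2")

# An empty or unset CRITIQUE_MODEL_NAME disables the critique pass.
CRITIQUE_MODEL_NAME = os.getenv("CRITIQUE_MODEL_NAME", "")
CRITIQUE_ENABLED = bool(CRITIQUE_MODEL_NAME.strip())

LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
```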
Backend Configuration Examples

Local Ollama

OPENAI_BASE_URL="http://localhost:11434/v1"
PRIMARY_MODEL_NAME="llama3.2"
CRITIQUE_MODEL_NAME="deepseek-r1:8b"

OpenAI API

OPENAI_BASE_URL="https://api.openai.com/v1"
PRIMARY_MODEL_NAME="gpt-4"
CRITIQUE_MODEL_NAME="gpt-4"

Azure OpenAI

OPENAI_BASE_URL="https://your-resource.openai.azure.com/openai/deployments/your-deployment/chat/completions?api-version=2023-12-01-preview"
PRIMARY_MODEL_NAME="gpt-4"
CRITIQUE_MODEL_NAME="gpt-4"

Anthropic Claude (via OpenAI-compatible wrapper)

OPENAI_BASE_URL="https://api.anthropic-proxy.com/v1"
PRIMARY_MODEL_NAME="claude-3-sonnet-20240229"
CRITIQUE_MODEL_NAME="claude-3-haiku-20240307"

Running the Application

  1. Ensure Docker is running.
  2. Navigate to the project root directory in your terminal.
  3. Build and start the services using Docker Compose:
    docker compose up -d --build
    This command will build the llm_proxy_service image and start it.
  4. The LLM proxy service will be available at http://localhost:3101 (or the port configured in docker-compose.yaml).

To stop the services:

docker compose down

API Endpoints

  • GET /health: Basic liveness probe.

  • GET /v1/health: OpenAI-style health probe used by Continue.dev (returns the same {"status":"ok"}).

  • GET /v1/models: Lists available model IDs in OpenAI List Models format.

  • POST /v1/chat/completions: OpenAI-compatible chat completions endpoint.

    • Request Body: Follows the OpenAI ChatCompletion API schema (e.g., model, messages array).
    • Response Body: Follows the OpenAI ChatCompletion API schema, including choices and usage (usage stats are currently placeholders).
  • POST /chat/completions: Alias without /v1 prefix for clients that call it directly.

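A client request against the chat completions endpoint can be built with nothing but the standard library; a sketch (the model id "hydracodedone" and port 3101 follow the examples in this README — substitute your own):

```python
# Sketch: building an OpenAI-style chat completion request for the proxy
# using only the Python standard library.
import json
from urllib import request

def build_chat_request(model, user_message, stream=False):
    payload = {
        "model": model,
        "stream": stream,
        "messages": [{"role": "user", "content": user_message}],
    }
    return request.Request(
        "http://localhost:3101/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# To actually send it (requires the proxy to be running):
# resp = request.urlopen(build_chat_request("hydracodedone", "What is Ruby?"))
# print(json.load(resp)["choices"][0]["message"]["content"])
```

Any OpenAI SDK pointed at base_url http://localhost:3101/v1 works the same way, since the request and response bodies follow the OpenAI schema.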
Streaming Usage

The proxy fully supports streaming in the same way as the OpenAI API. Set "stream": true in your request and the response will be delivered as a text/event-stream where each line begins with data: followed by a JSON chunk. The stream terminates with data: [DONE].

Example curl request:

curl -X POST "http://localhost:3101/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hydracodedone",
    "stream": true,
    "messages": [
      {"role": "user", "content": "What is Ruby?"}
    ]
  }'

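On the client side, the SSE framing described above can be decoded with a few lines of Python. A sketch, assuming chunks follow the OpenAI streaming schema (choices[0].delta.content):

```python
# Sketch: decoding OpenAI-style SSE lines ("data: {...}" / "data: [DONE]")
# into a stream of content deltas.
import json

def parse_sse_lines(lines):
    """Yield content deltas from an OpenAI-style text/event-stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # end-of-stream sentinel
            return
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]
```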
Testing

Unit and integration tests (covering both non-streaming and streaming code paths) are written using pytest. The suite mocks Ollama HTTP calls with respx and patches streaming generators for fast, isolated testing.

  1. Ensure your virtual environment is activated and development dependencies are installed:
    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
  2. Run tests from the project root:
    .venv/bin/python -m pytest
    Or, if your PATH is set up correctly after activating the venv:
    pytest

About

FastAPI proxy server that chains multiple LLMs to refine responses.
