# AI Cost Firewall


OpenAI-compatible gateway for caching and cost control.

AI Cost Firewall is a lightweight OpenAI-compatible API gateway that reduces LLM API costs and latency by caching responses using exact matching and semantic similarity.

It sits between your applications and your LLM providers, forwarding only cache misses to the upstream API.

The project is developed and supported by the creators of VCAL Server.

https://vcal-project.com


## Why AI Cost Firewall?

LLM APIs are expensive and often receive repeated or semantically similar prompts.

Without caching, every request results in:

- unnecessary API calls
- increased token usage
- higher costs
- additional latency

AI Cost Firewall solves this by introducing a two-layer cache:

1. Exact cache (Redis) -- instant responses for identical prompts
2. Semantic cache (Qdrant) -- reused answers for similar prompts

Only cache misses are forwarded to the upstream LLM provider.
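The lookup order above can be sketched in a few lines of shell. This is purely illustrative (the real firewall implements it in Rust against Redis and Qdrant); the prompts and cached answer here are made up:

```bash
# Illustrative sketch of the two-layer lookup order.
declare -A exact_cache
exact_cache["Explain Redis briefly."]="Redis is an in-memory key-value store."

lookup() {
  local prompt=$1
  # 1) exact layer: identical prompt -> instant answer
  if [[ -n ${exact_cache[$prompt]+x} ]]; then
    echo "exact hit: ${exact_cache[$prompt]}"
    return
  fi
  # 2) semantic layer: a Qdrant similarity search would run here,
  #    reusing the answer of a close-enough prompt on a hit.
  # 3) otherwise: forward the request to the upstream LLM API.
  echo "miss: forward upstream"
}

r1=$(lookup "Explain Redis briefly.")
r2=$(lookup "What is Kubernetes?")
echo "$r1"
echo "$r2"
```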

Think of it as "nginx for LLM APIs".


## Key Features

- OpenAI-compatible `/v1/chat/completions` endpoint
- Exact request caching (Redis)
- Semantic caching (Qdrant)
- Token and cost savings metrics
- Prometheus observability
- Docker deployment
- nginx-style configuration
- Hot configuration reload (SIGHUP)
- Lightweight Rust + Axum implementation
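Hot reload works by sending SIGHUP to the running process, which re-reads the configuration without a restart (for the Docker stack, something like `docker compose kill -s SIGHUP firewall` delivers the signal). The pattern itself is easy to demonstrate in shell; the handler and message below are illustrative, not the firewall's actual code:

```bash
# Minimal sketch of SIGHUP-driven reload.
status="initial config"
on_hup() { status="config reloaded"; }
trap on_hup HUP

# Send ourselves SIGHUP, as an operator would with `kill -HUP <pid>`.
kill -s HUP $$

# The trap runs at the next command boundary, so the status is updated here.
echo "$status"
```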

## Architecture Overview

Client applications send requests to the firewall instead of directly to the LLM provider.

```mermaid
flowchart TD
    C[Client / SDK] --> F[AI Cost Firewall<br/>OpenAI-compatible API gateway]

    F --> R[Redis / Valkey<br/>Exact cache]
    F --> Q[Qdrant<br/>Semantic cache]
    F --> U[Upstream LLM API<br/>OpenAI-compatible]

    F --> P[Prometheus]
    P --> G[Grafana]
```

Full architecture documentation:

docs/architecture.md


## Quick Start (Docker)

The fastest way to try AI Cost Firewall is using Docker Compose.

### Prerequisites

Install:

- Docker
- Docker Compose (included with Docker Desktop)

Verify installation:

```bash
docker --version
docker compose version
```

### Download configuration

Download the example configuration:

```bash
curl -L https://raw.githubusercontent.com/vcal-project/ai-firewall/main/configs/ai-firewall.conf.example -o ai-firewall.conf
```

Edit the file and add your API keys:

```bash
nano ai-firewall.conf
```

You must specify the exact model names returned by the API, for example:

```
gpt-4o-mini-2024-07-18
```

### Run Docker Compose

Download the Docker Compose file:

```bash
curl -L https://raw.githubusercontent.com/vcal-project/ai-firewall/main/docker-compose.yml -o docker-compose.yml
```

Start the stack:

```bash
docker compose pull
docker compose up -d
```

View logs:

```bash
docker compose logs -f firewall
```

### Services

| Service | URL |
| --- | --- |
| Firewall API | http://localhost:8080 |
| Prometheus | http://localhost:9090 |
| Grafana | http://localhost:3000 |

The stack includes:

- AI Cost Firewall
- Redis
- Qdrant
- Prometheus
- Grafana

### Example Request

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer <your-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-2024-07-18",
    "messages": [
      {"role": "user", "content": "Explain Redis briefly."}
    ]
  }'
```
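Sending the same request twice lets the exact cache answer the second call. Conceptually, the exact layer keys on a hash of the request body, so a hit requires a byte-for-byte identical request; the hashing scheme below is an assumption for illustration, not necessarily what the firewall uses internally:

```bash
# Hash a request body into an exact-cache-style key (illustrative scheme).
body='{"model":"gpt-4o-mini-2024-07-18","messages":[{"role":"user","content":"Explain Redis briefly."}]}'
key=$(printf '%s' "$body" | sha256sum | awk '{print $1}')
echo "exact-cache key: $key"

# A single extra space produces a different key, hence a cache miss:
body2='{"model": "gpt-4o-mini-2024-07-18","messages":[{"role":"user","content":"Explain Redis briefly."}]}'
key2=$(printf '%s' "$body2" | sha256sum | awk '{print $1}')
```

Semantically close but textually different prompts fall through to the Qdrant layer instead.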

### Metrics

Prometheus metrics are available at:

http://localhost:8080/metrics

Example metrics:

```
aif_requests_total
aif_cache_exact_hits
aif_cache_semantic_hits
aif_cache_misses
aif_tokens_saved
aif_cost_saved_micro_usd
```
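These counters make it straightforward to derive a cache hit rate (in production you would express the same ratio in PromQL). A quick sketch over a fabricated scrape:

```bash
# Compute the overall cache hit rate from a sample metrics scrape
# (values are made up for illustration).
metrics='aif_requests_total 100
aif_cache_exact_hits 40
aif_cache_semantic_hits 25
aif_cache_misses 35'

hit_rate=$(printf '%s\n' "$metrics" | awk '
  $1 == "aif_cache_exact_hits"    { hits += $2 }
  $1 == "aif_cache_semantic_hits" { hits += $2 }
  $1 == "aif_requests_total"      { total = $2 }
  END { printf "%.2f", hits / total }')

echo "cache hit rate: $hit_rate"
```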

> **Note:** Token and cost savings are currently calculated only for `/v1/chat/completions`. Embedding requests made internally for semantic caching are not included in these metrics in the current version.


## Build from Source

Clone the repository if you want to:

- explore the code
- modify configuration templates
- build the firewall locally
- contribute to the project

```bash
git clone https://github.com/vcal-project/ai-firewall.git
cd ai-firewall
```

Build the project:

```bash
cargo build --release
```

Run the firewall:

```bash
cargo run --release
```

## Configuration

AI Cost Firewall uses a simple nginx-style configuration format.

Example configuration:

```nginx
listen_addr 0.0.0.0:8080;

redis_url redis://redis:6379;

upstream_base_url https://api.openai.com;
upstream_api_key sk-your-api-key;

embedding_base_url https://api.openai.com;
embedding_api_key sk-your-api-key;
embedding_model text-embedding-3-small;

qdrant_url http://qdrant:6334;
qdrant_collection aif_semantic_cache;
qdrant_vector_size 1536;

cache_ttl_seconds 2592000;
request_timeout_seconds 120;

semantic_cache_enabled true;
semantic_similarity_threshold 0.92;

# Chat-completion pricing (USD per 1M tokens)
# model_price <model> <input_usd_per_1m_tokens> <output_usd_per_1m_tokens>;

model_price gpt-4o-mini-2024-07-18 0.15 0.60;
model_price gpt-4.1-mini-2025-04-14 0.30 1.20;
```

`model_price` matching is exact in v0.1.0: if the API returns `gpt-4o-mini-2024-07-18`, that same name must appear in the configuration.
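Because prices are quoted in USD per 1M tokens, the cost avoided by a cached response, expressed in micro-USD, works out to `input_tokens * input_price + output_tokens * output_price` (the per-million and micro factors cancel). A worked example with the gpt-4o-mini rate above; whether `aif_cost_saved_micro_usd` uses exactly this formula is an assumption here:

```bash
# Cost avoided by one cached gpt-4o-mini response
# (prices: 0.15 USD input / 0.60 USD output per 1M tokens).
in_tok=1200
out_tok=800
micro_usd=$(awk -v i="$in_tok" -v o="$out_tok" \
  'BEGIN { printf "%.0f", i * 0.15 + o * 0.60 }')
echo "saved: ${micro_usd} micro-USD"   # 1200*0.15 + 800*0.60 = 660
```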

Full configuration reference:

docs/config-reference.md


## Documentation

| Document | Description |
| --- | --- |
| docs/architecture.md | System architecture |
| docs/config-reference.md | Configuration directives |
| docs/faq.md | Frequently asked questions |
| docs/how-it-works.md | Request flow and caching logic |
| docs/quickstart.md | Full setup guide |

## Contributing

Contributions are welcome.

If you would like to contribute to AI Cost Firewall — whether through bug reports, feature suggestions, documentation improvements, or code — please see:

CONTRIBUTING.md

Before submitting a pull request, please open an issue to discuss the change.

We welcome improvements in:

- performance
- documentation
- testing
- integrations with LLM providers
- observability and metrics

## Integration with VCAL Server

AI Cost Firewall can optionally integrate with VCAL Server for advanced semantic caching and distributed vector storage.

VCAL Server project:

https://vcal-project.com


## License

Apache License 2.0
