OpenAI-compatible gateway for caching and cost control.
AI Cost Firewall is a lightweight OpenAI-compatible API gateway that reduces LLM API costs and latency by caching responses using exact matching and semantic similarity.
It sits between applications and LLM providers and forwards only necessary requests to the upstream API.
The project is developed and supported by the creators of VCAL Server.
LLM APIs are expensive and often receive repeated or semantically similar prompts.
Without caching, every request results in:
- unnecessary API calls
- increased token usage
- higher costs
- additional latency
AI Cost Firewall solves this by introducing a two-layer cache:
- Exact cache (Redis) -- instant responses for identical prompts
- Semantic cache (Qdrant) -- reuse answers for similar prompts
Only cache misses are forwarded to the upstream LLM provider.
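The lookup order can be sketched as follows. This is an illustration of the two-layer idea, not the firewall's actual code: an in-memory dict and a cosine-similarity scan stand in for Redis and Qdrant, and the 0.92 threshold mirrors the `semantic_similarity_threshold` default shown later in this guide.

```python
import math

SIMILARITY_THRESHOLD = 0.92  # mirrors semantic_similarity_threshold

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def lookup(prompt, embedding, exact_cache, semantic_cache):
    """Two-layer lookup: exact match first, then nearest embedding."""
    # Layer 1: exact cache (Redis in the real firewall)
    if prompt in exact_cache:
        return exact_cache[prompt], "exact_hit"
    # Layer 2: semantic cache (Qdrant in the real firewall)
    best_score, best_answer = 0.0, None
    for cached_vec, answer in semantic_cache:
        score = cosine(embedding, cached_vec)
        if score > best_score:
            best_score, best_answer = score, answer
    if best_answer is not None and best_score >= SIMILARITY_THRESHOLD:
        return best_answer, "semantic_hit"
    # Miss: only now would the request be forwarded upstream
    return None, "miss"
```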
The firewall behaves much like an "nginx for LLM APIs".
- OpenAI-compatible `/v1/chat/completions` endpoint
- Exact request caching (Redis)
- Semantic cache (Qdrant)
- Token and cost savings metrics
- Prometheus observability
- Docker deployment
- nginx-style configuration
- Hot configuration reload (`SIGHUP`)
- Lightweight Rust + Axum implementation
Client applications send requests to the firewall instead of directly to the LLM provider.
```mermaid
flowchart TD
  C[Client / SDK] --> F[AI Cost Firewall<br/>OpenAI-compatible API gateway]
  F --> R[Redis / Valkey<br/>Exact cache]
  F --> Q[Qdrant<br/>Semantic cache]
  F --> U[Upstream LLM API<br/>OpenAI-compatible]
  F --> P[Prometheus]
  P --> G[Grafana]
```
Full architecture documentation:
The fastest way to try AI Cost Firewall is using Docker Compose.
Install:
- Docker
- Docker Compose (included with Docker Desktop)
Verify installation:
```bash
docker --version
docker compose version
```

Download the example configuration:
```bash
curl -L https://raw.githubusercontent.com/vcal-project/ai-firewall/main/configs/ai-firewall.conf.example -o ai-firewall.conf
```

Edit the file and add your API keys:
```bash
nano ai-firewall.conf
```

You must specify the exact model names returned by the API, for example:

```
gpt-4o-mini-2024-07-18
```
Download the Docker Compose file:
```bash
curl -L https://raw.githubusercontent.com/vcal-project/ai-firewall/main/docker-compose.yml -o docker-compose.yml
```

Start the stack:
```bash
docker compose pull
docker compose up -d
```

View logs:

```bash
docker compose logs -f firewall
```

| Service | URL |
|---|---|
| Firewall API | http://localhost:8080 |
| Prometheus | http://localhost:9090 |
| Grafana | http://localhost:3000 |
The stack includes:
- AI Cost Firewall
- Redis
- Qdrant
- Prometheus
- Grafana
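Once the stack is running, send a test request through the firewall. The curl example below does this from the shell; the same call can be sketched in Python using only the standard library (the endpoint and model name follow this guide; replace the key placeholder with your own):

```python
import json
import urllib.request

FIREWALL_URL = "http://localhost:8080/v1/chat/completions"

def build_request(model, content, api_key):
    """Build an OpenAI-style chat completion request aimed at the firewall."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        FIREWALL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# To send the built request:
#   with urllib.request.urlopen(build_request(...)) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```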
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer <your-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-2024-07-18",
    "messages": [
      {"role": "user", "content": "Explain Redis briefly."}
    ]
  }'
```

Prometheus metrics are available at:
Example metrics:
```
aif_requests_total
aif_cache_exact_hits
aif_cache_semantic_hits
aif_cache_misses
aif_tokens_saved
aif_cost_saved_micro_usd
```
Token and cost savings are currently calculated only for `/v1/chat/completions`.
Embedding requests used internally for semantic caching are not included in these metrics in the current version.
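The figure behind `aif_cost_saved_micro_usd` can be reproduced from the per-model prices. A sketch of the arithmetic, assuming a cached response saves exactly the tokens of the avoided request (prices are the example rates from the configuration shown later in this guide):

```python
# USD per 1M tokens: (input, output), keyed by exact model name
MODEL_PRICES = {
    "gpt-4o-mini-2024-07-18": (0.15, 0.60),
    "gpt-4.1-mini-2025-04-14": (0.30, 1.20),
}

def cost_saved_micro_usd(model, input_tokens, output_tokens):
    """Micro-USD saved by serving a request from cache instead of upstream."""
    if model not in MODEL_PRICES:  # model_price matching is exact
        return 0
    in_price, out_price = MODEL_PRICES[model]
    usd = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return round(usd * 1_000_000)  # metric is exported in micro-USD
```

For example, a cached gpt-4o-mini answer that avoids 1,000 input and 500 output tokens saves 450 micro-USD.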
Clone the repository if you want to:
- explore the code
- modify configuration templates
- build the firewall locally
- contribute to the project
```bash
git clone https://github.com/vcal-project/ai-firewall.git
cd ai-firewall
```

Build the project:

```bash
cargo build --release
```

Run the firewall:

```bash
cargo run --release
```

AI Cost Firewall uses a simple nginx-style configuration format.
Example configuration:
```nginx
listen_addr 0.0.0.0:8080;
redis_url redis://redis:6379;
upstream_base_url https://api.openai.com;
upstream_api_key sk-your-api-key;
embedding_base_url https://api.openai.com;
embedding_api_key sk-your-api-key;
embedding_model text-embedding-3-small;
qdrant_url http://qdrant:6334;
qdrant_collection aif_semantic_cache;
qdrant_vector_size 1536;
cache_ttl_seconds 2592000;
request_timeout_seconds 120;
semantic_cache_enabled true;
semantic_similarity_threshold 0.92;

# Chat-completion pricing (USD per 1M tokens)
# model_price <model> <input_usd_per_1m_tokens> <output_usd_per_1m_tokens>;
model_price gpt-4o-mini-2024-07-18 0.15 0.60;
model_price gpt-4.1-mini-2025-04-14 0.30 1.20;
```
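The nginx-style format is simple enough to illustrate: each line is a directive name followed by its values, terminated by a semicolon, with `#` comments. A toy parser sketch (for illustration only; this is not the firewall's actual parser and it ignores edge cases such as quoting):

```python
def parse_config(text):
    """Parse 'directive value...;' lines into {name: [list of value lists]}."""
    directives = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line:
            continue
        if not line.endswith(";"):
            raise ValueError(f"missing ';' in: {raw!r}")
        name, *values = line[:-1].split()
        # A directive may repeat (e.g. model_price), so collect a list
        directives.setdefault(name, []).append(values)
    return directives
```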
`model_price` matching is exact in v0.1.0.
If the API returns `gpt-4o-mini-2024-07-18`, the same name must appear in the configuration.
Full configuration reference: `docs/config-reference.md`
| Document | Description |
|---|---|
| docs/architecture.md | System architecture |
| docs/config-reference.md | Configuration directives |
| docs/faq.md | Frequently asked questions |
| docs/how-it-works.md | Request flow and caching logic |
| docs/quickstart.md | Full setup guide |
Contributions are welcome.
If you would like to contribute to AI Cost Firewall — whether through bug reports, feature suggestions, documentation improvements, or code — please see:
Before submitting a pull request, please open an issue to discuss the change.
We welcome improvements in:
- performance
- documentation
- testing
- integrations with LLM providers
- observability and metrics
AI Cost Firewall can optionally integrate with VCAL Server for advanced semantic caching and distributed vector storage.
VCAL Server project:
Apache License 2.0