⚠️ Nightly release for early testing. Expect rough edges. Stable version coming out soon — please open an issue if you hit anything.
The open-source platform for shipping self-improving AI agents. Evaluations, tracing, simulations, guardrails, gateway, optimization. Everything runs on one platform and one feedback loop, from first prototype to live deployment.
Try Cloud (Free) · Self-Host · Docs · Blog · Discord · Discussions
Most AI agents fail in production, and teams end up stitching together evals, observability, and guardrails that never close the loop. FutureAGI collapses all of it into one platform and one feedback loop. Simulate edge cases before launch, evaluate what happens in production, protect users in real time, and turn every trace into signal for the next version. The result: agents that don't just get monitored, they self-improve.
- **One platform, one loop.** No more stitching Langfuse + Braintrust + Helicone + Guardrails AI + a custom simulator. One platform covers the lifecycle: simulate → evaluate → protect → monitor → optimize, with data flowing back as a loop.
- **Truly open.** Apache 2.0 core. Every evaluator, every prompt, every trace is inspectable — no black-box scoring. Self-host for data sovereignty or use our managed Cloud. Drop in your own stack at any layer via OTel / OpenAI-compatible HTTP.
- **Performance you can verify.** Go-based gateway with ~9.9 ns weighted routing, ~29k req/s on a t3.xlarge, P99 ≤ 21 ms with guardrails on. OpenTelemetry-native traces. 50+ framework instrumentors. Every claim is reproducible via the committed benchmark harness.
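The gateway's routing hot path is implemented in Go; purely as an illustration of the idea behind weighted routing (this is a sketch, not the actual gateway code, and the provider names and weights are hypothetical), a weighted random pick can be expressed in a few lines:

```python
import random

def pick_provider(weights: dict[str, float], rng: random.Random) -> str:
    """Pick a provider with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

# Hypothetical weights favouring a primary provider 4:1.
weights = {"primary": 0.8, "fallback": 0.2}
rng = random.Random(42)  # seeded so the run is reproducible
picks = [pick_provider(weights, rng) for _ in range(10_000)]
print(picks.count("primary") > picks.count("fallback"))  # → True
```

The real gateway offers 15 routing strategies beyond this (see the pillars below); weighted random is just the simplest to show.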
Three ways to get started, depending on how much you want to install:
**Cloud (fastest)** · No install. Free tier. SOC 2 Type II · HIPAA · data stays in your region.

```shell
# Sign up free:
# app.futureagi.com
pip install ai-evaluation
```

**Self-host (Docker)** · One command, full stack.

```shell
git clone https://github.com/future-agi/future-agi.git
cd future-agi
cp futureagi/.env.example futureagi/.env
docker compose up -d
```

Then open http://localhost:3031.

**Self-host (Kubernetes)** · Production-grade, HA. Helm chart v1 in progress; until then, use the plain kubectl manifests in the repo.

```shell
helm repo add futureagi <chart-repo-url>
helm install fagi futureagi/future-agi
```
**Python**

```python
from fi_instrumentation import register
from traceai_openai import OpenAIInstrumentor
from openai import OpenAI

register(project_name="my-agent")
OpenAIInstrumentor().instrument()

# Your existing OpenAI code is now traced.
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": query}],
)
```
**TypeScript**

```typescript
import { register } from "@traceai/fi-core";
import { OpenAIInstrumentation } from "@traceai/openai";
import OpenAI from "openai";

register({ projectName: "my-agent" });
new OpenAIInstrumentation().instrument();

// Your existing OpenAI code is now traced.
const openai = new OpenAI();
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: query }],
});
```
Full docs → · Cookbooks → · API reference →
Six pillars. Each one replaces a tool you probably have.
| Pillar | What it does |
|---|---|
| Simulate | Thousands of multi-turn conversations against realistic personas, adversarial inputs, and edge cases. Text and voice (LiveKit, VAPI, Retell, Pipecat). |
| Evaluate | 50+ evaluation metrics, including LLM-as-judge. |
| Protect | 18 built-in scanners (PII, jailbreak, injection, …) + 15 vendor adapters (Lakera, Presidio, Llama Guard, …). Inline in the gateway or as a standalone SDK. |
| Observe | OpenTelemetry-native tracing across 50+ frameworks (LangChain, LlamaIndex, CrewAI, DSPy, …). Span graphs, latency, token cost, live dashboards. Zero-config. |
| Gateway | OpenAI-compatible gateway. 100+ providers, 15 routing strategies, semantic caching, virtual keys, MCP, A2A. ~29k req/s, P99 ≤ 21 ms with guardrails on. |
| Optimize | Six prompt-optimization algorithms (GEPA, PromptWizard, ProTeGi, Bayesian, Meta-Prompt, Random). Production traces feed back as training data. |
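The platform ships its guardrail scanners as part of the Protect layer; purely to illustrate the idea behind a PII scanner (this is a toy sketch, not the actual scanner code, and the patterns are far narrower than the real ones), a minimal regex-based check might look like:

```python
import re

# Hypothetical patterns for illustration only; real scanners cover far more.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_pii(text: str) -> list[str]:
    """Return the names of the PII categories detected in `text`."""
    return [name for name, pat in PII_PATTERNS.items() if pat.search(text)]

print(scan_pii("Contact jane@example.com, SSN 123-45-6789"))
# → ['email', 'us_ssn']
```

In the real system a hit like this would either block the request inline at the gateway or be flagged via the standalone SDK.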
| Target | Status | Notes |
|---|---|---|
| Docker Compose | ✅ | docker compose up -d from a fresh clone |
| Kubernetes | ✅ | Plain manifests today; Helm chart v1 in progress |
| AWS / GCP / Azure | ✅ | Runs on any container runtime — ECS · Cloud Run · AKS · EKS · GKE |
| AWS Marketplace | ⏳ | Coming soon |
| Air-gapped / on-prem | ✅ | No phone-home — contact sales |
Every arrow is an open, documented interface: OpenTelemetry OTLP for traces, OpenAI-compatible HTTP for the gateway, Postgres / ClickHouse SQL for storage. Drop in your own stack at any layer.
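Because the gateway speaks OpenAI-compatible HTTP, any plain HTTP client can talk to it without an SDK. The endpoint path below follows the OpenAI convention; the host, port, and virtual key are placeholders you would swap for your own deployment:

```python
import json
import urllib.request

# Hypothetical local gateway endpoint; adjust host/port for your deployment.
GATEWAY_URL = "http://localhost:3031/v1/chat/completions"

payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "ping"}],
}
req = urllib.request.Request(
    GATEWAY_URL,
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer <virtual-key>",
    },
)
# urllib.request.urlopen(req) would send this to a running gateway.
```

The same request shape works against any of the 100+ upstream providers, since the gateway normalizes them behind one API.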
Runtime: Python 3.11+ (Django 4.2 + Channels) · Go 1.23+ (gateway) · React 18 + Vite · Node 20+. Data: PostgreSQL (metadata) · ClickHouse (spans + time-series) · Redis (state) · RabbitMQ + Temporal (jobs).
Component breakdown (per-package)
| Layer | Component | Code |
|---|---|---|
| Edge | traceAI — OpenTelemetry instrumentation | future-agi/traceAI |
| Edge | Agent Command Center — OpenAI-compatible proxy | futureagi/agentcc-gateway/ |
| Platform | tracer — OTLP ingest, span graph | futureagi/tracer/ |
| Platform | agentic_eval — 50+ metrics, LLM-as-judge | futureagi/agentic_eval/ |
| Platform | simulate — persona-driven scenario generation | futureagi/simulate/ |
| Platform | model_hub — LLM routing, embeddings, datasets | futureagi/model_hub/ |
| Platform | accounts · usage · integrations — auth, orgs, metering, connectors | futureagi/accounts/ |
| Data | PostgreSQL · ClickHouse · Redis · RabbitMQ + Temporal | — |
Future AGI is an open-source ecosystem — each SDK is independently usable, independently packaged, Apache/MIT-licensed.
| Repo | Install | Languages | Purpose |
|---|---|---|---|
| traceAI | `pip install fi-instrumentation-otel` · `npm i @traceai/fi-core` | Python · TS · Java · C# | Zero-config OTel tracing for 50+ AI frameworks |
| ai-evaluation | `pip install ai-evaluation` · `npm i @future-agi/ai-evaluation` | Python · TS | 50+ evaluation metrics + guardrail scanners |
| futureagi | `pip install futureagi` | Python | Platform SDK — datasets, prompts, KB, experiments |
| agent-opt | `pip install agent-opt` | Python | 6 prompt-optimization algorithms (GEPA, PromptWizard, …) |
| simulate-sdk | `pip install agent-simulate` | Python | Voice-agent simulation via LiveKit + Silero VAD |
| agentcc | `pip install agentcc` · `npm i @agentcc/client` | Python · TS (+ LangChain · LlamaIndex · React · Vercel) | Gateway client SDKs |
| Category | Integrations |
|---|---|
| LLM providers (100+) | OpenAI · Anthropic · Google Gemini · Vertex AI · AWS Bedrock · Azure OpenAI · Mistral · Groq · Cohere · Together · Perplexity · OpenRouter · Fireworks · xAI · Replicate · HuggingFace · plus self-hosted: Ollama · vLLM · LM Studio · TGI · Llamafile |
| Agent frameworks | LangChain · LangGraph · LlamaIndex · CrewAI · AutoGen · Phidata · PydanticAI · Claude SDK · LiteLLM · Haystack · DSPy · Instructor · Smol-agents |
| Voice platforms | VAPI · Retell · LiveKit · Pipecat |
| Vector DBs | Pinecone · Weaviate · Chroma · Milvus · Qdrant · pgvector |
| Tools & infra | Vercel AI SDK · n8n · MongoDB · MCP · A2A · Guardrails AI · Langfuse · HuggingFace Smol-agents |
| | Future AGI | Langfuse | Phoenix | Braintrust | Helicone |
|---|---|---|---|---|---|
| Open source | ✅ Apache 2.0 | ✅ MIT | ✅ Elastic v2 | ❌ | ✅ Apache 2.0 |
| Self-host | ✅ | ✅ | ✅ | ❌ | ✅ |
| LLM tracing (OpenTelemetry) | ✅ | ✅ | ✅ | ✅ | via OpenLLMetry |
| Evaluation suites | ✅ 50+ metrics | ✅ | ✅ | ✅ | Limited |
| Agent simulation | ✅ | ❌ | ❌ | ❌ | ❌ |
| Voice agent eval | ✅ | ❌ | Cookbook | ❌ | ❌ |
| LLM gateway built in | ✅ 100+ providers | ❌ | ❌ | ✅ | ✅ |
| Guardrails built in | ✅ 18 + 15 adapters | ❌ | ❌ | ❌ | ❌ |
| Prompt optimization | ✅ 6 algorithms | ❌ | ❌ | ❌ | ❌ |
| Prompt management | ✅ | ✅ | ✅ | ✅ | ✅ |
| Datasets & experiments | ✅ | ✅ | ✅ | ✅ | ✅ |
| No-code eval builder | ✅ | ❌ | ❌ | ❌ | ❌ |
Based on publicly documented features as of April 2026. Corrections welcome — open a PR.
- Customer Support: Ship support AI that customers actually trust
- Voice Agents: Test, evaluate, and improve voice AI end-to-end
- Internal Tools: AI copilots your whole org can rely on
- RAG & Search: Every answer grounded, every citation verified
- Autonomous Agents: Multi-step agents you can actually trust in production
- Computer-Use Agents (CUA): Agents that click with confidence
- Coding Agents: AI that writes code you can actually ship
Vote on the public roadmap → · GitHub Discussions · Releases · Changelog
| Recently shipped | In progress | Coming up | Exploring |
|---|---|---|---|
We love contributions — bug fixes, new evaluators, framework integrations, docs, examples, anything.
- Browse issues labeled `good first issue`
- Read the Contributing Guide
- Say hi on Discord or Discussions
- Sign the CLA on your first PR (automatic bot)
| Channel | Purpose |
|---|---|
| 💬 Discord | Real-time help from the team and community |
| 🗨️ GitHub Discussions | Ideas, questions, roadmap input |
| 🐦 Twitter / X | Release announcements |
| 📝 Blog | Engineering & research posts |
| 📺 YouTube | Walkthroughs & demos |
| 📊 Status | Cloud uptime + incident history |
| 📧 support@futureagi.com | Cloud account / billing |
| 🔐 security@futureagi.com | Private vulnerability disclosure (24h ack — see SECURITY.md) |
Self-hosted Future AGI phones home anonymous usage counts only (version, instance ID, feature flags used) so we can size our release testing. No trace data, no prompts, no API keys, ever. Opt out via FUTURE_AGI_TELEMETRY_DISABLED=1.
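For instance, a deployment script or wrapper can check the documented flag before enabling anything; this is an illustrative sketch, not the platform's actual code:

```python
import os

def telemetry_enabled() -> bool:
    """Sketch: honour the documented opt-out flag."""
    return os.environ.get("FUTURE_AGI_TELEMETRY_DISABLED") != "1"

os.environ["FUTURE_AGI_TELEMETRY_DISABLED"] = "1"
print(telemetry_enabled())  # → False
```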
Future AGI is licensed under the Apache License 2.0. See LICENSE and NOTICE.
You own your evaluation logic and your data. Inspect every evaluator, every prompt, every trace — no black-box scoring, no vendor lock-in.
Built with ❤️ by the Future AGI team and contributors worldwide.
If Future AGI helps you ship better AI, a ⭐ helps more teams find us.
🌐 futureagi.com · 📖 docs.futureagi.com · ☁️ app.futureagi.com · 📊 status.futureagi.com
