Full-model OpenAI-compatible Grok API gateway — single binary, ready to run
GrokForge wraps all Grok web capabilities (chat, reasoning, image generation/editing, video generation) into a standard OpenAI API format (a.k.a. 2api gateway). You can seamlessly connect any OpenAI-compatible client (ChatGPT Next Web, LobeChat, Open WebUI, Cursor, bots, etc.) to Grok models.
Go rewrite + Next.js admin panel, compiled into a single binary, SQLite out of the box, zero external dependencies.
- Single binary deployment — Frontend embedded via `go:embed`, just copy and run
- Modern admin panel — Next.js + shadcn/ui, one-stop Dashboard / Token / API Key / Settings / Usage / Cache management
- Multi-pool token routing — ssoBasic / ssoSuper / ssoHeavy routed by `pool_floor`, with 3 selection algorithms + priority tiers
- Static model catalog — Models defined in a TOML file embedded in the binary, overridable via external file
- Mode-based dynamic quotas — Quota windows driven by the model catalog; `image_ws` uses transient cooldown only
- SSE heartbeat — 2KB initial padding + 15s ping keeps connections alive through proxies and CDNs
- DeepSearch — Pass the `deepsearch` parameter to enable Grok's deep search capability
- Hot-reload config — Admin panel changes take effect immediately, no restart needed
- Structured logging — slog + file rotation, JSON / Text formats
- Bilingual UI — Admin panel supports English and Chinese
- OpenAI Chat Completions API — Streaming / non-streaming, fully compatible
- Chain-of-thought reasoning — `<think>` tag output, `reasoning_effort` control
- Tool Calling — Hermes-style tool calls with `parallel_tool_calls` support
- Multimodal input — Image URL / base64, auto download, decode and resize
- Image generation / editing — WebSocket channel, multiple images, various sizes
- Video generation — Multiple aspect ratios and resolutions
- Model listing — `GET /v1/models` returns enabled models from the static catalog
- SSE heartbeat — 2KB padding + 15s ping prevents proxy/CDN timeout disconnections
- DeepSearch — `deepsearch` parameter passthrough for Grok's deep search capability
- Multi-pool routing — ssoBasic / ssoSuper / ssoHeavy selected by `pool_floor`
- 3 selection algorithms — high_quota_first / random / round_robin
- Priority tiers — Higher-priority tokens are selected first
- Mode-based quotas — chat / image_lite / image_edit / video share quota windows by catalog mode
- `image_ws` exception — WebSocket image models stay outside quota sync and use transient token+model cooldown only
- Auto refresh — Periodic session refresh, auto rebuild on failure
- Token state model — persisted active / disabled / expired, derived `exhausted` display state
- Static TOML catalog — Models defined in `internal/modelconfig/models.toml`, embedded in the binary
- External override — Set `models_file` in `config.toml` to replace the default catalog entirely
- Read-only admin view — Settings page displays the full model catalog (no editing)
- Registry-driven routing — O(1) request-name resolution via in-memory snapshot
- Per-mode pool_floor override — Expert → basic, Heavy → heavy, etc.
- API Key management — CRUD + model whitelist + daily limit + rate limit
- Exponential backoff retry — Jitter + budget control + session auto-reset
- Cloudflare defense — FlareSolverr integration, instant 403 refresh + debounce
- Secure authentication — Constant-time comparison; auto-generated bootstrap password when `app_key` is unset (process-local, logged on startup)
- Dashboard — Stats cards + quota progress + usage trend charts
- Token management — Batch import / enable / disable / delete, status filtering, health indicators
- API Key management — Create / disable / expire / regenerate keys
- System settings — General + Models dual tabs, hot-reload on save
- Usage stats — Aggregate overview + per-request logs (with TTFT)
- Cache management — Image / video stats, preview / download / batch cleanup
- Playground — Chat / Image / Video generation online, multi-turn conversation with Markdown rendering
Chat Models

| Model | Mode | Pool Floor | Description |
|---|---|---|---|
| `grok-4.20` | `default` | basic | Default Grok 4.20 mode |
| `grok-4.20-fast` | `fast` | basic | Faster Grok 4.20 variant |
| `grok-4.20-think` | `think` (force_thinking) | basic | Deep reasoning mode |
| `grok-4.20-expert` | `expert` | basic | Expert mode |
| `grok-4.20-heavy` | `heavy` | heavy | Heavy pool only |
Media Models

| Model | Type | Pool Floor | Description |
|---|---|---|---|
| `grok-imagine-image` | `image_ws` (WebSocket) | super | Full image generation |
| `grok-imagine-image-pro` | `image_ws` + enable_pro | super | Pro image generation |
| `grok-imagine-image-lite` | `image` (HTTP) | basic | Lightweight image generation, Basic pool |
| `grok-imagine-image-edit` | `image_edit` | super | Image editing (supports reference images) |
| `grok-imagine-video` | `video` | super | Video generation |
All models are defined in `internal/modelconfig/models.toml` (embedded in the binary). To customize, set `models_file` in `config.toml` to point to your own TOML catalog file — it replaces the default entirely.
```bash
# 1. Download & start
./grokforge -config config.toml
# If app_key is not set, a temporary admin password is printed in the startup log.

# 2. Open admin panel and add your Grok Token
#    http://localhost:8080

# 3. Test
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-4.20",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```

Prerequisites: Go 1.25+, Node.js 18+
```bash
git clone https://github.com/crmmc/grokforge.git
cd grokforge

# Copy config template
cp config.defaults.toml config.toml

# One-command build (frontend + backend)
make build

# Run
./bin/grokforge
```

The build output is a single binary with the frontend embedded via `go:embed`.
GrokForge uses TOML configuration. See `config.defaults.toml` for the full template.
```toml
[app]
app_key = "your-admin-password"  # Admin password (omit to auto-generate a temporary one on startup)
port = 8080                      # Server port

[proxy]
base_proxy_url = ""              # Optional: proxy URL
```

After startup, add Grok Tokens in the admin panel. All other settings can be modified online.
Admin panel (DB) > config.toml > Built-in defaults
Admin panel changes take effect immediately without restart.
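The precedence above amounts to a first-non-empty lookup per key. A toy sketch of that resolution order (string-valued only; the real merge handles typed values):

```go
package main

import "fmt"

// resolve returns the first non-empty value, following the documented
// precedence: admin panel (DB) > config.toml > built-in default.
// Illustrative only, not GrokForge's actual config loader.
func resolve(dbValue, fileValue, defaultValue string) string {
	if dbValue != "" {
		return dbValue // admin panel wins, hot-reloaded without restart
	}
	if fileValue != "" {
		return fileValue // config.toml overrides the compiled-in default
	}
	return defaultValue
}

func main() {
	fmt.Println(resolve("", "8081", "8080")) // "8081": file overrides default
	fmt.Println(resolve("9000", "8081", "8080")) // "9000": DB overrides both
}
```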
Application [app]

| Key | Default | Description |
|---|---|---|
| `app_key` | `""` | Admin password (if empty, a temporary bootstrap password is auto-generated and logged on startup) |
| `port` | `8080` | Server port |
| `host` | `"0.0.0.0"` | Listen address |
| `db_driver` | `"sqlite"` | Database driver: sqlite / postgres |
| `db_path` | `"data/grokforge.db"` | SQLite file path |
| `db_dsn` | `""` | PostgreSQL connection string |
| `log_level` | `"info"` | Log level: debug / info / warn / error |
| `log_json` | `false` | JSON format logs |
| `request_timeout` | `60` | Default timeout for non-LLM routes (seconds) |
| `temporary` | `true` | Temporary conversation mode |
| `thinking` | `true` | Enable chain-of-thought by default |
| `stream` | `true` | Streaming response by default |
| `filter_tags` | `[...]` | Special tags to filter |
Proxy [proxy]

| Key | Default | Description |
|---|---|---|
| `base_proxy_url` | `""` | Upstream proxy (HTTP/HTTPS/SOCKS5) |
| `asset_proxy_url` | `""` | Asset proxy (image downloads, etc.) |
| `cf_clearance` | `""` | Cloudflare clearance cookie |
| `browser` | `"chrome_146"` | TLS fingerprint browser profile |
| `enabled` | `false` | Enable CF auto-refresh |
| `flaresolverr_url` | `""` | FlareSolverr service URL |
| `refresh_interval` | `3600` | CF refresh interval (seconds) |
Retry Policy [retry]

| Key | Default | Description |
|---|---|---|
| `max_tokens` | `5` | Maximum tokens to try |
| `per_token_retries` | `2` | Maximum retries per token before switching |
| `reset_session_status_codes` | `[403]` | Status codes that trigger session reset |
| `retry_backoff_base` | `0.5` | Backoff base delay (seconds) |
| `retry_backoff_factor` | `2.0` | Backoff multiplier |
| `retry_backoff_max` | `20.0` | Maximum single delay (seconds) |
| `retry_budget` | `60.0` | Total retry budget (seconds) |
Token Management [token]

| Key | Default | Description |
|---|---|---|
| `fail_threshold` | `5` | Consecutive failure threshold (marks token disabled) |
| `usage_flush_interval_sec` | `30` | Interval for flushing usage stats to DB (seconds) |
| `selection_algorithm` | `"high_quota_first"` | Algorithm: high_quota_first / random / round_robin |
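The three selection algorithms, restricted to the highest priority tier first, could be sketched as below. The `Token` shape and the "higher number means higher tier" ordering are illustrative assumptions, not GrokForge's internals:

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
	"sync/atomic"
)

// Token is a simplified stand-in for a pool entry (hypothetical fields).
type Token struct {
	ID       string
	Priority int // higher tiers are considered first
	Quota    int // remaining quota in the current window
}

var rrCounter atomic.Uint64 // round-robin cursor shared across requests

// pick applies one of the three algorithms to the highest priority tier,
// mirroring the selection_algorithm setting.
func pick(tokens []Token, algorithm string) Token {
	// Keep only the highest priority tier.
	top := tokens[0].Priority
	for _, t := range tokens {
		if t.Priority > top {
			top = t.Priority
		}
	}
	tier := tokens[:0:0] // fresh slice so sorting never mutates the pool
	for _, t := range tokens {
		if t.Priority == top {
			tier = append(tier, t)
		}
	}
	switch algorithm {
	case "random":
		return tier[rand.Intn(len(tier))]
	case "round_robin":
		return tier[rrCounter.Add(1)%uint64(len(tier))]
	default: // high_quota_first
		sort.Slice(tier, func(i, j int) bool { return tier[i].Quota > tier[j].Quota })
		return tier[0]
	}
}

func main() {
	pool := []Token{{"a", 1, 10}, {"b", 2, 3}, {"c", 2, 7}}
	fmt.Println(pick(pool, "high_quota_first").ID) // "c": top tier, most quota
}
```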
```
┌─────────────────────────────────────────────────┐
│                     Client                      │
│   (ChatGPT Next Web / LobeChat / curl / ...)    │
└─────────────────────┬───────────────────────────┘
                      │ OpenAI API
                      ▼
┌─────────────────────────────────────────────────┐
│                   GrokForge                     │
│                                                 │
│  ┌───────────┐  ┌───────────┐  ┌────────────┐   │
│  │  httpapi  │  │   Admin   │  │   Static   │   │
│  │ (OpenAI)  │  │    API    │  │  Frontend  │   │
│  └─────┬─────┘  └─────┬─────┘  └────────────┘   │
│        │              │                         │
│        ▼              ▼                         │
│  ┌─────────────────────────────────────────┐    │
│  │          flow (orchestration)           │    │
│  │  chat / image / video / model registry  │    │
│  └──────┬──────────┬──────────┬────────────┘    │
│         │          │          │                 │
│         ▼          ▼          ▼                 │
│  ┌──────────┐ ┌─────────┐ ┌──────────┐          │
│  │  token   │ │   xai   │ │  store   │          │
│  │  (pool)  │ │(upstream)│ │(persist) │         │
│  └──────────┘ └─────────┘ └──────────┘          │
│                    │                            │
└────────────────────┼────────────────────────────┘
                     │
                     ▼
              ┌─────────────┐
              │  grok.com   │
              │  (SSE / WS) │
              └─────────────┘
```
Three-tier architecture: httpapi (protocol translation) → flow (business orchestration) → xai / token / store (infrastructure)
Unidirectional dependencies, no circular references.
```mermaid
sequenceDiagram
    participant C as Client
    participant H as httpapi
    participant F as flow
    participant T as token
    participant X as xai
    participant G as grok.com
    C->>H: POST /v1/chat/completions
    H->>H: Auth check + param parsing
    H->>F: Route to chat/image/video flow
    F->>T: Request available Token (pool_floor + algorithm selection)
    T-->>F: Return Token
    F->>X: Build upstream request
    X->>G: SSE / WebSocket request
    G-->>X: Streaming response
    X-->>F: Parse + transform
    F-->>H: OpenAI format output
    H-->>C: SSE stream / JSON response
    Note over F,T: Auto retry on failure<br/>Switch Token + Session reset
```
Admin panel URL: `http://your-host:8080`
The admin panel includes:
- Dashboard — System status at a glance: Token count, API Key count, call volume, quota progress, trend charts
- Token Management — Batch import / enable / disable / delete, status filtering, quota editing, priority settings, manual refresh
- API Key — Create and manage API keys, model whitelist, daily limit, rate limit
- Model Catalog — Read-only view of the static model catalog with pool_floor and upstream details
- Settings — General config online editing (hot-reload) + read-only model catalog view
- Usage Stats — Aggregate overview + per-request logs (including TTFT, token consumption metrics)
- Cache — Image / video cache browsing, preview, download, cleanup
- Playground — Chat / Image / Video generation online, multi-turn conversation with Markdown rendering
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-4.20",
    "messages": [{"role": "user", "content": "Explain quantum computing in one sentence"}],
    "stream": true
  }'
```

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-4.20-expert",
    "messages": [{"role": "user", "content": "Prove that √2 is irrational"}],
    "reasoning_effort": "high"
  }'
```

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-4.20",
    "messages": [{"role": "user", "content": "What is the weather in Beijing today?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string", "description": "City name"}
          },
          "required": ["city"]
        }
      }
    }]
  }'
```

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-4.20",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image"},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
      ]
    }]
  }'
```

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-image",
    "messages": [{"role": "user", "content": "A shiba inu in a spacesuit walking on the moon"}]
  }'
```

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-video",
    "messages": [{"role": "user", "content": "A cat dancing on a piano"}]
  }'
```

GrokForge is compatible with all OpenAI API clients — just point the API URL to GrokForge:
| Client | Configuration |
|---|---|
| ChatGPT Next Web | Settings → API URL = http://your-host:8080 |
| LobeChat | Settings → OpenAI → API URL = http://your-host:8080/v1 |
| Open WebUI | Admin → Connections → OpenAI API = http://your-host:8080/v1 |
| Cursor | Settings → Models → OpenAI Base URL = http://your-host:8080/v1 |
| Any OpenAI SDK | base_url="http://your-host:8080/v1" |
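For clients without a dedicated base-URL setting, any HTTP library works, since the surface is plain OpenAI-style JSON. A bare `net/http` sketch (host and key are placeholders for your own deployment):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// buildChatRequest assembles a standard Chat Completions request
// against a GrokForge deployment.
func buildChatRequest(baseURL, apiKey, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(map[string]any{
		"model":    model,
		"messages": []map[string]string{{"role": "user", "content": prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest("POST", baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := buildChatRequest("http://your-host:8080", "your-api-key", "grok-4.20", "Hello!")
	if err != nil {
		panic(err)
	}
	// http.DefaultClient.Do(req) would send it; here we just show the target.
	fmt.Println(req.Method, req.URL)
}
```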
How to get a Grok Token?
- Log in to grok.com
- Open browser DevTools (F12)
- Find the `sso` or `sso-rw` cookie value in Application → Cookies
- Import it in the admin panel
What's the difference between Basic, Super, and Heavy pools?
- Basic pool (`ssoBasic`): Lowest capability floor. Can serve models/modes whose `pool_floor` is `basic`.
- Super pool (`ssoSuper`): Higher capability floor. Can serve `super` and `basic` models/modes.
- Heavy pool (`ssoHeavy`): Highest capability floor. Reserved for `heavy` models/modes and can also serve lower-floor requests.
- Model routing is defined in the static model catalog (`internal/modelconfig/models.toml`), not by editing per-pool model lists.
- `pool_floor` is a hard requirement. If no eligible token exists in pools with level >= the model's floor, the request fails with no silent downgrade.
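The hard-floor rule in the last bullet reduces to an ordered comparison across pool levels. A sketch (the numeric levels are an illustration, not GrokForge's internals):

```go
package main

import "fmt"

// Pool levels ordered by capability floor; the names come from the
// token pool docs, the numbers are illustrative.
var level = map[string]int{"basic": 0, "super": 1, "heavy": 2}

// eligible reports whether a token from pool tokenPool may serve a model
// whose catalog entry declares poolFloor. There is no silent downgrade:
// a basic token can never serve a heavy model.
func eligible(tokenPool, poolFloor string) bool {
	return level[tokenPool] >= level[poolFloor]
}

func main() {
	fmt.Println(eligible("heavy", "basic")) // true: higher pools serve lower floors
	fmt.Println(eligible("basic", "heavy")) // false: hard requirement, request fails
}
```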
What to do about 403 errors?
Usually triggered by Cloudflare protection. Solutions:
- Configure proxy: Set `base_proxy_url` with a clean IP
- FlareSolverr: Configure `flaresolverr_url`; GrokForge auto-refreshes CF cookies
- Manual update: Update the `cf_clearance` cookie in the admin panel
How long until token quotas recover?
Quota-tracked models (chat, image_lite, image_edit, video) recover per mode window.
- The scheduler starts a token+mode window from the first successful upstream response after refresh.
- When that window expires, GrokForge refreshes the mode from `/rest/rate-limits` and learns both remaining quota and total quota from upstream.
- `image_ws` is not quota-tracked; it only uses a transient token+model cooldown in memory.
How to share with multiple users?
- Create API Keys in the admin panel, assign one per user
- Set Model Whitelist to restrict available models
- Set Daily Limit to control daily usage per user
- Set Rate Limit to prevent burst requests
Which databases are supported?
- SQLite (default): Zero config, data stored in `data/grokforge.db`
- PostgreSQL: Recommended for production; set `db_driver = "postgres"` and `db_dsn`
Both databases have identical functionality with current-schema initialization on startup.
If the schema changes in local development, delete data/grokforge.db and rebuild instead of expecting in-place migration.
| Layer | Technology |
|---|---|
| Backend | Go 1.25 · chi · GORM · slog |
| Frontend | Next.js · shadcn/ui · Tailwind CSS · Recharts |
| Storage | SQLite (default) · PostgreSQL (optional) |
| Build | Make · go:embed (frontend embedded in binary) |
```
grokforge/
├── cmd/grokforge/            # Entry point
├── internal/
│   ├── httpapi/              # HTTP layer (OpenAI compat + Admin API)
│   │   └── openai/           # OpenAI protocol implementation
│   ├── flow/                 # Business orchestration (chat / image / video)
│   ├── token/                # Token pool management (routing / selection / quota / refresh)
│   ├── xai/                  # Upstream communication (SSE / WebSocket)
│   ├── store/                # Persistence (GORM + current schema/constraints)
│   ├── modelconfig/          # Static model catalog (TOML embedded + loader)
│   ├── config/               # Config management (TOML + DB override + hot-reload)
│   ├── cfrefresh/            # Cloudflare defense (FlareSolverr integration)
│   ├── cache/                # Cache management (image / video local cache)
│   └── logging/              # Log management (slog + file rotation)
├── web/                      # Next.js frontend
│   └── src/app/              # Page routes
├── config.defaults.toml      # Config template
└── Makefile                  # Build script
```
- grok2api — Original Python project that proved the concept
- chi — Lightweight HTTP router
- GORM — Go ORM
- shadcn/ui — UI component library
See CHANGELOG.md for release history and version details.
Disclaimer: This project is for educational and research purposes only. Users must comply with the terms of service of relevant platforms. Any consequences arising from the use of this project are the sole responsibility of the user.




