Text your repo. Come back to a PR.
A self-hosted async coding agent that runs on your own Orb Cloud account. Comment @orb <task> on any GitHub issue, and a persistent agent — warm checkout, warm deps, warm LSP — wakes in ~1 second, works for however long it needs, runs a 10-gate verifier on its own output, and replies with a PR.
Zero servers on our side. You own the keys, the code, and the bill.
Every dev on Twitter has a graveyard of half-built side projects. You wanted to ship that idea on the train home, but your laptop is closed, and the commit window is gone. Devin charges $500/mo and still needs babysitting. Cursor Background Agents burn money on idle compute. GitHub Actions can't hold state between jobs.
Orb Cloud's moat is precisely this: an agent runtime that checkpoints to NVMe while idle and wakes on webhook in ~1s. orb-async-dev is the thinnest possible layer on top:
- one HTTP listener (
/github/webhook,/health,/logs) - one orchestrator that spawns sub-agents per task
- one 10-gate verifier that blocks the PR unless the change actually works
Everything else is just plumbing.
GitHub issue / PR comment ──▶ Orb wakes computer (~1s)
"@orb fix the login bug" │
├─ verify HMAC
├─ parse @orb mention
├─ create branch + git worktree
├─ sub-agent runs (Claude tool-use loop)
├─ 10-gate verifier (build, tests,
│ mutation check, self-review, red team, ...)
├─ commit + push + open PR
└─ reply in the thread
Idle ──▶ checkpoint-to-NVMe ──▶ ~$0/hr
See learnings.txt for the detailed build log.
npx orb-async-dev initFour questions (GH repo, GH token, LLM key, cost cap), writes .env, registers an Orb computer, returns the live URL.
git clone https://github.com/AWLSEN/orb-async-dev
cd orb-async-dev
cp .env.example .env && $EDITOR .env
bun install
bun run deployEither path ends the same way:
deployprintshttps://{short_id}.orbcloud.dev- Open
https://github.com/<owner>/<repo>/settings/hooks - Add a webhook:
- Payload URL:
https://{short_id}.orbcloud.dev/github/webhook - Secret: the value of
WEBHOOK_SECRETfrom your.env - Events: Issues, Issue comments, Pull request review comments
- Payload URL:
- Comment
@orb try a quick teston any issue.
| Var | Purpose |
|---|---|
ORB_API_KEY |
Orb Cloud bearer (auto-registered via ORB_REGISTER_EMAIL if blank) |
GITHUB_TOKEN |
PAT with repo + issues + pull_requests scope |
GITHUB_REPO |
owner/name — the single target repo this agent watches |
WEBHOOK_SECRET |
Shared secret for HMAC-SHA256 webhook signing (auto-generated if blank) |
ANTHROPIC_API_KEY or ANTHROPIC_AUTH_TOKEN + ANTHROPIC_BASE_URL |
LLM. Proxy auth-token path supports Z.AI, OpenRouter, etc. |
DAILY_COST_CAP_USD (optional, default 5) |
Watchdog trip threshold |
ANTHROPIC_MODEL (optional) |
Defaults to claude-opus-4-7 |
Every PR clears this chain before it opens. Hard failures block the PR outright; soft failures become warnings in the PR body.
- scope (hard) — reject if the diff is > 20 files or > 500 LOC
- secret_scan (hard) — pattern + entropy detection of GH/AWS/Anthropic/OpenAI/Stripe/Slack/Google/JWT/PEM tokens in added lines
- new_tests (hard) — if the src diff is > 20 LOC, at least one test file must have added lines
- build (hard) — stack-aware build command must exit 0
- tests (hard) — existing + new tests must pass
- lint (soft) — warnings render as
⚠in the PR body - typecheck (hard) —
tsc --noEmit/mypy/ etc. must pass - mutation (hard) — revert src changes, re-run tests; they MUST go red (proves the new tests actually exercise the fix)
- self_review (hard) — second Claude call reads the diff cold, says whether it matches the task and what the top risk is
- red_team (hard) — third Claude call tries to find 3 substantive flaws; any real flaw blocks the PR
The orchestrator self-monitors via a jittered scheduler:
- canary (hourly) — clone + fetch + detect + test on the live repo. Failure notifies the operator; user tasks keep flowing.
- cost-watchdog (10 min) — polls
/v1/usage, trips sticky atDAILY_COST_CAP_USD, refuses new tasks until acknowledged. - reaper (5 min) — cancels any in-flight task older than 2 hours.
GET /logs returns a read-only HTML dashboard of the last 500 events, auto-refreshing every 5 seconds.
- Not a SaaS. No control plane, no ops page. If it breaks on your computer, the logs are on your computer.
- Not a replacement for code review. The verifier catches mechanical regressions; a human still reads the PR.
- No UI that isn't GitHub or
/logs. If you want a fancy dashboard, build one — the registry + log store both expose simple TS APIs.
MIT.