Keep Claude Code on a multi-hour task without it pinging you at 3am. Tianluo is the methodology that survives context compaction, plans every fork up front, diagnoses failures structurally, and knows when to escalate versus retry — so a single instruction starts an 18-hour run that ends with one report, not a wakeup call.
Five concrete capabilities. Each one fixes a specific failure mode that breaks long-running agent runs:
| Capability | Failure it fixes |
|---|---|
| Persistent state survives context compaction — `state.json` on shared storage, atomically updated; cron prompts rebuild context from scratch on every wake-up | Agent forgets the `task_id` it just submitted, leaks an unowned external job |
| Plan-time fork enumeration — every future user decision is listed as Q1..Qn before the run starts, answered once, then the run goes to terminal state without further questions | Agent pauses at hour 4 asking permission for a routine yes/no, idles until you wake up |
| 5-layer structured diagnosis — never decide "task is stuck" from one signal; check scheduler / container / log / progress / output before classifying | Agent triggers `manual_hold` on slices that already finished, or retries phantom failures |
| Budget-bounded retry with escalation ladder — transient → patch → replan → human; never infinite retry, never retry a code-level bug | Agent burns hours retrying a real bug, or gives up on a recoverable transient blip |
| File-role strict separation — machine state (JSON), human plan (frozen Markdown), failure artifacts (append-only), cross-session memory (curated) — never merged | State gets polluted by natural-language narrative, can't be grep'd or hook-validated |
These are five inversions of common multi-hour-agent failure modes, distilled into a single Claude Code skill plugin.
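The budget-bounded retry ladder above can be sketched in a few lines. This is an illustrative Python sketch, not code shipped with the skill; the class name, budget numbers, and failure-class labels are assumptions for the example:

```python
# Illustrative sketch of the budget-bounded retry ladder:
# transient -> patch -> replan -> human. Budgets and names are
# hypothetical; the skill prescribes the discipline, not this code.
from dataclasses import dataclass, field

LADDER = ["transient", "patch", "replan", "human"]
BUDGETS = {"transient": 3, "patch": 2, "replan": 1}  # "human" has no budget

@dataclass
class RetryState:
    counts: dict = field(default_factory=dict)

    def next_action(self, failure_class: str) -> str:
        """Return the ladder level to act at. A code-level bug skips
        straight to 'human'; a transient blip retries until its budget
        is spent, then escalates one rung. Never an infinite loop."""
        if failure_class == "code_bug":
            return "human"          # never retry a real bug
        level = failure_class
        while level != "human":
            used = self.counts.get(level, 0)
            if used < BUDGETS[level]:
                self.counts[level] = used + 1
                return level
            level = LADDER[LADDER.index(level) + 1]
        return "human"              # budgets exhausted: escalate, don't loop
```

The point of the sketch is the shape, not the numbers: every level has a finite budget, exhaustion escalates exactly one rung, and a code-level bug bypasses the ladder entirely.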
田螺姑娘 (Tianluo Guniang) is a Chinese folk tale: a fisherman returns home each day to find his hut tidied and dinner ready. A snail-spirit had been quietly doing the work while he was at sea — never asking, never interrupting, only present when something genuinely required him.
That's the contract: quiet progress while you're away, single-report on return, a whisper only when something is genuinely out of scope.
This is a methodology skill — a SKILL.md plus reference docs and templates that teach Claude Code how to design and drive long-horizon, cron-driven tasks. It is not a framework, runtime, or executable. Nothing to install beyond the plugin itself; you bring your own scheduler, storage, and task domain.
- In scope: persistent state design, self-contained cron prompts, retry-budget ladders, multi-layer failure diagnosis, plan-time fork enumeration, file-role separation
- Out of scope: any specific cloud, scheduler, framework, or task domain. The skill speaks in generic terms ("scheduler", "shared storage", "instance × stream × role"); you map them to your stack
Where it fits: any task that runs for hours under a cron-like trigger, has multiple tracked entities, has pollable state, has classifiable failure modes, and has an auto-decidable terminal state. Concrete examples: scheduled batch jobs, multi-stage build → test → deploy → verify pipelines, parameter sweeps, multi-target health monitors, long CI suites, data-pipeline runs, training/eval workflows, simulation sweeps. The skill doesn't care which.
Where it doesn't fit: sub-10-minute tasks (just run them yourself), tasks with no external state to poll (the agent has to hold the connection), tasks where every step needs human judgement (use a different pattern).
The methodology itself is agent-agnostic — SKILL.md, the four reference docs, and the four templates are plain Markdown / JSON with no Claude-specific code. Different agents reach it via different install paths:
| Agent | Install path | Notes |
|---|---|---|
| Claude Code (CLI) | `/plugin marketplace add yeewangcn/tianluo` then `/plugin install tianluo@tianluo` | Native, one-step. Skill loads automatically when you ask about long-running orchestration. |
| Claude.ai (web) | Settings → Capabilities → Skills → upload `plugins/tianluo/skills/tianluo/` as a custom skill | Manual upload of the `skills/tianluo/` directory. |
| Claude API | Reference `SKILL.md` content via the `/v1/skills` endpoint | Paste the `SKILL.md` content into a skill resource. |
| Codex / GitHub Copilot / Cursor / Aider / Continue / Gemini CLI / OpenCode / Windsurf | `git clone https://github.com/yeewangcn/tianluo` and point your agent at `plugins/tianluo/skills/tianluo/SKILL.md` as a system-prompt reference or rules file | The methodology is just Markdown — every agent that can read a `.md` rules / system prompt file can use it. Format may need light adaptation per agent (e.g. Cursor's `.cursor/rules`, Aider's `.aider.conf`). |
Bottom line: install via Claude Code if you can; clone-and-reference works for everyone else. Either way you get the same SKILL.md, the same reference docs, the same templates.
/plugin marketplace add yeewangcn/tianluo
/plugin install tianluo@tianluo
The skill becomes available as `tianluo`, and Claude Code loads `SKILL.md` automatically when you ask about long-running orchestration.
Ask Claude something like:
"I have a 14-hour multi-stage job to run overnight across 6 variants. Use the tianluo skill to design the orchestration."
Claude will then:
- Read `SKILL.md` (the 8 invariants and 5 components)
- Apply `templates/plan-doc-skeleton.md` to enumerate every future fork once, up front
- Wait for you to answer all Q1...Qn decisions
- Build a self-contained cron prompt from `templates/cron-prompt-skeleton.md`
- Drive the run autonomously, escalating to `manual_hold` only on real ambiguity
The full SKILL.md lists 8 invariants and 5 components. If you read nothing else, the four rules below are the delta vs other long-running-agent patterns (Anthropic Effective Harnesses, Manus, Plan Mode, ReWOO, StateFlow, Cursor, etc.):
The very next tool call after submit must write the `task_id` to `state.json`; they form an atomic pair. If the prompt is compacted between submit and record, you've leaked an unowned external job. (This is the most common sin in multi-hour runs and the hardest to recover from.)
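The record half of the pair should also be atomic on disk. A minimal Python sketch, assuming a POSIX filesystem and a hypothetical scheduler API (the skill specifies the discipline, not this implementation):

```python
# Sketch of the submit -> record atomic pair. Helper names and the
# scheduler API are hypothetical; only the write-temp-then-rename
# discipline is the point.
import json
import os
import tempfile

STATE_PATH = "/shared/state.json"

def record_task_id(task_id: str, state_path: str = STATE_PATH) -> None:
    """Atomically write task_id into state.json: write a temp file in
    the same directory, fsync, then rename. A crash mid-write can never
    leave a half-written (or task_id-less) state file."""
    state = {}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    state["task_id"] = task_id
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(state_path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f, indent=2)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, state_path)  # atomic rename on POSIX

# Usage: the tool call that submits MUST be immediately followed by this.
# task_id = scheduler.submit(job)   # hypothetical scheduler call
# record_task_id(task_id)
```

The rename is what makes the pair survivable: readers of `state.json` only ever see the old complete file or the new complete file, never a torn write.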
Before the cron starts, walk the entire task chain and list every future user-decision-fork as Q1...Qn — branch decisions, resource thresholds, dependencies, fallbacks, output destinations. The user answers them all once. The cron then runs to terminal state with zero further interruptions, except for the explicit manual_hold triggers you declared.
This is what separates tianluo from "ask when stuck" patterns. An agent that pings the user at 3am mid-run for a routine confirmation is broken design, not poor execution.
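A fork list in a plan doc might look like this. The decisions below are hypothetical, invented for illustration; the real skeleton lives in `templates/plan-doc-skeleton.md`:

```markdown
## Decisions (answer all before the cron starts)

- Q1: If a stage-2 output fails validation, retry that stage once (a) or skip the variant (b)?
- Q2: Disk-usage threshold for pausing new submissions: 80% or 90%?
- Q3: Terminal state: all 6 variants scored, or best 3 scored?
- Q4: On an unclassifiable failure, manual_hold immediately or after one replan attempt?
```

Once answered, these are baked into the cron prompt; no question on this list may resurface mid-run.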
Four file types, four roles, never merged:
| File | Reader | Format | Mutability |
|---|---|---|---|
| `state.json` | machine (cron, hooks) | JSON | high-frequency overwrite |
| `<exp>_plan.md` | human | Markdown | frozen at plan-time |
| `failures/<entity>_<n>.md` | human (postmortem) | Markdown | append-only artifacts |
| `~/.claude/.../memory/feedback_*.md` | future agent sessions | Markdown | curated, cross-session |
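A minimal `state.json` in this spirit might look as follows. All field names here are illustrative assumptions, not the shipped schema; the real skeleton is `templates/state-schema.json`:

```json
{
  "task_id": "job-20240612-001",
  "phase": "main_job",
  "entities": {
    "variant_a": { "status": "running", "retries": { "transient": 1 } },
    "variant_b": { "status": "done", "output": "/shared/out/variant_b" }
  },
  "terminal": false,
  "updated_at": "2024-06-12T03:15:00Z"
}
```

Note what is absent: no narrative, no reasoning, no prose. Every field is machine-checkable, grep-able, and hook-validatable.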
The Manus 3-file pattern (task_plan + findings + log mono-file) looks tidy but breaks here: state mutates inside a markdown narrative, you can't grep it, you can't validate it with a pre-commit hook, and natural-language pollution causes self-reference drift across cron wake-ups.
Plan doc is the human contract at plan-time. Once cron is running, the agent never reads its own plan.md as runtime ground truth. Runtime ground truth is cron prompt + state.json only. This is the deliberate inverse of Claude Code's Plan Mode and Manus's PreToolUse re-read pattern: those approaches risk the agent narrating itself into drift.
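A self-contained cron prompt in this spirit might read as follows, heavily condensed and with invented paths; the real skeleton is `templates/cron-prompt-skeleton.md`:

```markdown
You are resuming run <run-id>. Trust ONLY this prompt and /shared/state.json.

1. Read /shared/state.json. Do NOT re-read the plan doc.
2. For each entity not in a terminal state, run the 5-layer check
   (scheduler / container / log / progress / output) before classifying.
3. Apply the retry ladder; budgets and counters live in state.json.
4. If the terminal condition is met, write the final report and stop.
5. Only on the declared manual_hold triggers: write failures/<entity>_<n>.md and hold.
```

Everything the wake-up needs is in the prompt or in `state.json`; nothing depends on the previous session's context surviving.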
| File | Topic |
|---|---|
| `SKILL.md` | Core methodology: 5 components + 8 invariants + when to use |
| `reference/5-layer-debugging.md` | Scheduler / container / log / progress / output — never diagnose from one layer |
| `reference/retry-budget-ladder.md` | StateFlow-style 4-level escalation: transient → patch → replan → human |
| `reference/plan-time-fork-types.md` | The 5 fork categories + anti-patterns |
| `reference/industry-comparison.md` | Side-by-side with 13 other frameworks |
| Template | Use |
|---|---|
| `templates/state-schema.json` | Skeleton for `state.json` |
| `templates/cron-prompt-skeleton.md` | Self-contained cron prompt (500-1500 words) |
| `templates/plan-doc-skeleton.md` | Plan doc with Q1...Qn and Success Gate |
| `templates/failure-analysis.md` | What to write at `manual_hold` |
docs/case-studies/overnight-18h-ml-run.md — an 18-hour autonomous overnight run across a 4-stage pipeline (data prep → main job → output selection → scoring), operator asleep the entire time. Single block of terminal report next morning. Zero mid-run interruptions. Three near-miss interruptions each prevented by a specific invariant. The example domain is ML, but the same five capabilities apply identically to any other long-running multi-stage workload.
| Approach | Plan once? | Cron self-contained? | Failure artifacts | File roles | HITL model |
|---|---|---|---|---|---|
| tianluo | yes (forks enumerated) | yes | independent `failures/*.md` | strict 4-way separation | conditional + plan-time enumerated |
| Anthropic Effective Harnesses | yes | partial (`init.sh`) | inlined in `progress.txt` | progress + git only | approval |
| Anthropic Plan Mode | yes | n/a (single-pass) | n/a | `plan.md` re-read | approval |
| Manus 3-file | mid-run editable | no (re-reads plan) | inlined in `progress.md` | mono-file | varies |
| ReWOO | fully frozen | n/a (worker-fills) | re-plan on fail | n/a | n/a |
| StateFlow | one FSM | event-driven | failure transitions | n/a | varies |
| Cursor Self-Driving | mid-run editable | judge-driven | judge artifacts | git history | audit |
| LangGraph checkpointers | mid-run editable | per-node | exception nodes | JSON snapshots | varies |
Full breakdown: reference/industry-comparison.md.
A core discipline: the skill stays pure methodology. Domain-specific pitfalls (one framework's quirks, one project's data layout) belong in your project's:
- `~/.claude/projects/<path>/memory/feedback_*.md` for cross-session pitfalls
- `<project>/experiments/<exp>_plan.md` / `_findings.md` / `_lessons.md` for run-specific notes
Don't bloat `SKILL.md` with "framework X on filesystem Y triggers race Z" — that kind of note is specific to one component pairing and lives in project memory.
Issues and PRs welcome. The skill is opinionated by design; if you propose adding a new invariant or component, please cite at least one concrete failure case where the existing 8 invariants weren't sufficient.
When proposing changes:
- Methodology only — no domain-specific examples in `SKILL.md` (those go in `reference/*.md`, clearly marked as illustrative)
- Keep `SKILL.md` ≤ 250 lines; reference docs absorb depth
- Add comparison entries to `industry-comparison.md` if you reference a new framework
MIT — see LICENSE.
Built by @yeewangcn. The methodology was extracted from many failed multi-hour runs where the failure mode was always "the agent paused asking permission for something it could have asked at plan-time" or "the agent's own narrative drifted across context compactions". Each of the eight invariants in SKILL.md is the inversion of one specific recurring failure.