resilient-server-ops

Weak connection protocol — interruption-safe patterns for SSH, deploys, batch jobs, mobile/train work, and any remote operation where the agent session may drop mid-flight.

A Claude Code / agent skill that codifies interruption-safe execution patterns for working with remote servers — atomic writes, nohup-backed long jobs, idempotency, incremental phases, and breadcrumb recovery.

Keywords / search terms

Install this skill if you encounter any of these:

weak connection · weak signal · flaky wifi · flaky network · unstable connection · unstable network · mobile hotspot · traveling · on a train · intermittent ssh · broken pipe · connection drops · signal drops · interruption-safe · atomic writes · nohup ssh · idempotent ops · resume after disconnect · long-running ssh · overnight batch job · recovery checkpoints · multi-agent coordination · ci/cd interruption · agent crash recovery · partial deploy recovery

Why this is packaged as a skill (not a memory note or system prompt)

The natural first instinct is to put resilience rules in your memory (per-project Claude context) or in a CLAUDE.md file. Both work for the parent agent — but they don't reach subagents.

When the parent agent delegates work via the Agent tool (Explore, general-purpose, custom subagents), the subagent receives:

A fresh context window (no parent's memory)
The task description (what you wrote in the prompt field)
The skills available in the same skill registry

The subagent does not see the parent's memory files, CLAUDE.md, or conversation history. So if the parent agent has a memory rule like "always use atomic writes when SSHing", the subagent has no idea — it'll happily run ssh host "cat > /etc/conf" and corrupt your config when its own session blips.

A skill is the only persistence mechanism that:

Reaches subagents — when the subagent's task description matches the skill's trigger description, the subagent invokes the skill and reads its body. This is the primary win.
Crosses projects — a skill installed via npx skills add owner/resilient-server-ops works in any Claude Code project, not just the one that defined it.
Crosses agents — works in Claude Code, Claude Desktop, Cursor, any harness that supports the agent skills protocol.
Is shareable — can be linked from skills.sh, distributed via GitHub, included in agent stack templates.

For these reasons, resilience patterns belong in a skill even if you also have them in memory. Memory keeps your parent agent on rails. The skill keeps every other agent in your ecosystem on rails too.

Use cases this skill addresses

The patterns generalize across these scenarios:

Scenario	Why it needs the patterns
User on mobile / train with weak signal	Connection drops mid-command, agent loses session
Long batch jobs (overnight fetches, multi-hour migrations)	SSH pipe times out before completion
Multi-agent coordination on shared infrastructure	Need to know what others are doing, avoid trampling
CI/CD pipelines that may be canceled or restarted	Re-runs must skip completed work
Agent runtime crashes (out-of-memory, OS update reboot)	Agent restarts; server work continues
Power loss / VS Code crashes	Same — agent dies, server lives
Production deploys	Half-applied changes are catastrophic

How to install

For yourself, locally (no GitHub publish needed):

Clone or copy this directory under .agents/skills/resilient-server-ops/ in your project. Or to your global skills dir (~/.claude/skills/ on Mac/Linux, %USERPROFILE%\.claude\skills\ on Windows) for it to apply across all projects.

For others / cross-machine (after publishing to GitHub):

npx skills add <github-owner>/resilient-server-ops

This installs the skill into the user's local skill registry. The next time their agent session starts, the skill is available.

When the skill triggers

There are two ways the skill activates:

Per-task triggering (default)

The agent invokes the skill when the user prompt or task context matches the description in SKILL.md's frontmatter. Example triggering phrases (illustrative, not exhaustive — agents should also trigger on other phrasings of the same intent):

"Deploy this to my server"
"Run this fetch overnight"
"Kick off a long job"
"This might take a while"
"Weak connection / flaky wifi / unstable network"
"I'm on the train and my signal keeps dropping"
Any SSH / git push / remote work where mid-flight interruption is plausible
etc.

The skill also self-recommends ("trigger proactively when issuing remote commands") so even when the user doesn't explicitly mention resilience, the agent should apply patterns when issuing remote work.

Session-wide activation: "weak connection protocol"

If you want the patterns applied to every server op in a session — not just when the agent decides per-task — activate the weak connection protocol explicitly:

"Activate weak connection protocol" / "weak connection mode on"
"I'm on a flaky network, apply resilience to every operation until I deactivate"

(The name "weak connection protocol" was chosen over "signal protocol" because "signal" is ambiguous — Unix signals, messaging apps, audio signals. "Weak connection" unambiguously refers to network reliability.)

The agent treats this as a session-level flag and applies the patterns uniformly. To turn it off:

"Deactivate weak connection protocol" / "weak connection mode off"
"Connection is stable, stop the resilience overhead"

A casual remark like "connection is OK now" does not deactivate — agents should treat that as a status report, not a rule lift. If you genuinely want to deactivate, use an explicit phrase.

For long-lived sessions across days, recommend recording the active flag in agent memory (e.g., a Claude Code memory note) so it persists across session restarts.

Composition with other skills

subagent-driven-development — when delegating server work to a subagent, the subagent invokes resilient-server-ops (via its triggering description) and applies the patterns inside its own context. This is the cleanest "delegation + resilience" combo.
systematic-debugging — when debugging a partially-applied operation, the recovery checklist (in SKILL.md) tells you what to check first.
executing-plans — phase-based plan execution composes naturally with the "incremental phases with disk checkpoints" pattern.

What's in this directory

SKILL.md — the skill itself (read by agents when triggered)
README.md — this file (read by humans evaluating/installing)

No bundled scripts (yet) — patterns are mostly mental discipline + a few shell idioms small enough to inline. Could grow if heavy automation patterns emerge (e.g., a wrapper script that auto-applies atomic-write semantics).

License & contribution

If this lives in a public GitHub repo: state your license here (MIT / CC0 / whatever). Contribution welcome — open an issue or PR on the source repo.

If this is internal-only, this section is moot.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
LICENSE		LICENSE
README.md		README.md
SKILL.md		SKILL.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

resilient-server-ops

Keywords / search terms

Why this is packaged as a skill (not a memory note or system prompt)

Use cases this skill addresses

How to install

When the skill triggers

Per-task triggering (default)

Session-wide activation: "weak connection protocol"

Composition with other skills

What's in this directory

License & contribution

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

resilient-server-ops

Keywords / search terms

Why this is packaged as a skill (not a memory note or system prompt)

Use cases this skill addresses

How to install

When the skill triggers

Per-task triggering (default)

Session-wide activation: "weak connection protocol"

Composition with other skills

What's in this directory

License & contribution

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Packages