18 skills that give Claude Code a memory, opinions, and a research department.
Zero dependencies. Just Markdown.
Your AI agent is powerful. Also:
- It forgets everything between sessions
- It decides alone, missing what other perspectives would catch
- It believes its own confident-sounding hallucinations
- It can't turn a client call into a proposal and prototype
- It drops action items and unanswered questions from meetings
- It ships code while you burn out, and nobody notices
- Its own tools break silently, and nobody checks
18 skills that fix all seven. Zero dependencies. Just Markdown.
```shell
git clone https://github.com/vasilievyakov/miracle-infrastructure.git
cd miracle-infrastructure && bash install.sh
```

> [!TIP]
> The installer shows an interactive menu. Pick individual packs or install everything. Existing files are backed up before overwriting. Safe to re-run.
2 minutes. Zero dependencies. No Docker, no database, no MCP server, no npm. Just Markdown and bash.
- What 1,169 Sessions Taught Us
- Packs -- 7 packs, 18 skills
- How It Compares
- For AI Researchers
- Architecture
- Background
- This Repo is Alive
Every tool's README promises the moon. Here is what we learned across 1,169 sessions and 10 projects over 6 months of daily use. Including the parts where we were wrong.
Workshop + Nervous System. Not Iron Man's suit. We explicitly rejected the "AI superpower" metaphor after months of actual use.
A workshop: tools in their places, each one purpose-built, and you know which one to grab without thinking. A nervous system: agents as extensions of your thinking, capturing observations, loading context, prompting you to save what matters. The system is proactive. Rules make the agent act without being asked. auto-observe captures decisions as they happen. session-start loads context before you request it. session-end reminds you to save.
Agents are precision instruments. Not general-purpose cannons. Sometimes you need automatic transmission, sometimes manual. Sometimes the handbrake, sometimes the brake pedal. Your stack should be a function of the task you are solving, not of what you already have.
```mermaid
graph LR
T[Your task] --> Q{What do you need?}
Q -->|Remember| M["/session-save\n/search-memory"]
Q -->|Decide| D["/directors\n/frameworks"]
Q -->|Verify| R["/research\n/triangulate"]
Q -->|Execute| O["/orchestrate\n2-4 agents"]
Q -->|Extract| A["/action-items\n/proposal"]
Q -->|"I'm stuck"| U["/unstuck\ndeep interview"]
style T fill:#1e1e2e,stroke:#cba6f7,color:#cdd6f4
style Q fill:#1e1e2e,stroke:#89b4fa,color:#cdd6f4
style M fill:#1e1e2e,stroke:#a6e3a1,color:#a6e3a1
style D fill:#1e1e2e,stroke:#f9e2af,color:#f9e2af
style R fill:#1e1e2e,stroke:#89b4fa,color:#89b4fa
style O fill:#1e1e2e,stroke:#cba6f7,color:#cba6f7
style A fill:#1e1e2e,stroke:#f38ba8,color:#f38ba8
style U fill:#1e1e2e,stroke:#fab387,color:#fab387
```
Progressive disclosure. Only load what is needed. 100 observations across 10 projects: a search costs ~4,000 tokens. Without progressive disclosure: ~15,000 tokens.
```mermaid
graph TD
A["MEMORY.md<br/><b>~200 tokens</b><br/>every session"] -->|project mentioned| B["project.md<br/><b>~800 tokens</b><br/>on demand"]
B -->|search query| C["observations Index<br/><b>~40 tokens/row</b><br/>scan titles only"]
C -->|match found| D["observations Details<br/><b>~150 tokens/row</b><br/>full context"]
style A fill:#1e1e2e,stroke:#a6e3a1,color:#a6e3a1
style B fill:#1e1e2e,stroke:#89b4fa,color:#89b4fa
style C fill:#1e1e2e,stroke:#f9e2af,color:#f9e2af
style D fill:#1e1e2e,stroke:#f38ba8,color:#f38ba8
```
What actually gets used (usage stats from 1,169 sessions)
/session-save runs at the end of roughly 70% of sessions. The other 30% are quick questions that do not produce anything worth remembering.
auto-observe captures 1 to 3 observations per session automatically. You do not invoke it. It watches for decisions, bugfixes, discoveries, and problems, then appends them to the project's observation log. The most valuable observation types turned out to be decisions and discoveries. These are knowledge types impossible to reconstruct from code alone. "Why we chose A over B" and "API actually limits 100 req/min." Things that evaporate unless written down in the moment.
/search-memory gets used 2 to 3 times per week. The typical query is something like "what did we decide about auth?" The memory system prevented the same JWT-vs-sessions debate from happening 4 times. Four times the agent would have proposed a solution that had already been evaluated and rejected. External declarative memory prevents re-derivation of previously settled conclusions. That alone justified building all of this.
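The two-pass search behind this is easy to picture in plain shell. Everything below is illustrative: the directory, file name, and entry format are invented for the sketch, not the skill's real schema. Pass 1 scans only the cheap index titles; pass 2 loads details just for the match.

```shell
# Illustrative only: a two-pass search over a hypothetical observations file.
MEM_DIR="$(mktemp -d)"
cat > "$MEM_DIR/demo.observations.md" <<'EOF'
## Index
- [decision] auth: chose JWT over server sessions
- [bugfix] rate limiter off-by-one at window edge
## Details
### [decision] auth: chose JWT over server sessions
Why: stateless horizontal scaling. Before: cookie sessions. After: JWT.
### [bugfix] rate limiter off-by-one at window edge
Why: window end was exclusive. Before: 101 requests allowed. After: 100.
EOF
# Pass 1: index scan -- titles only (~40 tokens each)
grep '^- \[' "$MEM_DIR/demo.observations.md" | grep -i 'auth'
# Pass 2: pull the full entry only for the matching heading (~150 tokens)
awk '/^### /{p = ($0 ~ /auth/)} p' "$MEM_DIR/demo.observations.md"
```

The agent never re-litigates JWT-vs-sessions because pass 2 surfaces the recorded "Why" along with the decision.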
/directors gets called for any project above $5k or any architectural decision. A board of experts arguing is cheaper than one real regret.
/frameworks when starting a new project phase. It picks the relevant subset of 50 frameworks based on your stage. Not all 50 at once. That would be insane.
/orchestrate for tasks that need parallel research and implementation. Researcher finds the information, Developer writes the code, Tester validates it. Simultaneously.
/research and /triangulate for any claim that sounds too good. Trust, then verify. Or just verify.
/unstuck when you know something is wrong and you cannot put your finger on it. The adaptive interview finds the real question hiding behind the one you asked. Turns out, the hardest part of solving a problem is knowing what the problem actually is.
The dead ends (honesty about failures)
Long prompts worked worse than short ones. We invested serious effort into comprehensive system prompts that anticipated every scenario. The result was worse output. The model gets confused by instruction overload the same way a person does.
Universal "do everything" skills do not work. We tried building a single skill that handled research, analysis, and recommendations. Specialization wins. Every time. A /research skill and a /triangulate skill outperform a single /research-and-verify skill.
Adding all possible skills without deliberation produces noise. More tools does not mean more capability. It means more context consumed, more busywork, more tokens burned on irrelevant processing. "Agents are precision instruments for jeweler's tasks. When you unleash all agents at once, you get work for work's sake and tokens burned inefficiently."
Mass agent launches are wasteful. A full board evaluating your grocery list is a waste. The car metaphor applies here: you do not floor the accelerator in a parking lot.
Why Markdown and not a database
Zero dependencies. Works offline. Version-controllable. Readable by humans.
```mermaid
graph LR
subgraph "Vector DB approach"
V1[Your code] --> V2[MCP Server] --> V3[Qdrant/Pinecone] --> V4[Docker]
V4 --> V5["maintenance<br/>debugging<br/>migrations"]
end
subgraph "This approach"
M1[Your code] --> M2["Markdown files<br/>in ~/.claude/"]
M2 --> M3["done."]
end
style V1 fill:#1e1e2e,stroke:#f38ba8,color:#cdd6f4
style V2 fill:#1e1e2e,stroke:#f38ba8,color:#cdd6f4
style V3 fill:#1e1e2e,stroke:#f38ba8,color:#cdd6f4
style V4 fill:#1e1e2e,stroke:#f38ba8,color:#cdd6f4
style V5 fill:#1e1e2e,stroke:#f38ba8,color:#f38ba8
style M1 fill:#1e1e2e,stroke:#a6e3a1,color:#cdd6f4
style M2 fill:#1e1e2e,stroke:#a6e3a1,color:#cdd6f4
style M3 fill:#1e1e2e,stroke:#a6e3a1,color:#a6e3a1
```
A SQLite database would be faster to query. A vector store would have better semantic search. Both would require installation steps, maintenance, and debugging when they break. Markdown files in a git repo require nothing. They survive OS upgrades, editor changes, and the inevitable migration to the next AI tool.
The non-dogmatic take: "One does not interfere with the other. There are no right tools and wrong tools. Only timely usage and excessive usage." If your project genuinely needs a vector store, use one. A big mistake is trying to use nothing and thinking you are smarter than everyone. Stack should match the task.
| | Pack | Skills | What it does |
|---|---|---|---|
| 🧠 | Memory | 5 skills + 3 rules | Your agent remembers yesterday, last week, and that bug from three months ago |
| 💡 | Thinking | 5 skills | Virtual experts argue, frameworks analyze, and when you're stuck, a deep interview extracts what you actually need |
| 🔍 | Research | 3 skills | Web research with confidence scores, fact verification, knowledge base |
| 💼 | Business | 1 skill | Call transcript to proposal + architecture + clickable prototype |
| 📝 | Content | 1 skill | Extract tasks from transcripts, chats, documents |
| 📈 | Productivity | 1 skill | Weekly integral review across 4 dimensions |
| 🔧 | Meta | 2 skills | Skills health audit + code security review |
Your agent remembers what happened yesterday. And last week. And that bug you fixed three months ago that is about to happen again.
Prevented the same JWT-vs-sessions debate from happening 4 times. The agent kept proposing a solution that had already been evaluated and rejected. With memory, settled conclusions stay settled.
Skills: session-save search-memory memory-health memory-init project-status
Rules: session-start session-end auto-observe
How it works:
```mermaid
graph LR
A[Session Start] --> B[Load dossier\nfrom memory]
B --> C[Work]
C --> D[Session End]
C -.-> E[Capture events\ntyped observations]
D --> F[Prompt\n/session-save]
```
The memory hierarchy uses progressive disclosure to stay token-efficient:
```
MEMORY.md (always loaded, ~200 tokens)
|
+-- project.md (on project mention, ~800 tokens)
    |
    +-- project.observations.md
        +-- Index (~40 tokens/row)
        +-- Details (~150 tokens/row, loaded only for matches)
```
With 100 observations across 10 projects, a search costs ~4,000 tokens instead of ~15,000. Your context window says thank you.
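The arithmetic behind those two numbers comes straight from the per-row costs in the hierarchy, and is easy to check:

```shell
# Back-of-envelope token cost for a search, using the per-row estimates above.
obs=100               # observations across 10 projects
flat=$((obs * 150))   # no disclosure: every detail row (~150 tokens) enters context
scan=$((obs * 40))    # progressive: scan index titles only (~40 tokens each)
echo "flat=$flat scan=$scan"   # flat=15000 scan=4000
```

Details are then loaded only for the handful of matching rows, so the real cost sits just above the index-scan floor rather than near the flat ceiling.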
> [!NOTE]
> Start with /memory-init after installation. It auto-detects your projects and sets up the directory structure.
Virtual experts argue about your project. Each one sees everything through their unique lens: product, engineering, architecture, design, safety.
They frequently disagree. No single "best" director. The one who catches what others miss changes every time. The value is in the ensemble, not any single expert.
Skills: directors dream-team frameworks orchestrate unstuck
Directors -- agents evaluate your project in parallel
| Director | Lens |
|---|---|
| Mira Murati | Product, rapid iteration, collaborative AI, ethics |
| Ilya Sutskever | First principles, generalization, long-term horizon |
| Boris Cherny | DX, verification loops, parallelization, institutional memory |
| Andrej Karpathy | 1.0/2.0/3.0 stack, verifiability, agent-friendly architecture |
| Jony Ive | Care, emotional resonance, simplicity, material integrity |
Produces a synthesis with consensus, disagreements, top 3 critical questions, and action items.
Dream Team -- dynamic expert assembly for planning
Unlike Directors (fixed board, evaluates what exists), Dream Team dynamically selects 3-5 experts tailored to the task for planning what to build. Experts speak in first person, then debate each other's contradictions. The synthesis surfaces unresolved disagreements as explicit decision points.
The user approves the team before launch. Can swap or add experts. The mandatory skeptic rule ensures at least one voice pushes back on the premise.
Frameworks -- 50 frameworks, activated by project stage
Determines your project stage (ideation, architecture, MVP, growth, polish, safety), activates the relevant subset of 50 frameworks, applies each one specifically, and surfaces conflicts with resolution rules.
Not all 50 at once. That would be insane.
Orchestrate -- 2-4 agents in parallel
Picks from a library of specialized agents based on task keywords. Researcher + Triangulator for fact-finding. Developer + Tester for implementation. Debugger + Developer for fixing things.
Runs them in parallel, synthesizes results into a single report.
Unstuck -- deep interview when you can't articulate what you need
Diagnoses the type of stuck (fog, choice paralysis, false dilemma, blind spot, information hunger), runs an adaptive interview of 5-10 questions, then exits via synthesis ("you already knew"), targeted research, or reframing the problem entirely.
Builds a preference profile over time. The more you use it, the faster it gets to the point.
Your agent checks its homework.
Skills: researching-web triangulate learned-lessons
Research does web search with source scoring, contradiction detection, and confidence breakdown. Catches when vendor benchmarks contradict independent tests, before you base an architectural decision on marketing claims.
Triangulate verifies claims through 3+ independent sources. Classifies each claim as fact, opinion, or prediction. Shows exactly where the confidence comes from. Flags echo bias when sources share the same ecosystem.
Learned Lessons keeps a knowledge base of solved problems. After you debug something with web search, it offers to record the solution. Next time a similar problem shows up, it checks the knowledge base first. Your agent stops googling the same error twice.
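A minimal sketch of that check-the-base-first pattern. The file path and entry shape here are assumptions for illustration, not the skill's actual format:

```shell
# Hypothetical lessons file: one "### <error signature>" block per solved problem.
LESSONS="$(mktemp -d)/lessons.md"
printf '### ECONNRESET on uploads over 60s\nFix: raise the proxy read timeout to 300s.\n' > "$LESSONS"
# On the next occurrence, consult the knowledge base before searching the web:
if grep -q 'ECONNRESET' "$LESSONS"; then
  grep -A1 'ECONNRESET' "$LESSONS"   # hit: reuse the recorded fix
else
  echo "no recorded lesson -- fall back to web research"
fi
```

The point is the ordering, not the tooling: a local lookup is nearly free, so it always runs before the expensive research path.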
Different approaches, not competitors. "There are no right tools and wrong tools. Only timely usage and excessive usage."
| Feature | Miracle Infrastructure | memory-bank | claudemem |
|---|---|---|---|
| Zero dependencies | ✅ | ❌ MCP | ❌ MCP |
| Setup under 2 min | ✅ | ❌ | ❌ |
| Token efficient | ✅ Progressive disclosure | ❌ | ❌ |
| Typed observations | ✅ 5 types + custom | ❌ | ❌ |
| Self-validating | ✅ Integrity tests | ❌ | ❌ |
| Decision making | ✅ Directors + Frameworks | ❌ | ❌ |
| Research tools | ✅ 3 skills | ❌ | ❌ |
| Works offline | ✅ | ❌ | ❌ |
Each of these tools made a deliberate set of tradeoffs. MCP-based systems get tighter integration with external services. Database-backed systems get faster queries at scale. We chose zero dependencies and Markdown because we value portability, readability, and not debugging infrastructure when we should be debugging code. Your situation might call for something different.
> [!IMPORTANT]
> This comparison reflects our understanding of these tools as of February 2026. Features may have changed. If you maintain one of these projects and something is inaccurate, please open an issue.
If you work on agentic systems, memory architectures, or human-AI interaction, several findings from 1,169 sessions may be worth your attention.
Progressive disclosure as manual RAG without a vector store. The memory hierarchy (MEMORY.md > project dossiers > observation indices > observation details) implements retrieval-augmented generation through file structure alone. Token cost scales with query specificity, not corpus size. No embeddings, no similarity search, no infrastructure. The tradeoff is obvious: it requires human-designed structure. The benefit is equally obvious: zero failure modes from retrieval errors.
auto-observe as episodic memory. The observation system captures typed episodic memories (decision, bugfix, feature, discovery, problem) with mandatory context fields (Before/After, Why). This creates a structured episodic memory that an LLM can query through pattern matching on the index. The most valuable observation types were decisions and discoveries, both of which are impossible to reconstruct from code artifacts alone.
Mixture-of-experts through prompting. The Directors system runs one LLM through 5 different system prompts (product, engineering, UX, business, safety lenses). They frequently disagree. Genuinely. No single director consistently outperforms the others. The one who catches what others miss changes every time. This suggests that system prompt variation creates meaningful diversity in reasoning, analogous to mixture-of-experts architectures, without requiring separate models or fine-tuning.
Constraints improve output quality. Across 1,169 sessions, structured and constrained prompts consistently outperformed open-ended ones. Shorter prompts outperformed longer prompts. This parallels findings in instruction tuning: specificity and structure in the prompt matter more than exhaustive coverage.
External declarative memory prevents re-derivation. Without persistent memory, the agent re-derives conclusions from first principles each session. It proposes solutions already rejected, re-evaluates tradeoffs already settled. External declarative memory (project dossiers with recorded decisions) eliminates this re-derivation, functioning as a persistent belief store that survives context window boundaries.
The human is the bottleneck. After 6 months of daily use, the consistent finding is that system performance is limited by the human operator, not the model. The human's context window is smaller, attention is less reliable, and working memory is more fragile. The system's most impactful features are the ones that compensate for human limitations: auto-loading context, auto-capturing decisions, prompting to save state.
How progressive disclosure saves tokens
| What loads | When | Cost |
|---|---|---|
| MEMORY.md | Every session | ~200 tokens |
| project.md | On project mention | ~800 tokens |
| observations Index | On search | ~40 tokens/row |
| observations Details | Only for matches | ~150 tokens/row |
100 observations, searching by type: ~4,000 tokens loaded. Without progressive disclosure: ~15,000 tokens. The difference compounds across sessions.
Extension points
- Add observation types: edit `memory-config.json`
- Add directors: follow the system prompt pattern in `directors/SKILL.md`
- Add frameworks: add to any category, assign to stages
- Add agents: edit `agents-library.json`
- Custom dossier sections: add any `## Section` to a dossier file
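As an illustration of the first extension point, adding a custom observation type might look like this. The field name below is a guess for illustration only; check the shipped memory-config.json for the real schema.

```json
{
  "observation_types": [
    "decision", "bugfix", "feature", "discovery", "problem",
    "experiment"
  ]
}
```

Here `experiment` is the hypothetical new type appended to the five built-in ones; auto-observe would then tag matching events with it.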
System architecture
See ARCHITECTURE.md for full system diagrams, including:
- Session lifecycle diagram
- Memory hierarchy with token costs
- Data flow between components
- File structure reference
It started simply: a solo developer got tired of re-explaining his own codebase to his own AI agent. Every Monday morning, same questions. Every architectural decision, re-debated. Every bugfix, forgotten by the next session.
Over 6 months and 1,169 sessions across 10 projects, that frustration became a philosophy about human-AI symbiosis. Not the "AI does everything" version. Not the "AI is just autocomplete" version. The version where a human with a workshop full of the right tools and an agent that extends their thinking can produce work that neither could alone.
The tools grew one at a time, each one solving a specific friction. Memory came first (stop re-explaining). Directors came next (stop making architectural decisions alone at 2am). Research followed (stop trusting the agent's confident-sounding hallucinations). Business skills last (stop losing client insights between the call and the proposal).
Every skill that survived earned its place through repeated use. The ones that did not make the cut, the universal "do everything" skill, the exhaustive system prompts, the mass agent launches, taught us something equally valuable: agents are precision instruments for jeweler's tasks, not sledgehammers.
The name "Miracle Infrastructure" comes from the original project name. The miracle is that it works with zero dependencies.
New skills get added as they prove themselves in daily use. If something works across multiple projects and multiple months, it earns a place here. If it stops being useful, it gets removed. The bar is real usage, not theoretical value.
What's coming: more packs, more field notes, better install experience. The best things I build end up here.
If this is useful to you, two things help me keep going:
Stars tell me someone finds this useful. The Telegram channel is where I write about AI, human-AI workflows, and the philosophy behind these tools.
MIT. Do whatever you want with it.