
Miracle Infrastructure

18 skills that give Claude Code a memory, opinions, and a research department.
Zero dependencies. Just Markdown.



The Problem

Your AI agent is powerful. Also:

  • It forgets everything between sessions
  • It decides alone, missing what other perspectives would catch
  • It believes its own confident-sounding hallucinations
  • It can't turn a client call into a proposal and prototype
  • It drops action items and unanswered questions from meetings
  • It ships code while you burn out, and nobody notices
  • Its own tools break silently, and nobody checks

18 skills that fix all seven. Zero dependencies. Just Markdown.

Install

git clone https://github.com/vasilievyakov/miracle-infrastructure.git
cd miracle-infrastructure && bash install.sh


Tip

The installer shows an interactive menu. Pick individual packs or install everything. Existing files are backed up before overwriting. Safe to re-run.

2 minutes. Zero dependencies. No Docker, no database, no MCP server, no npm. Just Markdown and bash.


What 1,169 Sessions Actually Taught Us

Every tool's README promises the moon. Here is what we learned across 1,169 sessions and 10 projects over 6 months of daily use. Including the parts where we were wrong.

  • Structure beats freedom. Typed observations are useful 6 months later; free-form notes are unnavigable garbage.
  • Precision, not power. The right tool for the task: $0.35 vs $4.20 for the same answer.
  • Memory prevents re-derivation. Settled conclusions stay settled; zero minutes wasted.
  • The operator is the bottleneck. The system compensates for you, not the other way around.

The Philosophy

Workshop + Nervous System. Not Iron Man's suit. We explicitly rejected the "AI superpower" metaphor after months of actual use.

A workshop: tools in their places, each one purpose-built, and you know which one to grab without thinking. A nervous system: agents as extensions of your thinking, capturing observations, loading context, prompting you to save what matters. The system is proactive. Rules make the agent act without being asked. auto-observe captures decisions as they happen. session-start loads context before you request it. session-end reminds you to save.

Agents are precision instruments, not general-purpose cannons. Sometimes you need the automatic transmission, sometimes the manual. Sometimes the handbrake, sometimes the brake pedal. The stack should be a function of the task you are solving, not of what you already have.

graph LR
    T[Your task] --> Q{What do you need?}
    Q -->|Remember| M["/session-save\n/search-memory"]
    Q -->|Decide| D["/directors\n/frameworks"]
    Q -->|Verify| R["/research\n/triangulate"]
    Q -->|Execute| O["/orchestrate\n2-4 agents"]
    Q -->|Extract| A["/action-items\n/proposal"]
    Q -->|"I'm stuck"| U["/unstuck\ndeep interview"]

    style T fill:#1e1e2e,stroke:#cba6f7,color:#cdd6f4
    style Q fill:#1e1e2e,stroke:#89b4fa,color:#cdd6f4
    style M fill:#1e1e2e,stroke:#a6e3a1,color:#a6e3a1
    style D fill:#1e1e2e,stroke:#f9e2af,color:#f9e2af
    style R fill:#1e1e2e,stroke:#89b4fa,color:#89b4fa
    style O fill:#1e1e2e,stroke:#cba6f7,color:#cba6f7
    style A fill:#1e1e2e,stroke:#f38ba8,color:#f38ba8
    style U fill:#1e1e2e,stroke:#fab387,color:#fab387

Progressive disclosure. Only load what is needed. 100 observations across 10 projects: a search costs ~4,000 tokens. Without progressive disclosure: ~15,000 tokens.

graph TD
    A["MEMORY.md<br/><b>~200 tokens</b><br/>every session"] -->|project mentioned| B["project.md<br/><b>~800 tokens</b><br/>on demand"]
    B -->|search query| C["observations Index<br/><b>~40 tokens/row</b><br/>scan titles only"]
    C -->|match found| D["observations Details<br/><b>~150 tokens/row</b><br/>full context"]

    style A fill:#1e1e2e,stroke:#a6e3a1,color:#a6e3a1
    style B fill:#1e1e2e,stroke:#89b4fa,color:#89b4fa
    style C fill:#1e1e2e,stroke:#f9e2af,color:#f9e2af
    style D fill:#1e1e2e,stroke:#f38ba8,color:#f38ba8
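The arithmetic behind those numbers can be sketched directly. The per-row token costs are the approximations quoted above; everything else in this snippet is illustrative, not the system's actual accounting:

```python
# Approximate per-row token costs, taken from the memory hierarchy above.
INDEX_ROW = 40     # one title row in the observation index
DETAIL_ROW = 150   # one fully expanded observation

def search_cost(observations: int, matches: int) -> int:
    """Progressive disclosure: scan every index row, expand only the matches."""
    return observations * INDEX_ROW + matches * DETAIL_ROW

def naive_cost(observations: int) -> int:
    """No disclosure tiers: every observation loads in full."""
    return observations * DETAIL_ROW

# 100 observations: ~4,000 tokens to scan the index (plus ~150 per match),
# versus ~15,000 tokens to load everything up front.
```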
What actually gets used (usage stats from 1,169 sessions)

/session-save runs at the end of roughly 70% of sessions. The other 30% are quick questions that do not produce anything worth remembering.

auto-observe captures 1 to 3 observations per session automatically. You do not invoke it. It watches for decisions, bugfixes, discoveries, and problems, then appends them to the project's observation log. The most valuable observation types turned out to be decisions and discoveries. These are knowledge types impossible to reconstruct from code alone. "Why we chose A over B" and "API actually limits 100 req/min." Things that evaporate unless written down in the moment.

/search-memory gets used 2 to 3 times per week. The typical query is something like "what did we decide about auth?" The memory system prevented the same JWT-vs-sessions debate from happening 4 times. Four times the agent would have proposed a solution that had already been evaluated and rejected. External declarative memory prevents re-derivation of previously settled conclusions. That alone justified building all of this.
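A query like "what did we decide about auth?" only has to scan index titles, not full observations. A minimal sketch of that scan, assuming a hypothetical index-line format (the real layout lives in `project.observations.md`):

```python
import re

# Hypothetical index rows -- format invented for this sketch.
INDEX = """\
- [decision] 2025-03-02 Chose server-side sessions over JWT
- [bugfix] 2025-03-09 Rate limiter off-by-one on burst requests
- [discovery] 2025-04-11 Vendor API actually limits 100 req/min
"""

def search_index(query: str, index: str) -> list[str]:
    """Match query keywords against index titles only; full details
    would be loaded later, and only for the rows returned here."""
    words = [w for w in re.findall(r"\w+", query.lower()) if len(w) > 2]
    return [line for line in index.splitlines()
            if any(w in line.lower() for w in words)]
```

Only matching rows graduate to the ~150-token detail tier, which is where the token savings come from.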

/directors gets called for any project above $5k or any architectural decision. A board of experts arguing is cheaper than one real regret.

/frameworks when starting a new project phase. It picks the relevant subset of 50 frameworks based on your stage. Not all 50 at once. That would be insane.

/orchestrate for tasks that need parallel research and implementation. Researcher finds the information, Developer writes the code, Tester validates it. Simultaneously.

/research and /triangulate for any claim that sounds too good. Trust, then verify. Or just verify.

/unstuck when you know something is wrong and you cannot put your finger on it. The adaptive interview finds the real question hiding behind the one you asked. Turns out, the hardest part of solving a problem is knowing what the problem actually is.

The dead ends (honesty about failures)

Long prompts worked worse than short ones. We invested serious effort into comprehensive system prompts that anticipated every scenario. The result was worse output. The model gets confused by instruction overload the same way a person does.

Universal "do everything" skills do not work. We tried building a single skill that handled research, analysis, and recommendations. Specialization wins. Every time. A /research skill and a /triangulate skill outperform a single /research-and-verify skill.

Adding all possible skills without deliberation produces noise. More tools does not mean more capability. It means more context consumed, more busywork, more tokens burned on irrelevant processing. "Agents are precision instruments for jeweler's tasks. When you unleash all agents at once, you get work for work's sake and tokens burned inefficiently."

Mass agent launches are wasteful. A full board evaluating your grocery list is a waste. The car metaphor applies here: you do not floor the accelerator in a parking lot.

Why Markdown and not a database

Zero dependencies. Works offline. Version-controllable. Readable by humans.

graph LR
    subgraph "Vector DB approach"
        V1[Your code] --> V2[MCP Server] --> V3[Qdrant/Pinecone] --> V4[Docker]
        V4 --> V5["maintenance<br/>debugging<br/>migrations"]
    end

    subgraph "This approach"
        M1[Your code] --> M2["Markdown files<br/>in ~/.claude/"]
        M2 --> M3["done."]
    end

    style V1 fill:#1e1e2e,stroke:#f38ba8,color:#cdd6f4
    style V2 fill:#1e1e2e,stroke:#f38ba8,color:#cdd6f4
    style V3 fill:#1e1e2e,stroke:#f38ba8,color:#cdd6f4
    style V4 fill:#1e1e2e,stroke:#f38ba8,color:#cdd6f4
    style V5 fill:#1e1e2e,stroke:#f38ba8,color:#f38ba8
    style M1 fill:#1e1e2e,stroke:#a6e3a1,color:#cdd6f4
    style M2 fill:#1e1e2e,stroke:#a6e3a1,color:#cdd6f4
    style M3 fill:#1e1e2e,stroke:#a6e3a1,color:#a6e3a1

A SQLite database would be faster to query. A vector store would have better semantic search. Both would require installation steps, maintenance, and debugging when they break. Markdown files in a git repo require nothing. They survive OS upgrades, editor changes, and the inevitable migration to the next AI tool.

The non-dogmatic take: "One does not interfere with the other. There are no right tools and wrong tools. Only timely usage and excessive usage." If your project genuinely needs a vector store, use one. A big mistake is trying to use nothing and thinking you are smarter than everyone. Stack should match the task.


Packs

| Pack | Skills | What it does |
|------|--------|--------------|
| 🧠 Memory | 5 skills + 3 rules | Your agent remembers yesterday, last week, and that bug from three months ago |
| 💡 Thinking | 5 skills | Virtual experts argue, frameworks analyze, and when you're stuck, a deep interview extracts what you actually need |
| 🔍 Research | 3 skills | Web research with confidence scores, fact verification, knowledge base |
| 💼 Business | 1 skill | Call transcript to proposal + architecture + clickable prototype |
| 📋 Content | 1 skill | Extract tasks from transcripts, chats, documents |
| 📊 Productivity | 1 skill | Weekly integral review across 4 dimensions |
| 🔧 Meta | 2 skills | Skills health audit + code security review |

Memory

Your agent remembers what happened yesterday. And last week. And that bug you fixed three months ago that is about to happen again.

Prevented the same JWT-vs-sessions debate from happening 4 times. The agent kept proposing a solution that had already been evaluated and rejected. With memory, settled conclusions stay settled.

Skills: session-save search-memory memory-health memory-init project-status Rules: session-start session-end auto-observe

Session Save

How it works:

graph LR
    A[Session Start] --> B[Load dossier\nfrom memory]
    B --> C[Work]
    C --> D[Session End]
    C -.-> E[Capture events\ntyped observations]
    D --> F[Prompt\n/session-save]

The memory hierarchy uses progressive disclosure to stay token-efficient:

MEMORY.md (always loaded, ~200 tokens)
    |
    +-- project.md (on project mention, ~800 tokens)
    |
    +-- project.observations.md
        +-- Index (~40 tokens/row)
        +-- Details (~150 tokens/row, loaded only for matches)
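The same policy can be written as a small loading rule. File names follow the tree above; the trigger logic is deliberately simplified (the real system presumably does more than a substring test):

```python
def context_for(message: str, projects: list[str]) -> list[str]:
    """Decide which memory files enter the context window."""
    files = ["MEMORY.md"]                    # always loaded, ~200 tokens
    for p in projects:
        if p.lower() in message.lower():     # project mentioned -> dossier
            files += [f"{p}.md", f"{p}.observations.md#index"]
    return files                             # details load only on index match
```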

With 100 observations across 10 projects, a search costs ~4,000 tokens instead of ~15,000. Your context window says thank you.

Note

Start with /memory-init after installation. It auto-detects your projects and sets up the directory structure.

Full documentation β†’


Thinking

Virtual experts argue about your project. Each one sees everything through their unique lens: product, engineering, architecture, design, safety.

They frequently disagree. No single "best" director. The one who catches what others miss changes every time. The value is in the ensemble, not any single expert.

Skills: directors dream-team frameworks orchestrate unstuck

Directors Board

Directors -- agents evaluate your project in parallel
| Director | Lens |
|----------|------|
| Mira Murati | Product, rapid iteration, collaborative AI, ethics |
| Ilya Sutskever | First principles, generalization, long-term horizon |
| Boris Cherny | DX, verification loops, parallelization, institutional memory |
| Andrej Karpathy | 1.0/2.0/3.0 stack, verifiability, agent-friendly architecture |
| Jony Ive | Care, emotional resonance, simplicity, material integrity |

Produces a synthesis with consensus, disagreements, top 3 critical questions, and action items.

Dream Team -- dynamic expert assembly for planning

Unlike Directors (fixed board, evaluates what exists), Dream Team dynamically selects 3-5 experts tailored to the task for planning what to build. Experts speak in first person, then debate each other's contradictions. The synthesis surfaces unresolved disagreements as explicit decision points.

The user approves the team before launch and can swap or add experts. The mandatory-skeptic rule ensures at least one voice pushes back on the premise.

Frameworks -- 50 frameworks, activated by project stage

Determines your project stage (ideation, architecture, MVP, growth, polish, safety), activates the relevant subset of 50 frameworks, applies each one specifically, and surfaces conflicts with resolution rules.

Not all 50 at once. That would be insane.
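Stage-gated activation can be sketched in a few lines. The stage names match the README; the framework-to-stage assignments below are invented for the sketch:

```python
STAGES = ("ideation", "architecture", "mvp", "growth", "polish", "safety")

# Illustrative mapping only -- the real catalog has 50 frameworks.
FRAMEWORKS = {
    "Jobs to Be Done": {"ideation"},
    "C4 Model": {"architecture"},
    "RICE Scoring": {"ideation", "growth"},
    "STRIDE": {"architecture", "safety"},
}

def activate(stage: str) -> list[str]:
    """Return only the frameworks assigned to the current project stage."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    return sorted(name for name, stages in FRAMEWORKS.items() if stage in stages)
```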

Orchestrate -- 2-4 agents in parallel

Picks from a library of specialized agents based on task keywords. Researcher + Triangulator for fact-finding. Developer + Tester for implementation. Debugger + Developer for fixing things.

Runs them in parallel, synthesizes results into a single report.
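Keyword-driven agent selection might look like the following. The pairings mirror the examples above, but the keyword sets and the default are invented; the real mapping lives in agents-library.json:

```python
# (trigger keywords, agent team) -- illustrative pairs only.
AGENT_PAIRS = [
    ({"verify", "fact", "claim", "research"}, ["Researcher", "Triangulator"]),
    ({"implement", "build", "feature"}, ["Developer", "Tester"]),
    ({"bug", "crash", "fix"}, ["Debugger", "Developer"]),
]

def pick_agents(task: str) -> list[str]:
    """Pick the first agent pair whose keywords intersect the task."""
    words = set(task.lower().split())
    for keywords, agents in AGENT_PAIRS:
        if words & keywords:
            return agents
    return ["Developer"]  # assumed fallback for unmatched tasks
```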

Unstuck -- deep interview when you can't articulate what you need

Diagnoses the type of stuck (fog, choice paralysis, false dilemma, blind spot, information hunger), runs an adaptive interview of 5-10 questions, then exits via synthesis ("you already knew"), targeted research, or reframing the problem entirely.

Builds a preference profile over time. The more you use it, the faster it gets to the point.

Full documentation β†’


Research

Your agent checks its homework.

Skills: researching-web triangulate learned-lessons


Research does web search with source scoring, contradiction detection, and confidence breakdown. Catches when vendor benchmarks contradict independent tests, before you base an architectural decision on marketing claims.

Triangulate verifies claims through 3+ independent sources. Classifies each claim as fact, opinion, or prediction. Shows exactly where the confidence comes from. Flags echo bias when sources share the same ecosystem.
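A toy version of the independence check: confidence grows with distinct source domains, and echo bias is flagged when sources collapse onto fewer hosts than there are citations. The thresholds here are illustrative, not the skill's actual rules:

```python
from urllib.parse import urlparse

def assess(claim_sources: list[str]) -> dict:
    """Score a claim by how many *independent* domains back it."""
    domains = {urlparse(u).netloc for u in claim_sources}
    independent = len(domains)
    return {
        "independent_sources": independent,
        "confidence": "high" if independent >= 3 else "low",  # assumed cutoff
        "echo_bias": independent < len(claim_sources),
    }
```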

Learned Lessons keeps a knowledge base of solved problems. After you debug something with web search, it offers to record the solution. Next time a similar problem shows up, it checks the knowledge base first. Your agent stops googling the same error twice.

Full documentation β†’


Business

From "we had a call" to "here is the proposal, architecture, and clickable prototype."

Skill: transcript-to-proposal

Transcript to Proposal

Give it a product description and a call transcript. It extracts pains, maps them to features, generates a proposal using the client's own words, builds system architecture, and creates an interactive HTML prototype. You review and confirm at checkpoints before it proceeds. No fully autonomous pipeline.

"LLMs catch what you missed in conversation. The prototype built with the client's own words leaves an unforgettable impression."

Full docs β†’

Content

Nobody reads meeting transcripts twice. This skill reads them once and extracts everything actionable.

Skill: action-items

Action Items

Handles .txt transcripts, chat exports (JSON/HTML), PDFs, raw text. Produces a prioritized checklist with assignees, deadlines, and source quotes. The hidden superpower: catches unanswered questions that everyone agreed were important and nobody followed up on.

Full docs β†’

Productivity

A weekly review that looks at more than your commit count.

Skill: aqal-review

AQAL Review

Uses the AQAL integral model to evaluate progress across 4 quadrants (interior/exterior, individual/collective) and 5 development lines. Tracks trends over weeks. Catches the silent killer: IT quadrant climbing while WE quadrant stagnates. You ship code, team communication atrophies, and nobody sees it until it's too late.

Full docs β†’

Meta

Audits for your infrastructure and code.

Skills: skill-checkup miracle-security

Skill Checkup

skill-checkup validates file references, frontmatter, trigger uniqueness, and dependency drift. miracle-security runs 5 parallel agents for code security review or 4 for enterprise assessment, with threat model calibration.
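A minimal sketch of such an audit, assuming a SKILL.md with YAML frontmatter and relative markdown links. The real skill-checkup covers more (trigger uniqueness, dependency drift):

```python
import re
from pathlib import Path

def check_skill(path: Path) -> list[str]:
    """Flag missing frontmatter and relative .md links that point nowhere."""
    problems = []
    text = path.read_text(encoding="utf-8")
    if not text.startswith("---"):
        problems.append("missing frontmatter")
    # Relative markdown links only; skips anchors and absolute URLs.
    for target in re.findall(r"\]\(([^)#:]+\.md)\)", text):
        if not (path.parent / target).is_file():
            problems.append(f"broken reference: {target}")
    return problems
```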

Full docs β†’


How It Compares

Different approaches, not competitors. "There are no right tools and wrong tools. Only timely usage and excessive usage."

| Feature | Miracle Infrastructure | memory-bank | claudemem |
|---------|------------------------|-------------|-----------|
| Zero dependencies | ✅ | ❌ MCP | ❌ MCP |
| Setup under 2 min | ✅ | ❌ | ✅ |
| Token efficient | ✅ Progressive disclosure | ❌ | ❌ |
| Typed observations | ✅ 5 types + custom | ❌ | ❌ |
| Self-validating | ✅ Integrity tests | ❌ | ❌ |
| Decision making | ✅ Directors + Frameworks | ❌ | ❌ |
| Research tools | ✅ 3 skills | ❌ | ❌ |
| Works offline | ✅ | ❌ | ❌ |

Each of these tools made a deliberate set of tradeoffs. MCP-based systems get tighter integration with external services. Database-backed systems get faster queries at scale. We chose zero dependencies and Markdown because we value portability, readability, and not debugging infrastructure when we should be debugging code. Your situation might call for something different.

Important

This comparison reflects our understanding of these tools as of February 2026. Features may have changed. If you maintain one of these projects and something is inaccurate, please open an issue.


For AI Researchers

If you work on agentic systems, memory architectures, or human-AI interaction, several findings from 1,169 sessions may be worth your attention.

Research Concept Mapping

Progressive disclosure as manual RAG without a vector store. The memory hierarchy (MEMORY.md > project dossiers > observation indices > observation details) implements retrieval-augmented generation through file structure alone. Token cost scales with query specificity, not corpus size. No embeddings, no similarity search, no infrastructure. The tradeoff is obvious: it requires human-designed structure. The benefit is equally obvious: zero failure modes from retrieval errors.

auto-observe as episodic memory. The observation system captures typed episodic memories (decision, bugfix, feature, discovery, problem) with mandatory context fields (Before/After, Why). This creates a structured episodic memory that an LLM can query through pattern matching on the index. The most valuable observation types were decisions and discoveries, both of which are impossible to reconstruct from code artifacts alone.

Mixture-of-experts through prompting. The Directors system runs one LLM through 5 different system prompts (product, engineering, UX, business, safety lenses). They frequently disagree. Genuinely. No single director consistently outperforms the others. The one who catches what others miss changes every time. This suggests that system prompt variation creates meaningful diversity in reasoning, analogous to mixture-of-experts architectures, without requiring separate models or fine-tuning.

Constraints improve output quality. Across 1,169 sessions, structured and constrained prompts consistently outperformed open-ended ones. Shorter prompts outperformed longer prompts. This parallels findings in instruction tuning: specificity and structure in the prompt matter more than exhaustive coverage.

External declarative memory prevents re-derivation. Without persistent memory, the agent re-derives conclusions from first principles each session. It proposes solutions already rejected, re-evaluates tradeoffs already settled. External declarative memory (project dossiers with recorded decisions) eliminates this re-derivation, functioning as a persistent belief store that survives context window boundaries.

The human is the bottleneck. After 6 months of daily use, the consistent finding is that system performance is limited by the human operator, not the model. The human's context window is smaller, attention is less reliable, and working memory is more fragile. The system's most impactful features are the ones that compensate for human limitations: auto-loading context, auto-capturing decisions, prompting to save state.


For the Curious

How progressive disclosure saves tokens
| What loads | When | Cost |
|------------|------|------|
| MEMORY.md | Every session | ~200 tokens |
| project.md | On project mention | ~800 tokens |
| observations Index | On search | ~40 tokens/row |
| observations Details | Only for matches | ~150 tokens/row |

100 observations, searching by type: ~4,000 tokens loaded. Without progressive disclosure: ~15,000 tokens. The difference compounds across sessions.

Extension points
  • Add observation types: edit memory-config.json
  • Add directors: follow the system prompt pattern in directors/SKILL.md
  • Add frameworks: add to any category, assign to stages
  • Add agents: edit agents-library.json
  • Custom dossier sections: add any ## Section to a dossier file

Full extension guide β†’

System architecture

See ARCHITECTURE.md for full system diagrams, including:

  • Session lifecycle diagram
  • Memory hierarchy with token costs
  • Data flow between components
  • File structure reference

Background

It started simply: a solo developer got tired of re-explaining his own codebase to his own AI agent. Every Monday morning, same questions. Every architectural decision, re-debated. Every bugfix, forgotten by the next session.

Over 6 months and 1,169 sessions across 10 projects, that frustration became a philosophy about human-AI symbiosis. Not the "AI does everything" version. Not the "AI is just autocomplete" version. The version where a human with a workshop full of the right tools and an agent that extends their thinking can produce work that neither could alone.

The tools grew one at a time, each one solving a specific friction. Memory came first (stop re-explaining). Directors came next (stop making architectural decisions alone at 2am). Research followed (stop trusting the agent's confident-sounding hallucinations). Business skills last (stop losing client insights between the call and the proposal).

Every skill that survived earned its place through repeated use. The ones that did not make the cut (the universal "do everything" skill, the exhaustive system prompts, the mass agent launches) taught us something equally valuable: agents are precision instruments for jeweler's tasks, not sledgehammers.

The name "Miracle Infrastructure" comes from the original project name. The miracle is that it works with zero dependencies.

This Repo is Alive

New skills get added as they prove themselves in daily use. If something works across multiple projects and multiple months, it earns a place here. If it stops being useful, it gets removed. The bar is real usage, not theoretical value.

What's coming: more packs, more field notes, better install experience. The best things I build end up here.

If this is useful to you, two things help me keep going:

Star on GitHub | Telegram Channel

Stars tell me someone finds this useful. The Telegram channel is where I write about AI, human-AI workflows, and the philosophy behind these tools.


License

MIT. Do whatever you want with it.
