The "Holy Grail" of System Prompts. A synthesis of the world's best AI IDE instructions, distilled by Gemini Pro & Claude, stress-tested against real-world expert setups, and continuously refined by Obvious Works.
- Introduction & Mission
- Why Prompts Matter in Agentic Coding
- The Methodology: How We Built It
- The Showdown: Gemini vs. Claude
- The Analysis: APEX vs. THE ARCHITECT
- Round 2: APEX vs. The Real World
- The Final Result: APEX Hybrid
- Installation & Usage
- Credits & Acknowledgments
We are at the dawn of a new era in software development: Agentic Coding. Tools like Cursor, Windsurf, and Cline are revolutionizing how code is written. However, the quality of the output is only as good as the system instructions (prompts) guiding the model.
At Obvious Works, we asked ourselves: What happens if you aggregate the collective intelligence of the world's best AI coding tools and force them to compete against each other?
We undertook the effort to sift through hundreds of "Master Prompts," analyze them, and reverse-engineer a Meta-Master-Prompt that combines every advantage while eliminating the weaknesses.
This repository documents that process — including a second iteration where we benchmarked our own APEX prompt against the real-world setup of one of the most respected practitioners in the field.
In "Agentic Software Development," the LLM is no longer just a chatbot; it is an autonomous agent reading file systems, executing commands, and debugging errors.
Without precise System Prompts, models tend to exhibit:
- Laziness: Omitting code blocks or using placeholders like `// implementation here`.
- Hallucinations: Inventing syntax or referencing non-existent libraries.
- Context Drift: Losing track of the overall project architecture.
- Amnesia: Repeating the same mistakes across sessions with no self-correction.
A high-quality prompt acts as a Cognitive Operating System for the agent. It defines boundaries, enforces planning phases (Chain of Thought), and guarantees code quality that meets industrial standards.
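As a minimal illustration of such a boundary-setting rule (the wording here is our own sketch, not taken from any specific tool's prompt):

```xml
<planning_protocol>
Before writing any code, output a numbered plan of the steps you will take.
Do not emit code until the plan is complete.
Never use placeholders such as "// implementation here"; always write the full implementation.
</planning_protocol>
```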
Our approach was radically data-driven. Here is the workflow we executed to create this prompt:
- Data Mining: We utilized the incredible repository system-prompts-and-models-of-ai-tools as our ground truth.
- Aggregation: Using Google Antigravity, we extracted every available master prompt and consolidated them into a single massive text file (approximately 1.5 MB of plain text).
- Meta-Analysis Prompting: We asked Google Antigravity to write a specific analysis prompt designed to deconstruct this massive dataset.
- The LLM Analysis:
- Run 1: Uploaded the 1.5 MB dataset to Google Gemini Pro with instructions to extract the ultimate Master Prompt.
- Run 2: Executed the exact same process with Claude Opus (Thinking Model + Knowledge Base).
The Goal: To generate two competing visions of the "Perfect Prompt."
The results from the two models were fascinatingly distinct, representing two different philosophies of "High-Level Prompt Engineering." We then fed these two resulting prompts back into Gemini Pro 1.5 to run a comparative analysis and final synthesis.
Here are the contenders:
- Prompt 1 (Codename: APEX): Generated by the Gemini lineage. Focuses on strict structure and XML.
- Prompt 2 (Codename: THE ARCHITECT): Generated by the Claude lineage. Focuses on cognitive processes and reasoning.
Here is the summary of the comparative analysis that led to our first synthesis:
- APEX: Utilizes a strict XML Tag Structure (`<system_identity>`, `<core_mandates>`).
  - Advantage: Modern LLMs (Claude 4.5, GPT-5x) excel with XML. It sets clear semantic boundaries, allowing the model to know exactly where a rule begins and ends.
- THE ARCHITECT: Uses classic Markdown headers.
  - Disadvantage: These are "softer" boundaries compared to hard XML tags, leading to occasional instruction bleed.
- APEX: Describes a cognitive framework but does not enforce a specific output format for thinking.
- THE ARCHITECT: Mandates an `<architect_thought>` XML tag before every action.
  - Advantage: The Killer Feature. When a model is forced to write out its plan before generating code, error rates in complex logic tasks drop drastically.
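For illustration, a forced reasoning block might look like this (only the tag name comes from THE ARCHITECT; the contents are a hypothetical example):

```xml
<architect_thought>
Goal: add input validation to the login endpoint.
Plan: 1) locate the request handler, 2) add schema checks, 3) extend the existing tests.
Risk: breaking the current session flow. Mitigation: keep the response shape unchanged.
</architect_thought>
```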
Both prompts have strong protocols against "lazy coding."
- APEX: More detailed regarding Naming Conventions, Testing, and Security protocols.
- THE ARCHITECT: Extremely strong on editing mechanics (Context Matching), which prevents hallucinations during "Search & Replace" operations.
- APEX: Features `<response_mode_adaptation>`. It distinguishes between "Lightweight Mode" (Chat) and "Full Engineering Mode."
- THE ARCHITECT: Is always in "Full Mode," which can create overhead for simple queries.
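A sketch of what such a mode switch could look like in prompt form (illustrative wording, not the actual APEX text):

```xml
<response_mode_adaptation>
If the request is a short question or explanation: answer directly, with no plan and no file edits (Lightweight Mode).
If the request changes code: enter Full Engineering Mode with a plan, implementation, and verification steps.
</response_mode_adaptation>
```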
After publishing APEX v1, we ran a second experiment — this time benchmarking our synthesized prompt against the actual CLAUDE.md setup used by Boris Cherny, the creator of Claude Code at Anthropic.
Boris's setup is radically different from APEX: it is roughly 10x shorter (~250 words vs. ~2,500), but laser-focused on six principles that address real failure modes in long-running agentic sessions.
| Capability | APEX v1 | Boris's claude.md |
|---|---|---|
| Subagent Orchestration | ❌ Absent | ✅ Explicit strategy |
| Persistent Task Tracking | ❌ In-context only | ✅ tasks/todo.md |
| Self-Improvement Loop | ❌ None | ✅ tasks/lessons.md |
| Re-planning on Failure | ❌ Absent | ✅ Explicit STOP signal |
| Elegance Check | ❌ Not enforced | ✅ Built into workflow |
| Autonomous Bug Fixing | ❌ Absent | ✅ Explicit |
| Capability | APEX v1 | Boris's claude.md |
|---|---|---|
| Tool Usage Rules | ✅ Detailed | ❌ None |
| Security & Privacy | ✅ Explicit | ❌ None |
| Naming Conventions | ✅ Concrete examples | ❌ Abstract only |
| Response Mode Adaptation | ✅ Lightweight vs. Full | ❌ Always full mode |
| Communication Protocol | ✅ With anti-patterns | ❌ Not defined |
| Cognitive Framework | ✅ `<cognitive_thought>` | ❌ Not enforced |
Boris's prompt wins on adaptivity and persistence — the agent learns from mistakes and externalizes state into files that survive context resets. APEX v1 wins on completeness and precision — but that completeness comes at a cost: cognitive overload, where critical rules get diluted inside 2,500 words.
Neither prompt alone is optimal. The hybrid is.
APEX_AGENT.md is the result of both synthesis rounds. It is a chimera that takes the best of three sources:
- The Skeleton of APEX v1: Tool rules, security, naming conventions, response modes, cognitive framework.
- The Brain of THE ARCHITECT: Mandatory `<cognitive_thought>` tags for enforced reasoning before every action.
- The Operational Discipline of Boris: Subagent strategy, persistent task files, self-improvement loop, explicit STOP signals, elegance checks, and autonomous bug fixing.
- `tasks/todo.md` — Every non-trivial task starts with a written plan, checked in before implementation, marked complete on delivery.
- `tasks/lessons.md` — After any correction, the agent writes a rule preventing that mistake. Reviewed at the start of every session.
- Subagent Strategy — Main context window stays clean. Research, exploration, and parallel analysis are offloaded.
- Elegance Gate — Before presenting any non-trivial fix: "Is there a more elegant solution?"
- STOP Signal — If something goes sideways, the agent stops and re-plans. No more pushing through broken logic.
- Autonomous Bug Fixing — Given a bug report, the agent fixes it. No hand-holding, no confirmation loops.
- 40% shorter than APEX v1 — same coverage, higher signal density.
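To make the self-improvement loop concrete, a `tasks/lessons.md` entry might look like this (the file name comes from the setup above; the entry contents, including the file names mentioned in it, are a hypothetical example):

```markdown
## Lesson: 2025-01-15
Mistake: edited a generated configuration file directly instead of its template.
Rule: configuration changes always go through the template file; never touch generated output.
```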
A great agent prompt is not a legal document. It is a set of habits — precise enough to be followed, short enough to be remembered, and adaptive enough to improve over time.
This prompt is optimized for use in Cursor (.cursorrules), Windsurf, Cline, or as a CLAUDE.md file for Claude Code.
- Copy the content of `APEX_AGENT.md` from this repo.
- Create or open `.cursorrules` in your project root.
- Paste the content and save.
- Place `APEX_AGENT.md` in your project root and rename it `CLAUDE.md`.
- Claude Code will automatically read it as its operating instructions.
- Paste the prompt into the "System Instructions" or "Project Instructions" field.
On the first run, ask your agent to initialize the task management structure:
```
Create tasks/todo.md and tasks/lessons.md in this project.
```
The agent will maintain these files autonomously from that point forward.
This project stands on the shoulders of giants.
- Analysis & Synthesis: The team at Obvious Works.
- Original Data Source: A massive thank you to x1xhlol for the foundational dataset that made our meta-analysis possible.
- Round 2 Reference: Thanks to Boris Cherny — creator of Claude Code at Anthropic — for sharing his real-world `CLAUDE.md` setup, which exposed the critical gaps in APEX v1 around persistence, subagents, and self-improvement.
Disclaimer: This prompt is powerful and detailed. It may increase token costs per session, but it delivers consistent Senior Developer-level output and — uniquely — gets measurably better over time through its self-improvement loop.