Most B2B companies write copy that disappears in a sea of sameness. The work is to make it fresh, valuable, real, and outstanding.
This skill runs A/B tests autonomously until your messaging lands.
It runs an autonomous loop that generates copy variants, tests them against real business metrics, and learns what works, all without you touching it. Every cycle makes the next one smarter. You sleep; your sales copy improves.
Andrej Karpathy (co-founder of OpenAI) built autoresearch, an autonomous ML research system where an LLM agent runs overnight, makes changes, measures results, and iterates without human intervention.
We took that core architecture and developed it further for a completely different domain: go-to-market execution.
Where autoresearch optimizes neural network architectures against validation loss, the Self-Improving GTM Lab optimizes your outbound emails, YouTube titles, and social hooks against the metrics that actually pay your bills: open rates, click-through rates, and reply rates.
| Karpathy's autoresearch | Self-Improving GTM Lab |
|---|---|
| train.py (model code) | Copy variants (subject lines, titles, hooks) |
| Validation loss | Open rate, CTR, engagement rate |
| 5-minute experiment | 24-72 hour measurement window |
| Single LLM judge | 3 specialized judges (Ogilvy, Seven Critics, Rule Compliance) |
| Git revert on failure | Failures archived for learning |
| One optimization target | Three parallel tracks (email, YouTube, social) |
- program.md: the human writes the research agenda, the agent executes it
- results.tsv: every experiment logged; the agent reads the history to inform its next hypothesis
- LLM as the search algorithm: no hardcoded grid search. The LLM reasons about what to try next
- Autonomous loop: runs without intervention until interrupted
- Three specialized judges instead of one: Ogilvy (persuasion analysis), Seven Critics (7 hostile reader personas), Rule Compliance (brand + constraint checks)
- Real-world business metrics instead of validation loss: actual open rates from Hunter.io, actual CTR from YouTube, actual engagement from LinkedIn
- Three parallel optimization tracks instead of one: email subject lines, YouTube titles, social hooks
- 24-72 hour measurement windows instead of 5-minute experiments: because real humans need real time to open emails and click links
- Failure as data instead of git revert: losing variants get archived and analyzed. Sometimes the "worst" variant teaches you the most
1. READ: program.md (your objectives), results.tsv (past experiments), brand rules
2. GENERATE: 10 variants of the target asset
3. EVALUATE: score each through 3 LLM judges
4. RANK: pick top 2 for A/B deployment
5. DEPLOY: send to the channel (Hunter.io, YouTube, LinkedIn)
6. MEASURE: pull real metrics after 24-72 hours
7. LEARN: log results, update program.md with new insights
8. REPEAT: use learnings to inform next generation cycle
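The GENERATE, EVALUATE, and RANK steps can be sketched in a few lines of Python. Everything here is illustrative: the stub judges return random scores, whereas the real system calls LLM judges and real channels.

```python
import random

def generate_variants(n=10):
    # 2. GENERATE: in the real system an LLM produces the copy variants
    return [f"variant-{i}" for i in range(n)]

def judge_scores(variant):
    # 3. EVALUATE: three judges, each returning a 0-1 score
    # (random stand-ins here; real judges are LLM calls)
    return {"ogilvy": random.random(),
            "seven_critics": random.random(),
            "rule_compliance": random.random()}

def run_cycle():
    variants = generate_variants()
    scored = [(v, sum(judge_scores(v).values()) / 3) for v in variants]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    top_two = [v for v, _ in scored[:2]]   # 4. RANK: keep the best two
    # 5-7. DEPLOY / MEASURE / LEARN happen over the 24-72h window
    return top_two

print(run_cycle())  # the two best-scoring variants for A/B deployment
```

The point of the structure: the loop body never needs a human in it, which is what lets the cycle repeat overnight.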
Optimizes outbound email subject lines. Generates variants within brand constraints (max 3 words, no spam triggers, natural language only). Scores through three judges. Deploys A/B to campaign segments. Measures open rates after 48h. Feeds winners back into the next cycle.
Your worst-performing subject line last quarter becomes impossible to repeat.
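A constraint check like the one described (max 3 words, no spam triggers) is a simple gate before judging. The trigger list below is a small illustrative sample, not the skill's real list.

```python
# Illustrative spam-trigger sample; the real brand rules live in program.md.
SPAM_TRIGGERS = {"free", "guarantee", "winner", "urgent", "act now"}

def passes_constraints(subject: str) -> bool:
    if len(subject.split()) > 3:          # max 3 words
        return False
    lowered = subject.lower()
    if any(t in lowered for t in SPAM_TRIGGERS):  # no spam triggers
        return False
    return True

print(passes_constraints("Quick question"))    # True
print(passes_constraints("Act now, winner!"))  # False
```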
Generates 10 title options per episode. Scores against historical performance data. Publishes the best. Evaluates CTR and views after 48h. Swaps the title if it underperforms the baseline.
Your titles stop being gut feel and start being data-driven.
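The 48-hour swap decision reduces to one comparison: if the published title's CTR underperforms your channel baseline, fall back to the runner-up. Names and numbers below are illustrative.

```python
def maybe_swap_title(current_ctr: float, baseline_ctr: float,
                     current: str, runner_up: str) -> str:
    # Swap only when the live title trails the historical baseline.
    if current_ctr < baseline_ctr:
        return runner_up
    return current

# Live title at 3.1% CTR vs a 4.5% baseline: swap in the runner-up.
print(maybe_swap_title(0.031, 0.045, "Current title", "Runner-up title"))
```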
Creates opening hooks for LinkedIn posts and YouTube community posts. Measures engagement rate (likes + comments / impressions). Learns which patterns drive interaction in your specific audience.
You stop copying what worked for someone else and start learning what works for you.
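The engagement metric used above is exactly (likes + comments) / impressions; as a sanity check:

```python
def engagement_rate(likes: int, comments: int, impressions: int) -> float:
    # (likes + comments) / impressions, guarding against zero impressions
    if impressions == 0:
        return 0.0
    return (likes + comments) / impressions

print(engagement_rate(42, 8, 1000))  # 0.05, i.e. 5% engagement
```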
/self-improving subject-lines # Run the Subject Line Lab
/self-improving youtube-titles # Run the YouTube Title Optimizer
/self-improving social-hooks # Run the Social Hook Generator
/self-improving report # View results and learnings
/self-improving generate-only # Dry run: generate and score without deploying
self-improving/
README.md # This file
program.md # Research agenda and constraints (you edit this)
results.tsv # Experiment log (the agent writes this)
Dead simple. Two files do all the work. program.md is where you tell the system what to optimize. results.tsv is where it logs what it learned. You review when you want.
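Since results.tsv is plain tab-separated text, logging an experiment is one append. The column layout below is an assumption for illustration; the actual log format may differ.

```python
import csv
from datetime import date

def log_result(path, track, variant, metric, value):
    # Append one experiment row: date, track, variant, metric name, value.
    # Columns here are hypothetical, not the skill's real schema.
    with open(path, "a", newline="") as f:
        csv.writer(f, delimiter="\t").writerow(
            [date.today().isoformat(), track, variant, metric, value])

log_result("results.tsv", "email", "Quick question", "open_rate", 0.41)
```

Plain TSV is the design choice that matters: the agent, a spreadsheet, and you can all read the same history.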
B2B founders and sales teams who are tired of guessing what copy works. If you send outbound emails, publish YouTube content, or post on LinkedIn, and you want those activities to compound instead of flatline, this is for you.
We built this at Strategy Sprints, where we help B2B companies close more deals at the prices they deserve. 20 years, 40+ countries, enterprise grade.
The Self-Improving GTM Lab is one of 47 AI skills we run daily. It's part of a system that took our own prospecting, content, and sales execution from manual to autonomous.
I'd love to walk you through how this works on your actual data, for free.
Pick your best time: calendly.com/simonseverino/coffee-with-simon
Or explore the full system:
- /leverage: find the 3-7 highest-impact moves
- /prospectingwork: 95% work, 5% ask
- /ogilvy: analyze any copy through David Ogilvy's lens
- /sevencritics: 7 hostile reader personas stress-test your copy
- /deals: daily deal advancement
- /briefme: pre-call intelligence in 30 seconds
Simon Severino, CEO of Strategy Sprints. Author of "Strategy Sprints" and "Time Freedom" (with Jay Abraham). 20 years helping B2B companies sell more at the prices they deserve.
keep rolling, Simon & The Sprinters