Founder & Principal Architect at NehmeAI Labs
I build AI systems that actually work. Not the biggest models, but the right-sized ones.
Most production AI stacks are over-engineered. 70B models for tasks a 4B handles. JSON output where delimiters would do. GPT-4 for email classification that a fine-tuned 1B nails at 1/100th the cost.
I fix this. Architecture audits that cut inference costs 40-60%. Tools that prove you don't need frontier models for most prompts. Research that shows specialization beats scale.
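Here is a minimal Python sketch of what "delimiters would do" can look like. It assumes you prompt the model to answer on one line as `name|email|company` instead of emitting a JSON object. The field names, the `|` separator, and `parse_delimited` are hypothetical and only serve as an illustration. A field-count check stands in for JSON schema validation.

```python
# Hypothetical reply layout: name|email|company (illustrative, not a standard).
FIELDS = ("name", "email", "company")


def parse_delimited(reply: str, sep: str = "|") -> dict[str, str]:
    """Parse a one-line delimited model reply into a dict keyed by FIELDS.

    Raises ValueError on a field-count mismatch, the delimiter-world
    equivalent of a failed JSON schema check.
    """
    parts = [p.strip() for p in reply.strip().split(sep)]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {reply!r}")
    return dict(zip(FIELDS, parts))


print(parse_delimited("John Smith|john@example.com|Acme"))
# {'name': 'John Smith', 'email': 'john@example.com', 'company': 'Acme'}
```

Pick a separator that can't appear in the values, or escape it. That constraint is the price of dropping the braces, keys, and quotes.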
FlashCheck: Hallucination detection that actually works
A 4B model that hits 91.7% on RAGTruth, beating Llama 405B. Purpose-built verification > general-purpose giants.
FlashCheck-Nano (270M) and FlashCheck-Lite (1B) are open source.
RightSize: Stop overpaying for inference
Most prompts don't need frontier models. This tool proves it on your actual data. 50-100x cost savings.
LLM Sanity Checks: A practical guide to not over-engineering
Decision trees, benchmarks, anti-patterns. Before you reach for GPT-4, read this.
Can a regex solve it? → Use that. Stop.
Is it search/retrieval? → Try BM25 first. It's 20x faster.
Is the task simple? → 1B-8B model. Test it.
Actually complex reasoning? → Maybe frontier. But measure.
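Here is the checklist written as a routing function. This is a minimal Python sketch. It assumes the three yes/no answers come from you, since deciding whether a task is "simple" is the judgment call the checklist exists to force. `Tier` and `choose_tier` are hypothetical names. They are not part of RightSize or any published tool.

```python
from enum import Enum


class Tier(Enum):
    """Cheapest tool that can handle the task, in checklist order."""
    REGEX = "regex"
    BM25 = "BM25"
    SMALL_MODEL = "1B-8B model"
    FRONTIER = "frontier model (after measuring)"


def choose_tier(regex_solvable: bool, is_retrieval: bool, is_simple: bool) -> Tier:
    """Walk the checklist top to bottom and stop at the first 'yes'."""
    if regex_solvable:
        return Tier.REGEX
    if is_retrieval:
        return Tier.BM25
    if is_simple:
        return Tier.SMALL_MODEL
    # Only genuinely complex reasoning falls through to a frontier model.
    return Tier.FRONTIER


# A search-style task that no regex covers:
print(choose_tier(regex_solvable=False, is_retrieval=True, is_simple=False).value)  # BM25
```

Order matters. Each check is cheaper than the next, so the function returns at the first one that applies. This mirrors the "Stop." on the regex line.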
The JSON Tax: Everyone outputs JSON. But {"name": "John"} is 3x the tokens of John. At scale, that's real money.
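To put a number on "real money," here is a back-of-the-envelope Python sketch. Every input is a hypothetical assumption: 50 tokens of actual values per reply, the 3x JSON multiplier from the line above, 1M calls per day, and $10 per million output tokens. Plug in your own token counts and your provider's price. The sketch only counts output tokens, so any JSON schema you paste into the prompt adds more cost on the input side.

```python
def json_tax_usd(
    bare_tokens_per_call: int,
    json_multiplier: float,
    calls_per_day: int,
    usd_per_million_output_tokens: float,
    days: int = 30,
) -> float:
    """Extra output-token spend from wrapping answers in JSON instead of bare values."""
    # Tokens added per call beyond the bare values themselves.
    extra_per_call = bare_tokens_per_call * (json_multiplier - 1)
    extra_tokens = extra_per_call * calls_per_day * days
    return extra_tokens / 1_000_000 * usd_per_million_output_tokens


# Hypothetical workload; swap in your own numbers.
print(json_tax_usd(50, 3.0, 1_000_000, 10.0))  # 30000.0 (USD per 30 days)
```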
Specialization > Scale: FlashCheck-4B beats models 100x larger because it does one thing well. Your extraction task doesn't need a model trained on Shakespeare.
Measure First, Scale Never: I've never seen a production workload where none of the prompts could run on a smaller model. The share that can is usually 60-80%.
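Here is a minimal sketch of that measurement. It assumes you already have the smaller model's answers and a reference answer for the same prompts, either a gold label or your current frontier output. `routable_fraction` and `exact_match` are hypothetical helpers, not RightSize's API. The five labels are toy data for illustration, not a result.

```python
from typing import Callable, Sequence


def routable_fraction(
    small_answers: Sequence[str],
    reference_answers: Sequence[str],
    is_acceptable: Callable[[str, str], bool],
) -> float:
    """Share of prompts where the smaller model's answer is acceptable
    against the reference (gold label or current frontier output)."""
    if len(small_answers) != len(reference_answers):
        raise ValueError("answer lists must be aligned one-to-one")
    if not small_answers:
        raise ValueError("need at least one prompt to measure")
    ok = sum(is_acceptable(s, r) for s, r in zip(small_answers, reference_answers))
    return ok / len(small_answers)


def exact_match(small: str, ref: str) -> bool:
    # Fine for classification labels; free-text tasks need a task-specific check.
    return small.strip().lower() == ref.strip().lower()


# Toy data: the small model disagrees with the reference on one of five prompts.
small = ["spam", "billing", "support", "spam", "sales"]
ref = ["spam", "billing", "support", "ham", "sales"]
print(routable_fraction(small, ref, exact_match))  # 0.8
```

Exact match only suits classification-style outputs. Extraction or free-text tasks need their own acceptance check, plugged in as `is_acceptable`.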
- Global #1 on HackerRank in Python
- 9+ years building high-load backend systems
- Previously: Engineering Consultant at Dun & Bradstreet and Senior Software Engineer at Nykaa
- Built verification systems processing 2M+ daily requests
- Outperformed Microsoft Presidio, the SOTA baseline, by 10.19% F1 on PII-detection benchmarks
- Shipping specialized models that outperform frontier on narrow tasks
- Building tools that make right-sizing painless
- Writing about what actually matters in production AI
- Helping enterprises stop burning money on over-provisioned inference
Building AI and watching your inference bill climb? Let's talk.
nehmeailabs.com
LinkedIn
"That's not a flex. That's a $50K/month cloud bill waiting to happen." (on teams using GPT-4 for everything)