An AI-powered calorie tracking app that uses Google Gemini with structured outputs to parse natural language and calculate net calories.
Based on the article about testing LLM applications the right way - no mocks, just structure.
This app demonstrates how to:
- ✅ Use structured outputs with LLMs (Gemini)
- ✅ Write meaningful tests for non-deterministic systems
- ✅ Keep logic deterministic and testable
- ❌ Avoid the false confidence of mocked LLM tests

Features:
- Natural language calorie parsing
- Intelligent number parsing ("3k" → 3000)
- Structured JSON output via Gemini
- Real tests that actually catch regressions
- Deterministic math functions

Get a Gemini API key:
- Go to Google AI Studio
- Create a new API key
- Copy it

Install dependencies, configure the environment, and start the dev server:

```bash
npm install
cp .env.example .env
# Edit .env and add your GEMINI_API_KEY
npm run dev
```

Then open http://localhost:3000 in your browser!

The app features:
- 🎨 Beautiful modern UI with animated logo
- 🍔 Cute cherub mascot holding a burger
- 🔥 Real-time AI calorie analysis
- 📊 Visual breakdown of consumed vs burned calories

Try it from the command line:

```bash
npm run cli "I ate 10 donuts and binged Netflix"

# Explicit calories
npm run cli "I ate 3k calories but burned 500 running"

# Natural language
npm run cli "Had pizza and wings, then hit the gym for an hour"

# The classic BigBax user
npm run cli "Yesterday I ate 10 donuts, a few bags of hot chips, a sweet pickle and binged Netflix. Oh, and I also threw my remote at the TV a few times during the big game!"
```

Run the tests:

```bash
npm test

# AI tests only
npm run test:ai
```

✅ Structure-based tests (with real API calls):
- Schema validation (arrays contain numbers)
- Behavior (k notation, consumed vs burned)
- Sanity checks (nonsense → empty arrays)
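
The structure checks listed above can be sketched as a small type guard. This is a minimal illustration, not the project's actual test code: the `CalorieData` field names follow the schema shown in this README, while `isNumberArray`, `isCalorieData`, and `isEmptyResult` are hypothetical helper names introduced here.

```typescript
// Shape mirroring the schema fields named in this README.
interface CalorieData {
  consumedCalories: number[];
  burnedCalories: number[];
}

// Hypothetical helper: true only for an array whose every element is a finite number.
function isNumberArray(value: unknown): value is number[] {
  return (
    Array.isArray(value) &&
    value.every((n) => typeof n === "number" && Number.isFinite(n))
  );
}

// Hypothetical type guard: checks structure only, never the wording of a response.
function isCalorieData(value: unknown): value is CalorieData {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    isNumberArray(record["consumedCalories"]) &&
    isNumberArray(record["burnedCalories"])
  );
}

// Sanity check: nonsense input is expected to produce two empty arrays.
function isEmptyResult(data: CalorieData): boolean {
  return data.consumedCalories.length === 0 && data.burnedCalories.length === 0;
}
```

A real test would pass the parsed model response to `isCalorieData` and assert on the result, so the assertion holds no matter how the model phrases anything.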
✅ Deterministic tests (fast, free):
- Math calculations
- Edge cases (empty arrays, negatives)
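
As a rough sketch, the deterministic cases might be written as a table of inputs and expected results. `calculateNetCalories` is reproduced from the snippet later in this README; the `cases` table and the `failingCases` helper are hypothetical, and the negative-number case assumes the function sums entries as-is, which may differ from what the real `calorieUtils.ts` does.

```typescript
interface CalorieData {
  consumedCalories: number[];
  burnedCalories: number[];
}

// Reproduced from this README: net calories = total consumed - total burned.
function calculateNetCalories(data: CalorieData): number {
  const consumed = data.consumedCalories.reduce((a, b) => a + b, 0);
  const burned = data.burnedCalories.reduce((a, b) => a + b, 0);
  return consumed - burned;
}

// Hypothetical case table covering the math and edge cases listed above.
const cases: Array<{ name: string; input: CalorieData; expected: number }> = [
  { name: "empty arrays", input: { consumedCalories: [], burnedCalories: [] }, expected: 0 },
  { name: "consumed vs burned", input: { consumedCalories: [3000], burnedCalories: [500] }, expected: 2500 },
  { name: "burned exceeds consumed", input: { consumedCalories: [200], burnedCalories: [500] }, expected: -300 },
  // Assumption: negative entries are summed as-is rather than rejected.
  { name: "negative entry", input: { consumedCalories: [-100, 400], burnedCalories: [] }, expected: 300 },
];

// Returns the names of cases whose result differs from the expectation.
function failingCases(): string[] {
  return cases
    .filter((c) => calculateNetCalories(c.input) !== c.expected)
    .map((c) => c.name);
}
```

Because none of this touches the API, these checks can run on every commit at no cost.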
❌ What we DON'T test:
- Exact wording of responses
- Mocked LLM outputs

```
bigbax/
├── app/
│   ├── api/
│   │   └── track/
│   │       └── route.ts          # Next.js API route
│   ├── layout.tsx                # Root layout
│   ├── page.tsx                  # Main page component
│   └── globals.css               # Global styles
├── src/
│   ├── lib/
│   │   └── calorieAgent.ts       # Gemini integration with structured outputs
│   ├── utils/
│   │   └── calorieUtils.ts       # Deterministic math functions
│   ├── tests/
│   │   └── calorieAgent.test.ts  # Real tests, no mocks!
│   └── index.ts                  # CLI interface
├── public/
│   └── logo.svg                  # BigBax cherub logo
├── package.json
├── tsconfig.json
├── next.config.js
├── vitest.config.ts
└── README.md
```

Instead of parsing free-form text:

```typescript
// ❌ Bad: Free-form text
const response = await ai('parse this food');
const calories = parseInt(response.match(/\d+/)[0]); // Brittle!
```

Use schemas:

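```typescript
// ✅ Good: Structured output
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema'; // assumes the zod-to-json-schema package

const CalorieSchema = z.object({
  consumedCalories: z.array(z.number()),
  burnedCalories: z.array(z.number()),
});

// Simplified call; the real Gemini integration lives in src/lib/calorieAgent.ts
const response = await gemini.generate({
  responseSchema: zodToJsonSchema(CalorieSchema),
});
```

Mocked tests give false confidence:

```typescript
// ❌ This test is useless
jest.mock('./calorieAgent', () => ({
  calorieAgent: () => ({ consumed: [3000], burned: [500] }),
}));

// The callback must be async because it awaits the (mocked) agent
test('calculates calories', async () => {
  const result = await calorieAgent('anything');
  expect(result.consumed).toEqual([3000]); // Always passes!
});
```

Real tests catch real problems:
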
```typescript
// ✅ This test catches regressions
test('parses k notation', async () => {
  const parsed = await calorieAgent('I ate 3k calories');
  expect(parsed.consumedCalories).toContain(3000); // Real API call
});
```

Keep math outside the LLM:

```typescript
// ✅ Testable, predictable, fast
export function calculateNetCalories(data: CalorieData): number {
  const consumed = data.consumedCalories.reduce((a, b) => a + b, 0);
  const burned = data.burnedCalories.reduce((a, b) => a + b, 0);
  return consumed - burned;
}
```

LLM tests cost money. Run them strategically:
- ✅ After prompt changes
- ✅ Before production deploys
- ✅ Nightly schedules
- ❌ NOT on every PR commit
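
One way to enforce this schedule is to gate the AI tests behind an explicit opt-in. The sketch below is an assumption about how that could look, not how this project's `test:ai` script is actually wired: `RUN_AI_TESTS` is a hypothetical flag name and `shouldRunAiTests` a hypothetical helper. Only `GEMINI_API_KEY` comes from this README.

```typescript
// Hypothetical opt-in flag; this README does not define one.
const AI_TEST_FLAG = "RUN_AI_TESTS";

// Returns true only when an API key is present AND the caller opted in,
// so a plain test run on a PR never spends money by accident.
function shouldRunAiTests(env: Record<string, string | undefined>): boolean {
  const flag = (env[AI_TEST_FLAG] ?? "").trim().toLowerCase();
  const hasKey = (env["GEMINI_API_KEY"] ?? "").trim().length > 0;
  return hasKey && (flag === "1" || flag === "true");
}
```

In a Vitest suite, the result could be passed to something like `describe.skipIf(!shouldRunAiTests(process.env))`, with the nightly or pre-deploy job setting the flag.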

Use test filters:

```bash
# Only AI tests
npm run test:ai

# Everything else
npm test -- --testNamePattern="^(?!.*Calorie Agent).*"
```

This project implements concepts from the article about testing LLM applications properly.

Key Takeaways:
- ❌ Don't mock LLM outputs - they lie to you
- ✅ Force structured responses with schemas
- ✅ Keep logic deterministic
- ✅ Test behavior and structure, not phrasing
- ✅ Run tests selectively to control cost

License: MIT

Built with Gemini, Zod, and zero tolerance for mocked tests.
BigBax™ - Because your 0.5% equity depends on it.