ReasonForge

A web application that implements an evolutionary reasoning workflow using three LLM roles: Planner, Judge, and Chat.

Features

  • 🎯 Planner LLM: Generates 3 distinct solution plans with trade-off analysis
  • ⚖️ Judge LLM: Evaluates plans using evolutionary reasoning (crossover, mutation)
  • Synthesized Plan: The Judge generates an optimized plan combining the best aspects of all candidates
  • 💬 Chat Interface: Iterative refinement of the synthesized best plan
  • 🚀 OpenAI Responses API: Stateful conversations with no conversation-history management in the app
  • 🔌 Multi-Provider Support: Works with OpenAI, OpenRouter, Ollama, Groq, and more
  • Streaming Responses: Real-time token streaming with asyncLLM for instant feedback
  • 🎨 Efficient Rendering: Uses lit-html for minimal DOM updates
  • 🧪 Partial JSON Parsing: Renders incomplete results as they stream in
  • 🎯 Separate Models: Choose different models for Planner and Judge
  • 🔍 GPT-4/5 Filter: Only shows gpt-4.* and gpt-5.* models
  • 🌐 Custom Proxies: Text input for base URL supports any proxy
  • 💾 Persistent Settings: API keys and model selections saved to localStorage

Architecture

Technology Stack

  • Frontend: Vanilla JavaScript (ES modules)
  • Rendering: lit-html - Efficient, declarative templates
  • Styling: Bootstrap 5 (no custom CSS)
  • LLM Integration: asyncLLM (streaming), partial-json (progressive parsing), bootstrap-llm-provider (provider configuration)
  • API:
    • OpenAI: Responses API (/v1/responses) - Stateful conversations, no server required
    • Other Providers: Chat Completions API (/v1/chat/completions) - Universal compatibility

Project Structure

reasonforge/
├── index.html              # Main HTML file
├── script.js               # Entry point (imports modules)
├── js/                     # Modular JavaScript (~840 lines total, DRY)
│   ├── llm-service.js      # LLM configuration & API calls (287 lines)
│   ├── renderers.js        # lit-html rendering logic (202 lines)
│   └── ui-controller.js    # UI interactions & workflow (348 lines)
├── prompts/                # System prompts (markdown)
│   ├── planner.md          # Planner LLM prompt
│   ├── judge.md            # Judge LLM prompt
│   └── chat.md             # Chat LLM prompt
├── app.md                  # Original specification
└── README.md               # This file

Module Architecture

Separation of Concerns:

| Module | Responsibility | Lines | Key Functions |
| --- | --- | --- | --- |
| llm-service.js | LLM API layer | 287 | configureLLMProvider(), callPlannerLLM(), callJudgeLLM(), callChatLLM() |
| renderers.js | Pure rendering | 202 | renderPlans(), renderJudgeResults(), renderScoreBadge() |
| ui-controller.js | UI orchestration | 348 | generateAndEvaluate(), sendChatMessage(), init() |
| script.js | Entry point | 10 | Imports ui-controller |

Design Principles:

  • DRY: No repeated code, shared utilities extracted
  • Single Responsibility: Each module has one clear purpose
  • Loose Coupling: Modules communicate via clean interfaces
  • High Cohesion: Related functionality grouped together
  • ES Modules: Modern import/export syntax
  • Pure Functions: Rendering functions are side-effect free

Module Dependencies:

script.js (Entry)
    └── ui-controller.js (Orchestration)
            ├── llm-service.js (API & State)
            │   ├── asyncllm (Streaming)
            │   ├── partial-json (Parsing)
            │   └── bootstrap-llm-provider (Config)
            └── renderers.js (Pure UI)
                └── lit-html (Templates)

Data Flow:

User Input → ui-controller → llm-service → LLM API
                ↓                              ↓
          renderers ← ← ← ← ← ← ← ← Streaming Response
                ↓
            Browser DOM

Why lit-html?

lit-html provides several benefits over manual HTML string manipulation:

  1. Memory Efficient: Only updates DOM nodes that changed
  2. Performance: Faster re-renders with minimal overhead
  3. Security: Automatic XSS protection (except with unsafeHTML)
  4. Developer Experience: Clean template literal syntax
  5. No Virtual DOM: Direct DOM updates with intelligent diffing

Example:

// Before (string concatenation)
const markup = `<div>${escapeHtml(data)}</div>`;
container.innerHTML = markup;

// After (lit-html)
import { html, render } from "lit-html";

const template = html`<div>${data}</div>`;  // Auto-escaped
render(template, container);  // Only updates what changed

Streaming with asyncLLM

The app uses asyncLLM for real-time token streaming (a minimal sketch follows the benefits list below).

Benefits:

  • Instant Feedback: See results as they're generated
  • Better UX: No waiting for complete responses
  • Partial JSON: Plans/evaluations render progressively using partial-json
  • Error Recovery: Graceful handling of incomplete responses
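
A minimal sketch of such a streaming loop, assuming the asyncllm package's asyncLLM generator yields events whose content field holds the text accumulated so far, and that partial-json's parse tolerates truncated JSON (the real implementation lives in js/llm-service.js):

import { asyncLLM } from "asyncllm";     // bare specifiers resolved via the import map
import { parse } from "partial-json";

// Stream a completion and re-render on every chunk.
async function streamJSON(url, apiKey, payload, onUpdate) {
  for await (const { content } of asyncLLM(url, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify({ ...payload, stream: true }),
  })) {
    // partial-json accepts incomplete JSON, so half-finished plans render progressively.
    if (content) onUpdate(parse(content));
  }
}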

OpenAI Responses API vs Chat Completions API

ReasonForge intelligently uses different APIs based on your provider:

🚀 OpenAI Responses API (for api.openai.com)

Endpoint: /v1/responses

Key Benefits:

  • Stateful Conversations: Server maintains conversation context automatically
  • No History Management: After first message, only send new messages (not full history)
  • Lower Bandwidth: Subsequent messages are just the user input + previous_response_id
  • previous_response_id: Automatically tracked from each response's id field
  • Server-Side State: OpenAI remembers the full conversation - no client-side management

Request Format (First Message):

{
  "model": "gpt-4.1-mini",
  "input": [
    { "role": "system", "content": "..." },
    { "role": "user", "content": "..." }
  ]
}

Request Format (Follow-up Messages):

{
  "model": "gpt-4.1-mini",
  "input": "What about error handling?",
  "previous_response_id": "resp_abc123..."
}

How Session Works:

  1. First chat message sends full context (system prompt, problem, plan)
  2. Response includes an id field (e.g., "id": "resp_abc123...")
  3. Subsequent messages only send the new user input + previous_response_id
  4. OpenAI's server maintains the full conversation history, so nothing needs to be resent (a minimal sketch of this flow follows)
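
Put together, a stripped-down (non-streaming) version of that flow might look like this; the field names match the request formats above, everything else is illustrative:

let previousResponseId = null;  // tracked per chat session

async function sendChatTurn(apiKey, model, systemPrompt, userInput) {
  const body = previousResponseId
    ? { model, input: userInput, previous_response_id: previousResponseId }  // follow-up turn
    : { model, input: [{ role: "system", content: systemPrompt },           // first turn
                       { role: "user", content: userInput }] };

  const res = await fetch("https://api.openai.com/v1/responses", {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify(body),
  });
  const data = await res.json();

  previousResponseId = data.id;  // OpenAI keeps the history; the client only stores this id
  return data;                   // the generated text is inside data.output
}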

Note on JSON Output: The Responses API doesn't have a simple json_object mode like Chat Completions. While it supports json_schema (which requires a full schema definition), ReasonForge relies on prompt instructions to request JSON output, which works reliably with modern models.

🔌 Chat Completions API (for other providers)

Endpoint: /v1/chat/completions

Key Benefits:

  • Universal Compatibility: Works with OpenRouter, Ollama, Groq, Mistral, Together AI
  • Standard Format: Industry-standard API format
  • Stateless: Full control over conversation history

Request Format:

{
  "model": "gpt-4.1-mini",
  "messages": [
    { "role": "system", "content": "..." },
    { "role": "user", "content": "..." }
  ]
}

Automatic Detection: The app automatically detects your provider and uses the appropriate API format. When connected, the UI shows which API is being used.
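
A simplified sketch of that branching; the hostname check and helper name here are illustrative, and the real logic lives in js/llm-service.js:

// Choose endpoint and payload shape from the configured base URL (e.g. "https://api.openai.com/v1").
function buildRequest(baseUrl, model, messages) {
  const isOpenAI = new URL(baseUrl).hostname === "api.openai.com";
  return isOpenAI
    ? { url: `${baseUrl}/responses`, body: { model, input: messages } }   // Responses API
    : { url: `${baseUrl}/chat/completions`, body: { model, messages } };  // Chat Completions API
}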

Getting Started

Prerequisites

  • Modern web browser with ES module support
  • API key from one of the supported providers:

Setup

  1. Clone or download this repository

  2. Serve the files:

    # Using Python
    python -m http.server 8000
    
    # Using Node.js
    npx serve .
    
    # Using PHP
    php -S localhost:8000
  3. Open in browser: http://localhost:8000

  4. Configure LLM Provider:

    • Click the "Settings" button
    • Select or enter your provider's base URL (supports custom proxies)
    • Enter your API key
    • The app will fetch available models and save configuration to localStorage
  5. Select Models (optional):

    • Choose separate models for Planner and Judge LLMs
    • Only gpt-4.* and gpt-5.* models are shown
    • Selections are persisted to localStorage
  6. Start reasoning:

    • Enter a problem statement
    • Click "Generate Plans"

Usage

1. Enter Problem Statement

Describe your coding, algorithmic, or system design challenge:

Design a scalable real-time chat system that can handle 
1 million concurrent users with minimal latency and 
guaranteed message delivery.

2. Review Plans

The Planner generates 3 distinct approaches, each with:

  • Summary and steps
  • Assumptions and trade-offs
  • Scores (correctness, efficiency, complexity, maintainability)

3. Judge Evaluation

The Judge critiques each plan and:

  • Identifies strengths and weaknesses of each candidate
  • Suggests mutations (targeted improvements)
  • Proposes crossover (combining best aspects)
  • Generates a synthesized "best plan" that combines the strongest elements from all candidates

The synthesized plan includes the following fields (an illustrative shape follows this list):

  • Approach: High-level strategy
  • Steps: Detailed implementation steps
  • Key Decisions: Critical choices and rationale
  • Trade-offs Resolved: How conflicting priorities were addressed
  • Expected Outcomes: Success criteria
  • Implementation Notes: Practical guidance
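
Purely as an illustration, a synthesized plan carrying those fields might look like the object below; the actual key names are whatever prompts/judge.md instructs the model to emit, so treat this shape as an assumption:

// Hypothetical shape only: real field names come from the judge prompt, not from this sketch.
const synthesizedPlan = {
  approach: "Sharded WebSocket gateways with durable message queues",
  steps: ["Shard connections by user id", "Persist each message before fan-out", "Ack to sender after durable write"],
  keyDecisions: ["At-least-once delivery with client-side de-duplication"],
  tradeoffsResolved: ["Latency vs. guaranteed delivery: ack only after the write is durable"],
  expectedOutcomes: ["1M concurrent users with sub-second end-to-end latency"],
  implementationNotes: ["Start single-region; add geo-replication once the core path is stable"],
};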

4. Refine via Chat

Engage in conversation to:

  • Request implementation details
  • Adjust trade-offs
  • Generate code snippets
  • Explore alternative approaches

5. Export Results

Save your reasoning artifacts:

  • Markdown Report: Complete human-readable documentation with all plans, evaluations, and chat history
  • JSON Export: Structured data for programmatic access or future re-import

Click the export buttons in the results section header to download your analysis.

Export Functionality

ReasonForge allows you to export your reasoning artifacts in two formats:

Markdown Report (.md)

A comprehensive, human-readable document containing:

  • Problem statement and configuration
  • All 3 candidate plans with scores, steps, assumptions, and trade-offs
  • Judge evaluation with synthesized best plan
  • Evolution analysis (crossover, mutations, rationale)
  • Plan reviews with feedback and suggestions
  • Complete chat conversation history

Use cases:

  • Documentation and knowledge sharing
  • Version control (Git-friendly format)
  • Converting to PDF/HTML with tools like Pandoc
  • Sharing via email, Slack, or documentation systems

JSON Export (.json)

Complete structured data export containing:

  • All metadata (models, timestamps, provider)
  • Full planner and judge JSON responses
  • Chat history with timestamps
  • Application state

Use cases:

  • Programmatic access and automation
  • Data analysis and metrics
  • Future re-import feature (planned)
  • Integration with other tools (Jira, Confluence, etc.)

Both exports are available via buttons in the results section header after generating plans.
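
Both files are generated entirely in the browser; the usual Blob-and-object-URL pattern is enough to trigger the download (a minimal sketch, with the report assembly itself omitted):

// Turn an in-memory string into a downloaded file without any server round-trip.
function downloadFile(filename, text, mimeType) {
  const url = URL.createObjectURL(new Blob([text], { type: mimeType }));
  const link = Object.assign(document.createElement("a"), { href: url, download: filename });
  link.click();
  URL.revokeObjectURL(url);
}

// downloadFile("reasonforge-report.md", markdownReport, "text/markdown");
// downloadFile("reasonforge-export.json", JSON.stringify(exportData, null, 2), "application/json");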


Customization

Edit Prompts

All system prompts are in prompts/*.md for easy editing:

prompts/planner.md   # Controls plan generation
prompts/judge.md     # Controls evaluation criteria
prompts/chat.md      # Controls chat behavior

Changes take effect after page reload.
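
Because the prompts are fetched over HTTP at startup, opening index.html via file:// will fail (see Troubleshooting). A minimal sketch of that load, with illustrative names:

// Fetch the three role prompts once when the app boots.
async function loadPrompts() {
  const entries = await Promise.all(
    ["planner", "judge", "chat"].map(async (name) => {
      const res = await fetch(`prompts/${name}.md`);
      if (!res.ok) throw new Error(`Failed to load prompts/${name}.md`);
      return [name, await res.text()];
    }),
  );
  return Object.fromEntries(entries);  // { planner, judge, chat }
}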

Modify UI Colors

Edit the gradient colors in index.html:

<!-- Problem Section -->
<div class="card-header bg-gradient" 
     style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);">

<!-- Plans Section -->
style="background: linear-gradient(135deg, #4facfe 0%, #00f2fe 100%);">

<!-- Judge Section -->
style="background: linear-gradient(135deg, #fa709a 0%, #fee140 100%);">

<!-- Chat Section -->
style="background: linear-gradient(135deg, #43e97b 0%, #38f9d7 100%);">

Add More Providers

Update DEFAULT_BASE_URLS in script.js:

const DEFAULT_BASE_URLS = [
    { url: "https://api.example.com/v1", name: "My Provider" },
    // ... existing providers
];

How It Works

1. Planner Phase

graph LR
    A[Problem] --> B[Planner LLM]
    B --> C[Plan 1: Simple]
    B --> D[Plan 2: Optimal]
    B --> E[Plan 3: Innovative]

2. Judge Phase

graph TB
    A[All Plans] --> B[Judge LLM]
    B --> C[Critique Each]
    B --> D[Score Each]
    B --> E[Select Best Candidate]
    B --> F[Apply Crossover]
    B --> G[Apply Mutations]
    F --> H[Synthesized Plan]
    G --> H

3. Evolutionary Reasoning

The Judge applies evolutionary concepts to create a superior synthesized plan:

  • Population: Multiple diverse candidate plans (from Planner)
  • Fitness Evaluation: Scoring each candidate on multiple criteria
  • Selection: Identify the best base candidate
  • Crossover: Combine strengths from multiple plans (e.g., "Use Plan 1's caching strategy + Plan 2's error handling")
  • Mutation: Apply targeted improvements to fix identified weaknesses
  • New Generation: Output a synthesized plan that's better than any individual candidate

Example Evolution:

  • Plan 1: Simple but slow (60% score)
  • Plan 2: Fast but complex (70% score)
  • Plan 3: Memory-efficient but risky (65% score)
  • Synthesized: Fast (from Plan 2) + Simple patterns (from Plan 1) + Memory-efficient (from Plan 3) = 85% score

4. Chat Phase

Continuous refinement loop:

User Query → Chat LLM (with context) → Response → Refined Understanding
    ↑                                                      ↓
    └──────────────────────────────────────────────────────┘

Performance

  • Initial Load: ~100-200ms (CDN-hosted dependencies)
  • Prompt Loading: ~50ms (3 small markdown files)
  • API Calls: Depends on provider and model
  • DOM Updates: ~5-10ms (lit-html efficient diffing)
  • Memory: Minimal overhead from lit-html templates

Browser Support

  • Chrome/Edge: ✅ (90+)
  • Firefox: ✅ (90+)
  • Safari: ✅ (14+)
  • Opera: ✅ (76+)

Requires ES modules and import maps support.

Configuration & Data Persistence

localStorage Keys

The app stores configuration in your browser's localStorage:

| Key | Purpose | Data Stored |
| --- | --- | --- |
| reasonforge_llm_config | LLM provider settings | Base URL, API key, available models |
| reasonforge_model_selection | Model preferences | Selected Planner and Judge models |
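
A minimal sketch of reading and writing those keys; only the key names come from the table above, the stored shapes are assumptions:

const CONFIG_KEY = "reasonforge_llm_config";
const MODEL_KEY = "reasonforge_model_selection";

// Persist provider settings when the Settings dialog is saved.
const saveConfig = (config) => localStorage.setItem(CONFIG_KEY, JSON.stringify(config));

// Restore settings on page load; null means "not configured yet".
const loadConfig = () => JSON.parse(localStorage.getItem(CONFIG_KEY) ?? "null");

// Remember which models were picked for the Planner and Judge.
const saveModels = (planner, judge) =>
  localStorage.setItem(MODEL_KEY, JSON.stringify({ planner, judge }));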

Data Privacy

  • All data stays local: API keys and settings are stored in browser localStorage only
  • No server tracking: This is a client-side only application
  • Private conversations: Chat history is in-memory and cleared on page refresh
  • Secure by default: Configure your own LLM provider, no third-party intermediary

Clearing Settings

To reset all configuration:

// Open browser console (F12) and run:
localStorage.removeItem("reasonforge_llm_config");
localStorage.removeItem("reasonforge_model_selection");
// Refresh the page

Troubleshooting

"Failed to load prompts"

  • Ensure you're serving files via HTTP/HTTPS (not file://)
  • Check browser console for CORS errors
  • Verify prompts/*.md files exist

"LLM provider not configured"

  • Click Settings and configure a provider
  • Check API key validity
  • Verify provider URL is correct

Plans not generating

  • Check browser console for errors
  • Verify model supports JSON output
  • Try a different model (e.g., GPT-4)

Tooltips not showing

  • Tooltips initialize after render
  • Check Bootstrap JS is loaded
  • Verify elements have data-bs-toggle="tooltip" attributes (see the snippet below)
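
If they still don't show, re-initializing tooltips after each render with the standard Bootstrap 5 API usually fixes it (assuming the global bootstrap bundle is loaded):

// Bootstrap tooltips are opt-in: each element carrying data-bs-toggle="tooltip"
// needs a Tooltip instance, and freshly rendered elements need it again.
document.querySelectorAll('[data-bs-toggle="tooltip"]')
  .forEach((el) => new bootstrap.Tooltip(el));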

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

License

ReasonForge is released under the MIT License.

MIT License © 2025 Prudhvi
Permission is granted to use, copy, modify, merge, publish, distribute,
sublicense, and/or sell copies of this software, provided the copyright
notice and this permission notice appear in all copies.

Roadmap

  • Persistent configuration (localStorage)
  • Export plans as markdown/JSON
  • Save/load conversation sessions (import JSON)
  • Export plans as PDF
  • Multiple iterations (generation N → feedback → generation N+1)
  • Visualization of plan evolution
  • Custom scoring criteria
  • Plan comparison view
  • Collaborative mode (multiple users)
  • Integration with code editors
