
TheAIHorizon/Consilium


🏛️ Consilium

Consilium Logo

A quiver of methods for seeking truth through AI councils.
v1.2.0

Query multiple AI models simultaneously. Compare, debate, verify, refine, and synthesize their responses.

Skip to Install · Docker

Consilium Forum Mode

📊 More Screenshots

Analytics Dashboard

Track model performance, costs, and usage patterns.

Analytics Dashboard

Evaluation Panel

See detailed rankings, scores by criteria, and AI-generated analysis.

Evaluation Panel

Philosophy · The Quiver · Quick Start · Docker · Configuration


🎯 Philosophy

The Core Insight

There is no single method that reliably produces truth.

But there are appropriate methods for different kinds of questions.

Consilium (Latin for "council" or "deliberation") is an epistemological framework instantiated in software. Different questions require different methods of inquiry. Each mode is an "arrow" in your quiver, designed for a specific epistemic target.

Why Multiple Models?

No single AI has all the answers. Each LLM has different training data, reasoning approaches, and blind spots. What one model gets wrong, another might get right.

| Challenge | How Consilium Helps |
|---|---|
| AI hallucinations | Cross-check answers across multiple models (Veritas mode) |
| Model bias | Anonymous deliberation removes reputation bias (Consensus, Arbitrium) |
| Finding the right model | Blind preference voting reveals true preferences (Arbitrium) |
| Complex decisions | Structured debate surfaces all arguments (Debate, Elenchus) |
| Quality output | Sequential refinement polishes content (Limatura) |
| Comprehensive answers | Synthesize insights from multiple sources (Synthesis) |

Methodological Pluralism

This is methodological pluralism — the philosophical position that different domains of inquiry require different approaches:

| Question Type | Method | Consilium Mode |
|---|---|---|
| Factual claims | Verification | Veritas |
| Complex trade-offs | Dialectic | Debate |
| Bias reduction | Deliberation | Consensus, Arbitrium |
| Quality assessment | Cross-examination | Analysis, Elenchus |
| Capability testing | Empiricism | Peira |
| Comprehensive coverage | Integration | Synthesis |
| Quality improvement | Iteration | Limatura |

🏹 The Quiver (12 Modes)

Consilium provides 12 distinct modes — each an arrow designed for a different target:

Mode Overview

| Mode | Shortcut | Purpose | When to Use |
|---|---|---|---|
| Forum | Ctrl+1 | Compare & Judge | General questions, find the best answer |
| Debate | Ctrl+2 | Round-Robin Discussion | Complex topics with trade-offs |
| Consensus | Ctrl+3 | Anonymous Deliberation | Bias-reduced conclusions |
| Analysis | Ctrl+4 | Multi-Judge Critique | Deep evaluation of one answer |
| Synthesis | Ctrl+5 | Combine into One | Comprehensive coverage needed |
| Analytics | Ctrl+6 | Performance Stats | Review usage and costs |
| Peira | Ctrl+7 | Capability Testing | Benchmark model abilities |
| Elenchus | Ctrl+8 | Adversarial Red Team | Stress-test code/ideas |
| Versus | Ctrl+9 | Local vs Commercial | Compare local to cloud models |
| Arbitrium | Ctrl+0 | Blind Preference Vote | Discover true preferences |
| Veritas | Ctrl+- | Fact Check & Verify | Detect hallucinations |
| Limatura | Ctrl+= | Iterative Polish | Refine through multiple passes |
| Prompting | Ctrl+G | Prompting Guide | Learn effective prompting techniques |

📖 Detailed Mode Guide

1. Forum Mode (Ctrl+1)

Latin: "forum" — public place of discussion

All selected models answer your question simultaneously, then an AI judge ranks them.

┌─────────────────────────────────────────────────────────┐
│                    YOUR QUESTION                        │
└───────────────────────┬─────────────────────────────────┘
                        │
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
   ┌─────────┐    ┌─────────┐    ┌─────────┐
   │ Model A │    │ Model B │    │ Model C │
   └────┬────┘    └────┬────┘    └────┬────┘
        │              │              │
        ▼              ▼              ▼
┌─────────────────────────────────────────────────────────┐
│      SIDE-BY-SIDE COMPARISON                            │
│   + Blind Evaluation (judges see "Response A" not names)│
└─────────────────────────────────────────────────────────┘

Best for: General questions, comparing writing styles, finding the best model for your use case

Features:

  • Real-time streaming responses
  • Blind evaluation (prevents model reputation bias)
  • Follow-up questions with context
  • Auto-evaluation ranks responses when complete
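The fan-out-then-blind flow can be sketched roughly as follows. This is an illustrative sketch, not Consilium's actual source; `forumRound` and `askModel` are hypothetical names, and `askModel` stands in for whatever API adapter you use (e.g. a call to OpenRouter's chat completions endpoint).

```javascript
// Sketch: query all models concurrently, then anonymize the answers so a
// judge sees only "Response A", "Response B", ... (model names withheld).
// `askModel(model, prompt)` is a hypothetical adapter returning a string.
async function forumRound(models, prompt, askModel) {
  // Fan out: every model answers the same prompt at the same time.
  const answers = await Promise.all(models.map((m) => askModel(m, prompt)));

  // Blind the results: the judge gets neutral labels only; the label-to-model
  // mapping is kept private until after ranking.
  return answers.map((text, i) => ({
    label: `Response ${String.fromCharCode(65 + i)}`, // 65 = "A"
    model: models[i], // kept hidden from the judge
    text,
  }));
}
```

Because `Promise.all` preserves input order, the neutral labels line up with the original model list, which makes the post-judging reveal straightforward.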

2. Debate Mode (Ctrl+2)

Latin: "de-" + "battuere" — to beat down, contend

A structured multi-round discussion where models build on each other's ideas.

┌─────────────────────────────────────────────────────────┐
│                    YOUR TOPIC                           │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│ ROUND 1                                                 │
│ Model A → Model B → Model C (sees all previous)         │
└───────────────────────┬─────────────────────────────────┘
                        ▼
┌─────────────────────────────────────────────────────────┐
│ ROUND 2                                                 │
│ Model A → Model B → Model C (builds on Round 1)         │
└───────────────────────┬─────────────────────────────────┘
                        ▼
┌─────────────────────────────────────────────────────────┐
│           AUTOMATIC CONSENSUS SUMMARY                   │
└─────────────────────────────────────────────────────────┘

Best for: Complex topics with trade-offs, controversial questions, exploring all sides

How to use:

  1. Select 2+ Participants
  2. Set number of Rounds (1-5)
  3. Models discuss round-robin, building on previous responses
  4. Automatic Consensus Summary generated at the end
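The round-robin mechanics behind those steps can be sketched like this. The shape is assumed for illustration: `debateRounds` and `askModel` are hypothetical names, and the prompt wording is invented.

```javascript
// Sketch: each model speaks in turn; every turn sees the full transcript so
// far, so later speakers build on earlier ones. After the final round, the
// transcript could be handed to a summarizer model for the consensus summary.
async function debateRounds(models, topic, rounds, askModel) {
  const transcript = []; // entries: { round, model, text }
  for (let round = 1; round <= rounds; round++) {
    for (const model of models) {
      const soFar = transcript
        .map((t) => `[Round ${t.round}] ${t.model}: ${t.text}`)
        .join('\n');
      const prompt = `Topic: ${topic}\n\nDiscussion so far:\n${soFar || '(none yet)'}\n\nAdd your contribution.`;
      const text = await askModel(model, prompt);
      transcript.push({ round, model, text });
    }
  }
  return transcript;
}
```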

3. Consensus Mode (Ctrl+3)

Latin: "consensus" — agreement, harmony

Models deliberate anonymously over multiple rounds to find where they agree.

┌─────────────────────────────────────────────────────────┐
│                    YOUR QUESTION                        │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│ ROUND 0 - INITIAL POSITIONS                             │
│ Each model answers independently                        │
│ Responses anonymized: Position A, B, C, D...            │
└───────────────────────┬─────────────────────────────────┘
                        ▼
┌─────────────────────────────────────────────────────────┐
│ ROUNDS 1-3 - DELIBERATION                               │
│ Each model sees ALL anonymized positions                │
│ (but NOT who said what - prevents bias)                 │
│ Task: Consider others, identify agreements/disputes,    │
│       refine position, move toward consensus            │
└───────────────────────┬─────────────────────────────────┘
                        ▼
┌─────────────────────────────────────────────────────────┐
│ FINAL - ARBITER SYNTHESIS                               │
│ ✅ Consensus answer (if agreement reached)              │
│ OR                                                      │
│ 📊 Summary: What they agree on + What remains disputed  │
└─────────────────────────────────────────────────────────┘

Best for: Reducing model bias, finding fundamental agreements, cross-validated answers

Key difference from Debate: Models don't know who said what during deliberation, preventing "I agree with GPT because it's GPT" bias.
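The anonymization step might look something like the sketch below. This is illustrative only; the function names and data shapes are assumptions, not the project's actual code.

```javascript
// Sketch: relabel responses as "Position A", "Position B", ... in shuffled
// order. The label-to-model key stays private to the orchestrator, so models
// cannot tell who said what during deliberation.
function anonymizePositions(responses, shuffle = fisherYates) {
  const order = shuffle([...responses.keys()]); // permuted array indices
  const labeled = [];
  const key = {}; // private: label -> real model name
  order.forEach((origIndex, i) => {
    const label = `Position ${String.fromCharCode(65 + i)}`; // A, B, C, ...
    labeled.push({ label, text: responses[origIndex].text });
    key[label] = responses[origIndex].model;
  });
  return { labeled, key };
}

// In-place Fisher-Yates shuffle so labels do not correlate with model order.
function fisherYates(arr) {
  for (let i = arr.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [arr[i], arr[j]] = [arr[j], arr[i]];
  }
  return arr;
}
```

Reshuffling each round would further prevent models from tracking a position's author across rounds.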


4. Analysis Mode (Ctrl+4)

Greek: "ἀνάλυσις" (analysis) — breaking up, investigation

One model answers, multiple analysts evaluate the response from different perspectives.

┌─────────────────────────────────────────────────────────┐
│                    YOUR QUESTION                        │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│              ANSWERER MODEL RESPONDS                    │
└───────────────────────┬─────────────────────────────────┘
                        │
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
   ┌──────────┐   ┌──────────┐   ┌──────────┐
   │ Analyst 1│   │ Analyst 2│   │ Analyst 3│
   │ Evaluates│   │ Evaluates│   │ Evaluates│
   └────┬─────┘   └────┬─────┘   └────┬─────┘
        │              │              │
        ▼              ▼              ▼
┌─────────────────────────────────────────────────────────┐
│    MULTI-PERSPECTIVE EVALUATION & SCORING               │
└─────────────────────────────────────────────────────────┘

Best for: Deep critique, understanding strengths/weaknesses, academic review


5. Synthesis Mode (Ctrl+5)

Greek: "σύνθεσις" (synthesis) — putting together

Multiple models answer, one synthesizer combines the best parts into a unified response.

┌─────────────────────────────────────────────────────────┐
│                    YOUR QUESTION                        │
└───────────────────────┬─────────────────────────────────┘
                        │
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
   ┌─────────┐    ┌─────────┐    ┌─────────┐
   │ Source 1│    │ Source 2│    │ Source 3│
   │ Answers │    │ Answers │    │ Answers │
   └────┬────┘    └────┬────┘    └────┬────┘
        │              │              │
        └──────────────┼──────────────┘
                       ▼
         ┌─────────────────────────┐
         │   SYNTHESIZER MODEL     │
         │ Combines all responses  │
         │  into unified answer    │
         └─────────────────────────┘

Best for: Research requiring comprehensive coverage, combining expertise, unified summaries


6. Peira Mode (Ctrl+7)

Greek: "πεῖρα" (peira) — trial, experiment, test

Systematically test what models can and cannot do with structured benchmarks.

┌─────────────────────────────────────────────────────────┐
│              SELECT TEST CATEGORY                       │
│  [Coding] [Math] [Reasoning] [Knowledge] [Creative]     │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│              SELECT MODELS TO TEST                      │
│  □ Claude Sonnet 4.5    □ GPT-5.2    □ Gemini 3 Pro     │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│              STRUCTURED TEST BATTERY                    │
│  Each model receives identical test prompts             │
│  for fair comparison                                    │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│              CAPABILITY REPORT                          │
│  ┌─────────────────────────────────────────────────┐   │
│  │ Model          │ Score │ Speed │ Style        │   │
│  │ Claude Sonnet  │  92%  │ 45t/s │ Detailed     │   │
│  │ GPT-5.2        │  89%  │ 52t/s │ Concise      │   │
│  │ Gemini 3 Pro   │  87%  │ 61t/s │ Structured   │   │
│  └─────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘

Test Categories:

  • Coding: Algorithm implementation, debugging, code review
  • Math/Logic: Arithmetic, word problems, proofs, puzzles
  • Reasoning: Syllogisms, analogies, causal reasoning
  • Knowledge: Trivia, history, science, current events
  • Creativity: Storytelling, poetry, brainstorming

Unique Value: This is the only mode where the question is fundamentally about the models themselves, not the world.


7. Elenchus Mode (Ctrl+8)

Greek: "ἔλεγχος" (elenchus) — cross-examination, refutation (Socrates' method)

Stress-test ideas, code, or plans by having models attack them.

┌─────────────────────────────────────────────────────────┐
│         YOUR CONTENT TO BE CHALLENGED                   │
│    (code, argument, plan, proposal, idea)               │
└───────────────────────┬─────────────────────────────────┘
                        │
        ┌───────────────┴───────────────┐
        ▼                               ▼
┌───────────────┐               ┌───────────────┐
│   DEFENDER    │               │  CHALLENGERS  │
│  (1 model)    │   ⚔️ VS ⚔️    │  (1+ models)  │
│ Defends the   │               │ Attack/find   │
│   content     │               │    flaws      │
└───────┬───────┘               └───────┬───────┘
        │                               │
        └───────────────┬───────────────┘
                        ▼
┌─────────────────────────────────────────────────────────┐
│ ROUND 1: Challengers attack                             │
│ ROUND 2: Defender responds                              │
│ ROUND 3: Challengers counter                            │
│ ...                                                     │
└───────────────────────┬─────────────────────────────────┘
                        ▼
┌─────────────────────────────────────────────────────────┐
│              ARBITER VERDICT (optional)                 │
│  • Vulnerabilities found                                │
│  • Defenses successful                                  │
│  • Final assessment                                     │
└─────────────────────────────────────────────────────────┘

Use Cases:

  • Security Review: "Find vulnerabilities in this code"
  • Argument Testing: "What's wrong with this reasoning?"
  • Business Plans: "What could go wrong with this strategy?"
  • Risk Assessment: "Why shouldn't I do this?"

Unique Value: Systematic adversarial testing. Truth survives challenge.


8. Versus Mode (Ctrl+9)

Latin: "versus" — against, turned toward

Compare your local models against commercial frontier models with blind evaluation.

┌─────────────────────────────────────────────────────────┐
│                    YOUR PROMPT                          │
└───────────────────────┬─────────────────────────────────┘
                        │
        ┌───────────────┴───────────────┐
        ▼                               ▼
┌───────────────────────┐   ┌───────────────────────┐
│   LOCAL COUNCIL       │   │  COMMERCIAL COUNCIL   │
│  • llama3.3:70b       │   │  • Claude Sonnet 4.5  │
│  • qwen2.5:32b        │   │  • GPT-5.2            │
│  • deepseek-r1:14b    │   │  • Gemini 3 Pro       │
└───────────┬───────────┘   └───────────┬───────────┘
            │                           │
            ▼                           ▼
┌───────────────────────┐   ┌───────────────────────┐
│   SYNTHESIZE into     │   │   SYNTHESIZE into     │
│   one council answer  │   │   one council answer  │
└───────────┬───────────┘   └───────────┬───────────┘
            │                           │
            └───────────┬───────────────┘
                        ▼
┌─────────────────────────────────────────────────────────┐
│              BLIND JUDGE EVALUATION                     │
│  (Compares councils AND local vs each individual model) │
└───────────────────────┬─────────────────────────────────┘
                        ▼
┌─────────────────────────────────────────────────────────┐
│              RESULTS & INSIGHTS                         │
│  🏆 Winner: [Local Council/Commercial Council]          │
│  💰 Cost: Local $0 vs Commercial $X.XX                  │
│  📊 Savings if local wins: $X.XX saved!                 │
│                                                         │
│  🎯 Local Council vs Individual Models:                 │
│  ┌────────┐  ┌────────┐  ┌────────┐                    │
│  │   ✅   │  │   ❌   │  │   🤝   │                    │
│  │ Claude │  │ GPT-5  │  │ Gemini │                    │
│  └────────┘  └────────┘  └────────┘                    │
│  ✅ = Local council beats model                        │
│  ❌ = Model beats local council                        │
└─────────────────────────────────────────────────────────┘

How It Works:

  1. Both councils answer your question (models run serially for quality)
  2. Each council synthesizes individual responses into one unified answer
  3. Judge compares synthesized answers (Council A vs B) — blind, fair
  4. Judge also compares local synthesis vs each individual commercial model
  5. Results show: winner, cost saved, and whether your council beats frontier models individually

Best for: Testing if local models can replace paid APIs, finding which tasks locals handle well

Unique Value: Two levels of insight:

  • "Is my local council as good as commercial?" (synthesis vs synthesis)
  • "Can my local council beat individual frontier models?" (teamwork vs individuals)

9. Arbitrium Mode (Ctrl+0)

Latin: "arbitrium" — judgment, decision, free will

Discover your true preferences without model reputation bias.

┌─────────────────────────────────────────────────────────┐
│                    YOUR QUESTION                        │
└───────────────────────┬─────────────────────────────────┘
                        │
        ┌───────────────┴───────────────┐
        ▼                               ▼
┌───────────────────────┐   ┌───────────────────────┐
│     RESPONSE A        │   │     RESPONSE B        │
│   (Model hidden)      │   │   (Model hidden)      │
│                       │   │                       │
│   [Full response      │   │   [Full response      │
│    displayed here]    │   │    displayed here]    │
│                       │   │                       │
└───────────┬───────────┘   └───────────┬───────────┘
            │                           │
            └───────────┬───────────────┘
                        ▼
┌─────────────────────────────────────────────────────────┐
│           WHICH DO YOU PREFER?                          │
│      [Vote A]              [Vote B]                     │
└───────────────────────┬─────────────────────────────────┘
                        │ (after voting)
                        ▼
┌─────────────────────────────────────────────────────────┐
│           REVEAL: You chose Claude Sonnet 4.5!          │
│   Your preference data feeds into personal analytics    │
└─────────────────────────────────────────────────────────┘

Features:

  • Blind by default — no peeking at model names
  • Reveal after voting — see which model you actually preferred
  • Preference tracking — builds personal model rankings over time
  • Arena-style data — similar to LMSYS Chatbot Arena, but personal

Unique Value: Removes reputation bias. You might discover you prefer different models than you thought!


10. Veritas Mode (Ctrl+-)

Latin: "veritas" — truth

Detect hallucinations and verify factual claims through cross-model consensus.

┌─────────────────────────────────────────────────────────┐
│         CLAIM OR QUESTION TO VERIFY                     │
│   "The Great Wall of China is visible from space"       │
└───────────────────────┬─────────────────────────────────┘
                        │
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│  VERIFIER 1   │ │  VERIFIER 2   │ │  VERIFIER 3   │
│ Claude Sonnet │ │    GPT-5.2    │ │  Gemini Pro   │
│               │ │               │ │               │
│ Verdict: FALSE│ │ Verdict: FALSE│ │ Verdict: FALSE│
│ Confidence:95%│ │ Confidence:92%│ │ Confidence:88%│
│ Citations: ✓  │ │ Citations: ✓  │ │ Citations: ✓  │
└───────────────┘ └───────────────┘ └───────────────┘
        │               │               │
        └───────────────┼───────────────┘
                        ▼
┌─────────────────────────────────────────────────────────┐
│              ANALYZER SYNTHESIS                         │
│  ────────────────────────────────────────────────────  │
│  OVERALL VERDICT: FALSE                                 │
│  CONFIDENCE: 92%                                        │
│                                                         │
│  CONSENSUS FACTS:                                       │
│  ✅ All models agree the claim is false                 │
│  ✅ Cited NASA astronaut testimonies                    │
│  ✅ Referenced physics of human vision                  │
│                                                         │
│  KEY EVIDENCE:                                          │
│  • Wall is ~30ft wide, not visible at orbital altitude  │
│  • Myth debunked by multiple astronauts                 │
│                                                         │
│  DISPUTED: None                                         │
│  UNSUPPORTED: None                                      │
└─────────────────────────────────────────────────────────┘

Three Verification Methods:

| Method | Description | Best For |
|---|---|---|
| 🧠 Memory Only | Uses model training data only ("If unknown, say so.") | Testing model knowledge without external sources |
| 🌐 Shared Research | One search; all models get the same results | Fair comparison with consistent evidence |
| 🔍 Independent Research | Each model searches independently | Seeing how models approach verification differently |

Independent Research - Source Comparison: When using Independent Research mode, Veritas compares sources found by different models:

  • Common Sources: URLs found by multiple models (high confidence)
  • Unique Sources: URLs only one model found (may reveal blind spots)
  • Search Queries: See what each model searched for

Verification Flow:

  1. Select verification method (Memory Only / Shared Research / Independent Research)
  2. Enter claim or question to verify
  3. Multiple verifier models independently assess truthfulness with citations
  4. Analyzer model synthesizes final report
  5. Report shows: consensus facts, disputed claims, confidence levels
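In Consilium, step 4 is performed by an LLM analyzer, but the core majority-vote idea can be illustrated numerically. This is a hedged sketch; the function and field names are assumptions.

```javascript
// Sketch: combine independent verifier verdicts into an overall verdict by
// majority vote, with confidence averaged over the agreeing verifiers only.
function aggregateVerdicts(verdicts) {
  // verdicts: [{ verdict: 'TRUE' | 'FALSE' | 'UNVERIFIABLE', confidence: 0..1 }]
  const tally = {};
  for (const v of verdicts) tally[v.verdict] = (tally[v.verdict] || 0) + 1;

  // Pick the verdict with the most votes.
  const [winner, votes] = Object.entries(tally).sort((a, b) => b[1] - a[1])[0];

  const agreeing = verdicts.filter((v) => v.verdict === winner);
  const confidence =
    agreeing.reduce((sum, v) => sum + v.confidence, 0) / agreeing.length;

  return { verdict: winner, unanimous: votes === verdicts.length, confidence };
}
```

With the Great Wall example above (three FALSE verdicts at 0.95, 0.92, and 0.88), this yields FALSE, unanimous, confidence ≈ 0.92.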

Best for: Fact-checking before publishing, detecting hallucinations, verifying information

Unique Value: Structured hallucination detection with flexible research options. Trust but verify.


11. Limatura Mode (Ctrl+=)

Latin: "limatura" — filing, polishing, refinement

Polish and improve output through sequential model passes.

┌─────────────────────────────────────────────────────────┐
│                CONTENT TO POLISH                        │
│        (code, text, email, document)                    │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│   V0: ORIGINAL (Model A creates initial response)       │
│   "Here is my first draft of the email..."              │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│   V1: FIRST REFINEMENT (Model B improves V0)            │
│   "Here is the improved version with clearer..."        │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│   V2: SECOND REFINEMENT (Model C improves V1)           │
│   "Here is the polished final version..."               │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│              VERSION COMPARISON                         │
│  ┌─────────────────────────────────────────────────┐   │
│  │ V0 (Original)  │ V1 (Refined)  │ V2 (Polished) │   │
│  │ [View]         │ [View]        │ [View] ★      │   │
│  └─────────────────────────────────────────────────┘   │
│  Current: V2 by Model C                                 │
│  [Copy Final] [Continue Refining]                       │
└─────────────────────────────────────────────────────────┘

Refinement Types:

  • General improvement: "Make this better"
  • Style refinement: "Make this more concise/formal/casual"
  • Code refinement: "Optimize and clean this code"
  • Custom instruction: User-defined refinement criteria

Best for: Code optimization, document drafting, email refinement, creative writing polish

Unique Value: Sequential improvement, not just comparison. Each model builds on the last.
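The chain itself is simple to sketch. Here the user's content is taken as V0 for brevity (in the app, V0 may itself be model-generated); `limaturaChain` and `askModel` are hypothetical names and the prompt wording is invented.

```javascript
// Sketch: sequential refinement. Each model receives the previous version
// plus the refinement instruction and returns an improved draft. Every
// intermediate version is kept so the user can compare V0..Vn.
async function limaturaChain(models, original, instruction, askModel) {
  const versions = [{ version: 'V0', model: null, text: original }];
  let current = original;
  for (let i = 0; i < models.length; i++) {
    const prompt =
      `Refine the text below (${instruction}). Return only the revised text.\n\n${current}`;
    current = await askModel(models[i], prompt);
    versions.push({ version: `V${i + 1}`, model: models[i], text: current });
  }
  return versions;
}
```

Unlike the parallel modes, ordering matters here: each pass compounds on the last, which is why the final version can differ substantially from any single model's one-shot answer.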


12. Prompting Guide (Ctrl+G)

Purpose: Learn and apply effective prompting techniques

A comprehensive guide to crafting effective AI prompts with 8 proven formulas.

┌─────────────────────────────────────────────────────────┐
│              PROMPTING GUIDE                            │
│                                                         │
│  📋 8 PROVEN FORMULAS:                                 │
│                                                         │
│  1. RTCF - Role, Task, Context, Format                 │
│  2. CREATE - Character, Request, Examples...           │
│  3. RISEN - Role, Instructions, Steps, End Goal...     │
│  4. Chain-of-Thought - Step-by-step reasoning          │
│  5. Few-Shot Learning - Input/output examples          │
│  6. STAR - Situation, Task, Action, Result             │
│  7. Code Generation - Language, Requirements...        │
│  8. Self-Critique - Generate, critique, improve        │
│                                                         │
│  Each formula includes:                                 │
│  • Component breakdown                                  │
│  • Real-world examples                                  │
│  • Best use cases                                       │
│  • One-click copy                                       │
└─────────────────────────────────────────────────────────┘

Available Formulas:

| Formula | Components | Best For |
|---|---|---|
| RTCF | Role + Task + Context + Format | General structured prompts |
| CREATE | Character + Request + Examples + Adjustments + Type + Extras | Detailed specifications |
| RISEN | Role + Instructions + Steps + End Goal + Narrowing | Multi-step tasks |
| Chain-of-Thought | Step-by-step reasoning | Complex reasoning problems |
| Few-Shot | Input → Output examples | Pattern learning |
| STAR | Situation + Task + Action + Result | Problem-solving narratives |
| Code Generation | Language + Requirements + Standards + Edge Cases | Programming tasks |
| Self-Critique | Generate → Critique → Improve | Quality iteration |

Best for: Learning prompt engineering, improving query quality, teaching prompting techniques

Unique Value: Reference guide for effective prompting, always available with Ctrl+G.


📊 Question Type Matrix

Use this table to choose the right mode for your question:

| Question Type | Recommended Mode | Why |
|---|---|---|
| "What is X?" (Factual) | Forum or Veritas | Forum for comparison; Veritas for accuracy |
| "What's the best X?" (Opinion) | Consensus or Arbitrium | Consensus reduces bias; Arbitrium reveals preference |
| Creative writing | Forum or Limatura | Forum for variety; Limatura for polish |
| Coding/Technical | Forum or Elenchus | Forum for solutions; Elenchus for security review |
| Controversial/Ethical | Debate | Models engage with counterarguments |
| "Should I do X?" (Decision) | Consensus or Elenchus | Consensus for a recommendation; Elenchus for risks |
| Research/Comprehensive | Synthesis + Veritas | Synthesis for coverage; Veritas for accuracy |
| Security Review | Elenchus | Adversarial testing finds vulnerabilities |
| Model Benchmarking | Peira | Structured capability testing |
| Quick Comparison | Arbitrium | Fast blind preference voting |
| Quality Polish | Limatura | Iterative improvement chain |
| Hallucination Check | Veritas | Cross-model fact verification |
| Local vs Cloud | Versus | Data-driven cost/quality comparison |

✨ Features

🎯 Core Capabilities

  • Multi-Model Comparison — Query multiple LLMs simultaneously
  • Streaming Responses — Real-time output from all models
  • Blind Evaluation — Anonymized judging prevents bias
  • URL Content Fetching — Include webpage content in prompts
  • Session Management — Save, tag, search, reload sessions
  • Export Options — JSON, Markdown, CSV

🔧 Advanced Features

  • Knowledge Base (RAG) — Upload documents for context-aware responses
  • Vision Support — Upload images for multi-model analysis
  • Research Mode (SearXNG) — Web search before querying models
  • Conversation Continuity — Follow-up questions with context
  • Prompt Templates — Reusable prompts with variables
  • Cost Tracking — Estimated API costs per response
  • Model Analytics — Track which models win evaluations
  • Pin/Favorite Responses — Star great responses
  • Keyboard Shortcuts — Full keyboard navigation
  • Local Model Support — Ollama and LM Studio integration
  • Dark/Light Themes — Beautiful UI in both modes
  • Model Sync — Fetch latest models from OpenRouter API
  • Benchmark Sync — Update benchmark scores from HuggingFace Leaderboard
  • Prompting Guide — Learn effective prompting techniques

🚀 Quick Start

Prerequisites

💡 Why OpenRouter? One API key = access to 25+ models (OpenAI, Anthropic, Google, xAI, Mistral, and more). Pay-as-you-go pricing.

Installation

# Clone the repository
git clone https://github.com/lafintiger/Consilium.git
cd Consilium

# Install backend dependencies
cd backend
npm install

# Configure environment
cp ../env.example.txt .env
# Edit .env and add your OPENROUTER_API_KEY

# Start backend
npm run dev

# In a new terminal, install and start frontend
cd frontend
npm install
npm run dev

Access the App


🐳 Docker

Using Docker Compose (Recommended)

# Copy and configure environment
cp env.example.txt .env
# Edit .env with your API keys

# Build and start
docker compose up -d

# View logs
docker compose logs -f

# Stop
docker compose down

Updating to Latest Version

git pull
docker compose down
docker compose build --no-cache
docker compose up -d

⌨️ Keyboard Shortcuts

| Shortcut | Action |
|---|---|
| Ctrl+1 | Forum mode |
| Ctrl+2 | Debate mode |
| Ctrl+3 | Consensus mode |
| Ctrl+4 | Analysis mode |
| Ctrl+5 | Synthesis mode |
| Ctrl+6 | Analytics dashboard |
| Ctrl+7 | Peira (capability testing) |
| Ctrl+8 | Elenchus (red team) |
| Ctrl+9 | Versus (local vs commercial) |
| Ctrl+0 | Arbitrium (blind voting) |
| Ctrl+- | Veritas (fact check) |
| Ctrl+= | Limatura (iterative polish) |
| Ctrl+G | Prompting Guide |
| Ctrl+R | Toggle Research mode |

⚙️ Configuration

Environment Variables

Copy env.example.txt to .env in the backend folder:

# Required
OPENROUTER_API_KEY=sk-or-v1-your-key-here

# Optional - Local Models
OLLAMA_URL=http://localhost:11434
LMSTUDIO_URL=http://localhost:1234

# Optional - Research Mode
SEARXNG_URL=http://localhost:4000

# Performance
LOCAL_MODELS_SEQUENTIAL=true  # Run local models one at a time

Available Models

Consilium supports 25+ models via OpenRouter:

| Provider | Models |
|---|---|
| Anthropic | Claude Sonnet 4.5, Opus 4.5, Haiku 4.5 |
| OpenAI | GPT-5.2, GPT-5.2 Pro, GPT-5.1, o3 |
| Google | Gemini 3 Pro, Gemini 2.5 Pro/Flash |
| xAI | Grok 4, Grok 4 Fast, Grok 3 |
| DeepSeek | DeepSeek V3.2, V3.2 Speciale |
| Mistral | Mistral Large 3, Devstral 2 |
| Local | Any Ollama/LM Studio model |

🏠 Local Models

Ollama Setup

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull models
ollama pull llama3.3
ollama pull qwen2.5:32b
ollama pull deepseek-r1:14b

# Start Ollama server
ollama serve

Consilium automatically detects running Ollama models.

Docker + Local Models

| Scenario | OLLAMA_URL |
|---|---|
| Both native | http://localhost:11434 |
| Consilium in Docker, Ollama native | http://host.docker.internal:11434 |

📁 Project Structure

Consilium/
├── backend/                 # Express.js API server
│   ├── src/
│   │   ├── index.js        # Server entry point
│   │   ├── config/         # Model configs, benchmarks
│   │   ├── db/             # SQLite database
│   │   └── routes/         # API endpoints
│   └── package.json
│
├── frontend/               # React + Vite + Tailwind
│   ├── src/
│   │   ├── components/     # React components
│   │   ├── constants/      # Mode definitions
│   │   ├── stores/         # Zustand state
│   │   └── types/          # TypeScript definitions
│   └── package.json
│
├── docker-compose.yml
├── DEVELOPER_GUIDE.md     # Developer guide
└── README.md              # This file

📚 Knowledge Base (RAG)

Consilium includes a built-in Retrieval Augmented Generation (RAG) system that lets you upload documents and have AI models answer questions using your own content.

How It Works

┌─────────────────────────────────────────────────────────────┐
│                 KNOWLEDGE BASE WORKFLOW                      │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────┐
│ 1. UPLOAD DOCUMENTS                                          │
│    • Click Database icon (🗄️) in header                      │
│    • Create collections: "Tech Docs", "Research", etc.       │
│    • Upload PDFs, Word docs, text files, Markdown            │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────┐
│ 2. AUTOMATIC PROCESSING (Background)                         │
│    • Parse document → Extract text                           │
│    • Chunk text → Smart segmentation (~500 tokens each)      │
│    • Generate embeddings → Ollama qwen3-embedding:8b         │
│    • Store in SQLite database                                │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────┐
│ 3. QUERY WITH KNOWLEDGE                                      │
│    • Toggle "Knowledge" button in prompt input               │
│    • Select specific collection or "All Collections"         │
│    • Ask your question                                       │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────┐
│ 4. SEMANTIC SEARCH & AUGMENTATION                            │
│    • Your question → Embedded → Compare to chunks            │
│    • Top 5 most relevant chunks retrieved                    │
│    • Chunks added as context to your prompt                  │
│    • All models receive the augmented prompt                 │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────┐
│ 5. VIEW SOURCES                                              │
│    • "Knowledge Base Sources" panel shows retrieved chunks   │
│    • Document name, collection, similarity score             │
│    • Preview of the chunk content                            │
└─────────────────────────────────────────────────────────────┘
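Step 4 above boils down to vector math: the question embedding is compared to every stored chunk embedding by cosine similarity, and the best five matches win. A minimal sketch, with illustrative helper names (not Consilium's actual code):

```javascript
// Sketch of step 4: score chunks against the query embedding, keep the top 5.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function topChunks(queryEmbedding, chunks, k = 5) {
  return chunks
    .map((c) => ({ ...c, score: cosineSimilarity(queryEmbedding, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The `score` attached here is the similarity value surfaced in the "Knowledge Base Sources" panel (step 5).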

Supported Document Types

| Type | Extension | Notes |
| --- | --- | --- |
| PDF | `.pdf` | Text extraction via `pdf-parse` |
| Word | `.docx` | Modern Word format via `mammoth` |
| Text | `.txt` | Plain text files |
| Markdown | `.md` | Markdown files |
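Once a document's text is extracted (step 2 above), it is split into roughly 500-token segments. The exact segmentation logic isn't documented here; a minimal sketch, approximating one token per word (`chunkText` is an illustrative name):

```javascript
// Sketch of step 2's chunking (details assumed): split extracted text into
// ~500-token segments, treating one whitespace-delimited word as one token.
function chunkText(text, maxTokens = 500) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < words.length; i += maxTokens) {
    chunks.push(words.slice(i, i + maxTokens).join(" "));
  }
  return chunks;
}
```

A production chunker would typically also respect sentence and paragraph boundaries; this sketch only shows the sizing.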

Knowledge Collections

Organize your documents into themed collections:

┌─────────────────────────────────────────────────────────────┐
│ 📂 Collections                                               │
├─────────────────────────────────────────────────────────────┤
│ 🏥 Medical Research      │ 12 docs │ 342 chunks            │
│ 💻 Tech Documentation    │ 8 docs  │ 215 chunks            │
│ 📋 Company Policies      │ 5 docs  │ 89 chunks             │
│ 📖 General               │ 3 docs  │ 47 chunks             │
└─────────────────────────────────────────────────────────────┘
  • Create collections with custom names, colors, and descriptions
  • Filter searches to specific collections or search all
  • Move documents between collections as needed
  • Delete collections without losing documents (they go to "uncategorized")

Requirements for Knowledge Base

  1. Ollama must be running with an embedding model:

    # Install the embedding model
    ollama pull qwen3-embedding:8b
    
    # Start Ollama server
    ollama serve
  2. Status Check: The Knowledge Panel shows embedding model status

    • ✅ Green = Ready to process documents
    • ❌ Red = Embedding model not available
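With the embedding model pulled, each chunk (and each query) is embedded through Ollama's `/api/embeddings` endpoint. A hedged sketch of that call (`embed` and `extractEmbedding` are illustrative names, not Consilium's actual code):

```javascript
// Sketch: generate an embedding via Ollama's /api/embeddings endpoint,
// using the model documented above.
const OLLAMA_URL = process.env.OLLAMA_URL || "http://localhost:11434";
const EMBEDDING_MODEL = process.env.EMBEDDING_MODEL || "qwen3-embedding:8b";

function extractEmbedding(body) {
  // /api/embeddings responds with { embedding: [0.12, -0.04, ...] }
  if (!Array.isArray(body.embedding)) throw new Error("unexpected response shape");
  return body.embedding;
}

async function embed(text) {
  const res = await fetch(`${OLLAMA_URL}/api/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: EMBEDDING_MODEL, prompt: text }),
  });
  if (!res.ok) throw new Error(`Embedding request failed: ${res.status}`);
  return extractEmbedding(await res.json());
}
```

If this call fails, the Knowledge Panel's status indicator turns red, since documents cannot be processed without embeddings.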

Configuration

| Environment Variable | Default | Description |
| --- | --- | --- |
| `EMBEDDING_MODEL` | `qwen3-embedding:8b` | Ollama embedding model to use |
| `KNOWLEDGE_TOP_K` | `5` | Max chunks to retrieve per query |
| `KNOWLEDGE_MIN_SIMILARITY` | `0.3` | Minimum similarity threshold |
| `KNOWLEDGE_MAX_TOKENS` | `8000` | Max tokens for context |
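The three retrieval settings interact: candidates are capped at `KNOWLEDGE_TOP_K`, weak matches below `KNOWLEDGE_MIN_SIMILARITY` are dropped, and the combined context stays within `KNOWLEDGE_MAX_TOKENS`. A sketch of how they might be applied together (`applyKnowledgeLimits` and the chunk shape are assumptions for illustration):

```javascript
// Sketch (assumed helper name): apply the limits above to scored chunks,
// which are assumed sorted by similarity score, descending.
function applyKnowledgeLimits(scoredChunks, opts = {}) {
  const { topK = 5, minSimilarity = 0.3, maxTokens = 8000 } = opts;
  const kept = [];
  let budget = maxTokens;
  for (const chunk of scoredChunks) {
    if (kept.length >= topK) break;       // KNOWLEDGE_TOP_K
    if (chunk.score < minSimilarity) break; // KNOWLEDGE_MIN_SIMILARITY
    if (chunk.tokens > budget) break;     // KNOWLEDGE_MAX_TOKENS
    kept.push(chunk);
    budget -= chunk.tokens;
  }
  return kept;
}
```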

Use Cases

| Scenario | How to Use |
| --- | --- |
| Company Q&A Bot | Upload policy docs → Ask questions about procedures |
| Research Assistant | Upload papers → Ask for summaries and connections |
| Documentation Search | Upload tech docs → Query specific APIs or features |
| Study Helper | Upload course materials → Ask practice questions |
| Legal Research | Upload contracts → Query for specific clauses |

Combined with Other Features

Knowledge Base works alongside other Consilium features:

| Combination | Result |
| --- | --- |
| Knowledge + Forum | Multiple models answer using your documents |
| Knowledge + Veritas | Fact-check claims against your own sources |
| Knowledge + Synthesis | Combine document insights from multiple models |
| Knowledge + Research | Use both your docs AND web search |

📜 License

Polyform Noncommercial 1.0.0 — See LICENSE for details.

| Use Case | Allowed |
| --- | --- |
| Educators & Students | ✅ Free |
| Personal/Hobby Use | ✅ Free |
| Non-profit Organizations | ✅ Free |
| Research | ✅ Free |
| Commercial Use | ❌ Contact for license |

🏹 A quiver of methods for seeking truth

Built with 🧠 — Seeking Truth Through AI Councils

About

Multi-LLM Council Interface — query multiple models simultaneously, compare responses side-by-side, and evaluate with structured criteria. A Sanctum partner tool.
