
TheAIHorizon/Consilium


🏛️ Consilium

Consilium Logo

A quiver of methods for seeking truth through AI councils.
v1.2.0

Query multiple AI models simultaneously. Compare, debate, verify, refine, and synthesize their responses.

Skip to Install · Docker

Consilium Forum Mode

📊 More Screenshots

Analytics Dashboard

Track model performance, costs, and usage patterns.

Analytics Dashboard

Evaluation Panel

See detailed rankings, scores by criteria, and AI-generated analysis.

Evaluation Panel

Philosophy · The Quiver · Quick Start · Docker · Configuration


🎯 Philosophy

The Core Insight

There is no single method that reliably produces truth.

But there are appropriate methods for different kinds of questions.

Consilium (Latin for "council" or "deliberation") is an epistemological framework instantiated in software. Different questions require different methods of inquiry. Each mode is an "arrow" in your quiver, designed for a specific epistemic target.

Why Multiple Models?

No single AI has all the answers. Each LLM has different training data, reasoning approaches, and blind spots. What one model gets wrong, another might get right.

| Challenge | How Consilium Helps |
|---|---|
| AI hallucinations | Cross-check answers across multiple models (Veritas mode) |
| Model bias | Anonymous deliberation removes reputation bias (Consensus, Arbitrium) |
| Finding the right model | Blind preference voting reveals true preferences (Arbitrium) |
| Complex decisions | Structured debate surfaces all arguments (Debate, Elenchus) |
| Quality output | Sequential refinement polishes content (Limatura) |
| Comprehensive answers | Synthesize insights from multiple sources (Synthesis) |

Methodological Pluralism

This is methodological pluralism — the philosophical position that different domains of inquiry require different approaches:

| Question Type | Method | Consilium Mode |
|---|---|---|
| Factual claims | Verification | Veritas |
| Complex trade-offs | Dialectic | Debate |
| Bias reduction | Deliberation | Consensus, Arbitrium |
| Quality assessment | Cross-examination | Analysis, Elenchus |
| Capability testing | Empiricism | Peira |
| Comprehensive coverage | Integration | Synthesis |
| Quality improvement | Iteration | Limatura |

🏹 The Quiver (12 Modes)

Consilium provides 12 distinct modes — each an arrow designed for a different target:

Mode Overview

| Mode | Shortcut | Purpose | When to Use |
|---|---|---|---|
| Forum | Ctrl+1 | Compare & Judge | General questions, find the best answer |
| Debate | Ctrl+2 | Round-Robin Discussion | Complex topics with trade-offs |
| Consensus | Ctrl+3 | Anonymous Deliberation | Bias-reduced conclusions |
| Analysis | Ctrl+4 | Multi-Judge Critique | Deep evaluation of one answer |
| Synthesis | Ctrl+5 | Combine into One | Comprehensive coverage needed |
| Analytics | Ctrl+6 | Performance Stats | Review usage and costs |
| Peira | Ctrl+7 | Capability Testing | Benchmark model abilities |
| Elenchus | Ctrl+8 | Adversarial Red Team | Stress-test code/ideas |
| Versus | Ctrl+9 | Local vs Commercial | Compare local to cloud models |
| Arbitrium | Ctrl+0 | Blind Preference Vote | Discover true preferences |
| Veritas | Ctrl+- | Fact Check & Verify | Detect hallucinations |
| Limatura | Ctrl+= | Iterative Polish | Refine through multiple passes |
| Prompting | Ctrl+G | Prompting Guide | Learn effective prompting techniques |

📖 Detailed Mode Guide

1. Forum Mode (Ctrl+1)

Latin: "forum" — public place of discussion

All selected models answer your question simultaneously, then an AI judge ranks them.

┌─────────────────────────────────────────────────────────┐
│                    YOUR QUESTION                        │
└───────────────────────┬─────────────────────────────────┘
                        │
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
   ┌─────────┐    ┌─────────┐    ┌─────────┐
   │ Model A │    │ Model B │    │ Model C │
   └────┬────┘    └────┬────┘    └────┬────┘
        │              │              │
        ▼              ▼              ▼
┌─────────────────────────────────────────────────────────┐
│      SIDE-BY-SIDE COMPARISON                            │
│   + Blind Evaluation (judges see "Response A" not names)│
└─────────────────────────────────────────────────────────┘

Best for: General questions, comparing writing styles, finding the best model for your use case

Features:

  • Real-time streaming responses
  • Blind evaluation (prevents model reputation bias)
  • Follow-up questions with context
  • Auto-evaluation ranks responses when complete
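The fan-out-then-blind flow can be sketched roughly as follows. This is an illustrative sketch, not Consilium's actual source; `forumRound` and `askModel` are hypothetical names, and `askModel` stands in for whatever API adapter you use (e.g. a call to OpenRouter's chat completions endpoint).

```javascript
// Sketch: query all models concurrently, then anonymize the answers so a
// judge sees only "Response A", "Response B", ... (model names withheld).
// `askModel(model, prompt)` is a hypothetical adapter returning a string.
async function forumRound(models, prompt, askModel) {
  // Fan out: every model answers the same prompt at the same time.
  const answers = await Promise.all(models.map((m) => askModel(m, prompt)));

  // Blind the results: the judge gets neutral labels only; the label-to-model
  // mapping is kept private until after ranking.
  return answers.map((text, i) => ({
    label: `Response ${String.fromCharCode(65 + i)}`, // 65 = "A"
    model: models[i], // kept hidden from the judge
    text,
  }));
}
```

Because `Promise.all` preserves input order, the neutral labels line up with the original model list, which makes the post-judging reveal straightforward.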

2. Debate Mode (Ctrl+2)

Latin: "de-" + "battuere" — to beat down, contend

A structured multi-round discussion where models build on each other's ideas.

┌─────────────────────────────────────────────────────────┐
│                    YOUR TOPIC                           │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│ ROUND 1                                                 │
│ Model A → Model B → Model C (sees all previous)         │
└───────────────────────┬─────────────────────────────────┘
                        ▼
┌─────────────────────────────────────────────────────────┐
│ ROUND 2                                                 │
│ Model A → Model B → Model C (builds on Round 1)         │
└───────────────────────┬─────────────────────────────────┘
                        ▼
┌─────────────────────────────────────────────────────────┐
│           AUTOMATIC CONSENSUS SUMMARY                   │
└─────────────────────────────────────────────────────────┘

Best for: Complex topics with trade-offs, controversial questions, exploring all sides

How to use:

  1. Select 2+ Participants
  2. Set number of Rounds (1-5)
  3. Models discuss round-robin, building on previous responses
  4. Automatic Consensus Summary generated at the end
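The round-robin mechanics behind those steps can be sketched like this. The shape is assumed for illustration: `debateRounds` and `askModel` are hypothetical names, and the prompt wording is invented.

```javascript
// Sketch: each model speaks in turn; every turn sees the full transcript so
// far, so later speakers build on earlier ones. After the final round, the
// transcript could be handed to a summarizer model for the consensus summary.
async function debateRounds(models, topic, rounds, askModel) {
  const transcript = []; // entries: { round, model, text }
  for (let round = 1; round <= rounds; round++) {
    for (const model of models) {
      const soFar = transcript
        .map((t) => `[Round ${t.round}] ${t.model}: ${t.text}`)
        .join('\n');
      const prompt = `Topic: ${topic}\n\nDiscussion so far:\n${soFar || '(none yet)'}\n\nAdd your contribution.`;
      const text = await askModel(model, prompt);
      transcript.push({ round, model, text });
    }
  }
  return transcript;
}
```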

3. Consensus Mode (Ctrl+3)

Latin: "consensus" — agreement, harmony

Models deliberate anonymously over multiple rounds to find where they agree.

┌─────────────────────────────────────────────────────────┐
│                    YOUR QUESTION                        │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│ ROUND 0 - INITIAL POSITIONS                             │
│ Each model answers independently                        │
│ Responses anonymized: Position A, B, C, D...            │
└───────────────────────┬─────────────────────────────────┘
                        ▼
┌─────────────────────────────────────────────────────────┐
│ ROUNDS 1-3 - DELIBERATION                               │
│ Each model sees ALL anonymized positions                │
│ (but NOT who said what - prevents bias)                 │
│ Task: Consider others, identify agreements/disputes,    │
│       refine position, move toward consensus            │
└───────────────────────┬─────────────────────────────────┘
                        ▼
┌─────────────────────────────────────────────────────────┐
│ FINAL - ARBITER SYNTHESIS                               │
│ ✅ Consensus answer (if agreement reached)              │
│ OR                                                      │
│ 📊 Summary: What they agree on + What remains disputed  │
└─────────────────────────────────────────────────────────┘

Best for: Reducing model bias, finding fundamental agreements, cross-validated answers

Key difference from Debate: Models don't know who said what during deliberation, preventing "I agree with GPT because it's GPT" bias.
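The anonymization step might look something like the sketch below. This is illustrative only; the function names and data shapes are assumptions, not the project's actual code.

```javascript
// Sketch: relabel responses as "Position A", "Position B", ... in shuffled
// order. The label-to-model key stays private to the orchestrator, so models
// cannot tell who said what during deliberation.
function anonymizePositions(responses, shuffle = fisherYates) {
  const order = shuffle([...responses.keys()]); // permuted array indices
  const labeled = [];
  const key = {}; // private: label -> real model name
  order.forEach((origIndex, i) => {
    const label = `Position ${String.fromCharCode(65 + i)}`; // A, B, C, ...
    labeled.push({ label, text: responses[origIndex].text });
    key[label] = responses[origIndex].model;
  });
  return { labeled, key };
}

// In-place Fisher-Yates shuffle so labels do not correlate with model order.
function fisherYates(arr) {
  for (let i = arr.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [arr[i], arr[j]] = [arr[j], arr[i]];
  }
  return arr;
}
```

Reshuffling each round would further prevent models from tracking a position's author across rounds.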


4. Analysis Mode (Ctrl+4)

Greek: "ἀνάλυσις" (analysis) — breaking up, investigation

One model answers, multiple analysts evaluate the response from different perspectives.

┌─────────────────────────────────────────────────────────┐
│                    YOUR QUESTION                        │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│              ANSWERER MODEL RESPONDS                    │
└───────────────────────┬─────────────────────────────────┘
                        │
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
   ┌──────────┐   ┌──────────┐   ┌──────────┐
   │ Analyst 1│   │ Analyst 2│   │ Analyst 3│
   │ Evaluates│   │ Evaluates│   │ Evaluates│
   └────┬─────┘   └────┬─────┘   └────┬─────┘
        │              │              │
        ▼              ▼              ▼
┌─────────────────────────────────────────────────────────┐
│    MULTI-PERSPECTIVE EVALUATION & SCORING               │
└─────────────────────────────────────────────────────────┘

Best for: Deep critique, understanding strengths/weaknesses, academic review


5. Synthesis Mode (Ctrl+5)

Greek: "σύνθεσις" (synthesis) — putting together

Multiple models answer, one synthesizer combines the best parts into a unified response.

┌─────────────────────────────────────────────────────────┐
│                    YOUR QUESTION                        │
└───────────────────────┬─────────────────────────────────┘
                        │
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
   ┌─────────┐    ┌─────────┐    ┌─────────┐
   │ Source 1│    │ Source 2│    │ Source 3│
   │ Answers │    │ Answers │    │ Answers │
   └────┬────┘    └────┬────┘    └────┬────┘
        │              │              │
        └──────────────┼──────────────┘
                       ▼
         ┌─────────────────────────┐
         │   SYNTHESIZER MODEL     │
         │ Combines all responses  │
         │  into unified answer    │
         └─────────────────────────┘

Best for: Research requiring comprehensive coverage, combining expertise, unified summaries


6. Peira Mode (Ctrl+7)

Greek: "πεῖρα" (peira) — trial, experiment, test

Systematically test what models can and cannot do with structured benchmarks.

┌─────────────────────────────────────────────────────────┐
│              SELECT TEST CATEGORY                       │
│  [Coding] [Math] [Reasoning] [Knowledge] [Creative]     │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│              SELECT MODELS TO TEST                      │
│  □ Claude Sonnet 4.5    □ GPT-5.2    □ Gemini 3 Pro     │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│              STRUCTURED TEST BATTERY                    │
│  Each model receives identical test prompts             │
│  for fair comparison                                    │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│              CAPABILITY REPORT                          │
│  ┌─────────────────────────────────────────────────┐   │
│  │ Model          │ Score │ Speed │ Style        │   │
│  │ Claude Sonnet  │  92%  │ 45t/s │ Detailed     │   │
│  │ GPT-5.2        │  89%  │ 52t/s │ Concise      │   │
│  │ Gemini 3 Pro   │  87%  │ 61t/s │ Structured   │   │
│  └─────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘

Test Categories:

  • Coding: Algorithm implementation, debugging, code review
  • Math/Logic: Arithmetic, word problems, proofs, puzzles
  • Reasoning: Syllogisms, analogies, causal reasoning
  • Knowledge: Trivia, history, science, current events
  • Creativity: Storytelling, poetry, brainstorming

Unique Value: This is the only mode where the question is fundamentally about the models themselves, not the world.


7. Elenchus Mode (Ctrl+8)

Greek: "ἔλεγχος" (elenchus) — cross-examination, refutation (Socrates' method)

Stress-test ideas, code, or plans by having models attack them.

┌─────────────────────────────────────────────────────────┐
│         YOUR CONTENT TO BE CHALLENGED                   │
│    (code, argument, plan, proposal, idea)               │
└───────────────────────┬─────────────────────────────────┘
                        │
        ┌───────────────┴───────────────┐
        ▼                               ▼
┌───────────────┐               ┌───────────────┐
│   DEFENDER    │               │  CHALLENGERS  │
│  (1 model)    │   ⚔️ VS ⚔️    │  (1+ models)  │
│ Defends the   │               │ Attack/find   │
│   content     │               │    flaws      │
└───────┬───────┘               └───────┬───────┘
        │                               │
        └───────────────┬───────────────┘
                        ▼
┌─────────────────────────────────────────────────────────┐
│ ROUND 1: Challengers attack                             │
│ ROUND 2: Defender responds                              │
│ ROUND 3: Challengers counter                            │
│ ...                                                     │
└───────────────────────┬─────────────────────────────────┘
                        ▼
┌─────────────────────────────────────────────────────────┐
│              ARBITER VERDICT (optional)                 │
│  • Vulnerabilities found                                │
│  • Defenses successful                                  │
│  • Final assessment                                     │
└─────────────────────────────────────────────────────────┘

Use Cases:

  • Security Review: "Find vulnerabilities in this code"
  • Argument Testing: "What's wrong with this reasoning?"
  • Business Plans: "What could go wrong with this strategy?"
  • Risk Assessment: "Why shouldn't I do this?"

Unique Value: Systematic adversarial testing. Truth survives challenge.


8. Versus Mode (Ctrl+9)

Latin: "versus" — against, turned toward

Compare your local models against commercial frontier models with blind evaluation.

┌─────────────────────────────────────────────────────────┐
│                    YOUR PROMPT                          │
└───────────────────────┬─────────────────────────────────┘
                        │
        ┌───────────────┴───────────────┐
        ▼                               ▼
┌───────────────────────┐   ┌───────────────────────┐
│   LOCAL COUNCIL       │   │  COMMERCIAL COUNCIL   │
│  • llama3.3:70b       │   │  • Claude Sonnet 4.5  │
│  • qwen2.5:32b        │   │  • GPT-5.2            │
│  • deepseek-r1:14b    │   │  • Gemini 3 Pro       │
└───────────┬───────────┘   └───────────┬───────────┘
            │                           │
            ▼                           ▼
┌───────────────────────┐   ┌───────────────────────┐
│   SYNTHESIZE into     │   │   SYNTHESIZE into     │
│   one council answer  │   │   one council answer  │
└───────────┬───────────┘   └───────────┬───────────┘
            │                           │
            └───────────┬───────────────┘
                        ▼
┌─────────────────────────────────────────────────────────┐
│              BLIND JUDGE EVALUATION                     │
│  (Compares councils AND local vs each individual model) │
└───────────────────────┬─────────────────────────────────┘
                        ▼
┌─────────────────────────────────────────────────────────┐
│              RESULTS & INSIGHTS                         │
│  🏆 Winner: [Local Council/Commercial Council]          │
│  💰 Cost: Local $0 vs Commercial $X.XX                  │
│  📊 Savings if local wins: $X.XX saved!                 │
│                                                         │
│  🎯 Local Council vs Individual Models:                 │
│  ┌────────┐  ┌────────┐  ┌────────┐                    │
│  │   ✅   │  │   ❌   │  │   🤝   │                    │
│  │ Claude │  │ GPT-5  │  │ Gemini │                    │
│  └────────┘  └────────┘  └────────┘                    │
│  ✅ = Local council beats model                        │
│  ❌ = Model beats local council                        │
└─────────────────────────────────────────────────────────┘

How It Works:

  1. Both councils answer your question (models run serially for quality)
  2. Each council synthesizes individual responses into one unified answer
  3. Judge compares synthesized answers (Council A vs B) — blind, fair
  4. Judge also compares local synthesis vs each individual commercial model
  5. Results show: winner, cost saved, and whether your council beats frontier models individually

Best for: Testing if local models can replace paid APIs, finding which tasks locals handle well

Unique Value: Two levels of insight:

  • "Is my local council as good as commercial?" (synthesis vs synthesis)
  • "Can my local council beat individual frontier models?" (teamwork vs individuals)

9. Arbitrium Mode (Ctrl+0)

Latin: "arbitrium" — judgment, decision, free will

Discover your true preferences without model reputation bias.

┌─────────────────────────────────────────────────────────┐
│                    YOUR QUESTION                        │
└───────────────────────┬─────────────────────────────────┘
                        │
        ┌───────────────┴───────────────┐
        ▼                               ▼
┌───────────────────────┐   ┌───────────────────────┐
│     RESPONSE A        │   │     RESPONSE B        │
│   (Model hidden)      │   │   (Model hidden)      │
│                       │   │                       │
│   [Full response      │   │   [Full response      │
│    displayed here]    │   │    displayed here]    │
│                       │   │                       │
└───────────┬───────────┘   └───────────┬───────────┘
            │                           │
            └───────────┬───────────────┘
                        ▼
┌─────────────────────────────────────────────────────────┐
│           WHICH DO YOU PREFER?                          │
│      [Vote A]              [Vote B]                     │
└───────────────────────┬─────────────────────────────────┘
                        │ (after voting)
                        ▼
┌─────────────────────────────────────────────────────────┐
│           REVEAL: You chose Claude Sonnet 4.5!          │
│   Your preference data feeds into personal analytics    │
└─────────────────────────────────────────────────────────┘

Features:

  • Blind by default — no peeking at model names
  • Reveal after voting — see which model you actually preferred
  • Preference tracking — builds personal model rankings over time
  • Arena-style data — similar to LMSYS Chatbot Arena, but personal

Unique Value: Removes reputation bias. You might discover you prefer different models than you thought!


10. Veritas Mode (Ctrl+-)

Latin: "veritas" — truth

Detect hallucinations and verify factual claims through cross-model consensus.

┌─────────────────────────────────────────────────────────┐
│         CLAIM OR QUESTION TO VERIFY                     │
│   "The Great Wall of China is visible from space"       │
└───────────────────────┬─────────────────────────────────┘
                        │
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│  VERIFIER 1   │ │  VERIFIER 2   │ │  VERIFIER 3   │
│ Claude Sonnet │ │    GPT-5.2    │ │  Gemini Pro   │
│               │ │               │ │               │
│ Verdict: FALSE│ │ Verdict: FALSE│ │ Verdict: FALSE│
│ Confidence:95%│ │ Confidence:92%│ │ Confidence:88%│
│ Citations: ✓  │ │ Citations: ✓  │ │ Citations: ✓  │
└───────────────┘ └───────────────┘ └───────────────┘
        │               │               │
        └───────────────┼───────────────┘
                        ▼
┌─────────────────────────────────────────────────────────┐
│              ANALYZER SYNTHESIS                         │
│  ────────────────────────────────────────────────────  │
│  OVERALL VERDICT: FALSE                                 │
│  CONFIDENCE: 92%                                        │
│                                                         │
│  CONSENSUS FACTS:                                       │
│  ✅ All models agree the claim is false                 │
│  ✅ Cited NASA astronaut testimonies                    │
│  ✅ Referenced physics of human vision                  │
│                                                         │
│  KEY EVIDENCE:                                          │
│  • Wall is ~30ft wide, not visible at orbital altitude  │
│  • Myth debunked by multiple astronauts                 │
│                                                         │
│  DISPUTED: None                                         │
│  UNSUPPORTED: None                                      │
└─────────────────────────────────────────────────────────┘

Three Verification Methods:

| Method | Description | Best For |
|---|---|---|
| 🧠 Memory Only | Uses model training data only ("If unknown, say so.") | Testing model knowledge without external sources |
| 🌐 Shared Research | One search; all models get the same results | Fair comparison with consistent evidence |
| 🔍 Independent Research | Each model searches independently | Seeing how models approach verification differently |

Independent Research - Source Comparison: When using Independent Research mode, Veritas compares sources found by different models:

  • Common Sources: URLs found by multiple models (high confidence)
  • Unique Sources: URLs only one model found (may reveal blind spots)
  • Search Queries: See what each model searched for

Verification Flow:

  1. Select verification method (Memory Only / Shared Research / Independent Research)
  2. Enter claim or question to verify
  3. Multiple verifier models independently assess truthfulness with citations
  4. Analyzer model synthesizes final report
  5. Report shows: consensus facts, disputed claims, confidence levels
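In Consilium, step 4 is performed by an LLM analyzer, but the core majority-vote idea can be illustrated numerically. This is a hedged sketch; the function and field names are assumptions.

```javascript
// Sketch: combine independent verifier verdicts into an overall verdict by
// majority vote, with confidence averaged over the agreeing verifiers only.
function aggregateVerdicts(verdicts) {
  // verdicts: [{ verdict: 'TRUE' | 'FALSE' | 'UNVERIFIABLE', confidence: 0..1 }]
  const tally = {};
  for (const v of verdicts) tally[v.verdict] = (tally[v.verdict] || 0) + 1;

  // Pick the verdict with the most votes.
  const [winner, votes] = Object.entries(tally).sort((a, b) => b[1] - a[1])[0];

  const agreeing = verdicts.filter((v) => v.verdict === winner);
  const confidence =
    agreeing.reduce((sum, v) => sum + v.confidence, 0) / agreeing.length;

  return { verdict: winner, unanimous: votes === verdicts.length, confidence };
}
```

With the Great Wall example above (three FALSE verdicts at 0.95, 0.92, and 0.88), this yields FALSE, unanimous, confidence ≈ 0.92.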

Best for: Fact-checking before publishing, detecting hallucinations, verifying information

Unique Value: Structured hallucination detection with flexible research options. Trust but verify.


11. Limatura Mode (Ctrl+=)

Latin: "limatura" — filing, polishing, refinement

Polish and improve output through sequential model passes.

┌─────────────────────────────────────────────────────────┐
│                CONTENT TO POLISH                        │
│        (code, text, email, document)                    │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│   V0: ORIGINAL (Model A creates initial response)       │
│   "Here is my first draft of the email..."              │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│   V1: FIRST REFINEMENT (Model B improves V0)            │
│   "Here is the improved version with clearer..."        │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│   V2: SECOND REFINEMENT (Model C improves V1)           │
│   "Here is the polished final version..."               │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│              VERSION COMPARISON                         │
│  ┌─────────────────────────────────────────────────┐   │
│  │ V0 (Original)  │ V1 (Refined)  │ V2 (Polished) │   │
│  │ [View]         │ [View]        │ [View] ★      │   │
│  └─────────────────────────────────────────────────┘   │
│  Current: V2 by Model C                                 │
│  [Copy Final] [Continue Refining]                       │
└─────────────────────────────────────────────────────────┘

Refinement Types:

  • General improvement: "Make this better"
  • Style refinement: "Make this more concise/formal/casual"
  • Code refinement: "Optimize and clean this code"
  • Custom instruction: User-defined refinement criteria

Best for: Code optimization, document drafting, email refinement, creative writing polish

Unique Value: Sequential improvement, not just comparison. Each model builds on the last.
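The chain itself is simple to sketch. Here the user's content is taken as V0 for brevity (in the app, V0 may itself be model-generated); `limaturaChain` and `askModel` are hypothetical names and the prompt wording is invented.

```javascript
// Sketch: sequential refinement. Each model receives the previous version
// plus the refinement instruction and returns an improved draft. Every
// intermediate version is kept so the user can compare V0..Vn.
async function limaturaChain(models, original, instruction, askModel) {
  const versions = [{ version: 'V0', model: null, text: original }];
  let current = original;
  for (let i = 0; i < models.length; i++) {
    const prompt =
      `Refine the text below (${instruction}). Return only the revised text.\n\n${current}`;
    current = await askModel(models[i], prompt);
    versions.push({ version: `V${i + 1}`, model: models[i], text: current });
  }
  return versions;
}
```

Unlike the parallel modes, ordering matters here: each pass compounds on the last, which is why the final version can differ substantially from any single model's one-shot answer.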


12. Prompting Guide (Ctrl+G)

Purpose: Learn and apply effective prompting techniques

A comprehensive guide to crafting effective AI prompts with 8 proven formulas.

┌─────────────────────────────────────────────────────────┐
│              PROMPTING GUIDE                            │
│                                                         │
│  📋 8 PROVEN FORMULAS:                                 │
│                                                         │
│  1. RTCF - Role, Task, Context, Format                 │
│  2. CREATE - Character, Request, Examples...           │
│  3. RISEN - Role, Instructions, Steps, End Goal...     │
│  4. Chain-of-Thought - Step-by-step reasoning          │
│  5. Few-Shot Learning - Input/output examples          │
│  6. STAR - Situation, Task, Action, Result             │
│  7. Code Generation - Language, Requirements...        │
│  8. Self-Critique - Generate, critique, improve        │
│                                                         │
│  Each formula includes:                                 │
│  • Component breakdown                                  │
│  • Real-world examples                                  │
│  • Best use cases                                       │
│  • One-click copy                                       │
└─────────────────────────────────────────────────────────┘

Available Formulas:

| Formula | Components | Best For |
|---|---|---|
| RTCF | Role + Task + Context + Format | General structured prompts |
| CREATE | Character + Request + Examples + Adjustments + Type + Extras | Detailed specifications |
| RISEN | Role + Instructions + Steps + End Goal + Narrowing | Multi-step tasks |
| Chain-of-Thought | Step-by-step reasoning | Complex reasoning problems |
| Few-Shot | Input → Output examples | Pattern learning |
| STAR | Situation + Task + Action + Result | Problem-solving narratives |
| Code Generation | Language + Requirements + Standards + Edge Cases | Programming tasks |
| Self-Critique | Generate → Critique → Improve | Quality iteration |

Best for: Learning prompt engineering, improving query quality, teaching prompting techniques

Unique Value: Reference guide for effective prompting, always available with Ctrl+G.


📊 Question Type Matrix

Use this table to choose the right mode for your question:

| Question Type | Recommended Mode | Why |
|---|---|---|
| "What is X?" (Factual) | Forum or Veritas | Forum for comparison; Veritas for accuracy |
| "What's the best X?" (Opinion) | Consensus or Arbitrium | Consensus reduces bias; Arbitrium reveals preference |
| Creative writing | Forum or Limatura | Forum for variety; Limatura for polish |
| Coding/Technical | Forum or Elenchus | Forum for solutions; Elenchus for security review |
| Controversial/Ethical | Debate | Models engage with counterarguments |
| "Should I do X?" (Decision) | Consensus or Elenchus | Consensus for a recommendation; Elenchus for risks |
| Research/Comprehensive | Synthesis + Veritas | Synthesis for coverage; Veritas for accuracy |
| Security Review | Elenchus | Adversarial testing finds vulnerabilities |
| Model Benchmarking | Peira | Structured capability testing |
| Quick Comparison | Arbitrium | Fast blind preference voting |
| Quality Polish | Limatura | Iterative improvement chain |
| Hallucination Check | Veritas | Cross-model fact verification |
| Local vs Cloud | Versus | Data-driven cost/quality comparison |

✨ Features

🎯 Core Capabilities

  • Multi-Model Comparison — Query multiple LLMs simultaneously
  • Streaming Responses — Real-time output from all models
  • Blind Evaluation — Anonymized judging prevents bias
  • URL Content Fetching — Include webpage content in prompts
  • Session Management — Save, tag, search, reload sessions
  • Export Options — JSON, Markdown, CSV

🔧 Advanced Features

  • Knowledge Base (RAG) — Upload documents for context-aware responses
  • Vision Support — Upload images for multi-model analysis
  • Research Mode (SearXNG) — Web search before querying models
  • Conversation Continuity — Follow-up questions with context
  • Prompt Templates — Reusable prompts with variables
  • Cost Tracking — Estimated API costs per response
  • Model Analytics — Track which models win evaluations
  • Pin/Favorite Responses — Star great responses
  • Keyboard Shortcuts — Full keyboard navigation
  • Local Model Support — Ollama and LM Studio integration
  • Dark/Light Themes — Beautiful UI in both modes
  • Model Sync — Fetch latest models from OpenRouter API
  • Benchmark Sync — Update benchmark scores from HuggingFace Leaderboard
  • Prompting Guide — Learn effective prompting techniques

🚀 Quick Start

Prerequisites

💡 Why OpenRouter? One API key = access to 25+ models (OpenAI, Anthropic, Google, xAI, Mistral, and more). Pay-as-you-go pricing.

Installation

# Clone the repository
git clone https://github.com/lafintiger/Consilium.git
cd Consilium

# Install backend dependencies
cd backend
npm install

# Configure environment
cp ../env.example.txt .env
# Edit .env and add your OPENROUTER_API_KEY

# Start backend
npm run dev

# In a new terminal, install and start frontend
cd frontend
npm install
npm run dev

Access the App


🐳 Docker

Using Docker Compose (Recommended)

# Copy and configure environment
cp env.example.txt .env
# Edit .env with your API keys

# Build and start
docker compose up -d

# View logs
docker compose logs -f

# Stop
docker compose down

Updating to Latest Version

git pull
docker compose down
docker compose build --no-cache
docker compose up -d

⌨️ Keyboard Shortcuts

| Shortcut | Action |
|---|---|
| Ctrl+1 | Forum mode |
| Ctrl+2 | Debate mode |
| Ctrl+3 | Consensus mode |
| Ctrl+4 | Analysis mode |
| Ctrl+5 | Synthesis mode |
| Ctrl+6 | Analytics dashboard |
| Ctrl+7 | Peira (capability testing) |
| Ctrl+8 | Elenchus (red team) |
| Ctrl+9 | Versus (local vs commercial) |
| Ctrl+0 | Arbitrium (blind voting) |
| Ctrl+- | Veritas (fact check) |
| Ctrl+= | Limatura (iterative polish) |
| Ctrl+G | Prompting Guide |
| Ctrl+R | Toggle Research mode |

⚙️ Configuration

Environment Variables

Copy env.example.txt to .env in the backend folder:

# Required
OPENROUTER_API_KEY=sk-or-v1-your-key-here

# Optional - Local Models
OLLAMA_URL=http://localhost:11434
LMSTUDIO_URL=http://localhost:1234

# Optional - Research Mode
SEARXNG_URL=http://localhost:4000

# Performance
LOCAL_MODELS_SEQUENTIAL=true  # Run local models one at a time

Available Models

Consilium supports 25+ models via OpenRouter:

| Provider | Models |
|---|---|
| Anthropic | Claude Sonnet 4.5, Opus 4.5, Haiku 4.5 |
| OpenAI | GPT-5.2, GPT-5.2 Pro, GPT-5.1, o3 |
| Google | Gemini 3 Pro, Gemini 2.5 Pro/Flash |
| xAI | Grok 4, Grok 4 Fast, Grok 3 |
| DeepSeek | DeepSeek V3.2, V3.2 Speciale |
| Mistral | Mistral Large 3, Devstral 2 |
| Local | Any Ollama/LM Studio model |

🏠 Local Models

Ollama Setup

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull models
ollama pull llama3.3
ollama pull qwen2.5:32b
ollama pull deepseek-r1:14b

# Start Ollama server
ollama serve

Consilium automatically detects running Ollama models.

Docker + Local Models

| Scenario | OLLAMA_URL |
|---|---|
| Both native | http://localhost:11434 |
| Consilium in Docker, Ollama native | http://host.docker.internal:11434 |

📁 Project Structure

Consilium/
├── backend/                 # Express.js API server
│   ├── src/
│   │   ├── index.js        # Server entry point
│   │   ├── config/         # Model configs, benchmarks
│   │   ├── db/             # SQLite database
│   │   └── routes/         # API endpoints
│   └── package.json
│
├── frontend/               # React + Vite + Tailwind
│   ├── src/
│   │   ├── components/     # React components
│   │   ├── constants/      # Mode definitions
│   │   ├── stores/         # Zustand state
│   │   └── types/          # TypeScript definitions
│   └── package.json
│
├── docker-compose.yml
├── DEVELOPER_GUIDE.md     # Developer guide
└── README.md              # This file

📚 Knowledge Base (RAG)

Consilium includes a built-in Retrieval Augmented Generation (RAG) system that lets you upload documents and have AI models answer questions using your own content.

How It Works

┌─────────────────────────────────────────────────────────────┐
│                 KNOWLEDGE BASE WORKFLOW                      │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────┐
│ 1. UPLOAD DOCUMENTS                                          │
│    • Click Database icon (🗄️) in header                      │
│    • Create collections: "Tech Docs", "Research", etc.       │
│    • Upload PDFs, Word docs, text files, Markdown            │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────┐
│ 2. AUTOMATIC PROCESSING (Background)                         │
│    • Parse document → Extract text                           │
│    • Chunk text → Smart segmentation (~500 tokens each)      │
│    • Generate embeddings → Ollama qwen3-embedding:8b         │
│    • Store in SQLite database                                │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────┐
│ 3. QUERY WITH KNOWLEDGE                                      │
│    • Toggle "Knowledge" button in prompt input               │
│    • Select specific collection or "All Collections"         │
│    • Ask your question                                       │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────┐
│ 4. SEMANTIC SEARCH & AUGMENTATION                            │
│    • Your question → Embedded → Compare to chunks            │
│    • Top 5 most relevant chunks retrieved                    │
│    • Chunks added as context to your prompt                  │
│    • All models receive the augmented prompt                 │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────┐
│ 5. VIEW SOURCES                                              │
│    • "Knowledge Base Sources" panel shows retrieved chunks   │
│    • Document name, collection, similarity score             │
│    • Preview of the chunk content                            │
└─────────────────────────────────────────────────────────────┘
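Step 4 above boils down to vector math: the question embedding is compared to every stored chunk embedding by cosine similarity, and the best five matches win. A minimal sketch, with illustrative helper names (not Consilium's actual code):

```javascript
// Sketch of step 4: score chunks against the query embedding, keep the top 5.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function topChunks(queryEmbedding, chunks, k = 5) {
  return chunks
    .map((c) => ({ ...c, score: cosineSimilarity(queryEmbedding, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The `score` attached here is the similarity value surfaced in the "Knowledge Base Sources" panel (step 5).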

Supported Document Types

| Type | Extension | Notes |
| --- | --- | --- |
| PDF | `.pdf` | Text extraction via `pdf-parse` |
| Word | `.docx` | Modern Word format via `mammoth` |
| Text | `.txt` | Plain text files |
| Markdown | `.md` | Markdown files |
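Once a document's text is extracted (step 2 above), it is split into roughly 500-token segments. The exact segmentation logic isn't documented here; a minimal sketch, approximating one token per word (`chunkText` is an illustrative name):

```javascript
// Sketch of step 2's chunking (details assumed): split extracted text into
// ~500-token segments, treating one whitespace-delimited word as one token.
function chunkText(text, maxTokens = 500) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < words.length; i += maxTokens) {
    chunks.push(words.slice(i, i + maxTokens).join(" "));
  }
  return chunks;
}
```

A production chunker would typically also respect sentence and paragraph boundaries; this sketch only shows the sizing.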

Knowledge Collections

Organize your documents into themed collections:

┌─────────────────────────────────────────────────────────────┐
│ 📂 Collections                                               │
├─────────────────────────────────────────────────────────────┤
│ 🏥 Medical Research      │ 12 docs │ 342 chunks            │
│ 💻 Tech Documentation    │ 8 docs  │ 215 chunks            │
│ 📋 Company Policies      │ 5 docs  │ 89 chunks             │
│ 📖 General               │ 3 docs  │ 47 chunks             │
└─────────────────────────────────────────────────────────────┘
  • Create collections with custom names, colors, and descriptions
  • Filter searches to specific collections or search all
  • Move documents between collections as needed
  • Delete collections without losing documents (they go to "uncategorized")

Requirements for Knowledge Base

  1. Ollama must be running with an embedding model:

    # Install the embedding model
    ollama pull qwen3-embedding:8b
    
    # Start Ollama server
    ollama serve
  2. Status Check: The Knowledge Panel shows embedding model status

    • ✅ Green = Ready to process documents
    • ❌ Red = Embedding model not available
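With the embedding model pulled, each chunk (and each query) is embedded through Ollama's `/api/embeddings` endpoint. A hedged sketch of that call (`embed` and `extractEmbedding` are illustrative names, not Consilium's actual code):

```javascript
// Sketch: generate an embedding via Ollama's /api/embeddings endpoint,
// using the model documented above.
const OLLAMA_URL = process.env.OLLAMA_URL || "http://localhost:11434";
const EMBEDDING_MODEL = process.env.EMBEDDING_MODEL || "qwen3-embedding:8b";

function extractEmbedding(body) {
  // /api/embeddings responds with { embedding: [0.12, -0.04, ...] }
  if (!Array.isArray(body.embedding)) throw new Error("unexpected response shape");
  return body.embedding;
}

async function embed(text) {
  const res = await fetch(`${OLLAMA_URL}/api/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: EMBEDDING_MODEL, prompt: text }),
  });
  if (!res.ok) throw new Error(`Embedding request failed: ${res.status}`);
  return extractEmbedding(await res.json());
}
```

If this call fails, the Knowledge Panel's status indicator turns red, since documents cannot be processed without embeddings.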

Configuration

| Environment Variable | Default | Description |
| --- | --- | --- |
| `EMBEDDING_MODEL` | `qwen3-embedding:8b` | Ollama embedding model to use |
| `KNOWLEDGE_TOP_K` | `5` | Max chunks to retrieve per query |
| `KNOWLEDGE_MIN_SIMILARITY` | `0.3` | Minimum similarity threshold |
| `KNOWLEDGE_MAX_TOKENS` | `8000` | Max tokens for context |
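The three retrieval settings interact: candidates are capped at `KNOWLEDGE_TOP_K`, weak matches below `KNOWLEDGE_MIN_SIMILARITY` are dropped, and the combined context stays within `KNOWLEDGE_MAX_TOKENS`. A sketch of how they might be applied together (`applyKnowledgeLimits` and the chunk shape are assumptions for illustration):

```javascript
// Sketch (assumed helper name): apply the limits above to scored chunks,
// which are assumed sorted by similarity score, descending.
function applyKnowledgeLimits(scoredChunks, opts = {}) {
  const { topK = 5, minSimilarity = 0.3, maxTokens = 8000 } = opts;
  const kept = [];
  let budget = maxTokens;
  for (const chunk of scoredChunks) {
    if (kept.length >= topK) break;       // KNOWLEDGE_TOP_K
    if (chunk.score < minSimilarity) break; // KNOWLEDGE_MIN_SIMILARITY
    if (chunk.tokens > budget) break;     // KNOWLEDGE_MAX_TOKENS
    kept.push(chunk);
    budget -= chunk.tokens;
  }
  return kept;
}
```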

Use Cases

| Scenario | How to Use |
| --- | --- |
| Company Q&A Bot | Upload policy docs → Ask questions about procedures |
| Research Assistant | Upload papers → Ask for summaries and connections |
| Documentation Search | Upload tech docs → Query specific APIs or features |
| Study Helper | Upload course materials → Ask practice questions |
| Legal Research | Upload contracts → Query for specific clauses |

Combined with Other Features

Knowledge Base works alongside other Consilium features:

| Combination | Result |
| --- | --- |
| Knowledge + Forum | Multiple models answer using your documents |
| Knowledge + Veritas | Fact-check claims against your own sources |
| Knowledge + Synthesis | Combine document insights from multiple models |
| Knowledge + Research | Use both your docs AND web search |

📜 License

Polyform Noncommercial 1.0.0 — See LICENSE for details.

| Use Case | Allowed |
| --- | --- |
| Educators & Students | ✅ Free |
| Personal/Hobby Use | ✅ Free |
| Non-profit Organizations | ✅ Free |
| Research | ✅ Free |
| Commercial Use | ❌ Contact for license |

🏹 A quiver of methods for seeking truth

Built with 🧠 — Seeking Truth Through AI Councils

About

Multi-LLM Council Interface — query multiple models simultaneously, compare responses side-by-side, and evaluate with structured criteria. A Sanctum partner tool.
