feat: ensemble decision pipeline — panel, arbiter, judge, ChromaDB feedback (#721)
Z0mb13V1 wants to merge 1 commit into mindcraft-bots:develop
Conversation
Adds a 4-model parallel decision pipeline to replace single-model responses:
- src/ensemble/panel.js: queries 4 LLMs in parallel, collects candidate actions
- src/ensemble/arbiter.js: heuristic scoring with 0.08-margin escalation to the judge
- src/ensemble/judge.js: Gemini Flash LLM judge for close decisions (10s timeout)
- src/ensemble/feedback.js: ChromaDB feedback loop — embeds context, retrieves similar past decisions (similarity > 0.6), injects them as [PAST EXPERIENCE]
- src/ensemble/controller.js: orchestrates the full Panel -> Arbiter -> Judge flow
- src/ensemble/logger.js: structured ensemble decision logging
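The feedback injection described above can be sketched roughly as follows. The 0.6 similarity threshold and the [PAST EXPERIENCE] tag come from the description; `injectPastExperience` and the shape of `matches` are illustrative assumptions, not code from the PR.

```javascript
// Hypothetical sketch: prepend summaries of sufficiently similar past
// decisions to the system prompt before querying the panel.
function injectPastExperience(systemMessage, matches, minSimilarity = 0.6) {
    // keep only retrievals above the similarity threshold
    const relevant = matches.filter(m => m.similarity > minSimilarity);
    if (relevant.length === 0) return systemMessage;
    const notes = relevant.map(m => `- ${m.summary}`).join('\n');
    return `[PAST EXPERIENCE]\n${notes}\n\n${systemMessage}`;
}
```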
Pull request overview
This PR introduces an “ensemble decision pipeline” intended as a drop-in replacement for the existing single-LLM chat_model. It runs multiple LLMs in parallel (Panel), selects a winner with a heuristic scorer (Arbiter), optionally escalates close calls to an LLM judge (Judge), and persists/queries past decisions via ChromaDB (Feedback), with structured logging throughout.
Changes:
- Added Panel to query multiple configured models concurrently with per-model timeouts and command extraction.
- Added Arbiter heuristic scoring and a “low confidence” signal for judge escalation.
- Added LLMJudge, FeedbackCollector (ChromaDB), and EnsembleLogger, orchestrated by EnsembleModel in controller.js.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| src/ensemble/panel.js | Parallel panel querying with timeouts; extracts command + pre-command text from responses. |
| src/ensemble/arbiter.js | Heuristic scoring, majority bonus, latency penalty, and low-confidence thresholding. |
| src/ensemble/judge.js | LLM-based judge used when arbiter confidence is low. |
| src/ensemble/feedback.js | ChromaDB-backed vector “experience” storage + similarity retrieval. |
| src/ensemble/controller.js | Orchestrates Panel → Arbiter → (optional) Judge flow; injects “past experiences”; aggregates usage. |
| src/ensemble/logger.js | Persists decision logs and basic stats under ./bots/<agent>/ensemble_log.json. |
```js
constructor(config = {}) {
    this.strategy = config.strategy || 'heuristic';
    this.majorityBonus = config.majority_bonus ?? 0.2;
    this.latencyPenalty = config.latency_penalty_per_sec ?? 0.02;
    this._confidenceThreshold = config.confidence_threshold ?? 0.08;
    this._lastConfidence = 1.0; // set after each pick()
}
```
config.strategy is stored on the instance but never used by the arbiter logic. If strategy selection is planned, implement it; otherwise remove it to avoid dead config options and reduce maintenance overhead.
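If strategy selection is indeed planned, one minimal way to make the option meaningful is to dispatch on it inside pick(). This is a sketch only; `_pickByScore` and `_pickByMajority` are hypothetical helper names, not part of the PR.

```javascript
// Hypothetical sketch: make config.strategy drive winner selection.
class Arbiter {
    constructor(config = {}) {
        this.strategy = config.strategy || 'heuristic';
    }

    pick(proposals) {
        switch (this.strategy) {
            case 'majority':
                return this._pickByMajority(proposals);
            case 'heuristic':
            default:
                return this._pickByScore(proposals);
        }
    }

    // highest arbiter score wins
    _pickByScore(proposals) {
        return proposals.reduce((best, p) =>
            (p.score ?? 0) > (best.score ?? 0) ? p : best);
    }

    // most common extracted command wins
    _pickByMajority(proposals) {
        const counts = new Map();
        for (const p of proposals) {
            counts.set(p.command, (counts.get(p.command) ?? 0) + 1);
        }
        const top = [...counts.entries()].sort((a, b) => b[1] - a[1])[0][0];
        return proposals.find(p => p.command === top);
    }
}
```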
```js
import { ChromaClient } from 'chromadb';

const COLLECTION_NAME = 'ensemble_memory';
const CHROMADB_URL = process.env.CHROMADB_URL || 'http://localhost:8000';
```
chromadb is imported at module top-level, but the dependency is not present in package.json. This will throw on startup even if ChromaDB is meant to be optional. Consider adding chromadb to dependencies, or switching to a lazy/dynamic import inside _initAsync() so the app can run without it.
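A lazy-import variant along these lines would keep ChromaDB optional. This is a sketch under assumptions: the `enabled` flag, the warning text, and the constructor shape are illustrative, not the PR's actual code.

```javascript
const CHROMADB_URL = process.env.CHROMADB_URL || 'http://localhost:8000';

class FeedbackCollector {
    constructor(url = CHROMADB_URL) {
        this.url = url;
        this.client = null;
        this.enabled = false; // stays false when chromadb is not installed
    }

    async _initAsync() {
        try {
            // dynamic import: a missing package fails here, not at app startup
            const { ChromaClient } = await import('chromadb');
            this.client = new ChromaClient({ path: this.url });
            this.enabled = true;
        } catch (err) {
            console.warn(`[Ensemble] chromadb unavailable, feedback disabled: ${err.message}`);
        }
    }
}
```

Callers can then check `this.enabled` before storing or querying experiences, so the rest of the pipeline runs unchanged without the dependency.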
```js
export class EnsembleModel {
    static prefix = 'ensemble';

    /**
```
EnsembleModel declares static prefix = 'ensemble', but it lives under src/ensemble/ and is not discoverable by src/models/_model_map.js (which only auto-loads models from src/models/). As-is, selectAPI/createModel cannot construct this model from a profile, so it isn't actually a drop-in chat_model replacement. Consider moving/exporting this model from src/models/ (e.g., src/models/ensemble.js) or extending _model_map.js to include it.
```js
/**
 * @param {Proposal[]} proposals - successful proposals only
 * @param {string} systemMessage - the original system prompt (abbreviated)
 * @param {Array} turns - last few conversation turns for context
 * @returns {Promise<string|null>} winning agentId, or null if judge fails
 */
async judge(proposals, systemMessage, turns) {
    if (proposals.length === 0) return null;
    if (proposals.length === 1) return proposals[0].agentId;
```
The systemMessage parameter is documented and passed into judge(), but it's not used anywhere in the method. Either incorporate it into the judge prompt (if intended) or remove the parameter to avoid confusion and keep the API honest.
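If the intent is to use the parameter, one small sketch is to fold an abbreviated copy into the judge's instructions. `buildJudgeSystem`, the wording, and the 400-character cap are assumptions for illustration.

```javascript
// Hypothetical helper: include the (abbreviated) original system prompt in
// the judge's system message so task context informs the verdict.
function buildJudgeSystem(systemMessage, maxChars = 400) {
    const context = systemMessage ? systemMessage.slice(0, maxChars) : '';
    return [
        "You are a judge. Pick the proposal that best fits the agent's task.",
        context && `Original system prompt (abbreviated): ${context}`
    ].filter(Boolean).join('\n');
}
```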
```js
// Phase 2: LLM judge fallback when heuristic confidence is low
if (this.judge && this.arbiter.isLowConfidence() && successful.length >= 2) {
    console.log(`[Ensemble] Low confidence (margin=${this.arbiter._lastConfidence.toFixed(3)}), consulting LLM judge...`);
```
This log line reaches into this.arbiter._lastConfidence directly. Since _lastConfidence is effectively internal state, consider exposing it via a getter (e.g., lastConfidence) to avoid controller↔arbiter coupling on a private field name.
Suggested change:
```diff
- console.log(`[Ensemble] Low confidence (margin=${this.arbiter._lastConfidence.toFixed(3)}), consulting LLM judge...`);
+ console.log('[Ensemble] Low confidence detected, consulting LLM judge...');
```
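A minimal getter would look like this (a sketch; `pick()` is assumed to update `_lastConfidence` elsewhere, as the constructor comment suggests):

```javascript
class Arbiter {
    constructor(config = {}) {
        this._confidenceThreshold = config.confidence_threshold ?? 0.08;
        this._lastConfidence = 1.0; // updated by pick()
    }

    // public, read-only view of the private margin
    get lastConfidence() {
        return this._lastConfidence;
    }

    isLowConfidence() {
        return this._lastConfidence < this._confidenceThreshold;
    }
}
```

The controller could then log `this.arbiter.lastConfidence.toFixed(3)` without depending on the underscored field name.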
```js
 * @property {number} latencyMs - Time taken for this model's response
 * @property {string} status - "success" | "error" | "timeout"
 * @property {string|null} error - Error message if status !== "success"
 * @property {number|null} score - Set by Arbiter
```
The Proposal typedef doesn't mention the usage field, but _queryMember() returns usage: member.model._lastUsage || null. Either add usage to the typedef or remove it from the returned object to keep the shape consistent/documented.
Suggested change:
```diff
  * @property {number|null} score - Set by Arbiter
+ * @property {string|null} usage - Last recorded usage information for this model, or null
```
```js
// Log decision
if (this.logDecisions) {
    this.logger.logDecision(proposals, winner);
```
logDecision() is async but is called without await/.catch(). If the logger's _ready promise rejects (e.g., mkdir fails), this becomes an unhandled promise rejection. Either await the call, or explicitly handle it with .catch()/void plus internal error handling to keep logging failures from surfacing.
Suggested change:
```diff
- this.logger.logDecision(proposals, winner);
+ try {
+     await this.logger.logDecision(proposals, winner);
+ } catch (err) {
+     console.warn('[Ensemble] Failed to log decision:', err);
+ }
```
```js
const successful = allProposals.filter(p => p.status === 'success');
const commands = successful.map(p => p.command).filter(Boolean);
const uniqueCommands = [...new Set(commands)];
const agreement = uniqueCommands.length <= 1 && commands.length > 0
    ? 1.0
    : commands.length > 0
        ? Math.max(...uniqueCommands.map(c => commands.filter(x => x === c).length)) / commands.length
        : 0;

const entry = {
    timestamp: new Date().toISOString(),
    decision_id: this.decisionCount,
    proposals: allProposals.map(p => ({
        agent_id: p.agentId,
        model: p.modelName,
        status: p.status,
        command: p.command || null,
        pre_text: p.preCommandText ? p.preCommandText.slice(0, 100) : '',
        score: p.score ?? null,
        latency_ms: p.latencyMs,
        error: p.error || null
    })),
    winner: winner ? {
        agent_id: winner.agentId,
        command: winner.command,
        score: winner.score,
        reason: winner.winReason || 'highest_score'
    } : null,
    majority_command: this._findMajority(commands),
    panel_agreement: Math.round(agreement * 100) / 100
};

await this._ready;
let log = await this._readLog();
log.push(entry);

if (log.length > MAX_ENTRIES) {
    log = log.slice(log.length - TRIM_TO);
}

try {
    await writeFile(this.filePath, JSON.stringify(log, null, 2));
} catch (err) {
    console.error(`[Ensemble] Failed to write log: ${err.message}`);
```
logDecision() awaits this._ready but doesn't wrap the overall method in a try/catch. If directory creation fails, await this._ready will reject and propagate to the caller. Since logging is typically best-effort, consider catching _ready/file I/O errors so logDecision() never rejects.
Suggested change:
```js
try {
    const successful = allProposals.filter(p => p.status === 'success');
    const commands = successful.map(p => p.command).filter(Boolean);
    const uniqueCommands = [...new Set(commands)];
    const agreement = uniqueCommands.length <= 1 && commands.length > 0
        ? 1.0
        : commands.length > 0
            ? Math.max(...uniqueCommands.map(c => commands.filter(x => x === c).length)) / commands.length
            : 0;

    const entry = {
        timestamp: new Date().toISOString(),
        decision_id: this.decisionCount,
        proposals: allProposals.map(p => ({
            agent_id: p.agentId,
            model: p.modelName,
            status: p.status,
            command: p.command || null,
            pre_text: p.preCommandText ? p.preCommandText.slice(0, 100) : '',
            score: p.score ?? null,
            latency_ms: p.latencyMs,
            error: p.error || null
        })),
        winner: winner ? {
            agent_id: winner.agentId,
            command: winner.command,
            score: winner.score,
            reason: winner.winReason || 'highest_score'
        } : null,
        majority_command: this._findMajority(commands),
        panel_agreement: Math.round(agreement * 100) / 100
    };

    await this._ready;
    let log = await this._readLog();
    log.push(entry);

    if (log.length > MAX_ENTRIES) {
        log = log.slice(log.length - TRIM_TO);
    }

    try {
        await writeFile(this.filePath, JSON.stringify(log, null, 2));
    } catch (err) {
        console.error(`[Ensemble] Failed to write log: ${err.message}`);
    }
} catch (err) {
    console.error(`[Ensemble] Failed to log decision: ${err.message}`);
```
```js
try {
    const result = await Promise.race([
        model.sendRequest(judgeTurns, judgeSystem),
        new Promise((_, reject) =>
            setTimeout(() => reject(new Error('judge timeout')), this.timeoutMs)
        )
    ]);
```
The judge timeout uses setTimeout(...) inside Promise.race, but the timer isn't cleared when the model responds first. This keeps extra timers alive and can accumulate under load. Consider keeping the timer handle and clearTimeout() after the race settles (similar to Panel._queryMember).
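One way to do that is a small helper that keeps the timer handle and clears it in a finally block; `raceWithTimeout` is a hypothetical name for this sketch.

```javascript
// Race a promise against a timeout, clearing the timer once either settles
// so no stray timers accumulate under load.
async function raceWithTimeout(promise, timeoutMs, label = 'timeout') {
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error(label)), timeoutMs);
    });
    try {
        return await Promise.race([promise, timeout]);
    } finally {
        clearTimeout(timer); // runs whether the model answered or timed out
    }
}
```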
Ensemble Decision Pipeline
Adds a 4-model parallel consensus pipeline that improves action quality by aggregating multiple LLM opinions before acting.
Architecture
```
User input → Panel (4x parallel LLMs) → Arbiter (heuristic) → [Judge] → Action
                        ↑
                ChromaDB feedback
```
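The flow above can be sketched end-to-end as follows. Component and method names follow the review discussion (Panel query, Arbiter pick/isLowConfidence, Judge judge); the exact signatures are assumptions.

```javascript
// Sketch of the Panel → Arbiter → (optional) Judge orchestration.
async function decide(ensemble, turns, systemMessage) {
    // 1. query all panel members in parallel
    const proposals = await ensemble.panel.query(turns, systemMessage);
    const successful = proposals.filter(p => p.status === 'success');

    // 2. heuristic winner
    let winner = ensemble.arbiter.pick(successful);

    // 3. escalate close calls to the LLM judge
    if (ensemble.judge && ensemble.arbiter.isLowConfidence() && successful.length >= 2) {
        const judgedId = await ensemble.judge.judge(successful, systemMessage, turns);
        const judged = successful.find(p => p.agentId === judgedId);
        if (judged) winner = judged; // fall back to heuristic pick if judge fails
    }
    return winner;
}
```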
New Files
- src/ensemble/panel.js
- src/ensemble/arbiter.js
- src/ensemble/judge.js
- src/ensemble/feedback.js
- src/ensemble/controller.js
- src/ensemble/logger.js
Why This Matters
Single-model Mindcraft bots frequently make locally optimal but globally poor decisions. Aggregating several models' proposals and escalating close calls to a judge is intended to reduce hallucinated actions and improve long-horizon task completion.