
Releases: FonaTech/Clouds-Coder

v2026.04.06

05 Apr 16:28
6f58945


CHANGELOG 2026-04-06

Bug fixes and minor strategy tweaks.

Clouds Coder 2026.04.05-Stable

04 Apr 16:51
7a2968d


CHANGELOG 2026-04-05

Bug fixes and minor strategy tweaks.

Clouds Coder 2026.04.02-Stable

02 Apr 17:51
ef150b3


CHANGELOG 2026-04-02

Bug fixes and minor strategy tweaks.
Fix PyPI packaging path issues

Clouds Coder 2026.04.01-Stable

31 Mar 17:11
5568629


CHANGELOG 2026-04-01

Bug fixes and minor strategy tweaks.

Clouds Coder 2026.03.31-Stable

30 Mar 20:10
874cbc4


CHANGELOG 2026-03-31

UX polish + richer file preview + dependency alignment

This update focuses on usability and source-install completeness: richer document/table/media preview coverage, timeout consistency improvements, and a refreshed requirements.txt that matches the current file-loading pipeline.


English

Headline: Better preview coverage, smoother UX, and source-install dependency sync

1. Richer file preview coverage

  • Expanded the source-install dependency set so local environments can load and preview uploaded files more reliably.
  • The current preview / parsing stack now aligns with the runtime behavior for:
    • PDF: pdfminer.six + PyMuPDF
    • CSV / TSV: built-in csv parsing, plus pandas workflows used by analysis skills
    • Excel: openpyxl (.xlsx/.xlsm) and xlrd (.xls)
    • Word: python-docx
    • PowerPoint: python-pptx
    • Image normalization / asset handling: Pillow
  • Browser-side preview coverage continues to include image, audio, and video files directly, while unsupported or partially supported formats still fall back to parsed markdown/text previews instead of failing hard.
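The fallback behaviour described above can be sketched as a small dispatch function. This is only an illustration under stated assumptions, not the logic in Clouds_Coder.py: the names choose_preview_strategy, preview_csv, OPTIONAL_PARSERS and BROWSER_NATIVE are hypothetical, and the extension sets are examples. The module names are the real import names of the listed packages (PyMuPDF imports as fitz, python-docx as docx, python-pptx as pptx).

```python
import csv
import io
from importlib.util import find_spec
from itertools import islice
from pathlib import Path

# Hypothetical mapping: file extension -> import name of the optional parser package.
OPTIONAL_PARSERS = {
    ".pdf": "fitz",       # PyMuPDF
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
    ".xls": "xlrd",
    ".docx": "docx",      # python-docx
    ".pptx": "pptx",      # python-pptx
}
# Illustrative set of formats a browser can render without server-side parsing.
BROWSER_NATIVE = {".png", ".jpg", ".jpeg", ".gif", ".mp3", ".wav", ".mp4", ".webm"}


def choose_preview_strategy(filename: str) -> str:
    """Pick a preview path, falling back to plain text instead of failing."""
    ext = Path(filename).suffix.lower()
    if ext in BROWSER_NATIVE:
        return "browser"
    if ext in {".csv", ".tsv"}:
        return "csv"  # handled by the standard-library csv module
    module = OPTIONAL_PARSERS.get(ext)
    if module is not None and find_spec(module) is not None:
        return f"parser:{module}"
    return "text-fallback"  # parser missing or format unknown


def preview_csv(text: str, delimiter: str = ",", max_rows: int = 5) -> list[list[str]]:
    """Return the first rows of a CSV/TSV document for a table preview."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return list(islice(reader, max_rows))
```

With a PyPI-only install, where the optional packages may be absent, choose_preview_strategy("report.pdf") would return "text-fallback" rather than raise.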

2. User-experience improvements

  • Preview documentation now explicitly reflects support for PDF, Office files, tables, code, HTML/Markdown, and media.
  • Source-install instructions now clearly explain that pip install -r requirements.txt is the recommended path when users need full upload parsing and rich file preview behavior.
  • The model wait / wake timeout path was synchronized with the global runtime timeout so long model wake-ups no longer stop early on a stale fixed 35s / 40s limit.
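The timeout change in the last bullet amounts to deriving the wait deadline from one global value instead of a hard-coded constant. A minimal sketch, assuming a polling loop; wait_for_model and its parameters are hypothetical names, not the project's API:

```python
import time
from typing import Callable


def wait_for_model(
    is_ready: Callable[[], bool],
    runtime_timeout_s: float,
    poll_interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the model is ready or the global runtime timeout expires."""
    # The deadline comes from the caller's global timeout, not a fixed 35s / 40s.
    deadline = clock() + runtime_timeout_s
    while clock() < deadline:
        if is_ready():
            return True
        # Never sleep past the deadline.
        sleep(min(poll_interval_s, max(0.0, deadline - clock())))
    return False
```

Passing clock and sleep as parameters is a design choice for testability: the loop can be exercised without real waiting.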

3. Documentation / packaging alignment

  • Added a root requirements.txt to match the current runtime capabilities exposed by Clouds_Coder.py.
  • Updated the top-level README to point to this 2026-03-31 changelog as the latest architecture / UX update.
  • Reworded the dependency positioning in README: PyPI install remains the lightweight base runtime, while source install enables the fuller preview/parser stack.

2026-03-31 Summary

  • Added an explicit source-install requirements.txt
  • Synced README install guidance with the real preview/parser dependency set
  • Documented broader file preview support and recent UX improvements
  • Recorded global-timeout synchronization for the model wait chain


Clouds Coder 2026.03.30-Stable

29 Mar 07:37
8c16a1b


CHANGELOG 2026-03-30

Based on the 03-25 version. Bug fixes and some strategy adjustments.


English

Headline: Universal Skills Ecosystem + Dual RAG Architecture + Core Reliability Fixes

1. Universal Skills Ecosystem Compatibility (Critical)

5-ecosystem compatibility: Clouds Coder now loads and executes skills from any of the five major skill ecosystems without any per-provider adapters:

  • awesome-claude-skills
  • Minimax-skills
  • skills-main
  • kimi-agent-internals
  • academic-pptx-skill-main

Root cause of previous failures: The Execution Guide injection (lines 11094–11131) forced read_file calls on virtual skill paths that don't exist in the filesystem, causing the model to loop indefinitely trying to read non-existent files instead of following the skill's SKILL.md instructions.

Fixes and simplifications:

  • Removed Execution Guide injection entirely — the model now follows SKILL.md instructions directly without interference
  • Removed Chain Tracking system (7 methods including _skill_chain_completion_blocker, _record_skill_chain_entry) — eliminated over-engineered interception that silently blocked skill execution
  • Simplified _broadcast_loaded_skill blackboard writes from 16 fields → 6 fields (name, path, description, loaded_at, trigger_context, source)
  • Simplified _loaded_skills_prompt_hint from ~350 tokens → ~120 tokens: compact in-context hint that tells the model which skills are loaded without cluttering its context budget
  • LLM-driven autonomous discovery: discovery prompt simplified (max 3 skills per scan, 30 catalog entries, max_tokens=120) so the model makes its own judgment about what skill fits the task type — no keyword-based forced triggers
  • Multi-skill loading with conflict detection: multiple skills can be loaded simultaneously; loading a skill that directly conflicts with an already-loaded skill is blocked
  • Sync-mode Manager TodoWrite capability: Manager in sync mode now has access to TodoWrite for coordinating plan steps with skill execution

New: _preload_skills_from_plan_steps — scans plan step text for skill name mentions and proactively preloads them before execution begins, reducing skill-load latency mid-execution.
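One way to picture the plan-step scan is a whole-name match of each catalog entry against each step. This is a sketch under assumptions: find_skills_in_plan is a hypothetical helper, and the real _preload_skills_from_plan_steps may match names differently.

```python
import re


def find_skills_in_plan(plan_steps: list[str], catalog: list[str]) -> list[str]:
    """Return catalog skill names mentioned in the plan, in first-seen order."""
    found: list[str] = []
    for step in plan_steps:
        for name in catalog:
            # Match the whole name only, so "ppt" does not fire on "pptx".
            pattern = r"(?<![\w-])" + re.escape(name) + r"(?![\w-])"
            if name not in found and re.search(pattern, step, flags=re.IGNORECASE):
                found.append(name)
    return found


steps = [
    "Collect evidence with research-orchestrator-pro",
    "Build slides using the ppt skill",
    "Export the pptx file",
]
catalog = ["ppt", "research-orchestrator-pro", "scientific-reasoning-lab"]
to_preload = find_skills_in_plan(steps, catalog)
```

The resulting list could then be handed to whatever loader is in use before execution starts.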

Shell path auto-quoting (_rewrite_shell_virtual_paths): paths containing spaces are now automatically wrapped in double quotes before shell dispatch, fixing execution failures on macOS paths with spaces.
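A rough sketch of the quoting rule, assuming the command is available as a list of arguments. quote_if_needed and build_command are hypothetical helpers and do not reproduce _rewrite_shell_virtual_paths:

```python
import re


def quote_if_needed(path: str) -> str:
    """Wrap a path in double quotes when it contains whitespace."""
    if len(path) >= 2 and path[0] == path[-1] == '"':
        return path  # already quoted
    if not any(ch.isspace() for ch in path):
        return path  # nothing to protect
    # Escape characters that stay special inside double quotes in POSIX shells.
    escaped = re.sub(r'(["\\$`])', r"\\\1", path)
    return f'"{escaped}"'


def build_command(argv: list[str]) -> str:
    """Join arguments into one shell command line, quoting where needed."""
    return " ".join(quote_if_needed(arg) for arg in argv)
```

For code that does not need double quotes specifically, the standard library's shlex.quote does a similar job using single quotes.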

Plan expansion:

  • Plan steps limit raised from 10 → 20 steps
  • Per-step character limit raised from 400 → 600 characters
  • Anti-hallucination constraint added to plan synthesis: "Only reference scripts and files that ACTUALLY EXIST in the session filesystem"
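The two numeric limits above could be enforced with a simple clamp. This is an illustrative sketch; clamp_plan and the constant names are hypothetical, and the actual code may summarise rather than truncate.

```python
MAX_PLAN_STEPS = 20    # raised from 10
MAX_STEP_CHARS = 600   # raised from 400


def clamp_plan(steps: list[str]) -> list[str]:
    """Drop empty steps, then cap the step count and per-step length."""
    cleaned = [step.strip() for step in steps if step.strip()]
    return [step[:MAX_STEP_CHARS] for step in cleaned[:MAX_PLAN_STEPS]]
```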

2. Dual RAG Architecture — Code RAG + Data RAG (High)

Architecture: Two independent ingestion and retrieval engines, both built on TF_G_IDF_RAG:

  • RAGIngestionService (Data RAG): handles documents, PDFs, structured data files — general knowledge base
  • CodeIngestionService (Code RAG): handles source code files with code-aware tokenization — code-specific knowledge base

Unified retrieval: query_knowledge_library(query, top_k) searches both libraries in parallel and returns a merged ranked result, so the model always queries one interface regardless of content type.
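The merge step can be sketched as two searches submitted to a thread pool, followed by a single sort. Everything here is an assumption made for illustration: Hit, query_both and the stub searchers are hypothetical, and the sketch assumes the two engines return scores on a comparable scale.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, NamedTuple


class Hit(NamedTuple):
    score: float
    source: str  # "data" or "code"
    text: str


Searcher = Callable[[str, int], list[Hit]]


def query_both(query: str, top_k: int, data_search: Searcher, code_search: Searcher) -> list[Hit]:
    """Query both libraries concurrently and return one ranked list."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(data_search, query, top_k),
            pool.submit(code_search, query, top_k),
        ]
        hits = [hit for future in futures for hit in future.result()]
    hits.sort(key=lambda hit: hit.score, reverse=True)
    return hits[:top_k]


# Stub searchers standing in for the two retrieval engines.
def data_stub(query: str, top_k: int) -> list[Hit]:
    return [Hit(0.9, "data", "paper abstract"), Hit(0.2, "data", "meeting notes")]


def code_stub(query: str, top_k: int) -> list[Hit]:
    return [Hit(0.7, "code", "def flush(self): ...")]


merged = query_both("flush lock", 2, data_stub, code_stub)
```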

RAG guide injection: Both research-orchestrator-pro and scientific-reasoning-lab now include a full retrieval guide documenting the query_knowledge_library interface, parameter meanings, response format, and best-practice query patterns. The model can leverage the knowledge base directly from within a loaded skill.

3. Built-in Skills Overhaul: research-orchestrator-pro & scientific-reasoning-lab (High)

research-orchestrator-pro rewritten as a cooperative decision hub:

  • Previous version: conflicted with output skills (e.g., ppt) by trying to run its own analysis workflow in parallel, causing the model to synthesize hallucinated scripts
  • New design: acts as an analysis decision hub that focuses exclusively on evidence synthesis and task structuring; when loaded alongside an output skill (e.g., ppt, report-writer), it defers all output formatting to that skill
  • Includes full RAG retrieval guide for background knowledge augmentation
  • Anti-hallucination posture: "Do NOT generate file/script names that don't exist"

scientific-reasoning-lab rewritten as a 5-phase self-iterating reasoning engine:

  • Phase 1: Problem decomposition — defines variables, constraints, assumptions
  • Phase 2: Formal reasoning chain — step-by-step derivation with mathematical rigor (now embedded as sub-engine of research-orchestrator-pro Phase 2)
  • Phase 3: Self-verification — checks logical consistency, unit/dimension coherence, numerical ranges
  • Phase 4: Critical evaluation — identifies gaps, uncertainty bounds, and edge cases
  • Phase 5: Integration — synthesizes findings into structured conclusions with explicit confidence levels
  • Includes full RAG retrieval guide for referencing prior knowledge during reasoning

4. Multi-Factor Priority Context Compression (High)

Problem: Previous _auto_compact discarded messages chronologically (oldest first), which could drop task-critical information (current plan step, recent errors, active skills) while retaining low-value content from early in a session.

New _classify_message_priority — 10-factor scoring (0–10):

  • Recency: 0–3 points (newest messages score highest)
  • Role weight: system=3, user=2, assistant=2, tool=1
  • Task progress markers (TodoWrite, plan_step, finish_task): +2
  • Error / critical information (Error:, exception traces): +2
  • Current goal relevance: +1
  • Skill-related content (<loaded-skill>, skill loaded): +1
  • compact-resume note: forced to 10 (always preserved)
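The listed factors can be combined into a score roughly as follows. This is a sketch, not the body of _classify_message_priority: the linear recency mapping, the "Traceback" marker, the substring test for compact-resume notes, the goal_keywords parameter and the cap at 10 are all assumptions.

```python
ROLE_WEIGHT = {"system": 3, "user": 2, "assistant": 2, "tool": 1}
PROGRESS_MARKERS = ("TodoWrite", "plan_step", "finish_task")
ERROR_MARKERS = ("Error:", "Traceback")  # "Traceback" is an assumed trace marker
SKILL_MARKERS = ("<loaded-skill>", "skill loaded")


def classify_priority(message: dict, index: int, total: int, goal_keywords=()) -> int:
    """Score one message from 0 to 10 using the factors listed above."""
    content = message.get("content", "")
    if "compact-resume" in content:
        return 10  # always preserved
    # Recency: 0 for the oldest message, 3 for the newest (assumed linear).
    recency = (3 * index) // (total - 1) if total > 1 else 3
    score = recency + ROLE_WEIGHT.get(message.get("role"), 0)
    if any(marker in content for marker in PROGRESS_MARKERS):
        score += 2
    if any(marker in content for marker in ERROR_MARKERS):
        score += 2
    if any(keyword.lower() in content.lower() for keyword in goal_keywords):
        score += 1
    if any(marker in content for marker in SKILL_MARKERS):
        score += 1
    return min(score, 10)
```

For example, the newest user message containing "Error:" and "plan_step" scores 3 + 2 + 2 + 2 = 9 and would land in the high-priority tier.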

New _priority_compress_messages — priority-based three-tier compression:

  • High priority (score ≥ 7): kept intact
  • Medium priority (score 4–6): content truncated to 500-character summary
  • Low priority (score 0–3): collapsed to a one-liner or dropped if over token budget
  • Output is re-sorted by original message index to maintain conversation order
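A minimal sketch of the three tiers, assuming scores have already been computed. compress_by_priority is a hypothetical stand-in for _priority_compress_messages, character counts are used here as a crude proxy for tokens, and the 80-character one-liner length is an assumption.

```python
def compress_by_priority(messages, scores, summary_chars=500,
                         one_liner_chars=80, char_budget=None):
    """Keep, truncate, or collapse each message according to its score."""
    kept = []  # (original index, score, compressed message)
    for index, (message, score) in enumerate(zip(messages, scores)):
        content = message["content"]
        if score >= 7:
            new_content = content                      # high: intact
        elif score >= 4:
            new_content = content[:summary_chars]      # medium: truncated
        else:
            stripped = content.strip()                 # low: first line only
            new_content = stripped.splitlines()[0][:one_liner_chars] if stripped else ""
        kept.append((index, score, {**message, "content": new_content}))

    if char_budget is not None:
        # Drop low-priority entries, lowest score first, until the budget is met.
        kept.sort(key=lambda item: item[1])
        total = sum(len(item[2]["content"]) for item in kept)
        while kept and kept[0][1] <= 3 and total > char_budget:
            total -= len(kept[0][2]["content"])
            kept.pop(0)

    kept.sort(key=lambda item: item[0])  # restore conversation order
    return [item[2] for item in kept]
```

Only low-priority messages are ever dropped in this sketch; medium and high tiers survive even if the budget is still exceeded.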

_build_state_handoff enhanced with four new structured fields:

  • PLAN_PROGRESS: completed/total plan steps
  • CURRENT_STEP: text of the active plan step
  • ACTIVE_SKILLS: list of currently loaded skills
  • RECENT_TOOLS: summary of last 5 tool calls

_auto_compact integration: priority compression runs first; original chronological pop(0) is preserved as a safety fallback if priority compression doesn't reduce tokens far enough.
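The ordering described here, priority compression first and chronological removal as a fallback, can be sketched as below. The function signature is hypothetical: compress and count_tokens are injected callables standing in for the real helpers.

```python
def auto_compact(messages, token_budget, compress, count_tokens):
    """Run priority compression, then fall back to dropping oldest messages."""
    compacted = list(compress(messages))  # copy so the input list is untouched
    # Safety fallback: chronological pop(0) if still over budget.
    while len(compacted) > 1 and count_tokens(compacted) > token_budget:
        compacted.pop(0)
    return compacted


def count_words(messages):
    """Crude token estimate used only for this illustration."""
    return sum(len(message["content"].split()) for message in messages)
```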

5. Anti-stall Mechanism Optimization (Medium)

Problem: _manager_apply_anti_stall triggered "CHANGE YOUR APPROACH" after only 2 consecutive delegations to the same target, interrupting agents that were legitimately making incremental progress across multiple turns.

Changes:

  • Threshold raised from 2 → 3 consecutive same-target delegations before triggering
  • Instruction softened from the blunt "CHANGE YOUR APPROACH" to a collaborative guidance message:

    "You have been working on this for multiple rounds without visible progress. Consider: 1) Use ask_colleague to request help from another agent. 2) Try a completely different tool or approach. 3) If the subtask is complete, call finish_current_task with what you have so far."

6. Critical Bug Fixes (High)

Fix 1 — CodeIngestionService._flush_lock (AttributeError)

  • Symptom: AttributeError: 'CodeIngestionService' object has no attribute '_flush_lock' when uploading code files to the Code Library
  • Root cause: CodeIngestionService.__init__ completely overrides parent RAGIngestionService.__init__ without calling super().__init__(), so _flush_lock = threading.Lock() (initialized in the parent) was never created
  • Fix: added self._flush_lock = threading.Lock() at the end of CodeIngestionService.__init__
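The failure mode is easy to reproduce in isolation. The classes below are simplified stand-ins, not the real RAGIngestionService and CodeIngestionService:

```python
import threading


class IngestionBase:
    def __init__(self):
        self._flush_lock = threading.Lock()

    def flush(self):
        with self._flush_lock:
            return "flushed"


class BrokenCodeIngestion(IngestionBase):
    def __init__(self):
        # Overrides the parent constructor without calling super().__init__(),
        # so _flush_lock is never created.
        self.language = "python"


class FixedCodeIngestion(IngestionBase):
    def __init__(self):
        self.language = "python"
        self._flush_lock = threading.Lock()  # mirrors the fix described above


try:
    BrokenCodeIngestion().flush()
    error_message = ""
except AttributeError as exc:
    error_message = str(exc)

fixed_result = FixedCodeIngestion().flush()
```

Calling super().__init__() is the more common remedy, but creating the lock directly is a reasonable choice when the parent constructor does other work the subclass deliberately skips.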

Fix 2 — Frontend setTaskLevel() complexity selector resets

  • Symptom: selecting a task complexity level (e.g., L4) works momentarily then reverts to "Auto" after the next message
  • Root cause: setTaskLevel() called updateLevelBtn(lvl) to update the UI button but never called scheduleSnapshot(), so the next SSE snapshot refresh overwrote the button state with stale server data
  • Fix: added scheduleSnapshot({forceFull:false, delayMs:80, allowWhenFrozen:true}) after updateLevelBtn(lvl), matching the pattern already used by applyModel()

Fix 3 — _sync_todos_from_blackboard drops worker TodoWrite items

  • Symptom: in sync mode, items written by TodoWrite from developer/explorer/reviewer agents only persist for one round, then disappear on the next blackboard sync
  • Root cause: items with owner ∈ {developer, explorer, reviewer} were being filtered out from non_system_rows (as non-system items) but were not included in system_rows either — so they were silently lost on every sync cycle
  • Fix: worker items are now collected into a separate worker_rows list and merged with priority (placed between system rows and non-system rows), protected from sync overwrites
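The merge order described in the fix can be sketched as a three-way partition. merge_todo_rows is a hypothetical helper, and the owner value "system" is an assumption about how system rows are labelled.

```python
SYSTEM_OWNERS = {"system"}  # assumed label for system-owned rows
WORKER_OWNERS = {"developer", "explorer", "reviewer"}


def merge_todo_rows(rows: list[dict]) -> list[dict]:
    """Order rows as system, then worker, then everything else, losing none."""
    system_rows = [row for row in rows if row.get("owner") in SYSTEM_OWNERS]
    worker_rows = [row for row in rows if row.get("owner") in WORKER_OWNERS]
    other_rows = [
        row for row in rows
        if row.get("owner") not in SYSTEM_OWNERS | WORKER_OWNERS
    ]
    return system_rows + worker_rows + other_rows
```

Because every row falls into exactly one of the three lists, the output always has the same length as the input, which is the property the original bug violated.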

Fix 4 — Anti-stall threshold and instruction (see item 5 above)
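The raised threshold can be illustrated with a check over the recent delegation history. This is a sketch: should_nudge is a hypothetical helper, not the body of _manager_apply_anti_stall.

```python
STALL_THRESHOLD = 3  # raised from 2


def should_nudge(delegation_targets: list[str], threshold: int = STALL_THRESHOLD) -> bool:
    """True when the last `threshold` delegations all went to the same target."""
    if len(delegation_targets) < threshold:
        return False
    recent = delegation_targets[-threshold:]
    return len(set(recent)) == 1
```

With the old threshold of 2, the history ["developer", "developer"] would already have triggered the message; with 3 it does not.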

Fix 5 — Multi-factor context compression (see item 4 above)

2026-03-25 Summary

  • Skills ecosystem now compatible with all 5 major skill providers; Execution Guide and Chain Tracking removed to let the model follow skills naturally
  • Dual RAG architecture (Code RAG + Data RAG) with unified query_knowledge_library retrieval interface and injected retrieval guides in built-in skills
  • research-orchestrator-pro redesigned as a non-interfering analysis hub; scientific-reasoning-lab rebuilt as a 5-phase self-iterating reasoning engine
  • Context compression upgraded from chronological-only to 10-factor priority scoring, pr...

Clouds Coder 2026.03.27-Stable

26 Mar 17:36
6ee86f5


CHANGELOG 2026-03-27

Bug fixes and some strategy adjustments.



v2026.03.25

25 Mar 18:10
2f23df3


CHANGELOG 2026-03-25


English

Headline: Universal Skills Ecosystem + Dual RAG Architecture + Core Reliability Fixes

1. Universal Skills Ecosystem Compatibility (Critical)

5-ecosystem compatibility: Clouds Coder now loads and executes skills from any of the five major skill ecosystems without any per-provider adapters:

  • awesome-claude-skills
  • Minimax-skills
  • skills-main
  • kimi-agent-internals
  • academic-pptx-skill-main

Root cause of previous failures: The Execution Guide injection (lines 11094–11131) forced read_file calls on virtual skill paths that don't exist in the filesystem, causing the model to loop indefinitely trying to read non-existent files instead of following the skill's SKILL.md instructions.

Fixes and simplifications:

  • Removed Execution Guide injection entirely — the model now follows SKILL.md instructions directly without interference
  • Removed Chain Tracking system (7 methods including _skill_chain_completion_blocker, _record_skill_chain_entry) — eliminated over-engineered interception that silently blocked skill execution
  • Simplified _broadcast_loaded_skill blackboard writes from 16 fields → 6 fields (name, path, description, loaded_at, trigger_context, source)
  • Simplified _loaded_skills_prompt_hint from ~350 tokens → ~120 tokens: compact in-context hint that tells the model which skills are loaded without cluttering its context budget
  • LLM-driven autonomous discovery: discovery prompt simplified (max 3 skills per scan, 30 catalog entries, max_tokens=120) so the model makes its own judgment about what skill fits the task type — no keyword-based forced triggers
  • Multi-skill loading with conflict detection: multiple skills can be loaded simultaneously; loading a skill that directly conflicts with an already-loaded skill is blocked
  • Sync-mode Manager TodoWrite capability: Manager in sync mode now has access to TodoWrite for coordinating plan steps with skill execution

New: _preload_skills_from_plan_steps — scans plan step text for skill name mentions and proactively preloads them before execution begins, reducing skill-load latency mid-execution.

Shell path auto-quoting (_rewrite_shell_virtual_paths): paths containing spaces are now automatically wrapped in double quotes before shell dispatch, fixing execution failures on macOS paths with spaces.

Plan expansion:

  • Plan steps limit raised from 10 → 20 steps
  • Per-step character limit raised from 400 → 600 characters
  • Anti-hallucination constraint added to plan synthesis: "Only reference scripts and files that ACTUALLY EXIST in the session filesystem"

2. Dual RAG Architecture — Code RAG + Data RAG (High)

Architecture: Two independent ingestion and retrieval engines, both built on TF_G_IDF_RAG:

  • RAGIngestionService (Data RAG): handles documents, PDFs, structured data files — general knowledge base
  • CodeIngestionService (Code RAG): handles source code files with code-aware tokenization — code-specific knowledge base

Unified retrieval: query_knowledge_library(query, top_k) searches both libraries in parallel and returns a merged ranked result, so the model always queries one interface regardless of content type.

RAG guide injection: Both research-orchestrator-pro and scientific-reasoning-lab now include a full retrieval guide documenting the query_knowledge_library interface, parameter meanings, response format, and best-practice query patterns. The model can leverage the knowledge base directly from within a loaded skill.

3. Built-in Skills Overhaul: research-orchestrator-pro & scientific-reasoning-lab (High)

research-orchestrator-pro rewritten as cooperative decision hub:

  • Previous version: conflicted with output skills (e.g., ppt) by trying to run its own analysis workflow in parallel, causing the model to synthesize hallucinated scripts
  • New design: acts as an analysis decision hub that focuses exclusively on evidence synthesis and task structuring; when loaded alongside an output skill (e.g., ppt, report-writer), it defers all output formatting to that skill
  • Includes full RAG retrieval guide for background knowledge augmentation
  • Anti-hallucination posture: "Do NOT generate file/script names that don't exist"

scientific-reasoning-lab rewritten as 5-step self-iterating reasoning engine:

  • Phase 1: Problem decomposition — defines variables, constraints, assumptions
  • Phase 2: Formal reasoning chain — step-by-step derivation with mathematical rigor (now embedded as sub-engine of research-orchestrator-pro Phase 2)
  • Phase 3: Self-verification — checks logical consistency, unit/dimension coherence, numerical ranges
  • Phase 4: Critical evaluation — identifies gaps, uncertainty bounds, and edge cases
  • Phase 5: Integration — synthesizes findings into structured conclusions with explicit confidence levels
  • Includes full RAG retrieval guide for referencing prior knowledge during reasoning

4. Multi-Factor Priority Context Compression (High)

Problem: Previous _auto_compact discarded messages chronologically (oldest first), which could drop task-critical information (current plan step, recent errors, active skills) while retaining low-value content from early in a session.

New _classify_message_priority — 10-factor scoring (0–10):

  • Recency: 0–3 points (newest messages score highest)
  • Role weight: system=3, user=2, assistant=2, tool=1
  • Task progress markers (TodoWrite, plan_step, finish_task): +2
  • Error / critical information (Error:, exception traces): +2
  • Current goal relevance: +1
  • Skill-related content (<loaded-skill>, skill loaded): +1
  • compact-resume note: forced to 10 (always preserved)

New _priority_compress_messages — priority-based three-tier compression:

  • High priority (score ≥ 7): kept intact
  • Medium priority (score 4–6): content truncated to 500-character summary
  • Low priority (score 0–3): collapsed to a one-liner or dropped if over token budget
  • Output is re-sorted by original message index to maintain conversation order

_build_state_handoff enhanced with four new structured fields:

  • PLAN_PROGRESS: completed/total plan steps
  • CURRENT_STEP: text of the active plan step
  • ACTIVE_SKILLS: list of currently loaded skills
  • RECENT_TOOLS: summary of last 5 tool calls

_auto_compact integration: priority compression runs first; original chronological pop(0) is preserved as a safety fallback if priority compression doesn't reduce tokens far enough.

5. Anti-stall Mechanism Optimization (Medium)

Problem: _manager_apply_anti_stall triggered "CHANGE YOUR APPROACH" after only 2 consecutive delegations to the same target, interrupting agents that were legitimately making incremental progress across multiple turns.

Changes:

  • Threshold raised from 2 → 3 consecutive same-target delegations before triggering
  • Instruction softened from the blunt "CHANGE YOUR APPROACH" to a collaborative guidance message:

    "You have been working on this for multiple rounds without visible progress. Consider: 1) Use ask_colleague to request help from another agent. 2) Try a completely different tool or approach. 3) If the subtask is complete, call finish_current_task with what you have so far."

6. Critical Bug Fixes (High)

Fix 1 — CodeIngestionService._flush_lock (AttributeError)

  • Symptom: AttributeError: 'CodeIngestionService' object has no attribute '_flush_lock' when uploading code files to the Code Library
  • Root cause: CodeIngestionService.__init__ completely overrides parent RAGIngestionService.__init__ without calling super().__init__(), so _flush_lock = threading.Lock() (initialized in the parent) was never created
  • Fix: added self._flush_lock = threading.Lock() at the end of CodeIngestionService.__init__
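
The failure mode is easy to reproduce in isolation. The classes below are simplified stand-ins (their bodies and the `language` attribute are assumptions); only the class names, `_flush_lock`, and the shape of the fix come from the notes above.

```python
import threading


class RAGIngestionService:
    def __init__(self):
        # The parent creates the lock.
        self._flush_lock = threading.Lock()


class BrokenCodeIngestionService(RAGIngestionService):
    def __init__(self):
        # Overrides __init__ without calling super().__init__(),
        # so _flush_lock is never created.
        self.language = "python"


class FixedCodeIngestionService(RAGIngestionService):
    def __init__(self):
        self.language = "python"
        # The fix: create the lock explicitly at the end of __init__.
        self._flush_lock = threading.Lock()
```

Calling `super().__init__()` would be another way to get the attribute, but whether that is safe depends on what else the parent initializer does, which the notes do not say.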

Fix 2 — Frontend setTaskLevel() complexity selector resets

  • Symptom: selecting a task complexity level (e.g., L4) works momentarily then reverts to "Auto" after the next message
  • Root cause: setTaskLevel() called updateLevelBtn(lvl) to update the UI button but never called scheduleSnapshot(), so the next SSE snapshot refresh overwrote the button state with stale server data
  • Fix: added scheduleSnapshot({forceFull:false, delayMs:80, allowWhenFrozen:true}) after updateLevelBtn(lvl), matching the pattern already used by applyModel()

Fix 3 — _sync_todos_from_blackboard drops worker TodoWrite items

  • Symptom: in sync mode, items written by TodoWrite from developer/explorer/reviewer agents only persist for one round, then disappear on the next blackboard sync
  • Root cause: items with owner ∈ {developer, explorer, reviewer} were excluded from non_system_rows but were never added to system_rows either, so they were silently lost on every sync cycle
  • Fix: worker items are now collected into a separate worker_rows list and merged with priority (placed between system rows and non-system rows), protected from sync overwrites
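
A sketch of the three-way split, assuming each todo row is a dict with an `owner` field and that system rows are marked with owner `"system"` (both assumptions; the real row schema is not shown in these notes).

```python
from typing import Dict, List

WORKER_OWNERS = {"developer", "explorer", "reviewer"}


def merge_todo_rows(rows: List[Dict]) -> List[Dict]:
    """Split rows by owner and merge them in priority order."""
    system_rows, worker_rows, non_system_rows = [], [], []
    for row in rows:
        owner = row.get("owner", "")
        if owner == "system":
            system_rows.append(row)
        elif owner in WORKER_OWNERS:
            # Previously these rows landed in neither list and were lost.
            worker_rows.append(row)
        else:
            non_system_rows.append(row)
    # Worker rows sit between system rows and non-system rows.
    return system_rows + worker_rows + non_system_rows
```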

Fix 4 — Anti-stall threshold and instruction (see item 5 above)

Fix 5 — Multi-factor context compression (see item 4 above)

2026-03-25 Summary

  • Skills ecosystem now compatible with all 5 major skill providers; Execution Guide and Chain Tracking removed to let the model follow skills naturally
  • Dual RAG architecture (Code RAG + Data RAG) with unified query_knowledge_library retrieval interface and injected retrieval guides in built-in skills
  • research-orchestrator-pro redesigned as a non-interfering analysis hub; scientific-reasoning-lab rebuilt as a 5-phase self-iterating reasoning engine
  • Context compression upgraded from chronological-only to 10-factor priority scoring, preserving task-critical information over session noise
  • Anti-stall ...

Clouds Coder 2026.03.20-Stable

19 Mar 15:42
5c851f2

CHANGELOG 2026-03-20


English

Headline: Plan Mode & Core Architecture Overhaul

1. Plan Mode — Unified Architecture (Critical)

UI Toggle: New Plan: Auto/On/Off button in the toolbar. Users can now force plan mode on (even for L1 tasks) or off (skip planning for L5 tasks).

Single + Sync Support: Plan mode now works identically in both single and sync execution modes.

  • Single mode: _single_agent_plan_step_check() auto-advances plan steps based on tool results and phase detection.
  • Sync mode: Manager delegates per-step with advance_plan_step=true.
  • Both modes share the same plan step tracking, todo sync, and progress display.

Plan Step Protection (6-layer defense):

  • _mark_all_done_silently() preserves plan_step todos — arbiter can't batch-complete them.
  • _can_auto_finish_from_approval() blocks finish when plan steps are pending.
  • Arbiter snapshot includes plan progress ("Only 2/7 steps completed — do NOT classify as TASK_COMPLETED").
  • _manager_progress_state() never returns "done" with pending plan steps.
  • _project_todo_hint_for_manager() warns "DO NOT finish until all N steps completed".
  • _manager_fallback_route() redirects to developer instead of finishing when plan steps remain.
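
Several of these layers reduce to the same check: are any plan steps still pending? The sketch below assumes todos are dicts with hypothetical `kind` and `status` fields; the helper names mirror the functions above but the bodies are illustrative only.

```python
from typing import Dict, List


def pending_plan_steps(todos: List[Dict]) -> List[Dict]:
    return [t for t in todos
            if t.get("kind") == "plan_step" and t.get("status") != "completed"]


def mark_all_done_silently(todos: List[Dict]) -> None:
    """Batch-complete ordinary todos but leave plan steps untouched."""
    for todo in todos:
        if todo.get("kind") != "plan_step":
            todo["status"] = "completed"


def can_auto_finish(todos: List[Dict], approved: bool) -> bool:
    """Approval alone is not enough while plan steps are pending."""
    return approved and not pending_plan_steps(todos)


def plan_progress_note(todos: List[Dict]) -> str:
    steps = [t for t in todos if t.get("kind") == "plan_step"]
    done = len(steps) - len(pending_plan_steps(todos))
    if done < len(steps):
        return (f"Only {done}/{len(steps)} steps completed; "
                "do NOT classify as TASK_COMPLETED")
    return f"All {len(steps)} plan steps completed"
```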

Planner Bubble UI: Orange-red themed (#e8533f) chat bubble with full agent badge structure, matching explorer/developer/reviewer styling.

2. Tiered Context Compression + File Buffer (Critical)

Problem: _auto_compact only compressed self.messages and never touched agent_messages (800 msgs), manager_context (400), or per-role contexts (400 each). After a compact, the uncompressed agent contexts immediately pushed the session back against the context limit.

Solution: 4-tier progressive compression system:

  • Tier 0 (>40% left): Normal operation
  • Tier 1 (20-40%): Aggressive microcompact, reduced keep_recent
  • Tier 2 (10-20%): Compact agent contexts, file buffer offload
  • Tier 3 (<10%): Deep compact everything, 600-char content limit
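
Tier selection can be sketched as a function of the remaining context fraction. The function name is hypothetical, and the handling of the exact boundaries (40%, 20%, 10%) is an assumption because the ranges above do not say which tier owns them.

```python
def compression_tier(ctx_left: int, ctx_total: int) -> int:
    """Map the remaining context fraction to a compression tier (0-3)."""
    ratio = ctx_left / ctx_total
    if ratio > 0.40:
        return 0  # normal operation
    if ratio >= 0.20:
        return 1  # aggressive microcompact, reduced keep_recent
    if ratio >= 0.10:
        return 2  # compact agent contexts, file buffer offload
    return 3      # deep compact everything
```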

File Buffer: Large content (>2000 chars) offloaded to file_buffer/ directory, replaced with compact references in context. Auto-pruned at 500 files.
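
A minimal sketch of the offload-and-prune idea. The 2000-character threshold and the 500-file cap come from the description above; the hash-based file names, the reference string format, and pruning by modification time are assumptions.

```python
import hashlib
from pathlib import Path


def offload_if_large(content: str, buffer_dir: Path,
                     threshold: int = 2000) -> str:
    """Write large content to disk and return a compact reference."""
    if len(content) <= threshold:
        return content
    buffer_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]
    path = buffer_dir / f"{digest}.txt"
    path.write_text(content, encoding="utf-8")
    return f"[file_buffer ref: {path.name}, {len(content)} chars]"


def prune_buffer(buffer_dir: Path, max_files: int = 500) -> int:
    """Delete the oldest files beyond max_files; return how many were removed."""
    files = sorted(buffer_dir.glob("*.txt"), key=lambda p: p.stat().st_mtime)
    stale = files[:max(0, len(files) - max_files)]
    for path in stale:
        path.unlink()
    return len(stale)
```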

Range Extension: ctx_left range expanded from [18000, 100000] to [4000, 1,000,000] — supports both tiny and 1M-token contexts.

State Handoff: _build_state_handoff() ensures goal, progress, active agent, round info, and code artifacts survive compaction losslessly.

3. Universal Error Architecture (High)

Unified errors list with category field replaces compilation-only detection. 6 error categories: test > lint > compilation > build_package > deploy_infra > runtime.

  • _detect_error_category(): Command keyword matching
  • _extract_error_lines(): Pattern-based error line extraction
  • _process_tool_result_errors(): Unified tool result processing (replaces inline detection in both multi-agent and single-agent paths)
  • compilation_errors maintained as backward-compatible view via _sync_compilation_errors_compat()
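
Command keyword matching might look like the following. The six category names and their order come from the text above; the keyword lists are invented for illustration, and reading the ">" ordering as match priority is an assumption.

```python
from typing import List, Tuple

# Checked in priority order; the keywords are illustrative assumptions.
ERROR_CATEGORY_KEYWORDS: List[Tuple[str, Tuple[str, ...]]] = [
    ("test", ("pytest", "unittest", "jest")),
    ("lint", ("flake8", "eslint", "ruff")),
    ("compilation", ("gcc", "javac", "tsc")),
    ("build_package", ("pip install", "npm install")),
    ("deploy_infra", ("docker", "kubectl")),
]


def detect_error_category(command: str) -> str:
    """Return the first matching category, defaulting to runtime."""
    lowered = command.lower()
    for category, keywords in ERROR_CATEGORY_KEYWORDS:
        if any(keyword in lowered for keyword in keywords):
            return category
    return "runtime"
```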

Reviewer DEBUG METHODOLOGY generalized: covers runtime tracebacks, test assertions, lint violations — not just compiler errors.

4. Reviewer Debug Mode (High)

Problem: When compilation or test errors occurred, the system entered an infinite loop: Manager routed to Explorer (read-only), Explorer asked Developer, and Developer read files but did not fix them.

Solution: Reviewer Debug Mode — when errors detected, reviewer gets write_file/edit_file access:

  • _activate_reviewer_debug_mode(): Triggered when _manager_has_error_log() is true
  • Reviewer system prompt switches to debug methodology with full tool access
  • Manager capability note dynamically reflects debug mode
  • Auto-deactivates when errors resolve or after 6 rounds (falls back to developer)
  • Explorer stall detection: 3 consecutive identical delegations → forced switch to developer
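
The activate/deactivate lifecycle can be sketched as a small state object. The class name, the `step` method, and the read-only baseline of `read_file` are assumptions; the 6-round limit and the `write_file`/`edit_file` grant come from the notes above.

```python
from dataclasses import dataclass
from typing import List

READ_ONLY_TOOLS = ["read_file"]             # assumed baseline reviewer tools
DEBUG_TOOLS = ["write_file", "edit_file"]   # granted only in debug mode


@dataclass
class ReviewerDebugMode:
    max_rounds: int = 6
    rounds_used: int = 0
    active: bool = False

    def step(self, has_error_log: bool) -> bool:
        """Advance one round; return True while debug mode is active."""
        if not has_error_log:
            # Errors resolved: deactivate and reset the round counter.
            self.active, self.rounds_used = False, 0
            return False
        if self.rounds_used >= self.max_rounds:
            # Round budget spent: fall back to the developer.
            self.active = False
            return False
        self.active = True
        self.rounds_used += 1
        return True

    def tools(self) -> List[str]:
        if self.active:
            return READ_ONLY_TOOLS + DEBUG_TOOLS
        return list(READ_ONLY_TOOLS)
```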

5. Complexity Inheritance & Real-time Input Merge (Medium)

Complexity Inheritance: When the user responds to a plan proposal or an L5 confirmation, the system no longer re-classifies the task, so the previous complexity level is no longer lost.

  • _is_plan_choice_response(): Detects plan choice responses → skips reclassification
  • _user_mentions_complexity(): Only changes level when user explicitly mentions complexity keywords
  • Previous level inherited when user doesn't mention complexity
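
One way to express the inheritance rules, as a sketch only: the function name, the `classify` callback, and treating an explicit "L1" to "L5" token as the complexity keyword are all assumptions.

```python
import re
from typing import Callable, Optional

LEVEL_PATTERN = re.compile(r"\bL([1-5])\b", re.IGNORECASE)


def resolve_level(user_text: str, previous_level: Optional[str],
                  is_plan_choice: bool,
                  classify: Callable[[str], str]) -> str:
    """Decide the task level for a follow-up message."""
    if is_plan_choice and previous_level:
        return previous_level              # plan choice: skip reclassification
    match = LEVEL_PATTERN.search(user_text)
    if match:
        return f"L{match.group(1)}"        # explicit mention wins
    # No mention: inherit, or classify when there is nothing to inherit.
    return previous_level or classify(user_text)
```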

Real-time User Input Merge: Live user inputs during execution now trigger _merge_user_feedback_with_plan() — injects plan-aware merge note into manager context so the plan direction can be adjusted mid-flight.

Restart Intent Fusion: _fuse_restart_intent() merges user/plan/context intents on restart with priority: user > plan > context. Pure continuation phrases ("继续", "continue") inherit plan intent fully.
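
The priority rule can be sketched as follows. The signature and the exact-match test for continuation phrases are assumptions; the two phrases and the user > plan > context ordering come from the description above.

```python
from typing import Optional

CONTINUATION_PHRASES = {"继续", "continue"}


def fuse_restart_intent(user: Optional[str], plan: Optional[str],
                        context: Optional[str]) -> str:
    """Pick the restart intent with priority user > plan > context."""
    text = (user or "").strip()
    if text.lower() in CONTINUATION_PHRASES:
        # A pure continuation phrase inherits the plan intent.
        return plan or context or ""
    return text or plan or context or ""
```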

6. Task Phase Independence (Medium)

Problem: Manager was overly procedural — analysis phases used implementation patterns, wasting debug time.

Solution: Phase-aware delegation:

  • _plan_step_phase_hint(): Infers phase (research/design/implement/test/review/deploy) from step content
  • _infer_current_phase_from_blackboard(): Determines current phase from active plan step or blackboard state
  • TASK_PHASE_ROUTING: Maps phases to preferred agents (research→explorer, implement→developer, etc.)
  • Manager system prompt includes PHASE INDEPENDENCE instruction: "Each phase has its own expertise. Do NOT carry over implementation patterns from previous phases."
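
A sketch of phase inference and routing. Only the research→explorer and implement→developer mappings are stated above; the remaining mappings, the keyword lists, and the default phase are assumptions made for illustration.

```python
from typing import Dict, Tuple

TASK_PHASE_ROUTING: Dict[str, str] = {
    "research": "explorer",    # stated in the release notes
    "implement": "developer",  # stated in the release notes
    "design": "explorer",      # assumed
    "test": "reviewer",        # assumed
    "review": "reviewer",      # assumed
    "deploy": "developer",     # assumed
}

# Illustrative keywords only; the real heuristics are not documented here.
PHASE_KEYWORDS: Tuple[Tuple[str, Tuple[str, ...]], ...] = (
    ("research", ("research", "investigate", "survey")),
    ("design", ("design", "architecture")),
    ("test", ("test", "verify")),
    ("review", ("review", "audit")),
    ("deploy", ("deploy", "release")),
    ("implement", ("implement", "build", "write")),
)


def plan_step_phase_hint(step_text: str) -> str:
    """Infer a phase from the step text, defaulting to implement."""
    lowered = step_text.lower()
    for phase, keywords in PHASE_KEYWORDS:
        if any(keyword in lowered for keyword in keywords):
            return phase
    return "implement"


def preferred_agent(step_text: str) -> str:
    return TASK_PHASE_ROUTING[plan_step_phase_hint(step_text)]
```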

7. TodoWrite Isolation & Multimodal Support (Medium)

TodoWrite Isolation: When plan_step todos exist, worker TodoWrite items are tagged with owner to prevent overwriting plan steps during _sync_todos_from_blackboard().

Multimodal Native Support: _run_read() now detects image/audio/video files by extension:

  • If model supports the media type → base64 encode and inject as native multimodal input via _pending_media_inputs
  • _recent_multimodal_inputs() merges pending media from read_file with upload media
  • If model doesn't support → returns metadata with guidance to use bash tools
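
The branching described above can be sketched like this. The extension table, the returned dict shapes, and the `supported` parameter are assumptions; the real _run_read() and _pending_media_inputs structures are not shown in these notes.

```python
import base64
from pathlib import Path
from typing import Dict, Set

# Illustrative subset of extensions.
MEDIA_EXTENSIONS: Dict[str, str] = {
    ".png": "image", ".jpg": "image", ".jpeg": "image",
    ".mp3": "audio", ".wav": "audio",
    ".mp4": "video",
}


def read_for_model(path: Path, supported: Set[str]) -> Dict:
    """Return text, native media input, or metadata, depending on support."""
    media_type = MEDIA_EXTENSIONS.get(path.suffix.lower())
    if media_type is None:
        text = path.read_text(encoding="utf-8", errors="replace")
        return {"kind": "text", "text": text}
    if media_type in supported:
        # Model accepts this media type: pass it as base64.
        encoded = base64.b64encode(path.read_bytes()).decode("ascii")
        return {"kind": "media", "media_type": media_type, "base64": encoded}
    # Unsupported: return metadata plus guidance instead of raw bytes.
    return {
        "kind": "metadata",
        "media_type": media_type,
        "size_bytes": path.stat().st_size,
        "hint": "Model cannot read this media type; inspect it with bash tools.",
    }
```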

中文

标题:Plan Mode 架构 & 内核全面升级

1. Plan Mode — 统一架构(重要)

UI 开关:工具栏新增 Plan: Auto/On/Off 按钮。用户可强制开启(L1 也走 plan)或关闭(L5 跳过规划)。

Single + Sync 双模式支持:Plan mode 在 single 和 sync 执行模式下行为一致。

  • Single 模式:_single_agent_plan_step_check() 根据工具结果和阶段检测自动推进 plan steps。
  • Sync 模式:Manager 通过 advance_plan_step=true 逐步委派。
  • 两种模式共享相同的 plan step 追踪、todo 同步和进度显示。

Plan Step 保护(6 层防线)

  • _mark_all_done_silently() 保护 plan_step todos — arbiter 无法批量完成。
  • _can_auto_finish_from_approval() 在 plan steps 未完成时阻止 finish。
  • Arbiter snapshot 注入 plan 进度("仅完成 2/7 步 — 不要判定为 TASK_COMPLETED")。
  • _manager_progress_state() 有 pending plan steps 时永不返回 "done"。
  • _project_todo_hint_for_manager() 警告 "所有 N 步完成前不要 finish"。
  • _manager_fallback_route() plan steps 未完成时重定向到 developer 而非 finish。

Planner 气泡 UI:橙红色主题(#e8533f),完整 agent badge 结构,与 explorer/developer/reviewer 风格统一。

2. 分层上下文压缩 + 文件缓冲(重要)

问题:_auto_compact 只压缩 self.messages,完全不碰 agent_messages(800条)、manager_context(400条)、per-role contexts(各400条)。compact 后 agent 上下文导致立即再次撞墙。

方案:4 级渐进压缩:

  • Tier 0(>40% 剩余):正常运行
  • Tier 1(20-40%):激进 microcompact
  • Tier 2(10-20%):压缩 agent 上下文,文件缓冲卸载
  • Tier 3(<10%):深度压缩,600 字符内容限制

文件缓冲:大内容(>2000字符)卸载到 file_buffer/ 目录,上下文中只留引用。

范围扩展:ctx_left 范围从 [18000, 100000] 扩展到 [4000, 1,000,000]。

状态衔接:_build_state_handoff() 确保目标、进度、活跃 agent、代码产物在 compact 后无损传递。

3. 通用错误架构(高)

统一 errors 列表 + category 字段,替代仅编译错误检测。6 个错误类别:test > lint > compilation > build_package > deploy_infra > runtime

  • _process_tool_result_errors():统一工具结果错误处理
  • compilation_errors 作为向后兼容视图通过 _sync_compilation_errors_compat() 同步
  • Reviewer DEBUG METHODOLOGY 泛化:覆盖运行时 traceback、测试断言、lint 违规

4. Reviewer Debug Mode(高)

问题:编译/测试错误时系统进入死循环 — Manager 路由到 Explorer(只读)→ Explorer 请求 Developer → Developer 只读不修。

方案:Reviewer Debug Mode — 检测到错误时 reviewer 获得写权限:

  • 触发条件:_manager_has_error_log() 为 true
  • Reviewer 系统提示切换为 debug 方法论 + 完整工具访问
  • 错误解决后自动退出,或 6 轮后降级到 developer
  • Explorer 停滞检测:连续 3 次相同委派 → 强制切换到 developer

5. 复杂度继承 & 实时输入合并(中)

复杂度继承:用户回复 plan 方案时不再重新分类丢失复杂度。

  • _is_plan_choice_response():检测 plan 选择回复 → 跳过重分类
  • _user_mentions_complexity():仅在用户明确提到复杂度关键词时改变 level

实时用户输入合并:执行中的 live input 触发 _merge_user_feedback_with_plan() — 向 manager 注入 plan-aware 合并提示。

Restart 意图融合:_fuse_restart_intent() 按 user > plan > context 优先级融合意图。

6. 任务阶段独立性(中)

问题:Manager 过度程序化 — 分析阶段使用实现模式,浪费 debug 时间。

方案:阶段感知委派:

  • _plan_step_phase_hint():从步骤内容推断阶段(research/design/implement/test/review/deploy)
  • TASK_PHASE_ROUTING:阶段到 agent 的映射
  • Manager 系统提示包含 PHASE INDEPENDENCE 指令

7. TodoWrite 隔离 & 多模态支持(中)

TodoWrite 隔离:plan_step 存在时,worker TodoWrite 的 items 自动标记 owner,不覆盖 plan steps。

多模态原生支持:_run_read() 检测图片/音频/视频文件,模型支持时 base64 编码并作为原生多模态输入注入。


日本語

タイトル:Plan Mode アーキテクチャ & コア全面アップグレード

1. Plan Mode — 統一アーキテクチャ(重大)

UI トグル:ツールバーに Plan: Auto/On/Off ボタンを追加。ユーザーが Plan Mode を強制 ON/OFF 可能。

Single + Sync 両モード対応:Plan Mode が single/sync 両実行モードで同一動作。

  • Single モード:_single_agent_plan_step_check() がツール結果とフェーズ検出に基づき plan step を自動推進。
  • Sync モード:Manager が advance_plan_step=true でステップごとに委任。

Plan Step 保護(6 層防御)

  • _mark_all_done_silently() が plan_step todos を保護 — arbiter による一括完了を防止。
  • _can_auto_finish_from_approval() が未完了 plan steps 存在時に finish をブロック。
  • Arbiter snapshot に plan 進捗を注入。
  • _manager_progress_state() が未完了 plan steps 存在時に "done" を返さない。
  • _manager_fallback_route() が plan steps 未完了時に developer へリダイレクト。

Planner バブル UI:オレンジレッドテーマ(#e8533f)、完全な agent badge 構造。

2. 階層型コンテキスト圧縮 + ファイルバッファ(重大)

4 段階の漸進的圧縮システム(Tier 0-3)。agent_messages/manager_context/contexts を compact 時に同期圧縮。ctx_left 範囲を [4000, 1...


Clouds Coder 2026.03.17-Stable

15 Mar 17:12
af4f33c

CHANGELOG 2026-03-17

Trilingual merged release notes: English / 中文 / 日本語.

English

Priority-Ordered Updates

  1. Single-mode agent leak fix (critical)
  • Fixed _manager_apply_task_policy(): when executor_mode_flag=True, the target-not-in-participants branch (L16241-16248) could append extra agents, overriding the Single-mode participants = [assigned_expert] constraint set at L16226-16227.
  • Added a hard post-guard: after all participant/target resolution, if mode == EXECUTION_MODE_SINGLE, force participants = [assigned_expert] and redirect any non-expert target back to the assigned expert.
  • Effect: Single-mode tasks are now guaranteed to use exactly one agent regardless of executor_mode_flag or LLM routing decisions.
  2. Conclusive-reply termination signal fix (critical)
  • Root cause: when an agent (e.g. developer) replied with a conclusive answer like "task complete", the Manager ignored it because (a) the conclusive-reply check in _manager_fallback_route() only ran on the fallback path, not the tool-parsed routing path; (b) _manager_apply_task_policy() had no conclusive-reply detection; (c) blackboard approval.approved was never set by text-based completion signals.
  • Fix 1 — Policy-layer interception: added conclusive-reply detection in _manager_apply_task_policy() before the can_finish_from_approval gate. When any agent's latest text is conclusive, no open todo items remain, and no error log exists, the target is forced to finish.
  • Fix 2 — Sync-loop interception: added post-turn conclusive-reply detection in _multi_agent_sync_blackboard_worker(). After each agent turn completes, if the agent gave a conclusive reply with no open tasks and no errors, the loop breaks immediately with auto-approval.
  • Fix 3 — General endpoint detection in fallback: extended _detect_endpoint_intent from simple_qa-only to all task types in _manager_fallback_route(), so developer conclusive replies in research/engineering/general tasks also trigger fallback finish.
  • Effect: four-layer defense (fallback → general-endpoint → policy → sync-loop) ensures conclusive replies are never ignored.
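
All four checkpoints apply the same gate: a conclusive reply, no open todo items, and no error log. A sketch of that shared condition follows; the marker phrases and function names are assumptions, and real detection is likely more nuanced than substring matching.

```python
CONCLUSIVE_MARKERS = ("task complete", "任务完成", "タスク完了")


def is_conclusive_reply(text: str) -> bool:
    lowered = text.lower()
    return any(marker in lowered for marker in CONCLUSIVE_MARKERS)


def should_force_finish(latest_reply: str, open_todo_count: int,
                        has_error_log: bool) -> bool:
    """Finish only on a conclusive reply with no open todos and no errors."""
    return (is_conclusive_reply(latest_reply)
            and open_todo_count == 0
            and not has_error_log)
```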

2026-03-16 Bug Fix Summary

  • Eliminated Single-mode multi-agent leak caused by executor_mode_flag overriding participant constraints.
  • Eliminated infinite delegation loops where Manager kept dispatching agents after a conclusive reply, by adding termination detection at four independent checkpoints.
  • Both fixes include safety guards: conclusive-reply finish is suppressed when error logs exist or open todo items remain.
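
The Single-mode post-guard summarized above can be sketched as a final normalization step. The function signature, the `"single"` constant value, and letting a `"finish"` target pass through unchanged are assumptions.

```python
from typing import List, Tuple

EXECUTION_MODE_SINGLE = "single"  # assumed constant value


def apply_single_mode_guard(mode: str, participants: List[str], target: str,
                            assigned_expert: str) -> Tuple[List[str], str]:
    """Force exactly one participant after all other routing logic has run."""
    if mode != EXECUTION_MODE_SINGLE:
        return participants, target
    # Assumption: a "finish" target is terminal and is not redirected.
    if target not in (assigned_expert, "finish"):
        target = assigned_expert
    return [assigned_expert], target
```

Running the guard last means any earlier branch that appended extra agents is corrected regardless of how it got there.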

中文

按优先级排序的更新内容

  1. Single 模式 Agent 泄漏修复(严重)
  • 修复 _manager_apply_task_policy():当 executor_mode_flag=True 时,target 不在 participants 的分支(L16241-16248)会 append 额外 Agent,覆盖了 L16226-16227 设置的 participants = [assigned_expert] 约束。
  • 新增硬约束后置守卫:在所有 participant/target 解析完成后,若 mode == EXECUTION_MODE_SINGLE,强制重置 participants = [assigned_expert],并将非 expert 的 target 重定向回 assigned_expert。
  • 效果:Single 模式任务无论 executor_mode_flag 或 LLM 路由决策如何,都保证只使用一个 Agent。
  2. 结论性回复终止信号修复(严重)
  • 根因:当 Agent(如 developer)回复"任务完成"等结论性文本时,Manager 忽略该信号,原因是:(a) _manager_fallback_route() 中的结论检测仅在 fallback 路径触发,LLM 工具路由路径完全绕过;(b) _manager_apply_task_policy() 没有任何结论检测逻辑;(c) 文本形式的完成信号不会设置 blackboard approval.approved
  • 修复 1 — Policy 层拦截:在 _manager_apply_task_policy() 的 can_finish_from_approval 检查之前,新增结论性回复检测。当任意 Agent 最新文本为结论性回复、无待办事项、无错误日志时,强制 target 为 finish。
  • 修复 2 — Sync 循环拦截:在 _multi_agent_sync_blackboard_worker() 每个 Agent turn 完成后,新增结论性回复检测。满足条件时立即 break 并自动 approve。
  • 修复 3 — Fallback 通用 endpoint 检测:将 _detect_endpoint_intent 从仅限 simple_qa 扩展到所有任务类型,使 research/engineering/general 类型的 developer 结论性回复也能触发 fallback finish。
  • 效果:四层防御(fallback → 通用 endpoint → policy → sync 循环)确保结论性回复不会被忽略。

2026-03-16 修复总结

  • 消除了 executor_mode_flag 覆盖 participant 约束导致的 Single 模式多 Agent 泄漏。
  • 消除了 Agent 给出结论性回复后 Manager 仍反复委派的死循环,通过在四个独立检查点添加终止检测。
  • 两项修复均包含安全守卫:存在错误日志或待办事项时,结论检测不会触发 finish(避免误杀)。

日本語

優先度順の更新内容

  1. Single モード Agent リーク修正(重大)
  • _manager_apply_task_policy() を修正:executor_mode_flag=True の場合、target が participants に含まれない分岐(L16241-16248)が追加 Agent を append し、L16226-16227 で設定した participants = [assigned_expert] 制約を上書きしていた。
  • ハードガードを追加:全 participant/target 解決後、mode == EXECUTION_MODE_SINGLE であれば participants = [assigned_expert] を強制リセットし、expert 以外の target を assigned_expert にリダイレクト。
  • 効果:executor_mode_flag や LLM ルーティング結果に関わらず、Single モードタスクは必ず 1 Agent のみで実行。
  2. 結論的応答の終了シグナル修正(重大)
  • 根本原因:Agent(例: developer)が「タスク完了」等の結論的応答を返しても Manager が無視していた。理由:(a) _manager_fallback_route() の結論検出は fallback パスでのみ実行され、ツール解析ルーティングパスでは完全にバイパス;(b) _manager_apply_task_policy() に結論検出ロジックが皆無;(c) テキストベースの完了シグナルでは blackboard approval.approved が設定されない。
  • 修正 1 — Policy 層インターセプト:_manager_apply_task_policy() の can_finish_from_approval チェック前に結論的応答検出を追加。Agent の最新テキストが結論的で、未完了タスクなし、エラーログなしの場合、target を finish に強制。
  • 修正 2 — Sync ループインターセプト:_multi_agent_sync_blackboard_worker() で各 Agent ターン完了後に結論的応答検出を追加。条件を満たせば即座に break し自動承認。
  • 修正 3 — Fallback 汎用 endpoint 検出:_detect_endpoint_intent を simple_qa 限定から全タスクタイプに拡張し、research/engineering/general タイプの developer 結論的応答でも fallback finish をトリガー。
  • 効果:4 層防御(fallback → 汎用 endpoint → policy → sync ループ)により結論的応答の見落としを排除。

2026-03-16 修正サマリー

  • executor_mode_flag が participant 制約を上書きすることで発生していた Single モードの複数 Agent リークを解消。
  • Agent が結論的応答を返した後も Manager が繰り返し委任する無限ループを、4 つの独立チェックポイントでの終了検出により解消。
  • 両修正にはセーフガードを含む:エラーログまたは未完了タスクが存在する場合、結論検出は finish をトリガーしない(誤終了防止)。