
Gemma 4 #566

Merged
arkavo-com merged 78 commits into main from feature/gemma4-support
Apr 14, 2026

Conversation

@arkavo-com
Contributor

Summary

  • Update vendor/llama.cpp to c08d28d08 with all Gemma 4 PRs (#21309, #21326, #21343, #21390, #21406, #21418)
  • Add ModelFormat::Gemma4 with detection, stop sequences, chat templates across all crates
  • Add ModelChoice variants for Gemma-4-E2B, E4B, and 26B-A4B with full registry wiring (repo IDs, GGUF filenames, size estimates, escalation paths, detail levels)
  • Add arkavo_chat_parse FFI exposing llama.cpp's native PEG output parser — provider tries native parser before our fallback chain
  • Version bump 0.69.1 → 0.70.0
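The format-detection and stop-sequence wiring described above can be sketched roughly as follows. `ModelFormat` and the `Gemma4` variant are from this PR; the `detect`/`stop_sequences` helper names, the architecture strings, and the exact stop tokens are illustrative assumptions, not the crates' actual API.

```rust
// Sketch of format detection keyed off a GGUF architecture string.
// Helper names and token strings below are assumed for illustration.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModelFormat {
    ChatMl,
    Gemma4,
    Unknown,
}

impl ModelFormat {
    /// Detect the format from a model's architecture metadata value.
    fn detect(architecture: &str) -> Self {
        match architecture {
            "gemma4" => ModelFormat::Gemma4,
            "qwen3" | "qwen3moe" => ModelFormat::ChatMl,
            _ => ModelFormat::Unknown,
        }
    }

    /// Stop sequences the provider treats as end-of-turn.
    fn stop_sequences(self) -> &'static [&'static str] {
        match self {
            // Gemma-family turn delimiter; exact string is illustrative.
            ModelFormat::Gemma4 => &["<end_of_turn>"],
            ModelFormat::ChatMl => &["<|im_end|>"],
            ModelFormat::Unknown => &[],
        }
    }
}

fn main() {
    let fmt = ModelFormat::detect("gemma4");
    assert_eq!(fmt, ModelFormat::Gemma4);
    assert_eq!(fmt.stop_sequences(), ["<end_of_turn>"]);
    println!("ok");
}
```

Keeping detection, stop sequences, and chat templates behind one enum is what lets the registry wiring for E2B, E4B, and 26B-A4B stay declarative.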

Tool Bench Results (Q4_K_M, Apple Silicon)

| Model | Active Params | Parse | Avg Latency |
|---|---|---|---|
| Qwen3.5-0.8B | 0.8B | 8/8 | 525ms |
| Ministral-3-3B | 3B | 8/8 | 620ms |
| Gemma-4-E2B | 2.3B (5.1B w/ PLE) | 8/8 | 2,229ms |
| Gemma-4-26B-A4B | 4B (26B MoE) | 8/8 | 7,410ms |
| Gemma-4-E4B | 4.5B (8B w/ PLE) | 1/8 | blocked |

E4B requires non-lazy grammar sampler integration (generation_prompt prefill), which our standalone sampler doesn't support yet; it is commented out of bench discovery for now.

Test plan

  • cargo build -q compiles cleanly
  • cargo clippy -- -D warnings passes on changed crates
  • cargo test -p arkavo-llama-cpp — 18 tests pass (including new Gemma 4 format detection)
  • cargo test -p arkavo-llm --lib — 209 tests pass
  • cargo test -p arkavo-torg --lib — 14 tests pass
  • arkavo tool-bench --model gemma-4-e2b — 8/8
  • arkavo tool-bench --model gemma-4-26b-a4b — 8/8

🤖 Generated with Claude Code

Update vendor/llama.cpp to c08d28d08 (post-April 4) picking up all Gemma 4
PRs: core model support (#21309), template fixes (#21326), tokenizer fix
(#21343), logit softcapping (#21390), newline split (#21406), and dedicated
tool-call parser (#21418).

Add ModelFormat::Gemma4 with detection, stop sequences, and chat templates.
Add ModelChoice variants for E2B, E4B, and 26B-A4B with full registry wiring.
Add arkavo_chat_parse FFI exposing llama.cpp's native PEG output parser for
Gemma 4 tool calls. Provider tries native parser before fallback chain.
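The native-parser-first flow can be sketched like this. `arkavo_chat_parse` is the FFI added in this PR, but the safe wrapper, the `ToolCall` shape, and the fallback format shown here are assumptions for illustration only, not the provider's actual code.

```rust
// Sketch: try llama.cpp's native chat-output parser first, then fall
// back to the existing parser chain. Names below are illustrative.

#[derive(Debug, PartialEq)]
struct ToolCall {
    name: String,
    arguments: String,
}

/// Hypothetical safe wrapper over the `arkavo_chat_parse` FFI.
/// Real code would call llama.cpp's PEG parser and return None
/// when the model output doesn't match the registered grammar.
fn native_parse(raw: &str) -> Option<Vec<ToolCall>> {
    let _ = raw;
    None // stand-in: pretend the native parser declined
}

/// Illustrative fallback parser from the existing chain.
fn fallback_parse(raw: &str) -> Vec<ToolCall> {
    raw.lines()
        .filter_map(|line| line.strip_prefix("call:"))
        .map(|rest| {
            let (name, args) = rest.split_once(' ').unwrap_or((rest, "{}"));
            ToolCall {
                name: name.to_string(),
                arguments: args.to_string(),
            }
        })
        .collect()
}

/// Provider entry point: native first, fallback chain second.
fn parse_tool_calls(raw: &str) -> Vec<ToolCall> {
    native_parse(raw).unwrap_or_else(|| fallback_parse(raw))
}

fn main() {
    let calls = parse_tool_calls("call:get_weather {\"city\":\"Oslo\"}");
    assert_eq!(calls.len(), 1);
    assert_eq!(calls[0].name, "get_weather");
    println!("ok");
}
```

Trying the native parser first means formats llama.cpp already understands (like Gemma 4's) need no per-format parsing code on our side; the fallback chain only handles what the native parser declines.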

Tool bench results (Q4_K_M, Apple Silicon):
- Gemma-4-E2B (2.3B active): 8/8, 2,229ms
- Gemma-4-26B-A4B (4B active MoE): 8/8, 7,410ms
- Gemma-4-E4B (4.5B active): 1/8 — blocked on non-lazy grammar sampler

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

github-actions bot commented Apr 5, 2026

Spec Coverage Delta

| Metric | main | PR | Delta |
|---|---|---|---|
| Coverage | 20.6% | 20.7% | +0.2% |
| Scenarios | 632 | 632 | 0 |
| Covered | 130 | 131 | +1 |
| Partial | 14 | 14 | 0 |
| Missing | 464 | 463 | -1 |

Newly Covered

  • SRV-010: missing → covered

Spec Coverage Report

Spec Scenarios Covered Partial Missing Coverage
snpe 6 0 0 6 0%
wallet 7 0 0 7 0%
mcp-tools 10 0 0 10 0%
config-encryption 5 0 0 5 0%
git 6 0 0 6 0%
gemini 10 0 0 10 0%
sbe 6 0 0 6 0%
tdf-iroh 8 0 0 8 0%
autolearn 8 0 5 3 31%
titan 7 0 0 7 0%
budget 7 0 0 7 0%
events 8 1 0 7 12%
crypto 11 9 1 1 86%
mcp-mesh 8 0 0 8 0%
tdf-security 15 0 0 15 0%
kimi 10 0 0 10 0%
observability 8 0 0 8 0%
critic 17 0 0 17 0%
mcp-macos 9 0 0 9 0%
protocol 19 5 0 14 26%
ensemble 7 0 0 7 0%
qr-registration 5 3 1 1 70%
llama-cpp 10 4 2 4 50%
config-transport 4 0 0 4 0%
dataflow 6 0 0 6 0%
kv-cache 5 5 0 0 100%
repo 5 0 0 5 0%
browser 6 0 0 6 0%
security 6 4 2 0 83%
agent-sdk 5 4 0 1 80%
ui-generator 6 0 0 6 0%
ucp 8 0 0 8 0%
orchestrator 11 0 0 11 0%
terminal 8 0 0 8 0%
cef 7 0 0 7 0%
agent 5 1 0 4 20%
ui-core 4 0 0 4 0%
agui 12 2 0 10 16%
mcp-runtime 7 0 0 7 0%
device-identity 6 0 0 6 0%
code-search 5 0 0 5 0%
sequence-integrity 17 15 0 2 88%
llm-core 6 0 0 6 0%
evofabric 8 8 0 0 100%
memory 7 0 0 7 0%
router 17 0 0 17 0%
session-security 21 20 1 0 97%
config-bundle 4 0 0 4 0%
mcp-traits 6 2 0 4 33%
context 15 2 1 12 16%
torg-circuits 5 0 0 5 0%
workspace 6 0 0 6 0%
server 10 7 1 2 75%
task-orchestration 8 0 0 8 0%
debugger 6 0 0 6 0%
tdf 9 0 0 9 0%
gossip-protocol 8 7 0 1 87%
openclaw 6 0 0 6 0%
agent-auth 6 0 0 6 0%
qwen 9 0 0 9 0%
network-security 17 8 0 9 47%
chat-session 24 0 0 24 0%
validation 7 7 0 0 100%
registration 12 12 0 0 100%
cli 17 0 0 17 0%
trusted-agent 7 0 0 7 0%
sat 6 0 0 6 0%
torg 8 0 0 8 0%
attestation 6 0 0 6 0%
deepseek 5 0 0 5 0%
hrm 6 5 0 1 83%
authorization 6 0 0 6 0%
github 5 0 0 5 0%
mcp-claude 9 0 0 9 0%
Total 632 131 14 487 20.7%

Quality Gate

  • ✅ Overall coverage: 20.7% (threshold: 1%)

WIP Scenarios (24) — tracked via issues

ID Spec Scenario Issue
SEQ-001 sequence-integrity.spec Tag data with provenance at ingestion #549
SEQ-002 sequence-integrity.spec Propagate taint through data transformations #549
SEQ-003 sequence-integrity.spec Block tainted data exfiltration at egress #548
SEQ-004 sequence-integrity.spec Build directed action graph per session #556
SEQ-005 sequence-integrity.spec Learn behavioral baselines per agent skill #552
SEQ-006 sequence-integrity.spec Detect sequence divergence from baseline #553
SEQ-007 sequence-integrity.spec Persist sequence fragments to cross-session ledger #554
SEQ-008 sequence-integrity.spec Detect multi-session decomposition attacks #554
SEQ-009 sequence-integrity.spec Correlate decomposition across agent identities #554
SEQ-010 sequence-integrity.spec Evaluate sequence-aware TØR-G circuit before action #555
SEQ-011 sequence-integrity.spec Synchronous gate on high-consequence actions #555
SEQ-012 sequence-integrity.spec Async detection for low-consequence action coverage #553
SEQ-013 sequence-integrity.spec Titan integration for statistical sequence drift #553
SEQ-014 sequence-integrity.spec Egress filter enhanced with provenance #548
SEQ-015 sequence-integrity.spec Sequence evidence in audit events #550
SEQ-016 sequence-integrity.spec Configure sequence integrity per agent role #551
SEQ-017 sequence-integrity.spec Handle sequence tracking errors gracefully #551
TA-001 trusted-agent.spec Happy path first boot trust establishment #558
TA-002 trusted-agent.spec Attestation rejection blocks mesh access #558
TA-003 trusted-agent.spec Re-attestation refresh for trusted agent #558
TA-004 trusted-agent.spec Budget exhaustion suspends trusted agent #558
TA-005 trusted-agent.spec Hard revocation of agent #558
TA-006 trusted-agent.spec Config transport failure with retry #558
TA-007 trusted-agent.spec Offline agent has no mesh trust #558
Uncovered Scenarios (478)

ID Spec Criticality Scenario
SNPE-001 snpe.spec critical Initialize SNPE runtime dynamically
SNPE-002 snpe.spec critical Load DLC model for inference
SNPE-003 snpe.spec critical Execute inference on target
SNPE-004 snpe.spec high Convert tensors for SNPE format
SNPE-005 snpe.spec medium Detect available acceleration targets
SNPE-006 snpe.spec high Handle SNPE errors gracefully
WAL-001 wallet.spec critical Generate BIP39 mnemonic
WAL-002 wallet.spec critical Create HD wallet from mnemonic
WAL-003 wallet.spec critical Derive Ethereum keypair
WAL-004 wallet.spec critical Build and sign transaction
WAL-005 wallet.spec high Recover signer from transaction
WAL-006 wallet.spec medium Generate EIP-55 checksummed address
WAL-007 wallet.spec low Export to DID key format
MCP-001 mcp-tools.spec high Register built-in tools in registry
MCP-002 mcp-tools.spec high Discover tools with detail level
MCP-003 mcp-tools.spec critical Execute tool with parameters
MCP-004 mcp-tools.spec high Filesystem tool operations
MCP-005 mcp-tools.spec medium Git tool operations
MCP-006 mcp-tools.spec medium GitHub API tool operations
MCP-007 mcp-tools.spec medium Code analysis with Semgrep
MCP-008 mcp-tools.spec high Shell execution with safety
MCP-009 mcp-tools.spec low Web search tool
MCP-010 mcp-tools.spec high TDF encryption tool
CFGE-001 config-encryption.spec high Create encryptor with KAS URL
CFGE-002 config-encryption.spec critical Encrypt configuration bundle
CFGE-003 config-encryption.spec critical Decrypt encrypted bundle
CFGE-004 config-encryption.spec high Create policy with attributes
CFGE-005 config-encryption.spec medium Generate ephemeral keypair
GIT-001 git.spec medium Initialize new repository
GIT-002 git.spec high Get repository status
GIT-003 git.spec high Create commit with AI-generated message
GIT-004 git.spec medium Safely undo last commit
GIT-005 git.spec medium Sync with upstream remote
GIT-006 git.spec medium Create and checkout branch
GEM-001 gemini.spec critical Initialize REST client with API key
GEM-002 gemini.spec high Execute tool-based conversation
GEM-003 gemini.spec high Stream response via SSE
GEM-004 gemini.spec critical Establish live session connection
GEM-005 gemini.spec medium Send audio content in live session
GEM-006 gemini.spec high Handle server tool calls in live session
GEM-007 gemini.spec high Register and dispatch tools
GEM-008 gemini.spec high Handle Gemini API errors
GEM-009 gemini.spec medium Configure generation parameters
GEM-010 gemini.spec high Parse streaming response chunks
SBE-001 sbe.spec critical Create hierarchical graph with layers
SBE-002 sbe.spec high Register nodes to specific layers
SBE-003 sbe.spec critical Apply adaptive patchlet
SBE-004 sbe.spec high Rollback adaptive changes
SBE-005 sbe.spec critical Evaluate hierarchical graph
SBE-006 sbe.spec high Define invariant contract
IROH-001 tdf-iroh.spec critical Create Iroh transport
IROH-002 tdf-iroh.spec critical Stage blob data
IROH-003 tdf-iroh.spec critical Fetch blob via ticket
IROH-004 tdf-iroh.spec high Serialize and deserialize ticket
IROH-005 tdf-iroh.spec high Manage Iroh node lifecycle
IROH-006 tdf-iroh.spec medium Configure node parameters
IROH-007 tdf-iroh.spec high Handle transport errors
IROH-008 tdf-iroh.spec high Integrate with TDF encryptor
AUTO-006 autolearn.spec medium Burst feedback for rapid learning
AUTO-007 autolearn.spec low Agent contribution tracking
AUTO-008 autolearn.spec high Patchlet rollback on degradation
TITAN-001 titan.spec critical Create Titan monitor
TITAN-002 titan.spec critical Evaluate with anomaly detection
TITAN-003 titan.spec high Detect hard failures
TITAN-004 titan.spec high Detect boundary violations
TITAN-005 titan.spec medium Detect statistical drift
TITAN-006 titan.spec high Receive anomaly evidence
TITAN-007 titan.spec medium Update EMA accumulator
BUDGET-001 budget.spec critical Track token cost for LLM call
BUDGET-002 budget.spec critical Enforce budget limit before call
BUDGET-003 budget.spec high Model selection based on cost policy
BUDGET-004 budget.spec high Alert when threshold exceeded
BUDGET-005 budget.spec medium Provider cost configuration
BUDGET-006 budget.spec medium Budget status with projections
BUDGET-007 budget.spec low Architect savings report
EVENT-001 events.spec high Create event with payload
EVENT-002 events.spec medium Event types match payloads
EVENT-003 events.spec medium Parent-child event relationships
EVENT-004 events.spec high Correlation across services
EVENT-006 events.spec high Payload serialization
EVENT-007 events.spec medium Session lifecycle events
EVENT-008 events.spec high Tool call and result events
CRYPTO-011 crypto.spec high ECDH key agreement for KAS operations
MESH-001 mcp-mesh.spec high Create mesh tools state
MESH-002 mcp-mesh.spec high Register mesh tools
MESH-003 mcp-mesh.spec high List discovered agents
MESH-004 mcp-mesh.spec high Query agents by capability
MESH-005 mcp-mesh.spec critical Delegate task to agent
MESH-006 mcp-mesh.spec high Get delegated task status
MESH-007 mcp-mesh.spec medium Cache discovered agent addresses
MESH-008 mcp-mesh.spec high Handle mesh tool errors
TDFS-001 tdf-security.spec critical Control plane commands encrypted with TDF
TDFS-002 tdf-security.spec critical Configuration bundle TDF encryption
TDFS-003 tdf-security.spec high TDF-JSON format for API compatibility
TDFS-004 tdf-security.spec high TDF-CBOR format for efficiency
TDFS-005 tdf-security.spec medium Format negotiation between agents
TDFS-006 tdf-security.spec critical Policy binding prevents policy tampering
TDFS-007 tdf-security.spec high Key escrow for data recovery
TDFS-008 tdf-security.spec critical Attribute authority verification
TDFS-009 tdf-security.spec high Time-based policy enforcement
TDFS-010 tdf-security.spec high Data residency enforcement
TDFS-011 tdf-security.spec critical Secure key hierarchy
TDFS-012 tdf-security.spec medium Offline policy evaluation
TDFS-013 tdf-security.spec critical Forward secrecy for key agreement
TDFS-014 tdf-security.spec high Policy update propagation
TDFS-015 tdf-security.spec medium TDF payload obfuscation
KIMI-001 kimi.spec critical Send native Kimi chat request
KIMI-002 kimi.spec critical Stream responses with native format
KIMI-003 kimi.spec medium Select model by context needs
KIMI-004 kimi.spec high Retry failed requests with backoff
KIMI-005 kimi.spec medium Handle partial message generation
KIMI-006 kimi.spec critical Use Kimi K2.5 series models
KIMI-007 kimi.spec medium Enable thinking mode on K2.5
KIMI-008 kimi.spec medium Disable thinking mode on K2.5
KIMI-009 kimi.spec medium Stream with reasoning content
KIMI-010 kimi.spec medium Select appropriate model variant
OBS-001 observability.spec high Initialize observability with config
OBS-002 observability.spec high Session metrics track active sessions
OBS-003 observability.spec medium Metrics collector aggregates globally
OBS-004 observability.spec high Health reporter checks components
OBS-005 observability.spec medium Task tracker monitors async operations
OBS-006 observability.spec low Agent detection identifies AI agents
OBS-007 observability.spec medium OTLP export when collector available
OBS-008 observability.spec medium Metrics snapshot captures current state
CRIT-001 critic.spec critical Create default verification pipeline
CRIT-002 critic.spec high Add custom check to pipeline
CRIT-003 critic.spec critical Run circuit check
CRIT-004 critic.spec high Run schema validation check
CRIT-005 critic.spec medium Run semantic coherence check
CRIT-006 critic.spec high Collect verification evidence
CRIT-007 critic.spec medium Judge response quality
CRIT-008 critic.spec medium Configure critic behavior
CRIT-009 critic.spec high Analyze response for code fence issues
CRIT-010 critic.spec high Detect output loops in model responses
CRIT-011 critic.spec high Record feedback as learning episode
CRIT-012 critic.spec medium Check for pattern-based prompt adjustment
CRIT-013 critic.spec medium Detect wrong expert routing for GLM models
CRIT-014 critic.spec medium Extract first answer from loopy response
CRIT-015 critic.spec high Record timeout feedback
CRIT-016 critic.spec medium Get model issue counts by category
CRIT-017 critic.spec low Detect model family from name
MCPM-001 mcp-macos.spec critical Initialize test harness
MCPM-002 mcp-macos.spec high Parse Gherkin feature file
MCPM-003 mcp-macos.spec critical Execute test scenario
MCPM-004 mcp-macos.spec high Manage execution state
MCPM-005 mcp-macos.spec high Launch iOS simulator
MCPM-006 mcp-macos.spec high Integrate with MCP protocol
MCPM-007 mcp-macos.spec medium Generate test report
MCPM-008 mcp-macos.spec medium Use AI for test assistance
MCPM-009 mcp-macos.spec medium Handle memory operations
PROTO-004 protocol.spec high Handle agent discovery via mDNS
PROTO-005 protocol.spec high Bridge A2A to MCP
PROTO-007 protocol.spec high Enforce rate limiting
PROTO-009 protocol.spec medium Collect RPC metrics
PROTO-010 protocol.spec high Handle protocol errors
PROTO-011 protocol.spec high Create peer manager with configuration
PROTO-012 protocol.spec high Connect to peer with HTTP transport
PROTO-013 protocol.spec high Connect to peer with WebSocket transport
PROTO-014 protocol.spec high Auto-upgrade transport for streaming methods
PROTO-015 protocol.spec high Broadcast message to all connected peers
PROTO-016 protocol.spec high Send request to specific peer
PROTO-017 protocol.spec medium Get connected peer information
PROTO-018 protocol.spec medium Check peer connection status
PROTO-019 protocol.spec medium Connect to multiple peers at once
ENS-001 ensemble.spec critical Create policy ensemble with production policy
ENS-002 ensemble.spec high Add candidate policy to ensemble
ENS-003 ensemble.spec critical Evaluate counterfactually on real input
ENS-004 ensemble.spec high Accumulate regret over attribution window
ENS-005 ensemble.spec critical Check for promotion candidates
ENS-006 ensemble.spec medium Generate candidate via LLM synthesis
ENS-007 ensemble.spec medium Compute weighted cost across multiple objectives
QREG-003 qr-registration.spec high Descriptor with entitlements
LLAMA-003 llama-cpp.spec critical Create inference context
LLAMA-005 llama-cpp.spec high Tokenize and detokenize round-trip
LLAMA-008 llama-cpp.spec medium Parameters fit check
LLAMA-009 llama-cpp.spec medium Musl target stub behavior
CFGT-001 config-transport.spec high Create transport client
CFGT-002 config-transport.spec critical Send encrypted bundle to agent
CFGT-003 config-transport.spec critical Receive bundle on transport server
CFGT-004 config-transport.spec medium Request config from agent
DATA-001 dataflow.spec high Create pipeline from blueprint
DATA-002 dataflow.spec high Create pipeline from natural language
DATA-003 dataflow.spec high Start pipeline execution
DATA-004 dataflow.spec high Stop pipeline gracefully
DATA-005 dataflow.spec medium Export blueprint to JSON
DATA-006 dataflow.spec medium Import blueprint from JSON
REPO-001 repo.spec high Get repository info from path
REPO-002 repo.spec medium Calculate file count recursively
REPO-003 repo.spec medium Extract git metadata
REPO-004 repo.spec low Detect primary language
REPO-005 repo.spec high Build repository context for agent
BROWS-001 browser.spec high Create browser tool instance
BROWS-002 browser.spec high Navigate to URL
BROWS-003 browser.spec high Inject script into page
BROWS-004 browser.spec medium Click element by selector
BROWS-005 browser.spec medium Extract page content
BROWS-006 browser.spec high Handle browser errors
CASDK-003 agent-sdk.spec high MCP tool registration via rmcp
UIG-001 ui-generator.spec high Initialize UI generator
UIG-002 ui-generator.spec critical Generate UI from intent
UIG-003 ui-generator.spec medium Build generation prompt
UIG-004 ui-generator.spec medium Render generated code
UIG-005 ui-generator.spec low Track generation metadata
UIG-006 ui-generator.spec medium Stream UI generation progress
UCP-001 ucp.spec critical Create payment intent
UCP-002 ucp.spec critical Evaluate payment against commerce policy
UCP-003 ucp.spec critical Execute budget payment (USD)
UCP-004 ucp.spec critical Execute EVM payment (ETH)
UCP-005 ucp.spec high Track payment status
UCP-006 ucp.spec high Complete payment lifecycle
UCP-007 ucp.spec medium Register MCP tools for payments
UCP-008 ucp.spec low Get payment statistics
ORCH-001 orchestrator.spec critical Initialize orchestrator
ORCH-002 orchestrator.spec critical Analyze GitHub issue
ORCH-003 orchestrator.spec critical Route issue to execution strategy
ORCH-004 orchestrator.spec critical Create execution plan with cognitive engine
ORCH-005 orchestrator.spec critical Execute plan with verification
ORCH-006 orchestrator.spec high Assign agents to tasks
ORCH-007 orchestrator.spec high Process code chunks
ORCH-008 orchestrator.spec high Solve code problems
ORCH-009 orchestrator.spec high Create collaborative task plan
ORCH-010 orchestrator.spec critical Handle GitHub webhook
ORCH-011 orchestrator.spec high Handle orchestrator errors
TERM-001 terminal.spec critical Run terminal UI application
TERM-002 terminal.spec high Handle application events
TERM-003 terminal.spec high Send LLM request
TERM-004 terminal.spec high Receive LLM response
TERM-005 terminal.spec medium Spawn multiple terminals
TERM-006 terminal.spec medium Render diff view
TERM-007 terminal.spec low Integrate Vim editor
TERM-008 terminal.spec low Integrate Helix editor
CEF-001 cef.spec critical Spawn CEF renderer process
CEF-002 cef.spec high Execute DOM command
CEF-003 cef.spec high Track command health
CEF-004 cef.spec medium Handle async DOM operations
CEF-005 cef.spec medium Receive DOM events
CEF-006 cef.spec critical Communicate via UDS transport
CEF-007 cef.spec high Handle CEF errors
AGENT-001 agent.spec critical Load and validate agent configuration
AGENT-002 agent.spec critical Register agent with control plane
AGENT-003 agent.spec high Discover peers via mDNS
AGENT-004 agent.spec medium Report device capabilities
UIC-001 ui-core.spec high Create UI content
UIC-002 ui-core.spec high Handle UI event
UIC-003 ui-core.spec medium Integrate with LLM for UI generation
UIC-004 ui-core.spec medium Adapt UI for different backends
AGUI-001 agui.spec critical Initialize AGUI gateway
AGUI-002 agui.spec high Discover agents via mDNS
AGUI-003 agui.spec high Establish agent connection
AGUI-004 agui.spec medium Collect command health data
AGUI-005 agui.spec medium Analyze timeout patterns
AGUI-006 agui.spec low Calculate ROI metrics
AGUI-007 agui.spec high Handle UI events
AGUI-008 agui.spec medium Stream dataflow updates
AGUI-011 agui.spec high Cache context topology per agent
AGUI-012 agui.spec high Push context utilization via telemetry stream
MCPR-001 mcp-runtime.spec critical Create MCP server
MCPR-002 mcp-runtime.spec critical Accept client connection
MCPR-003 mcp-runtime.spec critical Execute tool with timeout
MCPR-004 mcp-runtime.spec high Create stdio transport
MCPR-005 mcp-runtime.spec high Create SSE transport
MCPR-006 mcp-runtime.spec medium Poll endpoint with adaptive backoff
MCPR-007 mcp-runtime.spec high Handle runtime errors
DEVICE-001 device-identity.spec high Get or create device ID on first launch
DEVICE-002 device-identity.spec high Retrieve existing device ID
DEVICE-003 device-identity.spec medium Store device ID explicitly
DEVICE-004 device-identity.spec high Create agent identity with device
DEVICE-005 device-identity.spec medium Device ID roundtrip conversion
DEVICE-006 device-identity.spec high Platform-specific secure storage
CS-001 code-search.spec high Register code search tools
CS-002 code-search.spec high Search code with regex pattern
CS-003 code-search.spec high Perform structural refactoring with Comby
CS-004 code-search.spec high Parse code with TreeSitter
CS-005 code-search.spec medium Handle code search errors
LLM-001 llm-core.spec critical Send chat request to provider
LLM-002 llm-core.spec critical Stream chat responses
LLM-003 llm-core.spec high Parse tool calls from response
LLM-004 llm-core.spec high Execute tool with results
LLM-005 llm-core.spec medium Switch provider dynamically
LLM-006 llm-core.spec high Handle provider errors gracefully
MEM-001 memory.spec high Store memory with embedding
MEM-002 memory.spec high Search memories by semantic similarity
MEM-003 memory.spec high Context ledger appends conversation turns
MEM-004 memory.spec high Retrieve conversation context
MEM-005 memory.spec medium Plan state persistence across restarts
MEM-006 memory.spec medium Orchestrator state tracks issue processing
MEM-007 memory.spec medium Workspace config per organization
ROUTER-001 router.spec critical Route task to optimal model
ROUTER-002 router.spec critical Quality gate with retry and escalation
ROUTER-003 router.spec high Offline mode restricts to local models
ROUTER-004 router.spec critical Preflight moderation blocks policy violations
ROUTER-005 router.spec high Stream backpressure handling
ROUTER-006 router.spec medium Model discovery finds available models
ROUTER-007 router.spec high Connectivity checker marks providers unavailable
ROUTER-008 router.spec high Tool request parsing and routing
ROUTER-009 router.spec medium RLM decomposition for complex tasks
ROUTER-010 router.spec medium Architect planner for multi-step workflows
ROUTER-011 router.spec high Strip think blocks from response
ROUTER-012 router.spec high Strip tool blocks from response
ROUTER-013 router.spec high Sanitize response output
ROUTER-014 router.spec high Apply sampling parameters
ROUTER-015 router.spec medium Handle vision queries with image input
ROUTER-016 router.spec medium Estimate token count for context
ROUTER-017 router.spec medium Get model context size
CFG-001 config-bundle.spec high Create configuration bundle
CFG-002 config-bundle.spec critical Validate bundle before distribution
CFG-003 config-bundle.spec medium Define agent role with capabilities
CFG-004 config-bundle.spec high Grant resource entitlements
MCPD-001 mcp-traits.spec critical Implement Tool trait
MCPD-002 mcp-traits.spec critical JSON-RPC request/response cycle
MCPD-003 mcp-traits.spec high Tool schema validation
MCPD-006 mcp-traits.spec high McpClient remote tool discovery
CTX-001 context.spec high Create semantic chunker
CTX-002 context.spec critical Chunk document semantically
CTX-003 context.spec high Deduplicate chunks
CTX-004 context.spec high Compress context with summarization
CTX-005 context.spec medium Build compression pipeline
CTX-006 context.spec high Enrich prompt with context
CTX-008 context.spec high Calculate context offload threshold by model size
CTX-009 context.spec high Offload context to ledger when threshold exceeded
CTX-010 context.spec high Restore archived context from ledger
CTX-011 context.spec medium Generate summary for archived context
CTX-012 context.spec medium Skip offload for small contexts
CTX-013 context.spec medium Estimate tokens for content
TRC-001 torg-circuits.spec critical Compile circuit from graph and features
TRC-002 torg-circuits.spec critical Evaluate circuit with feature extraction
TRC-003 torg-circuits.spec high Implement CircuitFeature trait
TRC-004 torg-circuits.spec high Thread-safe concurrent evaluations
TRC-005 torg-circuits.spec medium Zero-allocation evaluation performance
WORKSPACE-001 workspace.spec high Create isolated workspace container
WORKSPACE-002 workspace.spec critical Execute command in workspace
WORKSPACE-003 workspace.spec high Clone repository into workspace
WORKSPACE-004 workspace.spec high Enforce resource quotas
WORKSPACE-005 workspace.spec medium Cleanup workspace after use
WORKSPACE-006 workspace.spec medium Detect container runtime
SRV-007 server.spec high Agent goal lifecycle
SRV-008 server.spec medium Well-known state endpoint
ORCH-001 task-orchestration.spec high Plan task with dependencies
ORCH-002 task-orchestration.spec critical Execute task with executor
ORCH-003 task-orchestration.spec high Store task state persistently
ORCH-004 task-orchestration.spec high Retry failed task with backoff
ORCH-005 task-orchestration.spec medium Human review for ambiguous tasks
ORCH-006 task-orchestration.spec medium Parallel subtask execution
ORCH-007 task-orchestration.spec medium Task cancellation
ORCH-008 task-orchestration.spec low Task progress tracking
DBG-001 debugger.spec high Start session recording
DBG-002 debugger.spec high Replay recorded session
DBG-003 debugger.spec medium Analyze error patterns
DBG-004 debugger.spec medium Generate health report
DBG-005 debugger.spec high Check system health
DBG-006 debugger.spec medium Handle debugger errors
TDF-001 tdf.spec critical Encrypt data with policy
TDF-002 tdf.spec critical Decrypt data with KAS rewrap
TDF-003 tdf.spec high Policy builder with ABAC attributes
TDF-004 tdf.spec critical ABAC evaluation for access decision
TDF-005 tdf.spec high Delegation token verification
TDF-006 tdf.spec high Streaming encryption for large files
TDF-007 tdf.spec medium Blob transport stages encrypted payload
TDF-008 tdf.spec high A2A KAS handler processes rewrap requests
TDF-009 tdf.spec medium OpenTDF integration for standard compliance
GOSSIP-003 gossip-protocol.spec high Deduplicate messages by content hash
OCLAW-001 openclaw.spec critical WebSocket handshake with device auth
OCLAW-002 openclaw.spec critical Frame translation OpenClaw to A2A
OCLAW-003 openclaw.spec critical Frame translation A2A to OpenClaw
OCLAW-004 openclaw.spec critical Device identity verification
OCLAW-005 openclaw.spec high Listener accepts connections
OCLAW-006 openclaw.spec high Dispatcher routes events
AAUTH-001 agent-auth.spec critical Request authentication token
AAUTH-002 agent-auth.spec critical Store token securely
AAUTH-003 agent-auth.spec high Load stored token
AAUTH-004 agent-auth.spec high Refresh expired token
AAUTH-005 agent-auth.spec critical Complete challenge-response auth
AAUTH-006 agent-auth.spec medium Delete token on logout
QWE-001 qwen.spec critical Initialize Qwen client with region
QWE-002 qwen.spec high Execute chat completion
QWE-003 qwen.spec high Stream chat completion responses
QWE-004 qwen.spec medium Process vision input with image
QWE-005 qwen.spec high Execute tool calls in conversation
QWE-006 qwen.spec medium Switch between Qwen models
QWE-007 qwen.spec high Handle Qwen API errors
QWE-008 qwen.spec low Create image URL from file path
QWE-009 qwen.spec medium Format messages for provider
NET-003 network-security.spec high Public binding requires explicit acknowledgment
NET-010 network-security.spec high Rate limiting per IP
NET-011 network-security.spec high Admin interface separate from public API
NET-012 network-security.spec high Security configuration audit on startup
NET-013 network-security.spec medium Service fingerprint minimization
NET-014 network-security.spec high DNS resolution validation before connection
NET-015 network-security.spec critical Prompt injection attack prevention
NET-016 network-security.spec critical Command injection via LLM output is prevented
NET-017 network-security.spec high Egress audit logging
CHAT-001 chat-session.spec high Create authenticated chat session
CHAT-002 chat-session.spec high Send message to active session
CHAT-003 chat-session.spec high Reject message to non-active session
CHAT-004 chat-session.spec high Stream LLM deltas with back-pressure
CHAT-005 chat-session.spec medium Process metrics acknowledgment
CHAT-006 chat-session.spec high Close session gracefully
CHAT-007 chat-session.spec medium TTL cleaner removes expired sessions
CHAT-008 chat-session.spec high Get delta stream for session
CHAT-009 chat-session.spec high Handle session with router and tools
CHAT-010 chat-session.spec medium Session enters zombie state on abnormal exit
CHAT-011 chat-session.spec high Handle router service unavailability
CHAT-012 chat-session.spec high Handle tool execution timeout
CHAT-013 chat-session.spec medium Reject malformed delta message
CHAT-014 chat-session.spec high Create conversation manager with storage
CHAT-015 chat-session.spec high Start conversation session with metadata
CHAT-016 chat-session.spec high Restore last conversation session with compatibility check
CHAT-017 chat-session.spec high Add message to conversation
CHAT-018 chat-session.spec high Get context messages with limits
CHAT-019 chat-session.spec medium Create conversation summary
CHAT-020 chat-session.spec medium List available conversation sessions
CHAT-021 chat-session.spec medium Switch to different conversation session
CHAT-022 chat-session.spec medium Get session statistics
CHAT-023 chat-session.spec low Clear current session
CHAT-024 chat-session.spec high Sanitize message content for small models
CLI-001 cli.spec high Initialize CLI with tracing
CLI-002 cli.spec high Execute first-run flow
CLI-003 cli.spec high Dispatch agent command
CLI-004 cli.spec high Dispatch chat command
CLI-005 cli.spec high Dispatch task command
CLI-006 cli.spec medium Handle missing command (default to agent)
CLI-007 cli.spec low Platform-specific test command
CLI-008 cli.spec high Handle /new command in REPL
CLI-009 cli.spec high Handle /clear command in REPL
CLI-010 cli.spec medium Handle /context command in REPL
CLI-011 cli.spec medium Handle /history command in REPL
CLI-012 cli.spec medium Handle /switch command in REPL
CLI-013 cli.spec medium Handle /read command in REPL
CLI-014 cli.spec medium Handle /list command in REPL
CLI-015 cli.spec high Handle chat with model parameters
CLI-016 cli.spec medium Handle chat with image for vision
CLI-017 cli.spec high Handle /exit or /quit command
SAT-001 sat.spec critical Extract CNF from TØRG policy graph
SAT-002 sat.spec high Find boundary probes for output
SAT-003 sat.spec high Stress test policy for holes
SAT-004 sat.spec medium Cache boundary probe results
SAT-005 sat.spec medium Schedule probe tasks with CPU budget
SAT-006 sat.spec medium Prioritize anomalies for probing
TORG-001 torg.spec high Create Qwen3 token mapping from vocabulary
TORG-002 torg.spec high Create Ministral token mapping
TORG-003 torg.spec critical Initialize TorgLlamaSampler
TORG-004 torg.spec critical Get logit bias for current decoder state
TORG-005 torg.spec critical Feed sampled token to advance state
TORG-006 torg.spec critical Finish sampling and extract graph
TORG-007 torg.spec critical Evaluate graph on inputs
TORG-008 torg.spec medium Format TØRG system prompt
ATT-001 attestation.spec high Detect platform code
ATT-002 attestation.spec critical Create platform evidence from identity
ATT-003 attestation.spec critical Collect evidence via platform attestor
ATT-004 attestation.spec high Get security state
ATT-005 attestation.spec medium Report attestation capabilities
ATT-006 attestation.spec high Validate evidence freshness
DEEP-001 deepseek.spec critical Send chat completion request
DEEP-002 deepseek.spec critical Stream chat responses via SSE
DEEP-003 deepseek.spec medium Use reasoning model for complex tasks
DEEP-004 deepseek.spec high Handle API errors with retry
DEEP-005 deepseek.spec medium Strict mode for JSON output
HRM-001 hrm.spec critical Create conductor with task store
AUTHZ-001 authorization.spec critical Get decision for single resource access
AUTHZ-002 authorization.spec high Check bulk resource permissions
AUTHZ-003 authorization.spec high Multi-resource authorization request
AUTHZ-004 authorization.spec medium Decision caching improves performance
AUTHZ-005 authorization.spec high Entity identification from JWT
AUTHZ-006 authorization.spec medium Resource specification with attributes
GITHUB-001 github.spec critical Authenticate as GitHub App
GITHUB-002 github.spec high Create issue with labels
GITHUB-003 github.spec medium Poll organization for new issues
GITHUB-004 github.spec high Handle issue with AI-generated response
GITHUB-005 github.spec high Merge pull request with checks
MCPC-001 mcp-claude.spec critical Check authentication availability
MCPC-002 mcp-claude.spec critical Initialize Claude Code capability
MCPC-003 mcp-claude.spec high Register Claude Code tools
MCPC-004 mcp-claude.spec high Configure tool permissions
MCPC-005 mcp-claude.spec high Bridge SDK to MCP protocol
MCPC-006 mcp-claude.spec medium Map events to Claude format
MCPC-007 mcp-claude.spec critical Enforce policy on requests
MCPC-008 mcp-claude.spec high Handle Claude SDK errors
MCPC-009 mcp-claude.spec medium Load configuration from file

arkavo-com and others added 28 commits April 5, 2026 11:52
Addresses three critical bugs from RimWorld Gemma 4 testing:
A2A messages bypassing MCP tool pipeline, conversation context
resetting every cycle, and fragmented event processing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
send_advisory_task confirmed self-contained (mesh_state + protocol only).
BPE merge table sourced from Llama 3.1 tokenizer.json (Apache 2.0).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
12 tasks across 3 phases. Phase 1: standalone modules (agent_event,
token_estimator, conversation_window). Phase 2: event loop integration
with overnight test gate. Phase 3: ToolMemory cleanup.

Updated spec: LlamaTokenEstimator wraps loaded model's tokenizer
instead of vendoring BPE merge table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dingMessage

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sationWindow

Add two new Router methods for Phase 2 agent loop context budgeting:
- min_feasible_context_size(): iterates loaded models via model_registry.model_names(),
  returns minimum trained context size (default 4096 when no models loaded)
- any_loaded_model(): returns Arc<LlamaModel> from first loaded model for token estimation

Also re-exports LlamaModel from arkavo-llm so router can reference it without
a direct arkavo-llama-cpp dependency. Both methods are feature-gated behind
llama-cpp with a fallback returning 4096 for non-llama builds.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
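The minimum-context logic above can be sketched in a few lines. This is a minimal stand-in, assuming a plain map of model name to trained context size; the real code iterates `model_registry.model_names()` over loaded `LlamaModel` handles behind the `llama-cpp` feature gate.

```rust
use std::collections::HashMap;

// Fallback used when no models are loaded (matches the commit message).
const DEFAULT_CONTEXT: usize = 4096;

// Sketch of min_feasible_context_size(): minimum trained context size
// across loaded models, defaulting to 4096 with an empty registry.
fn min_feasible_context_size(loaded: &HashMap<String, usize>) -> usize {
    loaded.values().copied().min().unwrap_or(DEFAULT_CONTEXT)
}

fn main() {
    let mut loaded = HashMap::new();
    assert_eq!(min_feasible_context_size(&loaded), 4096);
    loaded.insert("gemma-4-e2b".to_string(), 32768);
    loaded.insert("qwen3.5-0.8b".to_string(), 8192);
    assert_eq!(min_feasible_context_size(&loaded), 8192);
    println!("ok");
}
```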
… history

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gemma/Llama chat templates require alternating user/assistant roles.
When error cycles push user messages without assistant responses,
consecutive user messages break the template. build_messages() now
merges consecutive same-role messages before returning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
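The merge described above can be sketched as follows. The `Message` struct here is a simplified stand-in for the crate's own type; the point is the fold: consecutive same-role messages collapse into one so strict alternating-role templates (Gemma/Llama) stay valid.

```rust
// Simplified message type for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: &'static str,
    content: String,
}

// Sketch of the build_messages() merge: fold consecutive same-role
// messages together before handing the list to the chat template.
fn merge_consecutive(messages: Vec<Message>) -> Vec<Message> {
    let mut out: Vec<Message> = Vec::new();
    for m in messages {
        match out.last_mut() {
            Some(prev) if prev.role == m.role => {
                prev.content.push('\n');
                prev.content.push_str(&m.content);
            }
            _ => out.push(m),
        }
    }
    out
}

fn main() {
    // Error cycle pushed two user messages with no assistant turn between.
    let msgs = vec![
        Message { role: "user", content: "do X".into() },
        Message { role: "user", content: "error: Y".into() },
        Message { role: "assistant", content: "ok".into() },
    ];
    let merged = merge_consecutive(msgs);
    assert_eq!(merged.len(), 2);
    assert_eq!(merged[0].content, "do X\nerror: Y");
    println!("merged to {} messages", merged.len());
}
```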
Narrow pub to pub(super) for ConversationWindow and MockEstimator.
Remove ToolMemory.pending_instructions (never written to or read).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes found during playtest:
- Conductor: augment last user message with learning guidance when
  existing_messages is provided (was computing but discarding it)
- Agent loop: pass purpose as system_prompt for classification_content
  hint (was None, losing domain context for the classifier)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The classifier's task.len() > 300 rule falsely classified every
orchestrator cycle as complex (cycle prompts include ToolMemory
output), causing 7+ minute startup from unnecessary task decomposition.

Replace with a 0.8B model call via route_chat (chat_semaphore, won't
block main inference). The model classifies SINGLE vs MULTI in ~50ms.
Falls back to the heuristic classifier on timeout or model error.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clarify that sequential workflows (register→observe→act) are SINGLE.
Default to SINGLE on error/timeout — false negatives are cheap,
false positives cost 80+ seconds of decomposition overhead.
Bump logging to INFO for production visibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
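The SINGLE/MULTI decision with its cheap-failure default can be sketched like this. The model call is stubbed as a closure; `route_chat`, the timeout, and the `chat_semaphore` are not modelled. The key behavior is that any error or timeout defaults to SINGLE, since a false negative costs nothing while a false positive costs decomposition overhead.

```rust
#[derive(Debug, PartialEq)]
enum Complexity {
    Single,
    Multi,
}

// Sketch of the classifier: ask a small model for SINGLE vs MULTI and
// default to Single on any failure (the cheap failure mode).
fn classify(
    task: &str,
    model: impl Fn(&str) -> Result<String, String>,
) -> Complexity {
    match model(task) {
        Ok(ans) if ans.trim().eq_ignore_ascii_case("MULTI") => Complexity::Multi,
        Ok(_) => Complexity::Single,
        // Timeout or model error: false negatives are cheap.
        Err(_) => Complexity::Single,
    }
}

fn main() {
    assert_eq!(
        classify("register then observe then act", |_| Ok("SINGLE".into())),
        Complexity::Single
    );
    assert_eq!(
        classify("plan a base expansion", |_| Ok("MULTI".into())),
        Complexity::Multi
    );
    assert_eq!(
        classify("anything", |_| Err("timeout".into())),
        Complexity::Single
    );
    println!("ok");
}
```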
The RPC handler was constructed before start_orchestrator_loop set
the event sender, so it cloned None and never received A2A messages
through the event channel. A2A messages kept spawning separate
conductor calls, racing the orchestrator for the GPU.

Now A2aRpcImpl holds Arc<Mutex<Option<Sender>>> (shared reference)
instead of Option<Sender> (snapshot). The handler locks the mutex
at call time to get the live sender.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
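The snapshot-versus-shared-slot distinction is worth a sketch. The handler below holds `Arc<Mutex<Option<Sender>>>` and locks at call time, so a sender installed after construction is visible; `A2aRpcImpl` and the event types are simplified placeholders here.

```rust
use std::sync::{mpsc, Arc, Mutex};

// Stand-in for A2aRpcImpl: holds a shared slot, not a startup snapshot.
struct RpcHandler {
    sender: Arc<Mutex<Option<mpsc::Sender<String>>>>,
}

impl RpcHandler {
    // Lock at call time to read the *live* sender. A plain Option<Sender>
    // field cloned at construction would stay None forever.
    fn forward(&self, msg: &str) -> bool {
        match self.sender.lock().unwrap().as_ref() {
            Some(tx) => tx.send(msg.to_string()).is_ok(),
            None => false,
        }
    }
}

fn main() {
    let slot = Arc::new(Mutex::new(None));
    let handler = RpcHandler { sender: Arc::clone(&slot) };

    // Before the orchestrator loop starts: no sender installed yet.
    assert!(!handler.forward("early"));

    // start_orchestrator_loop later installs the sender into the shared slot.
    let (tx, rx) = mpsc::channel();
    *slot.lock().unwrap() = Some(tx);

    assert!(handler.forward("late"));
    assert_eq!(rx.recv().unwrap(), "late");
    println!("ok");
}
```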
Both stdio and HTTP MCP clients only checked JSON-RPC level errors
(response.error) and ignored the MCP-spec isError field in
CallToolResult. Tool errors like "Serialization error: missing field
AgentType" were returned as Ok({"result": "error text"}), making
the executor and ToolMemory treat them as successful calls.

Now returns Err when isError is true, so the executor records it
as a failure and the model gets error feedback for retry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
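The isError handling reduces to a small conversion. `CallToolResult` here is a simplified stand-in for the MCP-spec shape (the real clients parse JSON-RPC responses); the point is that a true `isError` becomes `Err` so the executor records a failure and the model gets the error text as retry feedback.

```rust
// Simplified stand-in for the MCP CallToolResult shape.
struct CallToolResult {
    is_error: bool,
    content: String,
}

// Sketch: map the MCP-level isError flag onto Result so tool failures
// are no longer reported as successful calls.
fn into_tool_outcome(res: CallToolResult) -> Result<String, String> {
    if res.is_error {
        Err(res.content)
    } else {
        Ok(res.content)
    }
}

fn main() {
    let ok = into_tool_outcome(CallToolResult {
        is_error: false,
        content: "done".into(),
    });
    assert_eq!(ok, Ok("done".into()));

    let err = into_tool_outcome(CallToolResult {
        is_error: true,
        content: "Serialization error: missing field AgentType".into(),
    });
    assert!(err.is_err());
    println!("ok");
}
```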
Gemma-4-E2B (2.3B active, 8/8 tool accuracy) replaces Ministral-3B
as the preferred fast model for judge, synthesis, and classification
tasks. Falls back to Ministral-3B if Gemma 4 is not cached.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gemma-4-E2B, E4B, and 26B-A4B were missing from feasible_models(),
so Thompson Sampling never considered them as candidates. The models
loaded but only the hardcoded Ministral/Qwen variants were eligible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gemma-4-E4B produces non-lazy GBNF grammar that our standalone
sampler can't handle, resulting in 1/8 tool calling accuracy.
Thompson Sampling wastes 30+ seconds on validation retries before
learning to avoid it. Exclude until PEG output parser is ready.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace format_for_prompt() with format_control_signals() -> Option<String>.
ToolMemory now emits only derived signals the model can't see in raw
conversation history: setup state, duplicate warnings, action variety,
and error pattern escalation. Silent (None) when everything is fine.

ConversationWindow carries the raw history. Control signals go through
system_suffix — separate token budget, not in the cycle prompt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
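The `Option<String>` signal contract can be sketched with one rule. The real `ToolMemory` tracks setup state, action variety, and error escalation too; this sketch keeps only a duplicate-call warning to show the shape: emit derived signals when something needs flagging, stay silent (`None`) when everything is fine.

```rust
// Sketch of format_control_signals(): derived signals only, None when
// there is nothing the model can't already see in raw history.
fn format_control_signals(recent_calls: &[&str]) -> Option<String> {
    let mut signals = Vec::new();
    if recent_calls.len() >= 2 && recent_calls.windows(2).any(|w| w[0] == w[1]) {
        signals.push("warning: duplicate consecutive tool call".to_string());
    }
    if signals.is_empty() {
        None
    } else {
        Some(signals.join("\n"))
    }
}

fn main() {
    // Healthy cycle: silent, nothing added to the system_suffix budget.
    assert_eq!(format_control_signals(&["observe", "step"]), None);
    // Duplicate call detected: one derived signal emitted.
    assert!(format_control_signals(&["observe", "observe"]).is_some());
    println!("ok");
}
```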
- Case-insensitive model name matching in from_name()
- Strip "call:" prefix from Gemma 4 curly-brace tool call format
- Planner waits for executor/judge feedback before next round

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
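The prefix strip is a one-liner worth pinning down. The exact Gemma 4 curly-brace format is assumed from the commit message (`call:` immediately before a JSON object); dropping the prefix leaves a string that parses as a plain JSON tool call.

```rust
// Sketch: strip a leading "call:" so the remainder is plain JSON.
fn strip_call_prefix(raw: &str) -> &str {
    let t = raw.trim_start();
    t.strip_prefix("call:").map(str::trim_start).unwrap_or(t)
}

fn main() {
    assert_eq!(
        strip_call_prefix("call:{\"name\":\"observe\"}"),
        "{\"name\":\"observe\"}"
    );
    // Already-clean tool calls pass through untouched.
    assert_eq!(
        strip_call_prefix("{\"name\":\"observe\"}"),
        "{\"name\":\"observe\"}"
    );
    println!("ok");
}
```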
Specialists commented out in launch script until they have proactive
mesh tool usage. Commander AGENTS.md updated for Gemma 4.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On single GPU, the judge's route_fast() call contended with the
planner for Metal compute, adding 3-8s per tool result to the
feedback loop. Replace with condense_tool_result() which extracts
Delta sections and truncates — zero GPU time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Orchestrator cycle prompts and notification events are always single
tasks — the LLM complexity assessment (route_chat GPU call) was pure
overhead. New skip_complexity parameter bypasses it for these callers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Try structural condensation first (Delta extraction, free). If it
meaningfully reduces size (>50% reduction), use it. Otherwise fall
back to LLM distillation for unstructured text. Feedback budget
increased from 200 to 800 chars so the planner sees useful data.

Works for any MCP server output, not just game-rl JSON.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
arkavo-com and others added 6 commits April 6, 2026 19:43
The raw ModelRegistry path hung for 15+ min on 26B MoE due to Metal
shader compilation. The Router path matches what `arkavo chat` uses —
same model loading, context pool, and inference semaphore.

Verified: Ministral-3B completes 3 scenarios in 5.4s via Router path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Round 0 (planning) keeps full budget: thinking on, 16K tokens, full schema.
Round 1+ (execution) switches to execution profile: temp 0.1, thinking off,
max 200 tokens. Execution mode now respects model hints from AGENTS.md
instead of always falling back to the fastest local model.

This fixes the 26B MoE generating 13.5K tokens on round 1 when it only
needed ~200 for a tool call. Expected round 1 time: ~3-4s instead of 7min.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Registration responses carry 5KB+ ActionSpace/ObservationSpace schemas
that blow up context windows (14K tokens after Jinja expansion). The
condenser now replaces large arrays (>5 items) and objects (>5 fields,
>500 chars) with count summaries like "[30 items]" or "{6 fields}".

This fixes the GPU fault at position 16384 caused by context overflow
when the 26B model processes registerAgent results.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
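The count-summary rule is easy to sketch. The real condenser walks `serde_json::Value`; the tiny `Value` enum here avoids the dependency, and the >500-chars object condition is omitted for brevity. Thresholds (>5 items, >5 fields) are taken from the commit message.

```rust
// Tiny stand-in for serde_json::Value, just enough for the sketch.
enum Value {
    Array(Vec<Value>),
    Object(Vec<(String, Value)>),
    Str(String),
}

// Sketch: replace large arrays/objects with count summaries so schema
// blobs can't blow up the context window.
fn summarize(v: &Value) -> String {
    match v {
        Value::Array(items) if items.len() > 5 => format!("[{} items]", items.len()),
        Value::Object(fields) if fields.len() > 5 => format!("{{{} fields}}", fields.len()),
        Value::Array(items) => format!("[array of {}]", items.len()),
        Value::Object(fields) => format!("{{object of {}}}", fields.len()),
        Value::Str(s) => s.clone(),
    }
}

fn main() {
    let big = Value::Array((0..30).map(|i| Value::Str(i.to_string())).collect());
    assert_eq!(summarize(&big), "[30 items]");

    let obj = Value::Object(
        (0..6).map(|i| (format!("f{i}"), Value::Str(String::new()))).collect(),
    );
    assert_eq!(summarize(&obj), "{6 fields}");
    println!("ok");
}
```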
…flow

Round 1 passes None for tool_registry so the Jinja template doesn't
inject 8 tool schemas (which expand 567 content tokens to 16K+ actual
tokens, causing GPU faults at position 16384). The model already has
tool schemas from round 0's conversation history.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…esult roles

RLM context size check now uses the actual model hint instead of defaulting
to 8K, preventing false RLM activation on models with larger context windows
(e.g., gemma-4-26b-a4b at 16K). Also eliminates a redundant RlmBridge
instance that was created just for system prompt generation.

Schema stripping in condense_tool_result now uses a generic is_schema_shaped()
heuristic that detects JSON Schema patterns (arrays of objects with
type+description fields) instead of stripping any large object. Observation
data (colonists, resources, alerts) now survives condensation (791 chars vs
92 chars previously), giving the planner enough context to formulate actions.

Parallel planner tool results now use Message::tool_result() with proper
call_id and tool_name instead of Message::user(). Jinja templates (especially
Gemma 4) render correct <|tool_response> tokens, fixing garbled model output
on round 1+ caused by missing conversation context.

Commander AGENTS.md updated to use gemma-4-26b-a4b model with colony-lost
reset policy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
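The `is_schema_shaped()` heuristic mentioned above can be sketched against a simplified field model (the real code walks `serde_json::Value`): an array is schema-shaped when every element is an object carrying both `type` and `description` fields, so schema blobs get stripped while observation data survives.

```rust
// Simplified object model: just the key names, enough for the heuristic.
struct Obj {
    keys: Vec<&'static str>,
}

// Sketch of is_schema_shaped(): arrays of objects that all carry
// "type" + "description" look like JSON Schema and get stripped.
fn is_schema_shaped(array: &[Obj]) -> bool {
    !array.is_empty()
        && array
            .iter()
            .all(|o| o.keys.contains(&"type") && o.keys.contains(&"description"))
}

fn main() {
    // ActionSpace-style schema entries: stripped during condensation.
    let schema = vec![
        Obj { keys: vec!["type", "description", "name"] },
        Obj { keys: vec!["type", "description"] },
    ];
    assert!(is_schema_shaped(&schema));

    // Observation data (colonists, resources, alerts): survives.
    let observation = vec![Obj { keys: vec!["name", "hp"] }];
    assert!(!is_schema_shaped(&observation));
    println!("ok");
}
```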
@arkavo-com

Tested llama.cpp PR ggml-org/llama.cpp#21760

Cherry-picked all 5 commits onto our vendored llama.cpp (c08d28d08). The PR fixes three Gemma 4 26B A4B parsing edge cases:

  1. Missing <|turn>model\n generation prompt after content+tool_call turns
  2. Greedy content rule eating <channel|> tokens before <|tool_call> can match
  3. Duplicate <|channel> tokens at generation start

Test results

  • Round 0 PEG parsing: Works correctly — native tool call extraction on every cycle
  • Round 1 PEG parsing: Now works after fixing route_with_tools_execution to bypass classify() and pass all tools directly to the Jinja template (was our bug, not the PR's)
  • Grammar output: Confirms scan-to-toolcall rule and <channel|> in content stop set are active
  • Live gameplay: Multi-cycle tool calling confirmed with registerAgent→reset→observe→step sequences

Our fix needed alongside

route_with_tools_execution was routing through classify() + keyword-based tool search, which returned 0 tools on round 1+. This caused the Jinja template to render without tools (format=2 generic instead of format=3 Gemma4). Rewrote to bypass classification entirely for execution mode — directly passes all tools with NameAndDescription detail level.

Recommendation

Approve the upstream PR. The PEG parser fixes are correct and working.

arkavo-com and others added 20 commits April 13, 2026 12:11
The Err branch of execute_tool_calls was missing add_fast_lesson,
so schema violations (e.g. missing "Type" field) were never persisted
as corrective lessons. The model saw the error within one tool loop
iteration but lost it when the conversation cleared between agent
cycles, repeating the same mistake indefinitely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…cdc11a

Learning system:
- Add fast-path lesson for MCP transport errors (Err branch was missing
  add_fast_lesson, so schema violations were lost between agent cycles)
- Add route_synthesis: plain completion with largest loaded model for
  lesson synthesis (0.8B/3B models can't produce structured JSON reliably)
- Restructure synthesis prompt to focus on action→outcome pairs, strip
  read-only tool calls from episode data before synthesis
- Add is_procedural_lesson filter to reject lessons about observation
  sequencing — the agent loop handles that, lessons should be strategic
- Derive Default for Message and Role to reduce struct literal boilerplate

Model support:
- Add Gemma 4 31B dense model variant (LocalGemma4_31B) across router,
  selector, quality, tool extraction, architect executor, and UI
- Categorize 31B as XLarge tier (15-45s inference, best for background
  synthesis tasks, too slow for real-time agent loops)

llama.cpp:
- Update to e21cdc11a (merged PR #21760: Gemma 4 parsing edge cases)
- Update all CI workflow files to pin new commit
- Fix mtmd_decode_use_non_causal API change (now takes chunk parameter)

RimWorld agent mesh:
- Update commander AGENTS.md with intent-based spatial actions
  (PlaceBuildingNear, EstablishFarm, etc.) and alert→action mappings
- Fix crop name (RawPotatoes → Potato), anchor (Stockpile → MapCenter)
- Add combat response mapping for raids

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…p test

- test_feasible_models_gemini_only: GeminiPro was removed from feasible
  set in d422770 but test still asserted its presence
- test_min_feasible_context_size_default: gracefully skip when llama-cpp
  feature is not enabled (CI runs with --no-default-features)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
With GeminiPro removed from feasible set, gemini-only gives a single
model (GeminiFlash) which takes the single-model shortcut, bypassing
Thompson Sampling entirely. Test now enables both Gemini and Anthropic
to ensure multiple feasible models.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- prefer_capable parameter unused when llama-cpp feature is off
- ui.rs match missing Gemma 4 E2B/E4B/26B/31B variants

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clippy correctly flags public functions that dereference raw pointers
without being marked unsafe.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
rand unsoundness requires a custom logger that accesses ThreadRng
inside the log handler — not applicable to our usage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LlamaModel is only exported from arkavo_llm on non-musl targets, but
any_loaded_model was gated on just llama-cpp feature without the musl
exclusion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
model_registry.get() returns Arc<LlamaModel> which is () on musl.
Both the implementation and fallback cfg gates need to account for
the musl case where llama-cpp feature is on but LlamaModel is not.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
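The double-gated cfg described in these two commits can be sketched as a pair of mutually exclusive items: the real implementation needs both the `llama-cpp` feature AND a non-musl target, and everything else takes the 4096 fallback. The 8192 value in the gated branch is a placeholder for the real registry walk; compiled standalone (no cargo features), the fallback branch is the one that exists.

```rust
// Real path: requires the llama-cpp feature AND a non-musl target,
// because LlamaModel is not exported on musl even with the feature on.
#[cfg(all(feature = "llama-cpp", not(target_env = "musl")))]
fn min_feasible_context_size() -> usize {
    // Placeholder for the registry walk over loaded models.
    8192
}

// Fallback: feature off, or musl target where LlamaModel is ().
#[cfg(not(all(feature = "llama-cpp", not(target_env = "musl"))))]
fn min_feasible_context_size() -> usize {
    4096
}

fn main() {
    println!("context = {}", min_feasible_context_size());
}
```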
Newer clippy on CI (1.94) catches these:
- Default::default() → Map::default() for clarity
- raw pointer as-cast → .cast() for constness safety
- usize as i32 → i32::try_from().unwrap_or() for wrapping safety

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
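Two of the three clippy fixes are worth a concrete sketch: `.cast()` instead of an `as`-cast for raw pointers (it cannot silently change constness), and `i32::try_from(..).unwrap_or(..)` instead of `usize as i32` (saturate instead of silently wrapping). The function name below is illustrative.

```rust
// Hypothetical helper showing the try_from pattern: saturate rather
// than wrap when a usize doesn't fit in i32.
fn token_count_i32(n: usize) -> i32 {
    i32::try_from(n).unwrap_or(i32::MAX)
}

fn main() {
    let v = [1u8, 2, 3];
    let p: *const u8 = v.as_ptr();
    // .cast() preserves constness; `as *mut i8` would compile too,
    // silently discarding const.
    let q: *const i8 = p.cast();
    assert_eq!(unsafe { *q }, 1);

    assert_eq!(token_count_i32(5), 5);
    assert_eq!(token_count_i32(usize::MAX), i32::MAX);
    println!("ok");
}
```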
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…teln, pub visibility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
llama.cpp update to e21cdc11a added Gemma 4 and DeepSeek v3.2 parsers,
pushing the release binary from 59MB to 61MB. Bump limit from 60MB to
65MB to accommodate upstream growth.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@arkavo-com arkavo-com force-pushed the feature/gemma4-support branch from d78c1a3 to c5316e1 on April 14, 2026 14:37
@arkavo-com arkavo-com merged commit 82bcbc7 into main Apr 14, 2026
45 of 46 checks passed
@arkavo-com arkavo-com deleted the feature/gemma4-support branch April 14, 2026 22:10