Merged
Update vendor/llama.cpp to c08d28d08 (post-April 4) picking up all Gemma 4 PRs: core model support (#21309), template fixes (#21326), tokenizer fix (#21343), logit softcapping (#21390), newline split (#21406), and dedicated tool-call parser (#21418). Add ModelFormat::Gemma4 with detection, stop sequences, and chat templates. Add ModelChoice variants for E2B, E4B, and 26B-A4B with full registry wiring. Add arkavo_chat_parse FFI exposing llama.cpp's native PEG output parser for Gemma 4 tool calls. Provider tries native parser before fallback chain.

Tool bench results (Q4_K_M, Apple Silicon):
- Gemma-4-E2B (2.3B active): 8/8, 2,229ms
- Gemma-4-26B-A4B (4B active MoE): 8/8, 7,410ms
- Gemma-4-E4B (4.5B active): 1/8 — blocked on non-lazy grammar sampler

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Spec Coverage Delta
Newly Covered
Spec Coverage Report
Quality Gate
WIP Scenarios (24) — tracked via issues
Uncovered Scenarios (478)
Addresses three critical bugs from RimWorld Gemma 4 testing: A2A messages bypassing MCP tool pipeline, conversation context resetting every cycle, and fragmented event processing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
send_advisory_task confirmed self-contained (mesh_state + protocol only). BPE merge table sourced from Llama 3.1 tokenizer.json (Apache 2.0). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
12 tasks across 3 phases. Phase 1: standalone modules (agent_event, token_estimator, conversation_window). Phase 2: event loop integration with overnight test gate. Phase 3: ToolMemory cleanup. Updated spec: LlamaTokenEstimator wraps loaded model's tokenizer instead of vendoring BPE merge table. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dingMessage Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sationWindow Add two new Router methods for Phase 2 agent loop context budgeting:
- min_feasible_context_size(): iterates loaded models via model_registry.model_names(), returns minimum trained context size (default 4096 when no models loaded)
- any_loaded_model(): returns Arc<LlamaModel> from first loaded model for token estimation

Also re-exports LlamaModel from arkavo-llm so router can reference it without a direct arkavo-llama-cpp dependency. Both methods are feature-gated behind llama-cpp with a fallback returning 4096 for non-llama builds.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
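The context-budgeting helper can be sketched as follows. This is a minimal stand-in with a simplified `ModelInfo` registry type invented for illustration; the real method iterates `model_registry.model_names()` and is feature-gated behind llama-cpp.

```rust
// Hypothetical simplified registry entry (the real registry holds Arc<LlamaModel>).
struct ModelInfo {
    name: String,
    trained_context: usize,
}

const DEFAULT_CONTEXT: usize = 4096; // fallback when no models are loaded

// Minimum trained context size across loaded models, as described in the commit.
fn min_feasible_context_size(loaded: &[ModelInfo]) -> usize {
    loaded
        .iter()
        .map(|m| m.trained_context)
        .min()
        .unwrap_or(DEFAULT_CONTEXT)
}

fn main() {
    // No models loaded yet: fall back to 4096.
    assert_eq!(min_feasible_context_size(&[]), 4096);

    let models = vec![
        ModelInfo { name: "ministral-3b".into(), trained_context: 8192 },
        ModelInfo { name: "gemma-4-26b-a4b".into(), trained_context: 16384 },
    ];
    // Budget against the smallest loaded context window.
    assert_eq!(min_feasible_context_size(&models), 8192);
}
```

Taking the minimum rather than the maximum is the conservative choice: any prompt budgeted this way fits every loaded model.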
… history Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gemma/Llama chat templates require alternating user/assistant roles. When error cycles push user messages without assistant responses, consecutive user messages break the template. build_messages() now merges consecutive same-role messages before returning. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
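The merge step can be sketched like this. The `Role`/`Message` types and `merge_consecutive` name are simplified stand-ins for the real arkavo-llm types, not the actual API.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Role { User, Assistant }

#[derive(Clone, Debug)]
struct Message { role: Role, content: String }

// Merge consecutive same-role messages so strict alternating-role chat
// templates (Gemma/Llama) don't reject the history after error cycles
// push several user messages with no assistant reply in between.
fn merge_consecutive(messages: Vec<Message>) -> Vec<Message> {
    let mut out: Vec<Message> = Vec::new();
    for msg in messages {
        let same_role = out.last().map(|p| p.role) == Some(msg.role);
        if same_role {
            // Append to the previous message instead of pushing a second
            // entry with the same role.
            let prev = out.last_mut().unwrap();
            prev.content.push_str("\n\n");
            prev.content.push_str(&msg.content);
        } else {
            out.push(msg);
        }
    }
    out
}

fn main() {
    let merged = merge_consecutive(vec![
        Message { role: Role::User, content: "error cycle 1".into() },
        Message { role: Role::User, content: "error cycle 2".into() },
        Message { role: Role::Assistant, content: "ok".into() },
    ]);
    assert_eq!(merged.len(), 2);
    assert_eq!(merged[0].content, "error cycle 1\n\nerror cycle 2");
}
```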
Narrow pub to pub(super) for ConversationWindow and MockEstimator. Remove ToolMemory.pending_instructions (never written to or read). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes found during playtest:
- Conductor: augment last user message with learning guidance when existing_messages is provided (was computing but discarding it)
- Agent loop: pass purpose as system_prompt for classification_content hint (was None, losing domain context for the classifier)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The classifier's task.len() > 300 rule falsely classified every orchestrator cycle as complex (cycle prompts include ToolMemory output), causing 7+ minute startup from unnecessary task decomposition. Replace with a 0.8B model call via route_chat (chat_semaphore, won't block main inference). The model classifies SINGLE vs MULTI in ~50ms. Falls back to the heuristic classifier on timeout or model error. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
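The fallback shape can be sketched as below. The model call is stubbed as a closure and the `Complexity`/`classify` names are illustrative; the real path goes through route_chat under the chat semaphore.

```rust
#[derive(Debug, PartialEq)]
enum Complexity { Single, Multi }

// Stand-in for the old heuristic, kept only as a fallback. Note it does NOT
// use task length: long cycle prompts (which include ToolMemory output) were
// the false-positive source the commit removes.
fn heuristic_classify(task: &str) -> Complexity {
    if task.contains(" then ") { Complexity::Multi } else { Complexity::Single }
}

// Ask the fast model for SINGLE vs MULTI; on any error fall back to the
// heuristic rather than blocking the cycle.
fn classify(task: &str, model: impl Fn(&str) -> Result<String, ()>) -> Complexity {
    match model(task) {
        Ok(ans) if ans.trim().eq_ignore_ascii_case("MULTI") => Complexity::Multi,
        Ok(_) => Complexity::Single,
        Err(()) => heuristic_classify(task),
    }
}

fn main() {
    // Model answers directly.
    assert_eq!(classify("do x", |_| Ok("SINGLE".into())), Complexity::Single);
    // Model errors out: heuristic fallback decides.
    assert_eq!(classify("do x then y", |_| Err(())), Complexity::Multi);
}
```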
Clarify that sequential workflows (register→observe→act) are SINGLE. Default to SINGLE on error/timeout — false negatives are cheap, false positives cost 80+ seconds of decomposition overhead. Bump logging to INFO for production visibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The RPC handler was constructed before start_orchestrator_loop set the event sender, so it cloned None and never received A2A messages through the event channel. A2A messages kept spawning separate conductor calls, racing the orchestrator for the GPU. Now A2aRpcImpl holds Arc<Mutex<Option<Sender>>> (shared reference) instead of Option<Sender> (snapshot). The handler locks the mutex at call time to get the live sender. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
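The difference between the snapshot and the shared cell can be sketched with synchronous std types (the real handler is async and uses the project's own event type, so names here are simplified stand-ins):

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};

// The handler holds a shared cell, not a snapshot, so a sender installed
// after construction is still visible at call time.
struct A2aRpcImpl {
    event_tx: Arc<Mutex<Option<Sender<String>>>>,
}

impl A2aRpcImpl {
    fn handle_message(&self, msg: String) -> bool {
        // Lock at call time to read the *live* sender.
        match self.event_tx.lock().unwrap().as_ref() {
            Some(tx) => tx.send(msg).is_ok(),
            None => false, // orchestrator loop not started yet
        }
    }
}

fn main() {
    let shared = Arc::new(Mutex::new(None));
    let rpc = A2aRpcImpl { event_tx: Arc::clone(&shared) };

    // Before the orchestrator installs a sender, delivery fails visibly.
    assert!(!rpc.handle_message("early".into()));

    // start_orchestrator_loop installs the sender later...
    let (tx, rx) = channel();
    *shared.lock().unwrap() = Some(tx);

    // ...and the same handler instance now routes through the event channel.
    assert!(rpc.handle_message("late".into()));
    assert_eq!(rx.recv().unwrap(), "late");
}
```

With the old `Option<Sender>` field, the first assertion would describe the handler's behavior forever: it cloned `None` at construction and never saw the sender installed afterwards.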
Both stdio and HTTP MCP clients only checked JSON-RPC level errors
(response.error) and ignored the MCP-spec isError field in
CallToolResult. Tool errors like "Serialization error: missing field
AgentType" were returned as Ok({"result": "error text"}), making
the executor and ToolMemory treat them as successful calls.
Now returns Err when isError is true, so the executor records it
as a failure and the model gets error feedback for retry.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
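The fix reduces to one branch. The struct below is a toy stand-in (the real clients deserialize the full MCP CallToolResult); only the `isError` semantics are taken from the commit.

```rust
// Toy stand-in for MCP's CallToolResult; `is_error` mirrors the spec's
// `isError` field, which lives *inside* a successful JSON-RPC response.
#[derive(Debug)]
struct CallToolResult {
    is_error: bool,
    content: String,
}

// A tool-level error must surface as Err even though the JSON-RPC envelope
// (response.error) reported no error.
fn into_tool_outcome(result: CallToolResult) -> Result<String, String> {
    if result.is_error {
        // Previously returned as Ok(...), so the executor and ToolMemory
        // recorded the failed call as a success and the model never got
        // error feedback for retry.
        Err(result.content)
    } else {
        Ok(result.content)
    }
}

fn main() {
    let outcome = into_tool_outcome(CallToolResult {
        is_error: true,
        content: "Serialization error: missing field AgentType".into(),
    });
    assert!(outcome.is_err());

    let ok = into_tool_outcome(CallToolResult { is_error: false, content: "done".into() });
    assert_eq!(ok.unwrap(), "done");
}
```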
Gemma-4-E2B (2.3B active, 8/8 tool accuracy) replaces Ministral-3B as the preferred fast model for judge, synthesis, and classification tasks. Falls back to Ministral-3B if Gemma 4 is not cached. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gemma-4-E2B, E4B, and 26B-A4B were missing from feasible_models(), so Thompson Sampling never considered them as candidates. The models loaded but only the hardcoded Ministral/Qwen variants were eligible. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gemma-4-E4B produces non-lazy GBNF grammar that our standalone sampler can't handle, resulting in 1/8 tool calling accuracy. Thompson Sampling wastes 30+ seconds on validation retries before learning to avoid it. Exclude until PEG output parser is ready. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace format_for_prompt() with format_control_signals() -> Option<String>. ToolMemory now emits only derived signals the model can't see in raw conversation history: setup state, duplicate warnings, action variety, and error pattern escalation. Silent (None) when everything is fine. ConversationWindow carries the raw history. Control signals go through system_suffix — separate token budget, not in the cycle prompt. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Case-insensitive model name matching in from_name()
- Strip "call:" prefix from Gemma 4 curly-brace tool call format
- Planner waits for executor/judge feedback before next round

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Specialists commented out in launch script until they have proactive mesh tool usage. Commander AGENTS.md updated for Gemma 4. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On single GPU, the judge's route_fast() call contended with the planner for Metal compute, adding 3-8s per tool result to the feedback loop. Replace with condense_tool_result() which extracts Delta sections and truncates — zero GPU time. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Orchestrator cycle prompts and notification events are always single tasks — the LLM complexity assessment (route_chat GPU call) was pure overhead. New skip_complexity parameter bypasses it for these callers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Try structural condensation first (Delta extraction, free). If it meaningfully reduces size (>50% reduction), use it. Otherwise fall back to LLM distillation for unstructured text. Feedback budget increased from 200 to 800 chars so the planner sees useful data. Works for any MCP server output, not just game-rl JSON. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
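The condense-first strategy can be sketched as below. The Delta extraction is stubbed as a line filter and the LLM distiller as a closure; both are illustrative stand-ins, not the real helpers.

```rust
const FEEDBACK_BUDGET: usize = 800; // raised from 200 chars per the commit

// Stand-in for Delta-section extraction: keep only lines mentioning "Delta".
// Costs no GPU time.
fn structural_condense(raw: &str) -> String {
    raw.lines()
        .filter(|l| l.contains("Delta"))
        .collect::<Vec<_>>()
        .join("\n")
}

// Try the free structural pass first; keep it only if it cuts the payload
// by more than half, otherwise fall back to LLM distillation for
// unstructured text. Finally clamp to the feedback budget.
fn condense_tool_result(raw: &str, llm_distill: impl Fn(&str) -> String) -> String {
    let condensed = structural_condense(raw);
    let out = if !condensed.is_empty() && condensed.len() * 2 < raw.len() {
        condensed
    } else {
        llm_distill(raw)
    };
    out.chars().take(FEEDBACK_BUDGET).collect()
}

fn main() {
    let raw = "Delta: wood +5\nfiller filler filler filler filler filler";
    // Structural pass wins: >50% reduction, no LLM call needed.
    let out = condense_tool_result(raw, |s| s.to_string());
    assert_eq!(out, "Delta: wood +5");
}
```

Because the fallback keys on reduction ratio rather than input format, the same path works for any MCP server output, not just game-rl JSON.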
The raw ModelRegistry path hung for 15+ min on 26B MoE due to Metal shader compilation. The Router path matches what `arkavo chat` uses — same model loading, context pool, and inference semaphore. Verified: Ministral-3B completes 3 scenarios in 5.4s via Router path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Round 0 (planning) keeps full budget: thinking on, 16K tokens, full schema. Round 1+ (execution) switches to execution profile: temp 0.1, thinking off, max 200 tokens. Execution mode now respects model hints from AGENTS.md instead of always falling back to the fastest local model. This fixes the 26B MoE generating 13.5K tokens on round 1 when it only needed ~200 for a tool call. Expected round 1 time: ~3-4s instead of 7min. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Registration responses carry 5KB+ ActionSpace/ObservationSpace schemas
that blow up context windows (14K tokens after Jinja expansion). The
condenser now replaces large arrays (>5 items) and objects (>5 fields,
>500 chars) with count summaries like "[30 items]" or "{6 fields}".
This fixes the GPU fault at position 16384 caused by context overflow
when the 26B model processes registerAgent results.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
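The count-summary rule can be sketched with a toy JSON value type (the real condenser works on parsed JSON; only the >5-item and >5-field thresholds and the summary strings are taken from the commit, and the >500-char object check is omitted for brevity):

```rust
// Toy JSON value type standing in for a real parsed JSON tree.
enum Val {
    Str(String),
    Arr(Vec<Val>),
    Obj(Vec<(String, Val)>),
}

// Replace large arrays/objects with count summaries so 5KB+ schema blobs
// can't blow up the context window after Jinja expansion.
fn summarize(v: &Val) -> String {
    match v {
        Val::Str(s) => s.clone(),
        Val::Arr(items) if items.len() > 5 => format!("[{} items]", items.len()),
        Val::Arr(items) => {
            let parts: Vec<String> = items.iter().map(summarize).collect();
            format!("[{}]", parts.join(", "))
        }
        Val::Obj(fields) if fields.len() > 5 => format!("{{{} fields}}", fields.len()),
        Val::Obj(fields) => {
            let parts: Vec<String> = fields
                .iter()
                .map(|(k, v)| format!("{}: {}", k, summarize(v)))
                .collect();
            format!("{{{}}}", parts.join(", "))
        }
    }
}

fn main() {
    let big = Val::Arr((0..30).map(|i| Val::Str(i.to_string())).collect());
    assert_eq!(summarize(&big), "[30 items]");

    let small = Val::Arr(vec![Val::Str("a".into()), Val::Str("b".into())]);
    assert_eq!(summarize(&small), "[a, b]");
}
```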
…flow Round 1 passes None for tool_registry so the Jinja template doesn't inject 8 tool schemas (which expand 567 content tokens to 16K+ actual tokens, causing GPU faults at position 16384). The model already has tool schemas from round 0's conversation history. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…esult roles RLM context size check now uses the actual model hint instead of defaulting to 8K, preventing false RLM activation on models with larger context windows (e.g., gemma-4-26b-a4b at 16K). Also eliminates a redundant RlmBridge instance that was created just for system prompt generation.

Schema stripping in condense_tool_result now uses a generic is_schema_shaped() heuristic that detects JSON Schema patterns (arrays of objects with type+description fields) instead of stripping any large object. Observation data (colonists, resources, alerts) now survives condensation (791 chars vs 92 chars previously), giving the planner enough context to formulate actions.

Parallel planner tool results now use Message::tool_result() with proper call_id and tool_name instead of Message::user(). Jinja templates (especially Gemma 4) render correct <|tool_response> tokens, fixing garbled model output on round 1+ caused by missing conversation context.

Commander AGENTS.md updated to use gemma-4-26b-a4b model with colony-lost reset policy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
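The schema-shape heuristic can be sketched with flat key-value pairs standing in for JSON objects (the real code inspects a parsed JSON tree; only the "array of objects each carrying type+description" rule comes from the commit):

```rust
// A JSON Schema fragment looks like an array of objects that each carry
// `type` and `description` fields; observation data does not.
fn is_schema_shaped(objects: &[Vec<(&str, &str)>]) -> bool {
    !objects.is_empty()
        && objects.iter().all(|fields| {
            let has = |k: &str| fields.iter().any(|(key, _)| *key == k);
            has("type") && has("description")
        })
}

fn main() {
    let schema = vec![
        vec![("type", "string"), ("description", "agent name")],
        vec![("type", "integer"), ("description", "count")],
    ];
    // Schema fragments get stripped during condensation.
    assert!(is_schema_shaped(&schema));

    // Observation data (colonists, resources, alerts) no longer matches,
    // so it survives condensation and reaches the planner.
    let observation = vec![vec![("colonists", "3"), ("wood", "120")]];
    assert!(!is_schema_shaped(&observation));
}
```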
Tested llama.cpp PR ggml-org/llama.cpp#21760. Cherry-picked all 5 commits onto our vendored llama.cpp.
Test results
Our fix needed alongside
Recommendation: Approve the upstream PR. The PEG parser fixes are correct and working.
The Err branch of execute_tool_calls was missing add_fast_lesson, so schema violations (e.g. missing "Type" field) were never persisted as corrective lessons. The model saw the error within one tool loop iteration but lost it when the conversation cleared between agent cycles, repeating the same mistake indefinitely. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…cdc11a Learning system:
- Add fast-path lesson for MCP transport errors (Err branch was missing add_fast_lesson, so schema violations were lost between agent cycles)
- Add route_synthesis: plain completion with largest loaded model for lesson synthesis (0.8B/3B models can't produce structured JSON reliably)
- Restructure synthesis prompt to focus on action→outcome pairs, strip read-only tool calls from episode data before synthesis
- Add is_procedural_lesson filter to reject lessons about observation sequencing — the agent loop handles that, lessons should be strategic
- Derive Default for Message and Role to reduce struct literal boilerplate

Model support:
- Add Gemma 4 31B dense model variant (LocalGemma4_31B) across router, selector, quality, tool extraction, architect executor, and UI
- Categorize 31B as XLarge tier (15-45s inference, best for background synthesis tasks, too slow for real-time agent loops)

llama.cpp:
- Update to e21cdc11a (merged PR #21760: Gemma 4 parsing edge cases)
- Update all CI workflow files to pin new commit
- Fix mtmd_decode_use_non_causal API change (now takes chunk parameter)

RimWorld agent mesh:
- Update commander AGENTS.md with intent-based spatial actions (PlaceBuildingNear, EstablishFarm, etc.) and alert→action mappings
- Fix crop name (RawPotatoes → Potato), anchor (Stockpile → MapCenter)
- Add combat response mapping for raids

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…p test
- test_feasible_models_gemini_only: GeminiPro was removed from feasible set in d422770 but test still asserted its presence
- test_min_feasible_context_size_default: gracefully skip when llama-cpp feature is not enabled (CI runs with --no-default-features)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
With GeminiPro removed from feasible set, gemini-only gives a single model (GeminiFlash) which takes the single-model shortcut, bypassing Thompson Sampling entirely. Test now enables both Gemini and Anthropic to ensure multiple feasible models. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- prefer_capable parameter unused when llama-cpp feature is off
- ui.rs match missing Gemma 4 E2B/E4B/26B/31B variants

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clippy correctly flags public functions that dereference raw pointers without being marked unsafe. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
rand unsoundness requires a custom logger that accesses ThreadRng inside the log handler — not applicable to our usage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LlamaModel is only exported from arkavo_llm on non-musl targets, but any_loaded_model was gated on just llama-cpp feature without the musl exclusion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
model_registry.get() returns Arc<LlamaModel> which is () on musl. Both the implementation and fallback cfg gates need to account for the musl case where llama-cpp feature is on but LlamaModel is not. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Newer clippy on CI (1.94) catches these:
- Default::default() → Map::default() for clarity
- raw pointer as-cast → .cast() for constness safety
- usize as i32 → i32::try_from().unwrap_or() for wrapping safety

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
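Two of those patterns side by side, as a minimal sketch (the `Map::default()` case is omitted since it depends on the crate's own map type):

```rust
fn main() {
    // usize -> i32 without silent wrapping: try_from surfaces the overflow
    // case explicitly instead of truncating via `as`.
    let n: usize = 5;
    let n32 = i32::try_from(n).unwrap_or(i32::MAX);
    assert_eq!(n32, 5);

    let huge: usize = usize::MAX;
    assert_eq!(i32::try_from(huge).unwrap_or(i32::MAX), i32::MAX);

    // Raw-pointer conversion via .cast() instead of `as`: .cast() cannot
    // accidentally change constness, only the pointee type.
    let x = 7u32;
    let p: *const u32 = &x;
    let pb: *const u8 = p.cast();
    assert!(!pb.is_null());
}
```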
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…teln, pub visibility Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
llama.cpp update to e21cdc11a added Gemma 4 and DeepSeek v3.2 parsers, pushing the release binary from 59MB to 61MB. Bump limit from 60MB to 65MB to accommodate upstream growth. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- ModelFormat::Gemma4 with detection, stop sequences, chat templates across all crates
- ModelChoice variants for Gemma-4-E2B, E4B, and 26B-A4B with full registry wiring (repo IDs, GGUF filenames, size estimates, escalation paths, detail levels)
- arkavo_chat_parse FFI exposing llama.cpp's native PEG output parser — provider tries native parser before our fallback chain

Tool Bench Results (Q4_K_M, Apple Silicon)
E4B requires non-lazy grammar sampler integration (generation_prompt prefill) which our standalone sampler doesn't support yet. Commented out from bench discovery.
Test plan
- cargo build -q compiles cleanly
- cargo clippy -- -D warnings passes on changed crates
- cargo test -p arkavo-llama-cpp — 18 tests pass (including new Gemma 4 format detection)
- cargo test -p arkavo-llm --lib — 209 tests pass
- cargo test -p arkavo-torg --lib — 14 tests pass
- arkavo tool-bench --model gemma-4-e2b — 8/8
- arkavo tool-bench --model gemma-4-26b-a4b — 8/8

🤖 Generated with Claude Code