feat(context-mode): make OpenClaw capability reporting truthful by dikotiledon · Pull Request #295 · mksglu/context-mode

dikotiledon · 2026-04-16T09:08:52Z

Summary

This PR makes OpenClaw capability reporting in context-mode truthful and session-specific.

Previously, OpenClaw support could be interpreted too broadly from installation state or partial hook activity. This change set moves capability classification behind explicit runtime evidence so the plugin only reports full support, and only claims active token savings, when DB-backed persistence has actually been observed for the current session.

What changed

Capability model

added a pure capability classifier in src/openclaw/capability.ts
defined explicit session states: full, degraded, and unsupported
made capability transitions monotonic as stronger evidence appears
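
A minimal sketch of what monotonic transitions could look like in a pure helper. The names `CapabilityState`, `RANK`, and `advance` are assumptions made for illustration; the actual exports of src/openclaw/capability.ts are not shown in this PR description and may differ.

```typescript
// Hypothetical names; the real src/openclaw/capability.ts may differ.
type CapabilityState = "unsupported" | "degraded" | "full";

// Order the states from weakest to strongest evidence.
const RANK: Record<CapabilityState, number> = {
  unsupported: 0,
  degraded: 1,
  full: 2,
};

// Pure function: returns the stronger of the two states, so a session
// can move up as evidence accumulates but never regresses.
function advance(
  current: CapabilityState,
  observed: CapabilityState,
): CapabilityState {
  return RANK[observed] > RANK[current] ? observed : current;
}

// A later, weaker observation does not downgrade the session.
const observations: CapabilityState[] = [
  "unsupported",
  "degraded",
  "full",
  "degraded",
];
const finalState = observations.reduce<CapabilityState>(advance, "unsupported");
console.log(finalState); // "full"
```

Keeping the helper free of I/O is what makes a contract test straightforward: every transition can be checked with plain inputs and outputs.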

OpenClaw runtime wiring

recorded per-session runtime evidence in src/openclaw-plugin.ts
surfaced capability state, reason code, evidence level, active capture path, and recommended next action in runtime-facing output
made direct sessions fail closed until a working capture path is proven
prevented false "Token savings active: yes" claims without DB-backed proof
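
The runtime-facing output described above could be rendered along these lines. This is a sketch under stated assumptions: the `SessionReport` shape, the field names, the reason code `hook-only`, and the label wording other than "Token savings active" are all hypothetical and are not taken from src/openclaw-plugin.ts.

```typescript
// All field names and reason codes here are hypothetical illustrations.
type EvidenceLevel = "metadata-only" | "hook-observed" | "db-backed";

interface SessionReport {
  state: "unsupported" | "degraded" | "full";
  reasonCode: string;
  evidenceLevel: EvidenceLevel;
  activeCapturePath: string | null; // null until a capture path is proven
  nextAction: string;
}

function renderReport(r: SessionReport): string[] {
  // Fail closed: savings are claimed only with DB-backed evidence.
  const savingsActive = r.evidenceLevel === "db-backed";
  return [
    `Capability: ${r.state} (${r.reasonCode})`,
    `Evidence level: ${r.evidenceLevel}`,
    `Active capture path: ${r.activeCapturePath ?? "none"}`,
    `Token savings active: ${savingsActive ? "yes" : "no"}`,
    `Next action: ${r.nextAction}`,
  ];
}

// Example: hooks were observed, but nothing has been persisted yet.
const example = renderReport({
  state: "degraded",
  reasonCode: "hook-only",
  evidenceLevel: "hook-observed",
  activeCapturePath: null,
  nextAction: "run a tool call and re-check once persistence is observed",
});
console.log(example.join("\n"));
```

Deriving the savings line from the evidence level, rather than storing it as a separate flag, means the two cannot drift apart.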

Tests

added capability contract coverage in tests/plugins/openclaw-capability.test.ts
expanded plugin coverage in tests/plugins/openclaw.test.ts
hardened Windows/runtime verification in tests/executor.test.ts
stabilized ctx_doctor regression coverage in tests/core/server.test.ts

Docs

updated docs/adapters/openclaw.md to describe capability-aware support
updated README.md to explain what full, degraded, and unsupported mean in practice
removed blanket support wording for direct OpenClaw sessions

Plan coverage

This PR completes the implementation plan in four parts:

lock the capability contract in a pure helper
wire runtime evidence into the OpenClaw plugin
tighten direct-session wording and support claims in docs
complete the verification gate and final hardening needed to ship the branch

Validation

Passed:

npx vitest run tests/plugins/openclaw-capability.test.ts tests/plugins/openclaw.test.ts --pool threads --maxWorkers 1
npx vitest run tests/executor.test.ts -t "Windows: Python runtime prefers python.exe over python3 alias"
npx vitest run tests/core/server.test.ts -t "ctx_doctor"
npm run build
OpenClaw gateway restart and health verification
direct-session runtime truth check against the latest session DB

Runtime guarantees verified:

metadata-only sessions stay unsupported
hook-observed sessions stay degraded until persistence is proven
DB-backed persistence upgrades the session to full
token savings are not reported active without DB-backed proof
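
The four guarantees above can be read as a small decision table. The sketch below restates them as code; the `SessionEvidence` fields and the function names `classify` and `tokenSavingsActive` are assumptions for illustration, not the plugin's actual API.

```typescript
// Hypothetical evidence flags; the plugin's real evidence record may differ.
type CapabilityState = "unsupported" | "degraded" | "full";

interface SessionEvidence {
  metadataSeen: boolean; // install or session metadata only
  hookObserved: boolean; // at least one hook fired in this session
  dbPersistenceObserved: boolean; // a write to the session DB was observed
}

// Strongest evidence wins; metadata alone is never enough.
function classify(e: SessionEvidence): CapabilityState {
  if (e.dbPersistenceObserved) return "full";
  if (e.hookObserved) return "degraded";
  return "unsupported";
}

// Token savings are only ever reported alongside DB-backed proof.
function tokenSavingsActive(e: SessionEvidence): boolean {
  return e.dbPersistenceObserved;
}

const metadataOnly: SessionEvidence = {
  metadataSeen: true,
  hookObserved: false,
  dbPersistenceObserved: false,
};
const hookOnly: SessionEvidence = { ...metadataOnly, hookObserved: true };
const dbBacked: SessionEvidence = { ...hookOnly, dbPersistenceObserved: true };

console.log(classify(metadataOnly)); // "unsupported"
console.log(classify(hookOnly)); // "degraded"
console.log(classify(dbBacked)); // "full"
```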

Known unrelated blockers

A repository-wide full-suite gate is still blocked by pre-existing failures outside this OpenClaw change set:

Go executor timeouts in tests/executor.test.ts
Rust linker/toolchain environment failure in tests/executor.test.ts
broader suite noise in tests/hooks/integration.test.ts

These are called out explicitly so this PR does not over-claim repo-wide green status.

Risk

Low to moderate.

The behavioral change is intentionally fail-closed: sessions that previously looked implicitly supported may now report degraded or unsupported until runtime evidence is observed. That is expected and is the point of the patch.

Rollback

If needed, revert the PR and restart the gateway. The change set is self-contained to capability classification, plugin reporting, docs, and verification hardening.

Reviewer focus

Please review this as a capability-truthfulness and verification-hardening PR:

Is the classifier conservative enough?
Are support claims now backed by runtime evidence?
Does runtime output avoid implying token savings without proof?
Do the docs now match actual runtime behavior?

mksglu · 2026-04-16T14:52:38Z

Hey @dikotiledon — this is a well-structured PR. The capability classification model (full/degraded/unsupported) is exactly the right approach for OpenClaw.

I want to be upfront: I don't use OpenClaw myself day-to-day, so I can't fully evaluate the runtime behavior from code review alone. I need to make sure this works correctly before merging because I'm presenting context-mode to the OpenClaw community soon, and I need confidence that the integration is solid.

Before I merge, I need help with verification:

Can you run a full end-to-end test? Start a fresh OpenClaw session, trigger some tool calls, and show me the capability state transitions: unsupported → degraded → full. Screenshots or terminal output would be ideal.
What does ctx doctor output look like on OpenClaw after this PR? I want to see what users will actually see.
Does the fail-closed behavior feel right in practice? If someone installs context-mode on OpenClaw and runs their first session, they'll see "unsupported" until evidence appears. Is that confusing? Or does it transition fast enough that users don't notice?
Have you tested this on the latest OpenClaw version (>2026.1.29)? The adapter has version-specific fallbacks and I want to make sure the capability classifier works with the current gateway.

The code looks clean. The monotonic state transitions, the pure classifier in capability.ts, the DB-backed evidence requirement — all good patterns. I just need runtime proof before this ships.

If you can share the test output, I'll merge promptly. Thanks for the thorough work.

mksglu · 2026-04-17T16:16:34Z

@dikotiledon why did you close this, man?

github-actions bot and others added 12 commits April 15, 2026 18:22

3a8384e ci: update install stats
68e44ff fix(openclaw): remove default routing prompt injection
333edc7 feat(openclaw): capture direct session tool results
e40ed91 chore: ignore local worktrees directory
2c07888 test(context-mode): define openclaw capability contract
9cd0611 feat(context-mode): surface truthful openclaw capability state
e934ddf fix(context-mode): bound capability state and expose capture path
4d95c0b docs(context-mode): align openclaw support claims to runtime evidence
72eacf5 ci: update install stats
57006a4 docs(context-mode): clarify openclaw capability-aware wording
05ff6a4 docs(context-mode): clarify direct openclaw session expectations
d00efdb fix(context-mode): prefer python.exe and stabilize ctx_doctor

dikotiledon changed the title ~~fix(context-mode): stabilize Windows Python and ctx_doctor~~ feat(context-mode): harden OpenClaw capability reporting Apr 16, 2026

dikotiledon changed the title ~~feat(context-mode): harden OpenClaw capability reporting~~ feat(context-mode): make OpenClaw capability reporting truthful Apr 16, 2026

Merge branch 'main' into feature/context-mode-capability-hardening

b381206

mksglu changed the base branch from main to next April 16, 2026 11:51

dikotiledon closed this Apr 17, 2026

dikotiledon wants to merge 13 commits into mksglu:next from dikotiledon:feature/context-mode-capability-hardening
