feat(context-mode): make OpenClaw capability reporting truthful #295

Closed
dikotiledon wants to merge 13 commits into mksglu:next from dikotiledon:feature/context-mode-capability-hardening

dikotiledon commented Apr 16, 2026

Summary

This PR makes OpenClaw capability reporting in context-mode truthful and session-specific.

Previously, OpenClaw support could be interpreted too broadly from installation state or partial hook activity. This change set moves capability classification behind explicit runtime evidence so the plugin only reports full support, and only claims active token savings, when DB-backed persistence has actually been observed for the current session.

What changed

Capability model

  • added a pure capability classifier in src/openclaw/capability.ts
  • defined explicit session states: full, degraded, and unsupported
  • made capability transitions monotonic as stronger evidence appears
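
The capability model above can be sketched roughly as follows. This is a minimal illustration, not the contents of src/openclaw/capability.ts: the `SessionEvidence` shape and the `classify` and `upgrade` helpers are hypothetical names, and the rule that DB-backed persistence alone is enough for full is an assumption drawn from the summary.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not the PR's actual code.
type Capability = "unsupported" | "degraded" | "full";

interface SessionEvidence {
  hookObserved: boolean;      // at least one hook fired for this session
  dbPersistenceSeen: boolean; // a DB-backed write was observed for this session
}

// Ordering used to compare states by strength.
const RANK: Record<Capability, number> = { unsupported: 0, degraded: 1, full: 2 };

// Pure classifier: same evidence in, same state out, no I/O.
function classify(e: SessionEvidence): Capability {
  if (e.dbPersistenceSeen) return "full";
  if (e.hookObserved) return "degraded";
  return "unsupported";
}

// Monotonic transition: a session never moves to a weaker state.
function upgrade(current: Capability, next: Capability): Capability {
  return RANK[next] > RANK[current] ? next : current;
}

let state: Capability = "unsupported";
state = upgrade(state, classify({ hookObserved: true, dbPersistenceSeen: false }));
state = upgrade(state, classify({ hookObserved: true, dbPersistenceSeen: true }));
// A later, weaker observation does not downgrade the session.
state = upgrade(state, classify({ hookObserved: false, dbPersistenceSeen: false }));
console.log(state); // full
```

Keeping the classifier free of I/O is what lets the contract be pinned down in a unit test without a running gateway.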

OpenClaw runtime wiring

  • recorded per-session runtime evidence in src/openclaw-plugin.ts
  • surfaced capability state, reason code, evidence level, active capture path, and recommended next action in runtime-facing output
  • kept direct sessions fail-closed until a working capture path is proven
  • prevented false "Token savings active: yes" claims without DB-backed proof
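
As a rough illustration of the runtime-facing output described above, the sketch below renders a per-session report. The `CapabilityReport` fields, the reason code, the capture path value, and every label except "Token savings active" are hypothetical; the real plugin output may be shaped differently.

```typescript
// Hypothetical sketch: field names, reason codes, and labels are illustrative assumptions.
type Capability = "unsupported" | "degraded" | "full";
type EvidenceLevel = "metadata-only" | "hook-observed" | "db-backed";

interface CapabilityReport {
  state: Capability;
  reasonCode: string;
  evidenceLevel: EvidenceLevel;
  activeCapturePath: string | null; // null until a capture path is proven
  nextAction: string;
}

function renderReport(r: CapabilityReport): string[] {
  // Fail closed: savings are claimed only when the session is full
  // and the evidence behind that state is DB-backed.
  const savingsActive = r.state === "full" && r.evidenceLevel === "db-backed";
  return [
    `Capability: ${r.state}`,
    `Reason: ${r.reasonCode}`,
    `Evidence: ${r.evidenceLevel}`,
    `Capture path: ${r.activeCapturePath ?? "none"}`,
    `Next action: ${r.nextAction}`,
    `Token savings active: ${savingsActive ? "yes" : "no"}`,
  ];
}

const degraded: CapabilityReport = {
  state: "degraded",
  reasonCode: "HOOKS_SEEN_NO_PERSISTENCE", // made-up reason code
  evidenceLevel: "hook-observed",
  activeCapturePath: null,
  nextAction: "trigger a tool call and re-check",
};
console.log(renderReport(degraded).join("\n"));
```

Deriving the savings line from the evidence level, instead of storing it as a separate flag, means the report cannot claim savings that the evidence does not support.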

Tests

  • added capability contract coverage in tests/plugins/openclaw-capability.test.ts
  • expanded plugin coverage in tests/plugins/openclaw.test.ts
  • hardened Windows/runtime verification in tests/executor.test.ts
  • stabilized ctx_doctor regression coverage in tests/core/server.test.ts

Docs

  • updated docs/adapters/openclaw.md to describe capability-aware support
  • updated README.md to explain what full, degraded, and unsupported mean in practice
  • removed blanket support wording for direct OpenClaw sessions

Plan coverage

This PR completes the implementation plan in four parts:

  • lock the capability contract in a pure helper
  • wire runtime evidence into the OpenClaw plugin
  • tighten direct-session wording and support claims in docs
  • complete the verification gate and final hardening needed to ship the branch

Validation

Passed:

  • npx vitest run tests/plugins/openclaw-capability.test.ts tests/plugins/openclaw.test.ts --pool threads --maxWorkers 1
  • npx vitest run tests/executor.test.ts -t "Windows: Python runtime prefers python.exe over python3 alias"
  • npx vitest run tests/core/server.test.ts -t "ctx_doctor"
  • npm run build
  • OpenClaw gateway restart and health verification
  • direct-session runtime truth check against the latest session DB

Runtime guarantees verified:

  • metadata-only sessions stay unsupported
  • hook-observed sessions stay degraded until persistence is proven
  • DB-backed persistence upgrades the session to full
  • token savings are not reported active without DB-backed proof
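
One way to read these guarantees is as a replay over a session's evidence events. The sketch below is an assumption-laden illustration, not the tests in this PR: the event names and the `step`, `replay`, and `tokenSavingsActive` helpers are hypothetical.

```typescript
// Hypothetical sketch: event names and the reducer are assumptions for illustration.
type Capability = "unsupported" | "degraded" | "full";
type EvidenceEvent = "metadata" | "hook" | "db-write";

// Apply one evidence event to the current state.
function step(state: Capability, event: EvidenceEvent): Capability {
  if (event === "db-write") return "full";
  if (event === "hook" && state === "unsupported") return "degraded";
  return state; // metadata alone never upgrades, and nothing downgrades
}

// Every session starts unsupported and is upgraded only by evidence.
function replay(events: EvidenceEvent[]): Capability {
  return events.reduce<Capability>(step, "unsupported");
}

const tokenSavingsActive = (s: Capability): boolean => s === "full";

const cases: Array<[EvidenceEvent[], Capability]> = [
  [["metadata", "metadata"], "unsupported"],  // metadata-only stays unsupported
  [["metadata", "hook", "hook"], "degraded"], // hooks alone stay degraded
  [["hook", "db-write"], "full"],             // persistence upgrades to full
  [["db-write", "hook", "metadata"], "full"], // later weaker events do not downgrade
];

for (const [events, expected] of cases) {
  const got = replay(events);
  if (got !== expected) {
    throw new Error(`${events.join(",")}: expected ${expected}, got ${got}`);
  }
}
if (tokenSavingsActive(replay(["hook"]))) {
  throw new Error("savings must not be active without a DB write");
}
console.log("all guarantee checks passed");
```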

Known unrelated blockers

A repository-wide full-suite gate is still blocked by pre-existing failures outside this OpenClaw change set:

  • Go executor timeouts in tests/executor.test.ts
  • Rust linker/toolchain environment failure in tests/executor.test.ts
  • broader suite noise in tests/hooks/integration.test.ts

These are called out explicitly so this PR does not over-claim repo-wide green status.

Risk

Low to moderate.

The behavioral change is intentionally fail-closed: sessions that previously looked implicitly supported may now report degraded or unsupported until runtime evidence is observed. That is expected and is the point of the patch.

Rollback

If needed, revert the PR and restart the gateway. The change set is self-contained to capability classification, plugin reporting, docs, and verification hardening.

Reviewer focus

Please review this as a capability-truthfulness and verification-hardening PR:

  • Is the classifier conservative enough?
  • Are support claims now backed by runtime evidence?
  • Does runtime output avoid implying token savings without proof?
  • Do the docs now match actual runtime behavior?

dikotiledon changed the title from "fix(context-mode): stabilize Windows Python and ctx_doctor" to "feat(context-mode): harden OpenClaw capability reporting" on Apr 16, 2026
dikotiledon changed the title from "feat(context-mode): harden OpenClaw capability reporting" to "feat(context-mode): make OpenClaw capability reporting truthful" on Apr 16, 2026
mksglu changed the base branch from main to next on April 16, 2026 11:51

mksglu (Owner) commented Apr 16, 2026

Hey @dikotiledon — this is a well-structured PR. The capability classification model (full/degraded/unsupported) is exactly the right approach for OpenClaw.

I want to be upfront: I don't use OpenClaw myself day-to-day, so I can't fully evaluate the runtime behavior from code review alone. I need to make sure this works correctly before merging because I'm presenting context-mode to the OpenClaw community soon, and I need confidence that the integration is solid.

Before I merge, I need help with verification:

  1. Can you run a full end-to-end test? Start a fresh OpenClaw session, trigger some tool calls, and show me the capability state transitions: unsupported → degraded → full. Screenshots or terminal output would be ideal.

  2. What does ctx doctor output look like on OpenClaw after this PR? I want to see what users will actually see.

  3. Does the fail-closed behavior feel right in practice? If someone installs context-mode on OpenClaw and runs their first session, they'll see "unsupported" until evidence appears. Is that confusing? Or does it transition fast enough that users don't notice?

  4. Have you tested this on the latest OpenClaw version (>2026.1.29)? The adapter has version-specific fallbacks and I want to make sure the capability classifier works with the current gateway.

The code looks clean. The monotonic state transitions, the pure classifier in capability.ts, the DB-backed evidence requirement — all good patterns. I just need runtime proof before this ships.

If you can share the test output, I'll merge promptly. Thanks for the thorough work.

mksglu (Owner) commented Apr 17, 2026

@dikotiledon why did you close this, man?
