
feat(gui): Gateway Lifecycle GUI Controls #4

Open
mafueee wants to merge 64 commits into feat/native-management-suite from feat/gateway-lifecycle-gui


Owner

@mafueee mafueee commented Mar 25, 2026

Gateway Lifecycle GUI Controls

Problem

When the OpenShell gateway is offline, the dashboard provides no discoverable way to start it — blocking all management operations with no actionable feedback.

Solution

Three complementary UI elements ensure gateway status is always visible and controllable:

  1. Global Alert Banner (GatewayControlPanel.tsx) — Red gradient banner with animated shimmer, shown on every page when the gateway is unhealthy or offline. It features a one-click Start button and auto-hides once the gateway is healthy.

  2. Dedicated Gateway Page (GatewayPage.tsx at /gateway) — Full container details (state, image, ID), health status, and lifecycle controls (Start/Stop/Restart) with 5-second auto-refresh.

  3. Sidebar Health Indicator — Persistent green/red dot next to the "Gateway" nav link, powered by the existing WebSocket connection.
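
The visibility rule shared by the banner and the sidebar dot can be sketched as follows. This is a minimal illustration rather than the PR's code: the `GatewaySnapshot` shape and both function names are hypothetical, and the real components take their state from the existing WebSocket connection.

```typescript
// Hypothetical snapshot of gateway state. The field names are assumptions
// for illustration and are not taken from the PR's client.ts.
interface GatewaySnapshot {
  running: boolean; // the gateway container is up
  healthy: boolean; // the health check passed
}

// The banner is shown whenever the gateway is offline or unhealthy and
// hides again once the gateway reports healthy.
function shouldShowBanner(s: GatewaySnapshot): boolean {
  return !(s.running && s.healthy);
}

// The sidebar dot is green only for a running, healthy gateway.
function indicatorColor(s: GatewaySnapshot): "green" | "red" {
  return shouldShowBanner(s) ? "red" : "green";
}
```

Deriving both elements from one predicate keeps the banner and the dot from disagreeing about the gateway state.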

Changes

| File | Type | Description |
|---|---|---|
| GatewayControlPanel.tsx | NEW | Global banner component |
| GatewayControlPanel.test.tsx | NEW | 8 unit tests |
| GatewayPage.tsx | NEW | Management page component |
| GatewayPage.test.tsx | NEW | 7 unit tests |
| App.tsx | MOD | Route + sidebar + banner integration |
| client.ts | MOD | GatewayStatusDetailed interface |
| index.css | MOD | 155 lines of gateway CSS with animations |
| README.md | MOD | Documentation for new gateway features |

Testing

  • 15/15 new tests passing
  • Frontend rebuilt and verified on live dashboard

mafueee added 30 commits March 25, 2026 22:33
…page

- Add GatewayControlPanel.tsx: global banner shown on every page when
  gateway is offline with one-click Start button and animated shimmer
- Add GatewayPage.tsx: dedicated /gateway route with full container
  details, health checks, Start/Stop/Restart controls
- Add sidebar gateway nav link with live green/red health indicator dot
- Add GatewayStatusDetailed type to API client
- Add 160+ lines of gateway-specific CSS with responsive breakpoints
- Integrate GatewayControlPanel above Routes in App.tsx
- Add 15 comprehensive tests (8 control panel, 7 gateway page)
- Update README with new gateway lifecycle GUI documentation

Closes the UX gap where a stopped gateway blocked all platform functionality
with no discoverable way to start it from the GUI.

- Mount GatewayControlPanel above Routes for global visibility
- Add /gateway route to GatewayPage
- Add sidebar Gateway nav link with live health indicator dot
- Connect useWebSocket gateway state to all gateway UI components
…nnection header + config paths

Three bugs caused the OpenShell gateway to show as 'Not Installed' in the GUI:

1. dockerGateway.js used Docker API v1.41, but the engine requires ≥v1.44 (server is v1.53).
   All Docker socket requests failed silently → findGatewayContainer() returned null.
   Fix: Updated API version from v1.41 to v1.47.

2. Docker Engine keeps HTTP connections alive by default, preventing the socket 'end' event
   from firing. The 15s timeout triggered, which the catch block converted to null.
   Fix: Added 'Connection: close' header to all Docker socket requests.

3. gatewayHealth.js looked for 'active_cluster' and '{name}_metadata.json' in the config root,
   but the actual layout uses 'active_gateway' and 'gateways/{name}/metadata.json'.
   Fix: Updated both resolveGatewayHttpEndpoint() and isGatewayConfigured() paths.
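
Fixes 1 and 2 both come down to how each Docker socket request is built. A minimal sketch, assuming the default socket location and a hypothetical `dockerRequestOptions` helper (neither is confirmed by the commit message):

```typescript
// API version from fix 1. The socket path is an assumed default.
const DOCKER_API_VERSION = "v1.47";
const DOCKER_SOCKET = "/var/run/docker.sock";

interface SocketRequestOptions {
  socketPath: string;
  path: string;
  method: string;
  headers: Record<string, string>;
}

// Hypothetical helper: builds options for one Docker Engine API call.
function dockerRequestOptions(endpoint: string, method: string = "GET"): SocketRequestOptions {
  return {
    socketPath: DOCKER_SOCKET,
    // Prefix every endpoint with the API version the engine accepts.
    path: `/${DOCKER_API_VERSION}${endpoint}`,
    method,
    // Fix 2: ask the engine to close the connection after the response so
    // the socket 'end' event fires instead of the request timing out.
    headers: { Connection: "close" },
  };
}
```

An object of this shape can be passed to Node's `http.request`, which accepts `socketPath`, `path`, `method`, and `headers`.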

Also updated README.md with dockerGateway.js component docs and config path details.

grpcClient.js had the same config path bugs as gatewayHealth.js:
- active_cluster → active_gateway
- {name}_metadata.json → gateways/{name}/metadata.json
- clusters/{name}/mtls/ → gateways/{name}/mtls/

These prevented the gRPC client from finding the mTLS certificates
and metadata, causing gateway health checks to fail (healthy: false)
even though the container was running.

With this fix, the gateway now reports healthy: true with gRPC method.
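
The corrected layout can be summarised as three path builders. The helper names and passing the config root as a parameter are assumptions. Only the relative paths come from the commit messages.

```typescript
// File naming the active gateway (previously looked up as 'active_cluster').
function activeGatewayFile(configRoot: string): string {
  return `${configRoot}/active_gateway`;
}

// Per-gateway metadata (previously '{name}_metadata.json' in the root).
function gatewayMetadataFile(configRoot: string, name: string): string {
  return `${configRoot}/gateways/${name}/metadata.json`;
}

// Per-gateway mTLS material (previously under 'clusters/{name}/mtls/').
function gatewayMtlsDir(configRoot: string, name: string): string {
  return `${configRoot}/gateways/${name}/mtls`;
}
```

Sharing builders like these between gatewayHealth.js and grpcClient.js would keep the two modules from drifting apart again.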
- waitForReady: false on unary calls prevents indefinite queuing
- Fast reconnect backoff (500ms/3s) avoids long stalls
- Reduced provider CRUD timeouts from 30s to 3s
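
A sketch of the tuning described above, assuming the client is built on @grpc/grpc-js. The two channel option keys are ones that library recognises. Reading "500ms/3s" as initial and maximum reconnect backoff is an assumption, and `deadlineFromNow` is a hypothetical helper. `waitForReady: false` is left out because where it is set depends on how the client builds its calls.

```typescript
// Channel options for fast reconnects (assumed: 500 ms initial, 3 s maximum).
const channelOptions: Record<string, number> = {
  "grpc.initial_reconnect_backoff_ms": 500,
  "grpc.max_reconnect_backoff_ms": 3000,
};

// Provider CRUD timeout, reduced from 30 s to 3 s per the commit message.
const PROVIDER_CRUD_TIMEOUT_MS = 3000;

// Hypothetical helper: absolute deadline for a unary call.
function deadlineFromNow(timeoutMs: number, now: Date = new Date()): Date {
  return new Date(now.getTime() + timeoutMs);
}
```

A short deadline combined with not waiting for the channel to become ready means a call against an offline gateway fails within seconds instead of queuing.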
- Add isValidApiKey() helper that rejects multi-line, overly long, or
  prose-containing keys from being saved to config
- Guard onboard deploy and PUT /api/inference with key validation
- Detect corrupted API key in chat/message route and return actionable
  error with 'Reconfigure Inference' guidance
- Add contextual recovery links in ChatInterface for API key errors,
  missing OpenClaw, and SSH transport failures
- Add 2 new ChatInterface test cases for error recovery scenarios
- Update README Agent Chat description
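
A validator of the kind described might look like the sketch below. The length limit and the "any whitespace means prose" heuristic are assumptions. The commit message names the three rejection cases but not the exact rules.

```typescript
// Assumed upper bound; the real limit is not stated in the commit message.
const MAX_KEY_LENGTH = 256;

function isValidApiKey(key: string): boolean {
  const trimmed = key.trim();
  // Reject empty and overly long values.
  if (trimmed.length === 0 || trimmed.length > MAX_KEY_LENGTH) return false;
  // Reject multi-line values, e.g. a pasted error message.
  if (/[\r\n]/.test(trimmed)) return false;
  // Heuristic (assumption): whitespace inside the value suggests prose.
  if (/\s/.test(trimmed)) return false;
  return true;
}
```

Running the check before the key is written to config means a bad paste is rejected at save time instead of corrupting later chat requests.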
Add restoreApiKeyFromConfig() that runs at startup to restore the
persisted _apiKey from ~/.nemoclaw/config.json into process.env.
Update README to document the persistent API key restoration behavior.
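
The restoration step can be sketched as a pure function over the config file contents and an env-like object. The environment variable name and the rule that an existing value wins are assumptions. Only the `_apiKey` field and its source file come from the commit message.

```typescript
type Env = Record<string, string | undefined>;

// Returns true when a key was copied from the config into env.
function restoreApiKeyFromConfig(
  configJson: string,
  env: Env,
  envVar: string = "INFERENCE_API_KEY" // hypothetical variable name
): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(configJson);
  } catch {
    return false; // unreadable config: nothing to restore
  }
  if (typeof parsed !== "object" || parsed === null) return false;
  const key = (parsed as Record<string, unknown>)["_apiKey"];
  if (typeof key !== "string" || key.length === 0) return false;
  // Assumption: a value already present in the environment is kept.
  if (env[envVar] !== undefined) return false;
  env[envVar] = key;
  return true;
}
```

At startup this would be called with the contents of ~/.nemoclaw/config.json and `process.env`.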
Reverts OnboardWizard.tsx to the original version from the initial commit,
restoring:
- Preflight checks step with API-driven checks
- wizard-steps CSS class step indicators
- className="input" with inline validation
- 3 hardcoded providers (cloud, ollama, vllm) in vertical list
- "Setup Complete" step with CLI command instead of SSE deploy

Updates test file to match the reverted component.

Merges the original input style (wizard-steps CSS, className='input',
inline validation, preflight checks) with the full provider list from
PROVIDERS array (including OpenRouter, Gemini, NIM) and the SSE deploy
functionality.

This correctly addresses the input style revert without losing provider
support.

The OpenShell gateway only accepts provider types 'openai', 'anthropic',
and 'nvidia'. NemoClaw GUI was passing raw provider keys (e.g.,
'openrouter', 'gemini') causing INVALID_ARGUMENT errors during deployment.

Changes:
- Added mapProviderToGrpcType() helper in grpcClient.js
- Updated claws.js, index.js to use the mapping
- Added unit tests for all provider type mappings
- Updated README.md with provider type mapping docs
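
A table-driven sketch of such a mapping follows. Only the three accepted gateway types come from the commit message. The individual table entries and the choice to throw on unknown keys are illustrative assumptions and may not match the real mapProviderToGrpcType().

```typescript
type GrpcProviderType = "openai" | "anthropic" | "nvidia";

// Illustrative entries only; the PR's actual mapping is not shown.
const PROVIDER_TYPE_MAP: Record<string, GrpcProviderType> = {
  openai: "openai",
  anthropic: "anthropic",
  nvidia: "nvidia",
  openrouter: "openai", // assumed: exposed through an OpenAI-compatible API
  gemini: "openai", // assumed
  nim: "nvidia", // assumed
};

function mapProviderToGrpcType(providerKey: string): GrpcProviderType {
  const mapped = PROVIDER_TYPE_MAP[providerKey.toLowerCase()];
  if (mapped === undefined) {
    // Fail early instead of letting the gateway answer INVALID_ARGUMENT.
    throw new Error(`Unsupported provider: ${providerKey}`);
  }
  return mapped;
}
```

Keeping the mapping in a single table makes it straightforward to unit test every provider key.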
The /api/chat/message endpoint now calls the configured LLM provider's
OpenAI-compatible /v1/chat/completions API directly from the server,
instead of attempting to execute openclaw inside the sandbox via gRPC
ExecSandbox (which failed with 'command not found').

Features:
- Per-session conversation memory (50 msg cap, 30min TTL)
- Credential resolution: local config > vault > process.env > gateway bundle
- Structured error responses for 401, timeouts, and connectivity failures

- Updated welcome message to clarify messages are routed through
  the sandbox's security policy via ExecSandbox
- All agent actions now enforced by claw's Landlock, network, and
  filesystem policies
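
The per-session memory limits mentioned above (50-message cap, 30-minute TTL) could be enforced with a small in-memory store. This sketch is an assumption about the approach: the class, its method names, and the eviction-on-write strategy are hypothetical.

```typescript
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

interface Session {
  messages: ChatMessage[];
  lastActive: number; // epoch milliseconds
}

const MAX_MESSAGES = 50;
const SESSION_TTL_MS = 30 * 60 * 1000;

class SessionMemory {
  private sessions: Record<string, Session | undefined> = {};

  // Appends a message and returns the session's current history.
  append(id: string, msg: ChatMessage, now: number = Date.now()): ChatMessage[] {
    this.evictExpired(now);
    const session: Session = this.sessions[id] ?? { messages: [], lastActive: now };
    session.messages.push(msg);
    // Keep only the most recent MAX_MESSAGES entries.
    if (session.messages.length > MAX_MESSAGES) {
      session.messages = session.messages.slice(-MAX_MESSAGES);
    }
    session.lastActive = now;
    this.sessions[id] = session;
    return session.messages;
  }

  // Drops every session idle for longer than the TTL.
  private evictExpired(now: number): void {
    for (const id of Object.keys(this.sessions)) {
      const s = this.sessions[id];
      if (s !== undefined && now - s.lastActive > SESSION_TTL_MS) {
        delete this.sessions[id];
      }
    }
  }
}
```

Evicting on write avoids a background timer, at the cost of idle sessions staying in memory until the next message arrives.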
mafueee added 30 commits March 27, 2026 07:16
…ents

- Add server/lib/gatewayProxy.js: WebSocket-to-gRPC bridge with auto-start
- Add server/routes/sandbox.js: REST proxy for all OpenClaw API endpoints
- Add ApprovalsList, SkillsList, PluginManager, MemorySearch, CronManager React components
- Wire all new routes and components into App.tsx with SandboxPage wrapper
- Update README with OpenClaw Gateway Proxy and Sandbox API sections
…arnings

- Replace openclaw channels add (fails: binary not in sandbox) with UpdateConfig
  gRPC storing CHANNEL_DISCORD_TOKEN as a gateway sandbox setting
- Falls back to local extensions-state.json if gateway unavailable
- Downgrade pip/npm install failures from error to warning status
- Update tests and README
… channel registration

settingValue must be wrapped as { stringValue: credential } to match the
openshell.sandbox.v1.SettingValue oneof message proto definition, rather
than being passed as a raw string which causes a serialization failure.
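
In TypeScript terms the fix amounts to wrapping the credential before it is placed in the request. Only the `stringValue` branch is named in the commit message, so other branches of the oneof are omitted rather than guessed. The `toSettingValue` helper is hypothetical.

```typescript
// Shape of the one SettingValue branch named in the commit message.
interface SettingValue {
  stringValue: string;
}

// Hypothetical helper: wraps a raw credential for the settingValue field.
function toSettingValue(credential: string): SettingValue {
  return { stringValue: credential };
}
```

Passing `toSettingValue(token)` instead of `token` gives the serializer an object that matches the message definition.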
…nv file

- Rewrite configureChannelInSandbox() to use Python (token as argv[1])
  to write DISCORD_BOT_TOKEN to /sandbox/.openclaw-data/.channel-env,
  completely bypassing shell escaping issues with bot tokens

- Use 'openclaw doctor --fix' to hot-apply channel config to the running
  gateway daemon (pid 52 cannot be killed from ExecSandbox due to sandbox
  process isolation; doctor communicates via local socket instead)

- Add POST /api/extensions/sync-channel endpoint for backfilling already-
  installed extensions without reinstalling them

- Update nemoclaw-start.sh to source .channel-env before launching the
  gateway, ensuring DISCORD_BOT_TOKEN persists across container restarts

- Update README.md: document new persistent mechanism, sync-channel
  endpoint, and add 'Syncing an Already-Installed Extension' section
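
The argv-based approach can be illustrated by building the command as an argument vector, so the token never becomes part of shell text. This is a sketch under assumptions: the `python3` executable name, the `export ...` file format, and the helper name are not taken from the PR, and the vector is assumed to be executed without an intermediate shell.

```typescript
const CHANNEL_ENV_PATH = "/sandbox/.openclaw-data/.channel-env";

// Python source that reads the token from sys.argv[1]. shlex.quote keeps
// the written line safe to source from a shell script later.
const WRITE_ENV_SCRIPT: string = [
  "import shlex, sys",
  `with open(${JSON.stringify(CHANNEL_ENV_PATH)}, "w") as f:`,
  '    f.write("export DISCORD_BOT_TOKEN=" + shlex.quote(sys.argv[1]) + "\\n")',
].join("\n");

// Hypothetical helper: the token is a separate argv element, so characters
// such as quotes or '$' need no escaping on the calling side.
function buildWriteTokenArgv(token: string): string[] {
  return ["python3", "-c", WRITE_ENV_SCRIPT, token];
}
```

With `python3 -c <script> <token>`, Python sets `sys.argv[0]` to `-c` and `sys.argv[1]` to the token, which is what the script relies on.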
…gateway restart

- Shift extension installation status check to rely on the local state file rather than the unreliable gRPC getSandboxConfig command.
- Modify `configureChannelInSandbox` to aggressively restart the `openclaw gateway` daemon to explicitly load fresh environment variables and channel configurations.
… in extensions

- Add `--break-system-packages` to `pip install` commands in the extension registry so that Python modules (like discord.py) can successfully install system-wide inside the sandbox containers.
…essaging tools

- Add explicit prompt instructions to the `docs` strings of the messaging extensions (Discord, Telegram, Slack) in `registry.json`.
- These instructions are injected dynamically into the agent's system prompt so the LLM stops entering a 120s retry loop while guessing channel names (e.g. "general") and instead asks the user for a numeric channel ID.
…nvironment

Root cause: openclaw agent connects to an internal gateway daemon which has its
own isolated process environment. Shell-level exports do not reach the daemon.

Fixes applied:
- extensions.js: restart gateway using env KEY=VALUE to explicitly pass the token
  to the daemon process; locate openclaw binary across multiple common paths;
  provide clear success/failure/not-found log messages
- index.js: scope getExtensionCredentials() to current sandbox only;
  source .channel-env and export creds as shell vars before openclaw agent call

The gateway is now restarted with DISCORD_BOT_TOKEN in its daemon environment
so tools and Python subprocesses spawned by the agent can access it via os.environ.
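
Passing variables through `env KEY=VALUE` can be sketched as a small argv builder. The `withEnv` helper is hypothetical, and the command to run is supplied by the caller because the exact restart invocation is not shown in the commit message.

```typescript
// Prefixes a command with `env KEY=VALUE ...` so the variables are set in
// the launched process itself, not only in the calling shell.
function withEnv(vars: Record<string, string>, command: string[]): string[] {
  const assignments = Object.keys(vars).map((name) => {
    // Reject names that env(1) could not interpret as an assignment.
    if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
      throw new Error(`Invalid variable name: ${name}`);
    }
    return `${name}=${vars[name]}`;
  });
  return ["env", ...assignments, ...command];
}
```

For example, `withEnv({ DISCORD_BOT_TOKEN: token }, restartCommand)` yields an argument vector in which the daemon receives the token directly.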