fix: use Vec<u8> for exec stdout/stderr to prevent UTF-8 corruption #361

Open
yan5xu wants to merge 3 commits into boxlite-ai:main from yan5xu:main

yan5xu (Contributor) commented Mar 11, 2026

Summary

  • String::from_utf8_lossy() in portal/interfaces/exec.rs was replacing incomplete UTF-8 byte sequences at gRPC chunk boundaries with U+FFFD replacement characters
  • This caused Chinese text corruption when stdout output was large enough to be split across multiple gRPC chunks (e.g. a 130KB JSON response)
  • Changed stdout/stderr channels from String to Vec<u8> throughout the pipeline to preserve raw bytes

Changes

  • portal/interfaces/exec.rs: route_output sends raw chunk.data (Vec<u8>) instead of converting it with String::from_utf8_lossy
  • litebox/exec.rs: ExecStdout/ExecStderr stream items changed from String to Vec<u8>
  • serve.rs: base64-encodes raw bytes directly (was .as_bytes() on String)
  • terminal/mod.rs: writes raw bytes to stdout/stderr (was .as_bytes() on String)
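
With stream items now `Vec<u8>`, a consumer that still needs text has to decode across chunk boundaries itself. The following is a minimal sketch of one way to do that, not code from this PR: `Utf8ChunkDecoder` is a hypothetical helper that holds back a trailing incomplete sequence until the next chunk arrives, using only `std::str::from_utf8` and `Utf8Error` from the standard library.

```rust
/// Hypothetical helper (not part of BoxLite): decodes a byte stream
/// chunk by chunk without corrupting characters split across chunks.
#[derive(Default)]
struct Utf8ChunkDecoder {
    /// Bytes received so far that do not yet form a complete character.
    pending: Vec<u8>,
}

impl Utf8ChunkDecoder {
    /// Appends `chunk` and returns all text that can be decoded so far.
    fn push(&mut self, chunk: &[u8]) -> String {
        self.pending.extend_from_slice(chunk);
        let mut out = String::new();
        loop {
            let err = match std::str::from_utf8(&self.pending) {
                Ok(s) => {
                    out.push_str(s);
                    self.pending.clear();
                    return out;
                }
                Err(e) => e,
            };
            let valid = err.valid_up_to();
            // The prefix up to `valid` is guaranteed to be valid UTF-8.
            out.push_str(std::str::from_utf8(&self.pending[..valid]).unwrap());
            match err.error_len() {
                // Input ended mid-character: keep the tail for the next chunk.
                None => {
                    self.pending.drain(..valid);
                    return out;
                }
                // Genuinely invalid bytes: replace them and keep decoding.
                Some(n) => {
                    out.push('\u{FFFD}');
                    self.pending.drain(..valid + n);
                }
            }
        }
    }
}

fn main() {
    let bytes = "中文".as_bytes();
    let mut decoder = Utf8ChunkDecoder::default();
    // Split inside the second character, as a gRPC chunk boundary might.
    print!("{}", decoder.push(&bytes[..4]));
    println!("{}", decoder.push(&bytes[4..]));
}
```

Bytes that are invalid in the source output are still replaced, so pre-existing U+FFFD counts would be unchanged; only the boundary-induced replacements are avoided.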

Root cause analysis

Process stdout (valid UTF-8) 
  → gRPC chunk splits at arbitrary byte boundary (mid-character)
  → String::from_utf8_lossy replaces trailing incomplete bytes with U+FFFD
  → downstream receives corrupted text

Verified: same Go binary producing 130KB JSON output had 6 U+FFFD in DB (pre-existing) but 11 after passing through BoxLite serve — the extra 5 were introduced by from_utf8_lossy.
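
The failure mode can be reproduced in isolation. The sketch below is illustrative only: the function names and the two-chunk split are assumptions, not code from `portal/interfaces/exec.rs`. It compares decoding each chunk separately with forwarding raw bytes and decoding once at the end.

```rust
/// Decodes every chunk on its own, as the pre-fix path effectively did.
fn decode_per_chunk(chunks: &[&[u8]]) -> String {
    chunks
        .iter()
        .map(|c| String::from_utf8_lossy(c).into_owned())
        .collect()
}

/// Forwards raw bytes and decodes only after all chunks are joined.
fn decode_after_join(chunks: &[&[u8]]) -> String {
    let joined: Vec<u8> = chunks.concat();
    String::from_utf8_lossy(&joined).into_owned()
}

fn main() {
    let text = "中文"; // two characters, three bytes each in UTF-8
    // Split inside the second character to mimic an arbitrary chunk boundary.
    let (head, tail) = text.as_bytes().split_at(4);
    let chunks = [head, tail];

    let per_chunk = decode_per_chunk(&chunks);
    let joined = decode_after_join(&chunks);

    println!("per chunk:  {:?}", per_chunk);
    println!("after join: {:?}", joined);
    println!(
        "replacement characters introduced: {}",
        per_chunk.matches('\u{FFFD}').count()
    );
}
```

Decoding after the join returns the original text, while the per-chunk path loses the split character, which matches the extra U+FFFD characters observed above.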

Test plan

  • cargo build --release -p boxlite-cli passes
  • Verified: pinix invoke get-topic now returns exactly 6 U+FFFD (matching DB), down from 11

🤖 Generated with Claude Code

yan5xu and others added 3 commits March 9, 2026 19:32
Holds a single BoxliteRuntime and exposes box lifecycle + exec operations
over HTTP. Solves runtime lock contention when multiple callers need
concurrent access (e.g. Pinix Server executing nested clip-to-clip calls).

Endpoints (9, MVP):
- GET  /v1/config
- POST /v1/local/boxes (create)
- GET  /v1/local/boxes (list)
- GET  /v1/local/boxes/{id} (info)
- POST /v1/local/boxes/{id}/start
- POST /v1/local/boxes/{id}/stop
- POST /v1/local/boxes/{id}/exec (returns execution_id, supports stdin)
- GET  /v1/local/boxes/{id}/executions/{eid}/output (SSE stream)
- DELETE /v1/local/boxes/{id}

SSE exec protocol:
  event: stdout, data: {"data":"<base64>"}
  event: stderr, data: {"data":"<base64>"}
  event: exit, data: {"exit_code":0,"duration_ms":1234}
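
As a rough illustration of the stdout/stderr events above, the sketch below formats one event from raw bytes. It is not the code in serve.rs: `base64_encode` and `sse_output_event` are hypothetical names, the encoder is hand-rolled only to keep the example dependency-free, and the line framing (`event:` line, `data:` line, blank line) is assumed from the standard Server-Sent Events format.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 with `=` padding (hypothetical stand-in for a real crate).
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 {
            ALPHABET[(n >> 6) as usize & 63] as char
        } else {
            '='
        });
        out.push(if chunk.len() > 2 {
            ALPHABET[n as usize & 63] as char
        } else {
            '='
        });
    }
    out
}

/// Builds one SSE event for `stream` ("stdout" or "stderr") from raw bytes.
fn sse_output_event(stream: &str, bytes: &[u8]) -> String {
    format!(
        "event: {stream}\ndata: {{\"data\":\"{}\"}}\n\n",
        base64_encode(bytes)
    )
}

fn main() {
    // A chunk ending mid-character is encoded as-is; no text decoding occurs.
    let partial = &"中文".as_bytes()[..4];
    print!("{}", sse_output_event("stdout", partial));
}
```

Because the payload is base64 over bytes, a chunk that ends mid-character passes through unchanged and the client can reassemble the stream before decoding it as text.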

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Three fixes for BoxLite serve mode:

1. serve: add volumes field to CreateBoxRequest and pass to BoxOptions
   - Without this, virtiofs mounts were silently dropped and /clip
     was not visible inside containers

2. disk/ext4: quote host path in debugfs write command
   - Paths containing spaces (macOS "Application Support") caused
     debugfs to silently fail, leaving guest binary missing from rootfs

3. guest: mount devtmpfs at /dev for block device node auto-creation
   - Kernel sees block devices in /proc/partitions but /dev/vd* nodes
     were missing, preventing Container.Init from mounting disks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

String::from_utf8_lossy() was replacing incomplete UTF-8 sequences
at gRPC chunk boundaries with U+FFFD replacement characters.

Changed stdout/stderr channels from String to Vec<u8> throughout:
- portal/interfaces/exec.rs: route_output sends raw bytes
- litebox/exec.rs: ExecStdout/ExecStderr yield Vec<u8>
- serve.rs: base64-encodes raw bytes directly
- terminal/mod.rs: writes raw bytes to stdout/stderr

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>