A network-accessible research interface for LM Studio. Run it once on the machine hosting LM Studio; access it from any browser on your network. Designed for maximum observability — every token, timing metric, and raw API payload is visible in real time.
| Requirement | Version |
|---|---|
| Node.js | ≥ 18.0.0 (uses native fetch and ReadableStream) |
| LM Studio | Any version with the local server enabled |
| Browser | Any modern browser on the same network |
```bash
git clone <this-repo>
cd harness-this
npm install
npm start
```

The server starts on http://0.0.0.0:3000 and proxies to http://localhost:1234.

For development, run `npm run dev`.

All configuration is done via environment variables — no config files to edit.
| Variable | Default | Description |
|---|---|---|
| `PORT` | `3000` | Port the UI server listens on |
| `LM_HOST` | `localhost` | Hostname or IP of the machine running LM Studio |
| `LM_PORT` | `1234` | Port LM Studio's local server is bound to |
Examples:
```bash
# LM Studio running on a different machine
LM_HOST=192.168.1.42 npm start

# LM Studio on a non-default port
LM_PORT=8080 npm start

# Everything custom
PORT=8000 LM_HOST=192.168.1.42 LM_PORT=1234 npm start
```

Find this machine's local IP:
```bash
ipconfig getifaddr en0              # macOS (Wi-Fi)
ipconfig getifaddr en1              # macOS (Ethernet)
ip route get 1 | awk '{print $7}'   # Linux
```

Then open http://<that-ip>:3000 in any browser on the network.
The server binds to `0.0.0.0` (all interfaces) by default, so no extra firewall configuration is required on most setups.
- Open LM Studio
- Go to the Local Server tab (the `<->` icon in the left sidebar)
- Click Start Server — the default port is `1234`
- Load at least one model
The UI will show a green status dot and populate the model dropdown when it successfully connects.
┌──────────────────────────────────────────────────────────────────┐
│ ⊞ ◉ LM Research ● [Model dropdown] ⊕ New ↓ Export ⊡ │
├─────────────┬──────────────────────────────┬─────────────────────┤
│ PARAMETERS │ │ Console │Stats│Raw │
│ │ Chat messages │ │
│ Preset │ stream here │ Live server events │
│ Sliders │ │ broadcast via SSE │
│ System │ │ │
│ Prompt │──────────────────────────────│ │
│ │ [token count] [Markdown ✓] │ │
│ │ ┌──────────────────────┐ │ │
│ │ │ Type message here… │[→] │ │
└─────────────┴──────────────────────────────┴─────────────────────┘
| Control | Range | Description |
|---|---|---|
| System Prompt | text | Injected as the system role message on every request |
| Preset | Default / Creative / Precise / Code | One-click parameter bundles (see table below) |
| Temperature | 0 – 2 | Randomness of token sampling |
| Top P | 0 – 1 | Nucleus sampling cutoff |
| Top K | 0 – 200 | Limits vocabulary to top-K tokens at each step |
| Repeat Penalty | 0.5 – 2 | Penalises recently used tokens |
| Max Tokens | -1 – 32768 | Maximum completion length (-1 = model default) |
| Seed | -1 – 2147483647 | Fixed seed for reproducible outputs (-1 = random) |
| Stop Sequences | one per line | Generation stops when any sequence is matched |
Presets:
| Preset | Temp | Top P | Top K | Rep. Penalty | Best for |
|---|---|---|---|---|---|
| Default | 0.7 | 0.95 | 40 | 1.1 | General use |
| Creative | 1.2 | 0.98 | 80 | 1.05 | Brainstorming, fiction |
| Precise | 0.1 | 0.90 | 20 | 1.15 | Factual Q&A, structured output |
| Code | 0.15 | 0.95 | 20 | 1.2 | Code generation, debugging |
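These controls correspond to fields in the chat-completions request the proxy sends to LM Studio's OpenAI-compatible server. Below is a minimal sketch of such a request using the Default preset values; the `top_k` and `repeat_penalty` field names are assumptions based on LM Studio's extensions to the OpenAI schema, and the Raw tab shows the exact payload this UI actually sends:

```bash
# Illustrative sketch only; check the Raw tab for the real payload.
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-loaded-model",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello"}
    ],
    "temperature": 0.7,
    "top_p": 0.95,
    "top_k": 40,
    "repeat_penalty": 1.1,
    "max_tokens": -1,
    "seed": -1,
    "stream": true
  }'
```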
- Type in the input box and press Ctrl+Enter (or ⌘+Enter on Mac) to send, or click Send.
- Responses stream token-by-token with a blinking cursor.
- Each assistant message shows a stats footer: token count, generation speed, time to first token, total time, and finish reason.
- Copy button on each message copies the raw text to clipboard.
- Copy code button appears on every code block.
- ↺ Retry button re-runs the last assistant turn with current parameters (useful for seed/temperature experiments).
- Toggle Markdown in the toolbar to switch between rendered and raw text.
- Esc or the ■ Stop button aborts generation mid-stream.
Real-time feed of every server-side event, broadcast via Server-Sent Events:
| Badge | Color | Meaning |
|---|---|---|
| `REQ` | Blue | Outgoing request to LM Studio |
| `RES` | Green | Response received from LM Studio |
| `ERR` | Red | Any error (network, parse, upstream) |
| `SYS` | Purple | Server lifecycle events (client connect/disconnect) |
| `STAT` | Yellow | Per-generation summary line after each completion |
Click ✕ to clear the console log.
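Because the feed is plain Server-Sent Events, it can also be tailed from a terminal. The `/events` path below is a placeholder (the actual SSE route isn't documented here), so substitute whatever endpoint the server exposes:

```bash
# -N disables curl's output buffering so events print as they arrive.
# "/events" is a hypothetical path; check the server code for the real SSE route.
curl -N http://localhost:3000/events
```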
Detailed metrics for the last generation and session totals:
| Metric | Source | Notes |
|---|---|---|
| Prompt tokens | LM Studio usage field | Available when include_usage is supported |
| Completion tokens | LM Studio usage field | Falls back to stream chunk count |
| Tokens / sec | Calculated | completion_tokens ÷ (elapsed − TTFT) |
| Time to first token (TTFT) | Measured by proxy | Latency from request sent to first content chunk |
| Total time | Measured by proxy | Wall-clock time for the full generation |
| Finish reason | LM Studio | stop (clean end), length (hit max_tokens), other |
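For example, a completion of 300 tokens over 6.2 s of wall-clock time with a TTFT of 0.2 s reports 300 ÷ 6.0 = 50 tokens/sec.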
Full JSON of the last Request (sent to LM Studio) and Response (all streamed chunks) with syntax highlighting. Use this to inspect exactly what parameters were sent, verify stop sequences, or copy payloads for use in other tools.
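For example, a request payload copied from the Raw tab can be replayed against LM Studio directly, bypassing the UI (assuming LM Studio's default OpenAI-compatible endpoint and port):

```bash
# Save the Raw tab's Request JSON as payload.json, then replay it:
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @payload.json
```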
| Shortcut | Action |
|---|---|
| `Ctrl/⌘ + Enter` | Send message |
| `Esc` | Abort current generation |
| `Ctrl/⌘ + L` | Focus the input box |
Click ↓ Export in the header to download a JSON file containing:
- Model used
- System prompt
- All parameter values at time of export
- Full message history with per-message stats
- Session totals
The file is named `lm-research-<timestamp>.json`.
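The exact schema isn't documented here; as a rough illustration only, with every field name an assumption, the export has roughly this shape:

```json
{
  "model": "…",
  "systemPrompt": "…",
  "parameters": { "temperature": 0.7, "top_p": 0.95 },
  "messages": [
    { "role": "user", "content": "…" },
    { "role": "assistant", "content": "…", "stats": { "tokens": 120, "tokensPerSec": 42.5 } }
  ],
  "totals": { "messages": 2, "tokens": 160 }
}
```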
Red status dot / "Cannot reach LM Studio"
- Confirm LM Studio's local server is started (the `<->` tab, click Start Server)
- Check that `LM_HOST` and `LM_PORT` match the LM Studio server address
- Make sure no firewall is blocking port 1234 between the two machines (a quick check is sketched below)
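To isolate the problem, query LM Studio's models endpoint from the machine running this UI, using the same host and port you configured (the address below is an example):

```bash
# A JSON model list means LM Studio is reachable; a timeout or refusal
# points to the network, a firewall, or LM Studio not running.
curl http://192.168.1.42:1234/v1/models
```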
Model dropdown shows "No models loaded"
- LM Studio is reachable but no model is loaded — load a model in LM Studio first
Responses appear cut off
- Increase Max Tokens in the parameters panel, or click ↺ Retry after raising the limit
Generation is slow
- Watch the Tokens / sec value in the Stats tab — this reflects the model's throughput on your hardware, not the UI
- Lowering Top K (e.g., to 20) can slightly improve speed on some backends