Glyphbox

A NetHack harness that exposes a Python sandbox as the primary interface for LLMs to play NetHack. The LLM receives the full game screen each turn and responds with Python code that executes against a high-level game API.

How It Works

LLM receives game screen each turn
    │
    │  responds with Python code via execute_code tool call
    ▼
Python Sandbox (restricted execution environment)
    │
    │  code calls methods on `nh` (NetHackAPI)
    ▼
NetHackAPI (high-level game interface)
    │
    │  translates to gymnasium actions
    ▼
NetHack Learning Environment (NLE)
    NetHack 3.6.7 simulator

Each turn, the LLM sees the full 24x80 ASCII game screen, game messages, and results from its previous actions. It writes Python code that runs in a sandboxed environment with access to nh — a NetHackAPI instance that wraps all game interactions into clean method calls.

Setup

git clone <repo-url>
cd glyphbox
uv sync

Set an API key for your provider:

# OpenRouter (default provider)
export OPENROUTER_API_KEY="your-key"

Verify the setup:

uv run python -m src.cli verify

Usage

# Watch the agent play in a TUI
uv run python -m src.cli watch

# Use a specific model
uv run python -m src.cli watch --model anthropic/claude-3-haiku-20240307

# Record the session with asciinema
uv run python -m src.cli watch --record

# Play back a recording
asciinema play data/recordings/run_2026-01-19_19-34-49.cast

CLI Options

Flag	Description
`--model`, `-m`	Override the LLM model
`--record`, `-r`	Record session with asciinema (saves to `data/recordings/`)
`--config`, `-c`	Path to config file (default: `config/default.yaml`)
`--log-level`, `-l`	`DEBUG`, `INFO`, `WARNING`, `ERROR`

TUI Controls

Key	Action
`S`	Start the agent
`Space`	Pause / Resume
`Q`	Quit

The Sandbox

The LLM interacts with NetHack exclusively through an execute_code tool call. The submitted Python code runs in a restricted sandbox with these globals available:

Name	Type	Description
`nh`	`NetHackAPI`	Game interface — all actions and queries
`Direction`	`Enum`	`N`, `S`, `E`, `W`, `NE`, `NW`, `SE`, `SW`, `UP`, `DOWN`, `SELF`
`Position`	`dataclass`	`(x, y)` coordinates with `.distance_to()`, `.direction_to()`, `.adjacent()`
`PathResult`	`dataclass`	Pathfinding result with `.path` and `.reason`
`PathStopReason`	`Enum`	Why pathfinding stopped
`TargetResult`	`dataclass`	Target search result with `.position` and `.success`
`HungerState`	`Enum`	`SATIATED`, `NOT_HUNGRY`, `HUNGRY`, `WEAK`, `FAINTING`, `FAINTED`
`random`	module	Python `random` module

Standard builtins (print, len, range, min, max, sorted, list, dict, etc.) are available. Imports are forbidden — all needed types are pre-loaded.

Security

Code is validated via AST inspection before execution:

Forbidden imports: os, sys, subprocess, socket, http, pickle, threading, multiprocessing, etc.
Forbidden calls: exec, eval, compile, open, input, __import__
Forbidden attributes: __class__, __bases__, __dict__, __globals__, __code__
Timeout: signal-based SIGALRM kills runaway code

Example Code

What the LLM submits each turn:

# Check health and eat if hungry
if nh.is_hungry:
    food = nh.get_food()
    if food:
        nh.eat(food[0].slot)
    else:
        nh.pray()
elif nh.has_adjacent_hostile:
    for monster in nh.get_adjacent_hostiles():
        direction = nh.position.direction_to(monster.position)
        nh.attack(direction)
else:
    nh.autoexplore()

NetHackAPI Reference

All methods are synchronous. Actions consume game turns; queries do not.

Properties

Property	Type	Description
`nh.hp`	`int`	Current hit points
`nh.max_hp`	`int`	Maximum hit points
`nh.position`	`Position`	Current `(x, y)`
`nh.dungeon_level`	`int`	Current depth
`nh.turn`	`int`	Game turn counter
`nh.is_hungry`	`bool`	Hunger >= HUNGRY
`nh.is_weak`	`bool`	Hunger is WEAK or FAINTING
`nh.has_adjacent_hostile`	`bool`	Hostile monster in adjacent tile
`nh.turns_since_last_prayer`	`int`	Turns since last prayer
`nh.role`	`str`	Player class (e.g. `"Valkyrie"`)
`nh.is_done`	`bool`	Episode has ended

State Queries

nh.get_stats()               # Stats: hp, max_hp, pw, ac, xp_level, gold, hunger, etc.
nh.get_position()            # Position (x, y)
nh.get_screen()              # Full 24x80 ASCII screen as string
nh.get_screen_lines()        # List of 24 screen lines
nh.get_local_map(radius=7)   # Cropped view centered on player with coordinate guides

nh.get_message()             # Current top-of-screen message
nh.get_messages(n=10)        # Last n game messages

nh.get_current_level()       # DungeonLevel with parsed Tile grid
nh.get_tile(pos)             # Tile at position
nh.get_adjacent_tiles()      # Dict mapping "N"/"S"/"E"/etc. to tile descriptions

nh.get_visible_monsters()    # All visible monsters
nh.get_adjacent_hostiles()   # Hostile monsters in 8 adjacent tiles
nh.get_hostile_monsters()    # All non-peaceful monsters

nh.get_inventory()           # Inventory items (each has .slot, .name, .quantity)
nh.get_food()                # Food items in inventory
nh.get_weapons()             # Weapons in inventory
nh.get_items_here()          # Items on the ground at current position
nh.get_items_here_glyphs()   # Items at current position (glyph-based, no turn cost)
nh.get_items_on_map()        # All visible items on map with positions

nh.find_stairs()             # (stairs_up_pos, stairs_down_pos)
nh.find_doors()              # List of (Position, is_open)
nh.find_altars()             # List of altar positions
nh.find_nearest_item()       # TargetResult with .position and .success

Actions

All actions return ActionResult with .success, .messages, .turn_elapsed.

Movement:

nh.move(Direction.E)          # Move one tile east
nh.move(Direction.N, count=5) # Move up to 5 tiles north (auto-stops at walls/monsters)
nh.run(Direction.E)           # Run until wall/intersection/monster
nh.move_to(Position(42, 12))  # Pathfind and walk to position
nh.go_up()                    # Ascend stairs
nh.go_down()                  # Descend stairs

Combat:

nh.attack(Direction.N)        # Melee attack north
nh.kick(Direction.S)          # Kick south
nh.fire(Direction.E)          # Fire ranged weapon east
nh.throw('a', Direction.W)    # Throw inventory item 'a' west

Items:

nh.pickup()                   # Pick up all items here
nh.drop('a')                  # Drop item by inventory letter
nh.eat('a')                   # Eat from inventory
nh.eat()                      # Eat from ground
nh.quaff('a')                 # Drink potion
nh.read('a')                  # Read scroll/spellbook
nh.wield('a')                 # Wield weapon
nh.wear('a')                  # Wear armor
nh.take_off('a')              # Remove worn armor
nh.zap('a', Direction.N)      # Zap wand
nh.apply('a')                 # Use tool/key/horn

Doors and Stairs:

nh.open_door(Direction.E)     # Open door east
nh.close_door(Direction.W)    # Close door west

Special:

nh.pray()                     # Pray to deity (has cooldown)
nh.pay()                      # Pay shopkeeper
nh.engrave("Elbereth")        # Write on floor
nh.cast_spell('a', Direction.N)  # Cast spell
nh.look()                     # Look at current square
nh.send_keys("keys")          # Raw keystrokes (escape hatch)

Utility:

nh.wait(count=10)             # Rest for turns (auto-interrupts on attack)
nh.search(count=20)           # Search for secrets (auto-interrupts on danger)

Prompt responses (when the game asks [ynq] or [yn]):

nh.confirm()                  # Send 'y'
nh.deny()                     # Send 'n'
nh.escape()                   # Send ESC
nh.space()                    # Send space (dismiss --More--)

Navigation

nh.autoexplore(max_steps=500) # Explore automatically
# Returns AutoexploreResult with .stop_reason:
#   "fully_explored", "hostile", "low_hp", "hungry", "blocked"

nh.move_to(Position(42, 12))  # Pathfind to specific position
# Interrupts on: hunger change, HP < 30%

nh.find_unexplored()          # Find nearest unexplored tile
# Returns TargetResult with .position

Reminders and Notes

nh.add_reminder(turns=50, msg="Check corpse")  # One-time alert after N turns
nh.add_note(turns=0, msg="Wand has 5 charges") # Persistent note (0 = permanent)
nh.remove_note(note_id)                         # Remove a note
nh.get_active_notes()                            # List (id, message) tuples

LLM Context

Each turn, the agent receives a system prompt (once) and a decision prompt (every turn).

System prompt contains:

The full NetHackAPI reference documentation
Available tool descriptions
Strategic advice for NetHack play

Decision prompt contains:

=== CURRENT GAME VIEW ===
Your position: (40, 12)
{24x80 ASCII game screen}

Adjacent tiles:
  N: wall |
  S: floor .
  E: goblin d
  ...

Hostile Monsters:
  - goblin 'd' [E, adjacent]

Inventory:
  a: long sword (wielded)
  b: leather armor (worn)
  c: 3 food rations

Items on map:
  - gold coin at (38, 11)

Stairs:
  - Stairs down (>) at (35, 10)

Last Result:
game_messages:
  - You hit the goblin!
actions:
  - attack(E) ok

Study the map. What do you observe, and what action will you take?

The context is configurable. Sections like inventory, adjacent tiles, items on map, and hostile monsters can each be toggled on or off. Old turns are compressed to save tokens — maps are stripped from history and old tool call arguments are compacted.

Configuration

All settings live in config/default.yaml. Every setting can also be overridden via environment variables.

Agent

Setting	Default	Description
`agent.provider`	`"openrouter"`	LLM provider: `"openrouter"` or `"anthropic"`
`agent.model`	`"openai/gpt-5.2"`	Model identifier (OpenRouter or Anthropic format)
`agent.base_url`	`"https://openrouter.ai/api/v1"`	API endpoint
`agent.temperature`	`0.1`	Sampling temperature (0.0 = deterministic, 1.0 = creative)
`agent.reasoning`	`"high"`	Extended thinking effort: `none`, `minimal`, `low`, `medium`, `high`, `xhigh`
`agent.max_turns`	`100000`	Maximum turns per episode
`agent.max_consecutive_errors`	`5`	Errors before auto-quit
`agent.decision_timeout`	`60.0`	Seconds to wait for LLM response
`agent.skill_timeout`	`30.0`	Seconds to wait for skill execution
`agent.hp_flee_threshold`	`0.3`	Flee when HP drops below this fraction
`agent.skills_enabled`	`false`	Enable `write_skill`/`invoke_skill` tools
`agent.auto_save_skills`	`true`	Auto-save successful skills
`agent.log_decisions`	`true`	Log LLM decisions

Context Management

These settings control what the LLM sees each turn and how much history is retained.

Setting	Default	Description
`agent.max_history_turns`	`1000`	How many turns to keep. `0` = unlimited (compress old turns)
`agent.maps_in_history`	`0`	Recent turns that keep the full map. `0` = only current turn
`agent.tool_calls_in_history`	`10`	Recent turns that keep full tool call args. Older ones show `[compacted]`
`agent.show_inventory`	`true`	Include inventory in each turn's context
`agent.show_adjacent_tiles`	`true`	Show tile descriptions for all 8 directions
`agent.show_items_on_map`	`true`	List visible items with positions
`agent.local_map_mode`	`false`	Show cropped local view instead of full map
`agent.local_map_radius`	`7`	Radius of local view (7 = 15x15 area)

When local_map_mode is enabled, the LLM also gets a view_full_map tool to request the entire dungeon level when needed.

Environment

Setting	Default	Description
`environment.name`	`"NetHackChallenge-v0"`	NLE environment name
`environment.max_episode_steps`	`1000000`	NLE step limit per episode
`environment.render_mode`	`null`	`null`, `"human"`, or `"ansi"`
`environment.character`	`"random"`	`"random"` or specific like `"val-hum-law-fem"`

Skills

Setting	Default	Description
`skills.library_path`	`"./skills"`	Directory for saved skills
`skills.auto_save`	`true`	Auto-save successful skills
`skills.min_success_rate_to_save`	`0.3`	Minimum success rate to persist a skill
`skills.max_actions_per_skill`	`500`	Action limit per skill execution

Logging

Setting	Default	Description
`logging.level`	`"INFO"`	`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`
`logging.file`	`"./data/agent.log"`	Log file path
`logging.log_llm`	`true`	Log LLM prompts and responses
`logging.log_actions`	`true`	Log game actions

Environment Variables

Variable	Description
`OPENROUTER_API_KEY` (or `OPENROUTER_KEY`)	API key for OpenRouter
`ANTHROPIC_API_KEY`	API key for direct Anthropic access
`NETHACK_AGENT_MODEL`	Override model
`NETHACK_AGENT_PROVIDER`	Override provider
`NETHACK_AGENT_BASE_URL`	Override API endpoint
`NETHACK_AGENT_LOG_LEVEL`	Override log level

Project Structure

src/
├── cli.py                    # Entry point (watch, verify)
├── config.py                 # Configuration loading
├── agent/
│   ├── agent.py              # Main orchestration loop
│   ├── llm_client.py         # LLM API client (OpenRouter/Anthropic)
│   ├── parser.py             # LLM response parsing
│   └── prompts.py            # System and decision prompt templates
├── api/
│   ├── nethack_api.py        # High-level game API (the `nh` object)
│   ├── environment.py        # NLE gymnasium wrapper
│   ├── actions.py            # Action execution
│   ├── queries.py            # Map and state queries
│   ├── pathfinding.py        # A* pathfinding
│   ├── models.py             # Position, Stats, Monster, Item, Tile, etc.
│   ├── glyphs.py             # Glyph identification
│   └── knowledge.py          # Monster/item knowledge base
├── sandbox/
│   ├── manager.py            # Sandbox execution engine + API call tracking
│   ├── validation.py         # AST-based code security validation
│   └── exceptions.py         # Sandbox error types
├── memory/
│   ├── episode.py            # Episode-level memory coordination
│   ├── dungeon.py            # Dungeon level map tracking
│   ├── working.py            # Working memory (current session)
│   └── manager.py            # SQLite persistence
├── skills/
│   ├── executor.py           # Skill execution
│   ├── library.py            # Skill file management
│   ├── models.py             # Skill data structures
│   └── statistics.py         # Skill performance tracking
├── scoring/
│   └── progress.py           # Game progress scoring
└── tui/
    ├── app.py                # Textual TUI application
    ├── runner.py             # Agent runner for TUI
    ├── events.py             # Event system
    ├── logging.py            # TUI logging
    └── widgets/              # Stats bar, game screen, decision log, etc.

Development

# Run tests (unit tests only by default)
uv run pytest

# Run a specific test file
uv run pytest tests/test_nethack_api.py -v

# Run integration tests (requires API key)
uv run pytest -m integration

# Lint
uv run ruff check src/ tests/
uv run ruff format src/ tests/

Logs and Recordings

Logs are written to data/logs/ with filenames like run_2026-01-19_19-34-49.log. They include LLM requests/responses, game state, agent decisions, and API call results.

Recordings (when using --record) are saved to data/recordings/ as .cast files playable with asciinema play.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
alembic		alembic
config		config
docker		docker
frontend		frontend
images		images
scripts		scripts
skills		skills
src		src
tests		tests
.dockerignore		.dockerignore
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
Dockerfile		Dockerfile
Dockerfile.frontend		Dockerfile.frontend
README.md		README.md
SPEC.md		SPEC.md
alembic.ini		alembic.ini
dev.sh		dev.sh
docker-compose.prod.yml		docker-compose.prod.yml
docker-compose.yml		docker-compose.yml
kill_tui.sh		kill_tui.sh
nginx.conf		nginx.conf
pyproject.toml		pyproject.toml
run_tui.sh		run_tui.sh
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Glyphbox

How It Works

Setup

Usage

CLI Options

TUI Controls

The Sandbox

Security

Example Code

NetHackAPI Reference

Properties

State Queries

Actions

Navigation

Reminders and Notes

LLM Context

Configuration

Agent

Context Management

Environment

Skills

Logging

Environment Variables

Project Structure

Development

Logs and Recordings

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Glyphbox

How It Works

Setup

Usage

CLI Options

TUI Controls

The Sandbox

Security

Example Code

NetHackAPI Reference

Properties

State Queries

Actions

Navigation

Reminders and Notes

LLM Context

Configuration

Agent

Context Management

Environment

Skills

Logging

Environment Variables

Project Structure

Development

Logs and Recordings

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages