A NetHack harness that exposes a Python sandbox as the primary interface for LLMs to play NetHack. The LLM receives the full game screen each turn and responds with Python code that executes against a high-level game API.
LLM receives game screen each turn
│
│ responds with Python code via execute_code tool call
▼
Python Sandbox (restricted execution environment)
│
│ code calls methods on `nh` (NetHackAPI)
▼
NetHackAPI (high-level game interface)
│
│ translates to gymnasium actions
▼
NetHack Learning Environment (NLE)
NetHack 3.6.7 simulator
Each turn, the LLM sees the full 24x80 ASCII game screen, game messages, and results from its previous actions. It writes Python code that runs in a sandboxed environment with access to nh — a NetHackAPI instance that wraps all game interactions into clean method calls.
git clone <repo-url>
cd glyphbox
uv syncSet an API key for your provider:
# OpenRouter (default provider)
export OPENROUTER_API_KEY="your-key"Verify the setup:
uv run python -m src.cli verify# Watch the agent play in a TUI
uv run python -m src.cli watch
# Use a specific model
uv run python -m src.cli watch --model anthropic/claude-3-haiku-20240307
# Record the session with asciinema
uv run python -m src.cli watch --record
# Play back a recording
asciinema play data/recordings/run_2026-01-19_19-34-49.cast| Flag | Description |
|---|---|
--model, -m |
Override the LLM model |
--record, -r |
Record session with asciinema (saves to data/recordings/) |
--config, -c |
Path to config file (default: config/default.yaml) |
--log-level, -l |
DEBUG, INFO, WARNING, ERROR |
| Key | Action |
|---|---|
S |
Start the agent |
Space |
Pause / Resume |
Q |
Quit |
The LLM interacts with NetHack exclusively through an execute_code tool call. The submitted Python code runs in a restricted sandbox with these globals available:
| Name | Type | Description |
|---|---|---|
nh |
NetHackAPI |
Game interface — all actions and queries |
Direction |
Enum |
N, S, E, W, NE, NW, SE, SW, UP, DOWN, SELF |
Position |
dataclass |
(x, y) coordinates with .distance_to(), .direction_to(), .adjacent() |
PathResult |
dataclass |
Pathfinding result with .path and .reason |
PathStopReason |
Enum |
Why pathfinding stopped |
TargetResult |
dataclass |
Target search result with .position and .success |
HungerState |
Enum |
SATIATED, NOT_HUNGRY, HUNGRY, WEAK, FAINTING, FAINTED |
random |
module | Python random module |
Standard builtins (print, len, range, min, max, sorted, list, dict, etc.) are available. Imports are forbidden — all needed types are pre-loaded.
Code is validated via AST inspection before execution:
- Forbidden imports:
os,sys,subprocess,socket,http,pickle,threading,multiprocessing, etc. - Forbidden calls:
exec,eval,compile,open,input,__import__ - Forbidden attributes:
__class__,__bases__,__dict__,__globals__,__code__ - Timeout: signal-based
SIGALRMkills runaway code
What the LLM submits each turn:
# Check health and eat if hungry
if nh.is_hungry:
food = nh.get_food()
if food:
nh.eat(food[0].slot)
else:
nh.pray()
elif nh.has_adjacent_hostile:
for monster in nh.get_adjacent_hostiles():
direction = nh.position.direction_to(monster.position)
nh.attack(direction)
else:
nh.autoexplore()All methods are synchronous. Actions consume game turns; queries do not.
| Property | Type | Description |
|---|---|---|
nh.hp |
int |
Current hit points |
nh.max_hp |
int |
Maximum hit points |
nh.position |
Position |
Current (x, y) |
nh.dungeon_level |
int |
Current depth |
nh.turn |
int |
Game turn counter |
nh.is_hungry |
bool |
Hunger >= HUNGRY |
nh.is_weak |
bool |
Hunger is WEAK or FAINTING |
nh.has_adjacent_hostile |
bool |
Hostile monster in adjacent tile |
nh.turns_since_last_prayer |
int |
Turns since last prayer |
nh.role |
str |
Player class (e.g. "Valkyrie") |
nh.is_done |
bool |
Episode has ended |
nh.get_stats() # Stats: hp, max_hp, pw, ac, xp_level, gold, hunger, etc.
nh.get_position() # Position (x, y)
nh.get_screen() # Full 24x80 ASCII screen as string
nh.get_screen_lines() # List of 24 screen lines
nh.get_local_map(radius=7) # Cropped view centered on player with coordinate guides
nh.get_message() # Current top-of-screen message
nh.get_messages(n=10) # Last n game messages
nh.get_current_level() # DungeonLevel with parsed Tile grid
nh.get_tile(pos) # Tile at position
nh.get_adjacent_tiles() # Dict mapping "N"/"S"/"E"/etc. to tile descriptions
nh.get_visible_monsters() # All visible monsters
nh.get_adjacent_hostiles() # Hostile monsters in 8 adjacent tiles
nh.get_hostile_monsters() # All non-peaceful monsters
nh.get_inventory() # Inventory items (each has .slot, .name, .quantity)
nh.get_food() # Food items in inventory
nh.get_weapons() # Weapons in inventory
nh.get_items_here() # Items on the ground at current position
nh.get_items_here_glyphs() # Items at current position (glyph-based, no turn cost)
nh.get_items_on_map() # All visible items on map with positions
nh.find_stairs() # (stairs_up_pos, stairs_down_pos)
nh.find_doors() # List of (Position, is_open)
nh.find_altars() # List of altar positions
nh.find_nearest_item() # TargetResult with .position and .successAll actions return ActionResult with .success, .messages, .turn_elapsed.
Movement:
nh.move(Direction.E) # Move one tile east
nh.move(Direction.N, count=5) # Move up to 5 tiles north (auto-stops at walls/monsters)
nh.run(Direction.E) # Run until wall/intersection/monster
nh.move_to(Position(42, 12)) # Pathfind and walk to position
nh.go_up() # Ascend stairs
nh.go_down() # Descend stairsCombat:
nh.attack(Direction.N) # Melee attack north
nh.kick(Direction.S) # Kick south
nh.fire(Direction.E) # Fire ranged weapon east
nh.throw('a', Direction.W) # Throw inventory item 'a' westItems:
nh.pickup() # Pick up all items here
nh.drop('a') # Drop item by inventory letter
nh.eat('a') # Eat from inventory
nh.eat() # Eat from ground
nh.quaff('a') # Drink potion
nh.read('a') # Read scroll/spellbook
nh.wield('a') # Wield weapon
nh.wear('a') # Wear armor
nh.take_off('a') # Remove worn armor
nh.zap('a', Direction.N) # Zap wand
nh.apply('a') # Use tool/key/hornDoors and Stairs:
nh.open_door(Direction.E) # Open door east
nh.close_door(Direction.W) # Close door westSpecial:
nh.pray() # Pray to deity (has cooldown)
nh.pay() # Pay shopkeeper
nh.engrave("Elbereth") # Write on floor
nh.cast_spell('a', Direction.N) # Cast spell
nh.look() # Look at current square
nh.send_keys("keys") # Raw keystrokes (escape hatch)Utility:
nh.wait(count=10) # Rest for turns (auto-interrupts on attack)
nh.search(count=20) # Search for secrets (auto-interrupts on danger)Prompt responses (when the game asks [ynq] or [yn]):
nh.confirm() # Send 'y'
nh.deny() # Send 'n'
nh.escape() # Send ESC
nh.space() # Send space (dismiss --More--)nh.autoexplore(max_steps=500) # Explore automatically
# Returns AutoexploreResult with .stop_reason:
# "fully_explored", "hostile", "low_hp", "hungry", "blocked"
nh.move_to(Position(42, 12)) # Pathfind to specific position
# Interrupts on: hunger change, HP < 30%
nh.find_unexplored() # Find nearest unexplored tile
# Returns TargetResult with .positionnh.add_reminder(turns=50, msg="Check corpse") # One-time alert after N turns
nh.add_note(turns=0, msg="Wand has 5 charges") # Persistent note (0 = permanent)
nh.remove_note(note_id) # Remove a note
nh.get_active_notes() # List (id, message) tuplesEach turn, the agent receives a system prompt (once) and a decision prompt (every turn).
System prompt contains:
- The full NetHackAPI reference documentation
- Available tool descriptions
- Strategic advice for NetHack play
Decision prompt contains:
=== CURRENT GAME VIEW ===
Your position: (40, 12)
{24x80 ASCII game screen}
Adjacent tiles:
N: wall |
S: floor .
E: goblin d
...
Hostile Monsters:
- goblin 'd' [E, adjacent]
Inventory:
a: long sword (wielded)
b: leather armor (worn)
c: 3 food rations
Items on map:
- gold coin at (38, 11)
Stairs:
- Stairs down (>) at (35, 10)
Last Result:
game_messages:
- You hit the goblin!
actions:
- attack(E) ok
Study the map. What do you observe, and what action will you take?
The context is configurable. Sections like inventory, adjacent tiles, items on map, and hostile monsters can each be toggled on or off. Old turns are compressed to save tokens — maps are stripped from history and old tool call arguments are compacted.
All settings live in config/default.yaml. Every setting can also be overridden via environment variables.
| Setting | Default | Description |
|---|---|---|
agent.provider |
"openrouter" |
LLM provider: "openrouter" or "anthropic" |
agent.model |
"openai/gpt-5.2" |
Model identifier (OpenRouter or Anthropic format) |
agent.base_url |
"https://openrouter.ai/api/v1" |
API endpoint |
agent.temperature |
0.1 |
Sampling temperature (0.0 = deterministic, 1.0 = creative) |
agent.reasoning |
"high" |
Extended thinking effort: none, minimal, low, medium, high, xhigh |
agent.max_turns |
100000 |
Maximum turns per episode |
agent.max_consecutive_errors |
5 |
Errors before auto-quit |
agent.decision_timeout |
60.0 |
Seconds to wait for LLM response |
agent.skill_timeout |
30.0 |
Seconds to wait for skill execution |
agent.hp_flee_threshold |
0.3 |
Flee when HP drops below this fraction |
agent.skills_enabled |
false |
Enable write_skill/invoke_skill tools |
agent.auto_save_skills |
true |
Auto-save successful skills |
agent.log_decisions |
true |
Log LLM decisions |
These settings control what the LLM sees each turn and how much history is retained.
| Setting | Default | Description |
|---|---|---|
agent.max_history_turns |
1000 |
How many turns to keep. 0 = unlimited (compress old turns) |
agent.maps_in_history |
0 |
Recent turns that keep the full map. 0 = only current turn |
agent.tool_calls_in_history |
10 |
Recent turns that keep full tool call args. Older ones show [compacted] |
agent.show_inventory |
true |
Include inventory in each turn's context |
agent.show_adjacent_tiles |
true |
Show tile descriptions for all 8 directions |
agent.show_items_on_map |
true |
List visible items with positions |
agent.local_map_mode |
false |
Show cropped local view instead of full map |
agent.local_map_radius |
7 |
Radius of local view (7 = 15x15 area) |
When local_map_mode is enabled, the LLM also gets a view_full_map tool to request the entire dungeon level when needed.
| Setting | Default | Description |
|---|---|---|
environment.name |
"NetHackChallenge-v0" |
NLE environment name |
environment.max_episode_steps |
1000000 |
NLE step limit per episode |
environment.render_mode |
null |
null, "human", or "ansi" |
environment.character |
"random" |
"random" or specific like "val-hum-law-fem" |
| Setting | Default | Description |
|---|---|---|
skills.library_path |
"./skills" |
Directory for saved skills |
skills.auto_save |
true |
Auto-save successful skills |
skills.min_success_rate_to_save |
0.3 |
Minimum success rate to persist a skill |
skills.max_actions_per_skill |
500 |
Action limit per skill execution |
| Setting | Default | Description |
|---|---|---|
logging.level |
"INFO" |
DEBUG, INFO, WARNING, ERROR, CRITICAL |
logging.file |
"./data/agent.log" |
Log file path |
logging.log_llm |
true |
Log LLM prompts and responses |
logging.log_actions |
true |
Log game actions |
| Variable | Description |
|---|---|
OPENROUTER_API_KEY (or OPENROUTER_KEY) |
API key for OpenRouter |
ANTHROPIC_API_KEY |
API key for direct Anthropic access |
NETHACK_AGENT_MODEL |
Override model |
NETHACK_AGENT_PROVIDER |
Override provider |
NETHACK_AGENT_BASE_URL |
Override API endpoint |
NETHACK_AGENT_LOG_LEVEL |
Override log level |
src/
├── cli.py # Entry point (watch, verify)
├── config.py # Configuration loading
├── agent/
│ ├── agent.py # Main orchestration loop
│ ├── llm_client.py # LLM API client (OpenRouter/Anthropic)
│ ├── parser.py # LLM response parsing
│ └── prompts.py # System and decision prompt templates
├── api/
│ ├── nethack_api.py # High-level game API (the `nh` object)
│ ├── environment.py # NLE gymnasium wrapper
│ ├── actions.py # Action execution
│ ├── queries.py # Map and state queries
│ ├── pathfinding.py # A* pathfinding
│ ├── models.py # Position, Stats, Monster, Item, Tile, etc.
│ ├── glyphs.py # Glyph identification
│ └── knowledge.py # Monster/item knowledge base
├── sandbox/
│ ├── manager.py # Sandbox execution engine + API call tracking
│ ├── validation.py # AST-based code security validation
│ └── exceptions.py # Sandbox error types
├── memory/
│ ├── episode.py # Episode-level memory coordination
│ ├── dungeon.py # Dungeon level map tracking
│ ├── working.py # Working memory (current session)
│ └── manager.py # SQLite persistence
├── skills/
│ ├── executor.py # Skill execution
│ ├── library.py # Skill file management
│ ├── models.py # Skill data structures
│ └── statistics.py # Skill performance tracking
├── scoring/
│ └── progress.py # Game progress scoring
└── tui/
├── app.py # Textual TUI application
├── runner.py # Agent runner for TUI
├── events.py # Event system
├── logging.py # TUI logging
└── widgets/ # Stats bar, game screen, decision log, etc.
# Run tests (unit tests only by default)
uv run pytest
# Run a specific test file
uv run pytest tests/test_nethack_api.py -v
# Run integration tests (requires API key)
uv run pytest -m integration
# Lint
uv run ruff check src/ tests/
uv run ruff format src/ tests/Logs are written to data/logs/ with filenames like run_2026-01-19_19-34-49.log. They include LLM requests/responses, game state, agent decisions, and API call results.
Recordings (when using --record) are saved to data/recordings/ as .cast files playable with asciinema play.
MIT
