added 2 commits
March 3, 2026 19:58
Implements the kitty graphics protocol for inline image display:
Phase 0 — APC Routing:
- Vendored vte-graphics with APC state machine support
- Added apc_start/apc_put/apc_end to Perform and Handler traits
Phase 1 — Static Image Display:
- Key-value parser (kitty_parser.rs) for all protocol fields
- Chunked transfer accumulation (m=0/m=1)
- Decode pipeline: base64 → zlib → PNG/RGB/RGBA → GraphicData
- Image storage with 320 MiB quota and eviction
- Placement via existing insert_graphic() pipeline
- Protocol responses (OK/EINVAL) with quiet mode support
- Delete support (d=a/i/n, others stubbed)
- Source rectangle cropping, cursor movement control (C=1)
Modularised into kitty/{mod,state,decode,placement,response}.rs
for parallel development of Phase 2 features.
Validated end-to-end: timg -p kitty renders images correctly.
43 tests (unit + integration), 0 failures.
Clippy clean on alacritty_terminal and vte-graphics.
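The key-value parsing step in Phase 1 can be illustrated with a minimal sketch. It is not the code in kitty_parser.rs: the names `KittyCommand` and `parse_command` are hypothetical, it works on a complete command body rather than streaming, and it keeps every value as a raw string instead of typing each protocol field.

```rust
use std::collections::HashMap;

/// Parsed form of one kitty graphics command.
/// Hypothetical type for illustration, not the PR's data structure.
#[derive(Debug, PartialEq)]
pub struct KittyCommand {
    /// Control keys such as `a`, `f`, `i`, `m`, mapped to their raw values.
    pub keys: HashMap<char, String>,
    /// Everything after the first `;`, still base64-encoded.
    pub payload: Vec<u8>,
}

/// Parse the bytes found between `ESC _G` and `ESC \`.
pub fn parse_command(body: &[u8]) -> Result<KittyCommand, String> {
    // Control data and payload are separated by the first ';'.
    let (control, payload) = match body.iter().position(|&b| b == b';') {
        Some(i) => (&body[..i], &body[i + 1..]),
        None => (body, &body[body.len()..]),
    };
    let control = std::str::from_utf8(control).map_err(|e| e.to_string())?;

    let mut keys = HashMap::new();
    for pair in control.split(',').filter(|p| !p.is_empty()) {
        let (key, value) = pair
            .split_once('=')
            .ok_or_else(|| format!("missing '=' in {:?}", pair))?;
        // Every protocol key is a single character.
        let mut chars = key.chars();
        match (chars.next(), chars.next()) {
            (Some(c), None) => {
                keys.insert(c, value.to_string());
            }
            _ => return Err(format!("key must be one character: {:?}", key)),
        }
    }
    Ok(KittyCommand { keys, payload: payload.to_vec() })
}
```

For example, `parse_command(b"a=T,f=100,i=7,m=1;aGVsbG8=")` yields the four control keys plus the untouched base64 payload, which a chunk accumulator could then append while `m=1`.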
…es, scaling, animation
Phase 2 implements four parallel work streams on top of the Phase 0+1
baseline (static image display via direct base64 transmission).
Stream A — File & Shared Memory Transmission (decode.rs)
- Medium::File (t=f): read image data from base64-encoded file path
with offset (O=) and size (S=) support
- Medium::TempFile (t=t): relative to temp dir, auto-deleted after read,
path traversal validation rejects ../ and absolute paths
- Medium::SharedMemory (t=s): POSIX shm_open + mmap, unix-gated,
shm_unlink after read
- Per-chunk base64 decoding: handles clients like chafa that
independently encode each chunk (vs spec's split-one-string model)
- Lenient dimension check: truncates trailing bytes when slightly over
expected size (matches kitty's buf_used >= data_sz behavior)
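The path traversal check for t=t can be sketched as follows. This is an illustration under stated assumptions, not the code in decode.rs: `resolve_temp_path` is a hypothetical helper, and it only inspects path components, so it does not defend against symlinks inside the temp directory.

```rust
use std::path::{Component, Path, PathBuf};

/// Resolve a client-supplied temp-file name against `temp_dir`,
/// rejecting absolute paths and any `..` component.
/// Hypothetical helper for illustration only.
pub fn resolve_temp_path(temp_dir: &Path, requested: &str) -> Option<PathBuf> {
    let requested = Path::new(requested);
    if requested.as_os_str().is_empty() {
        return None;
    }
    for component in requested.components() {
        match component {
            // Plain names and a leading `.` stay inside `temp_dir`.
            Component::Normal(_) | Component::CurDir => {}
            // RootDir or Prefix means an absolute path; ParentDir is `..`.
            _ => return None,
        }
    }
    Some(temp_dir.join(requested))
}
```

Checking components rather than searching the string for "../" avoids false positives on names that merely contain two dots, such as `frame..png`.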
Stream B — Full Delete Modes (state.rs)
- KittyPlacement tracking: col, row, width, height, z_index per placement
- All 11 delete target pairs implemented: cursor (d=c/C),
placement ID (d=p/P), column (d=q/Q), row (d=r/R), cell (d=x/X),
cell+z (d=y/Y), z-index (d=z/Z), animation frames (d=f/F)
- IncludingScrollback variants clean up orphaned images from storage
- Cursor position passed through from term grid
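A rough sketch of how placement tracking and the IncludingScrollback cleanup could fit together. The types `Placement` and `Store` and the method `delete_where` are hypothetical stand-ins, not the structures in state.rs; the predicate stands for whichever delete target was requested.

```rust
use std::collections::{HashMap, HashSet};

/// One on-screen placement of an image (hypothetical, mirrors the tracked fields).
#[derive(Debug, Clone)]
pub struct Placement {
    pub image_id: u32,
    pub col: usize,
    pub row: usize,
    pub width: usize,
    pub height: usize,
    pub z_index: i32,
}

impl Placement {
    /// True if the placement's cell rectangle contains (col, row).
    pub fn covers_cell(&self, col: usize, row: usize) -> bool {
        col >= self.col
            && col < self.col + self.width
            && row >= self.row
            && row < self.row + self.height
    }
}

/// Image data plus placements (hypothetical container).
pub struct Store {
    pub images: HashMap<u32, Vec<u8>>,
    pub placements: Vec<Placement>,
}

impl Store {
    /// Remove every placement matching `pred` and return how many were removed.
    /// With `free_orphans` set (the uppercase delete variants), also drop the
    /// data of affected images that no longer have any placement.
    pub fn delete_where<F>(&mut self, pred: F, free_orphans: bool) -> usize
    where
        F: Fn(&Placement) -> bool,
    {
        let before = self.placements.len();
        let mut touched = HashSet::new();
        self.placements.retain(|p| {
            if pred(p) {
                touched.insert(p.image_id);
                false
            } else {
                true
            }
        });
        if free_orphans {
            let live: HashSet<u32> = self.placements.iter().map(|p| p.image_id).collect();
            for id in touched.difference(&live) {
                self.images.remove(id);
            }
        }
        before - self.placements.len()
    }
}
```

A cell delete would then be `store.delete_where(|p| p.covers_cell(col, row), uppercase)`, and a z-index delete `store.delete_where(|p| p.z_index == z, uppercase)`.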
Stream C — Scaling & Pixel Offsets (placement.rs)
- Cell-based scaling (c=/r=): bilinear resize via image crate
- Both set: exact target dimensions
- Only columns: proportional height
- Only rows: proportional width
- Sub-cell pixel offsets (X=/Y=): transparent padding approach,
RGB→RGBA promotion when padding needed
- Pipeline: crop → scale → offset → insert_graphic
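The three scaling cases above reduce to a small size computation. The sketch below is an assumption about how that arithmetic might look, not the code in placement.rs: `target_size` is a hypothetical name, a value of 0 stands for "key not given", and rounding to the nearest pixel is a choice made here.

```rust
/// Compute the pixel size an image should be resized to for c=/r= scaling.
/// `cols`/`rows` of 0 mean the key was not given. Hypothetical helper.
pub fn target_size(
    src_w: u32,
    src_h: u32,
    cols: u32,
    rows: u32,
    cell_w: u32,
    cell_h: u32,
) -> (u32, u32) {
    if src_w == 0 || src_h == 0 {
        return (src_w, src_h);
    }
    // value * num / den, rounded to nearest, never below one pixel.
    let scale = |value: u32, num: u32, den: u32| -> u32 {
        ((value as u64 * num as u64 + den as u64 / 2) / den as u64).max(1) as u32
    };
    match (cols, rows) {
        // Neither key: keep the source size.
        (0, 0) => (src_w, src_h),
        // Only columns: width is fixed, height keeps the aspect ratio.
        (c, 0) => {
            let w = c * cell_w;
            (w, scale(src_h, w, src_w))
        }
        // Only rows: height is fixed, width keeps the aspect ratio.
        (0, r) => {
            let h = r * cell_h;
            (scale(src_w, h, src_h), h)
        }
        // Both: exact target dimensions, aspect ratio may change.
        (c, r) => (c * cell_w, r * cell_h),
    }
}
```

With 10x20 pixel cells, a 200x100 image placed with only c=10 comes out as 100x50, and with c=4,r=3 as exactly 40x60.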
Stream D — Animation Frame Storage (animation.rs, new module)
- AnimationState/AnimationFrame data structures
- Lazy single→multi-frame promotion (WezTerm pattern)
- blit() with alpha compositing (source-over) and overwrite modes
- a=f handler: load_animation_frame with base frame copy, positional
blitting, frame editing by index
- a=c handler: compose_frames between frames in same animation
- a=a handler: control_animation (start/stop/loading, loop count)
- Wired into mod.rs dispatch (replaces stubs)
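The two blit modes can be shown with a small sketch over RGBA8 buffers. It assumes straight (non-premultiplied) alpha and tightly packed rows, which the commit message does not specify; `BlitMode`, `over`, and this `blit` signature are illustrative and not the API in animation.rs.

```rust
/// How source pixels are combined with the destination (hypothetical enum).
#[derive(Clone, Copy, PartialEq)]
pub enum BlitMode {
    AlphaBlend,
    Overwrite,
}

/// Source-over compositing of one straight-alpha RGBA8 pixel.
pub fn over(s: [u8; 4], d: [u8; 4]) -> [u8; 4] {
    let sa = s[3] as u32;
    // Destination alpha weighted by what the source leaves uncovered.
    let da_w = (d[3] as u32 * (255 - sa) + 127) / 255;
    let out_a = sa + da_w;
    if out_a == 0 {
        return [0; 4];
    }
    let mut out = [0u8; 4];
    for c in 0..3 {
        out[c] = ((s[c] as u32 * sa + d[c] as u32 * da_w + out_a / 2) / out_a) as u8;
    }
    out[3] = out_a as u8;
    out
}

/// Copy `src` into `dst` with its top-left corner at (x, y), clipping to `dst`.
pub fn blit(
    dst: &mut [u8],
    dst_w: usize,
    src: &[u8],
    src_w: usize,
    x: usize,
    y: usize,
    mode: BlitMode,
) {
    if dst_w == 0 || src_w == 0 {
        return;
    }
    let dst_h = dst.len() / (dst_w * 4);
    let src_h = src.len() / (src_w * 4);
    for sy in 0..src_h {
        let dy = y + sy;
        if dy >= dst_h {
            break;
        }
        for sx in 0..src_w {
            let dx = x + sx;
            if dx >= dst_w {
                break;
            }
            let si = (sy * src_w + sx) * 4;
            let di = (dy * dst_w + dx) * 4;
            let s = [src[si], src[si + 1], src[si + 2], src[si + 3]];
            let d = [dst[di], dst[di + 1], dst[di + 2], dst[di + 3]];
            let out = match mode {
                BlitMode::Overwrite => s,
                BlitMode::AlphaBlend => over(s, d),
            };
            dst[di..di + 4].copy_from_slice(&out);
        }
    }
}
```

Overwrite replaces the destination pixel including its alpha, while alpha blending keeps whatever shows through the source.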
Protocol Compatibility
- Response suppression when id=0 (matches kitty's finish_command_response)
- DecodePaddingMode::Indifferent for base64 (accepts padded/unpadded)
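The response rules (quiet mode plus suppression when no ID was given) can be sketched as one function. `format_response` is a hypothetical helper, not the code in response.rs; it assumes any image number has already been resolved to an image ID and only echoes `i=`.

```rust
/// Build the reply for a graphics command, or `None` if it must be suppressed.
/// `image_id` / `image_number` are the client's `i=` / `I=` values (0 if absent),
/// `quiet` is the `q=` value. Hypothetical helper for illustration.
pub fn format_response(
    image_id: u32,
    image_number: u32,
    quiet: u8,
    result: Result<(), &str>,
) -> Option<String> {
    // Without an ID or number the client cannot correlate a reply.
    if image_id == 0 && image_number == 0 {
        return None;
    }
    // q=1 suppresses OK replies, q=2 suppresses errors as well.
    let suppressed = match result {
        Ok(()) => quiet >= 1,
        Err(_) => quiet >= 2,
    };
    if suppressed {
        return None;
    }
    let message = match result {
        Ok(()) => "OK".to_string(),
        Err(reason) => format!("EINVAL:{}", reason),
    };
    // APC introducer `ESC _G`, control data, `;`, message, terminator `ESC \`.
    Some(format!("\x1b_Gi={};{}\x1b\\", image_id, message))
}
```

The returned string is what would be written back to the PTY; returning `Option` keeps the suppression decisions in one place instead of scattering them across call sites.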
Tests: 347 total (was 43), 0 failures, clippy clean
- 13 end-to-end integration tests (raw APC → VTE → Term → state verification)
- Unit tests across all streams (decode, state, placement, animation)
- Manual test script: 25 protocol-level tests with ✓/✗ feedback
- Real-world demo script: timg, chafa, mpv visual validation
with terminal mode reset after each tool invocation
🎉I can't believe you did something so amazing!!!🎉🥳
This PR adds kitty graphics protocol support to the graphics branch, building on top of the existing sixel infrastructure. The key insight driving the implementation is that GraphicData is protocol-agnostic: once kitty's APC payload is decoded into a pixel buffer, the entire GPU pipeline (texture upload, cell attachment, shader rendering) works identically to sixel. No renderer changes were needed.
Method
Analysed Kovid's spec, wezterm, ghostty, and alacritty-graphics to see what was needed. It turns out not a lot: mostly encoding/decoding machinery, which I generated (see docs).
Architecture
The new code lives entirely in alacritty_terminal. The rendering side (alacritty/src/display/) is untouched: kitty images flow through the same insert_graphic() → GraphicCell → texture upload → fragment shader path that sixel already uses.
What's implemented
APC routing: vte-graphics is patched (dsturnbull/vte@kitty) to dispatch apc_start/apc_put/apc_end, following the same pattern as the existing DCS hooks. The VTE state machine already recognised the APC_STRING state; it just wasn't wired to callbacks.
Key-value parser (kitty_parser.rs, ~800 LOC): Streaming parser for the ESC _G<key>=<value>[,...];<payload>ESC \ format. Handles all documented protocol fields including the overloaded animation keys (c=/r=/z= mean different things in frame context vs display context).
Transmission mediums (decode.rs, ~1100 LOC):
- t=d Direct: inline base64, with lenient padding handling for clients like chafa that independently encode each chunk
- t=f File: base64-encoded file path with offset/size support
- t=t Temp file: relative to $TMPDIR, auto-deleted after read, path traversal validation rejects ../ and absolute paths
- t=s Shared memory: POSIX shm_open + mmap, shm_unlink after read (unix-gated)
Decode pipeline: base64 → zlib (optional, o=z) → format dispatch (RGB f=24, RGBA f=32, PNG f=100) → GraphicData. Chunked transfers (m=0/m=1) decode each chunk's base64 on arrival rather than concatenating base64 strings, which handles the chafa per-chunk-padding case correctly.
Image storage (state.rs, ~1060 LOC): HashMap<u32, KittyImage> with 320 MiB quota and LRU eviction (matching kitty/ghostty/wezterm). Image number (I=) → image ID (i=) mapping. All 11 delete target pairs: d=a/A, d=i/I, d=n/N, d=c/C, d=q/Q, d=r/R, d=x/X, d=y/Y, d=z/Z, d=p/P, d=f/F. The IncludingScrollback (uppercase) variants clean up orphaned images from storage.
Placement (placement.rs, ~600 LOC): Source rectangle cropping (x=/y=/w=/h=), cell-based scaling (c=/r= via bilinear resize), sub-cell pixel offsets (X=/Y= via transparent padding), cursor movement control (C=1). Pipeline: crop → scale → offset → insert_graphic.
Animation (animation.rs, ~1120 LOC): Frame storage with lazy single→multi-frame promotion (following the WezTerm pattern). a=f loads frames with base-frame copy and positional blitting. a=c composites between frames (alpha blending and overwrite modes). a=a controls playback (start/stop/loading, loop count). Note: the animation tick loop (advancing frames on a timer and re-uploading textures) is not yet wired into the display event loop; frames are stored and composed but not auto-advanced.
Protocol responses (response.rs): ESC _Gi=<id>;OK ESC \ / ESC _Gi=<id>;EINVAL:<msg> ESC \ via Event::PtyWrite. Respects q=1 (suppress OK) and q=2 (suppress all). No response when i=0 and I=0 (matching kitty's finish_command_response behaviour).
Testing
184 #[test] functions across the modules, plus two scripts:
- scripts/test_kitty_graphics.py: 25 manual protocol-level tests with visual ✓/✗ feedback (query, PNG, chunked, file/tempfile/shm mediums, delete modes, scaling, pixel offsets, animation frames)
- scripts/demo_kitty_graphics.sh: real-world validation with timg, chafa, and mpv
What works end-to-end
- timg -p kitty image.png ✓
- timg -p kitty animation.gif ✓
- chafa -f kitty image.png ✓
- kitty +kitten icat image.png ✓ (chunked PNG transfer)
New dependencies
- flate2 1.1: zlib decompression (o=z)
- png 0.18: PNG decoding (f=100), also fixes a breaking API change in the window icon loader
- image 0.25 (minimal, png+rayon features): c=/r= cell-based resize
- tempfile 3 (dev)
Segfaults with mpv video playback: looking for input
mpv --vo=kitty and timg -V (video mode) send images at 30+ fps, effectively stress-testing the graphics pipeline. Under this load, both tools reliably trigger a segfault in mpv. n.b. this also happens in ghostty!