Distilled conversation backup -- secondary readable layer (Alpha #1) #12

@djdarcy

Epic

One of four alpha-release criteria. This issue tracks the "distilled backup" feature: extracting just user messages and assistant responses from each session JSONL into a compact, human-readable format that serves as a secondary backup layer alongside the full git-tracked JSONLs.

Why

The full JSONL is the authoritative record but it's verbose (tool calls, tool results, system metadata, progress events). Reading a JSONL with a text editor is painful. A distilled copy is:

  1. Readable -- you can open it in any editor and scan the conversation
  2. Smaller -- 10% of the original size for text-heavy sessions, much less for tool-heavy ones
  3. Greppable -- plain text or markdown that grep / ripgrep can search without JSONL parsing
  4. Damage-tolerant -- if the original JSONL is corrupted or partially lost, the distilled copy still has the conversation

This is similar to what claude-vault does with its SQLite FTS5 archive, but:

Approach

Two implementation paths, not mutually exclusive:

Path A: csb-side distillation (historical sessions)

Add csb distill <id> (and csb distill --all), which parses a session JSONL, extracts the user and assistant messages, and writes markdown to ~/.claude/distilled/<project>/<session-id>.md:

# Session: EXT-TOOL__CLAUDE-VAULT
**ID:** 916441e6-afca-466d-b00b-94801d090ef5
**Started:** 2026-03-23 18:14
**Last active:** 2026-04-10 21:30

---

## [user] 2026-03-23 18:14
Hey Claude I just git cloned claude-vault...

## [assistant] 2026-03-23 18:14
I'll gather information about this project...

## [user] 2026-03-23 18:20
Can you check if...

This could be wired into csb backup as an always-on step (after the git commit, emit distilled copies for any newly scanned or updated sessions), or made opt-in via config.
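A minimal sketch of what the Path A extraction might look like. It is not csb's actual code: the function names (block_text, distill_lines, render_markdown) are hypothetical, and the record layout it assumes (a top-level "type" of "user" or "assistant", an ISO "timestamp", and a "message" whose "content" is either a string or a list of typed blocks) should be checked against real session files. Only text blocks are kept here, and lines that fail to parse are skipped so a damaged JSONL still yields whatever conversation remains.

```python
import json


def block_text(content):
    """Return only the plain text of a message body (string or block list)."""
    if isinstance(content, str):
        return content.strip()
    if not isinstance(content, list):
        return ""
    parts = [b.get("text", "") for b in content
             if isinstance(b, dict) and b.get("type") == "text"]
    return "\n\n".join(p.strip() for p in parts if p.strip())


def distill_lines(lines):
    """Yield (role, timestamp, text) for each user/assistant record."""
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # corrupt or truncated line: skip it, keep going
        if not isinstance(rec, dict):
            continue
        if rec.get("type") not in ("user", "assistant"):
            continue  # progress events, system metadata, etc.
        message = rec.get("message")
        if not isinstance(message, dict):
            continue
        text = block_text(message.get("content"))
        if text:  # drops records that carry only tool calls or tool results
            yield rec["type"], str(rec.get("timestamp") or ""), text


def render_markdown(title, session_id, lines):
    """Render the distilled conversation in the layout shown above."""
    def fmt(ts):
        # "2026-03-23T18:14:02.000Z" -> "2026-03-23 18:14"
        return ts[:16].replace("T", " ")

    msgs = list(distill_lines(lines))
    out = [f"# Session: {title}", f"**ID:** {session_id}"]
    if msgs:
        out.append(f"**Started:** {fmt(msgs[0][1])}")
        out.append(f"**Last active:** {fmt(msgs[-1][1])}")
    out += ["", "---", ""]
    for role, ts, text in msgs:
        out += [f"## [{role}] {fmt(ts)}", text, ""]
    return "\n".join(out)
```

Passing an open file object as lines works, since iterating a file yields one JSONL record per line. Writing the result to the distilled directory and choosing the title are left out on purpose.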

Path B: session-logger-side distillation (new sessions)

Extend claude-session-logger with new log channels: user and assistant. Every time a user sends a message or Claude responds, append to the session's distilled log alongside the existing sesslog_*, shell_*, and tasks_* channels. This gives us distillation for free on new sessions without waiting for backup to run.

Session-logger already has the channel routing infrastructure (ChannelConfig, RoutingConfig in log-command.py). Adding two new channels is additive and shouldn't break existing users.
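To make the Path B idea concrete, here is a rough sketch of the append step a user or assistant channel could perform on each message. It deliberately does not use ChannelConfig or RoutingConfig, whose signatures are not described in this issue; append_message and its parameters are hypothetical names, and the entry layout simply mirrors the Path A markdown.

```python
from datetime import datetime
from pathlib import Path


def append_message(log_path, role, text, when=None):
    """Append one message to a distilled log as a '## [role] time' entry."""
    if role not in ("user", "assistant"):
        raise ValueError(f"unexpected role: {role!r}")
    when = when or datetime.now()
    entry = f"## [{role}] {when:%Y-%m-%d %H:%M}\n{text.strip()}\n\n"
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode: earlier entries are never rewritten, so an interrupted
    # session still leaves everything logged up to that point on disk.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(entry)
    return entry
```

Because each call only appends, the distilled log grows in step with the conversation and never depends on a later backup run.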

Recommendation: Do both. Path A handles the ~116 historical sessions; Path B handles everything going forward. csb and session-logger are complementary here.

Acceptance criteria

  • csb distill <id> extracts user+assistant text from a JSONL and writes to ~/.claude/distilled/<project>/<session-id>.md
  • csb distill --all runs across all indexed sessions
  • csb backup optionally triggers distillation (config flag distill_on_backup: true)
  • Distilled files are git-tracked (part of the noise commit)
  • Format: markdown with role headers, timestamps, and message bodies
  • Handles multi-block assistant messages (text + thinking + tool_use)
  • Preserves user message formatting (code blocks, etc.) where possible
  • Tests: round-trip (distill -> verify key messages present)
  • Session-logger PR/issue filed for the user/assistant channels addition
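The multi-block criterion leaves open what to do with thinking and tool_use blocks. One possible policy, sketched below under assumptions: keep text blocks verbatim (so code blocks in a message survive), optionally include thinking as a blockquote, and reduce each tool call to a one-line marker. The function name flatten_blocks, its flags, and the "[tool call: ...]" marker are all hypothetical; the block shapes assumed are "text" with a "text" field, "thinking" with a "thinking" field, and "tool_use" with a "name" field.

```python
def flatten_blocks(blocks, include_thinking=False, note_tools=True):
    """Flatten a list of assistant content blocks into one markdown body."""
    parts = []
    for block in blocks:
        kind = block.get("type")
        if kind == "text":
            # Trim only surrounding newlines so inner formatting
            # (indentation, fenced code) is preserved as written.
            parts.append(block.get("text", "").strip("\n"))
        elif kind == "thinking" and include_thinking:
            lines = block.get("thinking", "").splitlines()
            parts.append("\n".join("> " + ln for ln in lines))
        elif kind == "tool_use" and note_tools:
            # Keep a marker that a tool ran, but not its verbose input.
            parts.append(f"[tool call: {block.get('name', 'unknown')}]")
        # Any other block type is dropped.
    return "\n\n".join(p for p in parts if p)
```

With the defaults, a message made of a thinking block, a text block, and a Read tool call distills to the text followed by a single "[tool call: Read]" line. Setting note_tools=False gives the text-only behaviour described in the Epic section.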

Labels: alpha-blocker, enhancement, epic