csb backup: ensure jsonl transcripts are actively committed to git (not just indexed)
Problem
CSB scans and indexes .jsonl transcript files from ~/.claude/projects/ into SQLite (session-backup.db), extracting metadata such as timestamps, tool call counts, and working directories. However, it is unclear whether the .jsonl files themselves are being committed to git as part of the backup.
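The sketch below shows roughly what this indexing step keeps. It is a minimal sketch, not CSB's implementation, and it rests on several assumptions. The sessions table and its columns are hypothetical. Each transcript line is assumed to be a JSON object with timestamp, cwd, and message.content fields, with tool calls appearing as content items whose type is "tool_use". CSB's real schema and parser may differ. What it illustrates is that the index holds only derived metadata. Unless the .jsonl file itself is in git, nothing of the conversation is recoverable.

```python
import json
import sqlite3
from pathlib import Path
from typing import Iterator, Optional


def iter_records(path: Path) -> Iterator[dict]:
    """Stream a .jsonl file one line at a time instead of loading it whole."""
    with path.open("r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate a partially written last line
            if isinstance(record, dict):
                yield record


def extract_metadata(path: Path) -> dict:
    """Collect first/last timestamp, tool-call count and cwd (field names are assumed)."""
    first_ts: Optional[str] = None
    last_ts: Optional[str] = None
    cwd: Optional[str] = None
    tool_calls = 0
    for rec in iter_records(path):
        ts = rec.get("timestamp")
        if ts:
            first_ts = first_ts or ts
            last_ts = ts
        cwd = cwd or rec.get("cwd")
        msg = rec.get("message")
        content = msg.get("content") if isinstance(msg, dict) else None
        if isinstance(content, list):
            tool_calls += sum(
                1 for item in content
                if isinstance(item, dict) and item.get("type") == "tool_use"
            )
    return {"path": str(path), "first_ts": first_ts, "last_ts": last_ts,
            "tool_calls": tool_calls, "cwd": cwd}


def index_session(db: sqlite3.Connection, path: Path) -> None:
    """Upsert one session's metadata (hypothetical schema); no transcript text is stored."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS sessions (path TEXT PRIMARY KEY, first_ts TEXT, "
        "last_ts TEXT, tool_calls INTEGER, cwd TEXT)"
    )
    db.execute(
        "INSERT OR REPLACE INTO sessions VALUES (:path, :first_ts, :last_ts, :tool_calls, :cwd)",
        extract_metadata(path),
    )
    db.commit()
```

Once the .jsonl is purged from disk, a row in sessions only tells you that the session existed. It cannot bring the session back.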
The original .gitignore in ~/.claude/ was blocking .jsonl files from being tracked. CSB classifies projects/ as a NOISE_DIR and stages it as part of noise commits. However, if the .gitignore excluded these files, git add would silently skip them. The SQLite index would then hold metadata about sessions whose actual content was never preserved in git.
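One way to reason about the silent skip is a simplified model of gitignore matching. The helper below is hypothetical and not part of CSB. It handles only plain, negated (!), anchored (leading or embedded /) and directory-only (trailing /) patterns. It does reproduce the git rule that matters most here: a file cannot be re-included with ! once one of its parent directories is excluded. It does not implement ** or other edge cases. For an authoritative answer on the real repository, run git check-ignore -v on a transcript path. That command reports the .gitignore line responsible.

```python
import fnmatch
from pathlib import PurePosixPath
from typing import List, Optional, Tuple

Rule = Tuple[bool, bool, bool, str]  # (negate, dir_only, anchored, glob)


def parse_gitignore(lines: List[str]) -> List[Rule]:
    """Parse a simplified subset of .gitignore syntax."""
    rules: List[Rule] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        negate = line.startswith("!")
        if negate:
            line = line[1:]
        dir_only = line.endswith("/")
        line = line.rstrip("/")
        anchored = "/" in line  # a slash anywhere but the end anchors the pattern
        rules.append((negate, dir_only, anchored, line.lstrip("/")))
    return rules


def _last_match(rel: str, is_dir: bool, rules: List[Rule]) -> Optional[bool]:
    """True/False from the last matching rule, or None if no rule matched."""
    result: Optional[bool] = None
    for negate, dir_only, anchored, glob in rules:
        if dir_only and not is_dir:
            continue
        target = rel if anchored else PurePosixPath(rel).name
        if fnmatch.fnmatchcase(target, glob):
            result = not negate
    return result


def is_ignored(rel_path: str, gitignore_lines: List[str]) -> bool:
    """Approximate whether git add would skip rel_path (relative to ~/.claude/)."""
    rules = parse_gitignore(gitignore_lines)
    parts = PurePosixPath(rel_path).parts
    for i in range(1, len(parts)):
        if _last_match("/".join(parts[:i]), True, rules):
            return True  # excluded parent directory: git never looks inside it
    return bool(_last_match(rel_path, False, rules))
```

Under this model, either a *.jsonl line or a projects/ line is enough to hide every transcript. Appending !*.jsonl after projects/ would not fix it. The projects/ entry itself has to go.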
This was discovered during a recovery effort where:
- 654 .jsonl files existed in manual filesystem backups but not in the current ~/.claude/projects/
- Claude Code had purged them from projects/ over time
- csb restore couldn't recover them because they were never committed to git
- The only surviving copies were in manual cp -r backups (.claude_ORIG, .claude - Copy, etc.)
What needs to be verified and fixed
- Audit the current .gitignore: Confirm that *.jsonl and projects/ are NOT gitignored. If they are, update the .gitignore so CSB's noise commits actually capture them.
- Verify that git add in noise commits captures .jsonl files: The two-commit model (noise + user) relies on git add for NOISE_DIRS. If projects/ is in NOISE_DIRS, its .jsonl contents should be staged. Confirm that this happens in practice.
- Handle large .jsonl files: Some transcripts are 10MB+. Options:
  - Git LFS for files above a threshold
  - Compression before commit (gzip the jsonl)
  - Accept the repo size growth (simplest, since these are append-only logs)
- Track history.jsonl: The global ~/.claude/history.jsonl (3.7MB+, growing) and ~/.claude/session-env are classified as NOISE_FILES in CSB's git_ops.py, but it still needs to be verified that they are actually committed.
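The checks above can be scripted as a small audit. This is a sketch under stated assumptions, not CSB code. It assumes ~/.claude/ is the git repository root. NOISE_FILES is a hypothetical mirror of the list in CSB's git_ops.py and must be kept in sync by hand. LARGE_FILE_BYTES is a hypothetical threshold taken from the "10MB+" figure. The sketch asks git ls-tree for the HEAD tree rather than git ls-files. ls-files also lists staged-but-uncommitted entries, whereas the question here is what was actually committed.

```python
import subprocess
from pathlib import Path
from typing import Dict, List, Set

# Hypothetical values: mirror NOISE_FILES from CSB's git_ops.py and pick a size policy.
NOISE_FILES = ("history.jsonl", "session-env")
LARGE_FILE_BYTES = 10 * 1024 * 1024  # candidates for LFS or compression


def committed_paths(repo: Path) -> Set[str]:
    """Every path in the HEAD commit (NUL-separated so unusual names survive)."""
    out = subprocess.run(
        ["git", "-C", str(repo), "ls-tree", "-r", "--name-only", "-z", "HEAD"],
        check=True, capture_output=True,
    ).stdout
    return {p for p in out.decode("utf-8").split("\0") if p}


def _is_committed(name: str, committed: Set[str]) -> bool:
    # session-env may be a directory, in which case git lists its files, not the dir.
    return name in committed or any(p.startswith(name + "/") for p in committed)


def audit(repo: Path, committed: Set[str]) -> Dict[str, List[str]]:
    """Compare what is on disk under repo with what the last commit contains."""
    projects = repo / "projects"
    on_disk: List[str] = []
    if projects.is_dir():
        on_disk = sorted(p.relative_to(repo).as_posix() for p in projects.rglob("*.jsonl"))
    return {
        "uncommitted_jsonl": [p for p in on_disk if p not in committed],
        "uncommitted_noise_files": [f for f in NOISE_FILES if not _is_committed(f, committed)],
        "large_jsonl": [p for p in on_disk if (repo / p).stat().st_size >= LARGE_FILE_BYTES],
    }
```

After a csb backup run, audit(Path.home() / ".claude", committed_paths(Path.home() / ".claude")) should return empty uncommitted_* lists. A non-empty uncommitted_jsonl list points back at the .gitignore audit. The large_jsonl list sizes whichever large-file option is chosen.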
What's working
- CSB's metadata extraction from .jsonl files works well (streaming parser, handles 100MB+ files)
- The two-commit model (noise vs. user) correctly separates concerns
- Session deletion detection via git history works
- csb restore successfully recovers sessions that ARE in git history
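Deletion detection and restore both depend on the transcript having been committed at least once. The sketch below illustrates the mechanics with plain git commands. It is not CSB's implementation. The "COMMIT " prefix is a hypothetical marker, chosen so that commit lines cannot be mistaken for paths under projects/. git log --diff-filter=D --name-only lists commits that deleted files, newest first. git show <commit>^:<path> prints a file as it was in the parent of the deleting commit. Paths containing unusual characters may be quoted by git log and are not handled here.

```python
import subprocess
from pathlib import Path
from typing import Dict, List, Optional

MARKER = "COMMIT "  # hypothetical prefix so commit lines can't be confused with paths


def deletion_log(repo: Path) -> str:
    """Newest-first list of commits that deleted files under projects/."""
    return subprocess.run(
        ["git", "-C", str(repo), "log", "--diff-filter=D", "--name-only",
         f"--pretty=format:{MARKER}%H", "--", "projects/"],
        check=True, capture_output=True, text=True,
    ).stdout


def deleted_sessions(log_text: str) -> Dict[str, str]:
    """Map each deleted .jsonl path to the most recent commit that deleted it."""
    deleted: Dict[str, str] = {}
    commit: Optional[str] = None
    for line in log_text.splitlines():
        if line.startswith(MARKER):
            commit = line[len(MARKER):].strip()
        elif commit and line.strip().endswith(".jsonl"):
            deleted.setdefault(line.strip(), commit)  # first hit is the newest deletion
    return deleted


def restore_args(path: str, deleting_commit: str) -> List[str]:
    """Command that prints the file as it was just before it was deleted."""
    return ["git", "show", f"{deleting_commit}^:{path}"]
```

A session that never reached a commit produces no deletion entry at all. That is exactly the failure mode behind the 654 unrecoverable transcripts.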
Acceptance criteria
- .jsonl files in ~/.claude/projects/ are confirmed to be git-tracked (not gitignored)
- csb backup noise commits include actual .jsonl file content, not just metadata
- history.jsonl is committed as part of noise commits
- session-env is committed as part of noise commits
- csb restore can recover any session that csb backup has committed
Related issues