…sub-feature A of 3)
Adds canonical CSL-JSON 1.0.2 as a second `--format` for `bib snapshot`
and routes `bib verify` through the sidecar's `format` discriminant.
BibTeX path stays untouched; existing tests pass without modification.
Field mapping (canonical CSL 1.0.2, schema-validated in tests):
- `bibtex_key` → `id` (falls back to `generate_key(paper)` then `paper.id`)
- always emits `type: "article-journal"` (data model has no paper_type)
- `title` → `title`
- `authors[i]` → `{ family, given }` parsed on first comma; family-only fallback
- `year` → `issued.date-parts[0][0]` (number)
- `journal` → `container-title` (canonical name; not `journal`)
- `doi` → `DOI` (canonical uppercase)
- `url` → `URL`
- `abstract` → `abstract`
- paper-tags → `keyword` (single comma-joined string per spec, NOT an array)
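The two non-trivial rows of the mapping, author parsing and tag joining, can be sketched as follows. This is a minimal illustration, not the code in `csl_json.rs`: `CslName`, `parse_author`, and `join_keywords` are hypothetical names, and the `", "` separator is an assumption (the message only says "comma-joined").

```rust
/// A CSL name object; `given` is left out when the source has no comma.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct CslName {
    pub family: String,
    pub given: Option<String>,
}

/// Split on the FIRST comma ("Last, First"); without a comma the whole
/// string becomes the family name (the family-only fallback).
pub fn parse_author(raw: &str) -> CslName {
    match raw.split_once(',') {
        Some((family, given)) => {
            let given = given.trim();
            CslName {
                family: family.trim().to_string(),
                given: if given.is_empty() { None } else { Some(given.to_string()) },
            }
        }
        None => CslName { family: raw.trim().to_string(), given: None },
    }
}

/// CSL `keyword` is a single string, so tags are joined rather than
/// emitted as an array. An empty tag list yields `None` so the key can
/// be omitted. The ", " separator is an assumption.
pub fn join_keywords(tags: &[&str]) -> Option<String> {
    if tags.is_empty() { None } else { Some(tags.join(", ")) }
}
```

Under these assumptions, `parse_author("Lovelace, Ada")` yields family `Lovelace` with given `Ada`, while `parse_author("Plato")` yields a family-only name.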
Sidecar gains `format: "csl-json"` discriminant alongside the existing
`"bibtex"`. Schema unchanged; new constructor `BibLockfile::new_csl_json`
mirrors `new_bibtex`. Verify reads the sidecar's `format` to pick the
matching emitter, so JSON drifts compare against JSON regenerates.
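The routing described above might look roughly like the sketch below. It assumes verify compares regenerated bytes against the file on disk; the real comparison may go through hashes in the sidecar. `regenerate_for` and `has_drift` are hypothetical names, and the emitters are passed in as closures to keep the example self-contained.

```rust
/// Pick the emitter that matches the sidecar's `format` discriminant.
/// Only the two values named in the commit message are recognised.
pub fn regenerate_for(
    sidecar_format: &str,
    emit_bibtex: impl Fn() -> String,
    emit_csl_json: impl Fn() -> String,
) -> Result<String, String> {
    match sidecar_format {
        "bibtex" => Ok(emit_bibtex()),
        "csl-json" => Ok(emit_csl_json()),
        other => Err(format!("unknown sidecar format: {other}")),
    }
}

/// Drift means the regenerated output differs from what is on disk.
/// Because the emitter is chosen from the sidecar, JSON is only ever
/// compared against JSON, and BibTeX against BibTeX.
pub fn has_drift(
    sidecar_format: &str,
    on_disk: &str,
    emit_bibtex: impl Fn() -> String,
    emit_csl_json: impl Fn() -> String,
) -> Result<bool, String> {
    let regenerated = regenerate_for(sidecar_format, emit_bibtex, emit_csl_json)?;
    Ok(regenerated != on_disk)
}
```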
Determinism contract identical to BibTeX: sorted entries (by `id`),
fixed key order, omitted-not-nulled optional fields, LF endings, no
timestamps in the output.
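A stripped-down emitter that honours this contract could look like the sketch below. The `Entry` struct, the exact key order, and the whitespace layout are assumptions made for illustration; only the rules themselves (sort by `id`, fixed key order, omit absent fields, LF endings, no timestamps) come from the commit message.

```rust
/// Hypothetical, trimmed-down entry; only `id` and `type` are always emitted.
pub struct Entry {
    pub id: String,
    pub title: Option<String>,
    pub year: Option<i32>,
    pub doi: Option<String>,
}

/// Escape a string for JSON output (quotes, backslashes, control chars).
fn json_str(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Emit entries sorted by `id`, keys in one fixed order, absent optional
/// fields omitted (never `null`), LF line endings, and no timestamps.
pub fn emit(entries: &mut [Entry]) -> String {
    entries.sort_by(|a, b| a.id.cmp(&b.id));
    let mut items = Vec::new();
    for e in entries.iter() {
        let mut fields = vec![
            format!("\"id\": {}", json_str(&e.id)),
            "\"type\": \"article-journal\"".to_string(),
        ];
        if let Some(t) = &e.title {
            fields.push(format!("\"title\": {}", json_str(t)));
        }
        if let Some(y) = e.year {
            fields.push(format!("\"issued\": {{\"date-parts\": [[{y}]]}}"));
        }
        if let Some(d) = &e.doi {
            fields.push(format!("\"DOI\": {}", json_str(d)));
        }
        items.push(format!("  {{{}}}", fields.join(", ")));
    }
    format!("[\n{}\n]\n", items.join(",\n"))
}
```

Because nothing in the output depends on input order, wall-clock time, or map iteration order, calling `emit` twice on the same entries gives byte-identical strings, which is what lets verify compare a regenerate against the snapshot.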
CLI / MCP:
- `bib snapshot --format <bibtex|csl-json>`; default `--output` is
`paper.bib` for bibtex and `paper.json` for csl-json
- `bib verify` autoroutes on the sidecar's `format` (no flag needed)
- MCP `bib_snapshot` accepts optional `format` param
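The flag parsing and per-format default output can be sketched as below. `SnapshotFormat` and `resolve_output` are hypothetical names, and the real CLI presumably wires this through its argument parser rather than a hand-written `FromStr`.

```rust
use std::str::FromStr;

/// The two `--format` values; `bibtex` is the default.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub enum SnapshotFormat {
    #[default]
    Bibtex,
    CslJson,
}

impl FromStr for SnapshotFormat {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "bibtex" => Ok(Self::Bibtex),
            "csl-json" => Ok(Self::CslJson),
            other => Err(format!("unsupported format: {other} (expected bibtex|csl-json)")),
        }
    }
}

impl SnapshotFormat {
    /// Default `--output` file name for each format.
    pub fn default_output(self) -> &'static str {
        match self {
            Self::Bibtex => "paper.bib",
            Self::CslJson => "paper.json",
        }
    }
}

/// An explicit `--output` wins; otherwise fall back to the format default.
pub fn resolve_output(format: SnapshotFormat, output: Option<&str>) -> String {
    output.unwrap_or(format.default_output()).to_string()
}
```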
Tests (32 new; full workspace passes):
- 24 unit tests in `csl_json.rs` (incl. 3 schema-validation cases for
full / minimal / multi-author entries; hand-rolled validator
per-canonical-rules since `jsonschema` would be a heavy add)
- 7 integration tests in `bib_csl_json.rs` covering snapshot
determinism, drift, default output naming, sidecar format field,
and a backwards-compat assertion for the BibTeX path
- existing 7 BibTeX integration tests still pass unmodified
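For a sense of what a hand-rolled validator involves, here is a minimal sketch. It checks only the rules this emitter relies on, not the full CSL schema; the `Json` enum, `validate_entry`, and the shortened `ALLOWED_TYPES` list are all hypothetical stand-ins (the actual tests presumably validate parsed JSON against the full `CSL_TYPES` list).

```rust
/// Tiny JSON value type so the sketch needs no external crates.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Str(String),
    Num(i64),
    Arr(Vec<Json>),
    Obj(Vec<(String, Json)>),
}

impl Json {
    /// Look up a key on an object; `None` for non-objects or missing keys.
    fn get(&self, key: &str) -> Option<&Json> {
        match self {
            Json::Obj(fields) => fields.iter().find(|(k, _)| k == key).map(|(_, v)| v),
            _ => None,
        }
    }
}

/// A small subset of CSL item types, for illustration only.
const ALLOWED_TYPES: &[&str] = &["article-journal", "book", "chapter", "paper-conference"];

/// True when `v` is `[[n, ...], ...]` with non-empty, all-numeric inner arrays.
fn is_date_parts(v: &Json) -> bool {
    let Json::Arr(parts) = v else { return false };
    parts.iter().all(|p| match p {
        Json::Arr(nums) => !nums.is_empty() && nums.iter().all(|n| matches!(n, Json::Num(_))),
        _ => false,
    })
}

pub fn validate_entry(entry: &Json) -> Result<(), String> {
    // `id` and `type` are the two required keys.
    if !matches!(entry.get("id"), Some(Json::Str(_)) | Some(Json::Num(_))) {
        return Err("`id` must be a string or number".into());
    }
    match entry.get("type") {
        Some(Json::Str(t)) if ALLOWED_TYPES.contains(&t.as_str()) => {}
        _ => return Err("`type` must be a known CSL type".into()),
    }
    // Optional keys are checked only when present.
    if let Some(author) = entry.get("author") {
        let Json::Arr(names) = author else {
            return Err("`author` must be an array".into());
        };
        if names.iter().any(|n| !matches!(n.get("family"), Some(Json::Str(_)))) {
            return Err("each author needs a string `family`".into());
        }
    }
    if let Some(issued) = entry.get("issued") {
        if !issued.get("date-parts").is_some_and(is_date_parts) {
            return Err("`issued.date-parts` must be an array of number arrays".into());
        }
    }
    if let Some(kw) = entry.get("keyword") {
        if !matches!(kw, Json::Str(_)) {
            return Err("`keyword` must be a single string, not an array".into());
        }
    }
    Ok(())
}
```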
[tape-exempt: text-only --format flag addition under bib snapshot;
no TUI/rendering surface. Round-trip behavior verified by 7
shell-out integration tests. CSL-JSON output is JSON, not user-
visible terminal content.]
Refs #135 (sub-feature A of 3 — TUI keybind = B, `bib diff` = C).
gerchowl added a commit that referenced this pull request (Apr 28, 2026):
… of 3) (#181)

## Summary

- New `scitadel bib diff <file_a> [<file_b>] [--question-id <id>]`: entry-level (added / removed / changed) structural diff between two bibliographies, the human-readable explainer when `bib verify` says drift detected. Auto-detects BibTeX vs CSL-JSON by content sniff; the two flavors are interchangeable in either argument slot.
- Identity rule per the design: `citekey → DOI → arxiv_id → (title, year)`, strict per-rung first-match-wins. Lists sorted by citekey for deterministic output.
- Hand-rolled ANSI in `scitadel-export::diff_format` (no new color/diff crates), TTY-detect via `std::io::IsTerminal`, `--no-color` for CI.
- `--format json` for structured CI consumption; serde-derived round-trip.
- Exit codes `0` (no diff) / `1` (any diff) mirror `git diff`.
- MCP tool `bib_diff` mirroring the CLI for agent workflows.

## Test plan

- [x] 34 unit tests in `scitadel-export` (4 identity-rung tests, 6 field-change tests, sort determinism, JSON round-trip, mixed-format zero-diff, format detection, ANSI color toggle)
- [x] 7 end-to-end CLI tests (identical files exit 0, drifted files exit 1 with the documented text report, `--no-color` strips ANSI, `--format json` round-trips, BibTeX-vs-CSL-JSON of the same shortlist exit 0, `--question-id` form against a seeded DB, missing-second-side error)
- [x] 6 inline CLI plumbing tests (TTY toggle, exit-code derivation, format dispatch)
- [x] 335 workspace tests passing (excluding `scitadel-tui` per quality gate)
- [x] `just lint` clean

## Closes

Closes #135 (after #180 sub-feature A and #151 sub-feature B landed)

[tape-exempt: text-only --help addition under bib diff; output is colored text on stdout, not TUI rendering]
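One possible reading of the identity ladder and the exit-code rule is sketched below. The interpretation of "strict per-rung first-match-wins" is an assumption: each rung is tried against all candidates before falling through to the next, and the first rung that finds a match decides. `BibEntry`, `find_match`, and `exit_code` are hypothetical names, not the API in `scitadel-export`.

```rust
/// Hypothetical, format-neutral view of one bibliography entry.
#[derive(Debug, Clone, Default)]
pub struct BibEntry {
    pub citekey: Option<String>,
    pub doi: Option<String>,
    pub arxiv_id: Option<String>,
    pub title: Option<String>,
    pub year: Option<i32>,
}

/// Find the counterpart of `needle` in `haystack`, walking the rungs
/// citekey -> DOI -> arxiv_id -> (title, year) in that order.
pub fn find_match(needle: &BibEntry, haystack: &[BibEntry]) -> Option<usize> {
    if let Some(key) = &needle.citekey {
        if let Some(i) = haystack.iter().position(|e| e.citekey.as_ref() == Some(key)) {
            return Some(i);
        }
    }
    if let Some(doi) = &needle.doi {
        if let Some(i) = haystack.iter().position(|e| e.doi.as_ref() == Some(doi)) {
            return Some(i);
        }
    }
    if let Some(ax) = &needle.arxiv_id {
        if let Some(i) = haystack.iter().position(|e| e.arxiv_id.as_ref() == Some(ax)) {
            return Some(i);
        }
    }
    // Last rung needs BOTH title and year on the needle.
    if let (Some(t), Some(y)) = (&needle.title, needle.year) {
        return haystack
            .iter()
            .position(|e| e.title.as_ref() == Some(t) && e.year == Some(y));
    }
    None
}

/// Exit code mirroring `git diff`: 0 when nothing differs, 1 otherwise.
pub fn exit_code(added: usize, removed: usize, changed: usize) -> i32 {
    if added + removed + changed == 0 { 0 } else { 1 }
}
```

Under this reading, an entry whose citekey was renamed but whose DOI is unchanged still pairs up on the DOI rung and would be reported as changed rather than as one removal plus one addition.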
## Summary

Sub-feature A of #135 (3 file-disjoint PRs): adds canonical CSL-JSON 1.0.2 as a second `--format` for `bib snapshot` and threads it through the existing `.scitadel-bib.lock` plumbing landed in #179. BibTeX path is 100% backwards compatible.

- New `crates/scitadel-export/src/csl_json.rs`, mirroring the shape of `bibtex.rs` (`export_csl_json`, `export_csl_json_with_tags`).
- `BibLockfile::new_csl_json` alongside `new_bibtex`; sidecar schema unchanged, only the `format` discriminant flips.
- `bib snapshot --format <bibtex|csl-json>` (default `bibtex`); `--output` defaults to `paper.bib` or `paper.json` per format.
- `bib verify` reads the sidecar's `format` field and routes to the matching emitter (no new flag required).
- MCP: `bib_snapshot` gains an optional `format` parameter; `bib_verify` autoroutes.

## CSL field mapping decisions (canonical CSL 1.0.2, schema-validated in tests)

| `Paper` field | CSL key | Notes |
| --- | --- | --- |
| `bibtex_key` (or `generate_key(paper)`, then `paper.id`) | `id` | |
| (none; no `paper_type`) | `type` | always `"article-journal"`; emitter gates against the `CSL_TYPES` enum so a future `paper_type` field can map safely |
| `title` | `title` | |
| `authors` | `author` | array of `{family, given}` |
| `year` | `issued.date-parts` | `[[YYYY]]` as number |
| `journal` | `container-title` | canonical name; not `journal` |
| `doi` | `DOI` | canonical uppercase |
| `url` | `URL` | |
| `abstract` | `abstract` | |
| paper tags | `keyword` | single comma-joined string per spec, not an array |

**Omissions vs. emit-empty:** when a `Paper` field is `None` / empty, the CSL key is omitted entirely rather than emitted as `null` / `""`. Some processors treat `null` and the empty string differently from absence; absence is the lossless choice.

**No `paper_type` mapper:** the data model carries no equivalent today. The CLI / MCP surface always emits `"article-journal"`. A future `paper_type` field can map through `CSL_TYPES` (the canonical 51-value enum is exposed as a `pub const` for that hookup).

## Test plan

- 24 unit tests in `csl_json.rs` covering field mapping, determinism, omission rules, multi-author parsing, and Unicode preservation.
- 3 schema-validation cases (full / minimal / multi-author entries) against a hand-rolled validator; the `jsonschema` crate would be a heavy add for what's a structurally simple emitter we control.
- 7 integration tests in `bib_csl_json.rs` (binary-shells the CLI):
  - … (`journal` / `doi` / `url` / `keywords` are NOT present)
  - default output is `paper.json` for `--format csl-json`
  - sidecar carries the `format: "csl-json"` field
  - BibTeX path still yields `paper.bib` + `format: "bibtex"`
- `cargo test --workspace --exclude scitadel-tui` passes.
- `just lint` (rustfmt + clippy `-D warnings`) clean.

## Caveats

- Author parsing assumes the `Last, First` convention. Names without a comma fall back to `family`-only (matches CSL `literal` in spirit without requiring the `literal` field, which downstream processors render inconsistently).
- … `bib snapshot` without `--format csl-json`; users mid-CSL-workflow whose sidecar got deleted will see the bibtex hint. Cheap to refine in a follow-up; not worth a custom heuristic now.

## Out of scope (other sub-features)

- TUI export keybind (`E` on QuestionDashboard) → sub-feature B
- `bib diff` for human-readable explanation of verify failures → sub-feature C (will close #135)

Refs #135 (sub-feature A of 3)