feat(bib): CSL-JSON 1.0.2 export + sidecar format discriminant (#135 sub-feature A of 3)#180

Merged
gerchowl merged 1 commit into dev from feat/135a-csl-json-export on Apr 28, 2026

@gerchowl

Summary

Sub-feature A of #135 (3 file-disjoint PRs): adds canonical CSL-JSON 1.0.2 as a second --format for bib snapshot and threads it through the existing .scitadel-bib.lock plumbing landed in #179. BibTeX path is 100% backwards compatible.

  • New crates/scitadel-export/src/csl_json.rs mirroring bibtex.rs shape (export_csl_json, export_csl_json_with_tags).
  • BibLockfile::new_csl_json alongside new_bibtex; sidecar schema unchanged — only the format discriminant flips.
  • bib snapshot --format <bibtex|csl-json> (default bibtex); --output defaults to paper.bib or paper.json per format.
  • bib verify reads the sidecar's format field and routes to the matching emitter (no new flag required).
  • MCP bib_snapshot gains an optional format parameter; bib_verify autoroutes.
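The format handling described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `BibFormat` enum, its method names, and the `regenerate` helper are hypothetical, and only the strings `bibtex`, `csl-json`, `paper.bib`, and `paper.json` come from the description.

```rust
use std::str::FromStr;

/// Formats selectable via `--format` (hypothetical enum name).
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub enum BibFormat {
    #[default]
    Bibtex,
    CslJson,
}

impl BibFormat {
    /// String stored in the sidecar's `format` field.
    pub fn discriminant(self) -> &'static str {
        match self {
            BibFormat::Bibtex => "bibtex",
            BibFormat::CslJson => "csl-json",
        }
    }

    /// Default `--output` file name per format.
    pub fn default_output(self) -> &'static str {
        match self {
            BibFormat::Bibtex => "paper.bib",
            BibFormat::CslJson => "paper.json",
        }
    }
}

impl FromStr for BibFormat {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "bibtex" => Ok(BibFormat::Bibtex),
            "csl-json" => Ok(BibFormat::CslJson),
            other => Err(format!("unknown bibliography format: {other}")),
        }
    }
}

/// Verify-style routing (assumed shape): pick the emitter named by the
/// sidecar's `format` value, so no extra CLI flag is needed.
pub fn regenerate(
    sidecar_format: &str,
    emit_bibtex: impl Fn() -> String,
    emit_csl_json: impl Fn() -> String,
) -> Result<String, String> {
    Ok(match sidecar_format.parse::<BibFormat>()? {
        BibFormat::Bibtex => emit_bibtex(),
        BibFormat::CslJson => emit_csl_json(),
    })
}
```

Keeping the discriminant and the default file name on one enum means a third format would only need one new variant plus its emitter.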

CSL field mapping decisions (canonical CSL 1.0.2, schema-validated in tests)

| Internal Paper field | CSL field | Notes |
| --- | --- | --- |
| bibtex_key (or generate_key(paper), then paper.id) | id | always present; deterministic |
| n/a (model has no paper_type) | type | always "article-journal"; emitter gates against CSL_TYPES enum so a future paper_type field can map safely |
| title | title | omitted when empty |
| authors | author (array of {family, given}) | split on first comma; family-only when no comma; order preserved |
| year | issued.date-parts | [[YYYY]] as number |
| journal | container-title | canonical name; not journal |
| doi | DOI | canonical uppercase |
| url | URL | canonical uppercase |
| abstract | abstract | |
| paper-tags | keyword | single comma-joined string per spec, not array |

Omissions vs. emit-empty: when a Paper field is None / empty, the CSL key is omitted entirely rather than emitted as null / "". The canonical schema treats null and empty-string differently from absence in some processors; absence is the lossless choice.
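A minimal sketch of the omit-rather-than-null rule, using only the standard library. The `Paper` struct here is a reduced stand-in (the real model is not shown in this description), the helper names are hypothetical, and the `","` separator for `keyword` is an assumption since the description only says "comma-joined".

```rust
/// Reduced stand-in for the internal paper record (assumed shape).
#[derive(Default)]
pub struct Paper {
    pub id: String,
    pub title: Option<String>,
    pub year: Option<i32>,
    pub journal: Option<String>,
    pub doi: Option<String>,
    pub tags: Vec<String>,
}

/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Push `"key":"value"` only when the value is present and non-empty;
/// absent or empty values produce no key at all.
fn push_str_field(fields: &mut Vec<String>, key: &str, value: Option<&str>) {
    if let Some(v) = value.filter(|v| !v.is_empty()) {
        fields.push(format!("\"{}\":\"{}\"", key, json_escape(v)));
    }
}

/// Emit one CSL item with a fixed key order and no null / "" placeholders.
pub fn emit_csl_item(p: &Paper) -> String {
    let mut fields = Vec::new();
    push_str_field(&mut fields, "id", Some(p.id.as_str()));
    fields.push("\"type\":\"article-journal\"".to_string());
    push_str_field(&mut fields, "title", p.title.as_deref());
    if let Some(year) = p.year {
        // Year is emitted as a JSON number inside date-parts.
        fields.push(format!("\"issued\":{{\"date-parts\":[[{}]]}}", year));
    }
    push_str_field(&mut fields, "container-title", p.journal.as_deref());
    push_str_field(&mut fields, "DOI", p.doi.as_deref());
    // Assumption: tags are joined with a bare comma.
    let keyword = p.tags.join(",");
    push_str_field(&mut fields, "keyword", Some(keyword.as_str()));
    format!("{{{}}}", fields.join(","))
}
```

For a paper with only an id, a title, and a year, the output contains exactly the keys `id`, `type`, `title`, and `issued`; an empty DOI string is treated the same as a missing one.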

No paper_type mapper: the data model carries no equivalent today. The CLI / MCP surface always emits "article-journal". A future paper_type field can map through CSL_TYPES (the canonical 51-value enum is exposed as a pub const for that hookup).
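One way a future `paper_type` field could be gated against the type enum is sketched below. This is an assumption-heavy illustration: `CSL_TYPES_SUBSET` is an abbreviated list for brevity (the PR exposes the full enum as `CSL_TYPES`), `csl_type_for` is a hypothetical name, and falling back to `"article-journal"` for unknown values is an assumed policy rather than something the description states.

```rust
/// Abbreviated subset of CSL item types, for illustration only.
pub const CSL_TYPES_SUBSET: &[&str] = &[
    "article-journal",
    "book",
    "chapter",
    "paper-conference",
    "report",
    "thesis",
];

/// Map an optional paper type to a CSL `type` value.
/// Unknown or missing values fall back to "article-journal" (assumed policy).
pub fn csl_type_for(paper_type: Option<&str>) -> &'static str {
    paper_type
        .and_then(|t| CSL_TYPES_SUBSET.iter().copied().find(|known| *known == t))
        .unwrap_or("article-journal")
}
```

Returning the `&'static str` taken from the list, rather than echoing the caller's string, guarantees that only enum members can reach the output.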

Test plan

  • 24 unit tests in csl_json.rs covering field mapping, determinism, omission rules, multi-author parsing, and Unicode preservation.
  • 3 schema-validation tests (full metadata / minimal / multi-author) using a hand-rolled validator against canonical 1.0.2 invariants — jsonschema crate would be a heavy add for what's a structurally simple emitter we control.
  • 7 integration tests in bib_csl_json.rs (binary-shells the CLI):
    • snapshot byte-determinism across runs
    • canonical field names (asserts journal / doi / url / keywords are NOT present)
    • verify-then-zero on fresh snapshot
    • verify-then-one with diff on content drift
    • default output paper.json for --format csl-json
    • sidecar format: "csl-json" field
    • backwards-compat assertion: the default (BibTeX) path still writes paper.bib + format: "bibtex"
  • All 7 existing BibTeX integration tests pass unmodified.
  • Full cargo test --workspace --exclude scitadel-tui passes.
  • just lint (rustfmt + clippy -D warnings) clean.

Caveats

  • Author parsing uses Last, First convention. Names without a comma fall back to family-only (matches CSL literal in spirit without requiring the literal field, which downstream processors render inconsistently).
  • The verify "no lockfile" hint suggests bib snapshot without --format csl-json; users mid-CSL-workflow whose sidecar got deleted will see the bibtex hint. Cheap to refine in a follow-up; not worth a custom heuristic now.
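The author-parsing caveat can be illustrated with a short sketch. The `CslName` struct and `parse_author` function are hypothetical names, and trimming whitespace around each part is an assumption; the split-on-first-comma and family-only fallback follow the description.

```rust
/// A CSL name with an optional given part (hypothetical struct).
#[derive(Debug, PartialEq, Eq)]
pub struct CslName {
    pub family: String,
    pub given: Option<String>,
}

/// Parse "Last, First" on the first comma; without a comma the whole
/// string becomes the family name.
pub fn parse_author(raw: &str) -> CslName {
    match raw.split_once(',') {
        Some((family, given)) => {
            let given = given.trim();
            CslName {
                family: family.trim().to_string(),
                // An empty given part (e.g. "Plato,") is omitted, not emitted as "".
                given: if given.is_empty() { None } else { Some(given.to_string()) },
            }
        }
        None => CslName {
            family: raw.trim().to_string(),
            given: None,
        },
    }
}
```

Because only the first comma splits, a name such as `"King, Martin Luther, Jr."` keeps `"Martin Luther, Jr."` together as the given part.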

Out of scope (other sub-features)

Refs #135 (sub-feature A of 3)

…sub-feature A of 3)

Adds canonical CSL-JSON 1.0.2 as a second `--format` for `bib snapshot`
and routes `bib verify` through the sidecar's `format` discriminant.
BibTeX path stays untouched; existing tests pass without modification.

Field mapping (canonical CSL 1.0.2, schema-validated in tests):
- `bibtex_key` → `id` (falls back to `generate_key(paper)` then `paper.id`)
- always emits `type: "article-journal"` (data model has no paper_type)
- `title` → `title`
- `authors[i]` → `{ family, given }` parsed on first comma; family-only fallback
- `year` → `issued.date-parts[0][0]` (number)
- `journal` → `container-title` (canonical name; not `journal`)
- `doi` → `DOI` (canonical uppercase)
- `url` → `URL`
- `abstract` → `abstract`
- paper-tags → `keyword` (single comma-joined string per spec, NOT an array)

Sidecar gains `format: "csl-json"` discriminant alongside the existing
`"bibtex"`. Schema unchanged; new constructor `BibLockfile::new_csl_json`
mirrors `new_bibtex`. Verify reads the sidecar's `format` to pick the
matching emitter, so JSON drifts compare against JSON regenerates.

Determinism contract identical to BibTeX: sorted entries (by `id`),
fixed key order, omitted-not-nulled optional fields, LF endings, no
timestamps in the output.
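
A rough sketch of the ordering and line-ending parts of that contract follows. The function name `render_snapshot`, the tuple input, and the array layout (two-space indent, one item per line, trailing newline) are all assumptions made for illustration; only "sorted by `id`" and "LF endings" come from the text above.

```rust
/// Render pre-serialized items as a JSON array in a reproducible layout.
/// Each tuple is (id, already-serialized JSON object).
pub fn render_snapshot(mut items: Vec<(String, String)>) -> String {
    // Sort by id so input order never affects the bytes written.
    items.sort_by(|a, b| a.0.cmp(&b.0));
    let body = items
        .iter()
        // Normalize any CRLF inside an item to LF.
        .map(|(_, json)| format!("  {}", json.replace("\r\n", "\n")))
        .collect::<Vec<_>>()
        .join(",\n");
    if body.is_empty() {
        "[]\n".to_string()
    } else {
        format!("[\n{}\n]\n", body)
    }
}
```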

CLI / MCP:
- `bib snapshot --format <bibtex|csl-json>`; default `--output` is
  `paper.bib` for bibtex and `paper.json` for csl-json
- `bib verify` autoroutes on the sidecar's `format` (no flag needed)
- MCP `bib_snapshot` accepts optional `format` param

Tests (32 new; full workspace passes):
- 24 unit tests in `csl_json.rs` (incl. 3 schema-validation cases for
  full / minimal / multi-author entries; hand-rolled validator
  per-canonical-rules since `jsonschema` would be a heavy add)
- 7 integration tests in `bib_csl_json.rs` covering snapshot
  determinism, drift, default output naming, sidecar format field,
  and a backwards-compat assertion for the BibTeX path
- existing 7 BibTeX integration tests still pass unmodified

[tape-exempt: text-only --format flag addition under bib snapshot;
no TUI/rendering surface. Round-trip behavior verified by 7
shell-out integration tests. CSL-JSON output is JSON, not user-
visible terminal content.]

Refs #135 (sub-feature A of 3 — TUI keybind = B, `bib diff` = C).
gerchowl force-pushed the feat/135a-csl-json-export branch from a8809d8 to f9db120 on April 28, 2026 12:06
gerchowl merged commit 744482a into dev on Apr 28, 2026
11 of 12 checks passed
gerchowl deleted the feat/135a-csl-json-export branch on April 28, 2026 12:08
gerchowl added a commit that referenced this pull request Apr 28, 2026
… of 3) (#181)

## Summary

- New `scitadel bib diff <file_a> [<file_b>] [--question-id <id>]` —
entry-level (added / removed / changed) structural diff between two
bibliographies, the human-readable explainer when `bib verify` says
drift detected. Auto-detects BibTeX vs CSL-JSON by content sniff; the
two flavors are interchangeable in either argument slot.
- Identity rule per the design: `citekey → DOI → arxiv_id → (title,
year)`, strict per-rung first-match-wins. Lists sorted by citekey for
deterministic output.
- Hand-rolled ANSI in `scitadel-export::diff_format` (no new color/diff
crates), TTY-detect via `std::io::IsTerminal`, `--no-color` for CI.
- `--format json` for structured CI consumption; serde-derived
round-trip.
- Exit codes `0` (no diff) / `1` (any diff) mirror `git diff`.
- MCP tool `bib_diff` mirroring the CLI for agent workflows.
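
The identity ladder and exit codes above might look roughly like this. It is one possible reading of "strict per-rung first-match-wins" (try the strongest rung against all candidates before falling to the next), and every name here (`Entry`, `Rung`, `find_match`, `exit_code`) is hypothetical rather than taken from `scitadel-export`.

```rust
/// Reduced, hypothetical view of a bibliography entry: identity fields only.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Entry {
    pub citekey: Option<String>,
    pub doi: Option<String>,
    pub arxiv_id: Option<String>,
    pub title: Option<String>,
    pub year: Option<i32>,
}

/// Identity rungs, strongest first.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Rung {
    Citekey,
    Doi,
    ArxivId,
    TitleYear,
}

const LADDER: [Rung; 4] = [Rung::Citekey, Rung::Doi, Rung::ArxivId, Rung::TitleYear];

/// True only when both sides carry the value and the values are equal.
fn both_equal<T: PartialEq>(a: &Option<T>, b: &Option<T>) -> bool {
    matches!((a, b), (Some(x), Some(y)) if x == y)
}

fn agrees(a: &Entry, b: &Entry, rung: Rung) -> bool {
    match rung {
        Rung::Citekey => both_equal(&a.citekey, &b.citekey),
        Rung::Doi => both_equal(&a.doi, &b.doi),
        Rung::ArxivId => both_equal(&a.arxiv_id, &b.arxiv_id),
        Rung::TitleYear => both_equal(&a.title, &b.title) && both_equal(&a.year, &b.year),
    }
}

/// Walk the ladder; the first rung with any agreeing candidate wins.
pub fn find_match<'a>(entry: &Entry, candidates: &'a [Entry]) -> Option<(&'a Entry, Rung)> {
    LADDER.iter().find_map(|&rung| {
        candidates
            .iter()
            .find(|c| agrees(entry, c, rung))
            .map(|c| (c, rung))
    })
}

/// Exit code convention from the summary: 0 for no diff, 1 for any diff.
pub fn exit_code(has_diff: bool) -> i32 {
    if has_diff { 1 } else { 0 }
}
```

Under this reading, an entry whose citekey was renamed but whose DOI is unchanged still pairs with its counterpart on the DOI rung and would be reported as changed rather than as one removal plus one addition.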

## Test plan

- [x] 34 unit tests in `scitadel-export` (4 identity-rung tests, 6
field-change tests, sort determinism, JSON round-trip, mixed-format
zero-diff, format detection, ANSI color toggle)
- [x] 7 end-to-end CLI tests (identical files exit 0, drifted files exit
1 with the documented text report, `--no-color` strips ANSI, `--format
json` round-trips, BibTeX-vs-CSL-JSON of the same shortlist exit 0,
`--question-id` form against a seeded DB, missing-second-side error)
- [x] 6 inline CLI plumbing tests (TTY toggle, exit-code derivation,
format dispatch)
- [x] 335 workspace tests passing (excluding `scitadel-tui` per quality
gate)
- [x] `just lint` clean

## Closes

Closes #135 (after #180 sub-feature A and #151 sub-feature B landed)

[tape-exempt: text-only --help addition under bib diff; output is
colored text on stdout, not TUI rendering]