From 8b91902b2007d399d53cbe0f92e7dfab1706ce6a Mon Sep 17 00:00:00 2001 From: edoch Date: Thu, 19 Mar 2026 15:01:02 +0100 Subject: [PATCH 01/35] docs: add Step tree architecture design and implementation TODOs TODO-0119 defines the unified Step tree architecture that replaces the current three-layer output model (ProcessOutput + Result + CommandOutput). Reviewed through 6 rounds of design analysis. TODOs 0120-0131 break the implementation into phased work: infrastructure types, command conversions, and cleanup. TODOs 0132-0133 track deferred macro automation. Supersedes TODO-0110 (recursive output architecture). --- docs/spec/todos/TODO-0100.md | 2 +- docs/spec/todos/TODO-0110.md | 10 +- docs/spec/todos/TODO-0119.md | 582 +++++++++++++++++++++++++++++++++++ docs/spec/todos/TODO-0120.md | 61 ++++ docs/spec/todos/TODO-0121.md | 51 +++ docs/spec/todos/TODO-0122.md | 78 +++++ docs/spec/todos/TODO-0123.md | 44 +++ docs/spec/todos/TODO-0124.md | 61 ++++ docs/spec/todos/TODO-0125.md | 44 +++ docs/spec/todos/TODO-0126.md | 39 +++ docs/spec/todos/TODO-0127.md | 47 +++ docs/spec/todos/TODO-0128.md | 43 +++ docs/spec/todos/TODO-0129.md | 44 +++ docs/spec/todos/TODO-0130.md | 65 ++++ docs/spec/todos/TODO-0131.md | 65 ++++ docs/spec/todos/TODO-0132.md | 37 +++ docs/spec/todos/TODO-0133.md | 55 ++++ docs/spec/todos/index.md | 17 +- 18 files changed, 1341 insertions(+), 4 deletions(-) create mode 100644 docs/spec/todos/TODO-0119.md create mode 100644 docs/spec/todos/TODO-0120.md create mode 100644 docs/spec/todos/TODO-0121.md create mode 100644 docs/spec/todos/TODO-0122.md create mode 100644 docs/spec/todos/TODO-0123.md create mode 100644 docs/spec/todos/TODO-0124.md create mode 100644 docs/spec/todos/TODO-0125.md create mode 100644 docs/spec/todos/TODO-0126.md create mode 100644 docs/spec/todos/TODO-0127.md create mode 100644 docs/spec/todos/TODO-0128.md create mode 100644 docs/spec/todos/TODO-0129.md create mode 100644 docs/spec/todos/TODO-0130.md create mode 100644 
docs/spec/todos/TODO-0131.md create mode 100644 docs/spec/todos/TODO-0132.md create mode 100644 docs/spec/todos/TODO-0133.md diff --git a/docs/spec/todos/TODO-0100.md b/docs/spec/todos/TODO-0100.md index ec4a1de..33efd53 100644 --- a/docs/spec/todos/TODO-0100.md +++ b/docs/spec/todos/TODO-0100.md @@ -4,7 +4,7 @@ title: "Redesign text output format for all commands" status: todo priority: high created: 2026-03-13 -depends_on: [110] +depends_on: [119] blocks: [] --- diff --git a/docs/spec/todos/TODO-0110.md b/docs/spec/todos/TODO-0110.md index 2b28108..5b98938 100644 --- a/docs/spec/todos/TODO-0110.md +++ b/docs/spec/todos/TODO-0110.md @@ -1,11 +1,13 @@ --- id: 110 title: "Recursive output architecture — nested process steps" -status: todo +status: done priority: high created: 2026-03-14 +completed: 2026-03-18 depends_on: [99] blocks: [100] +superseded_by: 119 --- # TODO-0110: Recursive output architecture — nested process steps @@ -224,7 +226,11 @@ Compact JSON remains result-only (existing behavior). Auto steps are not shown. The tree structure is self-documenting: `auto_build.process.auto_update` is present if and only if update ran inside build inside search. -## Implementation scope +## Resolution + +Superseded by [TODO-0119](TODO-0119.md). TODO-0110 proposed composing existing `*CommandOutput` structs for nesting. TODO-0119 replaces the entire output model with a unified `Step` tree where both leaf steps and commands are the same node type, with an `Outcome` enum and recursive rendering. 
+
+## Original implementation scope
 
 ### Pipeline framework (`src/pipeline/mod.rs`)
 
diff --git a/docs/spec/todos/TODO-0119.md b/docs/spec/todos/TODO-0119.md
new file mode 100644
index 0000000..7e2ecb6
--- /dev/null
+++ b/docs/spec/todos/TODO-0119.md
@@ -0,0 +1,582 @@
+---
+id: 119
+title: "Unified Step tree architecture — replace pipeline/command output split"
+status: todo
+priority: high
+created: 2026-03-18
+depends_on: []
+blocks: [100, 101]
+supersedes: [110]
+---
+
+# TODO-0119: Unified Step tree architecture — replace pipeline/command output split
+
+## Summary
+
+Replace the current three-layer output model (`ProcessOutput` structs + `*Result` structs + `*CommandOutput` wrappers) and the `pipeline/` step wrapper modules with a single recursive `Step` tree, generic over the outcome type. Both leaf steps and commands are `Step` nodes. Commands are steps with substeps.
+
+Verbose/compact is a **struct choice**: `Step<Outcome>` for verbose, `Step<CompactOutcome>` for compact. Both outcome types implement `Render` (produces `Vec<Block>`) and `Serialize` (produces JSON). Formatters are shared functions that consume blocks — adding a new output format means writing one function, not touching any command.
+
+## Problem
+
+The current architecture has three layers per command (e.g., `BuildProcessOutput` + `BuildResult` + `BuildCommandOutput`), 12 pipeline wrapper modules that add boilerplate around core functions, and per-command manual `has_failed_step()` / `format_text()` implementations that enumerate every field. Adding a step to a command requires touching the ProcessOutput struct, error-handling early returns, format_text branches, and has_failed_step. Build has ~10 early-return blocks, each manually constructing the full output struct with `Skipped` for remaining steps.
+
+Additionally, per-command `format_text` implementations are full of string-building boilerplate and directly format child structures instead of delegating.
Adding a new output format requires writing rendering code in every command.
+
+## Design
+
+### Core types
+
+`Step<O>` is generic over the outcome type. Commands build `Step<Outcome>` (full data). For compact mode, the tree is converted to `Step<CompactOutcome>` via `to_compact()`.
+
+```rust
+struct Step<O> {
+    substeps: Vec<Step<O>>,
+    outcome: StepOutcome<O>,
+}
+
+enum StepOutcome<O> {
+    Complete { result: Result<O, StepError>, elapsed_ms: u64 },
+    Skipped,
+}
+
+struct StepError {
+    kind: ErrorKind, // User | Application
+    message: String,
+}
+
+type FullStep = Step<Outcome>;
+type CompactStep = Step<CompactOutcome>;
+```
+
+Conversion from full to compact is recursive:
+
+```rust
+impl Step<Outcome> {
+    fn to_compact(&self) -> Step<CompactOutcome> {
+        let compact_outcome = match &self.outcome {
+            StepOutcome::Complete { result: Ok(outcome), elapsed_ms } => {
+                // Pass substeps so command outcomes can read substep data for summaries
+                let compact = outcome.to_compact(&self.substeps);
+                StepOutcome::Complete { result: Ok(compact), elapsed_ms: *elapsed_ms }
+            }
+            StepOutcome::Complete { result: Err(e), elapsed_ms } => {
+                StepOutcome::Complete { result: Err(e.clone()), elapsed_ms: *elapsed_ms }
+            }
+            StepOutcome::Skipped => StepOutcome::Skipped,
+        };
+        Step {
+            substeps: self.substeps.iter().map(|s| s.to_compact()).collect(),
+            outcome: compact_outcome,
+        }
+    }
+}
+```
+
+Both `Outcome` and `CompactOutcome` implement `Render + Serialize`. The same struct is used for text rendering AND JSON serialization — no separate paths.
+
+### Outcome enums
+
+`Outcome` variants use named structs. `CompactOutcome` mirrors the structure with compact counterparts. Every type follows the same pattern: full struct + compact struct + `From` impl. No exceptions.
+
+```rust
+enum Outcome {
+    // Leaf steps
+    Scan(ScanOutcome),
+    Infer(InferOutcome),
+    ReadConfig(ReadConfigOutcome),
+    WriteConfig(WriteConfigOutcome),
+    Validate(ValidateOutcome),
+    Classify(ClassifyOutcome),
+    LoadModel(LoadModelOutcome),
+    EmbedFiles(EmbedFilesOutcome),
+    EmbedQuery(EmbedQueryOutcome),
+    ReadIndex(ReadIndexOutcome),
+    ReadIndexMetadata(ReadIndexMetadataOutcome),
+    WriteIndex(WriteIndexOutcome),
+    ExecuteSearch(ExecuteSearchOutcome),
+    DeleteIndex(DeleteIndexOutcome),
+    MutateConfig(MutateConfigOutcome),
+    CheckConfigChanged(CheckConfigChangedOutcome),
+    ReadChunkText(ReadChunkTextOutcome),
+
+    // Command-level outcomes (summaries of substeps)
+    Init(Box<InitOutcome>),
+    Update(Box<UpdateOutcome>),
+    Check(Box<CheckOutcome>),
+    Build(Box<BuildOutcome>),
+    Search(Box<SearchOutcome>),
+    Info(Box<InfoOutcome>),
+    Clean(CleanOutcome),
+}
+
+enum CompactOutcome {
+    // Leaf steps — compact counterparts
+    Scan(ScanOutcomeCompact),
+    Infer(InferOutcomeCompact),
+    ReadConfig(ReadConfigOutcomeCompact),
+    WriteConfig(WriteConfigOutcomeCompact),
+    Validate(ValidateOutcomeCompact),
+    Classify(ClassifyOutcomeCompact),
+    LoadModel(LoadModelOutcomeCompact),
+    EmbedFiles(EmbedFilesOutcomeCompact),
+    EmbedQuery(EmbedQueryOutcomeCompact),
+    ReadIndex(ReadIndexOutcomeCompact),
+    ReadIndexMetadata(ReadIndexMetadataOutcomeCompact),
+    WriteIndex(WriteIndexOutcomeCompact),
+    ExecuteSearch(ExecuteSearchOutcomeCompact),
+    DeleteIndex(DeleteIndexOutcomeCompact),
+    MutateConfig(MutateConfigOutcomeCompact),
+    CheckConfigChanged(CheckConfigChangedOutcomeCompact),
+    ReadChunkText(ReadChunkTextOutcomeCompact),
+
+    // Command-level outcomes — compact versions
+    Init(Box<InitOutcomeCompact>),
+    Update(Box<UpdateOutcomeCompact>),
+    Check(Box<CheckOutcomeCompact>),
+    Build(Box<BuildOutcomeCompact>),
+    Search(Box<SearchOutcomeCompact>),
+    Info(Box<InfoOutcomeCompact>),
+    Clean(CleanOutcomeCompact),
+}
+```
+
+Command-level outcomes are `Box`ed because they carry `Vec` payloads.
+
+For leaf steps where full and compact are identical, the compact struct can be a type alias or a trivial wrapper with `From` that copies all fields.
The pattern is the same regardless — every Outcome variant has a corresponding CompactOutcome variant.
+
+### Compact rendering: leaf steps are silent
+
+In compact mode, **leaf step outcomes return empty vecs** (no blocks). Only command-level outcomes render their summaries and tables. This keeps compact output clean (3-5 lines) matching current behavior.
+
+Example — `mdvs search` with auto-build, compact:
+
+```
+Built index — 43 files, 142 chunks
+Searched "rust patterns" — 5 hits
+
+╭───┬──────────────────────┬───────╮
+│ 1 │ "notes/ownership.md" │ 0.872 │
+│ 2 │ "notes/lifetimes.md" │ 0.814 │
+╰───┴──────────────────────┴───────╯
+```
+
+The tree:
+```
+Step(SearchCompact)        → renders summary + table
+├── Step(BuildCompact)     → renders "Built index — 43 files, 142 chunks"
+│   ├── Step(ScanCompact)     → renders Empty (leaf)
+│   ├── Step(ValidateCompact) → renders Empty (leaf)
+│   └── ...                   → renders Empty (leaf)
+├── Step(ReadIndexCompact) → renders Empty (leaf)
+├── Step(LoadModelCompact) → renders Empty (leaf)
+└── ...                    → renders Empty (leaf)
+```
+
+No summary strings needed on parent outcomes — the Build substep's compact outcome renders its own summary line. No data duplication between parent and child.
+
+Same command, verbose:
+
+```
+Scan: 43 files (15ms)
+Validate: 43 files, 0 violations (8ms)
+Classify: 43 to embed (1ms)
+Load model: minishlab/potion-base-8M (350ms)
+Embed: 43 files, 142 chunks (3200ms)
+Write index: ok (45ms)
+Built index — 43 files, 142 chunks
+
+Read index: 43 files, 142 chunks (8ms)
+Load model: skipped (reused)
+Embed query: ok (5ms)
+Execute search: 5 hits (12ms)
+Searched "rust patterns" — 5 hits
+
+╭───┬──────────────────────┬───────┬──────────────────╮
+│ 1 │ "notes/ownership.md" │ 0.872 │ lines 24-42:     │
+│   │                      │       │ Rust's owner...  │
+│ 2 │ "notes/lifetimes.md" │ 0.814 │ lines 10-28:     │
+│   │                      │       │ Lifetimes...     │
+╰───┴──────────────────────┴───────┴──────────────────╯
+5 hits | model: "minishlab/potion-base-8M" | limit: 10
+```
+
+### Verbose/compact as struct choice
+
+No `verbose: bool` parameter anywhere in rendering or serialization. The decision happens once at the top level — full tree or compact tree.
+
+### Exit code logic
+
+Exit codes are determined on the **full tree** (`Step<Outcome>`), **before** rendering and `to_compact()`. The flow in main.rs:
+
+```rust
+let step: Step<Outcome> = build::run(...).await;
+
+// 1. Compute exit codes on full tree
+let failed = has_failed(&step);
+let violations = has_violations(&step);
+
+// 2. Choose output format and render
+let output = match (format, verbose) {
+    (Text, true) => format_text(&step.render()),
+    (Text, false) => format_text(&step.to_compact().render()),
+    (Json, true) => serde_json::to_string_pretty(&step).unwrap(),
+    (Json, false) => serde_json::to_string_pretty(&step.to_compact()).unwrap(),
+};
+print!("{output}");
+
+// 3. Exit
+if failed { std::process::exit(2); }
+if violations { std::process::exit(1); }
+```
+
+`has_failed()` and `has_violations()` are free functions on `Step<Outcome>`, never called on compact trees. `to_compact()` is called only in main.rs dispatch — commands always return full trees.
+
+### Key principles
+
+1. **Uniform node type**: `Step<O>` — both leaf steps and commands. Leaves have `substeps: vec![]`.
+2. **Outcome carries ALL data**: no separate `*Result` structs. Each Outcome variant contains everything needed for its rendering.
+3. **Command outcomes are summaries**: aggregate substep data into a denormalized view. Small duplication is intentional.
+4. **Verbose/compact is a struct choice**: `Step<Outcome>` vs `Step<CompactOutcome>`. No `verbose` flag in Render or Serialize.
+5. **Generic Step**: one type, two instantiations. `to_compact()` converts recursively.
+6. **Leaf compact outcomes are silent**: leaf `CompactOutcome` variants return empty vecs (no blocks). Only command-level compact outcomes render summaries and tables.
+7. **No summary string duplication**: compact command outcomes don't carry summary strings for nested commands. The nested command's own compact outcome renders its summary line. Data flows through the tree, not through parent fields.
+8. **Consistent compact pattern**: every type has a compact counterpart. Full struct + compact struct + `From` impl. No exceptions, even when full and compact are identical.
+9. **Block-based rendering**: data types produce `Vec<Block>`. Shared formatters convert blocks to text, markdown, etc.
+10. **Self-rendering sub-structures**: every data type implements `Render`. Parents compose by collecting child blocks.
+11. **Skipped steps are explicit**: remaining steps after failure get `StepOutcome::Skipped`. Steps skipped due to !verbose also get Skipped.
+12. **Typed data flows outside the tree**: functional pipeline data (`ScannedFiles`, etc.) lives as local variables in `run()`. Outcome structs carry only aggregated/serializable data — never `ScannedFiles`, `MdvsToml`, or other functional pipeline types.
+13. **Errors are structured**: `StepError { kind, message }`. Validation violations are successful outcomes (`Ok`). Build aborting on violations is an error (`Err`). Errors live in `StepOutcome`, never in `Outcome` variants.
+14. **Pre-checks are steps**: config mutation, config change detection, dimension mismatch, index metadata read become leaf steps.
+
+### Rendering architecture
+
+Two layers: **Render** (what to show) and **Formatter** (how to show it).
+
+#### Block enum
+
+```rust
+enum Block {
+    /// A single line of text.
+    Line(String),
+    /// A table with optional headers and styled rows.
+    Table {
+        headers: Option<Vec<String>>,
+        rows: Vec<Vec<String>>,
+        style: TableStyle,
+    },
+    /// A labeled group of child blocks (for sections, grouping).
+    Section {
+        label: String,
+        children: Vec<Block>,
+    },
+}
+
+enum TableStyle {
+    /// No internal horizontal separators. For summary tables.
+    Compact,
+    /// Detail rows span all columns (via ColumnSpan). For per-item record tables.
+    Record {
+        detail_rows: Vec<usize>,
+    },
+}
+```
+
+The Render impl (on data types) decides the structure. The formatter handles mechanics (tabled for text, pipe tables for markdown).
+
+#### Render trait
+
+```rust
+trait Render {
+    fn render(&self) -> Vec<Block>;
+}
+```
+
+No parameters. The struct IS the decision.
+
+Leaf compact outcomes return empty vec (no blocks):
+```rust
+impl Render for ScanOutcomeCompact {
+    fn render(&self) -> Vec<Block> { vec![] }
+}
+```
+
+Full leaf outcomes render one-liners (without timing — Step injects it):
+```rust
+impl Render for ScanOutcome {
+    fn render(&self) -> Vec<Block> {
+        vec![Block::Line(format!("Scan: {} files", self.files_found))]
+    }
+}
+```
+
+Command outcomes render their summaries and tables:
+```rust
+impl Render for BuildOutcomeCompact {
+    fn render(&self) -> Vec<Block> {
+        vec![Block::Line(format!("Built index — {} files, {} chunks",
+            self.files_total, self.chunks_total))]
+    }
+}
+```
+
+Compact parents collect child compact rows into unified tables where appropriate:
+```rust
+impl Render for CheckOutcomeCompact {
+    fn render(&self) -> Vec<Block> {
+        vec![
+            Block::Line(format!("Checked {} files — {} violations", ...)),
+            Block::Table {
+                headers: Some(vec!["field".into(), "kind".into(), "files".into()]),
+                rows: self.violations.iter().map(|v| v.to_row()).collect(),
+                style: TableStyle::Compact,
+            },
+        ]
+    }
+}
+```
+
+#### Step renders recursively
+
+```rust
+impl<O: Render> Render for Step<O> {
+    fn render(&self) -> Vec<Block> {
+        let mut blocks = vec![];
+        for substep in &self.substeps {
+            blocks.extend(substep.render());
+        }
+
+        // Leaf steps (no substeps): inject elapsed_ms into the first Block::Line
+        let outcome_blocks = self.outcome.render();
+        if self.substeps.is_empty() {
+            if let Some(elapsed) = self.outcome.elapsed_ms() {
+                let mut injected = false;
+                for block in outcome_blocks {
+                    if !injected {
+                        if let Block::Line(text) = &block {
+                            blocks.push(Block::Line(format!("{text} ({elapsed}ms)")));
+                            injected = true;
+                            continue;
+                        }
+                    }
+                    blocks.push(block);
+                }
+            } else {
+                blocks.extend(outcome_blocks);
+            }
+        } else {
+            // Command steps: no timing injection, outcome renders as-is
+            blocks.extend(outcome_blocks);
+        }
+
+        blocks
+    }
+}
+
+impl<O> StepOutcome<O> {
+    /// Accessor for elapsed_ms (available on Complete, None on Skipped).
+    pub fn elapsed_ms(&self) -> Option<u64> {
+        match self {
+            Self::Complete { elapsed_ms, .. } => Some(*elapsed_ms),
+            Self::Skipped => None,
+        }
+    }
+}
+
+impl<O: Render> Render for StepOutcome<O> {
+    fn render(&self) -> Vec<Block> {
+        match self {
+            Self::Complete { result: Ok(outcome), .. } => outcome.render(),
+            Self::Complete { result: Err(e), .. } => {
+                vec![Block::Line(format!("Error: {}", e.message))]
+            }
+            Self::Skipped => vec![],
+        }
+    }
+}
+```
+
+**Timing strategy**: `elapsed_ms` lives in `StepOutcome`, not in Outcome structs. Leaf step outcomes render one-liners without timing ("Scan: 43 files"). `Step::render()` injects timing into the first `Block::Line` for leaf steps ("Scan: 43 files (15ms)"). Command outcomes don't get timing injected — their timing is the sum of substeps.
+
+#### Shared formatters
+
+```rust
+fn format_text(blocks: &[Block]) -> String {
+    // Line → print as-is
+    // Table(Compact) → tabled with style_compact()
+    // Table(Record) → tabled with style_record() + ColumnSpan on detail_rows
+    // Section → print label, indent children
+}
+
+fn format_markdown(blocks: &[Block]) -> String {
+    // Line → plain text
+    // Table → | pipe | table | with --- separators
+    // Section → ## header
+    // Empty → skip
+}
+```
+
+### JSON serialization
+
+`Step` hand-implements `Serialize` (does not derive) because `StepOutcome` requires custom serialization to flatten the `Result` nesting. Custom serialization on `StepOutcome`:
+
+```json
+{ "status": "complete", "elapsed_ms": 15, "outcome": { "Scan": { "files_found": 43 } } }
+{ "status": "failed", "elapsed_ms": 2, "error": { "kind": "user", "message": "..." } }
+{ "status": "skipped" }
+```
+
+Outcome uses externally tagged serialization (serde default): `{ "Build": { ... } }`. Box is transparent to serde.
+
+Verbose JSON: serialize `Step<Outcome>` — full tree with all substeps and full outcomes.
+Compact JSON: serialize `Step<CompactOutcome>` — same tree structure, compact outcomes at each node.
+
+### Verbose flag affecting pipeline steps
+
+Sometimes verbose causes extra pipeline work (e.g., reading chunk text from disk in search). This is a **pipeline decision**, not a rendering decision:
+
+```rust
+let chunk_text_step = if verbose {
+    run_read_chunk_text(&hits, path)
+} else {
+    Step { outcome: StepOutcome::Skipped, substeps: vec![] }
+};
+substeps.push(chunk_text_step);
+```
+
+### Validation violations
+
+Validation finding violations is a **successful outcome** for the validate step. `Outcome::Validate(ValidateOutcome { violations: vec![...] })` is `Ok(...)`.
+
+When build finds violations, it aborts. Build's outcome is `Err(StepError { kind: User, message: "3 violations found" })`. The violation detail is in the Validate substep's successful outcome.
+
+### Command return type
+
+Each command's `run()` returns `Step<Outcome>`:
+
+```rust
+pub fn run(path: &Path, ...) -> Step<Outcome> {
+    let mut substeps = Vec::new();
+    // ... run steps, push to substeps ...
+    Step {
+        substeps,
+        outcome: StepOutcome::Complete {
+            result: Ok(Outcome::Build(Box::new(BuildOutcome { ... }))),
+            elapsed_ms: total_elapsed,
+        },
+    }
+}
+```
+
+### Nesting
+
+When search auto-builds, build returns its `Step`. Search includes it as a substep. The build step renders identically whether standalone or nested inside search — no context metadata needed.
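
The uniform-node recursion can be sketched with simplified stand-ins. This is a minimal compile-checked illustration, not the real types: the outcome variants and field names below are hypothetical placeholders, and rendering is reduced to plain strings.

```rust
// Minimal sketch of the uniform Step node. A "command" is just a Step whose
// substeps vector is non-empty; it renders the same standalone or nested.
enum Outcome {
    Scan { files: u32 },
    Build { files: u32, chunks: u32 },
    Search { hits: u32 },
}

struct Step {
    substeps: Vec<Step>,
    outcome: Outcome,
}

// Depth-first rendering: substeps first, then the node's own summary line.
fn render(step: &Step) -> Vec<String> {
    let mut lines: Vec<String> = step.substeps.iter().flat_map(render).collect();
    lines.push(match &step.outcome {
        Outcome::Scan { files } => format!("Scan: {files} files"),
        Outcome::Build { files, chunks } => format!("Built index: {files} files, {chunks} chunks"),
        Outcome::Search { hits } => format!("Searched: {hits} hits"),
    });
    lines
}

fn main() {
    // Search auto-builds: the whole build Step nests as one substep.
    let build = Step {
        substeps: vec![Step { substeps: vec![], outcome: Outcome::Scan { files: 43 } }],
        outcome: Outcome::Build { files: 43, chunks: 142 },
    };
    let search = Step { substeps: vec![build], outcome: Outcome::Search { hits: 5 } };
    for line in render(&search) {
        println!("{line}");
    }
}
```

Because the build node knows nothing about its parent, the same `render(&build)` call produces the same lines whether or not it is nested under search.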
+
+### What gets deleted
+
+- All `pipeline/*.rs` step wrapper modules and their `*Output` structs
+- All per-command `*ProcessOutput`, `*CommandOutput`, `*Result` structs
+- `StepOutput` trait, `CommandOutput` trait
+- Per-command `has_failed_step()`, `format_text()`, `format_json()` implementations
+- `format_json_compact` helper
+
+### What gets added
+
+- `Step`, `StepOutcome`, `StepError` types (generic)
+- `Outcome` enum with named outcome structs per step/command
+- `CompactOutcome` enum with compact counterpart structs (consistent pattern for all variants)
+- `Block` enum, `TableStyle` enum, `Render` trait
+- `Render` implementations on all outcome structs (full and compact) and sub-structures
+- Shared formatters: `format_text()`, `format_markdown()`
+- Recursive `has_failed`, `has_violations` on `Step<Outcome>`
+- `#[cfg(test)]` convenience methods for test ergonomics
+
+### What stays
+
+- Core functions (`ScannedFiles::scan()`, `check::validate()`, `classify_files()`, etc.)
+- Shared sub-types (`FieldViolation`, `SearchHit`, `BuildFileDetail`, `DiscoveredField`, etc.) — now with compact counterparts and `Render` impls
+- Table style helpers in `src/table.rs` (used by `format_text` formatter)
+
+### Test helpers
+
+Test-only convenience methods, compiled out of production builds:
+
+```rust
+#[cfg(test)]
+impl Step<Outcome> {
+    pub fn unwrap_build(&self) -> &BuildOutcome { ... }
+    pub fn unwrap_search(&self) -> &SearchOutcome { ... }
+    pub fn unwrap_error(&self) -> &StepError { ... }
+    pub fn is_skipped(&self) -> bool { ... }
+    pub fn has_failed(&self) -> bool { ... }
+    // one unwrap_* per Outcome variant
+}
+```
+
+### Exit codes
+
+```rust
+fn has_failed(step: &Step<Outcome>) -> bool {
+    step.substeps.iter().any(|s| has_failed(s))
+        || matches!(step.outcome, StepOutcome::Complete { result: Err(_), .. })
+}
+
+fn has_violations(step: &Step<Outcome>) -> bool {
+    step.substeps.iter().any(|s| has_violations(s))
+        || step.outcome.contains_violations()
+}
+
+impl Outcome {
+    fn contains_violations(&self) -> bool {
+        match self {
+            Outcome::Validate(v) => !v.violations.is_empty(),
+            Outcome::Check(c) => !c.violations.is_empty(),
+            _ => false,
+        }
+    }
+}
+```
+
+- Exit 0: success
+- Exit 1: validation violations (check/build)
+- Exit 2: user error
+- Exit 3: application error
+
+## Interaction with other TODOs
+
+- **Supersedes TODO-0110**: same problem, different solution.
+- **Blocks TODO-0100**: text output redesign becomes about designing the specific blocks each command produces.
+- **Blocks TODO-0101**: markdown format = one new formatter function.
+- **Affects TODO-0088**: exit code logic via recursive Step tree inspection.
+
+## Follow-up TODOs (to be created after implementation)
+
+- **Macro for compact struct generation**: use `crabtime` to auto-generate `*Compact` structs and `From` impls from annotated full structs. Deferred until the pattern is stable.
+- **Macro for step pipeline boilerplate**: the early-return pattern in `run()` (call step → push substep → check error → push Skipped for remaining → return) repeats ~10 times in build. A declarative macro could reduce each step to one line. Deferred until the manual pattern is proven stable.
+
+## Resolved questions
+
+- **Leaf step timing**: `Step::render()` injects `elapsed_ms` into the first `Block::Line` of leaf steps via `StepOutcome::elapsed_ms()` accessor. Outcome structs don't carry timing. The Render trait stays parameterless.
+- **Block::Empty**: removed from the enum. Leaf compact outcomes return `vec![]` (empty vec). Parents extend with empty vec — no-op. No placeholder block needed.
+- **Serde on Step**: `Step` hand-implements `Serialize` (not derive) because `StepOutcome` has custom serialization. The custom impl delegates field-by-field.
+- **Skipped step boilerplate**: same volume as today, different shape. Accept for now, macro deferred as follow-up TODO.
+
+## Migration path
+
+Convert in reverse dependency order (leaves first):
+
+1. **Infrastructure**: `Step`, `StepOutcome`, `Outcome`, `CompactOutcome`, `Block`, `Render`, formatters
+2. **Independent commands**: clean, info, check, init (no command-to-command deps)
+3. **update** (called by build)
+4. **build + search together** (tight coupling via auto-build/auto-update nesting)
+
+## Files affected
+
+- `src/step.rs` (new) — `Step`, `StepOutcome`, `StepError`
+- `src/outcome/` (new directory) — `Outcome`, `CompactOutcome` enums + all named outcome structs + compact counterparts
+- `src/block.rs` (new) — `Block`, `TableStyle`, `Render` trait
+- `src/render.rs` (new) — shared formatters (`format_text`, `format_markdown`)
+- `src/pipeline/*.rs` — delete all step wrapper modules
+- `src/cmd/*.rs` — rewrite to build `Step` trees
+- `src/output.rs` — simplify (remove `CommandOutput` trait; shared sub-types get compact counterparts)
+- `src/main.rs` — dispatch via `Step` + exit code logic + `to_compact()` choice
diff --git a/docs/spec/todos/TODO-0120.md b/docs/spec/todos/TODO-0120.md
new file mode 100644
index 0000000..4436e64
--- /dev/null
+++ b/docs/spec/todos/TODO-0120.md
@@ -0,0 +1,61 @@
+---
+id: 120
+title: "Step tree: core types — Step, StepOutcome, StepError"
+status: todo
+priority: high
+created: 2026-03-19
+depends_on: [119]
+blocks: [121, 122, 125, 126, 127, 128, 129, 130, 131]
+---
+
+# TODO-0120: Step tree: core types — Step, StepOutcome, StepError
+
+## Summary
+
+Create the foundational generic `Step` type and its supporting types. This is the first piece of the TODO-0119 architecture — everything else builds on it.
+
+## Details
+
+Create `src/step.rs` with:
+
+### Types
+
+- `Step<O>` — generic tree node with `substeps: Vec<Step<O>>` and `outcome: StepOutcome<O>`
+- `StepOutcome<O>` — enum: `Complete { result: Result<O, StepError>, elapsed_ms: u64 }` | `Skipped`
+- `StepError` — struct: `kind: ErrorKind` + `message: String`
+- `ErrorKind` — enum: `User` | `Application`
+- Type aliases: `type FullStep = Step<Outcome>`, `type CompactStep = Step<CompactOutcome>`
+
+### Methods
+
+- `StepOutcome::elapsed_ms(&self) -> Option<u64>` — accessor
+- `Step<Outcome>::to_compact(&self) -> Step<CompactOutcome>` — recursive conversion (calls `outcome.to_compact(&self.substeps)`)
+
+### Free functions
+
+- `has_failed(step: &Step<Outcome>) -> bool` — recursive, checks for `Err` in any node
+- `has_violations(step: &Step<Outcome>) -> bool` — recursive, delegates to `Outcome::contains_violations()`
+
+### Test helpers (`#[cfg(test)]`)
+
+- `Step::unwrap_build() -> &BuildOutcome`
+- `Step::unwrap_search() -> &SearchOutcome`
+- `Step::unwrap_check() -> &CheckOutcome`
+- `Step::unwrap_init() -> &InitOutcome`
+- `Step::unwrap_update() -> &UpdateOutcome`
+- `Step::unwrap_info() -> &InfoOutcome`
+- `Step::unwrap_clean() -> &CleanOutcome`
+- `Step::unwrap_error() -> &StepError`
+- `Step::is_skipped() -> bool`
+- One `unwrap_*` per leaf Outcome variant
+
+### Notes
+
+- `to_compact()` requires `Outcome` and `CompactOutcome` to exist (TODO-0122), but the method signature and stub can be written first.
+- `StepError` must `derive(Clone)` for the `Err` branch of `to_compact()`.
+- `Step` does NOT derive `Serialize` — hand-impl comes in TODO-0124.
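
The shapes specced above can be compile-checked with a minimal self-contained sketch. This is an illustration only: the outcome parameter is left fully generic, and `()` stands in for the real `Outcome` enum that lands in TODO-0122.

```rust
// Sketch of the generic core types: Step<O>, StepOutcome<O>, StepError,
// ErrorKind, plus the recursive has_failed free function.
pub enum ErrorKind { User, Application }

pub struct StepError {
    pub kind: ErrorKind,
    pub message: String,
}

pub enum StepOutcome<O> {
    Complete { result: Result<O, StepError>, elapsed_ms: u64 },
    Skipped,
}

pub struct Step<O> {
    pub substeps: Vec<Step<O>>,
    pub outcome: StepOutcome<O>,
}

impl<O> StepOutcome<O> {
    // Some on Complete, None on Skipped.
    pub fn elapsed_ms(&self) -> Option<u64> {
        match self {
            Self::Complete { elapsed_ms, .. } => Some(*elapsed_ms),
            Self::Skipped => None,
        }
    }
}

// Recursive failure check: true if any node in the tree carries an Err.
pub fn has_failed<O>(step: &Step<O>) -> bool {
    step.substeps.iter().any(|s| has_failed(s))
        || matches!(step.outcome, StepOutcome::Complete { result: Err(_), .. })
}

fn main() {
    let failed_leaf: Step<()> = Step {
        substeps: vec![],
        outcome: StepOutcome::Complete {
            result: Err(StepError { kind: ErrorKind::User, message: "3 violations found".into() }),
            elapsed_ms: 2,
        },
    };
    let root = Step {
        substeps: vec![failed_leaf],
        outcome: StepOutcome::Complete { result: Ok(()), elapsed_ms: 10 },
    };
    println!("failed: {}", has_failed(&root)); // failure propagates from the leaf
    println!("elapsed: {:?}", root.outcome.elapsed_ms());
}
```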
+
+## Files
+
+- `src/step.rs` (new)
+- `src/lib.rs` (add `pub mod step`)
diff --git a/docs/spec/todos/TODO-0121.md b/docs/spec/todos/TODO-0121.md
new file mode 100644
index 0000000..cf0b491
--- /dev/null
+++ b/docs/spec/todos/TODO-0121.md
@@ -0,0 +1,51 @@
+---
+id: 121
+title: "Step tree: Block enum and Render trait"
+status: todo
+priority: high
+created: 2026-03-19
+depends_on: [120]
+blocks: [123, 125, 126, 127, 128, 129, 130, 131]
+---
+
+# TODO-0121: Step tree: Block enum and Render trait
+
+## Summary
+
+Create the rendering primitive types and the `Render` trait. This is the abstraction layer between data and formatters.
+
+## Details
+
+Create `src/block.rs` with:
+
+### Types
+
+- `Block` enum: `Line(String)` | `Table { headers, rows, style }` | `Section { label, children }`
+- `TableStyle` enum: `Compact` | `Record { detail_rows: Vec<usize> }`
+
+### Render trait
+
+```rust
+pub trait Render {
+    fn render(&self) -> Vec<Block>;
+}
+```
+
+No parameters. The struct IS the verbose/compact decision.
+
+### Render impls on Step types
+
+- `impl<O: Render> Render for Step<O>` — recursive: render substeps, then render own outcome. For leaf steps (no substeps), inject `elapsed_ms` into the first `Block::Line`.
+- `impl<O: Render> Render for StepOutcome<O>` — match on Complete(Ok) → delegate to outcome, Complete(Err) → error line, Skipped → empty vec.
+
+### Notes
+
+- The timing injection logic: only leaf steps (substeps.is_empty()) get elapsed_ms appended to their first Block::Line. Command outcomes don't get timing.
+- Block derives `Debug`, `Clone`.
+- TableStyle derives `Debug`, `Clone`.
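
The Render trait and the leaf timing-injection rule can be sketched as follows. This is a simplified illustration, not the planned implementation: only the `Line` variant of `Block` is modeled, `ScanOutcome` is a hypothetical leaf outcome, and the injection lives in a free function rather than `Step::render()`.

```rust
// Sketch of Render plus timing injection for leaf steps.
#[derive(Debug, Clone, PartialEq)]
enum Block {
    Line(String),
}

trait Render {
    fn render(&self) -> Vec<Block>;
}

struct ScanOutcome { files_found: u32 }

impl Render for ScanOutcome {
    // Leaf outcomes render without timing; the Step wrapper injects it.
    fn render(&self) -> Vec<Block> {
        vec![Block::Line(format!("Scan: {} files", self.files_found))]
    }
}

// Leaf steps only: append "(Nms)" to the first Line block, pass the rest through.
fn inject_timing(blocks: Vec<Block>, elapsed_ms: u64) -> Vec<Block> {
    let mut out = Vec::new();
    let mut injected = false;
    for block in blocks {
        if !injected {
            if let Block::Line(text) = &block {
                out.push(Block::Line(format!("{text} ({elapsed_ms}ms)")));
                injected = true;
                continue;
            }
        }
        out.push(block);
    }
    out
}

fn main() {
    let scan = ScanOutcome { files_found: 43 };
    let blocks = inject_timing(scan.render(), 15);
    println!("{blocks:?}");
}
```

Note the `&block` borrow in the injection loop: matching the `Line` by reference lets the non-injected blocks still be moved into the output vector afterwards.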
+
+## Files
+
+- `src/block.rs` (new)
+- `src/step.rs` (add Render impls for Step and StepOutcome)
+- `src/lib.rs` (add `pub mod block`)
diff --git a/docs/spec/todos/TODO-0122.md b/docs/spec/todos/TODO-0122.md
new file mode 100644
index 0000000..73984e2
--- /dev/null
+++ b/docs/spec/todos/TODO-0122.md
@@ -0,0 +1,78 @@
+---
+id: 122
+title: "Step tree: Outcome enums and all outcome structs"
+status: todo
+priority: high
+created: 2026-03-19
+depends_on: [120, 121]
+blocks: [124, 125, 126, 127, 128, 129, 130, 131]
+---
+
+# TODO-0122: Step tree: Outcome enums and all outcome structs
+
+## Summary
+
+Create the `Outcome` and `CompactOutcome` enums with all named outcome structs, compact counterparts, `From` impls, and `Render` impls. This is the largest infrastructure piece.
+
+## Details
+
+Create `src/outcome/` directory with `mod.rs` and per-step files.
+
+### Outcome enum (24 variants)
+
+Leaf steps:
+- `Scan(ScanOutcome)`, `Infer(InferOutcome)`, `ReadConfig(ReadConfigOutcome)`, `WriteConfig(WriteConfigOutcome)`, `Validate(ValidateOutcome)`, `Classify(ClassifyOutcome)`, `LoadModel(LoadModelOutcome)`, `EmbedFiles(EmbedFilesOutcome)`, `EmbedQuery(EmbedQueryOutcome)`, `ReadIndex(ReadIndexOutcome)`, `ReadIndexMetadata(ReadIndexMetadataOutcome)`, `WriteIndex(WriteIndexOutcome)`, `ExecuteSearch(ExecuteSearchOutcome)`, `DeleteIndex(DeleteIndexOutcome)`, `MutateConfig(MutateConfigOutcome)`, `CheckConfigChanged(CheckConfigChangedOutcome)`, `ReadChunkText(ReadChunkTextOutcome)`
+
+Command-level (Box'd):
+- `Init(Box<InitOutcome>)`, `Update(Box<UpdateOutcome>)`, `Check(Box<CheckOutcome>)`, `Build(Box<BuildOutcome>)`, `Search(Box<SearchOutcome>)`, `Info(Box<InfoOutcome>)`, `Clean(CleanOutcome)`
+
+### CompactOutcome enum (mirrors Outcome)
+
+Same 24 variants with `*Compact` inner types. For leaf steps where full=compact, use the same struct or a type alias.
+
+### Outcome structs (full + compact pairs)
+
+Each pair: `derive(Debug, Serialize)` on both, `From<&Full> for Compact` impl. ~34 `From` impls total (24 outcome pairs + ~10 sub-structure pairs).
+
+### Sub-structure compact counterparts
+
+- `FieldViolationCompact` (field + kind + count, no file paths)
+- `SearchHitCompact` (filename + score, no chunk_text/lines)
+- `NewFieldCompact` (name + count, no file list)
+- `BuildFileDetailCompact` (filename + chunks — may be identical to full)
+- `DiscoveredFieldCompact` (name + type + count, no globs)
+- `ChangedFieldCompact` (name + change labels, no old/new values)
+- `RemovedFieldCompact` (name, no globs)
+
+### Render impls
+
+- Full leaf outcomes: render one-liner (without timing — Step injects it)
+- Compact leaf outcomes: render empty vec (silent)
+- Full command outcomes: render summary + detail tables
+- Compact command outcomes: render summary + compact tables
+
+### Key methods
+
+- `Outcome::to_compact(&self, substeps: &[Step<Outcome>]) -> CompactOutcome` — ~50-line match
+- `Outcome::contains_violations(&self) -> bool` — for exit code logic
+
+### Notes
+
+- Outcome structs carry only aggregated/serializable data — never ScannedFiles, MdvsToml, or other functional pipeline types.
+- Outcome and CompactOutcome derive `Serialize` with externally tagged (serde default).
+- Command outcomes may need to read substep data during `to_compact()` for summaries.
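
One full/compact pair with its `From<&Full>` impl can be sketched like this. The field names are illustrative placeholders (the real `SearchHit` definition lives in `src/output.rs` and may differ):

```rust
// Sketch of the full struct + compact struct + From<&Full> pattern.
#[derive(Debug, Clone)]
struct SearchHit {
    filename: String,
    score: f32,
    chunk_text: String, // heavy detail, dropped in the compact counterpart
}

#[derive(Debug, Clone, PartialEq)]
struct SearchHitCompact {
    filename: String,
    score: f32,
}

impl From<&SearchHit> for SearchHitCompact {
    fn from(full: &SearchHit) -> Self {
        // Only the summary columns survive compaction.
        Self { filename: full.filename.clone(), score: full.score }
    }
}

fn main() {
    let hit = SearchHit {
        filename: "notes/ownership.md".into(),
        score: 0.872,
        chunk_text: "Rust's ownership model ...".into(),
    };
    let compact = SearchHitCompact::from(&hit);
    println!("{compact:?}");
}
```

Taking `&SearchHit` rather than `SearchHit` matches the `to_compact(&self)` flow: the full tree is borrowed, never consumed, so exit-code checks can still run on it afterwards.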
+ +## Files + +- `src/outcome/mod.rs` (new) — Outcome, CompactOutcome enums, to_compact(), contains_violations() +- `src/outcome/scan.rs` (new) — ScanOutcome, ScanOutcomeCompact, Render impls +- `src/outcome/infer.rs` (new) +- `src/outcome/config.rs` (new) — ReadConfig, WriteConfig, MutateConfig, CheckConfigChanged +- `src/outcome/validate.rs` (new) +- `src/outcome/classify.rs` (new) +- `src/outcome/model.rs` (new) — LoadModel +- `src/outcome/embed.rs` (new) — EmbedFiles, EmbedQuery +- `src/outcome/index.rs` (new) — ReadIndex, ReadIndexMetadata, WriteIndex, DeleteIndex +- `src/outcome/search.rs` (new) — ExecuteSearch, ReadChunkText +- `src/outcome/commands.rs` (new) — Init, Update, Check, Build, Search, Info, Clean outcomes +- `src/lib.rs` (add `pub mod outcome`) diff --git a/docs/spec/todos/TODO-0123.md b/docs/spec/todos/TODO-0123.md new file mode 100644 index 0000000..453d4e0 --- /dev/null +++ b/docs/spec/todos/TODO-0123.md @@ -0,0 +1,44 @@ +--- +id: 123 +title: "Step tree: shared formatters (format_text, format_markdown)" +status: todo +priority: high +created: 2026-03-19 +depends_on: [121] +blocks: [125, 126, 127, 128, 129, 130, 131] +--- + +# TODO-0123: Step tree: shared formatters (format_text, format_markdown) + +## Summary + +Create the shared formatter functions that consume `Vec<Block>` and produce formatted output strings. These are the "how to show" layer — commands never contain format-specific code. + +## Details + +Create `src/render.rs` with: + +### format_text(blocks: &[Block]) -> String + +- `Block::Line(s)` → print as-is with newline +- `Block::Table { style: Compact, .. }` → tabled crate with `style_compact()` from `src/table.rs` +- `Block::Table { style: Record { detail_rows }, ..
}` → tabled crate with `style_record()` + ColumnSpan on specified detail row indices +- `Block::Section { label, children }` → print label, indent each child block by 2 spaces + +### format_markdown(blocks: &[Block]) -> String + +- `Block::Line(s)` → print as-is with newline +- `Block::Table { .. }` → standard markdown pipe table with `|` columns and `---` header separators +- `Block::Section { label, children }` → `## label` header, then render children + +### Notes + +- Both formatters use the existing `style_compact()` and `style_record()` helpers from `src/table.rs`. +- `format_markdown` is needed for TODO-0101 but can be stubbed initially. +- Formatters must handle empty block vecs gracefully (return empty string). +- Section indentation: each line of each child block's rendered text gets 2-space indent. + +## Files + +- `src/render.rs` (new) +- `src/lib.rs` (add `pub mod render`) diff --git a/docs/spec/todos/TODO-0124.md b/docs/spec/todos/TODO-0124.md new file mode 100644 index 0000000..14a9393 --- /dev/null +++ b/docs/spec/todos/TODO-0124.md @@ -0,0 +1,61 @@ +--- +id: 124 +title: "Step tree: custom Serialize on Step and StepOutcome" +status: todo +priority: high +created: 2026-03-19 +depends_on: [120, 122] +blocks: [125, 126, 127, 128, 129, 130, 131] +--- + +# TODO-0124: Step tree: custom Serialize on Step and StepOutcome + +## Summary + +Hand-implement `Serialize` for `Step` and `StepOutcome` to produce clean JSON output. `Step` cannot derive `Serialize` because `StepOutcome` requires custom serialization to flatten the `Result` nesting. + +## Details + +### StepOutcome custom Serialize + +Flatten `Result` into a discriminated object: + +```json +{ "status": "complete", "elapsed_ms": 15, "outcome": { "Scan": { "files_found": 43 } } } +{ "status": "failed", "elapsed_ms": 2, "error": { "kind": "user", "message": "..." } } +{ "status": "skipped" } +``` + +Implementation uses `serialize_struct` with conditional fields based on the variant. 
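The flattening can be sketched without serde as plain string assembly — a toy model only (the real implementation uses `serialize_struct`; here the error is simplified to a bare message string and the outcome payload is assumed to be pre-serialized JSON):

```rust
// Toy model of the StepOutcome flattening: Complete's inner Result collapses
// into a "status" discriminant plus either an "outcome" or an "error" field.
enum StepOutcome {
    Complete { result: Result<String, String>, elapsed_ms: u64 },
    Skipped,
}

fn to_json(o: &StepOutcome) -> String {
    match o {
        StepOutcome::Complete { result: Ok(outcome), elapsed_ms } => format!(
            r#"{{ "status": "complete", "elapsed_ms": {elapsed_ms}, "outcome": {outcome} }}"#
        ),
        StepOutcome::Complete { result: Err(message), elapsed_ms } => format!(
            r#"{{ "status": "failed", "elapsed_ms": {elapsed_ms}, "error": "{message}" }}"#
        ),
        StepOutcome::Skipped => r#"{ "status": "skipped" }"#.to_string(),
    }
}
```

The point of the sketch is the conditional shape: the serialized object never nests a `Result`, it exposes `status` and only the fields that variant carries.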
+ +### Step custom Serialize + +Hand-impl that delegates field-by-field: + +```rust +impl<O: Serialize> Serialize for Step<O> { + fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> { + let mut state = serializer.serialize_struct("Step", 2)?; + state.serialize_field("substeps", &self.substeps)?; + state.serialize_field("outcome", &self.outcome)?; + state.end() + } +} +``` + +This works recursively: `Vec<Step<O>>` serializes each element via the same hand-impl. + +### Outcome and CompactOutcome + +These use `derive(Serialize)` with externally tagged (serde default). Each variant serializes as `{ "VariantName": { fields } }`. Box is transparent to serde. + +### Notes + +- `StepError` derives `Serialize` normally. +- `ErrorKind` derives `Serialize` with `#[serde(rename_all = "snake_case")]`. +- Verbose JSON: `serde_json::to_string_pretty(&step)` where step is `Step<Outcome>`. +- Compact JSON: `serde_json::to_string_pretty(&step.to_compact())` where result is `Step<CompactOutcome>`. + +## Files + +- `src/step.rs` (add Serialize impls) diff --git a/docs/spec/todos/TODO-0125.md b/docs/spec/todos/TODO-0125.md new file mode 100644 index 0000000..dd2629f --- /dev/null +++ b/docs/spec/todos/TODO-0125.md @@ -0,0 +1,44 @@ +--- +id: 125 +title: "Step tree: convert clean command" +status: todo +priority: high +created: 2026-03-19 +depends_on: [120, 121, 122, 123, 124] +blocks: [129] +--- + +# TODO-0125: Step tree: convert clean command + +## Summary + +Convert the `clean` command to the new Step tree architecture. Simplest command — one leaf step (delete_index). Good first test of the full pipeline. + +## Details + +### Delete + +- `CleanProcessOutput`, `CleanCommandOutput`, `CleanResult` structs +- `CommandOutput` impl on `CleanResult` and `CleanCommandOutput` +- `has_failed_step()` on `CleanCommandOutput` + +### Rewrite + +- `clean::run()` returns `Step` instead of `CleanCommandOutput` +- Single substep: `DeleteIndex` step +- Command outcome: `Outcome::Clean(CleanOutcome { ...
})` +- `CleanOutcome` and `CleanOutcomeCompact` already defined in TODO-0122 + +### Tests + +- Rewrite all tests in `src/cmd/clean.rs` to use `#[cfg(test)]` helpers (`unwrap_clean()`, `has_failed()`, `is_skipped()`) + +### Verification + +- `cargo test` — all clean tests pass +- `cargo clippy` + `cargo fmt` +- Manual: `cargo run -- clean example_kb` produces correct text and JSON output (both verbose and compact) + +## Files + +- `src/cmd/clean.rs` (rewrite run(), delete old structs, rewrite tests) diff --git a/docs/spec/todos/TODO-0126.md b/docs/spec/todos/TODO-0126.md new file mode 100644 index 0000000..308bd98 --- /dev/null +++ b/docs/spec/todos/TODO-0126.md @@ -0,0 +1,39 @@ +--- +id: 126 +title: "Step tree: convert info command" +status: todo +priority: high +created: 2026-03-19 +depends_on: [120, 121, 122, 123, 124] +blocks: [129] +--- + +# TODO-0126: Step tree: convert info command + +## Summary + +Convert the `info` command to the new Step tree architecture. Simple query/display command with 2-3 leaf steps. 
+ +## Details + +### Delete + +- `InfoProcessOutput`, `InfoCommandOutput`, `InfoResult` structs +- `CommandOutput` impl on `InfoResult` and `InfoCommandOutput` + +### Rewrite + +- `info::run()` returns `Step` instead of `InfoCommandOutput` +- Substeps: ReadConfig, Scan (optional), ReadIndex (optional) +- Command outcome: `Outcome::Info(Box<InfoOutcome>)` +- Define `InfoOutcome` fields: scan_glob, files_on_disk, fields (Vec), ignored_fields, index status +- Define `InfoOutcomeCompact` fields: files_on_disk, field_count, has_index + +### Tests + +- Rewrite tests to use `#[cfg(test)]` helpers + +## Files + +- `src/cmd/info.rs` (rewrite run(), delete old structs, rewrite tests) +- `src/outcome/commands.rs` (add InfoOutcome, InfoOutcomeCompact if not already defined) diff --git a/docs/spec/todos/TODO-0127.md b/docs/spec/todos/TODO-0127.md new file mode 100644 index 0000000..73dc5e8 --- /dev/null +++ b/docs/spec/todos/TODO-0127.md @@ -0,0 +1,47 @@ +--- +id: 127 +title: "Step tree: convert check command" +status: todo +priority: high +created: 2026-03-19 +depends_on: [120, 121, 122, 123, 124] +blocks: [129] +--- + +# TODO-0127: Step tree: convert check command + +## Summary + +Convert the `check` command to the new Step tree architecture. Medium complexity — has violations rendering, new fields, and optional auto-update nesting.
+ +## Details + +### Delete + +- `CheckProcessOutput`, `CheckCommandOutput`, `CheckResult` structs +- `CommandOutput` impl on `CheckResult` and `CheckCommandOutput` + +### Keep + +- `check::validate()` core function — stays as-is, both check and build call it + +### Rewrite + +- `check::run()` returns `Step` instead of `CheckCommandOutput` +- Substeps: ReadConfig, auto-update (nested `Step` from `update::run()` — only if update is already converted, otherwise stub), Scan, Validate +- Command outcome: `Outcome::Check(Box<CheckOutcome>)` +- `CheckOutcome` carries `files_checked`, `violations: Vec<FieldViolation>`, `new_fields: Vec<NewField>` +- `CheckOutcomeCompact` carries `files_checked`, `violations: Vec<FieldViolationCompact>`, `new_fields: Vec<NewFieldCompact>` — self-sufficient for rendering since Validate substep is silent in compact + +### Render impls + +- `CheckOutcome::render()` — one-liner + violation record tables + new field tables +- `CheckOutcomeCompact::render()` — one-liner + compact violation table + compact new field table + +### Tests + +- Rewrite tests to use `#[cfg(test)]` helpers (`unwrap_check()`, `has_failed()`) + +## Files + +- `src/cmd/check.rs` (rewrite run(), delete old structs, keep validate(), rewrite tests) diff --git a/docs/spec/todos/TODO-0128.md b/docs/spec/todos/TODO-0128.md new file mode 100644 index 0000000..845ac1a --- /dev/null +++ b/docs/spec/todos/TODO-0128.md @@ -0,0 +1,43 @@ +--- +id: 128 +title: "Step tree: convert init command" +status: todo +priority: high +created: 2026-03-19 +depends_on: [120, 121, 122, 123, 124] +blocks: [129] +--- + +# TODO-0128: Step tree: convert init command + +## Summary + +Convert the `init` command to the new Step tree architecture. Simple pipeline — scan, infer, write_config.
+ +## Details + +### Delete + +- `InitProcessOutput`, `InitCommandOutput`, `InitResult` structs +- `CommandOutput` impl on `InitResult` and `InitCommandOutput` + +### Rewrite + +- `init::run()` returns `Step` instead of `InitCommandOutput` +- Substeps: Scan, Infer, WriteConfig (Skipped if dry_run) +- Command outcome: `Outcome::Init(Box<InitOutcome>)` +- `InitOutcome` carries `path`, `files_scanned`, `fields: Vec<DiscoveredField>`, `dry_run` +- `InitOutcomeCompact` carries `path`, `files_scanned`, `field_count`, `dry_run` + +### Render impls + +- `InitOutcome::render()` — one-liner + per-field record tables with detail (allowed/required/nullable/hints) +- `InitOutcomeCompact::render()` — one-liner + compact field table + +### Tests + +- Rewrite tests to use `#[cfg(test)]` helpers (`unwrap_init()`, `has_failed()`) + +## Files + +- `src/cmd/init.rs` (rewrite run(), delete old structs, rewrite tests) diff --git a/docs/spec/todos/TODO-0129.md b/docs/spec/todos/TODO-0129.md new file mode 100644 index 0000000..0a001dd --- /dev/null +++ b/docs/spec/todos/TODO-0129.md @@ -0,0 +1,44 @@ +--- +id: 129 +title: "Step tree: convert update command" +status: todo +priority: high +created: 2026-03-19 +depends_on: [125, 126, 127, 128] +blocks: [130] +--- + +# TODO-0129: Step tree: convert update command + +## Summary + +Convert the `update` command to the new Step tree architecture. Must be done before build (which nests update as auto-update).
+ +## Details + +### Delete + +- `UpdateProcessOutput`, `UpdateCommandOutput`, `UpdateResult` structs +- `CommandOutput` impl on `UpdateResult` and `UpdateCommandOutput` + +### Rewrite + +- `update::run()` returns `Step` instead of `UpdateCommandOutput` +- Substeps: ReadConfig, Scan, Infer, WriteConfig (Skipped if dry_run or no changes) +- Field comparison logic stays inline in `run()` (not extracted as a step) +- Command outcome: `Outcome::Update(Box<UpdateOutcome>)` +- `UpdateOutcome` carries `files_scanned`, `added: Vec<DiscoveredField>`, `changed: Vec<ChangedField>`, `removed: Vec<RemovedField>`, `unchanged: usize`, `dry_run` +- `UpdateOutcomeCompact` carries `files_scanned`, `added_count`, `changed_count`, `removed_count`, `unchanged`, `dry_run` + +### Render impls + +- `UpdateOutcome::render()` — one-liner + per-category tables (added, changed, removed) with detail rows +- `UpdateOutcomeCompact::render()` — one-liner + compact change summary tables + +### Tests + +- Rewrite tests to use `#[cfg(test)]` helpers (`unwrap_update()`, `has_failed()`) + +## Files + +- `src/cmd/update.rs` (rewrite run(), delete old structs, rewrite tests) diff --git a/docs/spec/todos/TODO-0130.md b/docs/spec/todos/TODO-0130.md new file mode 100644 index 0000000..7f9b840 --- /dev/null +++ b/docs/spec/todos/TODO-0130.md @@ -0,0 +1,65 @@ +--- +id: 130 +title: "Step tree: convert build + search commands" +status: todo +priority: high +created: 2026-03-19 +depends_on: [129] +blocks: [131] +--- + +# TODO-0130: Step tree: convert build + search commands + +## Summary + +Convert `build` and `search` commands together to the new Step tree architecture. These are tightly coupled — search nests build (auto-build), build nests update (auto-update). Most complex piece of the migration.
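The nesting relationship above can be sketched with a toy tree — the `label` field stands in for the real `Outcome` tag, and all names here are illustrative assumptions:

```rust
// Toy sketch of command nesting: the parent command pushes the child
// command's entire Step tree as a single substep, so auto-update survives
// intact inside build (and, the same way, auto-build inside search).
struct Step {
    label: &'static str,
    substeps: Vec<Step>,
}

fn update_run() -> Step {
    Step {
        label: "update",
        substeps: vec![Step { label: "scan", substeps: vec![] }],
    }
}

fn build_run() -> Step {
    let auto_update = update_run(); // nested whole, never flattened
    Step {
        label: "build",
        substeps: vec![
            auto_update,
            Step { label: "embed_files", substeps: vec![] },
        ],
    }
}
```

Because the child tree is stored whole, recursive rendering and serialization get the nested command's substeps for free.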
+ +## Details + +### Build command + +#### Delete +- `BuildProcessOutput`, `BuildCommandOutput`, `BuildResult` structs +- `CommandOutput` impl on `BuildResult` and `BuildCommandOutput` +- `has_violations()` on `BuildCommandOutput` + +#### Rewrite +- `build::run()` returns `Step` +- Substeps: ReadConfig, auto-update (nested Step from `update::run()`), MutateConfig, Scan, Validate, ReadIndexMetadata (new — split from classify), CheckConfigChanged (new), Classify, LoadModel, EmbedFiles, WriteIndex +- Violations: Validate substep succeeds with violations as Ok data. Build aborts with `Err(StepError { kind: User, message: "N violations found" })`. +- `BuildOutcome` carries full rebuild stats + per-file details +- `BuildOutcomeCompact` carries summary stats only + +#### New leaf steps +- `MutateConfig` — side-effect step: fills missing config sections, applies --set-* flags. Outcome: `MutateConfigOutcome { sections_added: usize }` +- `ReadIndexMetadata` — reads parquet metadata without full file index. Outcome: `ReadIndexMetadataOutcome { model, revision, chunk_size }` +- `CheckConfigChanged` — compares toml config vs parquet metadata. 
Outcome: `CheckConfigChangedOutcome { changed: bool }` + +### Search command + +#### Delete +- `SearchProcessOutput`, `SearchCommandOutput`, `SearchResult` structs +- `CommandOutput` impl on `SearchResult` and `SearchCommandOutput` + +#### Rewrite +- `search::run()` returns `Step` +- Substeps: ReadConfig, auto-build (nested Step from `build::run()`), ReadIndex, LoadModel, EmbedQuery, ExecuteSearch, ReadChunkText (Skipped if !verbose) +- `SearchOutcome` carries query, hits (Vec<SearchHit>), model_name, limit +- `SearchOutcomeCompact` carries query, hits (Vec<SearchHitCompact>), model_name, limit + +### Tests + +- Rewrite all build and search tests to use `#[cfg(test)]` helpers +- End-to-end tests: init → build → search pipeline + +### Verification + +- `cargo test` — all 245+ tests pass +- `cargo clippy` + `cargo fmt` +- Manual: test against `example_kb/` with all output formats (text compact, text verbose, JSON compact, JSON verbose) + +## Files + +- `src/cmd/build.rs` (rewrite run(), delete old structs, rewrite tests) +- `src/cmd/search.rs` (rewrite run(), delete old structs, rewrite tests) +- `src/outcome/` (add MutateConfig, ReadIndexMetadata, CheckConfigChanged, ReadChunkText outcomes if not done in 0122) diff --git a/docs/spec/todos/TODO-0131.md b/docs/spec/todos/TODO-0131.md new file mode 100644 index 0000000..d07e800 --- /dev/null +++ b/docs/spec/todos/TODO-0131.md @@ -0,0 +1,65 @@ +--- +id: 131 +title: "Step tree: delete old pipeline, update main.rs, simplify output.rs" +status: todo +priority: high +created: 2026-03-19 +depends_on: [130] +blocks: [132, 133] +--- + +# TODO-0131: Step tree: delete old pipeline, update main.rs, simplify output.rs + +## Summary + +Final cleanup after all commands are converted. Delete the old pipeline modules, update main.rs to use the new Step-based dispatch, and simplify output.rs by removing the obsolete CommandOutput trait.
+ +## Details + +### Delete + +- All `src/pipeline/*.rs` step wrapper modules: + - `scan.rs`, `infer.rs`, `validate.rs`, `classify.rs`, `load_model.rs`, `embed.rs`, `read_config.rs`, `write_config.rs`, `read_index.rs`, `write_index.rs`, `execute_search.rs`, `delete_index.rs` +- `src/pipeline/mod.rs` — `ProcessingStepResult`, `ProcessingStep`, `StepOutput`, `ProcessingStepError`, `ErrorKind` (replaced by types in `src/step.rs`) + +### Simplify output.rs + +- Remove `CommandOutput` trait (`format_text`, `format_json`, `print`) +- Remove `OutputFormat` enum (moved to main.rs or render.rs) +- Remove `format_json_compact` helper +- Keep shared sub-types: `FieldViolation`, `ViolatingFile`, `FieldViolationCompact`, `SearchHit`, `SearchHitCompact`, `NewField`, `NewFieldCompact`, `DiscoveredField`, `DiscoveredFieldCompact`, `ChangedField`, `RemovedField`, `BuildFileDetail`, `FieldHint`, `format_file_count`, `format_size`, `field_hints`, `format_hints` + +### Update main.rs + +- All command match arms receive `Step` (not command-specific output types) +- Exit code logic: call `has_failed(&step)` and `has_violations(&step)` before rendering +- Output dispatch: + ```rust + match (format, verbose) { + (Text, true) => format_text(&step.render()), + (Text, false) => format_text(&step.to_compact().render()), + (Json, true) => serde_json::to_string_pretty(&step), + (Json, false) => serde_json::to_string_pretty(&step.to_compact()), + } + ``` +- Exit with appropriate code (0, 1, 2, 3) + +### Update lib.rs + +- Remove `pub mod pipeline` +- Add `pub mod step`, `pub mod block`, `pub mod outcome`, `pub mod render` (if not already) + +### Verification + +- `cargo test` — all tests pass +- `cargo clippy` — no warnings +- `cargo fmt` — formatted +- Manual end-to-end: test all 7 commands against `example_kb/` with all output formats +- Verify JSON output shape matches TODO-0119 spec + +## Files + +- `src/pipeline/` (delete entire directory) +- `src/output.rs` (simplify) +- `src/main.rs` 
(rewrite dispatch) +- `src/lib.rs` (update module declarations) diff --git a/docs/spec/todos/TODO-0132.md b/docs/spec/todos/TODO-0132.md new file mode 100644 index 0000000..5fc56b9 --- /dev/null +++ b/docs/spec/todos/TODO-0132.md @@ -0,0 +1,37 @@ +--- +id: 132 +title: "Macro for compact struct generation (crabtime)" +status: todo +priority: low +created: 2026-03-19 +depends_on: [131] +blocks: [] +--- + +# TODO-0132: Macro for compact struct generation (crabtime) + +## Summary + +Use `crabtime` to auto-generate `*Compact` structs and `From` impls from annotated full structs. Deferred until the manual pattern from TODO-0119 is proven stable. + +## Details + +After the Step tree architecture is fully implemented (TODO-0131 complete), the ~34 `From` impls and ~24 compact structs represent mechanical boilerplate. A `crabtime` macro could generate compact structs by stripping fields marked with a `#[compact_skip]` attribute and auto-generating the `From<&Full> for Compact` impl. + +### Scope + +- Annotate full outcome structs with `#[compact_skip]` on verbose-only fields +- Macro generates the compact struct (same fields minus skipped) +- Macro generates the `From<&Full> for Compact` impl +- `CompactOutcome` enum variants still written manually (small, ~24 lines) + +### Prerequisites + +- Pattern must be stable — no more changes to which fields are verbose-only +- `crabtime` crate added to dependencies +- All manual compact structs work correctly + +### Not in scope + +- Generating the `CompactOutcome` enum itself (too few variants to justify) +- Generating `Render` impls (too custom per type) diff --git a/docs/spec/todos/TODO-0133.md b/docs/spec/todos/TODO-0133.md new file mode 100644 index 0000000..9a97f23 --- /dev/null +++ b/docs/spec/todos/TODO-0133.md @@ -0,0 +1,55 @@ +--- +id: 133 +title: "Macro for step pipeline boilerplate (early-return pattern)" +status: todo +priority: low +created: 2026-03-19 +depends_on: [131] +blocks: [] +--- + +# TODO-0133: Macro for step 
pipeline boilerplate (early-return pattern) + +## Summary + +Create a declarative macro to reduce the repetitive early-return pattern in command `run()` functions. Deferred until the manual pattern from TODO-0119 is proven stable. + +## Details + +The pattern that repeats ~10 times in `build::run()`: + +```rust +let (result, data) = run_something(args); +substeps.push(Step { substeps: vec![], outcome: convert(result) }); +let data = match data { + Some(d) => d, + None => { + substeps.push(Step { substeps: vec![], outcome: StepOutcome::Skipped }); // remaining1 + substeps.push(Step { substeps: vec![], outcome: StepOutcome::Skipped }); // remaining2 + return Step { + substeps, + outcome: StepOutcome::Complete { + result: Err(StepError { kind: ErrorKind::User, message: "...".into() }), + elapsed_ms: start.elapsed().as_millis() as u64, + }, + }; + } +}; +``` + +A macro could reduce each step to one line: + +```rust +let data = run_step!(substeps, start, run_something(args), ["remaining1", "remaining2"]); +``` + +### Prerequisites + +- All commands fully converted to Step tree (TODO-0131 complete) +- The pattern is identical across all commands (confirm during implementation) +- The Skipped step count per early-return is predictable + +### Not in scope + +- Async step handling (may need different macro or helper function) +- Macro for the `to_compact()` match statement (separate concern) diff --git a/docs/spec/todos/index.md b/docs/spec/todos/index.md index 30ef699..096c2b6 100644 --- a/docs/spec/todos/index.md +++ b/docs/spec/todos/index.md @@ -111,7 +111,7 @@ | [0107](TODO-0107.md) | Pre-commit hook for mdvs check | todo | medium | 2026-03-14 | | [0108](TODO-0108.md) | --set-revision with empty string or "None" should clear the revision | done | low | 2026-03-14 | | [0109](TODO-0109.md) | Clean up DataFusion error messages in --where | todo | low | 2026-03-14 | -| [0110](TODO-0110.md) | Recursive output architecture — nested process steps | todo | high | 2026-03-14 | 
+| [0110](TODO-0110.md) | Recursive output architecture — nested process steps | done (superseded by 0119) | high | 2026-03-14 | | [0111](TODO-0111.md) | Reject unknown fields in mdvs.toml with deny_unknown_fields | done | high | 2026-03-14 | | [0112](TODO-0112.md) | Document JSON output format in the mdBook | todo | medium | 2026-03-15 | | [0113](TODO-0113.md) | Progress bar for model download and embedding | todo | medium | 2026-03-16 | @@ -120,3 +120,18 @@ | [0116](TODO-0116.md) | Trim DataFusion default features to reduce binary size | todo | medium | 2026-03-17 | | [0117](TODO-0117.md) | Fix null values skipping Disallowed and NullNotAllowed checks | done | high | 2026-03-17 | | [0118](TODO-0118.md) | Rework README and book intro to show directory-aware schema | todo | high | 2026-03-17 | +| [0119](TODO-0119.md) | Unified Step tree architecture — replace pipeline/command output split | todo | high | 2026-03-18 | +| [0120](TODO-0120.md) | Step tree: core types — Step, StepOutcome, StepError | todo | high | 2026-03-19 | +| [0121](TODO-0121.md) | Step tree: Block enum and Render trait | todo | high | 2026-03-19 | +| [0122](TODO-0122.md) | Step tree: Outcome enums and all outcome structs | todo | high | 2026-03-19 | +| [0123](TODO-0123.md) | Step tree: shared formatters (format_text, format_markdown) | todo | high | 2026-03-19 | +| [0124](TODO-0124.md) | Step tree: custom Serialize on Step and StepOutcome | todo | high | 2026-03-19 | +| [0125](TODO-0125.md) | Step tree: convert clean command | todo | high | 2026-03-19 | +| [0126](TODO-0126.md) | Step tree: convert info command | todo | high | 2026-03-19 | +| [0127](TODO-0127.md) | Step tree: convert check command | todo | high | 2026-03-19 | +| [0128](TODO-0128.md) | Step tree: convert init command | todo | high | 2026-03-19 | +| [0129](TODO-0129.md) | Step tree: convert update command | todo | high | 2026-03-19 | +| [0130](TODO-0130.md) | Step tree: convert build + search commands | todo | high | 2026-03-19 | +| 
[0131](TODO-0131.md) | Step tree: delete old pipeline, update main.rs, simplify output.rs | todo | high | 2026-03-19 | +| [0132](TODO-0132.md) | Macro for compact struct generation (crabtime) | todo | low | 2026-03-19 | +| [0133](TODO-0133.md) | Macro for step pipeline boilerplate (early-return pattern) | todo | low | 2026-03-19 | From 78366dc7e8892dbfe3f40822919de4b2daa2b804 Mon Sep 17 00:00:00 2001 From: edoch Date: Thu, 19 Mar 2026 15:13:46 +0100 Subject: [PATCH 02/35] docs: add incremental checklists to Step tree implementation TODOs TODOs 0122, 0125-0131 now have checklists tracking incremental progress. Outcome variants are added per-command conversion phase. main.rs dispatch updated per-command. Pipeline deletion tracked per-module. --- docs/spec/todos/TODO-0122.md | 63 ++++++++++++++++++------- docs/spec/todos/TODO-0124.md | 10 ++-- docs/spec/todos/TODO-0125.md | 38 ++++++++------- docs/spec/todos/TODO-0126.md | 33 +++++++------ docs/spec/todos/TODO-0127.md | 54 +++++++++++---------- docs/spec/todos/TODO-0128.md | 38 ++++++++------- docs/spec/todos/TODO-0129.md | 40 +++++++++------- docs/spec/todos/TODO-0130.md | 87 ++++++++++++++++++---------------- docs/spec/todos/TODO-0131.md | 91 ++++++++++++++++++------------------ 9 files changed, 256 insertions(+), 198 deletions(-) diff --git a/docs/spec/todos/TODO-0122.md b/docs/spec/todos/TODO-0122.md index 73984e2..3fa8f1e 100644 --- a/docs/spec/todos/TODO-0122.md +++ b/docs/spec/todos/TODO-0122.md @@ -12,24 +12,53 @@ blocks: [124, 125, 126, 127, 128, 129, 130, 131] ## Summary -Create the `Outcome` and `CompactOutcome` enums with all named outcome structs, compact counterparts, `From` impls, and `Render` impls. This is the largest infrastructure piece. +Create the `Outcome` and `CompactOutcome` enums with all named outcome structs, compact counterparts, `From` impls, and `Render` impls. Built incrementally — add variants as each command is converted. 
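The incremental build-out can be sketched as a match with a catch-all arm — toy types only, and the real `to_compact()` also receives the substeps, which this sketch omits:

```rust
// Toy sketch: to_compact() covers only the commands converted so far and
// hits todo!() for the rest until every conversion phase has landed.
struct CleanOutcome { files_deleted: usize, detail: String }
struct CleanOutcomeCompact { files_deleted: usize }

impl From<&CleanOutcome> for CleanOutcomeCompact {
    fn from(full: &CleanOutcome) -> Self {
        Self { files_deleted: full.files_deleted }
    }
}

enum Outcome {
    Clean(CleanOutcome),
    DeleteIndex, // stand-in for a variant not yet converted in this sketch
}

enum CompactOutcome {
    Clean(CleanOutcomeCompact),
}

impl Outcome {
    fn to_compact(&self) -> CompactOutcome {
        match self {
            Outcome::Clean(full) => CompactOutcome::Clean(full.into()),
            _ => todo!("added as each command conversion lands"),
        }
    }
}
```

The catch-all keeps the enums compiling between phases; removing it at the end forces the compiler to confirm every variant is covered.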
+ +## Incremental checklist + +The enums start with a `_ => todo!()` catch-all arm in `to_compact()` and `contains_violations()`. Variants are added as commands are converted. Each checkbox corresponds to a command conversion TODO. + +### Phase 1: minimal (for clean — TODO-0125) +- [ ] Create `src/outcome/mod.rs` with `Outcome` and `CompactOutcome` enums (initially: `DeleteIndex` + `Clean` variants only, `_ => todo!()` for the rest) +- [ ] `Outcome::to_compact()` and `Outcome::contains_violations()` with catch-all +- [ ] `DeleteIndexOutcome` / `DeleteIndexOutcomeCompact` + `From` + `Render` +- [ ] `CleanOutcome` / `CleanOutcomeCompact` + `From` + `Render` + +### Phase 2: info (TODO-0126) +- [ ] Add `ReadConfig`, `Scan`, `ReadIndex`, `Info` variants to both enums +- [ ] `ReadConfigOutcome` / `ReadConfigOutcomeCompact` + `From` + `Render` +- [ ] `ScanOutcome` / `ScanOutcomeCompact` + `From` + `Render` +- [ ] `ReadIndexOutcome` / `ReadIndexOutcomeCompact` + `From` + `Render` +- [ ] `InfoOutcome` / `InfoOutcomeCompact` + `From` + `Render` + +### Phase 3: check (TODO-0127) +- [ ] Add `Validate`, `Check` variants to both enums +- [ ] `ValidateOutcome` / `ValidateOutcomeCompact` + `From` + `Render` +- [ ] `CheckOutcome` / `CheckOutcomeCompact` + `From` + `Render` +- [ ] `FieldViolationCompact`, `NewFieldCompact` sub-structure compacts + `From` +- [ ] Update `Outcome::contains_violations()` for Validate and Check variants + +### Phase 4: init (TODO-0128) +- [ ] Add `Infer`, `WriteConfig`, `Init` variants to both enums +- [ ] `InferOutcome` / `InferOutcomeCompact` + `From` + `Render` +- [ ] `WriteConfigOutcome` / `WriteConfigOutcomeCompact` + `From` + `Render` +- [ ] `InitOutcome` / `InitOutcomeCompact` + `From` + `Render` +- [ ] `DiscoveredFieldCompact` sub-structure compact + `From` + +### Phase 5: update (TODO-0129) +- [ ] Add `Update` variant to both enums +- [ ] `UpdateOutcome` / `UpdateOutcomeCompact` + `From` + `Render` +- [ ] `ChangedFieldCompact`, 
`RemovedFieldCompact` sub-structure compacts + `From` + +### Phase 6: build + search (TODO-0130) +- [ ] Add `MutateConfig`, `ReadIndexMetadata`, `CheckConfigChanged`, `Classify`, `LoadModel`, `EmbedFiles`, `WriteIndex`, `Build` variants to both enums +- [ ] Add `EmbedQuery`, `ExecuteSearch`, `ReadChunkText`, `Search` variants to both enums +- [ ] All remaining outcome structs + compact counterparts + `From` + `Render` +- [ ] `SearchHitCompact`, `BuildFileDetailCompact` sub-structure compacts + `From` +- [ ] Remove `_ => todo!()` catch-all arms — all variants now covered ## Details -Create `src/outcome/` directory with `mod.rs` and per-step files. - -### Outcome enum (24 variants) - -Leaf steps: -- `Scan(ScanOutcome)`, `Infer(InferOutcome)`, `ReadConfig(ReadConfigOutcome)`, `WriteConfig(WriteConfigOutcome)`, `Validate(ValidateOutcome)`, `Classify(ClassifyOutcome)`, `LoadModel(LoadModelOutcome)`, `EmbedFiles(EmbedFilesOutcome)`, `EmbedQuery(EmbedQueryOutcome)`, `ReadIndex(ReadIndexOutcome)`, `ReadIndexMetadata(ReadIndexMetadataOutcome)`, `WriteIndex(WriteIndexOutcome)`, `ExecuteSearch(ExecuteSearchOutcome)`, `DeleteIndex(DeleteIndexOutcome)`, `MutateConfig(MutateConfigOutcome)`, `CheckConfigChanged(CheckConfigChangedOutcome)`, `ReadChunkText(ReadChunkTextOutcome)` - -Command-level (Box'd): -- `Init(Box<InitOutcome>)`, `Update(Box<UpdateOutcome>)`, `Check(Box<CheckOutcome>)`, `Build(Box<BuildOutcome>)`, `Search(Box<SearchOutcome>)`, `Info(Box<InfoOutcome>)`, `Clean(CleanOutcome)` - -### CompactOutcome enum (mirrors Outcome) - -Same 24 variants with `*Compact` inner types. For leaf steps where full=compact, use the same struct or a type alias. - ### Outcome structs (full + compact pairs) Each pair: `derive(Debug, Serialize)` on both, `From<&Full> for Compact` impl. @@ -53,8 +53,8 @@ Each pair: `derive(Debug, Serialize)` on both, `From<&Full> for Compact` impl.
~ ### Key methods -- `Outcome::to_compact(&self, substeps: &[Step]) -> CompactOutcome` — ~50-line match -- `Outcome::contains_violations(&self) -> bool` — for exit code logic +- `Outcome::to_compact(&self, substeps: &[Step]) -> CompactOutcome` — match with `_ => todo!()` initially, filled incrementally +- `Outcome::contains_violations(&self) -> bool` — for exit code logic, filled incrementally ### Notes diff --git a/docs/spec/todos/TODO-0124.md b/docs/spec/todos/TODO-0124.md index 14a9393..a473177 100644 --- a/docs/spec/todos/TODO-0124.md +++ b/docs/spec/todos/TODO-0124.md @@ -12,7 +12,7 @@ blocks: [125, 126, 127, 128, 129, 130, 131] ## Summary -Hand-implement `Serialize` for `Step` and `StepOutcome` to produce clean JSON output. `Step` cannot derive `Serialize` because `StepOutcome` requires custom serialization to flatten the `Result` nesting. +Hand-implement `Serialize` for `Step` and `StepOutcome` to produce clean JSON output. Done once — works for all Outcome variants as they're added incrementally in TODO-0122. ## Details @@ -43,18 +43,18 @@ } ``` -This works recursively: `Vec<Step<O>>` serializes each element via the same hand-impl. +Works recursively: `Vec<Step<O>>` serializes each element via the same hand-impl. ### Outcome and CompactOutcome -These use `derive(Serialize)` with externally tagged (serde default). Each variant serializes as `{ "VariantName": { fields } }`. Box is transparent to serde. +These use `derive(Serialize)` with externally tagged (serde default). Each variant serializes as `{ "VariantName": { fields } }`. Box is transparent to serde. No work needed here — the derive handles it as variants are added in TODO-0122. ### Notes +- This TODO can be implemented as soon as TODO-0120 and the initial Outcome enum exist (even with just 2 variants). +- Once done, it never needs updating — the generic impl works for any `O: Serialize`. - `StepError` derives `Serialize` normally.
- `ErrorKind` derives `Serialize` with `#[serde(rename_all = "snake_case")]`. -- Verbose JSON: `serde_json::to_string_pretty(&step)` where step is `Step<Outcome>`. -- Compact JSON: `serde_json::to_string_pretty(&step.to_compact())` where result is `Step<CompactOutcome>`. ## Files diff --git a/docs/spec/todos/TODO-0125.md b/docs/spec/todos/TODO-0125.md index dd2629f..1c5ac0a 100644 --- a/docs/spec/todos/TODO-0125.md +++ b/docs/spec/todos/TODO-0125.md @@ -12,33 +12,35 @@ ## Summary -Convert the `clean` command to the new Step tree architecture. Simplest command — one leaf step (delete_index). Good first test of the full pipeline. +Convert the `clean` command to the new Step tree architecture. Simplest command — one leaf step (delete_index). First end-to-end test of the full pipeline. -## Details +## Checklist -### Delete +### Outcome structs (part of TODO-0122 phase 1) +- [ ] Add `DeleteIndex` + `Clean` variants to `Outcome` and `CompactOutcome` enums +- [ ] Create `DeleteIndexOutcome` / `DeleteIndexOutcomeCompact` + `From` + `Render` +- [ ] Create `CleanOutcome` / `CleanOutcomeCompact` + `From` + `Render` +- [ ] Implement `Outcome::to_compact()` and `contains_violations()` with `_ => todo!()` catch-all -- `CleanProcessOutput`, `CleanCommandOutput`, `CleanResult` structs -- `CommandOutput` impl on `CleanResult` and `CleanCommandOutput` -- `has_failed_step()` on `CleanCommandOutput` +### Command rewrite +- [ ] Rewrite `clean::run()` → returns `Step` +- [ ] Delete `CleanProcessOutput`, `CleanCommandOutput`, `CleanResult` structs +- [ ] Delete `CommandOutput` impl on `CleanResult` and `CleanCommandOutput` +- [ ] Delete `has_failed_step()` on `CleanCommandOutput` -### Rewrite - -- `clean::run()` returns `Step` instead of `CleanCommandOutput` -- Single substep: `DeleteIndex` step -- Command outcome: `Outcome::Clean(CleanOutcome { ...
})` -- `CleanOutcome` and `CleanOutcomeCompact` already defined in TODO-0122 +### main.rs +- [ ] Update clean match arm: `Step` dispatch with exit codes, `to_compact()`, formatter ### Tests - -- Rewrite all tests in `src/cmd/clean.rs` to use `#[cfg(test)]` helpers (`unwrap_clean()`, `has_failed()`, `is_skipped()`) +- [ ] Rewrite clean tests with `#[cfg(test)]` helpers (`unwrap_clean()`, `has_failed()`, `is_skipped()`) ### Verification - -- `cargo test` — all clean tests pass -- `cargo clippy` + `cargo fmt` -- Manual: `cargo run -- clean example_kb` produces correct text and JSON output (both verbose and compact) +- [ ] `cargo test` — all tests pass (clean tests + other commands still using old model) +- [ ] `cargo clippy` + `cargo fmt` +- [ ] Manual: `cargo run -- clean example_kb` — text compact, text verbose, JSON compact, JSON verbose all correct ## Files - `src/cmd/clean.rs` (rewrite run(), delete old structs, rewrite tests) +- `src/outcome/` (add DeleteIndex + Clean variants and structs) +- `src/main.rs` (update clean match arm) diff --git a/docs/spec/todos/TODO-0126.md b/docs/spec/todos/TODO-0126.md index 308bd98..145b345 100644 --- a/docs/spec/todos/TODO-0126.md +++ b/docs/spec/todos/TODO-0126.md @@ -14,26 +14,33 @@ blocks: [129] Convert the `info` command to the new Step tree architecture. Simple query/display command with 2-3 leaf steps. 
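The incremental `to_compact()` pattern the checklists above rely on can be sketched as follows. This is a hypothetical, simplified version — the variant names and fields mirror the clean command's DeleteIndex/Clean outcomes, but the real enums carry many more variants and payload structs:

```rust
// Simplified stand-ins for the real Outcome / CompactOutcome enums.
#[derive(Debug, PartialEq)]
enum Outcome {
    DeleteIndex { files_removed: usize },
    Clean { index_deleted: bool },
}

#[derive(Debug, PartialEq)]
enum CompactOutcome {
    DeleteIndex { files_removed: usize },
    Clean { index_deleted: bool },
}

impl Outcome {
    // Variants are filled in one command conversion at a time; until all
    // commands are converted, unconverted variants fall through to a
    // `_ => todo!()` catch-all that keeps the build green.
    fn to_compact(&self) -> CompactOutcome {
        match self {
            Outcome::DeleteIndex { files_removed } => CompactOutcome::DeleteIndex {
                files_removed: *files_removed,
            },
            Outcome::Clean { index_deleted } => CompactOutcome::Clean {
                index_deleted: *index_deleted,
            },
            // While later phases are pending:
            // _ => todo!(),
        }
    }
}
```

Once the final phase lands, the catch-all arm is removed and the compiler enforces exhaustiveness for every new variant.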
-## Details +## Checklist -### Delete +### Outcome structs (part of TODO-0122 phase 2) +- [ ] Add `ReadConfig`, `Scan`, `ReadIndex`, `Info` variants to `Outcome` and `CompactOutcome` enums +- [ ] Create `ReadConfigOutcome` / `ReadConfigOutcomeCompact` + `From` + `Render` +- [ ] Create `ScanOutcome` / `ScanOutcomeCompact` + `From` + `Render` +- [ ] Create `ReadIndexOutcome` / `ReadIndexOutcomeCompact` + `From` + `Render` +- [ ] Create `InfoOutcome` / `InfoOutcomeCompact` + `From` + `Render` -- `InfoProcessOutput`, `InfoCommandOutput`, `InfoResult` structs -- `CommandOutput` impl on `InfoResult` and `InfoCommandOutput` +### Command rewrite +- [ ] Rewrite `info::run()` → returns `Step` +- [ ] Delete `InfoProcessOutput`, `InfoCommandOutput`, `InfoResult` structs +- [ ] Delete `CommandOutput` impls -### Rewrite - -- `info::run()` returns `Step` instead of `InfoCommandOutput` -- Substeps: ReadConfig, Scan (optional), ReadIndex (optional) -- Command outcome: `Outcome::Info(Box<InfoOutcome>)` -- Define `InfoOutcome` fields: scan_glob, files_on_disk, fields (Vec), ignored_fields, index status -- Define `InfoOutcomeCompact` fields: files_on_disk, field_count, has_index +### main.rs +- [ ] Update info match arm: `Step` dispatch ### Tests +- [ ] Rewrite info tests with `#[cfg(test)]` helpers -- Rewrite tests to use `#[cfg(test)]` helpers +### Verification +- [ ] `cargo test` — all tests pass +- [ ] `cargo clippy` + `cargo fmt` +- [ ] Manual: `cargo run -- info example_kb` — all output formats correct ## Files - `src/cmd/info.rs` (rewrite run(), delete old structs, rewrite tests) -- `src/outcome/commands.rs` (add InfoOutcome, InfoOutcomeCompact if not already defined) +- `src/outcome/` (add ReadConfig, Scan, ReadIndex, Info variants and structs) +- `src/main.rs` (update info match arm) diff --git a/docs/spec/todos/TODO-0127.md b/docs/spec/todos/TODO-0127.md index 73dc5e8..9ed0b68 100644 --- a/docs/spec/todos/TODO-0127.md +++ b/docs/spec/todos/TODO-0127.md @@ -14,34 +14,38 @@
Convert the `check` command to the new Step tree architecture. Medium complexity — has violations rendering, new fields, and optional auto-update nesting. -## Details - -### Delete - -- `CheckProcessOutput`, `CheckCommandOutput`, `CheckResult` structs -- `CommandOutput` impl on `CheckResult` and `CheckCommandOutput` - -### Keep - -- `check::validate()` core function — stays as-is, both check and build call it - -### Rewrite - -- `check::run()` returns `Step` instead of `CheckCommandOutput` -- Substeps: ReadConfig, auto-update (nested `Step` from `update::run()` — only if update is already converted, otherwise stub), Scan, Validate -- Command outcome: `Outcome::Check(Box<CheckOutcome>)` -- `CheckOutcome` carries `files_checked`, `violations: Vec<FieldViolation>`, `new_fields: Vec<NewField>` -- `CheckOutcomeCompact` carries `files_checked`, `violations: Vec<FieldViolationCompact>`, `new_fields: Vec<NewFieldCompact>` — self-sufficient for rendering since Validate substep is silent in compact - -### Render impls - -- `CheckOutcome::render()` — one-liner + violation record tables + new field tables -- `CheckOutcomeCompact::render()` — one-liner + compact violation table + compact new field table +## Checklist + +### Outcome structs (part of TODO-0122 phase 3) +- [ ] Add `Validate`, `Check` variants to `Outcome` and `CompactOutcome` enums +- [ ] Create `ValidateOutcome` / `ValidateOutcomeCompact` + `From` + `Render` +- [ ] Create `CheckOutcome` / `CheckOutcomeCompact` + `From` + `Render` +- [ ] Create `FieldViolationCompact` + `From<&FieldViolation>` +- [ ] Create `NewFieldCompact` + `From<&NewField>` +- [ ] Update `Outcome::contains_violations()` for Validate and Check variants + +### Command rewrite +- [ ] Rewrite `check::run()` → returns `Step` +- [ ] Keep `check::validate()` as core function (reusable by build) +- [ ] Delete `CheckProcessOutput`, `CheckCommandOutput`, `CheckResult` structs +- [ ] Delete `CommandOutput` impls +- [ ] Handle auto-update nesting: if update is already converted, nest its Step; otherwise stub +
+### main.rs +- [ ]
Update check match arm: `Step` dispatch ### Tests +- [ ] Rewrite check tests with `#[cfg(test)]` helpers (`unwrap_check()`, `has_failed()`) +- [ ] Test violation detection, new fields, auto-update -- Rewrite tests to use `#[cfg(test)]` helpers (`unwrap_check()`, `has_failed()`) +### Verification +- [ ] `cargo test` — all tests pass +- [ ] `cargo clippy` + `cargo fmt` +- [ ] Manual: `cargo run -- check example_kb` — all output formats correct ## Files -- `src/cmd/check.rs` (rewrite run(), delete old structs, keep validate(), rewrite tests) +- `src/cmd/check.rs` (rewrite run(), keep validate(), delete old structs, rewrite tests) +- `src/outcome/` (add Validate, Check variants and structs) +- `src/output.rs` (add FieldViolationCompact, NewFieldCompact) +- `src/main.rs` (update check match arm) diff --git a/docs/spec/todos/TODO-0128.md b/docs/spec/todos/TODO-0128.md index 845ac1a..859f63d 100644 --- a/docs/spec/todos/TODO-0128.md +++ b/docs/spec/todos/TODO-0128.md @@ -14,30 +14,34 @@ blocks: [129] Convert the `init` command to the new Step tree architecture. Simple pipeline — scan, infer, write_config. 
-## Details +## Checklist -### Delete +### Outcome structs (part of TODO-0122 phase 4) +- [ ] Add `Infer`, `WriteConfig`, `Init` variants to `Outcome` and `CompactOutcome` enums +- [ ] Create `InferOutcome` / `InferOutcomeCompact` + `From` + `Render` +- [ ] Create `WriteConfigOutcome` / `WriteConfigOutcomeCompact` + `From` + `Render` +- [ ] Create `InitOutcome` / `InitOutcomeCompact` + `From` + `Render` +- [ ] Create `DiscoveredFieldCompact` + `From<&DiscoveredField>` -- `InitProcessOutput`, `InitCommandOutput`, `InitResult` structs -- `CommandOutput` impl on `InitResult` and `InitCommandOutput` +### Command rewrite +- [ ] Rewrite `init::run()` → returns `Step` +- [ ] Delete `InitProcessOutput`, `InitCommandOutput`, `InitResult` structs +- [ ] Delete `CommandOutput` impls -### Rewrite - -- `init::run()` returns `Step` instead of `InitCommandOutput` -- Substeps: Scan, Infer, WriteConfig (Skipped if dry_run) -- Command outcome: `Outcome::Init(Box<InitOutcome>)` -- `InitOutcome` carries `path`, `files_scanned`, `fields: Vec<DiscoveredField>`, `dry_run` -- `InitOutcomeCompact` carries `path`, `files_scanned`, `field_count`, `dry_run` - -### Render impls - -- `InitOutcome::render()` — one-liner + per-field record tables with detail (allowed/required/nullable/hints) -- `InitOutcomeCompact::render()` — one-liner + compact field table +### main.rs +- [ ] Update init match arm: `Step` dispatch ### Tests +- [ ] Rewrite init tests with `#[cfg(test)]` helpers (`unwrap_init()`, `has_failed()`) -- Rewrite tests to use `#[cfg(test)]` helpers (`unwrap_init()`, `has_failed()`) +### Verification +- [ ] `cargo test` — all tests pass +- [ ] `cargo clippy` + `cargo fmt` +- [ ] Manual: `cargo run -- init /tmp/test-vault` — all output formats correct ## Files - `src/cmd/init.rs` (rewrite run(), delete old structs, rewrite tests) +- `src/outcome/` (add Infer, WriteConfig, Init variants and structs) +- `src/output.rs` (add DiscoveredFieldCompact) +- `src/main.rs` (update init match arm) diff --git
a/docs/spec/todos/TODO-0129.md b/docs/spec/todos/TODO-0129.md index 0a001dd..9a0a422 100644 --- a/docs/spec/todos/TODO-0129.md +++ b/docs/spec/todos/TODO-0129.md @@ -14,31 +14,35 @@ Convert the `update` command to the new Step tree architecture. Must be done before build (which nests update as auto-update). -## Details +## Checklist -### Delete +### Outcome structs (part of TODO-0122 phase 5) +- [ ] Add `Update` variant to `Outcome` and `CompactOutcome` enums +- [ ] Create `UpdateOutcome` / `UpdateOutcomeCompact` + `From` + `Render` +- [ ] Create `ChangedFieldCompact` + `From<&ChangedField>` +- [ ] Create `RemovedFieldCompact` + `From<&RemovedField>` -- `UpdateProcessOutput`, `UpdateCommandOutput`, `UpdateResult` structs -- `CommandOutput` impl on `UpdateResult` and `UpdateCommandOutput` +### Command rewrite +- [ ] Rewrite `update::run()` → returns `Step` +- [ ] Field comparison logic stays inline in `run()` +- [ ] Delete `UpdateProcessOutput`, `UpdateCommandOutput`, `UpdateResult` structs +- [ ] Delete `CommandOutput` impls -### Rewrite - -- `update::run()` returns `Step` instead of `UpdateCommandOutput` -- Substeps: ReadConfig, Scan, Infer, WriteConfig (Skipped if dry_run or no changes) -- Field comparison logic stays inline in `run()` (not extracted as a step) -- Command outcome: `Outcome::Update(Box<UpdateOutcome>)` -- `UpdateOutcome` carries `files_scanned`, `added: Vec`, `changed: Vec<ChangedField>`, `removed: Vec<RemovedField>`, `unchanged: usize`, `dry_run` -- `UpdateOutcomeCompact` carries `files_scanned`, `added_count`, `changed_count`, `removed_count`, `unchanged`, `dry_run` - -### Render impls - -- `UpdateOutcome::render()` — one-liner + per-category tables (added, changed, removed) with detail rows -- `UpdateOutcomeCompact::render()` — one-liner + compact change summary tables +### main.rs +- [ ] Update update match arm: `Step` dispatch ### Tests +- [ ] Rewrite update tests with `#[cfg(test)]` helpers (`unwrap_update()`, `has_failed()`) +- [ ] Test reinfer, reinfer-all, dry-run,
new fields, changed fields -- Rewrite tests to use `#[cfg(test)]` helpers (`unwrap_update()`, `has_failed()`) +### Verification +- [ ] `cargo test` — all tests pass +- [ ] `cargo clippy` + `cargo fmt` +- [ ] Manual: `cargo run -- update example_kb` — all output formats correct ## Files - `src/cmd/update.rs` (rewrite run(), delete old structs, rewrite tests) +- `src/outcome/` (add Update variant and structs) +- `src/output.rs` (add ChangedFieldCompact, RemovedFieldCompact) +- `src/main.rs` (update update match arm) diff --git a/docs/spec/todos/TODO-0130.md b/docs/spec/todos/TODO-0130.md index 7f9b840..f2d83c6 100644 --- a/docs/spec/todos/TODO-0130.md +++ b/docs/spec/todos/TODO-0130.md @@ -14,52 +14,59 @@ blocks: [131] Convert `build` and `search` commands together to the new Step tree architecture. These are tightly coupled — search nests build (auto-build), build nests update (auto-update). Most complex piece of the migration. -## Details - -### Build command - -#### Delete -- `BuildProcessOutput`, `BuildCommandOutput`, `BuildResult` structs -- `CommandOutput` impl on `BuildResult` and `BuildCommandOutput` -- `has_violations()` on `BuildCommandOutput` - -#### Rewrite -- `build::run()` returns `Step` -- Substeps: ReadConfig, auto-update (nested Step from `update::run()`), MutateConfig, Scan, Validate, ReadIndexMetadata (new — split from classify), CheckConfigChanged (new), Classify, LoadModel, EmbedFiles, WriteIndex -- Violations: Validate substep succeeds with violations as Ok data. Build aborts with `Err(StepError { kind: User, message: "N violations found" })`. -- `BuildOutcome` carries full rebuild stats + per-file details -- `BuildOutcomeCompact` carries summary stats only - -#### New leaf steps -- `MutateConfig` — side-effect step: fills missing config sections, applies --set-* flags. Outcome: `MutateConfigOutcome { sections_added: usize }` -- `ReadIndexMetadata` — reads parquet metadata without full file index. 
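The nesting this TODO builds on — `search::run()` embedding build's tree, `build::run()` embedding update's — can be sketched as below. Names and field sets are hypothetical stand-ins (the real `Step` carries outcomes and errors); the point is only that a nested command's entire tree becomes one substep of its parent:

```rust
// Simplified stand-in for the real Step type from step.rs.
#[derive(Debug)]
struct Step {
    name: &'static str,
    substeps: Vec<Step>,
}

// Stand-in for update::run(): returns update's own complete step tree.
fn update_run() -> Step {
    Step {
        name: "update",
        substeps: vec![Step { name: "scan", substeps: vec![] }],
    }
}

// Stand-in for build::run(): when auto-update fires, the whole update
// tree is pushed as a single substep, so it renders and serializes
// exactly like a standalone `mdvs update` run.
fn build_run(auto_update: bool) -> Step {
    let mut substeps = Vec::new();
    if auto_update {
        substeps.push(update_run());
    }
    substeps.push(Step { name: "embed_files", substeps: vec![] });
    Step { name: "build", substeps }
}
```

Search nests build the same way, which is why update must be converted before build, and build before search.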
Outcome: `ReadIndexMetadataOutcome { model, revision, chunk_size }` -- `CheckConfigChanged` — compares toml config vs parquet metadata. Outcome: `CheckConfigChangedOutcome { changed: bool }` - -### Search command - -#### Delete -- `SearchProcessOutput`, `SearchCommandOutput`, `SearchResult` structs -- `CommandOutput` impl on `SearchResult` and `SearchCommandOutput` - -#### Rewrite -- `search::run()` returns `Step` -- Substeps: ReadConfig, auto-build (nested Step from `build::run()`), ReadIndex, LoadModel, EmbedQuery, ExecuteSearch, ReadChunkText (Skipped if !verbose) -- `SearchOutcome` carries query, hits (Vec<SearchHit>), model_name, limit -- `SearchOutcomeCompact` carries query, hits (Vec<SearchHitCompact>), model_name, limit +## Checklist + +### Outcome structs — build (part of TODO-0122 phase 6) +- [ ] Add `MutateConfig`, `ReadIndexMetadata`, `CheckConfigChanged`, `Classify`, `LoadModel`, `EmbedFiles`, `WriteIndex`, `Build` variants to both enums +- [ ] Create `MutateConfigOutcome` / `MutateConfigOutcomeCompact` + `From` + `Render` +- [ ] Create `ReadIndexMetadataOutcome` / `ReadIndexMetadataOutcomeCompact` + `From` + `Render` +- [ ] Create `CheckConfigChangedOutcome` / `CheckConfigChangedOutcomeCompact` + `From` + `Render` +- [ ] Create `ClassifyOutcome` / `ClassifyOutcomeCompact` + `From` + `Render` +- [ ] Create `LoadModelOutcome` / `LoadModelOutcomeCompact` + `From` + `Render` +- [ ] Create `EmbedFilesOutcome` / `EmbedFilesOutcomeCompact` + `From` + `Render` +- [ ] Create `WriteIndexOutcome` / `WriteIndexOutcomeCompact` + `From` + `Render` +- [ ] Create `BuildOutcome` / `BuildOutcomeCompact` + `From` + `Render` +- [ ] Create `BuildFileDetailCompact` + `From<&BuildFileDetail>` + +### Outcome structs — search (part of TODO-0122 phase 6) +- [ ] Add `EmbedQuery`, `ExecuteSearch`, `ReadChunkText`, `Search` variants to both enums +- [ ] Create `EmbedQueryOutcome` / `EmbedQueryOutcomeCompact` + `From` + `Render` +- [ ] Create `ExecuteSearchOutcome` / `ExecuteSearchOutcomeCompact` + `From` +
`Render` +- [ ] Create `ReadChunkTextOutcome` / `ReadChunkTextOutcomeCompact` + `From` + `Render` +- [ ] Create `SearchOutcome` / `SearchOutcomeCompact` + `From` + `Render` +- [ ] Create `SearchHitCompact` + `From<&SearchHit>` +- [ ] Remove `_ => todo!()` catch-all arms — all variants now covered + +### Build command rewrite +- [ ] Rewrite `build::run()` → returns `Step` +- [ ] New leaf steps: MutateConfig, ReadIndexMetadata, CheckConfigChanged +- [ ] Auto-update nesting: `update::run()` returns `Step`, included as substep +- [ ] Violations: Validate succeeds with Ok data, Build aborts with Err +- [ ] Delete `BuildProcessOutput`, `BuildCommandOutput`, `BuildResult` structs + +### Search command rewrite +- [ ] Rewrite `search::run()` → returns `Step` +- [ ] Auto-build nesting: `build::run()` returns `Step`, included as substep +- [ ] ReadChunkText step: Skipped if !verbose +- [ ] Delete `SearchProcessOutput`, `SearchCommandOutput`, `SearchResult` structs + +### main.rs +- [ ] Update build match arm: `Step` dispatch +- [ ] Update search match arm: `Step` dispatch ### Tests - -- Rewrite all build and search tests to use `#[cfg(test)]` helpers -- End-to-end tests: init → build → search pipeline +- [ ] Rewrite build tests with helpers (end-to-end, violations, incremental, model mismatch) +- [ ] Rewrite search tests with helpers (end-to-end, missing config, where clause, model mismatch) ### Verification - -- `cargo test` — all 245+ tests pass -- `cargo clippy` + `cargo fmt` -- Manual: test against `example_kb/` with all output formats (text compact, text verbose, JSON compact, JSON verbose) +- [ ] `cargo test` — ALL 245+ tests pass +- [ ] `cargo clippy` + `cargo fmt` +- [ ] Manual: test against `example_kb/` with all output formats +- [ ] Manual: test search with auto-build (nested Step tree renders correctly) ## Files - `src/cmd/build.rs` (rewrite run(), delete old structs, rewrite tests) - `src/cmd/search.rs` (rewrite run(), delete old structs, rewrite tests) -- 
`src/outcome/` (add MutateConfig, ReadIndexMetadata, CheckConfigChanged, ReadChunkText outcomes if not done in 0122) +- `src/outcome/` (add all remaining variants and structs) +- `src/main.rs` (update build + search match arms) diff --git a/docs/spec/todos/TODO-0131.md b/docs/spec/todos/TODO-0131.md index d07e800..8ac141f 100644 --- a/docs/spec/todos/TODO-0131.md +++ b/docs/spec/todos/TODO-0131.md @@ -12,54 +12,55 @@ blocks: [132, 133] ## Summary -Final cleanup after all commands are converted. Delete the old pipeline modules, update main.rs to use the new Step-based dispatch, and simplify output.rs by removing the obsolete CommandOutput trait. - -## Details - -### Delete - -- All `src/pipeline/*.rs` step wrapper modules: - - `scan.rs`, `infer.rs`, `validate.rs`, `classify.rs`, `load_model.rs`, `embed.rs`, `read_config.rs`, `write_config.rs`, `read_index.rs`, `write_index.rs`, `execute_search.rs`, `delete_index.rs` -- `src/pipeline/mod.rs` — `ProcessingStepResult`, `ProcessingStep`, `StepOutput`, `ProcessingStepError`, `ErrorKind` (replaced by types in `src/step.rs`) - -### Simplify output.rs - -- Remove `CommandOutput` trait (`format_text`, `format_json`, `print`) -- Remove `OutputFormat` enum (moved to main.rs or render.rs) -- Remove `format_json_compact` helper -- Keep shared sub-types: `FieldViolation`, `ViolatingFile`, `FieldViolationCompact`, `SearchHit`, `SearchHitCompact`, `NewField`, `NewFieldCompact`, `DiscoveredField`, `DiscoveredFieldCompact`, `ChangedField`, `RemovedField`, `BuildFileDetail`, `FieldHint`, `format_file_count`, `format_size`, `field_hints`, `format_hints` - -### Update main.rs - -- All command match arms receive `Step` (not command-specific output types) -- Exit code logic: call `has_failed(&step)` and `has_violations(&step)` before rendering -- Output dispatch: - ```rust - match (format, verbose) { - (Text, true) => format_text(&step.render()), - (Text, false) => format_text(&step.to_compact().render()), - (Json, true) => 
serde_json::to_string_pretty(&step), - (Json, false) => serde_json::to_string_pretty(&step.to_compact()), - } - ``` -- Exit with appropriate code (0, 1, 2, 3) - -### Update lib.rs - -- Remove `pub mod pipeline` -- Add `pub mod step`, `pub mod block`, `pub mod outcome`, `pub mod render` (if not already) - -### Verification - -- `cargo test` — all tests pass -- `cargo clippy` — no warnings -- `cargo fmt` — formatted -- Manual end-to-end: test all 7 commands against `example_kb/` with all output formats -- Verify JSON output shape matches TODO-0119 spec +Final cleanup after all commands are converted. Delete the old pipeline modules, simplify output.rs by removing the obsolete CommandOutput trait, and finalize main.rs dispatch. + +## Incremental checklist + +main.rs is updated incrementally as each command is converted. By the time this TODO runs, main.rs already has a mix of old and new dispatch. This TODO finalizes it. + +### main.rs dispatch (tracked per command conversion) +- [ ] clean match arm uses Step (TODO-0125) +- [ ] info match arm uses Step (TODO-0126) +- [ ] check match arm uses Step (TODO-0127) +- [ ] init match arm uses Step (TODO-0128) +- [ ] update match arm uses Step (TODO-0129) +- [ ] build match arm uses Step (TODO-0130) +- [ ] search match arm uses Step (TODO-0130) + +### Pipeline module deletion (after all commands converted) +- [ ] Delete `src/pipeline/scan.rs` +- [ ] Delete `src/pipeline/infer.rs` +- [ ] Delete `src/pipeline/validate.rs` +- [ ] Delete `src/pipeline/classify.rs` +- [ ] Delete `src/pipeline/load_model.rs` +- [ ] Delete `src/pipeline/embed.rs` +- [ ] Delete `src/pipeline/read_config.rs` +- [ ] Delete `src/pipeline/write_config.rs` +- [ ] Delete `src/pipeline/read_index.rs` +- [ ] Delete `src/pipeline/write_index.rs` +- [ ] Delete `src/pipeline/execute_search.rs` +- [ ] Delete `src/pipeline/delete_index.rs` +- [ ] Delete `src/pipeline/mod.rs` + +### output.rs simplification +- [ ] Remove `CommandOutput` trait +- [ ] Remove 
`OutputFormat` enum (moved to main.rs or render.rs) +- [ ] Remove `format_json_compact` helper +- [ ] Keep shared sub-types (`FieldViolation`, `SearchHit`, `NewField`, `DiscoveredField`, etc.) + +### lib.rs cleanup +- [ ] Remove `pub mod pipeline` +- [ ] Confirm `pub mod step`, `pub mod block`, `pub mod outcome`, `pub mod render` are present + +### Final verification +- [ ] `cargo test` — all tests pass +- [ ] `cargo clippy` — no warnings +- [ ] `cargo fmt` — formatted +- [ ] Manual end-to-end: test all 7 commands against `example_kb/` with all output formats (text compact, text verbose, JSON compact, JSON verbose) ## Files - `src/pipeline/` (delete entire directory) - `src/output.rs` (simplify) -- `src/main.rs` (rewrite dispatch) +- `src/main.rs` (finalize dispatch) - `src/lib.rs` (update module declarations) From 736864276e85084b2c0181000eb6d08611cae61c Mon Sep 17 00:00:00 2001 From: edoch Date: Thu, 19 Mar 2026 23:28:25 +0100 Subject: [PATCH 03/35] refactor: add Step tree infrastructure and outcome types New modules for the unified Step tree architecture (TODO-0119): - step.rs: generic Step, StepOutcome, StepError, has_failed/has_violations, custom Serialize, Render impls, migration helpers (from_pipeline_result) - block.rs: Block enum (Line/Table/Section), TableStyle, Render trait - render.rs: format_text and format_markdown shared formatters - outcome/: Outcome + CompactOutcome enums with all leaf and command outcome structs, compact counterparts, From impls, and Render impls Implements TODOs 0120-0124 and outcome structs from TODO-0122. 
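The commit message's `has_failed`/`has_violations` helpers are recursive walks over the step tree. A minimal sketch, with deliberately simplified fields (the real `step.rs` carries an outcome payload, error kinds, and per-step metadata), assuming hypothetical names:

```rust
// Simplified stand-in for the real StepError from step.rs.
struct StepError {
    message: String,
}

// Simplified stand-in for Step: real code holds Result<Outcome, StepError>.
struct Step {
    name: &'static str,
    outcome: Result<(), StepError>,
    substeps: Vec<Step>,
}

// A tree has failed if this node or any descendant carries an Err outcome,
// which is what the exit-code logic in main.rs checks before rendering.
fn has_failed(step: &Step) -> bool {
    step.outcome.is_err() || step.substeps.iter().any(has_failed)
}
```

`has_violations` would follow the same shape, testing each node's outcome for violation-carrying variants instead of `Err`.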
--- src/block.rs | 131 +++++++ src/lib.rs | 8 + src/outcome/classify.rs | 65 ++++ src/outcome/commands/build.rs | 224 +++++++++++ src/outcome/commands/check.rs | 198 ++++++++++ src/outcome/commands/clean.rs | 140 +++++++ src/outcome/commands/info.rs | 147 ++++++++ src/outcome/commands/init.rs | 171 +++++++++ src/outcome/commands/mod.rs | 20 + src/outcome/commands/search.rs | 160 ++++++++ src/outcome/commands/update.rs | 183 +++++++++ src/outcome/config.rs | 81 ++++ src/outcome/embed.rs | 83 ++++ src/outcome/index.rs | 226 +++++++++++ src/outcome/infer.rs | 42 +++ src/outcome/mod.rs | 388 +++++++++++++++++++ src/outcome/model.rs | 44 +++ src/outcome/scan.rs | 48 +++ src/outcome/search.rs | 37 ++ src/outcome/validate.rs | 64 ++++ src/render.rs | 243 ++++++++++++ src/step.rs | 667 +++++++++++++++++++++++++++++++++ 22 files changed, 3370 insertions(+) create mode 100644 src/block.rs create mode 100644 src/outcome/classify.rs create mode 100644 src/outcome/commands/build.rs create mode 100644 src/outcome/commands/check.rs create mode 100644 src/outcome/commands/clean.rs create mode 100644 src/outcome/commands/info.rs create mode 100644 src/outcome/commands/init.rs create mode 100644 src/outcome/commands/mod.rs create mode 100644 src/outcome/commands/search.rs create mode 100644 src/outcome/commands/update.rs create mode 100644 src/outcome/config.rs create mode 100644 src/outcome/embed.rs create mode 100644 src/outcome/index.rs create mode 100644 src/outcome/infer.rs create mode 100644 src/outcome/mod.rs create mode 100644 src/outcome/model.rs create mode 100644 src/outcome/scan.rs create mode 100644 src/outcome/search.rs create mode 100644 src/outcome/validate.rs create mode 100644 src/render.rs create mode 100644 src/step.rs diff --git a/src/block.rs b/src/block.rs new file mode 100644 index 0000000..f4fbbb8 --- /dev/null +++ b/src/block.rs @@ -0,0 +1,131 @@ +//! Rendering primitives and the `Render` trait. +//! +//! Data types produce `Vec<Block>` via the `Render` trait.
Shared formatters +//! (`format_text`, `format_markdown`) consume blocks and produce formatted +//! output strings. This separation means adding a new output format requires +//! writing one formatter function, not touching any command or outcome type. + +/// A rendering primitive — the intermediate representation between data and +/// formatted output. +#[derive(Debug, Clone)] +pub enum Block { + /// A single line of text. + Line(String), + /// A table with optional headers and styled rows. + Table { + /// Column headers (displayed as the first row in most formats). + headers: Option<Vec<String>>, + /// Table rows, each a vector of cell strings. + rows: Vec<Vec<String>>, + /// How the table should be styled by the formatter. + style: TableStyle, + }, + /// A labeled group of child blocks (for nested command output, sections). + Section { + /// Section header label. + label: String, + /// Child blocks within this section. + children: Vec<Block>, + }, +} + +/// Table styling hints consumed by formatters. +#[derive(Debug, Clone)] +pub enum TableStyle { + /// No internal horizontal separators. For compact summary tables. + Compact, + /// Detail rows span all columns (via ColumnSpan in text formatter). + /// For per-item record tables with expandable detail. + Record { + /// Zero-based row indices that should span all columns as detail rows. + detail_rows: Vec<usize>, + }, +} + +/// Trait for types that render themselves as a sequence of blocks. +/// +/// No parameters — the struct IS the verbose/compact decision. `Outcome` +/// structs produce full detail blocks; `CompactOutcome` structs produce +/// summary blocks or empty vecs (for silent leaf steps). +pub trait Render { + /// Produce rendering blocks for this type.
+ fn render(&self) -> Vec<Block>; +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn block_line() { + let block = Block::Line("hello".into()); + match block { + Block::Line(s) => assert_eq!(s, "hello"), + _ => panic!("expected Line"), + } + } + + #[test] + fn block_table() { + let block = Block::Table { + headers: Some(vec!["name".into(), "type".into()]), + rows: vec![vec!["title".into(), "String".into()]], + style: TableStyle::Compact, + }; + match block { + Block::Table { headers, rows, .. } => { + assert_eq!(headers.unwrap().len(), 2); + assert_eq!(rows.len(), 1); + } + _ => panic!("expected Table"), + } + } + + #[test] + fn block_section() { + let block = Block::Section { + label: "Auto-build".into(), + children: vec![Block::Line("Scan: 5 files".into())], + }; + match block { + Block::Section { label, children } => { + assert_eq!(label, "Auto-build"); + assert_eq!(children.len(), 1); + } + _ => panic!("expected Section"), + } + } + + #[test] + fn block_is_clone() { + let block = Block::Line("test".into()); + let cloned = block.clone(); + match cloned { + Block::Line(s) => assert_eq!(s, "test"), + _ => panic!("expected Line"), + } + } + + struct DummyOutcome { + label: String, + } + + impl Render for DummyOutcome { + fn render(&self) -> Vec<Block> { + vec![Block::Line(self.label.clone())] + } + } + + #[test] + fn render_trait_on_dummy() { + let outcome = DummyOutcome { + label: "Scan: 5 files".into(), + }; + let blocks = outcome.render(); + assert_eq!(blocks.len(), 1); + match &blocks[0] { + Block::Line(s) => assert_eq!(s, "Scan: 5 files"), + _ => panic!("expected Line"), + } + } +} diff --git a/src/lib.rs b/src/lib.rs index 7e7b4be..706ccf4 100644 --- a/src/lib.rs +++ b/src/lib.rs @@ -5,19 +5,27 @@ //! A database of markdown documents: schema inference, frontmatter validation, //! and semantic search with SQL filtering. Single binary, no cloud, no setup. +/// Rendering primitives (`Block`, `TableStyle`) and the `Render` trait.
+pub mod block; /// CLI command implementations. pub mod cmd; /// File scanning, type inference, and schema discovery. pub mod discover; /// Chunking, embedding, storage, and backend abstraction. pub mod index; +/// Outcome types for all pipeline steps and commands. +pub mod outcome; /// Output formatting types and the `CommandOutput` trait. pub mod output; /// Core pipeline abstractions for structured command output. pub mod pipeline; +/// Shared formatters (`format_text`, `format_markdown`) that consume `Vec<Block>`. +pub mod render; /// Configuration file types (`mdvs.toml`) and shared data structures. pub mod schema; /// DataFusion-based search context with cosine similarity UDF. pub mod search; +/// Step tree types for the unified command output architecture. +pub mod step; /// Table rendering helpers (compact and record styles via `tabled`). pub mod table; diff --git a/src/outcome/classify.rs b/src/outcome/classify.rs new file mode 100644 index 0000000..951543f --- /dev/null +++ b/src/outcome/classify.rs @@ -0,0 +1,65 @@ +//! Outcome types for the classify leaf step. + +use serde::Serialize; + +use crate::block::{Block, Render}; +use crate::output::format_file_count; + +/// Full outcome for the classify step. +#[derive(Debug, Serialize)] +pub struct ClassifyOutcome { + /// Whether this is a full rebuild. + pub full_rebuild: bool, + /// Number of files that need embedding (new + edited). + pub needs_embedding: usize, + /// Number of files unchanged from previous build. + pub unchanged: usize, + /// Number of files removed since previous build.
+ pub removed: usize, +} + +impl Render for ClassifyOutcome { + fn render(&self) -> Vec<Block> { + if self.full_rebuild { + vec![Block::Line(format!( + "Classify: {} (full rebuild)", + format_file_count(self.needs_embedding) + ))] + } else { + vec![Block::Line(format!( + "Classify: {} to embed, {} unchanged, {} removed", + self.needs_embedding, self.unchanged, self.removed + ))] + } + } +} + +/// Compact outcome for the classify step (identical — no verbose-only fields). +#[derive(Debug, Serialize)] +pub struct ClassifyOutcomeCompact { + /// Whether this is a full rebuild. + pub full_rebuild: bool, + /// Number of files that need embedding. + pub needs_embedding: usize, + /// Number of files unchanged. + pub unchanged: usize, + /// Number of files removed. + pub removed: usize, +} + +impl Render for ClassifyOutcomeCompact { + fn render(&self) -> Vec<Block> { + vec![] + } +} + +impl From<&ClassifyOutcome> for ClassifyOutcomeCompact { + fn from(o: &ClassifyOutcome) -> Self { + Self { + full_rebuild: o.full_rebuild, + needs_embedding: o.needs_embedding, + unchanged: o.unchanged, + removed: o.removed, + } + } +} diff --git a/src/outcome/commands/build.rs b/src/outcome/commands/build.rs new file mode 100644 index 0000000..9dda255 --- /dev/null +++ b/src/outcome/commands/build.rs @@ -0,0 +1,224 @@ +//! Build command outcome types. + +use serde::Serialize; + +use crate::block::{Block, Render, TableStyle}; +use crate::output::BuildFileDetail; +use crate::output::{format_file_count, NewField}; + +fn format_chunk_count(n: usize) -> String { + if n == 1 { + "1 chunk".to_string() + } else { + format!("{n} chunks") + } +} + +/// Full outcome for the build command. +#[derive(Debug, Serialize)] +pub struct BuildOutcome { + /// Whether this was a full rebuild (vs incremental). + pub full_rebuild: bool, + /// Total number of files in the final index. + pub files_total: usize, + /// Number of files that were chunked and embedded this run.
+ pub files_embedded: usize, + /// Number of files reused from the previous index. + pub files_unchanged: usize, + /// Number of files removed since the last build. + pub files_removed: usize, + /// Total number of chunks in the final index. + pub chunks_total: usize, + /// Number of chunks produced by newly embedded files. + pub chunks_embedded: usize, + /// Number of chunks retained from unchanged files. + pub chunks_unchanged: usize, + /// Number of chunks dropped from removed files. + pub chunks_removed: usize, + /// Fields found in frontmatter but not yet in `mdvs.toml`. + pub new_fields: Vec<NewField>, + /// Per-file chunk counts for embedded files. + pub embedded_files: Vec<BuildFileDetail>, + /// Per-file chunk counts for removed files. + pub removed_files: Vec<BuildFileDetail>, +} + +impl Render for BuildOutcome { + fn render(&self) -> Vec<Block> { + let mut blocks = vec![]; + + // New fields (shown before stats) + for nf in &self.new_fields { + blocks.push(Block::Line(format!( + " new field: {} ({})", + nf.name, + format_file_count(nf.files_found) + ))); + } + if !self.new_fields.is_empty() { + blocks.push(Block::Line( + "Run 'mdvs update' to incorporate new fields.".into(), + )); + } + + // One-liner + let rebuild_suffix = if self.full_rebuild { + " (full rebuild)" + } else { + "" + }; + blocks.push(Block::Line(format!( + "Built index — {}, {}{rebuild_suffix}", + format_file_count(self.files_total), + format_chunk_count(self.chunks_total) + ))); + + // Verbose: record tables per category with file-by-file detail + if self.files_embedded > 0 { + let detail = self + .embedded_files + .iter() + .map(|f| format!(" - \"{}\" ({})", f.filename, format_chunk_count(f.chunks))) + .collect::<Vec<_>>() + .join("\n"); + blocks.push(Block::Table { + headers: None, + rows: vec![ + vec![ + "embedded".to_string(), + format_file_count(self.files_embedded), + format_chunk_count(self.chunks_embedded), + ], + vec![detail, String::new(), String::new()], + ], + style: TableStyle::Record { + detail_rows: vec![1], + }, + }); + } + if
self.files_unchanged > 0 { + blocks.push(Block::Table { + headers: None, + rows: vec![vec![ + "unchanged".to_string(), + format_file_count(self.files_unchanged), + format_chunk_count(self.chunks_unchanged), + ]], + style: TableStyle::Compact, + }); + } + if self.files_removed > 0 { + let detail = self + .removed_files + .iter() + .map(|f| format!(" - \"{}\" ({})", f.filename, format_chunk_count(f.chunks))) + .collect::>() + .join("\n"); + blocks.push(Block::Table { + headers: None, + rows: vec![ + vec![ + "removed".to_string(), + format_file_count(self.files_removed), + format_chunk_count(self.chunks_removed), + ], + vec![detail, String::new(), String::new()], + ], + style: TableStyle::Record { + detail_rows: vec![1], + }, + }); + } + + blocks + } +} + +/// Compact outcome for the build command. +#[derive(Debug, Serialize)] +pub struct BuildOutcomeCompact { + /// Whether this was a full rebuild. + pub full_rebuild: bool, + /// Total files in the final index. + pub files_total: usize, + /// Files embedded this run. + pub files_embedded: usize, + /// Files unchanged from previous build. + pub files_unchanged: usize, + /// Files removed since last build. + pub files_removed: usize, + /// Total chunks in the final index. + pub chunks_total: usize, + /// Chunks produced by new embeddings. + pub chunks_embedded: usize, + /// Chunks retained from unchanged files. + pub chunks_unchanged: usize, + /// Chunks dropped from removed files. 
+    pub chunks_removed: usize,
+}
+
+impl Render for BuildOutcomeCompact {
+    fn render(&self) -> Vec<Block> {
+        let mut blocks = vec![];
+
+        let rebuild_suffix = if self.full_rebuild {
+            " (full rebuild)"
+        } else {
+            ""
+        };
+        blocks.push(Block::Line(format!(
+            "Built index — {}, {}{rebuild_suffix}",
+            format_file_count(self.files_total),
+            format_chunk_count(self.chunks_total)
+        )));
+
+        // Compact stats table
+        let mut rows = vec![];
+        if self.files_embedded > 0 {
+            rows.push(vec![
+                "embedded".to_string(),
+                format_file_count(self.files_embedded),
+                format_chunk_count(self.chunks_embedded),
+            ]);
+        }
+        if self.files_unchanged > 0 {
+            rows.push(vec![
+                "unchanged".to_string(),
+                format_file_count(self.files_unchanged),
+                format_chunk_count(self.chunks_unchanged),
+            ]);
+        }
+        if self.files_removed > 0 {
+            rows.push(vec![
+                "removed".to_string(),
+                format_file_count(self.files_removed),
+                format_chunk_count(self.chunks_removed),
+            ]);
+        }
+        if !rows.is_empty() {
+            blocks.push(Block::Table {
+                headers: None,
+                rows,
+                style: TableStyle::Compact,
+            });
+        }
+
+        blocks
+    }
+}
+
+impl From<&BuildOutcome> for BuildOutcomeCompact {
+    fn from(o: &BuildOutcome) -> Self {
+        Self {
+            full_rebuild: o.full_rebuild,
+            files_total: o.files_total,
+            files_embedded: o.files_embedded,
+            files_unchanged: o.files_unchanged,
+            files_removed: o.files_removed,
+            chunks_total: o.chunks_total,
+            chunks_embedded: o.chunks_embedded,
+            chunks_unchanged: o.chunks_unchanged,
+            chunks_removed: o.chunks_removed,
+        }
+    }
+}
diff --git a/src/outcome/commands/check.rs b/src/outcome/commands/check.rs
new file mode 100644
index 0000000..279bac2
--- /dev/null
+++ b/src/outcome/commands/check.rs
@@ -0,0 +1,198 @@
+//! Check command outcome types.
+
+use serde::Serialize;
+
+use crate::block::{Block, Render, TableStyle};
+use crate::output::{
+    format_file_count, FieldViolation, FieldViolationCompact, NewField, NewFieldCompact,
+    ViolationKind,
+};
+
+/// Full outcome for the check command.
+#[derive(Debug, Serialize)]
+pub struct CheckOutcome {
+    /// Number of markdown files checked.
+    pub files_checked: usize,
+    /// Violations grouped by field and kind.
+    pub violations: Vec<FieldViolation>,
+    /// Fields found in frontmatter but not defined in `mdvs.toml`.
+    pub new_fields: Vec<NewField>,
+}
+
+impl Render for CheckOutcome {
+    fn render(&self) -> Vec<Block> {
+        let mut blocks = vec![];
+
+        let violation_part = if self.violations.is_empty() {
+            "no violations".to_string()
+        } else {
+            format!("{} violation(s)", self.violations.len())
+        };
+        let new_field_part = if self.new_fields.is_empty() {
+            String::new()
+        } else {
+            format!(", {} new field(s)", self.new_fields.len())
+        };
+        blocks.push(Block::Line(format!(
+            "Checked {} — {violation_part}{new_field_part}",
+            format_file_count(self.files_checked),
+        )));
+
+        for v in &self.violations {
+            let kind_str = match v.kind {
+                ViolationKind::MissingRequired => "MissingRequired",
+                ViolationKind::WrongType => "WrongType",
+                ViolationKind::Disallowed => "Disallowed",
+                ViolationKind::NullNotAllowed => "NullNotAllowed",
+            };
+            let detail_text = v
+                .files
+                .iter()
+                .map(|f| match &f.detail {
+                    Some(d) => format!("  - \"{}\" ({d})", f.path.display()),
+                    None => format!("  - \"{}\"", f.path.display()),
+                })
+                .collect::<Vec<_>>()
+                .join("\n");
+
+            blocks.push(Block::Table {
+                headers: None,
+                rows: vec![
+                    vec![
+                        format!("\"{}\"", v.field),
+                        kind_str.to_string(),
+                        format_file_count(v.files.len()),
+                    ],
+                    vec![detail_text, String::new(), String::new()],
+                ],
+                style: TableStyle::Record {
+                    detail_rows: vec![1],
+                },
+            });
+        }
+
+        for nf in &self.new_fields {
+            let detail_text = match &nf.files {
+                Some(files) => files
+                    .iter()
+                    .map(|p| format!("  - \"{}\"", p.display()))
+                    .collect::<Vec<_>>()
+                    .join("\n"),
+                None => String::new(),
+            };
+            let mut rows = vec![vec![
+                format!("\"{}\"", nf.name),
+                "new".to_string(),
+                format_file_count(nf.files_found),
+            ]];
+            if !detail_text.is_empty() {
+                rows.push(vec![detail_text, String::new(), String::new()]);
+            }
+            blocks.push(Block::Table {
+                headers: None,
+                rows: rows.clone(),
+                style: if rows.len() > 1 {
+                    TableStyle::Record {
+                        detail_rows: vec![1],
+                    }
+                } else {
+                    TableStyle::Compact
+                },
+            });
+        }
+
+        blocks
+    }
+}
+
+/// Compact outcome for the check command.
+#[derive(Debug, Serialize)]
+pub struct CheckOutcomeCompact {
+    /// Number of markdown files checked.
+    pub files_checked: usize,
+    /// Compact violations (field + kind + count).
+    pub violations: Vec<FieldViolationCompact>,
+    /// Compact new fields (name + count).
+    pub new_fields: Vec<NewFieldCompact>,
+}
+
+impl Render for CheckOutcomeCompact {
+    fn render(&self) -> Vec<Block> {
+        let mut blocks = vec![];
+
+        let violation_part = if self.violations.is_empty() {
+            "no violations".to_string()
+        } else {
+            format!("{} violation(s)", self.violations.len())
+        };
+        let new_field_part = if self.new_fields.is_empty() {
+            String::new()
+        } else {
+            format!(", {} new field(s)", self.new_fields.len())
+        };
+        blocks.push(Block::Line(format!(
+            "Checked {} — {violation_part}{new_field_part}",
+            format_file_count(self.files_checked),
+        )));
+
+        if !self.violations.is_empty() {
+            let rows: Vec<Vec<String>> = self
+                .violations
+                .iter()
+                .map(|v| {
+                    let kind_str = match v.kind {
+                        ViolationKind::MissingRequired => "MissingRequired",
+                        ViolationKind::WrongType => "WrongType",
+                        ViolationKind::Disallowed => "Disallowed",
+                        ViolationKind::NullNotAllowed => "NullNotAllowed",
+                    };
+                    vec![
+                        format!("\"{}\"", v.field),
+                        kind_str.to_string(),
+                        format_file_count(v.file_count),
+                    ]
+                })
+                .collect();
+            blocks.push(Block::Table {
+                headers: None,
+                rows,
+                style: TableStyle::Compact,
+            });
+        }
+
+        if !self.new_fields.is_empty() {
+            let rows: Vec<Vec<String>> = self
+                .new_fields
+                .iter()
+                .map(|nf| {
+                    vec![
+                        format!("\"{}\"", nf.name),
+                        "new".to_string(),
+                        format_file_count(nf.files_found),
+                    ]
+                })
+                .collect();
+            blocks.push(Block::Table {
+                headers: None,
+                rows,
+                style: TableStyle::Compact,
+            });
+        }
+
+        blocks
+    }
+}
+
+impl From<&CheckOutcome> for CheckOutcomeCompact {
+    fn from(o: &CheckOutcome) -> Self {
+        Self {
+            files_checked: o.files_checked,
+            violations: o
+                .violations
+                .iter()
+                .map(FieldViolationCompact::from)
+                .collect(),
+            new_fields: o.new_fields.iter().map(NewFieldCompact::from).collect(),
+        }
+    }
+}
diff --git a/src/outcome/commands/clean.rs b/src/outcome/commands/clean.rs
new file mode 100644
index 0000000..b64414b
--- /dev/null
+++ b/src/outcome/commands/clean.rs
@@ -0,0 +1,140 @@
+//! Clean command outcome types.
+
+use std::path::PathBuf;
+
+use serde::Serialize;
+
+use crate::block::{Block, Render};
+use crate::output::{format_file_count, format_size};
+
+/// Full outcome for the clean command.
+#[derive(Debug, Serialize)]
+pub struct CleanOutcome {
+    /// Whether `.mdvs/` was actually removed.
+    pub removed: bool,
+    /// Path to the `.mdvs/` directory.
+    pub path: PathBuf,
+    /// Number of files that were in `.mdvs/`.
+    pub files_removed: usize,
+    /// Total size of `.mdvs/` in bytes.
+    pub size_bytes: u64,
+}
+
+impl Render for CleanOutcome {
+    fn render(&self) -> Vec<Block> {
+        if self.removed {
+            vec![
+                Block::Line(format!("Cleaned \"{}\"", self.path.display())),
+                Block::Line(format!(
+                    "{} | {}",
+                    format_file_count(self.files_removed),
+                    format_size(self.size_bytes),
+                )),
+            ]
+        } else {
+            vec![Block::Line(format!(
+                "Nothing to clean — \"{}\" does not exist",
+                self.path.display()
+            ))]
+        }
+    }
+}
+
+/// Compact outcome for the clean command.
+#[derive(Debug, Serialize)]
+pub struct CleanOutcomeCompact {
+    /// Whether `.mdvs/` was actually removed.
+    pub removed: bool,
+    /// Path to the `.mdvs/` directory.
+    pub path: PathBuf,
+}
+
+impl Render for CleanOutcomeCompact {
+    fn render(&self) -> Vec<Block> {
+        if self.removed {
+            vec![Block::Line(format!("Cleaned \"{}\"", self.path.display()))]
+        } else {
+            vec![Block::Line(format!(
+                "Nothing to clean — \"{}\" does not exist",
+                self.path.display()
+            ))]
+        }
+    }
+}
+
+impl From<&CleanOutcome> for CleanOutcomeCompact {
+    fn from(o: &CleanOutcome) -> Self {
+        Self {
+            removed: o.removed,
+            path: o.path.clone(),
+        }
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn clean_render_removed() {
+        let outcome = CleanOutcome {
+            removed: true,
+            path: PathBuf::from(".mdvs"),
+            files_removed: 2,
+            size_bytes: 1024,
+        };
+        let blocks = outcome.render();
+        assert_eq!(blocks.len(), 2);
+        match &blocks[0] {
+            Block::Line(s) => assert_eq!(s, "Cleaned \".mdvs\""),
+            _ => panic!("expected Line"),
+        }
+        match &blocks[1] {
+            Block::Line(s) => assert!(s.contains("2 files") && s.contains("1.0 KB")),
+            _ => panic!("expected Line"),
+        }
+    }
+
+    #[test]
+    fn clean_render_nothing() {
+        let outcome = CleanOutcome {
+            removed: false,
+            path: PathBuf::from(".mdvs"),
+            files_removed: 0,
+            size_bytes: 0,
+        };
+        let blocks = outcome.render();
+        assert_eq!(blocks.len(), 1);
+        match &blocks[0] {
+            Block::Line(s) => assert!(s.contains("Nothing to clean")),
+            _ => panic!("expected Line"),
+        }
+    }
+
+    #[test]
+    fn clean_compact_removed() {
+        let outcome = CleanOutcomeCompact {
+            removed: true,
+            path: PathBuf::from(".mdvs"),
+        };
+        let blocks = outcome.render();
+        assert_eq!(blocks.len(), 1);
+        match &blocks[0] {
+            Block::Line(s) => assert_eq!(s, "Cleaned \".mdvs\""),
+            _ => panic!("expected Line"),
+        }
+    }
+
+    #[test]
+    fn clean_compact_from_full() {
+        let full = CleanOutcome {
+            removed: true,
+            path: PathBuf::from(".mdvs"),
+            files_removed: 5,
+            size_bytes: 4096,
+        };
+        let compact = CleanOutcomeCompact::from(&full);
+        assert!(compact.removed);
+        assert_eq!(compact.path, PathBuf::from(".mdvs"));
+    }
+}
diff --git a/src/outcome/commands/info.rs b/src/outcome/commands/info.rs
new file mode 100644
index 0000000..b87430a
--- /dev/null
+++ b/src/outcome/commands/info.rs
@@ -0,0 +1,147 @@
+//! Info command outcome types.
+
+use serde::Serialize;
+
+use crate::block::{Block, Render, TableStyle};
+use crate::cmd::info::{IndexInfo, InfoField};
+use crate::output::format_hints;
+
+/// Full outcome for the info command.
+#[derive(Debug, Serialize)]
+pub struct InfoOutcome {
+    /// Glob pattern from `[scan]` config.
+    pub scan_glob: String,
+    /// Number of markdown files matching the scan pattern.
+    pub files_on_disk: usize,
+    /// Field definitions from `[[fields.field]]`.
+    pub fields: Vec<InfoField>,
+    /// Field names in the `[fields].ignore` list.
+    pub ignored_fields: Vec<String>,
+    /// Index info, if a built index exists.
+    pub index: Option<IndexInfo>,
+}
+
+impl Render for InfoOutcome {
+    fn render(&self) -> Vec<Block> {
+        let mut blocks = vec![];
+
+        let one_liner = match &self.index {
+            Some(idx) => format!(
+                "{} files, {} fields, {} chunks",
+                self.files_on_disk,
+                self.fields.len(),
+                idx.chunks,
+            ),
+            None => format!("{} files, {} fields", self.files_on_disk, self.fields.len()),
+        };
+        blocks.push(Block::Line(one_liner));
+
+        if let Some(idx) = &self.index {
+            let rev = idx.revision.as_deref().unwrap_or("none");
+            let rows = vec![
+                vec!["model:".into(), idx.model.clone()],
+                vec!["revision:".into(), rev.to_string()],
+                vec!["chunk size:".into(), idx.chunk_size.to_string()],
+                vec!["built:".into(), idx.built_at.clone()],
+                vec!["config:".into(), idx.config_status.clone()],
+                vec![
+                    "files:".into(),
+                    format!("{}/{}", idx.files_indexed, idx.files_on_disk),
+                ],
+            ];
+            blocks.push(Block::Table {
+                headers: None,
+                rows,
+                style: TableStyle::Compact,
+            });
+        }
+
+        for f in &self.fields {
+            let count_str = match (f.count, f.total_files) {
+                (Some(c), Some(t)) => format!("{c}/{t}"),
+                _ => String::new(),
+            };
+            let mut detail_lines = Vec::new();
+            if !f.required.is_empty() {
+                detail_lines.push("  required:".to_string());
+                for g in &f.required {
+                    detail_lines.push(format!("    - \"{g}\""));
+                }
+            }
+            detail_lines.push("  allowed:".to_string());
+            for g in &f.allowed {
+                detail_lines.push(format!("    - \"{g}\""));
+            }
+            if f.nullable {
+                detail_lines.push("  nullable: true".to_string());
+            }
+            if !f.hints.is_empty() {
+                detail_lines.push(format!("  hints: {}", format_hints(&f.hints)));
+            }
+
+            blocks.push(Block::Table {
+                headers: None,
+                rows: vec![
+                    vec![format!("\"{}\"", f.name), f.field_type.clone(), count_str],
+                    vec![detail_lines.join("\n"), String::new(), String::new()],
+                ],
+                style: TableStyle::Record {
+                    detail_rows: vec![1],
+                },
+            });
+        }
+
+        blocks
+    }
+}
+
+/// Compact outcome for the info command.
+#[derive(Debug, Serialize)]
+pub struct InfoOutcomeCompact {
+    /// Glob pattern from `[scan]` config.
+    pub scan_glob: String,
+    /// Number of markdown files matching the scan pattern.
+    pub files_on_disk: usize,
+    /// Number of fields defined.
+    pub field_count: usize,
+    /// Number of ignored fields.
+    pub ignored_count: usize,
+    /// Whether an index exists.
+    pub has_index: bool,
+    /// Brief index summary.
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub index_summary: Option<String>,
+}
+
+impl Render for InfoOutcomeCompact {
+    fn render(&self) -> Vec<Block> {
+        let one_liner = if let Some(ref summary) = self.index_summary {
+            format!(
+                "{} files, {} fields, {summary}",
+                self.files_on_disk, self.field_count,
+            )
+        } else {
+            format!("{} files, {} fields", self.files_on_disk, self.field_count)
+        };
+        vec![Block::Line(one_liner)]
+    }
+}
+
+impl From<&InfoOutcome> for InfoOutcomeCompact {
+    fn from(o: &InfoOutcome) -> Self {
+        let index_summary = o.index.as_ref().map(|idx| {
+            format!(
+                "{} files, {} chunks, model: {}",
+                idx.files_indexed, idx.chunks, idx.model,
+            )
+        });
+        Self {
+            scan_glob: o.scan_glob.clone(),
+            files_on_disk: o.files_on_disk,
+            field_count: o.fields.len(),
+            ignored_count: o.ignored_fields.len(),
+            has_index: o.index.is_some(),
+            index_summary,
+        }
+    }
+}
diff --git a/src/outcome/commands/init.rs b/src/outcome/commands/init.rs
new file mode 100644
index 0000000..ed02993
--- /dev/null
+++ b/src/outcome/commands/init.rs
@@ -0,0 +1,171 @@
+//! Init command outcome types.
+
+use std::path::PathBuf;
+
+use serde::Serialize;
+
+use crate::block::{Block, Render, TableStyle};
+use crate::output::{format_file_count, format_hints, DiscoveredField, DiscoveredFieldCompact};
+
+/// Full outcome for the init command.
+#[derive(Debug, Serialize)]
+pub struct InitOutcome {
+    /// Directory where `mdvs.toml` was written.
+    pub path: PathBuf,
+    /// Number of markdown files scanned.
+    pub files_scanned: usize,
+    /// Fields inferred from frontmatter.
+    pub fields: Vec<DiscoveredField>,
+    /// Whether this was a dry run (no files written).
+    pub dry_run: bool,
+}
+
+impl Render for InitOutcome {
+    fn render(&self) -> Vec<Block> {
+        let mut blocks = vec![];
+
+        // One-liner
+        let field_summary = if self.fields.is_empty() {
+            "no fields found".to_string()
+        } else {
+            format!("{} field(s)", self.fields.len())
+        };
+        let dry_run_suffix = if self.dry_run { " (dry run)" } else { "" };
+        blocks.push(Block::Line(format!(
+            "Initialized {} — {field_summary}{dry_run_suffix}",
+            format_file_count(self.files_scanned)
+        )));
+
+        // Per-field record tables
+        for field in &self.fields {
+            let mut detail_lines = Vec::new();
+            if let Some(ref req) = field.required {
+                if !req.is_empty() {
+                    detail_lines.push("  required:".to_string());
+                    for g in req {
+                        detail_lines.push(format!("    - \"{g}\""));
+                    }
+                }
+            }
+            if let Some(ref allowed) = field.allowed {
+                detail_lines.push("  allowed:".to_string());
+                for g in allowed {
+                    detail_lines.push(format!("    - \"{g}\""));
+                }
+            }
+            if field.nullable {
+                detail_lines.push("  nullable: true".to_string());
+            }
+            if !field.hints.is_empty() {
+                detail_lines.push(format!("  hints: {}", format_hints(&field.hints)));
+            }
+
+            blocks.push(Block::Table {
+                headers: None,
+                rows: vec![
+                    vec![
+                        format!("\"{}\"", field.name),
+                        field.field_type.clone(),
+                        format!("{}/{}", field.files_found, field.total_files),
+                    ],
+                    vec![detail_lines.join("\n"), String::new(), String::new()],
+                ],
+                style: TableStyle::Record {
+                    detail_rows: vec![1],
+                },
+            });
+        }
+
+        // Footer
+        if self.dry_run {
+            blocks.push(Block::Line("(dry run, nothing written)".into()));
+        } else {
+            blocks.push(Block::Line(format!(
+                "Initialized mdvs in '{}'",
+                self.path.display()
+            )));
+        }
+
+        blocks
+    }
+}
+
+/// Compact outcome for the init command.
+#[derive(Debug, Serialize)]
+pub struct InitOutcomeCompact {
+    /// Directory where `mdvs.toml` was written.
+    pub path: PathBuf,
+    /// Number of markdown files scanned.
+    pub files_scanned: usize,
+    /// Number of fields inferred.
+    pub field_count: usize,
+    /// Whether this was a dry run.
+    pub dry_run: bool,
+    /// Compact field summaries.
+    pub fields: Vec<DiscoveredFieldCompact>,
+}
+
+impl Render for InitOutcomeCompact {
+    fn render(&self) -> Vec<Block> {
+        let mut blocks = vec![];
+
+        let field_summary = if self.fields.is_empty() {
+            "no fields found".to_string()
+        } else {
+            format!("{} field(s)", self.field_count)
+        };
+        let dry_run_suffix = if self.dry_run { " (dry run)" } else { "" };
+        blocks.push(Block::Line(format!(
+            "Initialized {} — {field_summary}{dry_run_suffix}",
+            format_file_count(self.files_scanned)
+        )));
+
+        // Compact fields table
+        if !self.fields.is_empty() {
+            let rows: Vec<Vec<String>> = self
+                .fields
+                .iter()
+                .map(|f| {
+                    let type_str = if f.nullable {
+                        format!("{}?", f.field_type)
+                    } else {
+                        f.field_type.clone()
+                    };
+                    vec![
+                        format!("\"{}\"", f.name),
+                        type_str,
+                        format!("{}/{}", f.files_found, f.total_files),
+                    ]
+                })
+                .collect();
+            blocks.push(Block::Table {
+                headers: None,
+                rows,
+                style: TableStyle::Compact,
+            });
+        }
+
+        if self.dry_run {
+            blocks.push(Block::Line("(dry run, nothing written)".into()));
+        } else {
+            blocks.push(Block::Line(format!(
+                "Initialized mdvs in '{}'",
+                self.path.display()
+            )));
+        }
+
+        blocks
+    }
+}
+
+impl From<&InitOutcome> for InitOutcomeCompact {
+    fn from(o: &InitOutcome) -> Self {
+        Self {
+            path: o.path.clone(),
+            files_scanned: o.files_scanned,
+            field_count: o.fields.len(),
+            dry_run: o.dry_run,
+            fields: o.fields.iter().map(DiscoveredFieldCompact::from).collect(),
+        }
+    }
+}
diff --git a/src/outcome/commands/mod.rs b/src/outcome/commands/mod.rs
new file mode 100644
index 0000000..4201b6c
--- /dev/null
+++ b/src/outcome/commands/mod.rs
@@ -0,0 +1,20 @@
+//! Outcome types for command-level results.
+//!
+//! One file per command. Each defines a full + compact outcome pair
+//! with `Render` and `From` impls.
+
+pub mod build;
+pub mod check;
+pub mod clean;
+pub mod info;
+pub mod init;
+pub mod search;
+pub mod update;
+
+pub use build::{BuildOutcome, BuildOutcomeCompact};
+pub use check::{CheckOutcome, CheckOutcomeCompact};
+pub use clean::{CleanOutcome, CleanOutcomeCompact};
+pub use info::{InfoOutcome, InfoOutcomeCompact};
+pub use init::{InitOutcome, InitOutcomeCompact};
+pub use search::{SearchOutcome, SearchOutcomeCompact};
+pub use update::{UpdateOutcome, UpdateOutcomeCompact};
diff --git a/src/outcome/commands/search.rs b/src/outcome/commands/search.rs
new file mode 100644
index 0000000..583fbbd
--- /dev/null
+++ b/src/outcome/commands/search.rs
@@ -0,0 +1,160 @@
+//! Search command outcome types.
+
+use serde::Serialize;
+
+use crate::block::{Block, Render, TableStyle};
+use crate::index::backend::SearchHit;
+
+/// Full outcome for the search command.
+#[derive(Debug, Serialize)]
+pub struct SearchOutcome {
+    /// The query string.
+    pub query: String,
+    /// Files ranked by cosine similarity, descending.
+    pub hits: Vec<SearchHit>,
+    /// Name of the embedding model used.
+    pub model_name: String,
+    /// Result limit that was applied.
+    pub limit: usize,
+}
+
+impl Render for SearchOutcome {
+    fn render(&self) -> Vec<Block> {
+        let mut blocks = vec![];
+
+        let hit_word = if self.hits.len() == 1 { "hit" } else { "hits" };
+        blocks.push(Block::Line(format!(
+            "Searched \"{}\" — {} {hit_word}",
+            self.query,
+            self.hits.len()
+        )));
+
+        if self.hits.is_empty() {
+            return blocks;
+        }
+
+        // Per-hit record tables with chunk text
+        for (i, hit) in self.hits.iter().enumerate() {
+            let idx = format!("{}", i + 1);
+            let path = format!("\"{}\"", hit.filename);
+            let score = format!("{:.3}", hit.score);
+
+            let detail = match (&hit.chunk_text, hit.start_line, hit.end_line) {
+                (Some(text), Some(start), Some(end)) => {
+                    let indented: String = text
+                        .lines()
+                        .map(|l| format!("    {l}"))
+                        .collect::<Vec<_>>()
+                        .join("\n");
+                    format!("  lines {start}-{end}:\n{indented}")
+                }
+                (None, Some(start), Some(end)) => format!("  lines {start}-{end}"),
+                _ => String::new(),
+            };
+
+            let mut rows = vec![vec![idx, path, score]];
+            if !detail.is_empty() {
+                rows.push(vec![detail, String::new(), String::new()]);
+            }
+
+            blocks.push(Block::Table {
+                headers: None,
+                rows: rows.clone(),
+                style: if rows.len() > 1 {
+                    TableStyle::Record {
+                        detail_rows: vec![1],
+                    }
+                } else {
+                    TableStyle::Compact
+                },
+            });
+        }
+
+        // Footer
+        blocks.push(Block::Line(format!(
+            "{} {hit_word} | model: \"{}\" | limit: {}",
+            self.hits.len(),
+            self.model_name,
+            self.limit,
+        )));
+
+        blocks
+    }
+}
+
+/// Compact search hit — filename and score only.
+#[derive(Debug, Serialize)]
+pub struct SearchHitCompact {
+    /// Filename of the matched file.
+    pub filename: String,
+    /// Cosine similarity score.
+    pub score: f64,
+}
+
+impl From<&SearchHit> for SearchHitCompact {
+    fn from(h: &SearchHit) -> Self {
+        Self {
+            filename: h.filename.clone(),
+            score: h.score,
+        }
+    }
+}
+
+/// Compact outcome for the search command.
+#[derive(Debug, Serialize)]
+pub struct SearchOutcomeCompact {
+    /// The query string.
+    pub query: String,
+    /// Compact hits (filename + score only).
+    pub hits: Vec<SearchHitCompact>,
+    /// Name of the embedding model used.
+    pub model_name: String,
+    /// Result limit that was applied.
+    pub limit: usize,
+}
+
+impl Render for SearchOutcomeCompact {
+    fn render(&self) -> Vec<Block> {
+        let mut blocks = vec![];
+
+        let hit_word = if self.hits.len() == 1 { "hit" } else { "hits" };
+        blocks.push(Block::Line(format!(
+            "Searched \"{}\" — {} {hit_word}",
+            self.query,
+            self.hits.len()
+        )));
+
+        if !self.hits.is_empty() {
+            let rows: Vec<Vec<String>> = self
+                .hits
+                .iter()
+                .enumerate()
+                .map(|(i, h)| {
+                    vec![
+                        format!("{}", i + 1),
+                        format!("\"{}\"", h.filename),
+                        format!("{:.3}", h.score),
+                    ]
+                })
+                .collect();
+            blocks.push(Block::Table {
+                headers: None,
+                rows,
+                style: TableStyle::Compact,
+            });
+        }
+
+        blocks
+    }
+}
+
+impl From<&SearchOutcome> for SearchOutcomeCompact {
+    fn from(o: &SearchOutcome) -> Self {
+        Self {
+            query: o.query.clone(),
+            hits: o.hits.iter().map(SearchHitCompact::from).collect(),
+            model_name: o.model_name.clone(),
+            limit: o.limit,
+        }
+    }
+}
diff --git a/src/outcome/commands/update.rs b/src/outcome/commands/update.rs
new file mode 100644
index 0000000..1ac6ada
--- /dev/null
+++ b/src/outcome/commands/update.rs
@@ -0,0 +1,183 @@
+//! Update command outcome types.
+
+use serde::Serialize;
+
+use crate::block::{Block, Render, TableStyle};
+use crate::output::{format_file_count, format_hints, ChangedField, DiscoveredField, RemovedField};
+
+/// Full outcome for the update command.
+#[derive(Debug, Serialize)]
+pub struct UpdateOutcome {
+    /// Number of markdown files scanned.
+    pub files_scanned: usize,
+    /// Newly discovered fields not previously in `mdvs.toml`.
+    pub added: Vec<DiscoveredField>,
+    /// Fields whose type or glob constraints changed during re-inference.
+    pub changed: Vec<ChangedField>,
+    /// Fields that disappeared from all files during re-inference.
+    pub removed: Vec<RemovedField>,
+    /// Number of fields that remained identical.
+    pub unchanged: usize,
+    /// Whether this was a dry run (no files written).
+    pub dry_run: bool,
+}
+
+impl UpdateOutcome {
+    fn has_changes(&self) -> bool {
+        !self.added.is_empty() || !self.changed.is_empty() || !self.removed.is_empty()
+    }
+}
+
+impl Render for UpdateOutcome {
+    fn render(&self) -> Vec<Block> {
+        let mut blocks = vec![];
+
+        // One-liner
+        let total_changes = self.added.len() + self.changed.len() + self.removed.len();
+        let summary = if total_changes == 0 {
+            "no changes".to_string()
+        } else {
+            format!("{total_changes} field(s) changed")
+        };
+        let dry_run_suffix = if self.dry_run { " (dry run)" } else { "" };
+        blocks.push(Block::Line(format!(
+            "Scanned {} — {summary}{dry_run_suffix}",
+            format_file_count(self.files_scanned)
+        )));
+
+        if !self.has_changes() {
+            return blocks;
+        }
+
+        // Per-added: record tables
+        for field in &self.added {
+            let mut detail_lines = Vec::new();
+            if let Some(ref globs) = field.allowed {
+                detail_lines.push("  found in:".to_string());
+                for g in globs {
+                    detail_lines.push(format!("    - \"{g}\""));
+                }
+            }
+            if field.nullable {
+                detail_lines.push("  nullable: true".to_string());
+            }
+            if !field.hints.is_empty() {
+                detail_lines.push(format!("  hints: {}", format_hints(&field.hints)));
+            }
+            blocks.push(Block::Table {
+                headers: None,
+                rows: vec![
+                    vec![
+                        format!("\"{}\"", field.name),
+                        "added".to_string(),
+                        field.field_type.clone(),
+                    ],
+                    vec![detail_lines.join("\n"), String::new(), String::new()],
+                ],
+                style: TableStyle::Record {
+                    detail_rows: vec![1],
+                },
+            });
+        }
+
+        // Per-changed: compact table with aspect columns
+        for field in &self.changed {
+            let mut rows = vec![vec![
+                "field".into(),
+                "aspect".into(),
+                "old".into(),
+                "new".into(),
+            ]];
+            for (i, change) in field.changes.iter().enumerate() {
+                let name_col = if i == 0 {
+                    format!("\"{}\"", field.name)
+                } else {
+                    String::new()
+                };
+                let (old, new) = change.format_old_new();
+                rows.push(vec![name_col, change.label().to_string(), old, new]);
+            }
+            blocks.push(Block::Table {
+                headers: None,
+                rows,
+                style: TableStyle::Compact,
+            });
+        }
+
+        // Per-removed: record tables
+        for field in &self.removed {
+            let detail = match &field.allowed {
+                Some(globs) => {
+                    let mut lines = vec!["  previously in:".to_string()];
+                    for g in globs {
+                        lines.push(format!("    - \"{g}\""));
+                    }
+                    lines.join("\n")
+                }
+                None => String::new(),
+            };
+            blocks.push(Block::Table {
+                headers: None,
+                rows: vec![
+                    vec![
+                        format!("\"{}\"", field.name),
+                        "removed".to_string(),
+                        String::new(),
+                    ],
+                    vec![detail, String::new(), String::new()],
+                ],
+                style: TableStyle::Record {
+                    detail_rows: vec![1],
+                },
+            });
+        }
+
+        blocks
+    }
+}
+
+/// Compact outcome for the update command.
+#[derive(Debug, Serialize)]
+pub struct UpdateOutcomeCompact {
+    /// Number of markdown files scanned.
+    pub files_scanned: usize,
+    /// Number of newly discovered fields.
+    pub added_count: usize,
+    /// Number of changed fields.
+    pub changed_count: usize,
+    /// Number of removed fields.
+    pub removed_count: usize,
+    /// Number of unchanged fields.
+    pub unchanged: usize,
+    /// Whether this was a dry run.
+    pub dry_run: bool,
+}
+
+impl Render for UpdateOutcomeCompact {
+    fn render(&self) -> Vec<Block> {
+        let total = self.added_count + self.changed_count + self.removed_count;
+        let summary = if total == 0 {
+            "no changes".to_string()
+        } else {
+            format!("{total} field(s) changed")
+        };
+        let dry_run_suffix = if self.dry_run { " (dry run)" } else { "" };
+        vec![Block::Line(format!(
+            "Scanned {} — {summary}{dry_run_suffix}",
+            format_file_count(self.files_scanned)
+        ))]
+    }
+}
+
+impl From<&UpdateOutcome> for UpdateOutcomeCompact {
+    fn from(o: &UpdateOutcome) -> Self {
+        Self {
+            files_scanned: o.files_scanned,
+            added_count: o.added.len(),
+            changed_count: o.changed.len(),
+            removed_count: o.removed.len(),
+            unchanged: o.unchanged,
+            dry_run: o.dry_run,
+        }
+    }
+}
diff --git a/src/outcome/config.rs b/src/outcome/config.rs
new file mode 100644
index 0000000..c864f14
--- /dev/null
+++ b/src/outcome/config.rs
@@ -0,0 +1,81 @@
+//! Outcome types for config-related leaf steps (ReadConfig, WriteConfig, etc.).
+//!
+//! Only ReadConfig is defined initially. WriteConfig, MutateConfig, and
+//! CheckConfigChanged are added when init/build commands are converted.
+
+use serde::Serialize;
+
+use crate::block::{Block, Render};
+
+/// Full outcome for the read_config step.
+#[derive(Debug, Serialize)]
+pub struct ReadConfigOutcome {
+    /// Path to the config file that was read.
+    pub config_path: String,
+}
+
+impl Render for ReadConfigOutcome {
+    fn render(&self) -> Vec<Block> {
+        vec![Block::Line(format!("Read config: {}", self.config_path))]
+    }
+}
+
+/// Compact outcome for the read_config step (identical — no verbose-only fields).
+#[derive(Debug, Serialize)]
+pub struct ReadConfigOutcomeCompact {
+    /// Path to the config file that was read.
+    pub config_path: String,
+}
+
+impl Render for ReadConfigOutcomeCompact {
+    fn render(&self) -> Vec<Block> {
+        vec![] // Leaf compact outcomes are silent
+    }
+}
+
+impl From<&ReadConfigOutcome> for ReadConfigOutcomeCompact {
+    fn from(o: &ReadConfigOutcome) -> Self {
+        Self {
+            config_path: o.config_path.clone(),
+        }
+    }
+}
+
+/// Full outcome for the write_config step.
+#[derive(Debug, Serialize)]
+pub struct WriteConfigOutcome {
+    /// Path to the config file that was written.
+    pub config_path: String,
+    /// Number of fields written to the config.
+    pub fields_written: usize,
+}
+
+impl Render for WriteConfigOutcome {
+    fn render(&self) -> Vec<Block> {
+        vec![Block::Line(format!("Write config: {}", self.config_path))]
+    }
+}
+
+/// Compact outcome for the write_config step (identical — no verbose-only fields).
+#[derive(Debug, Serialize)]
+pub struct WriteConfigOutcomeCompact {
+    /// Path to the config file that was written.
+    pub config_path: String,
+    /// Number of fields written to the config.
+    pub fields_written: usize,
+}
+
+impl Render for WriteConfigOutcomeCompact {
+    fn render(&self) -> Vec<Block> {
+        vec![] // Leaf compact outcomes are silent
+    }
+}
+
+impl From<&WriteConfigOutcome> for WriteConfigOutcomeCompact {
+    fn from(o: &WriteConfigOutcome) -> Self {
+        Self {
+            config_path: o.config_path.clone(),
+            fields_written: o.fields_written,
+        }
+    }
+}
diff --git a/src/outcome/embed.rs b/src/outcome/embed.rs
new file mode 100644
index 0000000..3030702
--- /dev/null
+++ b/src/outcome/embed.rs
@@ -0,0 +1,83 @@
+//! Outcome types for the embed leaf steps (EmbedFiles, EmbedQuery).
+//!
+//! Only EmbedFiles is defined initially. EmbedQuery added when search is converted.
+
+use serde::Serialize;
+
+use crate::block::{Block, Render};
+
+/// Full outcome for the embed_files step.
+#[derive(Debug, Serialize)]
+pub struct EmbedFilesOutcome {
+    /// Number of files embedded.
+    pub files_embedded: usize,
+    /// Number of chunks produced.
+    pub chunks_produced: usize,
+}
+
+impl Render for EmbedFilesOutcome {
+    fn render(&self) -> Vec<Block> {
+        vec![Block::Line(format!(
+            "Embed: {} files, {} chunks",
+            self.files_embedded, self.chunks_produced
+        ))]
+    }
+}
+
+/// Compact outcome for the embed_files step (identical).
+#[derive(Debug, Serialize)]
+pub struct EmbedFilesOutcomeCompact {
+    /// Number of files embedded.
+    pub files_embedded: usize,
+    /// Number of chunks produced.
+    pub chunks_produced: usize,
+}
+
+impl Render for EmbedFilesOutcomeCompact {
+    fn render(&self) -> Vec<Block> {
+        vec![]
+    }
+}
+
+/// Full outcome for the embed_query step.
+#[derive(Debug, Serialize)]
+pub struct EmbedQueryOutcome {
+    /// The query string that was embedded.
+    pub query: String,
+}
+
+impl Render for EmbedQueryOutcome {
+    fn render(&self) -> Vec<Block> {
+        vec![Block::Line(format!("Embed query: \"{}\"", self.query))]
+    }
+}
+
+/// Compact outcome for the embed_query step (identical).
+#[derive(Debug, Serialize)]
+pub struct EmbedQueryOutcomeCompact {
+    /// The query string that was embedded.
+    pub query: String,
+}
+
+impl Render for EmbedQueryOutcomeCompact {
+    fn render(&self) -> Vec<Block> {
+        vec![]
+    }
+}
+
+impl From<&EmbedQueryOutcome> for EmbedQueryOutcomeCompact {
+    fn from(o: &EmbedQueryOutcome) -> Self {
+        Self {
+            query: o.query.clone(),
+        }
+    }
+}
+
+impl From<&EmbedFilesOutcome> for EmbedFilesOutcomeCompact {
+    fn from(o: &EmbedFilesOutcome) -> Self {
+        Self {
+            files_embedded: o.files_embedded,
+            chunks_produced: o.chunks_produced,
+        }
+    }
+}
diff --git a/src/outcome/index.rs b/src/outcome/index.rs
new file mode 100644
index 0000000..0946695
--- /dev/null
+++ b/src/outcome/index.rs
@@ -0,0 +1,226 @@
+//! Outcome types for index-related leaf steps (DeleteIndex, ReadIndex, etc.).
+//!
+//! Only DeleteIndex is defined initially. Other index outcomes are added
+//! incrementally as commands are converted.
+ +use serde::Serialize; + +use crate::block::{Block, Render}; +use crate::output::{format_file_count, format_size}; + +/// Full outcome for the delete_index step. +#[derive(Debug, Serialize)] +pub struct DeleteIndexOutcome { + /// Whether `.mdvs/` existed and was removed. + pub removed: bool, + /// Path to the `.mdvs/` directory. + pub path: String, + /// Number of files removed. + pub files_removed: usize, + /// Total bytes freed. + pub size_bytes: u64, +} + +impl Render for DeleteIndexOutcome { + fn render(&self) -> Vec { + if self.removed { + vec![Block::Line(format!( + "Delete index: {} ({}, {})", + self.path, + format_file_count(self.files_removed), + format_size(self.size_bytes), + ))] + } else { + vec![Block::Line(format!( + "Delete index: {} does not exist", + self.path + ))] + } + } +} + +/// Compact outcome for the delete_index step (identical fields — leaf step). +#[derive(Debug, Serialize)] +pub struct DeleteIndexOutcomeCompact { + /// Whether `.mdvs/` existed and was removed. + pub removed: bool, + /// Path to the `.mdvs/` directory. + pub path: String, + /// Number of files removed. + pub files_removed: usize, + /// Total bytes freed. + pub size_bytes: u64, +} + +impl Render for DeleteIndexOutcomeCompact { + fn render(&self) -> Vec { + vec![] // Leaf compact outcomes are silent + } +} + +impl From<&DeleteIndexOutcome> for DeleteIndexOutcomeCompact { + fn from(o: &DeleteIndexOutcome) -> Self { + Self { + removed: o.removed, + path: o.path.clone(), + files_removed: o.files_removed, + size_bytes: o.size_bytes, + } + } +} + +/// Full outcome for the read_index step. +#[derive(Debug, Serialize)] +pub struct ReadIndexOutcome { + /// Whether the index exists. + pub exists: bool, + /// Number of files in the index (0 if not exists). + pub files_indexed: usize, + /// Number of chunks in the index (0 if not exists). 
+    pub chunks: usize,
+}
+
+impl Render for ReadIndexOutcome {
+    fn render(&self) -> Vec<Block> {
+        if self.exists {
+            vec![Block::Line(format!(
+                "Read index: {} files, {} chunks",
+                self.files_indexed, self.chunks
+            ))]
+        } else {
+            vec![Block::Line("Read index: not found".into())]
+        }
+    }
+}
+
+/// Compact outcome for the read_index step (identical — no verbose-only fields).
+#[derive(Debug, Serialize)]
+pub struct ReadIndexOutcomeCompact {
+    /// Whether the index exists.
+    pub exists: bool,
+    /// Number of files in the index.
+    pub files_indexed: usize,
+    /// Number of chunks in the index.
+    pub chunks: usize,
+}
+
+impl Render for ReadIndexOutcomeCompact {
+    fn render(&self) -> Vec<Block> {
+        vec![] // Leaf compact outcomes are silent
+    }
+}
+
+impl From<&ReadIndexOutcome> for ReadIndexOutcomeCompact {
+    fn from(o: &ReadIndexOutcome) -> Self {
+        Self {
+            exists: o.exists,
+            files_indexed: o.files_indexed,
+            chunks: o.chunks,
+        }
+    }
+}
+
+/// Full outcome for the write_index step.
+#[derive(Debug, Serialize)]
+pub struct WriteIndexOutcome {
+    /// Number of files written.
+    pub files_written: usize,
+    /// Number of chunks written.
+    pub chunks_written: usize,
+}
+
+impl Render for WriteIndexOutcome {
+    fn render(&self) -> Vec<Block> {
+        vec![Block::Line(format!(
+            "Write index: {} files, {} chunks",
+            self.files_written, self.chunks_written
+        ))]
+    }
+}
+
+/// Compact outcome for the write_index step (identical).
+#[derive(Debug, Serialize)]
+pub struct WriteIndexOutcomeCompact {
+    /// Number of files written.
+    pub files_written: usize,
+    /// Number of chunks written.
+    pub chunks_written: usize,
+}
+
+impl Render for WriteIndexOutcomeCompact {
+    fn render(&self) -> Vec<Block> {
+        vec![]
+    }
+}
+
+impl From<&WriteIndexOutcome> for WriteIndexOutcomeCompact {
+    fn from(o: &WriteIndexOutcome) -> Self {
+        Self {
+            files_written: o.files_written,
+            chunks_written: o.chunks_written,
+        }
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn delete_index_render_removed() {
+        let outcome = DeleteIndexOutcome {
+            removed: true,
+            path: ".mdvs".into(),
+            files_removed: 2,
+            size_bytes: 1024,
+        };
+        let blocks = outcome.render();
+        assert_eq!(blocks.len(), 1);
+        match &blocks[0] {
+            Block::Line(s) => assert!(s.contains(".mdvs") && s.contains("2 files")),
+            _ => panic!("expected Line"),
+        }
+    }
+
+    #[test]
+    fn delete_index_render_not_exists() {
+        let outcome = DeleteIndexOutcome {
+            removed: false,
+            path: ".mdvs".into(),
+            files_removed: 0,
+            size_bytes: 0,
+        };
+        let blocks = outcome.render();
+        assert_eq!(blocks.len(), 1);
+        match &blocks[0] {
+            Block::Line(s) => assert!(s.contains("does not exist")),
+            _ => panic!("expected Line"),
+        }
+    }
+
+    #[test]
+    fn delete_index_compact_is_silent() {
+        let outcome = DeleteIndexOutcomeCompact {
+            removed: true,
+            path: ".mdvs".into(),
+            files_removed: 2,
+            size_bytes: 1024,
+        };
+        assert!(outcome.render().is_empty());
+    }
+
+    #[test]
+    fn delete_index_from_full() {
+        let full = DeleteIndexOutcome {
+            removed: true,
+            path: ".mdvs".into(),
+            files_removed: 3,
+            size_bytes: 2048,
+        };
+        let compact = DeleteIndexOutcomeCompact::from(&full);
+        assert_eq!(compact.removed, true);
+        assert_eq!(compact.path, ".mdvs");
+        assert_eq!(compact.files_removed, 3);
+        assert_eq!(compact.size_bytes, 2048);
+    }
+}
diff --git a/src/outcome/infer.rs b/src/outcome/infer.rs
new file mode 100644
index 0000000..2c93a4b
--- /dev/null
+++ b/src/outcome/infer.rs
@@ -0,0 +1,42 @@
+//! Outcome types for the infer leaf step.
+
+use serde::Serialize;
+
+use crate::block::{Block, Render};
+
+/// Full outcome for the infer step.
+#[derive(Debug, Serialize)]
+pub struct InferOutcome {
+    /// Number of fields inferred from frontmatter.
+    pub fields_inferred: usize,
+}
+
+impl Render for InferOutcome {
+    fn render(&self) -> Vec<Block> {
+        vec![Block::Line(format!(
+            "Infer: {} field(s)",
+            self.fields_inferred
+        ))]
+    }
+}
+
+/// Compact outcome for the infer step (identical — no verbose-only fields).
+#[derive(Debug, Serialize)]
+pub struct InferOutcomeCompact {
+    /// Number of fields inferred from frontmatter.
+    pub fields_inferred: usize,
+}
+
+impl Render for InferOutcomeCompact {
+    fn render(&self) -> Vec<Block> {
+        vec![] // Leaf compact outcomes are silent
+    }
+}
+
+impl From<&InferOutcome> for InferOutcomeCompact {
+    fn from(o: &InferOutcome) -> Self {
+        Self {
+            fields_inferred: o.fields_inferred,
+        }
+    }
+}
diff --git a/src/outcome/mod.rs b/src/outcome/mod.rs
new file mode 100644
index 0000000..1fd1876
--- /dev/null
+++ b/src/outcome/mod.rs
@@ -0,0 +1,388 @@
+//! Outcome types for all pipeline steps and commands.
+//!
+//! The `Outcome` and `CompactOutcome` enums contain one variant per step/command.
+//! Variants are added incrementally as commands are converted to the Step tree
+//! architecture.
+
+pub mod classify;
+pub mod commands;
+pub mod config;
+pub mod embed;
+pub mod index;
+pub mod infer;
+pub mod model;
+pub mod scan;
+pub mod search;
+pub mod validate;
+
+use serde::Serialize;
+
+use crate::block::{Block, Render};
+use crate::step::Step;
+
+pub use classify::{ClassifyOutcome, ClassifyOutcomeCompact};
+pub use commands::{
+    BuildOutcome, BuildOutcomeCompact, CheckOutcome, CheckOutcomeCompact, CleanOutcome,
+    CleanOutcomeCompact, InfoOutcome, InfoOutcomeCompact, InitOutcome, InitOutcomeCompact,
+    SearchOutcome, SearchOutcomeCompact, UpdateOutcome, UpdateOutcomeCompact,
+};
+pub use config::{
+    ReadConfigOutcome, ReadConfigOutcomeCompact, WriteConfigOutcome, WriteConfigOutcomeCompact,
+};
+pub use embed::{
+    EmbedFilesOutcome, EmbedFilesOutcomeCompact, EmbedQueryOutcome, EmbedQueryOutcomeCompact,
+};
+pub use index::{
+    DeleteIndexOutcome, DeleteIndexOutcomeCompact, ReadIndexOutcome, ReadIndexOutcomeCompact,
+    WriteIndexOutcome, WriteIndexOutcomeCompact,
+};
+pub use infer::{InferOutcome, InferOutcomeCompact};
+pub use model::{LoadModelOutcome, LoadModelOutcomeCompact};
+pub use scan::{ScanOutcome, ScanOutcomeCompact};
+pub use search::{ExecuteSearchOutcome, ExecuteSearchOutcomeCompact};
+pub use validate::{ValidateOutcome, ValidateOutcomeCompact};
+
+/// Full outcome for all steps and commands.
+///
+/// Each variant wraps a named outcome struct carrying all data needed for
+/// verbose rendering and JSON serialization. Command-level outcomes are
+/// `Box`ed to avoid bloating the enum.
+#[derive(Debug, Serialize)]
+pub enum Outcome {
+    /// Delete the `.mdvs/` directory.
+    DeleteIndex(DeleteIndexOutcome),
+    /// Read and parse `mdvs.toml`.
+    ReadConfig(ReadConfigOutcome),
+    /// Scan the project directory for markdown files.
+    Scan(ScanOutcome),
+    /// Read the existing index (parquet files).
+    ReadIndex(ReadIndexOutcome),
+    /// Validate frontmatter against the schema.
+    Validate(ValidateOutcome),
+    /// Clean command — delete `.mdvs/` and report.
+    Clean(CleanOutcome),
+    /// Check command — validate and report violations.
+    Check(Box<CheckOutcome>),
+    /// Infer field types and glob patterns.
+    Infer(InferOutcome),
+    /// Write `mdvs.toml` to disk.
+    WriteConfig(WriteConfigOutcome),
+    /// Info command — display config and index status.
+    Info(Box<InfoOutcome>),
+    /// Init command — scan, infer, write config.
+    Init(Box<InitOutcome>),
+    /// Classify files for incremental build.
+    Classify(ClassifyOutcome),
+    /// Load the embedding model.
+    LoadModel(LoadModelOutcome),
+    /// Embed files that need embedding.
+    EmbedFiles(EmbedFilesOutcome),
+    /// Write the index to disk.
+    WriteIndex(WriteIndexOutcome),
+    /// Update command — re-scan, re-infer, update config.
+    Update(Box<UpdateOutcome>),
+    /// Embed a query string.
+    EmbedQuery(EmbedQueryOutcome),
+    /// Execute search against the index.
+    ExecuteSearch(ExecuteSearchOutcome),
+    /// Build command — validate, embed, write index.
+    Build(Box<BuildOutcome>),
+    /// Search command — embed query, search index.
+    Search(Box<SearchOutcome>),
+}
+
+impl Render for Outcome {
+    fn render(&self) -> Vec<Block> {
+        match self {
+            Self::DeleteIndex(o) => o.render(),
+            Self::ReadConfig(o) => o.render(),
+            Self::Scan(o) => o.render(),
+            Self::ReadIndex(o) => o.render(),
+            Self::Validate(o) => o.render(),
+            Self::Infer(o) => o.render(),
+            Self::WriteConfig(o) => o.render(),
+            Self::Clean(o) => o.render(),
+            Self::Check(o) => o.render(),
+            Self::Classify(o) => o.render(),
+            Self::LoadModel(o) => o.render(),
+            Self::EmbedFiles(o) => o.render(),
+            Self::WriteIndex(o) => o.render(),
+            Self::Info(o) => o.render(),
+            Self::Init(o) => o.render(),
+            Self::Update(o) => o.render(),
+            Self::EmbedQuery(o) => o.render(),
+            Self::ExecuteSearch(o) => o.render(),
+            Self::Build(o) => o.render(),
+            Self::Search(o) => o.render(),
+        }
+    }
+}
+
+impl Outcome {
+    /// Convert this full outcome to its compact counterpart.
+    ///
+    /// Command outcomes may read `substeps` to derive summary data.
+    /// Leaf outcomes ignore `substeps`.
+    pub fn to_compact(&self, _substeps: &[Step<Outcome>]) -> CompactOutcome {
+        match self {
+            Self::DeleteIndex(o) => CompactOutcome::DeleteIndex(o.into()),
+            Self::ReadConfig(o) => CompactOutcome::ReadConfig(o.into()),
+            Self::Scan(o) => CompactOutcome::Scan(o.into()),
+            Self::ReadIndex(o) => CompactOutcome::ReadIndex(o.into()),
+            Self::Validate(o) => CompactOutcome::Validate(o.into()),
+            Self::Infer(o) => CompactOutcome::Infer(o.into()),
+            Self::WriteConfig(o) => CompactOutcome::WriteConfig(o.into()),
+            Self::Clean(o) => CompactOutcome::Clean(o.into()),
+            Self::Check(o) => CompactOutcome::Check(Box::new(o.as_ref().into())),
+            Self::Info(o) => CompactOutcome::Info(Box::new(o.as_ref().into())),
+            Self::Classify(o) => CompactOutcome::Classify(o.into()),
+            Self::LoadModel(o) => CompactOutcome::LoadModel(o.into()),
+            Self::EmbedFiles(o) => CompactOutcome::EmbedFiles(o.into()),
+            Self::WriteIndex(o) => CompactOutcome::WriteIndex(o.into()),
+            Self::Init(o) => CompactOutcome::Init(Box::new(o.as_ref().into())),
+            Self::Update(o) => CompactOutcome::Update(Box::new(o.as_ref().into())),
+            Self::EmbedQuery(o) => CompactOutcome::EmbedQuery(o.into()),
+            Self::ExecuteSearch(o) => CompactOutcome::ExecuteSearch(o.into()),
+            Self::Build(o) => CompactOutcome::Build(Box::new(o.as_ref().into())),
+            Self::Search(o) => CompactOutcome::Search(Box::new(o.as_ref().into())),
+        }
+    }
+
+    /// Returns `true` if this outcome contains validation violations.
+    ///
+    /// Used for exit code logic. Only Validate and Check outcomes can
+    /// return `true` — added when those variants are implemented.
+    pub fn contains_violations(&self) -> bool {
+        match self {
+            Self::Validate(v) => !v.violations.is_empty(),
+            Self::Check(c) => !c.violations.is_empty(),
+            Self::DeleteIndex(_)
+            | Self::ReadConfig(_)
+            | Self::Scan(_)
+            | Self::ReadIndex(_)
+            | Self::Infer(_)
+            | Self::WriteConfig(_)
+            | Self::Clean(_)
+            | Self::Classify(_)
+            | Self::LoadModel(_)
+            | Self::EmbedFiles(_)
+            | Self::WriteIndex(_)
+            | Self::Info(_)
+            | Self::Init(_)
+            | Self::EmbedQuery(_)
+            | Self::ExecuteSearch(_)
+            | Self::Update(_)
+            | Self::Build(_)
+            | Self::Search(_) => false,
+        }
+    }
+}
+
+/// Compact outcome for all steps and commands.
+///
+/// Mirrors `Outcome` with compact counterpart structs. Leaf compact outcomes
+/// render to empty vecs (silent). Command compact outcomes render summaries.
+#[derive(Debug, Serialize)]
+pub enum CompactOutcome {
+    /// Delete the `.mdvs/` directory (compact).
+    DeleteIndex(DeleteIndexOutcomeCompact),
+    /// Read and parse `mdvs.toml` (compact).
+    ReadConfig(ReadConfigOutcomeCompact),
+    /// Scan the project directory (compact).
+    Scan(ScanOutcomeCompact),
+    /// Read the existing index (compact).
+    ReadIndex(ReadIndexOutcomeCompact),
+    /// Validate frontmatter (compact).
+    Validate(ValidateOutcomeCompact),
+    /// Clean command (compact).
+    Clean(CleanOutcomeCompact),
+    /// Check command (compact).
+    Check(Box<CheckOutcomeCompact>),
+    /// Infer (compact).
+    Infer(InferOutcomeCompact),
+    /// Write config (compact).
+    WriteConfig(WriteConfigOutcomeCompact),
+    /// Info command (compact).
+    Info(Box<InfoOutcomeCompact>),
+    /// Init command (compact).
+    Init(Box<InitOutcomeCompact>),
+    /// Classify (compact).
+    Classify(ClassifyOutcomeCompact),
+    /// Load model (compact).
+    LoadModel(LoadModelOutcomeCompact),
+    /// Embed files (compact).
+    EmbedFiles(EmbedFilesOutcomeCompact),
+    /// Write index (compact).
+    WriteIndex(WriteIndexOutcomeCompact),
+    /// Update command (compact).
+    Update(Box<UpdateOutcomeCompact>),
+    /// Embed query (compact).
+    EmbedQuery(EmbedQueryOutcomeCompact),
+    /// Execute search (compact).
+    ExecuteSearch(ExecuteSearchOutcomeCompact),
+    /// Build command (compact).
+    Build(Box<BuildOutcomeCompact>),
+    /// Search command (compact).
+    Search(Box<SearchOutcomeCompact>),
+}
+
+impl Render for CompactOutcome {
+    fn render(&self) -> Vec<Block> {
+        match self {
+            Self::DeleteIndex(o) => o.render(),
+            Self::ReadConfig(o) => o.render(),
+            Self::Scan(o) => o.render(),
+            Self::ReadIndex(o) => o.render(),
+            Self::Validate(o) => o.render(),
+            Self::Infer(o) => o.render(),
+            Self::WriteConfig(o) => o.render(),
+            Self::Clean(o) => o.render(),
+            Self::Check(o) => o.render(),
+            Self::Classify(o) => o.render(),
+            Self::LoadModel(o) => o.render(),
+            Self::EmbedFiles(o) => o.render(),
+            Self::WriteIndex(o) => o.render(),
+            Self::Info(o) => o.render(),
+            Self::Init(o) => o.render(),
+            Self::Update(o) => o.render(),
+            Self::EmbedQuery(o) => o.render(),
+            Self::ExecuteSearch(o) => o.render(),
+            Self::Build(o) => o.render(),
+            Self::Search(o) => o.render(),
+        }
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::step::{ErrorKind, StepError, StepOutcome};
+    use std::path::PathBuf;
+
+    #[test]
+    fn outcome_render_delegates() {
+        let outcome = Outcome::Clean(CleanOutcome {
+            removed: true,
+            path: PathBuf::from(".mdvs"),
+            files_removed: 1,
+            size_bytes: 100,
+        });
+        let blocks = outcome.render();
+        assert_eq!(blocks.len(), 2);
+    }
+
+    #[test]
+    fn compact_outcome_render_delegates() {
+        let outcome = CompactOutcome::Clean(CleanOutcomeCompact {
+            removed: true,
+            path: PathBuf::from(".mdvs"),
+        });
+        let blocks = outcome.render();
+        assert_eq!(blocks.len(), 1); // Compact command renders summary
+    }
+
+    #[test]
+    fn compact_leaf_is_silent() {
+        let outcome = CompactOutcome::DeleteIndex(DeleteIndexOutcomeCompact {
+            removed: true,
+            path: ".mdvs".into(),
+            files_removed: 1,
+            size_bytes: 100,
+        });
+        assert!(outcome.render().is_empty());
+    }
+
+    #[test]
+    fn to_compact_roundtrip() {
+        let outcome = Outcome::Clean(CleanOutcome {
+            removed: true,
+            path: PathBuf::from(".mdvs"),
+            files_removed: 3,
+            size_bytes: 2048,
+        });
+        let compact = outcome.to_compact(&[]);
+        match &compact {
+            CompactOutcome::Clean(c) => {
+                assert!(c.removed);
+                assert_eq!(c.path, PathBuf::from(".mdvs"));
+            }
+            _ => panic!("expected Clean compact"),
+        }
+    }
+
+    #[test]
+    fn contains_violations_false_for_clean() {
+        let outcome = Outcome::Clean(CleanOutcome {
+            removed: true,
+            path: PathBuf::from(".mdvs"),
+            files_removed: 1,
+            size_bytes: 100,
+        });
+        assert!(!outcome.contains_violations());
+    }
+
+    #[test]
+    fn step_to_compact_full_tree() {
+        let leaf = Step {
+            substeps: vec![],
+            outcome: StepOutcome::Complete {
+                result: Ok(Outcome::DeleteIndex(DeleteIndexOutcome {
+                    removed: true,
+                    path: ".mdvs".into(),
+                    files_removed: 2,
+                    size_bytes: 1024,
+                })),
+                elapsed_ms: 5,
+            },
+        };
+        let command = Step {
+            substeps: vec![leaf],
+            outcome: StepOutcome::Complete {
+                result: Ok(Outcome::Clean(CleanOutcome {
+                    removed: true,
+                    path: PathBuf::from(".mdvs"),
+                    files_removed: 2,
+                    size_bytes: 1024,
+                })),
+                elapsed_ms: 5,
+            },
+        };
+
+        let compact = command.to_compact();
+        assert_eq!(compact.substeps.len(), 1);
+        // Leaf compact renders silent
+        assert!(compact.substeps[0].render().is_empty());
+        // Command compact renders summary
+        let blocks = compact.render();
+        assert!(!blocks.is_empty());
+    }
+
+    #[test]
+    fn step_to_compact_error_preserved() {
+        let step: Step<Outcome> = Step {
+            substeps: vec![],
+            outcome: StepOutcome::Complete {
+                result: Err(StepError {
+                    kind: ErrorKind::User,
+                    message: "test error".into(),
+                }),
+                elapsed_ms: 1,
+            },
+        };
+        let compact = step.to_compact();
+        match &compact.outcome {
+            StepOutcome::Complete { result: Err(e), .. } => assert_eq!(e.message, "test error"),
+            _ => panic!("expected error preserved"),
+        }
+    }
+
+    #[test]
+    fn step_to_compact_skipped_preserved() {
+        let step: Step<Outcome> = Step {
+            substeps: vec![],
+            outcome: StepOutcome::Skipped,
+        };
+        let compact = step.to_compact();
+        assert!(matches!(compact.outcome, StepOutcome::Skipped));
+    }
+}
diff --git a/src/outcome/model.rs b/src/outcome/model.rs
new file mode 100644
index 0000000..12f2bfb
--- /dev/null
+++ b/src/outcome/model.rs
@@ -0,0 +1,44 @@
+//! Outcome types for the load_model leaf step.
+
+use serde::Serialize;
+
+use crate::block::{Block, Render};
+
+/// Full outcome for the load_model step.
+#[derive(Debug, Serialize)]
+pub struct LoadModelOutcome {
+    /// Name of the embedding model loaded.
+    pub model_name: String,
+    /// Embedding dimension.
+    pub dimension: usize,
+}
+
+impl Render for LoadModelOutcome {
+    fn render(&self) -> Vec<Block> {
+        vec![Block::Line(format!("Load model: {}", self.model_name))]
+    }
+}
+
+/// Compact outcome for the load_model step (identical).
+#[derive(Debug, Serialize)]
+pub struct LoadModelOutcomeCompact {
+    /// Name of the embedding model loaded.
+    pub model_name: String,
+    /// Embedding dimension.
+    pub dimension: usize,
+}
+
+impl Render for LoadModelOutcomeCompact {
+    fn render(&self) -> Vec<Block> {
+        vec![]
+    }
+}
+
+impl From<&LoadModelOutcome> for LoadModelOutcomeCompact {
+    fn from(o: &LoadModelOutcome) -> Self {
+        Self {
+            model_name: o.model_name.clone(),
+            dimension: o.dimension,
+        }
+    }
+}
diff --git a/src/outcome/scan.rs b/src/outcome/scan.rs
new file mode 100644
index 0000000..d9534bb
--- /dev/null
+++ b/src/outcome/scan.rs
@@ -0,0 +1,48 @@
+//! Outcome types for the scan leaf step.
+
+use serde::Serialize;
+
+use crate::block::{Block, Render};
+use crate::output::format_file_count;
+
+/// Full outcome for the scan step.
+#[derive(Debug, Serialize)]
+pub struct ScanOutcome {
+    /// Number of markdown files found.
+    pub files_found: usize,
+    /// Glob pattern used for scanning.
+    pub glob: String,
+}
+
+impl Render for ScanOutcome {
+    fn render(&self) -> Vec<Block> {
+        vec![Block::Line(format!(
+            "Scan: {}",
+            format_file_count(self.files_found)
+        ))]
+    }
+}
+
+/// Compact outcome for the scan step (identical — no verbose-only fields).
+#[derive(Debug, Serialize)]
+pub struct ScanOutcomeCompact {
+    /// Number of markdown files found.
+    pub files_found: usize,
+    /// Glob pattern used for scanning.
+    pub glob: String,
+}
+
+impl Render for ScanOutcomeCompact {
+    fn render(&self) -> Vec<Block> {
+        vec![] // Leaf compact outcomes are silent
+    }
+}
+
+impl From<&ScanOutcome> for ScanOutcomeCompact {
+    fn from(o: &ScanOutcome) -> Self {
+        Self {
+            files_found: o.files_found,
+            glob: o.glob.clone(),
+        }
+    }
+}
diff --git a/src/outcome/search.rs b/src/outcome/search.rs
new file mode 100644
index 0000000..13fdf0a
--- /dev/null
+++ b/src/outcome/search.rs
@@ -0,0 +1,37 @@
+//! Outcome types for search-related leaf steps (ExecuteSearch).
+
+use serde::Serialize;
+
+use crate::block::{Block, Render};
+
+/// Full outcome for the execute_search step.
+#[derive(Debug, Serialize)]
+pub struct ExecuteSearchOutcome {
+    /// Number of hits found.
+    pub hits: usize,
+}
+
+impl Render for ExecuteSearchOutcome {
+    fn render(&self) -> Vec<Block> {
+        vec![Block::Line(format!("Execute search: {} hits", self.hits))]
+    }
+}
+
+/// Compact outcome for the execute_search step (identical).
+#[derive(Debug, Serialize)]
+pub struct ExecuteSearchOutcomeCompact {
+    /// Number of hits found.
+    pub hits: usize,
+}
+
+impl Render for ExecuteSearchOutcomeCompact {
+    fn render(&self) -> Vec<Block> {
+        vec![]
+    }
+}
+
+impl From<&ExecuteSearchOutcome> for ExecuteSearchOutcomeCompact {
+    fn from(o: &ExecuteSearchOutcome) -> Self {
+        Self { hits: o.hits }
+    }
+}
diff --git a/src/outcome/validate.rs b/src/outcome/validate.rs
new file mode 100644
index 0000000..98039b5
--- /dev/null
+++ b/src/outcome/validate.rs
@@ -0,0 +1,64 @@
+//! Outcome types for the validate leaf step.
+
+use serde::Serialize;
+
+use crate::block::{Block, Render};
+use crate::output::{
+    format_file_count, FieldViolation, FieldViolationCompact, NewField, NewFieldCompact,
+};
+
+/// Full outcome for the validate step.
+#[derive(Debug, Serialize)]
+pub struct ValidateOutcome {
+    /// Number of markdown files validated.
+    pub files_checked: usize,
+    /// Violations found during validation.
+    pub violations: Vec<FieldViolation>,
+    /// Fields found in frontmatter but not defined in `mdvs.toml`.
+    pub new_fields: Vec<NewField>,
+}
+
+impl Render for ValidateOutcome {
+    fn render(&self) -> Vec<Block> {
+        let violation_part = if self.violations.is_empty() {
+            "no violations".to_string()
+        } else {
+            format!("{} violation(s)", self.violations.len())
+        };
+        vec![Block::Line(format!(
+            "Validate: {} — {violation_part}",
+            format_file_count(self.files_checked),
+        ))]
+    }
+}
+
+/// Compact outcome for the validate step.
+#[derive(Debug, Serialize)]
+pub struct ValidateOutcomeCompact {
+    /// Number of markdown files validated.
+    pub files_checked: usize,
+    /// Compact violations (count only, no file paths).
+    pub violations: Vec<FieldViolationCompact>,
+    /// Compact new fields (count only, no file paths).
+    pub new_fields: Vec<NewFieldCompact>,
+}
+
+impl Render for ValidateOutcomeCompact {
+    fn render(&self) -> Vec<Block> {
+        vec![] // Leaf compact outcomes are silent
+    }
+}
+
+impl From<&ValidateOutcome> for ValidateOutcomeCompact {
+    fn from(o: &ValidateOutcome) -> Self {
+        Self {
+            files_checked: o.files_checked,
+            violations: o
+                .violations
+                .iter()
+                .map(FieldViolationCompact::from)
+                .collect(),
+            new_fields: o.new_fields.iter().map(NewFieldCompact::from).collect(),
+        }
+    }
+}
diff --git a/src/render.rs b/src/render.rs
new file mode 100644
index 0000000..04745ee
--- /dev/null
+++ b/src/render.rs
@@ -0,0 +1,243 @@
+//! Shared formatters that consume `Vec<Block>` and produce formatted output.
+//!
+//! Two formatters: `format_text` (terminal, box-drawing tables via tabled)
+//! and `format_markdown` (pipe tables, section headers). Adding a new output
+//! format means writing one function here — no command code changes needed.
+
+use tabled::settings::{
+    object::Cell, peaker::PriorityMax, span::ColumnSpan, style::Style, themes::BorderCorrection,
+    width::Width, Modify,
+};
+
+use crate::block::{Block, TableStyle};
+use crate::table::{style_compact, term_width, Builder};
+
+/// Format blocks as terminal text with box-drawing tables.
+pub fn format_text(blocks: &[Block]) -> String {
+    let mut out = String::new();
+    for block in blocks {
+        format_text_block(block, &mut out, 0);
+    }
+    out
+}
+
+fn format_text_block(block: &Block, out: &mut String, indent: usize) {
+    let prefix = " ".repeat(indent);
+    match block {
+        Block::Line(s) => {
+            out.push_str(&prefix);
+            out.push_str(s);
+            out.push('\n');
+        }
+        Block::Table {
+            headers,
+            rows,
+            style,
+        } => {
+            let mut builder = Builder::default();
+            if let Some(hdrs) = headers {
+                builder.push_record(hdrs.iter().map(String::as_str));
+            }
+            for row in rows {
+                builder.push_record(row.iter().map(String::as_str));
+            }
+            let mut table = builder.build();
+
+            match style {
+                TableStyle::Compact => {
+                    style_compact(&mut table);
+                }
+                TableStyle::Record { detail_rows } => {
+                    let col_count = headers
+                        .as_ref()
+                        .map(|h| h.len())
+                        .or_else(|| rows.first().map(|r| r.len()))
+                        .unwrap_or(1) as isize;
+                    let w = term_width();
+                    let header_offset = if headers.is_some() { 1 } else { 0 };
+                    table.with(Style::rounded());
+                    for &row_idx in detail_rows {
+                        let actual_row = row_idx + header_offset;
+                        table.with(
+                            Modify::new(Cell::new(actual_row, 0)).with(ColumnSpan::new(col_count)),
+                        );
+                    }
+                    table.with(BorderCorrection {});
+                    table.with(Width::increase(w));
+                    table.with(Width::wrap(w).priority(PriorityMax::left()));
+                }
+            }
+
+            let rendered = table.to_string();
+            if indent > 0 {
+                for line in rendered.lines() {
+                    out.push_str(&prefix);
+                    out.push_str(line);
+                    out.push('\n');
+                }
+            } else {
+                out.push_str(&rendered);
+                out.push('\n');
+            }
+        }
+        Block::Section { label, children } => {
+            out.push_str(&prefix);
+            out.push_str(label);
+            out.push_str(":\n");
+            for child in children {
+                format_text_block(child, out, indent + 2);
+            }
+        }
+    }
+}
+
+/// Format blocks as markdown (pipe tables, section headers).
+///
+/// Basic implementation — sufficient for initial use. Full markdown formatting
+/// is tracked in TODO-0101.
+pub fn format_markdown(blocks: &[Block]) -> String {
+    let mut out = String::new();
+    for block in blocks {
+        format_markdown_block(block, &mut out);
+    }
+    out
+}
+
+fn format_markdown_block(block: &Block, out: &mut String) {
+    match block {
+        Block::Line(s) => {
+            out.push_str(s);
+            out.push('\n');
+        }
+        Block::Table { headers, rows, .. } => {
+            if let Some(hdrs) = headers {
+                out.push_str("| ");
+                out.push_str(&hdrs.join(" | "));
+                out.push_str(" |\n");
+                out.push_str("| ");
+                out.push_str(
+                    &hdrs
+                        .iter()
+                        .map(|_| "---".to_string())
+                        .collect::<Vec<_>>()
+                        .join(" | "),
+                );
+                out.push_str(" |\n");
+            }
+            for row in rows {
+                out.push_str("| ");
+                out.push_str(&row.join(" | "));
+                out.push_str(" |\n");
+            }
+            out.push('\n');
+        }
+        Block::Section { label, children } => {
+            out.push_str("## ");
+            out.push_str(label);
+            out.push('\n');
+            out.push('\n');
+            for child in children {
+                format_markdown_block(child, out);
+            }
+        }
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::block::Block;
+
+    #[test]
+    fn text_empty_blocks() {
+        assert_eq!(format_text(&[]), "");
+    }
+
+    #[test]
+    fn text_line() {
+        let blocks = vec![Block::Line("hello world".into())];
+        assert_eq!(format_text(&blocks), "hello world\n");
+    }
+
+    #[test]
+    fn text_multiple_lines() {
+        let blocks = vec![Block::Line("line 1".into()), Block::Line("line 2".into())];
+        assert_eq!(format_text(&blocks), "line 1\nline 2\n");
+    }
+
+    #[test]
+    fn text_compact_table() {
+        let blocks = vec![Block::Table {
+            headers: Some(vec!["name".into(), "type".into()]),
+            rows: vec![vec!["title".into(), "String".into()]],
+            style: TableStyle::Compact,
+        }];
+        let output = format_text(&blocks);
+        assert!(output.contains("title"));
+        assert!(output.contains("String"));
+        // Rounded border chars
+        assert!(output.contains('╭') || output.contains('│'));
+    }
+
+    #[test]
+    fn text_record_table() {
+        let blocks = vec![Block::Table {
+            headers: None,
+            rows: vec![
+                vec!["\"title\"".into(), "String".into(), "5/5".into()],
+                vec![
+                    " required:\n - \"**\"".into(),
+                    String::new(),
+                    String::new(),
+                ],
+            ],
+            style: TableStyle::Record {
+                detail_rows: vec![1],
+            },
+        }];
+        let output = format_text(&blocks);
+        assert!(output.contains("title"));
+        assert!(output.contains("required"));
+    }
+
+    #[test]
+    fn text_section() {
+        let blocks = vec![Block::Section {
+            label: "Auto-build".into(),
+            children: vec![Block::Line("Scan: 5 files".into())],
+        }];
+        let output = format_text(&blocks);
+        assert!(output.contains("Auto-build:"));
+        assert!(output.contains("  Scan: 5 files"));
+    }
+
+    #[test]
+    fn markdown_line() {
+        let blocks = vec![Block::Line("hello".into())];
+        assert_eq!(format_markdown(&blocks), "hello\n");
+    }
+
+    #[test]
+    fn markdown_table_with_headers() {
+        let blocks = vec![Block::Table {
+            headers: Some(vec!["name".into(), "type".into()]),
+            rows: vec![vec!["title".into(), "String".into()]],
+            style: TableStyle::Compact,
+        }];
+        let output = format_markdown(&blocks);
+        assert!(output.contains("| name | type |"));
+        assert!(output.contains("| --- | --- |"));
+        assert!(output.contains("| title | String |"));
+    }
+
+    #[test]
+    fn markdown_section() {
+        let blocks = vec![Block::Section {
+            label: "Results".into(),
+            children: vec![Block::Line("5 hits".into())],
+        }];
+        let output = format_markdown(&blocks);
+        assert!(output.contains("## Results"));
+        assert!(output.contains("5 hits"));
+    }
+}
diff --git a/src/step.rs b/src/step.rs
new file mode 100644
index 0000000..7ab014f
--- /dev/null
+++ b/src/step.rs
@@ -0,0 +1,667 @@
+//! Core Step tree types for structured command output.
+//!
Every command returns a `Step` tree where `O` is the outcome type. +//! Leaf steps (scan, validate, etc.) have empty substeps. Commands (build, +//! search, etc.) have populated substeps forming a tree that mirrors the +//! execution pipeline. +//! +//! Two instantiations: `Step` (full data, verbose) and +//! `Step` (summary data, compact). Conversion between +//! them is recursive via `to_compact()`. + +use crate::block::{Block, Render}; +use crate::outcome::{CompactOutcome, Outcome}; +use serde::ser::SerializeMap; +use serde::{Serialize, Serializer}; + +/// A Step tree with full outcome data (verbose mode). +pub type FullStep = Step; + +/// A Step tree with compact outcome data (compact mode). +pub type CompactStep = Step; + +/// A node in the execution tree. +/// +/// Leaf steps have `substeps: vec![]`. Commands have populated substeps +/// representing the pipeline stages that ran. +#[derive(Debug)] +pub struct Step { + /// Child steps that ran as part of this step's pipeline. + pub substeps: Vec>, + /// The outcome of this step itself. + pub outcome: StepOutcome, +} + +/// The result of executing a step. +#[derive(Debug)] +pub enum StepOutcome { + /// The step ran, producing a successful outcome or an error. + Complete { + /// The step's result: `Ok` with outcome data, or `Err` with error details. + result: Result, + /// Wall-clock time for this step in milliseconds. + elapsed_ms: u64, + }, + /// The step was skipped (upstream failure, not needed, !verbose, etc.). + Skipped, +} + +impl StepOutcome { + /// Returns the elapsed time if the step completed, `None` if skipped. + pub fn elapsed_ms(&self) -> Option { + match self { + Self::Complete { elapsed_ms, .. } => Some(*elapsed_ms), + Self::Skipped => None, + } + } +} + +impl Step { + /// Recursively convert the full tree to a compact tree. + /// + /// Each outcome is converted via `Outcome::to_compact()`, which may + /// read substep data for command-level summaries. 
Errors and Skipped + /// outcomes are preserved as-is. + pub fn to_compact(&self) -> Step { + let compact_outcome = match &self.outcome { + StepOutcome::Complete { + result: Ok(outcome), + elapsed_ms, + } => StepOutcome::Complete { + result: Ok(outcome.to_compact(&self.substeps)), + elapsed_ms: *elapsed_ms, + }, + StepOutcome::Complete { + result: Err(e), + elapsed_ms, + } => StepOutcome::Complete { + result: Err(e.clone()), + elapsed_ms: *elapsed_ms, + }, + StepOutcome::Skipped => StepOutcome::Skipped, + }; + Step { + substeps: self.substeps.iter().map(|s| s.to_compact()).collect(), + outcome: compact_outcome, + } + } +} + +/// An error that occurred during a step. +#[derive(Debug, Clone, Serialize)] +pub struct StepError { + /// Whether this is a user error (bad input) or application error (internal failure). + pub kind: ErrorKind, + /// Human-readable error message. + pub message: String, +} + +/// Error categorization (HTTP analogy: User ≈ 4xx, Application ≈ 5xx). +#[derive(Debug, Clone, Serialize)] +#[serde(rename_all = "snake_case")] +pub enum ErrorKind { + /// Bad input: config not found, model mismatch, invalid flags. + User, + /// Unexpected internal failure: I/O errors, parquet corruption. + Application, +} + +impl Step { + /// Create a leaf step (no substeps) with a successful outcome. + pub fn leaf(outcome: O, elapsed_ms: u64) -> Self { + Self { + substeps: vec![], + outcome: StepOutcome::Complete { + result: Ok(outcome), + elapsed_ms, + }, + } + } + + /// Create a leaf step with a failed outcome. + pub fn failed(kind: ErrorKind, message: String, elapsed_ms: u64) -> Self { + Self { + substeps: vec![], + outcome: StepOutcome::Complete { + result: Err(StepError { kind, message }), + elapsed_ms, + }, + } + } + + /// Create a skipped step. + pub fn skipped() -> Self { + Self { + substeps: vec![], + outcome: StepOutcome::Skipped, + } + } + + /// Compatibility method during migration — delegates to `has_failed()` free function. 
+    /// TODO: remove after all commands are converted (TODO-0131 wave 7).
+    pub fn has_failed_step(&self) -> bool {
+        has_failed(self)
+    }
+}
+
+// --- Free functions ---
+
+/// Returns `true` if any step in the tree failed (has `Err` outcome).
+pub fn has_failed<O>(step: &Step<O>) -> bool {
+    step.substeps.iter().any(|s| has_failed(s))
+        || matches!(step.outcome, StepOutcome::Complete { result: Err(_), .. })
+}
+
+/// Returns `true` if any step in the tree contains validation violations.
+pub fn has_violations(step: &Step<Outcome>) -> bool {
+    step.substeps.iter().any(has_violations)
+        || match &step.outcome {
+            StepOutcome::Complete {
+                result: Ok(outcome),
+                ..
+            } => outcome.contains_violations(),
+            _ => false,
+        }
+}
+
+// --- Pipeline migration helper (temporary, deleted in TODO-0131) ---
+
+/// Convert an old `ProcessingStepResult<T>` into a new `Step<Outcome>`.
+///
+/// This is migration glue: during the transition, commands still call old
+/// `run_*()` pipeline functions that return `ProcessingStepResult`. This
+/// generic helper converts the result into a `Step<Outcome>` leaf node,
+/// using the provided closure to map the output to an `Outcome` variant.
+///
+/// Deleted when the old pipeline modules are removed (TODO-0131).
+pub fn from_pipeline_result<T, F>(
+    result: crate::pipeline::ProcessingStepResult<T>,
+    to_outcome: F,
+) -> Step<Outcome>
+where
+    F: FnOnce(&T) -> Outcome,
+{
+    match result {
+        crate::pipeline::ProcessingStepResult::Completed(step) => Step {
+            substeps: vec![],
+            outcome: StepOutcome::Complete {
+                result: Ok(to_outcome(&step.output)),
+                elapsed_ms: step.elapsed_ms,
+            },
+        },
+        crate::pipeline::ProcessingStepResult::Failed(err) => Step {
+            substeps: vec![],
+            outcome: StepOutcome::Complete {
+                result: Err(StepError {
+                    kind: convert_error_kind(err.kind),
+                    message: err.message,
+                }),
+                elapsed_ms: 0,
+            },
+        },
+        crate::pipeline::ProcessingStepResult::Skipped => Step {
+            substeps: vec![],
+            outcome: StepOutcome::Skipped,
+        },
+    }
+}
+
+/// Like [`from_pipeline_result`], but also returns the raw output data
+/// (needed when subsequent steps consume data from a prior step).
+pub fn from_pipeline_result_with_data<T, F>(
+    result: crate::pipeline::ProcessingStepResult<T>,
+    to_outcome: F,
+) -> (Step<Outcome>, Option<T>)
+where
+    F: FnOnce(&T) -> Outcome,
+{
+    match result {
+        crate::pipeline::ProcessingStepResult::Completed(step) => {
+            let outcome = to_outcome(&step.output);
+            (
+                Step {
+                    substeps: vec![],
+                    outcome: StepOutcome::Complete {
+                        result: Ok(outcome),
+                        elapsed_ms: step.elapsed_ms,
+                    },
+                },
+                Some(step.output),
+            )
+        }
+        crate::pipeline::ProcessingStepResult::Failed(err) => (
+            Step {
+                substeps: vec![],
+                outcome: StepOutcome::Complete {
+                    result: Err(StepError {
+                        kind: convert_error_kind(err.kind),
+                        message: err.message,
+                    }),
+                    elapsed_ms: 0,
+                },
+            },
+            None,
+        ),
+        crate::pipeline::ProcessingStepResult::Skipped => (
+            Step {
+                substeps: vec![],
+                outcome: StepOutcome::Skipped,
+            },
+            None,
+        ),
+    }
+}
+
+/// Convert old pipeline ErrorKind to new step ErrorKind.
+/// Temporary migration helper (deleted in TODO-0131).
+pub fn convert_error_kind(kind: crate::pipeline::ErrorKind) -> ErrorKind {
+    match kind {
+        crate::pipeline::ErrorKind::User => ErrorKind::User,
+        crate::pipeline::ErrorKind::Application => ErrorKind::Application,
+    }
+}
+
+// --- Serialize impls (hand-written, not derived) ---
+
+impl<O: Serialize> Serialize for Step<O> {
+    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
+        let mut map = serializer.serialize_map(Some(2))?;
+        map.serialize_entry("substeps", &self.substeps)?;
+        map.serialize_entry("outcome", &self.outcome)?;
+        map.end()
+    }
+}
+
+impl<O: Serialize> Serialize for StepOutcome<O> {
+    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
+        match self {
+            Self::Complete {
+                result: Ok(outcome),
+                elapsed_ms,
+            } => {
+                let mut map = serializer.serialize_map(Some(3))?;
+                map.serialize_entry("status", "complete")?;
+                map.serialize_entry("elapsed_ms", elapsed_ms)?;
+                map.serialize_entry("outcome", outcome)?;
+                map.end()
+            }
+            Self::Complete {
+                result: Err(error),
+                elapsed_ms,
+            } => {
+                let mut map = serializer.serialize_map(Some(3))?;
+                map.serialize_entry("status", "failed")?;
+                map.serialize_entry("elapsed_ms", elapsed_ms)?;
+                map.serialize_entry("error", error)?;
+                map.end()
+            }
+            Self::Skipped => {
+                let mut map = serializer.serialize_map(Some(1))?;
+                map.serialize_entry("status", "skipped")?;
+                map.end()
+            }
+        }
+    }
+}
+
+// --- Render impls ---
+
+impl<O: Render> Render for Step<O> {
+    fn render(&self) -> Vec<Block> {
+        let mut blocks = vec![];
+
+        // Render all substeps first
+        for substep in &self.substeps {
+            blocks.extend(substep.render());
+        }
+
+        // Render own outcome — with timing injection for leaf steps
+        let outcome_blocks = self.outcome.render();
+        if self.substeps.is_empty() {
+            // Leaf step: inject elapsed_ms into the first Block::Line
+            if let Some(elapsed) = self.outcome.elapsed_ms() {
+                let mut injected = false;
+                for block in outcome_blocks {
+                    if !injected {
+                        if let Block::Line(text) = block {
+                            blocks.push(Block::Line(format!("{text} ({elapsed}ms)")));
+                            injected = true;
+                            continue;
+                        }
+                    }
+                    blocks.push(block);
+                }
+            } else {
+                blocks.extend(outcome_blocks);
+            }
+        } else {
+            // Command step: no timing injection, outcome renders as-is
+            blocks.extend(outcome_blocks);
+        }
+
+        blocks
+    }
+}
+
+impl<O: Render> Render for StepOutcome<O> {
+    fn render(&self) -> Vec<Block> {
+        match self {
+            Self::Complete {
+                result: Ok(outcome),
+                ..
+            } => outcome.render(),
+            Self::Complete { result: Err(e), .. } => {
+                vec![Block::Line(format!("Error: {}", e.message))]
+            }
+            Self::Skipped => vec![],
+        }
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn leaf_step_complete() {
+        let step: Step<String> = Step {
+            substeps: vec![],
+            outcome: StepOutcome::Complete {
+                result: Ok("scanned 5 files".to_string()),
+                elapsed_ms: 42,
+            },
+        };
+        assert_eq!(step.outcome.elapsed_ms(), Some(42));
+        assert!(step.substeps.is_empty());
+    }
+
+    #[test]
+    fn leaf_step_failed() {
+        let step: Step<String> = Step {
+            substeps: vec![],
+            outcome: StepOutcome::Complete {
+                result: Err(StepError {
+                    kind: ErrorKind::User,
+                    message: "config not found".into(),
+                }),
+                elapsed_ms: 2,
+            },
+        };
+        assert_eq!(step.outcome.elapsed_ms(), Some(2));
+        match &step.outcome {
+            StepOutcome::Complete { result: Err(e), .. } => {
+                assert_eq!(e.message, "config not found");
+            }
+            _ => panic!("expected failed step"),
+        }
+    }
+
+    #[test]
+    fn skipped_step() {
+        let step: Step<String> = Step {
+            substeps: vec![],
+            outcome: StepOutcome::Skipped,
+        };
+        assert_eq!(step.outcome.elapsed_ms(), None);
+    }
+
+    #[test]
+    fn step_tree_with_substeps() {
+        let leaf1: Step<String> = Step {
+            substeps: vec![],
+            outcome: StepOutcome::Complete {
+                result: Ok("scan".into()),
+                elapsed_ms: 10,
+            },
+        };
+        let leaf2: Step<String> = Step {
+            substeps: vec![],
+            outcome: StepOutcome::Complete {
+                result: Ok("validate".into()),
+                elapsed_ms: 5,
+            },
+        };
+        let command: Step<String> = Step {
+            substeps: vec![leaf1, leaf2],
+            outcome: StepOutcome::Complete {
+                result: Ok("build complete".into()),
+                elapsed_ms: 15,
+            },
+        };
+        assert_eq!(command.substeps.len(), 2);
+        assert_eq!(command.outcome.elapsed_ms(), Some(15));
+    }
+
+    // --- Render tests ---
+
+    /// Implement Render for String so we can test Step rendering.
+    impl Render for String {
+        fn render(&self) -> Vec<Block> {
+            vec![Block::Line(self.clone())]
+        }
+    }
+
+    #[test]
+    fn render_leaf_step_injects_timing() {
+        let step: Step<String> = Step {
+            substeps: vec![],
+            outcome: StepOutcome::Complete {
+                result: Ok("Scan: 43 files".into()),
+                elapsed_ms: 15,
+            },
+        };
+        let blocks = step.render();
+        assert_eq!(blocks.len(), 1);
+        match &blocks[0] {
+            Block::Line(s) => assert_eq!(s, "Scan: 43 files (15ms)"),
+            _ => panic!("expected Line"),
+        }
+    }
+
+    #[test]
+    fn render_command_step_no_timing() {
+        let leaf: Step<String> = Step {
+            substeps: vec![],
+            outcome: StepOutcome::Complete {
+                result: Ok("Scan: 5 files".into()),
+                elapsed_ms: 10,
+            },
+        };
+        let command: Step<String> = Step {
+            substeps: vec![leaf],
+            outcome: StepOutcome::Complete {
+                result: Ok("Built index".into()),
+                elapsed_ms: 100,
+            },
+        };
+        let blocks = command.render();
+        assert_eq!(blocks.len(), 2);
+        // First block: substep with timing
+        match &blocks[0] {
+            Block::Line(s) => assert_eq!(s, "Scan: 5 files (10ms)"),
+            _ => panic!("expected Line"),
+        }
+        // Second block: command outcome WITHOUT timing
+        match &blocks[1] {
+            Block::Line(s) => assert_eq!(s, "Built index"),
+            _ => panic!("expected Line"),
+        }
+    }
+
+    #[test]
+    fn render_skipped_step_empty() {
+        let step: Step<String> = Step {
+            substeps: vec![],
+            outcome: StepOutcome::Skipped,
+        };
+        let blocks = step.render();
+        assert!(blocks.is_empty());
+    }
+
+    #[test]
+    fn render_error_step() {
+        let step: Step<String> = Step {
+            substeps: vec![],
+            outcome: StepOutcome::Complete {
+                result: Err(StepError {
+                    kind: ErrorKind::User,
+                    message: "config not found".into(),
+                }),
+                elapsed_ms: 2,
+            },
+        };
+        let blocks = step.render();
+        assert_eq!(blocks.len(), 1);
+        match &blocks[0] {
+            Block::Line(s) => assert_eq!(s, "Error: config not found (2ms)"),
+            _ => panic!("expected Line"),
+        }
+    }
+
+    #[test]
+    fn render_empty_outcome_no_timing_crash() {
+        // An outcome that renders to empty vec — timing injection does nothing
+        struct EmptyOutcome;
+        impl Render for EmptyOutcome {
+            fn render(&self) -> Vec<Block> {
+                vec![]
+            }
+        }
+        let step: Step<EmptyOutcome> = Step {
+            substeps: vec![],
+            outcome: StepOutcome::Complete {
+                result: Ok(EmptyOutcome),
+                elapsed_ms: 5,
+            },
+        };
+        let blocks = step.render();
+        assert!(blocks.is_empty());
+    }
+
+    #[test]
+    fn step_error_is_clone() {
+        let err = StepError {
+            kind: ErrorKind::Application,
+            message: "I/O error".into(),
+        };
+        let cloned = err.clone();
+        assert_eq!(cloned.message, "I/O error");
+    }
+
+    // --- Serialize tests ---
+
+    #[test]
+    fn serialize_step_complete_ok() {
+        use crate::outcome::{CleanOutcome, Outcome};
+        use std::path::PathBuf;
+
+        let step = Step {
+            substeps: vec![],
+            outcome: StepOutcome::Complete {
+                result: Ok(Outcome::Clean(CleanOutcome {
+                    removed: true,
+                    path: PathBuf::from(".mdvs"),
+                    files_removed: 2,
+                    size_bytes: 1024,
+                })),
+                elapsed_ms: 5,
+            },
+        };
+        let json = serde_json::to_value(&step).unwrap();
+        assert_eq!(json["outcome"]["status"], "complete");
+        assert_eq!(json["outcome"]["elapsed_ms"], 5);
+        assert!(json["outcome"]["outcome"]["Clean"].is_object());
+        assert_eq!(json["outcome"]["outcome"]["Clean"]["removed"], true);
+        assert_eq!(json["substeps"].as_array().unwrap().len(), 0);
+    }
+
+    #[test]
+    fn serialize_step_complete_err() {
+        let step: Step<String> = Step {
+            substeps: vec![],
+            outcome: StepOutcome::Complete {
+                result: Err(StepError {
+                    kind: ErrorKind::User,
+                    message: "config not found".into(),
+                }),
+                elapsed_ms: 2,
+            },
+        };
+        let json = serde_json::to_value(&step).unwrap();
+        assert_eq!(json["outcome"]["status"], "failed");
+        assert_eq!(json["outcome"]["elapsed_ms"], 2);
+        assert_eq!(json["outcome"]["error"]["kind"], "user");
+        assert_eq!(json["outcome"]["error"]["message"], "config not found");
+    }
+
+    #[test]
+    fn serialize_step_skipped() {
+        let step: Step<String> = Step {
+            substeps: vec![],
+            outcome: StepOutcome::Skipped,
+        };
+        let json = serde_json::to_value(&step).unwrap();
+        assert_eq!(json["outcome"]["status"], "skipped");
+        assert!(json["outcome"].get("elapsed_ms").is_none());
+    }
+
+    #[test]
+    fn serialize_step_tree_recursive() {
+        use crate::outcome::{DeleteIndexOutcome, Outcome};
+
+        let leaf = Step {
+            substeps: vec![],
+            outcome: StepOutcome::Complete {
+                result: Ok(Outcome::DeleteIndex(DeleteIndexOutcome {
+                    removed: true,
+                    path: ".mdvs".into(),
+                    files_removed: 1,
+                    size_bytes: 512,
+                })),
+                elapsed_ms: 3,
+            },
+        };
+        let command = Step {
+            substeps: vec![leaf],
+            outcome: StepOutcome::Complete {
+                result: Ok(Outcome::Clean(crate::outcome::CleanOutcome {
+                    removed: true,
+                    path: std::path::PathBuf::from(".mdvs"),
+                    files_removed: 1,
+                    size_bytes: 512,
+                })),
+                elapsed_ms: 3,
+            },
+        };
+        let json = serde_json::to_value(&command).unwrap();
+        assert_eq!(json["substeps"].as_array().unwrap().len(), 1);
+        let sub = &json["substeps"][0];
+        assert_eq!(sub["outcome"]["status"], "complete");
+        assert!(sub["outcome"]["outcome"]["DeleteIndex"].is_object());
+    }
+
+    #[test]
+    fn serialize_compact_step() {
+        use crate::outcome::{CleanOutcome, Outcome};
+        use std::path::PathBuf;
+
+        let step = Step {
+            substeps: vec![],
+            outcome: StepOutcome::Complete {
+                result: Ok(Outcome::Clean(CleanOutcome {
+                    removed: true,
+                    path: PathBuf::from(".mdvs"),
+                    files_removed: 2,
+                    size_bytes: 1024,
+                })),
+                elapsed_ms: 5,
+            },
+        };
+        let compact = step.to_compact();
+        let json = serde_json::to_value(&compact).unwrap();
+        assert_eq!(json["outcome"]["status"], "complete");
+        assert!(json["outcome"]["outcome"]["Clean"].is_object());
+        // Compact Clean has removed + path, no files_removed/size_bytes
+        assert_eq!(json["outcome"]["outcome"]["Clean"]["removed"], true);
+    }
+}

From 1243d1c12fee818c2a1a38d6560a22ec0d366916 Mon Sep 17 00:00:00 2001
From: edoch
Date: Thu, 19 Mar 2026 23:29:31 +0100
Subject: [PATCH 04/35] refactor: convert all 7 commands to Step tree
 architecture

Rewrite all commands to return Step instead of *CommandOutput:

- clean, info, check, init, update, build, search
- main.rs dispatch uses Step rendering + to_compact()
- Delete all *ProcessOutput, *CommandOutput, *Result structs
- Delete CommandOutput trait and format_json_compact from output.rs
- Add BuildFileDetail, FieldViolationCompact, NewFieldCompact,
  DiscoveredFieldCompact, ChangedFieldCompact, RemovedFieldCompact to output.rs
- Add Clone derives on FieldViolation, ViolatingFile, NewField

Net -913 lines across commands. Implements TODOs 0125-0130.
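To make the dispatch convention above concrete: each command builds one tree, and the caller renders it depth-first (substeps before the command's own outcome, timing appended only on leaves). The following is an illustrative, self-contained miniature of that traversal, not the crate's real API — `Block` is reduced to plain `String` lines and errors carry only a message:

```rust
// Minimal sketch of the Step tree render walk described in this patch.
// Simplified stand-ins: real code uses Block/Render and StepError.

#[derive(Debug)]
enum StepOutcome<O> {
    Complete { result: Result<O, String>, elapsed_ms: u64 },
    Skipped,
}

#[derive(Debug)]
struct Step<O> {
    substeps: Vec<Step<O>>,
    outcome: StepOutcome<O>,
}

impl Step<String> {
    // Depth-first: render all substeps first, then this node's outcome.
    // Leaf steps get "(Nms)" appended; command nodes render as-is.
    fn render(&self) -> Vec<String> {
        let mut lines = Vec::new();
        for sub in &self.substeps {
            lines.extend(sub.render());
        }
        match &self.outcome {
            StepOutcome::Complete { result: Ok(text), elapsed_ms } => {
                if self.substeps.is_empty() {
                    lines.push(format!("{text} ({elapsed_ms}ms)"));
                } else {
                    lines.push(text.clone());
                }
            }
            StepOutcome::Complete { result: Err(msg), .. } => {
                lines.push(format!("Error: {msg}"));
            }
            StepOutcome::Skipped => {} // skipped steps produce no output
        }
        lines
    }
}

fn main() {
    let leaf = Step {
        substeps: vec![],
        outcome: StepOutcome::Complete { result: Ok("Scan: 5 files".to_string()), elapsed_ms: 10 },
    };
    let command = Step {
        substeps: vec![leaf],
        outcome: StepOutcome::Complete { result: Ok("Built index".to_string()), elapsed_ms: 100 },
    };
    for line in command.render() {
        println!("{line}");
    }
}
```

The depth-first order is what makes nested steps such as an auto-update appear before the parent command's summary line.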
---
 src/cmd/build.rs  | 1242 ++++++++++++++++-----------------------
 src/cmd/check.rs  |  581 +++++++++------------
 src/cmd/clean.rs  |  247 +++++----
 src/cmd/info.rs   |  550 ++++++++------------
 src/cmd/init.rs   |  519 ++++++++-----------
 src/cmd/search.rs |  617 +++++++++-------------
 src/cmd/update.rs |  736 ++++++++-------------------
 src/main.rs       |  155 +++++-
 src/output.rs     |  164 ++++--
 9 files changed, 1949 insertions(+), 2862 deletions(-)

diff --git a/src/cmd/build.rs b/src/cmd/build.rs
index b0f6245..705f90b 100644
--- a/src/cmd/build.rs
+++ b/src/cmd/build.rs
@@ -1,384 +1,29 @@
 use crate::discover::field_type::FieldType;
 use crate::index::backend::Backend;
 use crate::index::storage::{content_hash, BuildMetadata, FileRow};
-use crate::output::{format_file_count, format_json_compact, CommandOutput, NewField};
-use crate::pipeline::classify::{run_classify, ClassifyOutput};
-use crate::pipeline::embed::{run_embed_files, EmbedFilesOutput};
-use crate::pipeline::load_model::{run_load_model, LoadModelOutput};
-use crate::pipeline::read_config::{run_read_config, ReadConfigOutput};
-use crate::pipeline::scan::{run_scan, ScanOutput};
-use crate::pipeline::validate::{run_validate, ValidateOutput};
-use crate::pipeline::write_index::{run_write_index, BuildFileDetail, WriteIndexOutput};
-use crate::pipeline::{ErrorKind, ProcessingStepError, ProcessingStepResult};
+use crate::outcome::commands::BuildOutcome;
+use crate::outcome::{
+    ClassifyOutcome, EmbedFilesOutcome, LoadModelOutcome, Outcome, ReadConfigOutcome, ScanOutcome,
+    ValidateOutcome, WriteIndexOutcome,
+};
+use crate::pipeline::classify::run_classify;
+use crate::pipeline::embed::run_embed_files;
+use crate::pipeline::load_model::run_load_model;
+use crate::pipeline::read_config::run_read_config;
+use crate::pipeline::scan::run_scan;
+use crate::pipeline::validate::run_validate;
+use crate::pipeline::write_index::run_write_index;
+use crate::pipeline::ProcessingStepResult;
 use crate::schema::config::{BuildConfig, MdvsToml, SearchConfig};
 use crate::schema::shared::{ChunkingConfig, EmbeddingModelConfig};
-use crate::table::{style_compact, style_record, Builder};
-use serde::Serialize;
+use crate::step::{from_pipeline_result, ErrorKind, Step, StepError, StepOutcome};
 use std::path::Path;
+use std::time::Instant;
 use tracing::instrument;
 
 const DEFAULT_MODEL: &str = "minishlab/potion-base-8M";
 const DEFAULT_CHUNK_SIZE: usize = 1024;
 
-// ============================================================================
-// BuildResult
-// ============================================================================
-
-/// Result of the `build` command: embedding and index statistics.
-#[derive(Debug, Serialize)]
-pub struct BuildResult {
-    /// Whether this was a full rebuild (vs incremental).
-    pub full_rebuild: bool,
-    /// Total number of files in the final index.
-    pub files_total: usize,
-    /// Number of files that were chunked and embedded this run.
-    pub files_embedded: usize,
-    /// Number of files reused from the previous index (content unchanged).
-    pub files_unchanged: usize,
-    /// Number of files removed since the last build.
-    pub files_removed: usize,
-    /// Total number of chunks in the final index.
-    pub chunks_total: usize,
-    /// Number of chunks produced by newly embedded files.
-    pub chunks_embedded: usize,
-    /// Number of chunks retained from unchanged files.
-    pub chunks_unchanged: usize,
-    /// Number of chunks dropped from removed files.
-    pub chunks_removed: usize,
-    /// Fields found in frontmatter but not yet in `mdvs.toml`.
-    pub new_fields: Vec<NewField>,
-    /// Per-file chunk counts for embedded files (verbose only).
-    #[serde(skip_serializing_if = "Option::is_none")]
-    pub embedded_files: Option<Vec<BuildFileDetail>>,
-    /// Per-file chunk counts for removed files (verbose only).
-    #[serde(skip_serializing_if = "Option::is_none")]
-    pub removed_files: Option<Vec<BuildFileDetail>>,
-}
-
-fn format_chunk_count(n: usize) -> String {
-    if n == 1 {
-        "1 chunk".to_string()
-    } else {
-        format!("{n} chunks")
-    }
-}
-
-impl CommandOutput for BuildResult {
-    fn format_text(&self, verbose: bool) -> String {
-        let mut out = String::new();
-
-        // New fields (shown before stats)
-        for nf in &self.new_fields {
-            out.push_str(&format!(
-                "  new field: {} ({})\n",
-                nf.name,
-                format_file_count(nf.files_found)
-            ));
-        }
-        if !self.new_fields.is_empty() {
-            out.push_str("Run 'mdvs update' to incorporate new fields.\n\n");
-        }
-
-        // One-liner
-        let rebuild_suffix = if self.full_rebuild {
-            " (full rebuild)"
-        } else {
-            ""
-        };
-        out.push_str(&format!(
-            "Built index — {}, {}{rebuild_suffix}\n",
-            format_file_count(self.files_total),
-            format_chunk_count(self.chunks_total)
-        ));
-
-        // Stats table
-        out.push('\n');
-        if verbose {
-            // Verbose: record tables for embedded/removed, compact for unchanged
-            if self.files_embedded > 0 {
-                let mut builder = Builder::default();
-                builder.push_record([
-                    "embedded".to_string(),
-                    format_file_count(self.files_embedded),
-                    format_chunk_count(self.chunks_embedded),
-                ]);
-                let detail = match &self.embedded_files {
-                    Some(files) => {
-                        let lines: Vec<String> = files
-                            .iter()
-                            .map(|f| {
-                                format!("  - \"{}\" ({})", f.filename, format_chunk_count(f.chunks))
-                            })
-                            .collect();
-                        lines.join("\n")
-                    }
-                    None => String::new(),
-                };
-                builder.push_record([detail, String::new(), String::new()]);
-                let mut table = builder.build();
-                style_record(&mut table, 3);
-                out.push_str(&format!("{table}\n"));
-            }
-            if self.files_unchanged > 0 {
-                let mut builder = Builder::default();
-                builder.push_record([
-                    "unchanged".to_string(),
-                    format_file_count(self.files_unchanged),
-                    format_chunk_count(self.chunks_unchanged),
-                ]);
-                let mut table = builder.build();
-                style_compact(&mut table);
-                out.push_str(&format!("{table}\n"));
-            }
-            if self.files_removed > 0 {
-                let mut builder = Builder::default();
-                builder.push_record([
-                    "removed".to_string(),
-                    format_file_count(self.files_removed),
-                    format_chunk_count(self.chunks_removed),
-                ]);
-                let detail = match &self.removed_files {
-                    Some(files) => {
-                        let lines: Vec<String> = files
-                            .iter()
-                            .map(|f| {
-                                format!("  - \"{}\" ({})", f.filename, format_chunk_count(f.chunks))
-                            })
-                            .collect();
-                        lines.join("\n")
-                    }
-                    None => String::new(),
-                };
-                builder.push_record([detail, String::new(), String::new()]);
-                let mut table = builder.build();
-                style_record(&mut table, 3);
-                out.push_str(&format!("{table}\n"));
-            }
-        } else {
-            // Compact: single table with all non-zero categories
-            let mut builder = Builder::default();
-            if self.files_embedded > 0 {
-                builder.push_record([
-                    "embedded".to_string(),
-                    format_file_count(self.files_embedded),
-                    format_chunk_count(self.chunks_embedded),
-                ]);
-            }
-            if self.files_unchanged > 0 {
-                builder.push_record([
-                    "unchanged".to_string(),
-                    format_file_count(self.files_unchanged),
-                    format_chunk_count(self.chunks_unchanged),
-                ]);
-            }
-            if self.files_removed > 0 {
-                builder.push_record([
-                    "removed".to_string(),
-                    format_file_count(self.files_removed),
-                    format_chunk_count(self.chunks_removed),
-                ]);
-            }
-            let mut table = builder.build();
-            style_compact(&mut table);
-            out.push_str(&format!("{table}\n"));
-        }
-
-        out
-    }
-}
-
-// ============================================================================
-// BuildCommandOutput (pipeline)
-// ============================================================================
-
-/// Step records for each phase of the build pipeline.
-#[derive(Debug, Serialize)]
-pub struct BuildProcessOutput {
-    /// Auto-update output (if auto-update was triggered).
-    #[serde(skip_serializing_if = "Option::is_none")]
-    pub auto_update: Option<UpdateCommandOutput>,
-    /// Read and parse `mdvs.toml`.
-    pub read_config: ProcessingStepResult<ReadConfigOutput>,
-    /// Scan the project directory for markdown files.
-    pub scan: ProcessingStepResult<ScanOutput>,
-    /// Validate frontmatter against the schema.
-    pub validate: ProcessingStepResult<ValidateOutput>,
-    /// Classify files as new/edited/unchanged/removed.
-    pub classify: ProcessingStepResult<ClassifyOutput>,
-    /// Load the embedding model.
-    pub load_model: ProcessingStepResult<LoadModelOutput>,
-    /// Embed files that need embedding.
-    pub embed_files: ProcessingStepResult<EmbedFilesOutput>,
-    /// Write the index to disk.
-    pub write_index: ProcessingStepResult<WriteIndexOutput>,
-}
-
-/// Complete output of the `build` command.
-#[derive(Debug, Serialize)]
-pub struct BuildCommandOutput {
-    /// Step-by-step process records.
-    pub process: BuildProcessOutput,
-    /// Build statistics (present when build completes successfully).
-    pub result: Option<BuildResult>,
-}
-
-impl BuildCommandOutput {
-    /// Returns `true` if any step failed.
-    pub fn has_failed_step(&self) -> bool {
-        self.process
-            .auto_update
-            .as_ref()
-            .is_some_and(|u| u.has_failed_step())
-            || matches!(self.process.read_config, ProcessingStepResult::Failed(_))
-            || matches!(self.process.scan, ProcessingStepResult::Failed(_))
-            || matches!(self.process.validate, ProcessingStepResult::Failed(_))
-            || matches!(self.process.classify, ProcessingStepResult::Failed(_))
-            || matches!(self.process.load_model, ProcessingStepResult::Failed(_))
-            || matches!(self.process.embed_files, ProcessingStepResult::Failed(_))
-            || matches!(self.process.write_index, ProcessingStepResult::Failed(_))
-    }
-
-    /// Returns `true` if validation found violations (build aborted).
-    pub fn has_violations(&self) -> bool {
-        match &self.process.validate {
-            ProcessingStepResult::Completed(step) => step.output.violation_count > 0,
-            _ => false,
-        }
-    }
-}
-
-impl CommandOutput for BuildCommandOutput {
-    fn format_json(&self, verbose: bool) -> String {
-        format_json_compact(self, self.result.as_ref(), verbose)
-    }
-
-    fn format_text(&self, verbose: bool) -> String {
-        let mut preamble = String::new();
-        if let Some(ref update_output) = self.process.auto_update {
-            if verbose {
-                preamble.push_str("Auto-update:\n");
-                let update_text = update_output.format_text(true);
-                for line in update_text.lines() {
-                    preamble.push_str(&format!("  {line}\n"));
-                }
-                preamble.push('\n');
-            } else if let Some(ref ur) = update_output.result {
-                let total = ur.added.len() + ur.changed.len() + ur.removed.len();
-                if total > 0 {
-                    preamble.push_str(&format!("Auto-updated schema ({total} change(s))\n"));
-                }
-            }
-        }
-
-        let body = if self.has_violations() {
-            // Validation found violations — show step lines + violation message
-            let violation_msg = match &self.process.validate {
-                ProcessingStepResult::Completed(step) => {
-                    format!(
-                        "Build aborted — {} violation(s) found. Run `mdvs check` for details.\n",
-                        step.output.violation_count
-                    )
-                }
-                _ => "Build aborted — validation failed.\n".to_string(),
-            };
-            if verbose {
-                let mut out = String::new();
-                out.push_str(&format!(
-                    "{}\n",
-                    self.process.read_config.format_line("Read config")
-                ));
-                out.push_str(&format!("{}\n", self.process.scan.format_line("Scan")));
-                out.push_str(&format!(
-                    "{}\n",
-                    self.process.validate.format_line("Validate")
-                ));
-                out.push('\n');
-                out.push_str(&violation_msg);
-                out
-            } else {
-                violation_msg
-            }
-        } else if let Some(result) = &self.result {
-            if verbose {
-                let mut out = String::new();
-                out.push_str(&format!(
-                    "{}\n",
-                    self.process.read_config.format_line("Read config")
-                ));
-                out.push_str(&format!("{}\n", self.process.scan.format_line("Scan")));
-                out.push_str(&format!(
-                    "{}\n",
-                    self.process.validate.format_line("Validate")
-                ));
-                out.push_str(&format!(
-                    "{}\n",
-                    self.process.classify.format_line("Classify")
-                ));
-                out.push_str(&format!(
-                    "{}\n",
-                    self.process.load_model.format_line("Load model")
-                ));
-                out.push_str(&format!(
-                    "{}\n",
-                    self.process.embed_files.format_line("Embed")
-                ));
-                out.push_str(&format!(
-                    "{}\n",
-                    self.process.write_index.format_line("Write index")
-                ));
-                out.push('\n');
-                out.push_str(&result.format_text(verbose));
-                out
-            } else {
-                result.format_text(verbose)
-            }
-        } else {
-            // Pipeline didn't complete — show steps up to the failure
-            let mut out = String::new();
-            out.push_str(&format!(
-                "{}\n",
-                self.process.read_config.format_line("Read config")
-            ));
-            if !matches!(self.process.scan, ProcessingStepResult::Skipped) {
-                out.push_str(&format!("{}\n", self.process.scan.format_line("Scan")));
-            }
-            if !matches!(self.process.validate, ProcessingStepResult::Skipped) {
-                out.push_str(&format!(
-                    "{}\n",
-                    self.process.validate.format_line("Validate")
-                ));
-            }
-            if !matches!(self.process.classify, ProcessingStepResult::Skipped) {
-                out.push_str(&format!(
-                    "{}\n",
-                    self.process.classify.format_line("Classify")
-                ));
-            }
-            if !matches!(self.process.load_model, ProcessingStepResult::Skipped) {
-                out.push_str(&format!(
-                    "{}\n",
-                    self.process.load_model.format_line("Load model")
-                ));
-            }
-            if !matches!(self.process.embed_files, ProcessingStepResult::Skipped) {
-                out.push_str(&format!(
-                    "{}\n",
-                    self.process.embed_files.format_line("Embed")
-                ));
-            }
-            if !matches!(self.process.write_index, ProcessingStepResult::Skipped) {
-                out.push_str(&format!(
-                    "{}\n",
-                    self.process.write_index.format_line("Write index")
-                ));
-            }
-            out
-        };
-
-        format!("{preamble}{body}")
-    }
-}
-
 // ============================================================================
 // run()
 // ============================================================================
@@ -392,83 +37,57 @@ pub async fn run(
     set_chunk_size: Option<usize>,
     force: bool,
     no_update: bool,
-    verbose: bool,
-) -> BuildCommandOutput {
-    // 1. read_config
-    let (read_config_step, config) = run_read_config(path);
+    _verbose: bool,
+) -> Step<Outcome> {
+    let start = Instant::now();
+    let mut substeps = Vec::new();
+
+    // 1. Read config
+    let (read_config_result, config) = run_read_config(path);
+    substeps.push(from_pipeline_result(read_config_result, |o| {
+        Outcome::ReadConfig(ReadConfigOutcome {
+            config_path: o.config_path.clone(),
+        })
+    }));
     let config = match config {
         Some(c) => c,
-        None => {
-            return BuildCommandOutput {
-                process: BuildProcessOutput {
-                    auto_update: None,
-                    read_config: read_config_step,
-                    scan: ProcessingStepResult::Skipped,
-                    validate: ProcessingStepResult::Skipped,
-                    classify: ProcessingStepResult::Skipped,
-                    load_model: ProcessingStepResult::Skipped,
-                    embed_files: ProcessingStepResult::Skipped,
-                    write_index: ProcessingStepResult::Skipped,
-                },
-                result: None,
-            };
-        }
+        None => return fail_from_last(&mut substeps, start, 7),
     };
 
-    // Auto-update: run update before building if configured
-    let auto_update = {
-        let should_update = !no_update && config.build.as_ref().is_some_and(|b| b.auto_update);
-        if should_update {
-            let update_output = crate::cmd::update::run(path, &[], false, false, verbose).await;
-            if update_output.has_failed_step() {
-                return BuildCommandOutput {
-                    process: BuildProcessOutput {
-                        auto_update: Some(update_output),
-                        read_config: read_config_step,
-                        scan: ProcessingStepResult::Skipped,
-                        validate: ProcessingStepResult::Skipped,
-                        classify: ProcessingStepResult::Skipped,
-                        load_model: ProcessingStepResult::Skipped,
-                        embed_files: ProcessingStepResult::Skipped,
-                        write_index: ProcessingStepResult::Skipped,
-                    },
-                    result: None,
-                };
-            }
-            Some(update_output)
-        } else {
-            None
+    // 2. Auto-update (conditional)
+    let should_update = !no_update && config.build.as_ref().is_some_and(|b| b.auto_update);
+    if should_update {
+        let update_step = crate::cmd::update::run(path, &[], false, false, false).await;
+        if update_step.has_failed_step() {
+            substeps.push(update_step);
+            return fail_msg(
+                &mut substeps,
+                start,
+                ErrorKind::User,
+                "auto-update failed",
+                6,
+            );
        }
-    };
+        substeps.push(update_step);
+    }
 
-    // Re-read config if auto-update ran (it may have changed the toml)
-    let (read_config_step, config) = if auto_update.is_some() {
-        let (step, cfg) = run_read_config(path);
+    // Re-read config if auto-update ran
+    let mut config = if should_update {
+        let (re_read, cfg) = run_read_config(path);
+        substeps.push(from_pipeline_result(re_read, |o| {
+            Outcome::ReadConfig(ReadConfigOutcome {
+                config_path: o.config_path.clone(),
+            })
+        }));
        match cfg {
-            Some(c) => (step, c),
-            None => {
-                return BuildCommandOutput {
-                    process: BuildProcessOutput {
-                        auto_update,
-                        read_config: step,
-                        scan: ProcessingStepResult::Skipped,
-                        validate: ProcessingStepResult::Skipped,
-                        classify: ProcessingStepResult::Skipped,
-                        load_model: ProcessingStepResult::Skipped,
-                        embed_files: ProcessingStepResult::Skipped,
-                        write_index: ProcessingStepResult::Skipped,
-                    },
-                    result: None,
-                };
-            }
+            Some(c) => c,
+            None => return fail_from_last(&mut substeps, start, 6),
        }
    } else {
-        (read_config_step, config)
+        config
    };
-    let mut config = config;
 
-    // Config mutation (inline, not a step)
-    // Fill missing build sections, apply --set-* flags
+    // Mutate config (inline, not a step)
     let mutation_error = mutate_config(
         &mut config,
         path,
@@ -478,91 +97,111 @@
         force,
     );
 
-    // 2. scan
-    let (scan_step, scanned) = if let Some(msg) = mutation_error {
-        (
-            ProcessingStepResult::Failed(ProcessingStepError {
-                kind: ErrorKind::User,
-                message: msg,
-            }),
-            None,
-        )
-    } else {
-        run_scan(path, &config.scan)
-    };
+    // 3. Scan (mutation errors land here)
+    if let Some(msg) = mutation_error {
+        substeps.push(Step {
+            substeps: vec![],
+            outcome: StepOutcome::Complete {
+                result: Err(StepError {
+                    kind: ErrorKind::User,
+                    message: msg,
+                }),
+                elapsed_ms: 0,
+            },
+        });
+        return fail_from_last(&mut substeps, start, 5);
+    }
+    let (scan_result, scanned) = run_scan(path, &config.scan);
+    substeps.push(from_pipeline_result(scan_result, |o| {
+        Outcome::Scan(ScanOutcome {
+            files_found: o.files_found,
+            glob: o.glob.clone(),
+        })
+    }));
     let scanned = match scanned {
         Some(s) => s,
-        None => {
-            return BuildCommandOutput {
-                process: BuildProcessOutput {
-                    auto_update: None,
-                    read_config: read_config_step,
-                    scan: scan_step,
-                    validate: ProcessingStepResult::Skipped,
-                    classify: ProcessingStepResult::Skipped,
-                    load_model: ProcessingStepResult::Skipped,
-                    embed_files: ProcessingStepResult::Skipped,
-                    write_index: ProcessingStepResult::Skipped,
-                },
-                result: None,
-            };
-        }
+        None => return fail_from_last(&mut substeps, start, 5),
     };
 
-    // 3. validate (no-files check lands here as pre-check)
-    let (validate_step, validation_data) = if scanned.files.is_empty() {
-        (
-            ProcessingStepResult::Failed(ProcessingStepError {
-                kind: ErrorKind::User,
-                message: format!("no markdown files found in '{}'", path.display()),
-            }),
-            None,
-        )
-    } else {
-        run_validate(&scanned, &config, false)
-    };
+    // 4. Validate
+    if scanned.files.is_empty() {
+        substeps.push(Step {
+            substeps: vec![],
+            outcome: StepOutcome::Complete {
+                result: Err(StepError {
+                    kind: ErrorKind::User,
+                    message: format!("no markdown files found in '{}'", path.display()),
+                }),
+                elapsed_ms: 0,
+            },
+        });
+        return fail_from_last(&mut substeps, start, 4);
+    }
 
-    let validation_data = match validation_data {
-        Some(d) => d,
+    let (validate_result, validation_data) = run_validate(&scanned, &config, false);
+    let (violations, new_fields) = match validation_data {
+        Some((v, nf)) => (v, nf),
        None => {
-            return BuildCommandOutput {
-                process: BuildProcessOutput {
-                    auto_update: None,
-                    read_config: read_config_step,
-                    scan: scan_step,
-                    validate: validate_step,
-                    classify: ProcessingStepResult::Skipped,
-                    load_model: ProcessingStepResult::Skipped,
-                    embed_files: ProcessingStepResult::Skipped,
-                    write_index: ProcessingStepResult::Skipped,
-                },
-                result: None,
-            };
+            substeps.push(from_pipeline_result(validate_result, |o| {
+                Outcome::Validate(ValidateOutcome {
+                    files_checked: o.files_checked,
+                    violations: vec![],
+                    new_fields: vec![],
+                })
+            }));
+            return fail_from_last(&mut substeps, start, 4);
        }
    };
 
-    let (violations, new_fields) = validation_data;
-    let has_violations = !violations.is_empty();
-
-    // If violations found, abort — remaining steps skipped
-    if has_violations {
-        return BuildCommandOutput {
-            process: BuildProcessOutput {
-                auto_update: None,
-                read_config: read_config_step,
-                scan: scan_step,
-                validate: validate_step,
-                classify: ProcessingStepResult::Skipped,
-                load_model: ProcessingStepResult::Skipped,
-                embed_files: ProcessingStepResult::Skipped,
-                write_index: ProcessingStepResult::Skipped,
+    // Build validate substep with actual violation/new_fields data
+    let validate_step = Step {
+        substeps: vec![],
+        outcome: match validate_result {
+            ProcessingStepResult::Completed(step) => StepOutcome::Complete {
+                result: Ok(Outcome::Validate(ValidateOutcome {
+                    files_checked: step.output.files_checked,
+                    violations: violations.clone(),
+                    new_fields: new_fields.clone(),
+                })),
+                elapsed_ms: step.elapsed_ms,
+            },
+            ProcessingStepResult::Failed(err) => StepOutcome::Complete {
+                result: Err(StepError {
+                    kind: crate::step::convert_error_kind(err.kind),
+                    message: err.message,
+                }),
+                elapsed_ms: 0,
+            },
+            ProcessingStepResult::Skipped => StepOutcome::Skipped,
+        },
+    };
+    substeps.push(validate_step);
+
+    // Abort on violations
+    if !violations.is_empty() {
+        for _ in 0..4 {
+            substeps.push(Step {
+                substeps: vec![],
+                outcome: StepOutcome::Skipped,
+            });
+        }
+        return Step {
+            substeps,
+            outcome: StepOutcome::Complete {
+                result: Err(StepError {
+                    kind: ErrorKind::User,
+                    message: format!(
+                        "{} violation(s) found. Run `mdvs check` for details.",
+                        violations.len()
+                    ),
+                }),
+                elapsed_ms: start.elapsed().as_millis() as u64,
            },
-            result: None,
        };
    }
 
-    // Convert schema fields
+    // Pre-checks for classify
     let schema_fields: Vec<(String, FieldType)> = match config
         .fields
         .field
@@ -576,168 +215,144 @@
     {
         Ok(sf) => sf,
         Err(msg) => {
-            return BuildCommandOutput {
-                process: BuildProcessOutput {
-                    auto_update: None,
-                    read_config: read_config_step,
-                    scan: scan_step,
-                    validate: validate_step,
-                    classify: ProcessingStepResult::Failed(ProcessingStepError {
+            substeps.push(Step {
+                substeps: vec![],
+                outcome: StepOutcome::Complete {
+                    result: Err(StepError {
                         kind: ErrorKind::Application,
                         message: msg,
                     }),
-                    load_model: ProcessingStepResult::Skipped,
-                    embed_files: ProcessingStepResult::Skipped,
-                    write_index: ProcessingStepResult::Skipped,
+                    elapsed_ms: 0,
                },
-                result: None,
-            };
+            });
+            return fail_from_last(&mut substeps, start, 3);
        }
    };
 
     let embedding = match config.embedding_model.as_ref() {
         Some(e) => e,
         None => {
-            return BuildCommandOutput {
-                process: BuildProcessOutput {
-                    auto_update: None,
-                    read_config: read_config_step,
-                    scan: scan_step,
-                    validate: validate_step,
-                    classify: ProcessingStepResult::Failed(ProcessingStepError {
+            substeps.push(Step {
+                substeps: vec![],
+                outcome: StepOutcome::Complete {
+                    result: Err(StepError {
                         kind: ErrorKind::User,
-                        message: "missing [embedding_model] in mdvs.toml".to_string(),
+                        message: "missing [embedding_model] in mdvs.toml".into(),
                     }),
-                    load_model: ProcessingStepResult::Skipped,
-                    embed_files: ProcessingStepResult::Skipped,
-                    write_index: ProcessingStepResult::Skipped,
+                    elapsed_ms: 0,
                },
-                result: None,
-            };
+            });
+            return fail_from_last(&mut substeps, start, 3);
        }
    };
 
     let chunking = match config.chunking.as_ref() {
         Some(c) => c,
         None => {
-            return BuildCommandOutput {
-                process: BuildProcessOutput {
-                    auto_update: None,
-                    read_config: read_config_step,
-                    scan: scan_step,
-                    validate: validate_step,
-                    classify: ProcessingStepResult::Failed(ProcessingStepError {
+            substeps.push(Step {
+                substeps: vec![],
+                outcome: StepOutcome::Complete {
+                    result: Err(StepError {
                         kind: ErrorKind::User,
-                        message: "missing [chunking] in mdvs.toml".to_string(),
+                        message: "missing [chunking] in mdvs.toml".into(),
                     }),
-                    load_model: ProcessingStepResult::Skipped,
-                    embed_files: ProcessingStepResult::Skipped,
-                    write_index: ProcessingStepResult::Skipped,
+                    elapsed_ms: 0,
                },
-                result: None,
-            };
+            });
+            return fail_from_last(&mut substeps, start, 3);
        }
    };
 
     let backend = Backend::parquet(path);
-
-    // Config change detection (pre-check for classify step)
     let config_change_error = detect_config_changes(&backend, embedding, chunking, &config, force);
 
-    // 4. classify
+    // 5.
Classify let full_rebuild = force || !backend.exists(); - let (classify_step, classify_data) = if let Some(msg) = config_change_error { - ( - ProcessingStepResult::Failed(ProcessingStepError { - kind: ErrorKind::User, - message: msg, - }), - None, - ) + if let Some(msg) = config_change_error { + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Complete { + result: Err(StepError { + kind: ErrorKind::User, + message: msg, + }), + elapsed_ms: 0, + }, + }); + return fail_from_last(&mut substeps, start, 3); + } + + let existing_index = if full_rebuild { + vec![] } else { - // Read existing index for classification - let existing_index = if full_rebuild { - vec![] - } else { - match backend.read_file_index() { - Ok(idx) => idx, - Err(e) => { - return BuildCommandOutput { - process: BuildProcessOutput { - auto_update: None, - read_config: read_config_step, - scan: scan_step, - validate: validate_step, - classify: ProcessingStepResult::Failed(ProcessingStepError { - kind: ErrorKind::Application, - message: e.to_string(), - }), - load_model: ProcessingStepResult::Skipped, - embed_files: ProcessingStepResult::Skipped, - write_index: ProcessingStepResult::Skipped, - }, - result: None, - }; - } + match backend.read_file_index() { + Ok(idx) => idx, + Err(e) => { + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Complete { + result: Err(StepError { + kind: ErrorKind::Application, + message: e.to_string(), + }), + elapsed_ms: 0, + }, + }); + return fail_from_last(&mut substeps, start, 3); } - }; - let existing_chunks = if full_rebuild { - vec![] - } else { - match backend.read_chunk_rows() { - Ok(crs) => crs, - Err(e) => { - return BuildCommandOutput { - process: BuildProcessOutput { - auto_update: None, - read_config: read_config_step, - scan: scan_step, - validate: validate_step, - classify: ProcessingStepResult::Failed(ProcessingStepError { - kind: ErrorKind::Application, - message: e.to_string(), - }), - load_model: 
ProcessingStepResult::Skipped, - embed_files: ProcessingStepResult::Skipped, - write_index: ProcessingStepResult::Skipped, - }, - result: None, - }; - } + } + }; + let existing_chunks = if full_rebuild { + vec![] + } else { + match backend.read_chunk_rows() { + Ok(crs) => crs, + Err(e) => { + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Complete { + result: Err(StepError { + kind: ErrorKind::Application, + message: e.to_string(), + }), + elapsed_ms: 0, + }, + }); + return fail_from_last(&mut substeps, start, 3); } - }; - run_classify(&scanned, &existing_index, existing_chunks, full_rebuild) + } }; + let (classify_result, classify_data) = + run_classify(&scanned, &existing_index, existing_chunks, full_rebuild); + substeps.push(from_pipeline_result(classify_result, |o| { + Outcome::Classify(ClassifyOutcome { + full_rebuild: o.full_rebuild, + needs_embedding: o.needs_embedding, + unchanged: o.unchanged, + removed: o.removed, + }) + })); let classify_data = match classify_data { Some(d) => d, - None => { - return BuildCommandOutput { - process: BuildProcessOutput { - auto_update: None, - read_config: read_config_step, - scan: scan_step, - validate: validate_step, - classify: classify_step, - load_model: ProcessingStepResult::Skipped, - embed_files: ProcessingStepResult::Skipped, - write_index: ProcessingStepResult::Skipped, - }, - result: None, - }; - } + None => return fail_from_last(&mut substeps, start, 3), }; let needs_embedding = !classify_data.needs_embedding.is_empty(); - // 5. load_model (skip if nothing to embed) - let (load_model_step, embedder) = if needs_embedding { + // 6. 
Load model + let (load_model_result, embedder) = if needs_embedding { run_load_model(embedding) } else { (ProcessingStepResult::Skipped, None) }; + substeps.push(from_pipeline_result(load_model_result, |o| { + Outcome::LoadModel(LoadModelOutcome { + model_name: o.model_name.clone(), + dimension: o.dimension, + }) + })); - // Dimension check (pre-check for embed_files) - // Skip on full rebuild — old index is being discarded entirely. + // Dimension check let dim_error = if full_rebuild { None } else { @@ -746,9 +361,7 @@ pub async fn run( Ok(Some(existing_dim)) => { let model_dim = emb.dimension() as i32; if existing_dim != model_dim { - Some(format!( - "dimension mismatch: model produces {model_dim}-dim embeddings but existing index has {existing_dim}-dim" - )) + Some(format!("dimension mismatch: model produces {model_dim}-dim embeddings but existing index has {existing_dim}-dim")) } else { None } @@ -756,93 +369,74 @@ pub async fn run( Ok(None) => None, Err(e) => Some(e.to_string()), }, - None if needs_embedding => { - // load_model failed — embed_files will be skipped via embedder check below - None - } + None if needs_embedding => None, None => None, } }; - // If load_model failed, skip embed_files and write_index if needs_embedding && embedder.is_none() { - return BuildCommandOutput { - process: BuildProcessOutput { - auto_update: None, - read_config: read_config_step, - scan: scan_step, - validate: validate_step, - classify: classify_step, - load_model: load_model_step, - embed_files: ProcessingStepResult::Skipped, - write_index: ProcessingStepResult::Skipped, - }, - result: None, - }; + for _ in 0..2 { + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Skipped, + }); + } + return fail_msg( + &mut substeps, + start, + ErrorKind::Application, + "model loading failed", + 0, + ); } - // 6. embed_files + // 7. 
Embed files let max_chunk_size = chunking.max_chunk_size; let built_at = chrono::Utc::now().timestamp_micros(); - let (embed_files_step, embed_data) = if let Some(msg) = dim_error { - ( - ProcessingStepResult::Failed(ProcessingStepError { - kind: ErrorKind::User, - message: msg, - }), - None, - ) - } else if needs_embedding { - // Safe: we returned early above if needs_embedding && embedder.is_none() - let emb = match embedder.as_ref() { - Some(e) => e, - None => { - return BuildCommandOutput { - process: BuildProcessOutput { - auto_update: None, - read_config: read_config_step, - scan: scan_step, - validate: validate_step, - classify: classify_step, - load_model: load_model_step, - embed_files: ProcessingStepResult::Failed(ProcessingStepError { - kind: ErrorKind::Application, - message: "embedder not loaded".to_string(), - }), - write_index: ProcessingStepResult::Skipped, - }, - result: None, - }; - } - }; + if let Some(msg) = dim_error { + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Complete { + result: Err(StepError { + kind: ErrorKind::User, + message: msg, + }), + elapsed_ms: 0, + }, + }); + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Skipped, + }); // write_index + return fail_from_last_skip(&mut substeps, start, 0); + } + + let (embed_result, embed_data) = if needs_embedding { + let emb = embedder.as_ref().unwrap(); run_embed_files(&classify_data.needs_embedding, emb, max_chunk_size).await } else { (ProcessingStepResult::Skipped, None) }; + substeps.push(from_pipeline_result(embed_result, |o| { + Outcome::EmbedFiles(EmbedFilesOutcome { + files_embedded: o.files_embedded, + chunks_produced: o.chunks_produced, + }) + })); - // If embed_files failed, skip write_index if needs_embedding && embed_data.is_none() - && !matches!(embed_files_step, ProcessingStepResult::Skipped) + && !matches!(substeps.last().unwrap().outcome, StepOutcome::Skipped) { - return BuildCommandOutput { - process: BuildProcessOutput { - auto_update: 
None, - read_config: read_config_step, - scan: scan_step, - validate: validate_step, - classify: classify_step, - load_model: load_model_step, - embed_files: embed_files_step, - write_index: ProcessingStepResult::Skipped, - }, - result: None, - }; + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Skipped, + }); // write_index + return fail_from_last_skip(&mut substeps, start, 0); } - // 7. write_index — assemble file_rows + chunk_rows from classify_data + embed_data - // File rows: always built fresh from scanned files with file_ids from classify + // 8. Write index let file_rows: Vec = scanned .files .iter() @@ -859,10 +453,8 @@ pub async fn run( }) .collect(); - // Chunk rows: retained chunks + newly embedded chunks let mut chunk_rows = classify_data.retained_chunks; let mut embedded_details = Vec::new(); - if let Some(ed) = embed_data { chunk_rows.extend(ed.chunk_rows); embedded_details = ed.details; @@ -875,70 +467,127 @@ pub async fn run( built_at: chrono::Utc::now().to_rfc3339(), }; - let write_index_step = run_write_index( + let write_index_result = run_write_index( &backend, &schema_fields, &file_rows, &chunk_rows, build_meta, ); + substeps.push(from_pipeline_result(write_index_result, |o| { + Outcome::WriteIndex(WriteIndexOutcome { + files_written: o.files_written, + chunks_written: o.chunks_written, + }) + })); - if matches!(write_index_step, ProcessingStepResult::Failed(_)) { - return BuildCommandOutput { - process: BuildProcessOutput { - auto_update: None, - read_config: read_config_step, - scan: scan_step, - validate: validate_step, - classify: classify_step, - load_model: load_model_step, - embed_files: embed_files_step, - write_index: write_index_step, - }, - result: None, - }; + if crate::step::has_failed(substeps.last().unwrap()) { + return fail_from_last_skip(&mut substeps, start, 0); } - // Assemble BuildResult + // Assemble BuildOutcome let chunks_embedded: usize = embedded_details.iter().map(|d| d.chunks).sum(); let 
chunks_total = chunk_rows.len(); let chunks_unchanged = chunks_total - chunks_embedded; - let result = BuildResult { - full_rebuild: classify_data.full_rebuild, - files_total: file_rows.len(), - files_embedded: classify_data.needs_embedding.len(), - files_unchanged: file_rows.len() - classify_data.needs_embedding.len(), - files_removed: classify_data.removed_count, - chunks_total, - chunks_embedded, - chunks_unchanged, - chunks_removed: classify_data.chunks_removed, - new_fields, - embedded_files: if verbose { - Some(embedded_details) - } else { - None - }, - removed_files: if verbose && !classify_data.removed_details.is_empty() { - Some(classify_data.removed_details) - } else { - None + Step { + substeps, + outcome: StepOutcome::Complete { + result: Ok(Outcome::Build(Box::new(BuildOutcome { + full_rebuild: classify_data.full_rebuild, + files_total: file_rows.len(), + files_embedded: classify_data.needs_embedding.len(), + files_unchanged: file_rows.len() - classify_data.needs_embedding.len(), + files_removed: classify_data.removed_count, + chunks_total, + chunks_embedded, + chunks_unchanged, + chunks_removed: classify_data.chunks_removed, + new_fields, + embedded_files: embedded_details, + removed_files: classify_data.removed_details, + }))), + elapsed_ms: start.elapsed().as_millis() as u64, }, + } +} + +/// Push N Skipped substeps, extract error from last substep, return failed command Step. +fn fail_from_last( + substeps: &mut Vec<Step<Outcome>>, + start: Instant, + skipped: usize, +) -> Step<Outcome> { + let msg = match substeps.last().map(|s| &s.outcome) { + Some(StepOutcome::Complete { result: Err(e), ..
}) => e.message.clone(), + _ => "step failed".into(), + }; + for _ in 0..skipped { + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Skipped, + }); + } + Step { + substeps: std::mem::take(substeps), + outcome: StepOutcome::Complete { + result: Err(StepError { + kind: ErrorKind::Application, + message: msg, + }), + elapsed_ms: start.elapsed().as_millis() as u64, + }, + } +} - BuildCommandOutput { - process: BuildProcessOutput { - auto_update, - read_config: read_config_step, - scan: scan_step, - validate: validate_step, - classify: classify_step, - load_model: load_model_step, - embed_files: embed_files_step, - write_index: write_index_step, +/// Push N Skipped substeps, return failed command Step with a specific message. +fn fail_msg( + substeps: &mut Vec<Step<Outcome>>, + start: Instant, + kind: ErrorKind, + msg: &str, + skipped: usize, +) -> Step<Outcome> { + for _ in 0..skipped { + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Skipped, + }); + } + Step { + substeps: std::mem::take(substeps), + outcome: StepOutcome::Complete { + result: Err(StepError { + kind, + message: msg.into(), + }), + elapsed_ms: start.elapsed().as_millis() as u64, + }, + } +} + +/// Return failed command Step (no additional Skipped substeps). +fn fail_from_last_skip( + substeps: &mut Vec<Step<Outcome>>, + start: Instant, + _skipped: usize, +) -> Step<Outcome> { + let msg = match substeps.iter().rev().find_map(|s| match &s.outcome { + StepOutcome::Complete { result: Err(e), ..
} => Some(e.message.clone()), + _ => None, + }) { + Some(m) => m, + None => "step failed".into(), + }; + Step { + substeps: std::mem::take(substeps), + outcome: StepOutcome::Complete { + result: Err(StepError { + kind: ErrorKind::Application, + message: msg, + }), + elapsed_ms: start.elapsed().as_millis() as u64, }, - result: Some(result), } } @@ -1097,6 +746,23 @@ mod tests { use std::collections::{HashMap, HashSet}; use std::fs; + fn unwrap_build(step: &Step<Outcome>) -> &BuildOutcome { + match &step.outcome { + StepOutcome::Complete { + result: Ok(Outcome::Build(o)), + .. + } => o, + other => panic!("expected Ok(Build), got: {other:?}"), + } + } + + fn unwrap_error(step: &Step<Outcome>) -> &StepError { + match &step.outcome { + StepOutcome::Complete { result: Err(e), .. } => e, + other => panic!("expected Err, got: {other:?}"), + } + } + fn create_test_vault(dir: &Path) { let blog_dir = dir.join("blog"); fs::create_dir_all(&blog_dir).unwrap(); @@ -1119,10 +785,6 @@ mod tests { let tmp = tempfile::tempdir().unwrap(); let output = run(tmp.path(), None, None, None, false, true, false).await; assert!(output.has_failed_step()); - assert!(matches!( - output.process.read_config, - ProcessingStepResult::Failed(_) - )); } #[tokio::test] @@ -1147,17 +809,13 @@ assert!( !output.has_failed_step(), "first build failed: {:#?}", - output.process + output ); // Run build again (tests standalone rebuild) let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!( - !output.has_failed_step(), - "build failed: {:#?}", - output.process - ); - assert!(output.result.is_some()); + assert!(!output.has_failed_step(), "build failed: {:#?}", output); + unwrap_build(&output); // Verify Parquet files exist let files_path = tmp.path().join(".mdvs/files.parquet"); @@ -1240,11 +898,8 @@ // Build should fail with dimension mismatch when model loads let output = run(tmp.path(), None, None, None, false, true, false).await;
assert!(output.has_failed_step()); - let err = match &output.process.embed_files { - ProcessingStepResult::Failed(e) => &e.message, - other => panic!("expected embed_files Failed, got: {other:?}"), - }; - assert!(err.contains("dimension mismatch")); + let err = unwrap_error(&output); + assert!(err.message.contains("dimension mismatch")); } #[tokio::test] @@ -1288,8 +943,7 @@ assert!( !output.has_failed_step(), "expected success with --force, got failed step" ); - assert!(output.result.is_some()); - let result = output.result.unwrap(); + let result = unwrap_build(&output); assert!(result.full_rebuild); } @@ -1317,11 +972,7 @@ // Build should fill defaults and succeed let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!( - !output.has_failed_step(), - "build failed: {:#?}", - output.process - ); + assert!(!output.has_failed_step(), "build failed: {:#?}", output); // Verify sections were written let config = MdvsToml::read(&tmp.path().join("mdvs.toml")).unwrap(); @@ -1371,11 +1022,8 @@ ) .await; assert!(output.has_failed_step()); - let err = match &output.process.scan { - ProcessingStepResult::Failed(e) => &e.message, - other => panic!("expected scan Failed, got: {other:?}"), - }; - assert!(err.contains("--force")); + let err = unwrap_error(&output); + assert!(err.message.contains("--force")); } #[tokio::test] @@ -1392,7 +1040,7 @@ false, // skip_gitignore false, // verbose ); - assert!(!init_output.has_failed_step()); + assert!(!crate::step::has_failed(&init_output)); // Build the index (creates build sections) let output = run(tmp.path(), None, None, None, false, true, false).await; let output = run(tmp.path(), None, None, Some(512), false, true, false).await; assert!(output.has_failed_step()); - let err = match &output.process.scan { - ProcessingStepResult::Failed(e) => &e.message, - other => panic!("expected scan Failed,
got: {other:?}"), - }; - assert!(err.contains("--force")); + let err = unwrap_error(&output); + assert!(err.message.contains("--force")); } #[tokio::test] @@ -1421,7 +1066,7 @@ mod tests { false, // skip_gitignore false, // verbose ); - assert!(!init_output.has_failed_step()); + assert!(!crate::step::has_failed(&init_output)); // Build the index (creates build sections) let output = run(tmp.path(), None, None, None, false, true, false).await; @@ -1432,7 +1077,7 @@ mod tests { assert!( !output.has_failed_step(), "build with --force failed: {:#?}", - output.process + output ); let config = MdvsToml::read(&tmp.path().join("mdvs.toml")).unwrap(); @@ -1453,7 +1098,7 @@ mod tests { false, // skip_gitignore false, // verbose ); - assert!(!init_output.has_failed_step()); + assert!(!crate::step::has_failed(&init_output)); // Build the index (creates build sections + parquets) let output = run(tmp.path(), None, None, None, false, true, false).await; @@ -1467,19 +1112,16 @@ mod tests { // Build without --force should error let output = run(tmp.path(), None, None, None, false, true, false).await; assert!(output.has_failed_step()); - let err = match &output.process.classify { - ProcessingStepResult::Failed(e) => &e.message, - other => panic!("expected classify Failed, got: {other:?}"), - }; - assert!(err.contains("config changed since last build")); - assert!(err.contains("chunk_size")); + let err = unwrap_error(&output); + assert!(err.message.contains("config changed since last build")); + assert!(err.message.contains("chunk_size")); // Build with --force should succeed let output = run(tmp.path(), None, None, None, true, true, false).await; assert!( !output.has_failed_step(), "build with --force failed: {:#?}", - output.process + output ); } @@ -1544,7 +1186,10 @@ mod tests { config.write(&tmp.path().join("mdvs.toml")).unwrap(); let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!(output.has_violations(), "expected validation violations"); + 
assert!( + crate::step::has_violations(&output), + "expected validation violations" + ); } #[tokio::test] @@ -1616,7 +1261,10 @@ mod tests { config.write(&tmp.path().join("mdvs.toml")).unwrap(); let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!(output.has_violations(), "expected validation violations"); + assert!( + crate::step::has_violations(&output), + "expected validation violations" + ); } #[tokio::test] @@ -1675,7 +1323,7 @@ mod tests { assert!( !output.has_failed_step(), "build should succeed with new fields: {:#?}", - output.process + output ); // Verify index was created @@ -1716,7 +1364,7 @@ mod tests { false, // skip_gitignore false, // verbose ); - assert!(!init_output.has_failed_step()); + assert!(!crate::step::has_failed(&init_output)); // Build the index let output = run(tmp.path(), None, None, None, false, true, false).await; @@ -1758,7 +1406,7 @@ mod tests { false, // skip_gitignore false, // verbose ); - assert!(!init_output.has_failed_step()); + assert!(!crate::step::has_failed(&init_output)); // Build the index let output = run(tmp.path(), None, None, None, false, true, false).await; @@ -1812,7 +1460,7 @@ mod tests { false, // skip_gitignore false, // verbose ); - assert!(!init_output.has_failed_step()); + assert!(!crate::step::has_failed(&init_output)); // Build the index let output = run(tmp.path(), None, None, None, false, true, false).await; @@ -1885,7 +1533,7 @@ mod tests { false, // skip_gitignore false, // verbose ); - assert!(!init_output.has_failed_step()); + assert!(!crate::step::has_failed(&init_output)); // Build the index let output = run(tmp.path(), None, None, None, false, true, false).await; @@ -1925,7 +1573,7 @@ mod tests { false, // skip_gitignore false, // verbose ); - assert!(!init_output.has_failed_step()); + assert!(!crate::step::has_failed(&init_output)); // Build the index let output = run(tmp.path(), None, None, None, false, true, false).await; @@ -1966,7 +1614,7 @@ mod tests { false, // 
skip_gitignore false, // verbose ); - assert!(!init_output.has_failed_step()); + assert!(!crate::step::has_failed(&init_output)); // Build the index let output = run(tmp.path(), None, None, None, false, true, false).await; diff --git a/src/cmd/check.rs b/src/cmd/check.rs index 4e9bc7f..e401449 100644 --- a/src/cmd/check.rs +++ b/src/cmd/check.rs @@ -1,16 +1,11 @@ use crate::discover::field_type::FieldType; use crate::discover::scan::ScannedFiles; -use crate::output::{ - format_file_count, format_json_compact, CommandOutput, FieldViolation, NewField, ViolatingFile, - ViolationKind, -}; -use crate::pipeline::read_config::ReadConfigOutput; -use crate::pipeline::scan::ScanOutput; -use crate::pipeline::validate::ValidateOutput; -use crate::pipeline::ProcessingStepResult; +use crate::outcome::commands::CheckOutcome; +use crate::outcome::{Outcome, ReadConfigOutcome, ScanOutcome, ValidateOutcome}; +use crate::output::{FieldViolation, NewField, ViolatingFile, ViolationKind}; use crate::schema::config::MdvsToml; use crate::schema::shared::FieldTypeSerde; -use crate::table::{style_compact, style_record, Builder}; +use crate::step::{from_pipeline_result, ErrorKind, Step, StepError, StepOutcome}; use globset::Glob; use serde::Serialize; use serde_json::Value; @@ -18,7 +13,12 @@ use std::collections::{BTreeMap, HashMap, HashSet}; use std::path::{Path, PathBuf}; use tracing::{info, instrument}; -/// Result of the `check` command: validation violations and unknown fields. +// ============================================================================ +// CheckResult — kept for build compatibility during migration +// ============================================================================ + +/// Result of validation. Used by both `check` and `build` commands. +/// Kept during migration; build still references this type. #[derive(Debug, Serialize)] pub struct CheckResult { /// Number of markdown files checked. 
@@ -36,297 +36,202 @@ impl CheckResult { } } -impl CommandOutput for CheckResult { - fn format_text(&self, verbose: bool) -> String { - let mut out = String::new(); - - // One-liner - let violations_part = if self.has_violations() { - format!("{} violation(s)", self.field_violations.len()) - } else { - "no violations".to_string() - }; - let new_fields_part = if self.new_fields.is_empty() { - String::new() - } else { - format!(", {} new field(s)", self.new_fields.len()) - }; - out.push_str(&format!( - "Checked {} files — {violations_part}{new_fields_part}\n", - self.files_checked - )); - - // Violations table - if self.has_violations() { - out.push('\n'); - if verbose { - for v in &self.field_violations { - let mut builder = Builder::default(); - let kind_str = match v.kind { - ViolationKind::MissingRequired => "MissingRequired", - ViolationKind::WrongType => "WrongType", - ViolationKind::Disallowed => "Disallowed", - ViolationKind::NullNotAllowed => "NullNotAllowed", - }; - builder.push_record([ - format!("\"{}\"", v.field), - kind_str.to_string(), - format_file_count(v.files.len()), - ]); - let detail: String = v - .files - .iter() - .map(|f| match &f.detail { - Some(d) => format!(" - \"{}\" ({d})", f.path.display()), - None => format!(" - \"{}\"", f.path.display()), - }) - .collect::>() - .join("\n"); - builder.push_record([detail, String::new(), String::new()]); - let mut table = builder.build(); - style_record(&mut table, 3); - out.push_str(&format!("{table}\n")); - } - } else { - let mut builder = Builder::default(); - for v in &self.field_violations { - let kind_str = match v.kind { - ViolationKind::MissingRequired => "MissingRequired", - ViolationKind::WrongType => "WrongType", - ViolationKind::Disallowed => "Disallowed", - ViolationKind::NullNotAllowed => "NullNotAllowed", - }; - builder.push_record([ - format!("\"{}\"", v.field), - kind_str.to_string(), - format_file_count(v.files.len()), - ]); - } - let mut table = builder.build(); - style_compact(&mut 
table); - out.push_str(&format!("{table}\n")); - } - } - - // New fields table - if !self.new_fields.is_empty() { - out.push('\n'); - if verbose { - for nf in &self.new_fields { - let mut builder = Builder::default(); - builder.push_record([ - format!("\"{}\"", nf.name), - "new".to_string(), - format_file_count(nf.files_found), - ]); - let detail = match &nf.files { - Some(files) => files - .iter() - .map(|p| format!(" - \"{}\"", p.display())) - .collect::>() - .join("\n"), - None => String::new(), - }; - builder.push_record([detail, String::new(), String::new()]); - let mut table = builder.build(); - style_record(&mut table, 3); - out.push_str(&format!("{table}\n")); - } - } else { - let mut builder = Builder::default(); - for nf in &self.new_fields { - builder.push_record([ - format!("\"{}\"", nf.name), - "new".to_string(), - format_file_count(nf.files_found), - ]); - } - let mut table = builder.build(); - style_compact(&mut table); - out.push_str(&format!("{table}\n")); - } - } - - out - } -} - // ============================================================================ -// CheckCommandOutput (pipeline-based) +// run() // ============================================================================ -/// Pipeline record for the check command. -#[derive(Debug, Serialize)] -pub struct CheckProcessOutput { - /// Auto-update output (if auto-update was triggered). - #[serde(skip_serializing_if = "Option::is_none")] - pub auto_update: Option, - /// Read config step result. - pub read_config: ProcessingStepResult, - /// Scan step result. - pub scan: ProcessingStepResult, - /// Validate step result. - pub validate: ProcessingStepResult, -} - -/// Full output of the check command: pipeline steps + command result. -#[derive(Debug, Serialize)] -pub struct CheckCommandOutput { - /// Processing steps and their outcomes. - pub process: CheckProcessOutput, - /// Command result (None if pipeline didn't complete). 
- pub result: Option, -} - -impl CheckCommandOutput { - /// Returns `true` if any processing step failed. - pub fn has_failed_step(&self) -> bool { - self.process - .auto_update - .as_ref() - .is_some_and(|u| u.has_failed_step()) - || matches!(self.process.read_config, ProcessingStepResult::Failed(_)) - || matches!(self.process.scan, ProcessingStepResult::Failed(_)) - || matches!(self.process.validate, ProcessingStepResult::Failed(_)) - } -} - -impl CommandOutput for CheckCommandOutput { - fn format_json(&self, verbose: bool) -> String { - format_json_compact(self, self.result.as_ref(), verbose) - } +/// Read config, optionally auto-update, scan files, and validate frontmatter. +#[instrument(name = "check", skip_all)] +pub async fn run(path: &Path, no_update: bool, verbose: bool) -> Step { + use crate::pipeline::read_config::run_read_config; + use crate::pipeline::scan::run_scan; - fn format_text(&self, verbose: bool) -> String { - let mut out = String::new(); + let start = std::time::Instant::now(); + let mut substeps = Vec::new(); - // Render auto-update if present - if let Some(ref update_output) = self.process.auto_update { - if verbose { - out.push_str("Auto-update:\n"); - let update_text = update_output.format_text(true); - for line in update_text.lines() { - out.push_str(&format!(" {line}\n")); - } - out.push('\n'); - } else if let Some(ref ur) = update_output.result { - let total = ur.added.len() + ur.changed.len() + ur.removed.len(); - if total > 0 { - out.push_str(&format!("Auto-updated schema ({total} change(s))\n")); - } - } + // 1. 
Read config + let (read_config_result, config) = run_read_config(path); + substeps.push(from_pipeline_result(read_config_result, |o| { + Outcome::ReadConfig(ReadConfigOutcome { + config_path: o.config_path.clone(), + }) + })); + + let config = match config { + Some(c) => c, + None => { + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Skipped, + }); // scan + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Skipped, + }); // validate + let msg = match &substeps[0].outcome { + StepOutcome::Complete { result: Err(e), .. } => e.message.clone(), + _ => "failed to read config".into(), + }; + return Step { + substeps, + outcome: StepOutcome::Complete { + result: Err(StepError { + kind: ErrorKind::User, + message: msg, + }), + elapsed_ms: start.elapsed().as_millis() as u64, + }, + }; } + }; - // If pipeline completed successfully, delegate to CheckResult - if let Some(result) = &self.result { - if verbose { - out.push_str(&format!( - "{}\n", - self.process.read_config.format_line("Read config") - )); - out.push_str(&format!("{}\n", self.process.scan.format_line("Scan"))); - out.push_str(&format!( - "{}\n", - self.process.validate.format_line("Validate") - )); - out.push('\n'); - out.push_str(&result.format_text(verbose)); - out - } else { - out.push_str(&result.format_text(verbose)); - out - } - } else { - // Pipeline didn't complete — show steps up to the failure - out.push_str(&format!( - "{}\n", - self.process.read_config.format_line("Read config") - )); - if !matches!(self.process.scan, ProcessingStepResult::Skipped) { - out.push_str(&format!("{}\n", self.process.scan.format_line("Scan"))); - } - if !matches!(self.process.validate, ProcessingStepResult::Skipped) { - out.push_str(&format!( - "{}\n", - self.process.validate.format_line("Validate") - )); - } - out + // 2. 
Auto-update (calls old update::run(), does not nest as substep during migration) + let should_update = !no_update && config.check.as_ref().is_some_and(|c| c.auto_update); + if should_update { + let update_output = crate::cmd::update::run(path, &[], false, false, verbose).await; + if update_output.has_failed_step() { + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Skipped, + }); // scan + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Skipped, + }); // validate + return Step { + substeps, + outcome: StepOutcome::Complete { + result: Err(StepError { + kind: ErrorKind::User, + message: "auto-update failed".into(), + }), + elapsed_ms: start.elapsed().as_millis() as u64, + }, + }; } } -} -/// Read config, optionally auto-update, scan files, and validate frontmatter. -#[instrument(name = "check", skip_all)] -pub async fn run(path: &Path, no_update: bool, verbose: bool) -> CheckCommandOutput { - use crate::pipeline::read_config::run_read_config; - use crate::pipeline::scan::run_scan; - use crate::pipeline::validate::run_validate; - - // Read config first to check auto_update setting - let (read_config_step, config) = run_read_config(path); - - // Auto-update: run update before validating if configured - let auto_update = if let Some(ref cfg) = config { - let should_update = !no_update && cfg.check.as_ref().is_some_and(|c| c.auto_update); - if should_update { - let update_output = crate::cmd::update::run(path, &[], false, false, verbose).await; - if update_output.has_failed_step() { - return CheckCommandOutput { - process: CheckProcessOutput { - auto_update: Some(update_output), - read_config: read_config_step, - scan: ProcessingStepResult::Skipped, - validate: ProcessingStepResult::Skipped, + // Re-read config if auto-update ran + let config = if should_update { + match MdvsToml::read(&path.join("mdvs.toml")) { + Ok(c) => c, + Err(e) => { + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Skipped, + }); // scan + 
substeps.push(Step {
+                    substeps: vec![],
+                    outcome: StepOutcome::Skipped,
+                }); // validate
+                return Step {
+                    substeps,
+                    outcome: StepOutcome::Complete {
+                        result: Err(StepError {
+                            kind: ErrorKind::Application,
+                            message: format!("failed to re-read config: {e}"),
+                        }),
+                        elapsed_ms: start.elapsed().as_millis() as u64,
                     },
-                    result: None,
                 };
             }
-            Some(update_output)
-        } else {
-            None
         }
     } else {
-        None
-    };
-
-    // Re-read config if auto-update ran (it may have changed the toml)
-    let (read_config_step, config) = if auto_update.is_some() {
-        run_read_config(path)
-    } else {
-        (read_config_step, config)
+        config
     };
 
-    let (scan_step, scanned) = match &config {
-        Some(cfg) => run_scan(path, &cfg.scan),
-        None => (ProcessingStepResult::Skipped, None),
+    // 3. Scan
+    let (scan_result, scanned) = run_scan(path, &config.scan);
+    substeps.push(from_pipeline_result(scan_result, |o| {
+        Outcome::Scan(ScanOutcome {
+            files_found: o.files_found,
+            glob: o.glob.clone(),
+        })
+    }));
+
+    let scanned = match scanned {
+        Some(s) => s,
+        None => {
+            // Capture the scan error message before pushing the skipped
+            // validate step, so `last()` still points at the failed scan.
+            let msg = match &substeps.last().unwrap().outcome {
+                StepOutcome::Complete { result: Err(e), .. } => e.message.clone(),
+                _ => "scan failed".into(),
+            };
+            substeps.push(Step {
+                substeps: vec![],
+                outcome: StepOutcome::Skipped,
+            }); // validate
+            return Step {
+                substeps,
+                outcome: StepOutcome::Complete {
+                    result: Err(StepError {
+                        kind: ErrorKind::Application,
+                        message: msg,
+                    }),
+                    elapsed_ms: start.elapsed().as_millis() as u64,
+                },
+            };
+        }
     };
 
-    let (validate_step, validation_data) = match (&scanned, &config) {
-        (Some(scanned), Some(cfg)) => run_validate(scanned, cfg, verbose),
-        _ => (ProcessingStepResult::Skipped, None),
+    // 4. 
Validate + let validate_start = std::time::Instant::now(); + let check_result = match validate(&scanned, &config, verbose) { + Ok(r) => r, + Err(e) => { + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Complete { + result: Err(StepError { + kind: ErrorKind::Application, + message: e.to_string(), + }), + elapsed_ms: validate_start.elapsed().as_millis() as u64, + }, + }); + return Step { + substeps, + outcome: StepOutcome::Complete { + result: Err(StepError { + kind: ErrorKind::Application, + message: "validation failed".into(), + }), + elapsed_ms: start.elapsed().as_millis() as u64, + }, + }; + } }; - // Build CheckResult from validation data (if completed) - let result = validation_data.map(|(field_violations, new_fields)| { - let files_checked = scanned.as_ref().map_or(0, |s| s.files.len()); - CheckResult { - files_checked, - field_violations, - new_fields, - } + // Push validate substep + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Complete { + result: Ok(Outcome::Validate(ValidateOutcome { + files_checked: check_result.files_checked, + violations: check_result.field_violations.clone(), + new_fields: check_result.new_fields.clone(), + })), + elapsed_ms: validate_start.elapsed().as_millis() as u64, + }, }); - CheckCommandOutput { - process: CheckProcessOutput { - auto_update, - read_config: read_config_step, - scan: scan_step, - validate: validate_step, + // Build command outcome + Step { + substeps, + outcome: StepOutcome::Complete { + result: Ok(Outcome::Check(Box::new(CheckOutcome { + files_checked: check_result.files_checked, + violations: check_result.field_violations, + new_fields: check_result.new_fields, + }))), + elapsed_ms: start.elapsed().as_millis() as u64, }, - result, } } +// ============================================================================ +// validate() — core validation logic, reused by build +// ============================================================================ + /// Accumulator key for 
grouping violations by field, kind, and rule. #[derive(PartialEq, Eq, Hash)] struct ViolationKey { @@ -481,7 +386,6 @@ fn check_required_fields( match value { None => { - // Key absent → MissingRequired let key = ViolationKey { field: toml_field.name.clone(), kind: ViolationKind::MissingRequired, @@ -493,7 +397,6 @@ fn check_required_fields( }); } Some(v) if v.is_null() && !toml_field.nullable => { - // Key present but null on non-nullable field → NullNotAllowed let key = ViolationKey { field: toml_field.name.clone(), kind: ViolationKind::NullNotAllowed, @@ -504,9 +407,7 @@ fn check_required_fields( detail: None, }); } - _ => { - // Key present with value (or null on nullable field) → pass - } + _ => {} } } } @@ -549,11 +450,10 @@ fn collect_new_fields( fn type_matches(expected: &FieldType, value: &Value) -> bool { match (expected, value) { - // String is the top type in the widening hierarchy — accepts any value (FieldType::String, _) => true, (FieldType::Boolean, Value::Bool(_)) => true, (FieldType::Integer, Value::Number(n)) => n.is_i64() || n.is_u64(), - (FieldType::Float, Value::Number(_)) => true, // lenient: accepts integers + (FieldType::Float, Value::Number(_)) => true, (FieldType::Array(inner), Value::Array(arr)) => arr.iter().all(|v| type_matches(inner, v)), (FieldType::Object(_), Value::Object(_)) => true, _ => false, @@ -576,10 +476,21 @@ fn actual_type_name(value: &Value) -> String { #[cfg(test)] mod tests { use super::*; + use crate::outcome::commands::CheckOutcome; use crate::schema::config::{FieldsConfig, TomlField, UpdateConfig}; use crate::schema::shared::ScanConfig; use std::fs; + fn unwrap_check(step: &Step) -> &CheckOutcome { + match &step.outcome { + StepOutcome::Complete { + result: Ok(Outcome::Check(o)), + .. 
+ } => o, + other => panic!("expected Ok(Check), got: {other:?}"), + } + } + fn create_test_vault(dir: &Path) { let blog_dir = dir.join("blog"); fs::create_dir_all(&blog_dir).unwrap(); @@ -660,9 +571,10 @@ mod tests { vec![], ); - let result = run(tmp.path(), true, false).await.result.unwrap(); + let step = run(tmp.path(), true, false).await; + let result = unwrap_check(&step); - assert!(!result.has_violations()); + assert!(result.violations.is_empty()); assert!(result.new_fields.is_empty()); assert_eq!(result.files_checked, 2); } @@ -672,7 +584,6 @@ mod tests { let tmp = tempfile::tempdir().unwrap(); create_test_vault(tmp.path()); - // tags is required in blog/**, but post2 doesn't have tags write_toml( tmp.path(), vec![ @@ -691,10 +602,11 @@ mod tests { vec![], ); - let result = run(tmp.path(), true, false).await.result.unwrap(); + let step = run(tmp.path(), true, false).await; + let result = unwrap_check(&step); - assert!(result.has_violations()); - let v = &result.field_violations[0]; + assert!(!result.violations.is_empty()); + let v = &result.violations[0]; assert_eq!(v.field, "tags"); assert!(matches!(v.kind, ViolationKind::MissingRequired)); assert_eq!(v.files.len(), 1); @@ -707,7 +619,6 @@ mod tests { let blog_dir = tmp.path().join("blog"); fs::create_dir_all(&blog_dir).unwrap(); - // draft is declared as Boolean but file has string value fs::write( blog_dir.join("post1.md"), "---\ntitle: Hello\ndraft: \"yes\"\n---\n# Hello\nBody.", @@ -720,10 +631,11 @@ mod tests { vec![], ); - let result = run(tmp.path(), true, false).await.result.unwrap(); + let step = run(tmp.path(), true, false).await; + let result = unwrap_check(&step); - assert!(result.has_violations()); - let v = &result.field_violations[0]; + assert!(!result.violations.is_empty()); + let v = &result.violations[0]; assert_eq!(v.field, "draft"); assert!(matches!(v.kind, ViolationKind::WrongType)); assert_eq!(v.files[0].detail.as_deref(), Some("got String")); @@ -734,7 +646,6 @@ mod tests { let tmp = 
tempfile::tempdir().unwrap(); fs::create_dir_all(tmp.path().join("blog")).unwrap(); - // rating is declared Float, file has integer 5 — should be OK fs::write( tmp.path().join("blog/post1.md"), "---\nrating: 5\n---\n# Post\nBody.", @@ -753,9 +664,9 @@ mod tests { vec![], ); - let result = run(tmp.path(), true, false).await.result.unwrap(); - - assert!(!result.has_violations()); + let step = run(tmp.path(), true, false).await; + let result = unwrap_check(&step); + assert!(result.violations.is_empty()); } #[tokio::test] @@ -764,7 +675,6 @@ mod tests { let notes_dir = tmp.path().join("notes"); fs::create_dir_all(¬es_dir).unwrap(); - // draft is only allowed in blog/**, but appears in notes/ fs::write( notes_dir.join("idea.md"), "---\ndraft: true\n---\n# Idea\nContent.", @@ -783,10 +693,11 @@ mod tests { vec![], ); - let result = run(tmp.path(), true, false).await.result.unwrap(); + let step = run(tmp.path(), true, false).await; + let result = unwrap_check(&step); - assert!(result.has_violations()); - let v = &result.field_violations[0]; + assert!(!result.violations.is_empty()); + let v = &result.violations[0]; assert_eq!(v.field, "draft"); assert!(matches!(v.kind, ViolationKind::Disallowed)); assert_eq!(v.files[0].path.display().to_string(), "notes/idea.md"); @@ -797,12 +708,12 @@ mod tests { let tmp = tempfile::tempdir().unwrap(); create_test_vault(tmp.path()); - // Only declare title — tags and draft are new write_toml(tmp.path(), vec![string_field("title")], vec![]); - let result = run(tmp.path(), true, false).await.result.unwrap(); + let step = run(tmp.path(), true, false).await; + let result = unwrap_check(&step); - assert!(!result.has_violations()); + assert!(result.violations.is_empty()); assert_eq!(result.new_fields.len(), 2); assert!(result.new_fields.iter().any(|f| f.name == "draft")); assert!(result.new_fields.iter().any(|f| f.name == "tags")); @@ -814,7 +725,6 @@ mod tests { let blog_dir = tmp.path().join("blog"); fs::create_dir_all(&blog_dir).unwrap(); - 
// String field with boolean, integer, array, and object values fs::write( blog_dir.join("bool.md"), "---\nfield: false\n---\n# Bool\nBody.", @@ -834,9 +744,10 @@ mod tests { write_toml(tmp.path(), vec![string_field("field")], vec![]); - let result = run(tmp.path(), true, false).await.result.unwrap(); + let step = run(tmp.path(), true, false).await; + let result = unwrap_check(&step); - assert!(!result.has_violations()); + assert!(result.violations.is_empty()); assert_eq!(result.files_checked, 4); } @@ -845,14 +756,12 @@ mod tests { let tmp = tempfile::tempdir().unwrap(); fs::create_dir_all(tmp.path().join("blog")).unwrap(); - // A bare file (no frontmatter) in blog/ — should violate required fs::write( tmp.path().join("blog/bare.md"), "# No frontmatter\nJust content.", ) .unwrap(); - // title required in blog/** with include_bare_files=true let config = MdvsToml { scan: ScanConfig { glob: "**".into(), @@ -878,10 +787,9 @@ mod tests { }; config.write(&tmp.path().join("mdvs.toml")).unwrap(); - let result = run(tmp.path(), true, false).await.result.unwrap(); - - // Bare files missing required fields are violations - assert!(result.has_violations()); + let step = run(tmp.path(), true, false).await; + let result = unwrap_check(&step); + assert!(!result.violations.is_empty()); } #[tokio::test] @@ -889,17 +797,17 @@ mod tests { let tmp = tempfile::tempdir().unwrap(); create_test_vault(tmp.path()); - // draft is in the ignore list — should not be flagged as new or violating write_toml( tmp.path(), vec![string_field("title")], vec!["draft".into(), "tags".into()], ); - let result = run(tmp.path(), true, false).await.result.unwrap(); + let step = run(tmp.path(), true, false).await; + let result = unwrap_check(&step); - assert!(!result.has_violations()); - assert!(result.new_fields.is_empty()); // draft and tags are ignored, not new + assert!(result.violations.is_empty()); + assert!(result.new_fields.is_empty()); } #[tokio::test] @@ -910,14 +818,12 @@ mod tests { 
fs::create_dir_all(&blog_dir).unwrap(); fs::create_dir_all(¬es_dir).unwrap(); - // post1: draft is string "yes" (wrong type for Boolean), missing tags fs::write( blog_dir.join("post1.md"), "---\ntitle: Hello\ndraft: \"yes\"\n---\n# Post\nBody.", ) .unwrap(); - // note1: has draft (disallowed outside blog/) fs::write( notes_dir.join("note1.md"), "---\ntitle: Note\ndraft: true\n---\n# Note\nBody.", @@ -954,11 +860,11 @@ mod tests { vec![], ); - let result = run(tmp.path(), true, false).await.result.unwrap(); + let step = run(tmp.path(), true, false).await; + let result = unwrap_check(&step); - assert!(result.has_violations()); - // Should have: draft wrong type, tags missing required, draft disallowed - assert!(result.field_violations.len() >= 3); + assert!(!result.violations.is_empty()); + assert!(result.violations.len() >= 3); } #[tokio::test] @@ -972,7 +878,6 @@ mod tests { ) .unwrap(); - // status: nullable=false, required=[], allowed=["**"] write_toml( tmp.path(), vec![ @@ -988,10 +893,11 @@ mod tests { vec![], ); - let result = run(tmp.path(), true, false).await.result.unwrap(); - assert!(result.has_violations()); + let step = run(tmp.path(), true, false).await; + let result = unwrap_check(&step); + assert!(!result.violations.is_empty()); let v = result - .field_violations + .violations .iter() .find(|v| v.field == "status") .expect("expected NullNotAllowed for status"); @@ -1009,7 +915,6 @@ mod tests { ) .unwrap(); - // status: nullable=true, required=[], allowed=["**"] write_toml( tmp.path(), vec![ @@ -1025,8 +930,9 @@ mod tests { vec![], ); - let result = run(tmp.path(), true, false).await.result.unwrap(); - assert!(!result.has_violations()); + let step = run(tmp.path(), true, false).await; + let result = unwrap_check(&step); + assert!(result.violations.is_empty()); } #[tokio::test] @@ -1040,7 +946,6 @@ mod tests { ) .unwrap(); - // draft: allowed only in blog/**, but file is in notes/ write_toml( tmp.path(), vec![ @@ -1056,10 +961,11 @@ mod tests { vec![], 
); - let result = run(tmp.path(), true, false).await.result.unwrap(); - assert!(result.has_violations()); + let step = run(tmp.path(), true, false).await; + let result = unwrap_check(&step); + assert!(!result.violations.is_empty()); let v = result - .field_violations + .violations .iter() .find(|v| v.field == "draft") .expect("expected Disallowed for draft"); @@ -1077,8 +983,6 @@ mod tests { ) .unwrap(); - // draft: allowed only in blog/**, nullable=false - // Should trigger BOTH Disallowed AND NullNotAllowed write_toml( tmp.path(), vec![ @@ -1094,15 +998,16 @@ mod tests { vec![], ); - let result = run(tmp.path(), true, false).await.result.unwrap(); - assert!(result.has_violations()); + let step = run(tmp.path(), true, false).await; + let result = unwrap_check(&step); + assert!(!result.violations.is_empty()); let has_disallowed = result - .field_violations + .violations .iter() .any(|v| v.field == "draft" && matches!(v.kind, ViolationKind::Disallowed)); let has_null_not_allowed = result - .field_violations + .violations .iter() .any(|v| v.field == "draft" && matches!(v.kind, ViolationKind::NullNotAllowed)); diff --git a/src/cmd/clean.rs b/src/cmd/clean.rs index 58e5b19..a2cfa00 100644 --- a/src/cmd/clean.rs +++ b/src/cmd/clean.rs @@ -1,149 +1,173 @@ -use crate::output::{format_file_count, format_json_compact, format_size, CommandOutput}; -use crate::pipeline::delete_index::DeleteIndexOutput; -use crate::pipeline::ProcessingStepResult; -use serde::Serialize; +use crate::index::backend::Backend; +use crate::outcome::commands::CleanOutcome; +use crate::outcome::{DeleteIndexOutcome, Outcome}; +use crate::step::{ErrorKind, Step, StepError, StepOutcome}; use std::path::{Path, PathBuf}; +use std::time::Instant; use tracing::instrument; -/// Result of the `clean` command. -#[derive(Debug, Serialize)] -pub struct CleanResult { - /// Whether the `.mdvs/` directory was actually removed. - pub removed: bool, - /// Path to the `.mdvs/` directory. 
-    pub path: PathBuf,
-    /// Number of files that were in `.mdvs/` before deletion.
-    pub files_removed: usize,
-    /// Total size of `.mdvs/` in bytes before deletion.
-    pub size_bytes: u64,
-}
-
-impl CommandOutput for CleanResult {
-    fn format_text(&self, verbose: bool) -> String {
-        let mut out = String::new();
-        if self.removed {
-            out.push_str(&format!("Cleaned \"{}\"\n", self.path.display()));
-            if verbose {
-                out.push_str(&format!(
-                    "\n{} | {}\n",
-                    format_file_count(self.files_removed),
-                    format_size(self.size_bytes),
-                ));
-            }
+/// Count files and sum their sizes in a directory (recursively).
+fn walk_dir_stats(dir: &Path) -> anyhow::Result<(usize, u64)> {
+    let mut count = 0usize;
+    let mut size = 0u64;
+    for entry in std::fs::read_dir(dir)? {
+        let entry = entry?;
+        let meta = entry.metadata()?;
+        if meta.is_dir() {
+            let (c, s) = walk_dir_stats(&entry.path())?;
+            count += c;
+            size += s;
         } else {
-            out.push_str(&format!(
-                "Nothing to clean — \"{}\" does not exist\n",
-                self.path.display()
-            ));
+            count += 1;
+            size += meta.len();
         }
-        out
     }
+    Ok((count, size))
 }
 
-// ============================================================================
-// CleanCommandOutput (pipeline-based)
-// ============================================================================
-
-/// Pipeline record for the clean command.
-#[derive(Debug, Serialize)]
-pub struct CleanProcessOutput {
-    /// Delete index step result.
-    pub delete_index: ProcessingStepResult,
-}
-
-/// Full output of the clean command: pipeline steps + command result.
-#[derive(Debug, Serialize)]
-pub struct CleanCommandOutput {
-    /// Processing steps and their outcomes.
-    pub process: CleanProcessOutput,
-    /// Command result (None if pipeline didn't complete).
-    pub result: Option<CleanResult>,
-}
+/// Delete the `.mdvs/` index directory if it exists.
+#[instrument(name = "clean", skip_all)]
+pub fn run(path: &Path) -> Step {
+    let start = Instant::now();
+    let mut substeps = Vec::new();
 
-impl CleanCommandOutput {
-    /// Returns `true` if any processing step failed.
-    pub fn has_failed_step(&self) -> bool {
-        matches!(self.process.delete_index, ProcessingStepResult::Failed(_))
-    }
-}
+    // Delete index step — inlined from pipeline/delete_index.rs
+    let delete_start = Instant::now();
+    let mdvs_dir = path.join(".mdvs");
 
-impl CommandOutput for CleanCommandOutput {
-    fn format_json(&self, verbose: bool) -> String {
-        format_json_compact(self, self.result.as_ref(), verbose)
+    if mdvs_dir.is_symlink() {
+        let msg = format!(
+            "'{}' is a symlink — refusing to delete for safety",
+            mdvs_dir.display()
+        );
+        substeps.push(Step::failed(
+            ErrorKind::User,
+            msg.clone(),
+            delete_start.elapsed().as_millis() as u64,
+        ));
+        return Step {
+            substeps,
+            outcome: StepOutcome::Complete {
+                result: Err(StepError {
+                    kind: ErrorKind::User,
+                    message: msg,
+                }),
+                elapsed_ms: start.elapsed().as_millis() as u64,
+            },
+        };
     }
 
-    fn format_text(&self, verbose: bool) -> String {
-        if let Some(result) = &self.result {
-            if verbose {
-                let mut out = String::new();
-                out.push_str(&format!(
-                    "{}\n",
-                    self.process.delete_index.format_line("Delete index")
+    let (removed, path_str, files_removed, size_bytes) = if mdvs_dir.exists() {
+        let (files_removed, size_bytes) = match walk_dir_stats(&mdvs_dir) {
+            Ok(stats) => stats,
+            Err(e) => {
+                substeps.push(Step::failed(
+                    ErrorKind::Application,
+                    e.to_string(),
+                    delete_start.elapsed().as_millis() as u64,
                 ));
-                out.push('\n');
-                out.push_str(&result.format_text(verbose));
-                out
-            } else {
-                result.format_text(verbose)
+                return Step {
+                    substeps,
+                    outcome: StepOutcome::Complete {
+                        result: Err(StepError {
+                            kind: ErrorKind::Application,
+                            message: e.to_string(),
+                        }),
+                        elapsed_ms: start.elapsed().as_millis() as u64,
+
}, + }; } - } else { - // Pipeline failed — show the step error - format!( - "{}\n", - self.process.delete_index.format_line("Delete index") - ) + }; + + let backend = Backend::parquet(path); + if let Err(e) = backend.clean() { + substeps.push(Step::failed( + ErrorKind::Application, + e.to_string(), + delete_start.elapsed().as_millis() as u64, + )); + return Step { + substeps, + outcome: StepOutcome::Complete { + result: Err(StepError { + kind: ErrorKind::Application, + message: e.to_string(), + }), + elapsed_ms: start.elapsed().as_millis() as u64, + }, + }; } - } -} - -/// Delete the `.mdvs/` index directory if it exists. -#[instrument(name = "clean", skip_all)] -pub fn run(path: &Path) -> CleanCommandOutput { - use crate::pipeline::delete_index::run_delete_index; - let delete_step = run_delete_index(path); - - let result = match &delete_step { - ProcessingStepResult::Completed(step) => Some(CleanResult { - removed: step.output.removed, - path: PathBuf::from(&step.output.path), - files_removed: step.output.files_removed, - size_bytes: step.output.size_bytes, - }), - _ => None, + ( + true, + mdvs_dir.display().to_string(), + files_removed, + size_bytes, + ) + } else { + (false, mdvs_dir.display().to_string(), 0, 0) }; - CleanCommandOutput { - process: CleanProcessOutput { - delete_index: delete_step, + substeps.push(Step::leaf( + Outcome::DeleteIndex(DeleteIndexOutcome { + removed, + path: path_str.clone(), + files_removed, + size_bytes, + }), + delete_start.elapsed().as_millis() as u64, + )); + + Step { + substeps, + outcome: StepOutcome::Complete { + result: Ok(Outcome::Clean(CleanOutcome { + removed, + path: PathBuf::from(&path_str), + files_removed, + size_bytes, + })), + elapsed_ms: start.elapsed().as_millis() as u64, }, - result, } } #[cfg(test)] mod tests { use super::*; + use crate::outcome::Outcome; + use crate::step::StepOutcome; use std::fs; + fn unwrap_clean(step: &Step) -> &CleanOutcome { + match &step.outcome { + StepOutcome::Complete { + result: 
Ok(Outcome::Clean(o)), + .. + } => o, + other => panic!("expected Ok(Clean), got: {other:?}"), + } + } + #[test] fn clean_removes_mdvs_dir() { let tmp = tempfile::tempdir().unwrap(); - - // Create mdvs.toml and .mdvs/ with a dummy file fs::write(tmp.path().join("mdvs.toml"), "[scan]\nglob = \"**\"\n").unwrap(); let mdvs_dir = tmp.path().join(".mdvs"); fs::create_dir_all(&mdvs_dir).unwrap(); fs::write(mdvs_dir.join("files.parquet"), "dummy").unwrap(); - let output = run(tmp.path()); - assert!(!output.has_failed_step()); - let result = output.result.unwrap(); + let step = run(tmp.path()); + assert!(!crate::step::has_failed(&step)); + + let result = unwrap_clean(&step); assert!(result.removed); assert!(!mdvs_dir.exists()); assert_eq!(result.files_removed, 1); - assert_eq!(result.size_bytes, 5); // "dummy" is 5 bytes - // mdvs.toml should be untouched + assert_eq!(result.size_bytes, 5); assert!(tmp.path().join("mdvs.toml").exists()); } @@ -151,9 +175,10 @@ mod tests { fn clean_nothing_to_clean() { let tmp = tempfile::tempdir().unwrap(); - let output = run(tmp.path()); - assert!(!output.has_failed_step()); - let result = output.result.unwrap(); + let step = run(tmp.path()); + assert!(!crate::step::has_failed(&step)); + + let result = unwrap_clean(&step); assert!(!result.removed); assert_eq!(result.files_removed, 0); assert_eq!(result.size_bytes, 0); diff --git a/src/cmd/info.rs b/src/cmd/info.rs index 8eee063..b92e3a4 100644 --- a/src/cmd/info.rs +++ b/src/cmd/info.rs @@ -1,16 +1,18 @@ -use crate::output::{field_hints, format_hints, format_json_compact, CommandOutput, FieldHint}; -use crate::pipeline::read_config::ReadConfigOutput; -use crate::pipeline::read_index::ReadIndexOutput; -use crate::pipeline::scan::ScanOutput; -use crate::pipeline::ProcessingStepResult; -use crate::table::{style_compact, style_record, Builder}; +use crate::discover::scan::ScannedFiles; +use crate::index::backend::Backend; +use crate::outcome::commands::InfoOutcome; +use 
crate::outcome::{Outcome, ReadConfigOutcome, ReadIndexOutcome, ScanOutcome};
+use crate::output::{field_hints, FieldHint};
+use crate::schema::config::MdvsToml;
+use crate::step::{ErrorKind, Step, StepError, StepOutcome};
 use serde::Serialize;
 use serde_json::Value;
 use std::collections::HashMap;
 use std::path::Path;
+use std::time::Instant;
 use tracing::instrument;
 
-/// A field definition from `mdvs.toml`, rendered for display.
+/// A single field definition for info display.
 #[derive(Debug, Serialize)]
 pub struct InfoField {
     /// Field name.
@@ -34,7 +36,7 @@ pub struct InfoField {
     pub hints: Vec<FieldHint>,
 }
 
-/// Metadata and statistics about a built search index.
+/// Built index metadata for info display.
 #[derive(Debug, Serialize)]
 pub struct IndexInfo {
     /// Embedding model name.
@@ -56,339 +58,226 @@ pub struct IndexInfo {
     pub config_status: String,
 }
 
-/// Output of the `info` command.
-#[derive(Debug, Serialize)]
-pub struct InfoResult {
-    /// Glob pattern from `[scan]` config.
-    pub scan_glob: String,
-    /// Number of markdown files matching the scan pattern.
-    pub files_on_disk: usize,
-    /// Field definitions from `[[fields.field]]`.
-    pub fields: Vec<InfoField>,
-    /// Field names in the `[fields].ignore` list.
-    pub ignored_fields: Vec<String>,
-    /// Index info, if a built index exists.
-    pub index: Option<IndexInfo>,
-}
-
-impl CommandOutput for InfoResult {
-    fn format_text(&self, verbose: bool) -> String {
-        let mut out = String::new();
-
-        // One-liner
-        match &self.index {
-            Some(idx) => out.push_str(&format!(
-                "{} files, {} fields, {} chunks\n",
-                self.files_on_disk,
-                self.fields.len(),
-                idx.chunks,
-            )),
-            None => out.push_str(&format!(
-                "{} files, {} fields\n",
-                self.files_on_disk,
-                self.fields.len(),
-            )),
-        }
-
-        // Metadata table (only when index exists)
-        if let Some(idx) = &self.index {
-            out.push('\n');
-            let mut builder = Builder::default();
-            builder.push_record(["model:", &idx.model]);
-            if verbose {
-                let rev = idx.revision.as_deref().unwrap_or("none");
-                builder.push_record(["revision:", rev]);
-                builder.push_record(["chunk size:", &idx.chunk_size.to_string()]);
-                builder.push_record(["built:", &idx.built_at]);
+/// Read config, scan files, and read index metadata.
+#[instrument(name = "info", skip_all)]
+pub fn run(path: &Path, _verbose: bool) -> Step {
+    let start = Instant::now();
+    let mut substeps = Vec::new();
+
+    // 1. 
Read config — calls MdvsToml::read() + validate() directly
+    let config_start = Instant::now();
+    let config_path_buf = path.join("mdvs.toml");
+    let config = match MdvsToml::read(&config_path_buf) {
+        Ok(cfg) => match cfg.validate() {
+            Ok(()) => {
+                substeps.push(Step::leaf(
+                    Outcome::ReadConfig(ReadConfigOutcome {
+                        config_path: config_path_buf.display().to_string(),
+                    }),
+                    config_start.elapsed().as_millis() as u64,
+                ));
+                Some(cfg)
             }
-            builder.push_record(["config:", &idx.config_status]);
-            builder.push_record([
-                "files:",
-                &format!("{}/{}", idx.files_indexed, idx.files_on_disk),
-            ]);
-            let mut table = builder.build();
-            style_compact(&mut table);
-            out.push_str(&format!("{table}\n"));
-        }
-
-        // Fields table
-        if !self.fields.is_empty() {
-            out.push('\n');
-            if verbose {
-                for f in &self.fields {
-                    let mut builder = Builder::default();
-                    let count_str = match (f.count, f.total_files) {
-                        (Some(c), Some(t)) => format!("{c}/{t}"),
-                        _ => String::new(),
-                    };
-                    builder.push_record([
-                        format!("\"{}\"", f.name),
-                        f.field_type.clone(),
-                        count_str,
-                    ]);
-
-                    let mut detail_lines = Vec::new();
-                    if !f.required.is_empty() {
-                        detail_lines.push("  required:".to_string());
-                        for g in &f.required {
-                            detail_lines.push(format!("    - \"{g}\""));
-                        }
-                    }
-                    detail_lines.push("  allowed:".to_string());
-                    for g in &f.allowed {
-                        detail_lines.push(format!("    - \"{g}\""));
-                    }
-                    if f.nullable {
-                        detail_lines.push("  nullable: true".to_string());
-                    }
-                    if !f.hints.is_empty() {
-                        detail_lines.push(format!("  hints: {}", format_hints(&f.hints)));
-                    }
-
-                    builder.push_record([detail_lines.join("\n"), String::new(), String::new()]);
-                    let mut table = builder.build();
-                    style_record(&mut table, 3);
-                    out.push_str(&format!("{table}\n"));
-                }
-            } else {
-                let mut builder = Builder::default();
-                for f in &self.fields {
-                    let required_str = if f.required.is_empty() {
-                        String::new()
-                    } else {
-                        let globs: Vec<String> =
-                            f.required.iter().map(|g| format!("\"{g}\"")).collect();
+            Err(e) => {
+                substeps.push(Step::failed(
+                    ErrorKind::User,
+                    format!("mdvs.toml is invalid: {e} — fix the file or run 'mdvs init --force'"),
+                    config_start.elapsed().as_millis() as u64,
+                ));
+                None
            }
-                        format!("required: {}", globs.join(", "))
-                    };
-                    let allowed_str = {
-                        let globs: Vec<String> =
-                            f.allowed.iter().map(|g| format!("\"{g}\"")).collect();
-                        format!("allowed: {}", globs.join(", "))
-                    };
-                    let type_str = if f.nullable {
-                        format!("{}?", f.field_type)
-                    } else {
-                        f.field_type.clone()
-                    };
-                    let mut row = vec![
-                        format!("\"{}\"", f.name),
-                        type_str,
-                        required_str,
-                        allowed_str,
-                    ];
-                    let hints_str = format_hints(&f.hints);
-                    if !hints_str.is_empty() {
-                        row.push(hints_str);
-                    }
-                    builder.push_record(row);
-                }
-                let mut table = builder.build();
-                style_compact(&mut table);
-                out.push_str(&format!("{table}\n"));
+        },
+        Err(e) => {
+            substeps.push(Step::failed(
+                ErrorKind::User,
+                e.to_string(),
+                config_start.elapsed().as_millis() as u64,
+            ));
+            None
        }
+    };
 
-        out
-    }
-}
-
-// ============================================================================
-// InfoCommandOutput (pipeline-based)
-// ============================================================================
-
-/// Pipeline record for the info command.
-#[derive(Debug, Serialize)]
-pub struct InfoProcessOutput {
-    /// Read config step result.
-    pub read_config: ProcessingStepResult,
-    /// Scan step result.
-    pub scan: ProcessingStepResult,
-    /// Read index step result.
-    pub read_index: ProcessingStepResult,
-}
-
-/// Full output of the info command: pipeline steps + command result.
-#[derive(Debug, Serialize)]
-pub struct InfoCommandOutput {
-    /// Processing steps and their outcomes.
-    pub process: InfoProcessOutput,
-    /// Command result (None if pipeline didn't complete).
-    pub result: Option<InfoResult>,
-}
-
-impl InfoCommandOutput {
-    /// Returns `true` if any processing step failed.
- pub fn has_failed_step(&self) -> bool { - matches!(self.process.read_config, ProcessingStepResult::Failed(_)) - || matches!(self.process.scan, ProcessingStepResult::Failed(_)) - || matches!(self.process.read_index, ProcessingStepResult::Failed(_)) - } -} + let config = match config { + Some(c) => c, + None => { + substeps.push(Step::skipped()); // scan + substeps.push(Step::skipped()); // read_index + let msg = match &substeps[0].outcome { + StepOutcome::Complete { result: Err(e), .. } => e.message.clone(), + _ => "failed to read config".into(), + }; + return Step { + substeps, + outcome: StepOutcome::Complete { + result: Err(StepError { + kind: ErrorKind::User, + message: msg, + }), + elapsed_ms: start.elapsed().as_millis() as u64, + }, + }; + } + }; -impl CommandOutput for InfoCommandOutput { - fn format_json(&self, verbose: bool) -> String { - format_json_compact(self, self.result.as_ref(), verbose) - } + // 2. Scan — calls ScannedFiles::scan() directly + let scan_start = Instant::now(); + let scanned = match ScannedFiles::scan(path, &config.scan) { + Ok(s) => { + substeps.push(Step::leaf( + Outcome::Scan(ScanOutcome { + files_found: s.files.len(), + glob: config.scan.glob.clone(), + }), + scan_start.elapsed().as_millis() as u64, + )); + Some(s) + } + Err(e) => { + substeps.push(Step::failed( + ErrorKind::Application, + e.to_string(), + scan_start.elapsed().as_millis() as u64, + )); + None + } + }; - fn format_text(&self, verbose: bool) -> String { - if let Some(result) = &self.result { - if verbose { - let mut out = String::new(); - out.push_str(&format!( - "{}\n", - self.process.read_config.format_line("Read config") - )); - out.push_str(&format!("{}\n", self.process.scan.format_line("Scan"))); - out.push_str(&format!( - "{}\n", - self.process.read_index.format_line("Read index") + // 3. 
Read index — calls Backend methods directly + let index_start = Instant::now(); + let backend = Backend::parquet(path); + let index_data = if !backend.exists() { + substeps.push(Step::leaf( + Outcome::ReadIndex(ReadIndexOutcome { + exists: false, + files_indexed: 0, + chunks: 0, + }), + index_start.elapsed().as_millis() as u64, + )); + None + } else { + let build_meta = backend.read_metadata().ok().flatten(); + let idx_stats = backend.stats().ok().flatten(); + match (build_meta, idx_stats) { + (Some(metadata), Some(stats)) => { + substeps.push(Step::leaf( + Outcome::ReadIndex(ReadIndexOutcome { + exists: true, + files_indexed: stats.files_indexed, + chunks: stats.chunks, + }), + index_start.elapsed().as_millis() as u64, )); - out.push('\n'); - out.push_str(&result.format_text(verbose)); - out - } else { - result.format_text(verbose) - } - } else { - // Pipeline didn't complete — show steps up to the failure - let mut out = String::new(); - out.push_str(&format!( - "{}\n", - self.process.read_config.format_line("Read config") - )); - if !matches!(self.process.scan, ProcessingStepResult::Skipped) { - out.push_str(&format!("{}\n", self.process.scan.format_line("Scan"))); + Some((metadata, stats)) } - if !matches!(self.process.read_index, ProcessingStepResult::Skipped) { - out.push_str(&format!( - "{}\n", - self.process.read_index.format_line("Read index") + _ => { + substeps.push(Step::leaf( + Outcome::ReadIndex(ReadIndexOutcome { + exists: false, + files_indexed: 0, + chunks: 0, + }), + index_start.elapsed().as_millis() as u64, )); + None } - out } - } -} - -/// Read config, scan files, and read index metadata. 
-#[instrument(name = "info", skip_all)] -pub fn run(path: &Path, verbose: bool) -> InfoCommandOutput { - use crate::pipeline::read_config::run_read_config; - use crate::pipeline::read_index::run_read_index; - use crate::pipeline::scan::run_scan; - - let (read_config_step, config) = run_read_config(path); - - let (scan_step, scanned) = match &config { - Some(cfg) => run_scan(path, &cfg.scan), - None => (ProcessingStepResult::Skipped, None), }; - let (read_index_step, index_data) = match &config { - Some(_) => run_read_index(path), - None => (ProcessingStepResult::Skipped, None), + // Build InfoOutcome from config + scanned + index_data + let empty_files = Vec::new(); + let files = scanned.as_ref().map(|s| &s.files).unwrap_or(&empty_files); + let total_files = files.len(); + + let field_counts: HashMap<String, usize> = { + let mut counts = HashMap::new(); + for file in files { + if let Some(Value::Object(map)) = &file.data { + for key in map.keys() { + *counts.entry(key.clone()).or_insert(0) += 1; + } + } + } + counts }; - // Build InfoResult from config + scanned + index_data - let result = match (&config, &scanned) { - (Some(cfg), Some(scanned)) => { - let total_files = scanned.files.len(); - - // Count files per field (verbose only) - let field_counts: HashMap<String, usize> = if verbose { - let mut counts = HashMap::new(); - for file in &scanned.files { - if let Some(Value::Object(map)) = &file.data { - for key in map.keys() { - *counts.entry(key.clone()).or_insert(0) += 1; - } - } - } - counts + let fields: Vec<InfoField> = config + .fields + .field + .iter() + .map(|f| InfoField { + name: f.name.clone(), + field_type: f.field_type.to_string(), + allowed: f.allowed.clone(), + required: f.required.clone(), + nullable: f.nullable, + count: Some(*field_counts.get(&f.name).unwrap_or(&0)), + total_files: Some(total_files), + hints: field_hints(&f.name), + }) + .collect(); + + let index = index_data.map(|(metadata, stats)| { + let config_match = config.embedding_model.as_ref() ==
Some(&metadata.embedding_model) + && config.chunking.as_ref() == Some(&metadata.chunking); + IndexInfo { + model: metadata.embedding_model.name, + revision: metadata.embedding_model.revision, + chunk_size: metadata.chunking.max_chunk_size, + files_indexed: stats.files_indexed, + files_on_disk: total_files, + chunks: stats.chunks, + built_at: metadata.built_at, + config_status: if config_match { + "match".to_string() } else { - HashMap::new() - }; - - // Fields from toml - let fields: Vec<InfoField> = cfg - .fields - .field - .iter() - .map(|f| InfoField { - name: f.name.clone(), - field_type: f.field_type.to_string(), - allowed: f.allowed.clone(), - required: f.required.clone(), - nullable: f.nullable, - count: if verbose { - Some(*field_counts.get(&f.name).unwrap_or(&0)) - } else { - None - }, - total_files: if verbose { Some(total_files) } else { None }, - hints: field_hints(&f.name), - }) - .collect(); - - // Index info from index_data - let index = index_data.map(|data| { - let config_match = cfg.embedding_model.as_ref() - == Some(&data.metadata.embedding_model) - && cfg.chunking.as_ref() == Some(&data.metadata.chunking); - IndexInfo { - model: data.metadata.embedding_model.name, - revision: data.metadata.embedding_model.revision, - chunk_size: data.metadata.chunking.max_chunk_size, - files_indexed: data.stats.files_indexed, - files_on_disk: total_files, - chunks: data.stats.chunks, - built_at: data.metadata.built_at, - config_status: if config_match { - "match".to_string() - } else { - "changed — rebuild recommended".to_string() - }, - } - }); + "changed — rebuild recommended".to_string() + }, + } + }); - Some(InfoResult { - scan_glob: cfg.scan.glob.clone(), + Step { + substeps, + outcome: StepOutcome::Complete { + result: Ok(Outcome::Info(Box::new(InfoOutcome { + scan_glob: config.scan.glob.clone(), files_on_disk: total_files, fields, - ignored_fields: cfg.fields.ignore.clone(), + ignored_fields: config.fields.ignore.clone(), index, - }) - } - _ => None, - }; - -
InfoCommandOutput { - process: InfoProcessOutput { - read_config: read_config_step, - scan: scan_step, - read_index: read_index_step, + }))), + elapsed_ms: start.elapsed().as_millis() as u64, }, - result, } } #[cfg(test)] mod tests { use super::*; + use crate::outcome::Outcome; use crate::schema::config::{FieldsConfig, MdvsToml, SearchConfig, UpdateConfig}; use crate::schema::shared::{ChunkingConfig, EmbeddingModelConfig, FieldTypeSerde, ScanConfig}; + use crate::step::StepOutcome; use std::fs; + fn unwrap_info(step: &Step<Outcome>) -> &InfoOutcome { + match &step.outcome { + StepOutcome::Complete { + result: Ok(Outcome::Info(o)), + .. + } => o, + other => panic!("expected Ok(Info), got: {other:?}"), + } + } + fn create_test_vault(dir: &Path) { let blog_dir = dir.join("blog"); fs::create_dir_all(&blog_dir).unwrap(); - fs::write( blog_dir.join("post1.md"), "---\ntitle: Rust Programming\ntags:\n - rust\n - code\ndraft: false\n---\n# Rust Programming\nBody.", ) .unwrap(); - fs::write( blog_dir.join("post2.md"), "---\ntitle: Cooking Recipes\ndraft: true\n---\n# Cooking Recipes\nBody.", @@ -454,13 +343,8 @@ mod tests { } async fn init_and_build(dir: &Path) { - let output = crate::cmd::init::run( - dir, "**", false, false, true, false, // skip_gitignore - false, // verbose - ); - assert!(!output.has_failed_step()); - - // Build the index + let step = crate::cmd::init::run(dir, "**", false, false, true, false, false); + assert!(!crate::step::has_failed(&step)); let output = crate::cmd::build::run(dir, None, None, None, false, true, false).await; assert!(!output.has_failed_step()); } @@ -470,20 +354,13 @@ mod tests { let tmp = tempfile::tempdir().unwrap(); create_test_vault(tmp.path()); write_config(tmp.path()); - - let output = run(tmp.path(), false); - assert!(!output.has_failed_step()); - let result = output.result.unwrap(); - + let step = run(tmp.path(), false); + assert!(!crate::step::has_failed(&step)); + let result = unwrap_info(&step); assert_eq!(result.scan_glob, "**");
assert_eq!(result.files_on_disk, 2); assert_eq!(result.fields.len(), 3); assert_eq!(result.fields[0].name, "title"); - assert_eq!(result.fields[0].field_type, "String"); - assert_eq!(result.fields[1].name, "tags"); - assert_eq!(result.fields[1].field_type, "String[]"); - assert_eq!(result.fields[2].name, "draft"); - assert_eq!(result.fields[2].field_type, "Boolean"); assert_eq!(result.ignored_fields, vec!["internal_id"]); assert!(result.index.is_none()); } @@ -493,14 +370,12 @@ mod tests { let tmp = tempfile::tempdir().unwrap(); create_test_vault(tmp.path()); init_and_build(tmp.path()).await; - - let output = run(tmp.path(), false); - assert!(!output.has_failed_step()); - let result = output.result.unwrap(); - + let step = run(tmp.path(), false); + assert!(!crate::step::has_failed(&step)); + let result = unwrap_info(&step); assert_eq!(result.files_on_disk, 2); assert!(result.index.is_some()); - let idx = result.index.unwrap(); + let idx = result.index.as_ref().unwrap(); assert_eq!(idx.model, "minishlab/potion-base-8M"); assert_eq!(idx.files_indexed, 2); assert!(idx.chunks > 0); @@ -512,19 +387,17 @@ mod tests { let tmp = tempfile::tempdir().unwrap(); create_test_vault(tmp.path()); init_and_build(tmp.path()).await; - - // Change chunk_size in toml let mut config = MdvsToml::read(&tmp.path().join("mdvs.toml")).unwrap(); config.chunking.as_mut().unwrap().max_chunk_size = 512; config.write(&tmp.path().join("mdvs.toml")).unwrap(); - - let output = run(tmp.path(), false); - assert!(!output.has_failed_step()); - let result = output.result.unwrap(); - + let step = run(tmp.path(), false); + assert!(!crate::step::has_failed(&step)); + let result = unwrap_info(&step); assert!(result.index.is_some()); - let idx = result.index.unwrap(); - assert_eq!(idx.config_status, "changed — rebuild recommended"); + assert_eq!( + result.index.as_ref().unwrap().config_status, + "changed — rebuild recommended" + ); } #[test] @@ -535,7 +408,6 @@ mod tests { "---\nauthor's_note: hello\n---\n# 
Note\nBody.", ) .unwrap(); - let config = MdvsToml { scan: ScanConfig { glob: "**".into(), @@ -560,11 +432,9 @@ mod tests { search: None, }; config.write(&tmp.path().join("mdvs.toml")).unwrap(); - - let output = run(tmp.path(), false); - assert!(!output.has_failed_step()); - let result = output.result.unwrap(); - + let step = run(tmp.path(), false); + assert!(!crate::step::has_failed(&step)); + let result = unwrap_info(&step); assert_eq!(result.fields.len(), 1); assert_eq!(result.fields[0].name, "author's_note"); assert!(result.fields[0] diff --git a/src/cmd/init.rs b/src/cmd/init.rs index bc577ae..63185b3 100644 --- a/src/cmd/init.rs +++ b/src/cmd/init.rs @@ -1,216 +1,12 @@ -use crate::output::{ - format_file_count, format_hints, format_json_compact, CommandOutput, DiscoveredField, -}; -use crate::pipeline::infer::{run_infer, InferOutput}; -use crate::pipeline::scan::{run_scan, ScanOutput}; -use crate::pipeline::write_config::{run_write_config, WriteConfigOutput}; -use crate::pipeline::{ErrorKind, ProcessingStepError, ProcessingStepResult}; +use crate::outcome::commands::InitOutcome; +use crate::outcome::{InferOutcome, Outcome, ScanOutcome, WriteConfigOutcome}; +use crate::output::DiscoveredField; use crate::schema::shared::ScanConfig; -use crate::table::{style_compact, style_record, Builder}; -use serde::Serialize; -use std::path::{Path, PathBuf}; +use crate::step::{from_pipeline_result, ErrorKind, Step, StepError, StepOutcome}; +use std::path::Path; +use std::time::Instant; use tracing::{info, instrument}; -// ============================================================================ -// InitResult -// ============================================================================ - -/// Result of the `init` command: discovered fields from schema inference. -#[derive(Debug, Serialize)] -pub struct InitResult { - /// Directory where `mdvs.toml` was written. - pub path: PathBuf, - /// Number of markdown files scanned. 
- pub files_scanned: usize, - /// Fields inferred from frontmatter. - pub fields: Vec<DiscoveredField>, - /// Whether this was a dry run (no files written). - pub dry_run: bool, -} - -impl CommandOutput for InitResult { - fn format_text(&self, verbose: bool) -> String { - let mut out = String::new(); - - // One-liner - let field_summary = if self.fields.is_empty() { - "no fields found".to_string() - } else { - format!("{} field(s)", self.fields.len()) - }; - let dry_run_suffix = if self.dry_run { " (dry run)" } else { "" }; - out.push_str(&format!( - "Initialized {} — {field_summary}{dry_run_suffix}\n", - format_file_count(self.files_scanned) - )); - - if !self.fields.is_empty() { - out.push('\n'); - if verbose { - // Record tables per field - for field in &self.fields { - let mut builder = Builder::default(); - builder.push_record([ - format!("\"{}\"", field.name), - field.field_type.clone(), - format!("{}/{}", field.files_found, field.total_files), - ]); - let mut detail_lines = Vec::new(); - if let Some(ref req) = field.required { - if !req.is_empty() { - detail_lines.push(" required:".to_string()); - for g in req { - detail_lines.push(format!(" - \"{g}\"")); - } - } - } - if let Some(ref allowed) = field.allowed { - detail_lines.push(" allowed:".to_string()); - for g in allowed { - detail_lines.push(format!(" - \"{g}\"")); - } - } - if field.nullable { - detail_lines.push(" nullable: true".to_string()); - } - if !field.hints.is_empty() { - detail_lines.push(format!(" hints: {}", format_hints(&field.hints))); - } - builder.push_record([detail_lines.join("\n"), String::new(), String::new()]); - let mut table = builder.build(); - style_record(&mut table, 3); - out.push_str(&format!("{table}\n")); - } - } else { - // Compact table - let mut builder = Builder::default(); - for field in &self.fields { - let type_str = if field.nullable { - format!("{}?", field.field_type) - } else { - field.field_type.clone() - }; - let mut row = vec![ - format!("\"{}\"", field.name), - type_str,
format!("{}/{}", field.files_found, field.total_files), - ]; - let hints_str = format_hints(&field.hints); - if !hints_str.is_empty() { - row.push(hints_str); - } - builder.push_record(row); - } - let mut table = builder.build(); - style_compact(&mut table); - out.push_str(&format!("{table}\n")); - } - } - - if self.dry_run { - out.push_str("(dry run, nothing written)\n"); - } else { - out.push_str(&format!( - "\nInitialized mdvs in '{}'\n", - self.path.display() - )); - } - - out - } -} - -// ============================================================================ -// InitCommandOutput (pipeline) -// ============================================================================ - -/// Step records for each phase of the init pipeline. -#[derive(Debug, Serialize)] -pub struct InitProcessOutput { - /// Scan the project directory for markdown files. - pub scan: ProcessingStepResult, - /// Infer field types and glob patterns. - pub infer: ProcessingStepResult, - /// Write `mdvs.toml` to disk. - pub write_config: ProcessingStepResult, -} - -/// Complete output of the `init` command. -#[derive(Debug, Serialize)] -pub struct InitCommandOutput { - /// Step-by-step process records. - pub process: InitProcessOutput, - /// Init result (present when init completes successfully). - pub result: Option<InitResult>, -} - -impl InitCommandOutput { - /// Returns `true` if any step failed.
- pub fn has_failed_step(&self) -> bool { - matches!(self.process.scan, ProcessingStepResult::Failed(_)) - || matches!(self.process.infer, ProcessingStepResult::Failed(_)) - || matches!(self.process.write_config, ProcessingStepResult::Failed(_)) - } -} - -impl CommandOutput for InitCommandOutput { - fn format_json(&self, verbose: bool) -> String { - format_json_compact(self, self.result.as_ref(), verbose) - } - - fn format_text(&self, verbose: bool) -> String { - if let Some(result) = &self.result { - if verbose { - let mut out = String::new(); - out.push_str(&format!("{}\n", self.process.scan.format_line("Scan"))); - out.push_str(&format!("{}\n", self.process.infer.format_line("Infer"))); - out.push_str(&format!( - "{}\n", - self.process.write_config.format_line("Write config") - )); - out.push('\n'); - out.push_str(&result.format_text(verbose)); - out - } else { - result.format_text(verbose) - } - } else { - // Pipeline didn't complete — show steps up to the failure - let mut out = String::new(); - out.push_str(&format!("{}\n", self.process.scan.format_line("Scan"))); - if !matches!(self.process.infer, ProcessingStepResult::Skipped) { - out.push_str(&format!("{}\n", self.process.infer.format_line("Infer"))); - } - if !matches!(self.process.write_config, ProcessingStepResult::Skipped) { - out.push_str(&format!( - "{}\n", - self.process.write_config.format_line("Write config") - )); - } - out - } - } -} - -// ============================================================================ -// run() -// ============================================================================ - -/// Helper to construct a failed InitCommandOutput where failure lands on the scan step. 
-fn fail_at_scan(message: String) -> InitCommandOutput { - InitCommandOutput { - process: InitProcessOutput { - scan: ProcessingStepResult::Failed(ProcessingStepError { - kind: ErrorKind::User, - message, - }), - infer: ProcessingStepResult::Skipped, - write_config: ProcessingStepResult::Skipped, - }, - result: None, - } -} - /// Scan a directory, infer frontmatter schema, and write `mdvs.toml`. /// Schema-only — no model download, no embedding, no `.mdvs/` created. #[instrument(name = "init", skip_all)] @@ -221,22 +17,41 @@ pub fn run( dry_run: bool, ignore_bare_files: bool, skip_gitignore: bool, - verbose: bool, -) -> InitCommandOutput { + _verbose: bool, +) -> Step<Outcome> { + use crate::pipeline::infer::run_infer; + use crate::pipeline::scan::run_scan; + use crate::pipeline::write_config::run_write_config; + + let start = Instant::now(); + let mut substeps = Vec::new(); + info!(path = %path.display(), "initializing"); // Pre-checks if !path.is_dir() { - return fail_at_scan(format!("'{}' is not a directory", path.display())); + return fail_early( + substeps, + start, + ErrorKind::User, + format!("'{}' is not a directory", path.display()), + 3, // scan + infer + write_config + ); } let config_path = path.join("mdvs.toml"); let mdvs_dir = path.join(".mdvs"); if !force && (config_path.exists() || mdvs_dir.exists()) { - return fail_at_scan(format!( - "mdvs is already initialized in '{}' (use --force to reinitialize)", - path.display() - )); + return fail_early( + substeps, + start, + ErrorKind::User, + format!( + "mdvs is already initialized in '{}' (use --force to reinitialize)", + path.display() + ), + 3, + ); } // --force: delete existing artifacts @@ -249,91 +64,178 @@ pub fn run( } } - // 1.
Scan let scan_config = ScanConfig { glob: glob.to_string(), include_bare_files: !ignore_bare_files, skip_gitignore, }; - let (scan_step, scanned) = run_scan(path, &scan_config); + let (scan_result, scanned) = run_scan(path, &scan_config); + substeps.push(from_pipeline_result(scan_result, |o| { + Outcome::Scan(ScanOutcome { + files_found: o.files_found, + glob: o.glob.clone(), + }) + })); + let scanned = match scanned { Some(s) => s, None => { - return InitCommandOutput { - process: InitProcessOutput { - scan: scan_step, - infer: ProcessingStepResult::Skipped, - write_config: ProcessingStepResult::Skipped, - }, - result: None, - }; + return fail_from_last_substep(&mut substeps, start, 2); // infer + write_config } }; - // 2. infer - let (infer_step, schema) = if scanned.files.is_empty() { - ( - ProcessingStepResult::Failed(ProcessingStepError { - kind: ErrorKind::User, - message: format!("no markdown files found in '{}'", path.display()), - }), - None, - ) - } else { - run_infer(&scanned) - }; + // 2. 
+ Infer + if scanned.files.is_empty() { + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Complete { + result: Err(StepError { + kind: ErrorKind::User, + message: format!("no markdown files found in '{}'", path.display()), + }), + elapsed_ms: 0, + }, + }); + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Skipped, + }); // write_config + return Step { + substeps, + outcome: StepOutcome::Complete { + result: Err(StepError { + kind: ErrorKind::User, + message: format!("no markdown files found in '{}'", path.display()), + }), + elapsed_ms: start.elapsed().as_millis() as u64, + }, + }; + } + + let (infer_result, schema) = run_infer(&scanned); + substeps.push(from_pipeline_result(infer_result, |o| { + Outcome::Infer(InferOutcome { + fields_inferred: o.fields_inferred, + }) + })); + + let schema = match schema { Some(s) => s, None => { - return InitCommandOutput { - process: InitProcessOutput { - scan: scan_step, - infer: infer_step, - write_config: ProcessingStepResult::Skipped, - }, - result: None, - }; + return fail_from_last_substep(&mut substeps, start, 1); // write_config } }; let total_files = scanned.files.len(); info!(fields = schema.fields.len(), "schema inferred"); - // Build InitResult from scan+infer data - let init_result = InitResult { - path: path.to_path_buf(), - files_scanned: total_files, - fields: schema - .fields - .iter() - .map(|f| f.to_discovered(total_files, verbose)) - .collect(), - dry_run, - }; - - // 3. write_config (Skipped if dry_run) - let write_config_step = if dry_run { - ProcessingStepResult::Skipped + // Build fields — always with full detail (verbose=true) since the full outcome carries all data + let fields: Vec<DiscoveredField> = schema + .fields + .iter() + .map(|f| f.to_discovered(total_files, true)) + .collect(); + + // 3.
Write config (Skipped if dry_run) + if dry_run { + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Skipped, + }); } else { - let (step, _config) = run_write_config(path, &schema, scan_config); - step - }; + let (write_result, _config) = run_write_config(path, &schema, scan_config); + substeps.push(from_pipeline_result(write_result, |o| { + Outcome::WriteConfig(WriteConfigOutcome { + config_path: o.config_path.clone(), + fields_written: o.fields_written, + }) + })); + } + + Step { + substeps, + outcome: StepOutcome::Complete { + result: Ok(Outcome::Init(Box::new(InitOutcome { + path: path.to_path_buf(), + files_scanned: total_files, + fields, + dry_run, + }))), + elapsed_ms: start.elapsed().as_millis() as u64, + }, + } +} + +/// Helper: push N Skipped substeps and return a failed Step. +fn fail_early( + mut substeps: Vec<Step<Outcome>>, + start: Instant, + kind: ErrorKind, + message: String, + skipped_count: usize, +) -> Step<Outcome> { + for _ in 0..skipped_count { + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Skipped, + }); + } + Step { + substeps, + outcome: StepOutcome::Complete { + result: Err(StepError { kind, message }), + elapsed_ms: start.elapsed().as_millis() as u64, + }, + } +} - InitCommandOutput { - process: InitProcessOutput { - scan: scan_step, - infer: infer_step, - write_config: write_config_step, +/// Helper: extract error from last substep, push N Skipped, return failed Step. +fn fail_from_last_substep( + substeps: &mut Vec<Step<Outcome>>, + start: Instant, + skipped_count: usize, +) -> Step<Outcome> { + let msg = match substeps.last().map(|s| &s.outcome) { + Some(StepOutcome::Complete { result: Err(e), ..
}) => e.message.clone(), + _ => "step failed".into(), + }; + for _ in 0..skipped_count { + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Skipped, + }); + } + Step { + substeps: std::mem::take(substeps), + outcome: StepOutcome::Complete { + result: Err(StepError { + kind: ErrorKind::Application, + message: msg, + }), + elapsed_ms: start.elapsed().as_millis() as u64, + }, - result: Some(init_result), } } #[cfg(test)] mod tests { use super::*; + use crate::outcome::Outcome; + use crate::output::FieldHint; + use crate::step::StepOutcome; use std::fs; + fn unwrap_init(step: &Step<Outcome>) -> &InitOutcome { + match &step.outcome { + StepOutcome::Complete { + result: Ok(Outcome::Init(o)), + .. + } => o, + other => panic!("expected Ok(Init), got: {other:?}"), + } + } + fn create_test_vault(root: &Path) { let blog_dir = root.join("blog"); fs::create_dir_all(&blog_dir).unwrap(); @@ -354,18 +256,14 @@ mod tests { let tmp = tempfile::tempdir().unwrap(); create_test_vault(tmp.path()); - let output = run(tmp.path(), "**", false, false, false, true, false); - assert!(!output.has_failed_step()); - assert!(output.result.is_some()); + let step = run(tmp.path(), "**", false, false, false, true, false); + assert!(!crate::step::has_failed(&step)); - let result = output.result.unwrap(); + let result = unwrap_init(&step); assert_eq!(result.files_scanned, 2); assert!(!result.fields.is_empty()); assert!(!result.dry_run); - - // mdvs.toml should exist assert!(tmp.path().join("mdvs.toml").exists()); - // .mdvs/ should NOT exist (no build) assert!(!tmp.path().join(".mdvs").exists()); } @@ -374,12 +272,10 @@ mod tests { let tmp = tempfile::tempdir().unwrap(); create_test_vault(tmp.path()); - let output = run(tmp.path(), "**", false, true, false, true, false); - assert!(!output.has_failed_step()); - assert!(output.result.is_some()); - assert!(output.result.unwrap().dry_run); - - // Nothing written + let step = run(tmp.path(), "**", false, true, false, true, false); +
assert!(!crate::step::has_failed(&step)); + let result = unwrap_init(&step); + assert!(result.dry_run); assert!(!tmp.path().join("mdvs.toml").exists()); } @@ -388,13 +284,11 @@ mod tests { let tmp = tempfile::tempdir().unwrap(); create_test_vault(tmp.path()); - // First init - let output = run(tmp.path(), "**", false, false, false, true, false); - assert!(!output.has_failed_step()); + let step = run(tmp.path(), "**", false, false, false, true, false); + assert!(!crate::step::has_failed(&step)); - // Second init without --force - let output = run(tmp.path(), "**", false, false, false, true, false); - assert!(output.has_failed_step()); + let step = run(tmp.path(), "**", false, false, false, true, false); + assert!(crate::step::has_failed(&step)); } #[test] @@ -402,13 +296,11 @@ mod tests { let tmp = tempfile::tempdir().unwrap(); create_test_vault(tmp.path()); - // First init - let output = run(tmp.path(), "**", false, false, false, true, false); - assert!(!output.has_failed_step()); + let step = run(tmp.path(), "**", false, false, false, true, false); + assert!(!crate::step::has_failed(&step)); - // Second init with --force - let output = run(tmp.path(), "**", true, false, false, true, false); - assert!(!output.has_failed_step()); + let step = run(tmp.path(), "**", true, false, false, true, false); + assert!(!crate::step::has_failed(&step)); } #[test] @@ -416,17 +308,14 @@ mod tests { let tmp = tempfile::tempdir().unwrap(); create_test_vault(tmp.path()); - // Create a fake .mdvs/ directory fs::create_dir_all(tmp.path().join(".mdvs")).unwrap(); fs::write(tmp.path().join(".mdvs/files.parquet"), "fake").unwrap(); - // Init without --force should fail (mdvs already initialized) - let output = run(tmp.path(), "**", false, false, false, true, false); - assert!(output.has_failed_step()); + let step = run(tmp.path(), "**", false, false, false, true, false); + assert!(crate::step::has_failed(&step)); - // Init with --force should succeed and clean .mdvs/ - let output = 
run(tmp.path(), "**", true, false, false, true, false); - assert!(!output.has_failed_step()); + let step = run(tmp.path(), "**", true, false, false, true, false); + assert!(!crate::step::has_failed(&step)); assert!(!tmp.path().join(".mdvs").exists()); } @@ -435,8 +324,8 @@ mod tests { let tmp = tempfile::tempdir().unwrap(); fs::create_dir_all(tmp.path().join("empty")).unwrap(); - let output = run(tmp.path(), "empty/**", false, false, false, true, false); - assert!(output.has_failed_step()); + let step = run(tmp.path(), "empty/**", false, false, false, true, false); + assert!(crate::step::has_failed(&step)); } #[test] @@ -445,8 +334,8 @@ mod tests { let file = tmp.path().join("not-a-dir"); fs::write(&file, "hello").unwrap(); - let output = run(&file, "**", false, false, false, true, false); - assert!(output.has_failed_step()); + let step = run(&file, "**", false, false, false, true, false); + assert!(crate::step::has_failed(&step)); } #[test] @@ -454,17 +343,14 @@ mod tests { let tmp = tempfile::tempdir().unwrap(); create_test_vault(tmp.path()); - let output = run(tmp.path(), "**", false, false, false, true, false); - assert!(!output.has_failed_step()); + let step = run(tmp.path(), "**", false, false, false, true, false); + assert!(!crate::step::has_failed(&step)); - // Read back the config and verify [check] section let config = crate::schema::config::MdvsToml::read(&tmp.path().join("mdvs.toml")).unwrap(); assert!(config.check.is_some()); assert!(config.check.unwrap().auto_update); - // No model/chunking sections (filled by first build) assert!(config.embedding_model.is_none()); assert!(config.chunking.is_none()); - // Auto-flag sections are present assert!(config.build.is_some()); assert!(config.build.unwrap().auto_update); assert!(config.search.is_some()); @@ -474,8 +360,6 @@ mod tests { #[test] fn hints_for_special_char_field_names() { - use crate::output::FieldHint; - let tmp = tempfile::tempdir().unwrap(); fs::create_dir_all(tmp.path()).unwrap(); fs::write( @@ 
-484,10 +368,10 @@ mod tests { ) .unwrap(); - let output = run(tmp.path(), "**", false, false, false, true, false); - assert!(!output.has_failed_step()); + let step = run(tmp.path(), "**", false, false, false, true, false); + assert!(!crate::step::has_failed(&step)); - let result = output.result.unwrap(); + let result = unwrap_init(&step); let sq_field = result .fields .iter() @@ -495,7 +379,6 @@ mod tests { .unwrap(); assert!(sq_field.hints.contains(&FieldHint::EscapeSingleQuotes)); - // Normal fields should have no hints let title_field = result.fields.iter().find(|f| f.name == "title").unwrap(); assert!(title_field.hints.is_empty()); } diff --git a/src/cmd/search.rs b/src/cmd/search.rs index 465da09..fe054ec 100644 --- a/src/cmd/search.rs +++ b/src/cmd/search.rs @@ -1,102 +1,14 @@ -use crate::index::backend::{Backend, SearchHit}; -use crate::output::{format_json_compact, CommandOutput}; -use crate::pipeline::embed::EmbedQueryOutput; -use crate::pipeline::execute_search::ExecuteSearchOutput; -use crate::pipeline::load_model::LoadModelOutput; -use crate::pipeline::read_config::ReadConfigOutput; -use crate::pipeline::read_index::ReadIndexOutput; -use crate::pipeline::{ErrorKind, ProcessingStepError, ProcessingStepResult}; -use crate::table::{style_compact, style_record, Builder}; -use serde::Serialize; +use crate::index::backend::Backend; +use crate::outcome::commands::SearchOutcome; +use crate::outcome::{ + EmbedQueryOutcome, ExecuteSearchOutcome, LoadModelOutcome, Outcome, ReadConfigOutcome, + ReadIndexOutcome, +}; +use crate::step::{from_pipeline_result, ErrorKind, Step, StepError, StepOutcome}; use std::path::Path; +use std::time::Instant; use tracing::{instrument, warn}; -/// Result of the `search` command: ranked list of matching files. -#[derive(Debug, Serialize)] -pub struct SearchResult { - /// The query string. - pub query: String, - /// Files ranked by cosine similarity to the query, descending. - pub hits: Vec<SearchHit>, - /// Name of the embedding model used.
- pub model_name: String, - /// Result limit that was applied. - pub limit: usize, -} - -impl CommandOutput for SearchResult { - fn format_text(&self, verbose: bool) -> String { - let mut out = String::new(); - - // One-liner - let hit_word = if self.hits.len() == 1 { "hit" } else { "hits" }; - out.push_str(&format!( - "Searched \"{}\" — {} {}\n", - self.query, - self.hits.len(), - hit_word - )); - - if self.hits.is_empty() { - return out; - } - - out.push('\n'); - - if verbose { - // Record tables: one per hit with chunk text - for (i, hit) in self.hits.iter().enumerate() { - let mut builder = Builder::default(); - let idx = format!("{}", i + 1); - let path = format!("\"{}\"", hit.filename); - let score = format!("{:.3}", hit.score); - builder.push_record([idx.as_str(), path.as_str(), score.as_str()]); - - let detail = match (&hit.chunk_text, hit.start_line, hit.end_line) { - (Some(text), Some(start), Some(end)) => { - let indented: String = text - .lines() - .map(|l| format!(" {l}")) - .collect::<Vec<_>>() - .join("\n"); - format!(" lines {start}-{end}:\n{indented}") - } - (None, Some(start), Some(end)) => format!(" lines {start}-{end}"), - _ => String::new(), - }; - - builder.push_record([detail.as_str(), "", ""]); - let mut table = builder.build(); - style_record(&mut table, 3); - out.push_str(&format!("{table}\n")); - } - - // Footer - out.push_str(&format!( - "{} {} | model: \"{}\" | limit: {}\n", - self.hits.len(), - hit_word, - self.model_name, - self.limit, - )); - } else { - // Compact table - let mut builder = Builder::default(); - for (i, hit) in self.hits.iter().enumerate() { - let idx = format!("{}", i + 1); - let path = format!("\"{}\"", hit.filename); - let score = format!("{:.3}", hit.score); - builder.push_record([idx.as_str(), path.as_str(), score.as_str()]); - } - let mut table = builder.build(); - style_compact(&mut table); - out.push_str(&format!("{table}\n")); - } - - out - } -} - /// Read lines from a file (1-indexed, inclusive range).
fn read_lines(path: &Path, start: i32, end: i32) -> Option<String> { let content = std::fs::read_to_string(path).ok()?; @@ -109,142 +21,6 @@ fn read_lines(path: &Path, start: i32, end: i32) -> Option<String> { Some(lines[start..end].join("\n")) } -// ============================================================================ -// SearchCommandOutput (pipeline-based) -// ============================================================================ - -/// Pipeline record for the search command. -#[derive(Debug, Serialize)] -pub struct SearchProcessOutput { - /// Auto-build output (if auto-build was triggered). - #[serde(skip_serializing_if = "Option::is_none")] - pub auto_build: Option<crate::cmd::build::BuildCommandOutput>, - /// Read config step result. - pub read_config: ProcessingStepResult, - /// Read index step result. - pub read_index: ProcessingStepResult, - /// Load model step result. - pub load_model: ProcessingStepResult, - /// Embed query step result. - pub embed_query: ProcessingStepResult, - /// Execute search step result. - pub execute_search: ProcessingStepResult, -} - -/// Full output of the search command: pipeline steps + command result. -#[derive(Debug, Serialize)] -pub struct SearchCommandOutput { - /// Processing steps and their outcomes. - pub process: SearchProcessOutput, - /// Command result (None if pipeline didn't complete). - pub result: Option<SearchResult>, -} - -impl SearchCommandOutput { - /// Returns `true` if any processing step failed.
- pub fn has_failed_step(&self) -> bool { - self.process - .auto_build - .as_ref() - .is_some_and(|b| b.has_failed_step()) - || matches!(self.process.read_config, ProcessingStepResult::Failed(_)) - || matches!(self.process.read_index, ProcessingStepResult::Failed(_)) - || matches!(self.process.load_model, ProcessingStepResult::Failed(_)) - || matches!(self.process.embed_query, ProcessingStepResult::Failed(_)) - || matches!(self.process.execute_search, ProcessingStepResult::Failed(_)) - } -} - -impl CommandOutput for SearchCommandOutput { - fn format_json(&self, verbose: bool) -> String { - format_json_compact(self, self.result.as_ref(), verbose) - } - - fn format_text(&self, verbose: bool) -> String { - let mut preamble = String::new(); - if let Some(ref build_output) = self.process.auto_build { - if verbose { - preamble.push_str("Auto-build:\n"); - let build_text = build_output.format_text(true); - for line in build_text.lines() { - preamble.push_str(&format!(" {line}\n")); - } - preamble.push('\n'); - } else if let Some(ref br) = build_output.result { - preamble.push_str(&format!( - "Built index — {} files, {} chunks\n", - br.files_total, br.chunks_total - )); - } - } - - let body = if let Some(result) = &self.result { - if verbose { - let mut out = String::new(); - out.push_str(&format!( - "{}\n", - self.process.read_config.format_line("Read config") - )); - out.push_str(&format!( - "{}\n", - self.process.read_index.format_line("Read index") - )); - out.push_str(&format!( - "{}\n", - self.process.load_model.format_line("Load model") - )); - out.push_str(&format!( - "{}\n", - self.process.embed_query.format_line("Embed query") - )); - out.push_str(&format!( - "{}\n", - self.process.execute_search.format_line("Search") - )); - out.push('\n'); - out.push_str(&result.format_text(verbose)); - out - } else { - result.format_text(verbose) - } - } else { - // Pipeline didn't complete — show steps up to the failure - let mut out = String::new(); - out.push_str(&format!( 
- "{}\n", - self.process.read_config.format_line("Read config") - )); - if !matches!(self.process.read_index, ProcessingStepResult::Skipped) { - out.push_str(&format!( - "{}\n", - self.process.read_index.format_line("Read index") - )); - } - if !matches!(self.process.load_model, ProcessingStepResult::Skipped) { - out.push_str(&format!( - "{}\n", - self.process.load_model.format_line("Load model") - )); - } - if !matches!(self.process.embed_query, ProcessingStepResult::Skipped) { - out.push_str(&format!( - "{}\n", - self.process.embed_query.format_line("Embed query") - )); - } - if !matches!(self.process.execute_search, ProcessingStepResult::Skipped) { - out.push_str(&format!( - "{}\n", - self.process.execute_search.format_line("Search") - )); - } - out - }; - - format!("{preamble}{body}") - } -} - /// Embed a query, search the index, and return ranked results. #[instrument(name = "search", skip_all)] pub async fn run( @@ -254,38 +30,43 @@ pub async fn run( where_clause: Option<&str>, no_update: bool, no_build: bool, - verbose: bool, -) -> SearchCommandOutput { + _verbose: bool, +) -> Step { use crate::pipeline::embed::run_embed_query; use crate::pipeline::execute_search::run_execute_search; use crate::pipeline::load_model::run_load_model; use crate::pipeline::read_config::run_read_config; use crate::pipeline::read_index::run_read_index; - let (read_config_step, config) = run_read_config(path); + let start = Instant::now(); + let mut substeps = Vec::new(); + + // 1. 
Read config + let (read_config_result, config) = run_read_config(path); + substeps.push(from_pipeline_result(read_config_result, |o| { + Outcome::ReadConfig(ReadConfigOutcome { + config_path: o.config_path.clone(), + }) + })); // Auto-build: run build before searching if configured let auto_build = if let Some(ref cfg) = config { let should_build = !no_build && cfg.search.as_ref().is_some_and(|s| s.auto_build); if should_build { let build_no_update = no_update || !cfg.search.as_ref().is_some_and(|s| s.auto_update); - let build_output = - crate::cmd::build::run(path, None, None, None, false, build_no_update, verbose) - .await; - if build_output.has_failed_step() { - return SearchCommandOutput { - process: SearchProcessOutput { - auto_build: Some(build_output), - read_config: read_config_step, - read_index: ProcessingStepResult::Skipped, - load_model: ProcessingStepResult::Skipped, - embed_query: ProcessingStepResult::Skipped, - execute_search: ProcessingStepResult::Skipped, - }, - result: None, - }; + let build_step = + crate::cmd::build::run(path, None, None, None, false, build_no_update, false).await; + if build_step.has_failed_step() { + substeps.push(build_step); + return fail_msg( + &mut substeps, + start, + ErrorKind::User, + "auto-build failed", + 4, + ); } - Some(build_output) + Some(build_step) } else { None } @@ -293,79 +74,135 @@ pub async fn run( None }; + // Push auto-build substep if it ran + if let Some(build_step) = auto_build { + substeps.push(build_step); + } + // Re-read config if auto-build ran - let (read_config_step, config) = if auto_build.is_some() { - run_read_config(path) + let config = if substeps.len() > 1 { + // auto-build ran, re-read + let (result, cfg) = run_read_config(path); + substeps.push(from_pipeline_result(result, |o| { + Outcome::ReadConfig(ReadConfigOutcome { + config_path: o.config_path.clone(), + }) + })); + match cfg { + Some(c) => Some(c), + None => return fail_from_last(&mut substeps, start, 3), + } } else { - 
(read_config_step, config) + config }; let embedding = config.as_ref().and_then(|c| c.embedding_model.as_ref()); - let (read_index_step, index_data) = match &config { + // 2. Read index + let (read_index_result, index_data) = match &config { Some(_) => run_read_index(path), - None => (ProcessingStepResult::Skipped, None), - }; - - // Pre-checks before loading model (fail fast on user errors) - let pre_check_error: Option<String> = if config.is_none() { - None // already failed at read_config - } else if embedding.is_none() { - Some("missing [embedding_model] in mdvs.toml (run `mdvs build` first)".to_string()) - } else if matches!(read_index_step, ProcessingStepResult::Failed(_)) { - None // already failed at read_index - } else if index_data.is_none() { - Some("index not found (run `mdvs build` first)".to_string()) - } else { - // Model mismatch check - let data = index_data.as_ref().unwrap(); - let emb = embedding.unwrap(); - if data.metadata.embedding_model != *emb { - Some(format!( - "model mismatch: config has '{}' (rev {:?}) but index was built with '{}' (rev {:?}) — run 'mdvs build' to rebuild", - emb.name, emb.revision, - data.metadata.embedding_model.name, data.metadata.embedding_model.revision, - )) - } else { - None + None => { + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Skipped, + }); + return fail_from_last(&mut substeps, start, 3); } }; - - let (load_model_step, embedder) = match (embedding, &pre_check_error) { - (Some(emb), None) => run_load_model(emb), - (_, Some(msg)) => { - let err = ProcessingStepError { - kind: ErrorKind::User, - message: msg.clone(), - }; - (ProcessingStepResult::Failed(err), None) + substeps.push(from_pipeline_result(read_index_result, |o| { + Outcome::ReadIndex(ReadIndexOutcome { + exists: o.exists, + files_indexed: o.files_indexed, + chunks: o.chunks, + }) + })); + + // Pre-checks before loading model + let pre_check_error: Option<String> = match (config.as_ref(), embedding, index_data.as_ref()) { + (None, _, _) =>
None, // already failed + (_, None, _) => { + Some("missing [embedding_model] in mdvs.toml (run `mdvs build` first)".to_string()) + } + (_, _, None) => Some("index not found (run `mdvs build` first)".to_string()), + (_, Some(emb), Some(data)) => { + if data.metadata.embedding_model != *emb { + Some(format!( + "model mismatch: config has '{}' (rev {:?}) but index was built with '{}' (rev {:?}) — run 'mdvs build' to rebuild", + emb.name, emb.revision, + data.metadata.embedding_model.name, data.metadata.embedding_model.revision, + )) + } else { + None + } } - _ => (ProcessingStepResult::Skipped, None), }; - let (embed_query_step, query_embedding) = match &embedder { - Some(emb) => run_embed_query(emb, query).await, - None => (ProcessingStepResult::Skipped, None), + // 3. Load model + if let Some(msg) = pre_check_error { + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Complete { + result: Err(StepError { + kind: ErrorKind::User, + message: msg, + }), + elapsed_ms: 0, + }, + }); + return fail_from_last(&mut substeps, start, 2); + } + + let emb_config = embedding.unwrap(); + let (load_model_result, embedder) = run_load_model(emb_config); + substeps.push(from_pipeline_result(load_model_result, |o| { + Outcome::LoadModel(LoadModelOutcome { + model_name: o.model_name.clone(), + dimension: o.dimension, + }) + })); + let embedder = match embedder { + Some(e) => e, + None => return fail_from_last(&mut substeps, start, 2), }; - let (execute_search_step, hits) = match (&config, query_embedding) { - (Some(cfg), Some(qe)) => { - let backend = Backend::parquet(path); - let (prefix, aliases) = match &cfg.search { - Some(sc) => (sc.internal_prefix.as_str(), &sc.aliases), - None => ("", &std::collections::HashMap::new()), - }; - run_execute_search(&backend, qe, where_clause, limit, prefix, aliases).await - } - _ => (ProcessingStepResult::Skipped, None), + // 4. 
Embed query + let (embed_query_result, query_embedding) = run_embed_query(&embedder, query).await; + substeps.push(from_pipeline_result(embed_query_result, |o| { + Outcome::EmbedQuery(EmbedQueryOutcome { + query: o.query.clone(), + }) + })); + let query_embedding = match query_embedding { + Some(qe) => qe, + None => return fail_from_last(&mut substeps, start, 1), }; - // Build result with chunk text populated if verbose - let result = hits.map(|mut hits| { - if verbose { + // 5. Execute search + let cfg = config.as_ref().unwrap(); + let backend = Backend::parquet(path); + let (prefix, aliases) = match &cfg.search { + Some(sc) => (sc.internal_prefix.as_str(), &sc.aliases), + None => ("", &std::collections::HashMap::new()), + }; + let (execute_result, hits) = run_execute_search( + &backend, + query_embedding, + where_clause, + limit, + prefix, + aliases, + ) + .await; + substeps.push(from_pipeline_result(execute_result, |o| { + Outcome::ExecuteSearch(ExecuteSearchOutcome { hits: o.hits }) + })); + + // Build result with chunk text populated (always — full outcome carries all data) + let hits = match hits { + Some(mut hits) => { for hit in &mut hits { - if let (Some(start), Some(end)) = (hit.start_line, hit.end_line) { - match read_lines(&path.join(&hit.filename), start, end) { + if let (Some(s), Some(e)) = (hit.start_line, hit.end_line) { + match read_lines(&path.join(&hit.filename), s, e) { Some(text) => hit.chunk_text = Some(text), None => warn!( file = %hit.filename, @@ -374,26 +211,78 @@ pub async fn run( } } } + hits } - let model_name = embedding.map(|e| e.name.clone()).unwrap_or_default(); - SearchResult { - query: query.to_string(), - hits, - model_name, - limit, - } - }); - - SearchCommandOutput { - process: SearchProcessOutput { - auto_build, - read_config: read_config_step, - read_index: read_index_step, - load_model: load_model_step, - embed_query: embed_query_step, - execute_search: execute_search_step, + None => return fail_from_last(&mut substeps, 
start, 0), + }; + + let model_name = emb_config.name.clone(); + Step { + substeps, + outcome: StepOutcome::Complete { + result: Ok(Outcome::Search(Box::new(SearchOutcome { + query: query.to_string(), + hits, + model_name, + limit, + }))), + elapsed_ms: start.elapsed().as_millis() as u64, + }, + } +} + +fn fail_from_last( + substeps: &mut Vec<Step>, + start: Instant, + skipped: usize, +) -> Step { + let msg = match substeps.iter().rev().find_map(|s| match &s.outcome { + StepOutcome::Complete { result: Err(e), .. } => Some(e.message.clone()), + _ => None, + }) { + Some(m) => m, + None => "step failed".into(), + }; + for _ in 0..skipped { + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Skipped, + }); + } + Step { + substeps: std::mem::take(substeps), + outcome: StepOutcome::Complete { + result: Err(StepError { + kind: ErrorKind::Application, + message: msg, + }), + elapsed_ms: start.elapsed().as_millis() as u64, + }, + } +} + +fn fail_msg( + substeps: &mut Vec<Step>, + start: Instant, + kind: ErrorKind, + msg: &str, + skipped: usize, +) -> Step { + for _ in 0..skipped { + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Skipped, + }); + } + Step { + substeps: std::mem::take(substeps), + outcome: StepOutcome::Complete { + result: Err(StepError { + kind, + message: msg.into(), + }), + elapsed_ms: start.elapsed().as_millis() as u64, }, - result, } } @@ -401,20 +290,36 @@ pub async fn run( mod tests { use super::*; use crate::index::embed::{Embedder, ModelConfig}; + use crate::outcome::commands::SearchOutcome; use crate::schema::config::{FieldsConfig, MdvsToml, SearchConfig, UpdateConfig}; use crate::schema::shared::{ChunkingConfig, EmbeddingModelConfig, ScanConfig}; use std::fs; + fn unwrap_search(step: &Step) -> &SearchOutcome { + match &step.outcome { + StepOutcome::Complete { + result: Ok(Outcome::Search(o)), + ..
+ } => o, + other => panic!("expected Ok(Search), got: {other:?}"), + } + } + + fn unwrap_error(step: &Step) -> &StepError { + match &step.outcome { + StepOutcome::Complete { result: Err(e), .. } => e, + other => panic!("expected Err, got: {other:?}"), + } + } + fn create_test_vault(dir: &Path) { let blog_dir = dir.join("blog"); fs::create_dir_all(&blog_dir).unwrap(); - fs::write( blog_dir.join("post1.md"), "---\ntitle: Rust Programming\ntags:\n - rust\n - code\ndraft: false\n---\n# Rust Programming\nRust is a systems programming language focused on safety and performance.", ) .unwrap(); - fs::write( blog_dir.join("post2.md"), "---\ntitle: Cooking Recipes\ndraft: true\n---\n# Cooking Recipes\nDelicious pasta recipes for weeknight dinners.", @@ -456,13 +361,8 @@ mod tests { } async fn init_and_build(dir: &Path) { - let output = crate::cmd::init::run( - dir, "**", false, false, true, false, // skip_gitignore - false, // verbose - ); - assert!(!output.has_failed_step()); - - // Build the index + let step = crate::cmd::init::run(dir, "**", false, false, true, false, false); + assert!(!crate::step::has_failed(&step)); let output = crate::cmd::build::run(dir, None, None, None, false, true, false).await; assert!(!output.has_failed_step()); } @@ -472,7 +372,6 @@ mod tests { let tmp = tempfile::tempdir().unwrap(); let output = run(tmp.path(), "test query", 10, None, true, true, false).await; assert!(output.has_failed_step()); - assert!(output.result.is_none()); } #[tokio::test] @@ -482,12 +381,8 @@ mod tests { let output = run(tmp.path(), "test query", 10, None, true, true, false).await; assert!(output.has_failed_step()); - assert!(output.result.is_none()); - if let ProcessingStepResult::Failed(err) = &output.process.load_model { - assert!(err.message.contains("index not found")); - } else { - panic!("expected load_model to fail"); - } + let err = unwrap_error(&output); + assert!(err.message.contains("index not found")); } #[tokio::test] @@ -498,15 +393,14 @@ mod tests { 
let output = run(tmp.path(), "rust programming", 10, None, true, true, false).await; assert!(!output.has_failed_step(), "search failed: {:?}", output); - let result = output.result.unwrap(); + let result = unwrap_search(&output); assert_eq!(result.query, "rust programming"); assert!(!result.model_name.is_empty()); assert!(!result.hits.is_empty()); - // start_line/end_line always present assert!(result.hits[0].start_line.is_some()); assert!(result.hits[0].end_line.is_some()); - // chunk_text not populated without verbose - assert!(result.hits[0].chunk_text.is_none()); + // chunk_text always populated now (full outcome carries all data) + assert!(result.hits[0].chunk_text.is_some()); } #[tokio::test] @@ -517,9 +411,8 @@ mod tests { let output = run(tmp.path(), "rust programming", 10, None, true, true, true).await; assert!(!output.has_failed_step()); - let result = output.result.unwrap(); + let result = unwrap_search(&output); assert!(!result.hits.is_empty()); - // chunk_text populated in verbose mode assert!(result.hits[0].chunk_text.is_some()); } @@ -590,11 +483,8 @@ mod tests { let output = run(tmp.path(), "test query", 10, None, true, true, false).await; assert!(output.has_failed_step()); - if let ProcessingStepResult::Failed(err) = &output.process.load_model { - assert!(err.message.contains("model mismatch")); - } else { - panic!("expected load_model to fail"); - } + let err = unwrap_error(&output); + assert!(err.message.contains("model mismatch")); } #[tokio::test] @@ -614,12 +504,8 @@ mod tests { ) .await; assert!(output.has_failed_step()); - if let ProcessingStepResult::Failed(err) = &output.process.execute_search { - assert!(err.message.contains("unmatched single quote")); - assert!(err.message.contains("O''Brien")); - } else { - panic!("expected execute_search to fail"); - } + let err = unwrap_error(&output); + assert!(err.message.contains("unmatched single quote")); } #[tokio::test] @@ -630,11 +516,8 @@ mod tests { let output = run(tmp.path(), "test", 10, 
Some("x = \"bad"), true, true, false).await; assert!(output.has_failed_step()); - if let ProcessingStepResult::Failed(err) = &output.process.execute_search { - assert!(err.message.contains("unmatched double quote")); - } else { - panic!("expected execute_search to fail"); - } + let err = unwrap_error(&output); + assert!(err.message.contains("unmatched double quote")); } #[tokio::test] @@ -643,7 +526,6 @@ mod tests { create_test_vault(tmp.path()); init_and_build(tmp.path()).await; - // 2 single quotes (even) — passes parity check but DataFusion rejects it let output = run( tmp.path(), "test", @@ -663,7 +545,6 @@ mod tests { create_test_vault(tmp.path()); init_and_build(tmp.path()).await; - // Properly escaped single quotes should pass validation let output = run( tmp.path(), "test", @@ -674,10 +555,10 @@ mod tests { false, ) .await; - // Should not fail with quote parity error (may fail for other reasons like no match) - if let ProcessingStepResult::Failed(err) = &output.process.execute_search { + // Should not fail with quote parity error + if let StepOutcome::Complete { result: Err(e), .. 
} = &output.outcome { assert!( - !err.message.contains("unmatched"), + !e.message.contains("unmatched"), "balanced quotes should not trigger parity check" ); } diff --git a/src/cmd/update.rs b/src/cmd/update.rs index b77f3a6..bf7863d 100644 --- a/src/cmd/update.rs +++ b/src/cmd/update.rs @@ -1,302 +1,14 @@ -use crate::output::{ - format_file_count, format_hints, format_json_compact, ChangedField, CommandOutput, - DiscoveredField, FieldChange, RemovedField, -}; -use crate::pipeline::infer::{run_infer, InferOutput}; -use crate::pipeline::read_config::{run_read_config, ReadConfigOutput}; -use crate::pipeline::scan::{run_scan, ScanOutput}; -use crate::pipeline::write_config::WriteConfigOutput; -use crate::pipeline::{ErrorKind, ProcessingStep, ProcessingStepError, ProcessingStepResult}; +use crate::outcome::commands::UpdateOutcome; +use crate::outcome::{InferOutcome, Outcome, ReadConfigOutcome, ScanOutcome, WriteConfigOutcome}; +use crate::output::{ChangedField, FieldChange, RemovedField}; use crate::schema::config::TomlField; use crate::schema::shared::FieldTypeSerde; -use crate::table::{style_compact, style_record, Builder}; -use serde::Serialize; +use crate::step::{from_pipeline_result, ErrorKind, Step, StepError, StepOutcome}; use std::collections::HashMap; use std::path::Path; use std::time::Instant; use tracing::{info, instrument}; -// ============================================================================ -// UpdateResult -// ============================================================================ - -/// Result of the `update` command: field changes discovered by re-inference. -#[derive(Debug, Serialize)] -pub struct UpdateResult { - /// Number of markdown files scanned. - pub files_scanned: usize, - /// Newly discovered fields not previously in `mdvs.toml`. - pub added: Vec<DiscoveredField>, - /// Fields whose type or glob constraints changed during re-inference. - pub changed: Vec<ChangedField>, - /// Fields that disappeared from all files during re-inference.
- pub removed: Vec<RemovedField>, - /// Number of fields that remained identical. - pub unchanged: usize, - /// Whether this was a dry run (no files written). - pub dry_run: bool, -} - -impl UpdateResult { - fn has_changes(&self) -> bool { - !self.added.is_empty() || !self.changed.is_empty() || !self.removed.is_empty() - } -} - -impl CommandOutput for UpdateResult { - fn format_text(&self, verbose: bool) -> String { - let mut out = String::new(); - - // One-liner - let total_changes = self.added.len() + self.changed.len() + self.removed.len(); - let summary = if total_changes == 0 { - "no changes".to_string() - } else { - format!("{total_changes} field(s) changed") - }; - let dry_run_suffix = if self.dry_run { " (dry run)" } else { "" }; - out.push_str(&format!( - "Scanned {} — {summary}{dry_run_suffix}\n", - format_file_count(self.files_scanned) - )); - - if !self.has_changes() { - return out; - } - - // Changes table - out.push('\n'); - if verbose { - // Record tables per field change - for field in &self.added { - let mut builder = Builder::default(); - builder.push_record([ - format!("\"{}\"", field.name), - "added".to_string(), - field.field_type.clone(), - ]); - let mut detail_lines = Vec::new(); - if let Some(ref globs) = field.allowed { - detail_lines.push(" found in:".to_string()); - for g in globs { - detail_lines.push(format!(" - \"{g}\"")); - } - } - if field.nullable { - detail_lines.push(" nullable: true".to_string()); - } - if !field.hints.is_empty() { - detail_lines.push(format!(" hints: {}", format_hints(&field.hints))); - } - builder.push_record([detail_lines.join("\n"), String::new(), String::new()]); - let mut table = builder.build(); - style_record(&mut table, 3); - out.push_str(&format!("{table}\n")); - } - for field in &self.changed { - let mut builder = Builder::default(); - builder.push_record(["field", "aspect", "old", "new"]); - for (i, change) in field.changes.iter().enumerate() { - let name_col = if i == 0 { - format!("\"{}\"", field.name) - } else {
- String::new() - }; - let (old, new) = change.format_old_new(); - builder.push_record([name_col, change.label().to_string(), old, new]); - } - let mut table = builder.build(); - style_compact(&mut table); - out.push_str(&format!("{table}\n")); - } - for field in &self.removed { - let mut builder = Builder::default(); - builder.push_record([ - format!("\"{}\"", field.name), - "removed".to_string(), - String::new(), - ]); - let detail = match &field.allowed { - Some(globs) => { - let mut lines = vec![" previously in:".to_string()]; - for g in globs { - lines.push(format!(" - \"{g}\"")); - } - lines.join("\n") - } - None => String::new(), - }; - builder.push_record([detail, String::new(), String::new()]); - let mut table = builder.build(); - style_record(&mut table, 3); - out.push_str(&format!("{table}\n")); - } - } else { - // Compact: separate tables per category to avoid empty trailing columns - if !self.added.is_empty() { - let mut builder = Builder::default(); - for field in &self.added { - let globs_summary = field - .allowed - .as_ref() - .map(|g| { - g.iter() - .map(|s| format!("\"{s}\"")) - .collect::<Vec<_>>() - .join(", ") - }) - .unwrap_or_default(); - let type_str = if field.nullable { - format!("{}?", field.field_type) - } else { - field.field_type.clone() - }; - let mut row = vec![ - format!("\"{}\"", field.name), - "added".to_string(), - type_str, - globs_summary, - ]; - let hints_str = format_hints(&field.hints); - if !hints_str.is_empty() { - row.push(hints_str); - } - builder.push_record(row); - } - let mut table = builder.build(); - style_compact(&mut table); - out.push_str(&format!("{table}\n")); - } - if !self.changed.is_empty() { - let mut builder = Builder::default(); - for field in &self.changed { - let aspects: Vec<&str> = field.changes.iter().map(FieldChange::label).collect(); - builder.push_record([format!("\"{}\"", field.name), aspects.join(", ")]); - } - let mut table = builder.build(); - style_compact(&mut table);
out.push_str(&format!("{table}\n")); - } - if !self.removed.is_empty() { - let mut builder = Builder::default(); - for field in &self.removed { - builder.push_record([format!("\"{}\"", field.name), "removed".to_string()]); - } - let mut table = builder.build(); - style_compact(&mut table); - out.push_str(&format!("{table}\n")); - } - } - - out - } -} - -// ============================================================================ -// UpdateCommandOutput (pipeline) -// ============================================================================ - -/// Step records for each phase of the update pipeline. -#[derive(Debug, Serialize)] -pub struct UpdateProcessOutput { - /// Read and parse `mdvs.toml`. - pub read_config: ProcessingStepResult, - /// Scan the project directory for markdown files. - pub scan: ProcessingStepResult, - /// Infer field types and glob patterns. - pub infer: ProcessingStepResult, - /// Write updated `mdvs.toml` to disk. - pub write_config: ProcessingStepResult, -} - -/// Complete output of the `update` command. -#[derive(Debug, Serialize)] -pub struct UpdateCommandOutput { - /// Step-by-step process records. - pub process: UpdateProcessOutput, - /// Update result (present when update completes successfully). - pub result: Option, -} - -impl UpdateCommandOutput { - /// Returns `true` if any step failed. 
- pub fn has_failed_step(&self) -> bool { - matches!(self.process.read_config, ProcessingStepResult::Failed(_)) - || matches!(self.process.scan, ProcessingStepResult::Failed(_)) - || matches!(self.process.infer, ProcessingStepResult::Failed(_)) - || matches!(self.process.write_config, ProcessingStepResult::Failed(_)) - } -} - -impl CommandOutput for UpdateCommandOutput { - fn format_json(&self, verbose: bool) -> String { - format_json_compact(self, self.result.as_ref(), verbose) - } - - fn format_text(&self, verbose: bool) -> String { - if let Some(result) = &self.result { - if verbose { - let mut out = String::new(); - out.push_str(&format!( - "{}\n", - self.process.read_config.format_line("Read config") - )); - out.push_str(&format!("{}\n", self.process.scan.format_line("Scan"))); - out.push_str(&format!("{}\n", self.process.infer.format_line("Infer"))); - out.push_str(&format!( - "{}\n", - self.process.write_config.format_line("Write config") - )); - out.push('\n'); - out.push_str(&result.format_text(verbose)); - out - } else { - result.format_text(verbose) - } - } else { - // Pipeline didn't complete — show steps up to the failure - let mut out = String::new(); - out.push_str(&format!( - "{}\n", - self.process.read_config.format_line("Read config") - )); - if !matches!(self.process.scan, ProcessingStepResult::Skipped) { - out.push_str(&format!("{}\n", self.process.scan.format_line("Scan"))); - } - if !matches!(self.process.infer, ProcessingStepResult::Skipped) { - out.push_str(&format!("{}\n", self.process.infer.format_line("Infer"))); - } - if !matches!(self.process.write_config, ProcessingStepResult::Skipped) { - out.push_str(&format!( - "{}\n", - self.process.write_config.format_line("Write config") - )); - } - out - } - } -} - -// ============================================================================ -// run() -// ============================================================================ - -/// Helper to build a skipped-everything output with a 
failed read_config step. -fn fail_at_read_config(message: String) -> UpdateCommandOutput { - UpdateCommandOutput { - process: UpdateProcessOutput { - read_config: ProcessingStepResult::Failed(ProcessingStepError { - kind: ErrorKind::User, - message, - }), - scan: ProcessingStepResult::Skipped, - infer: ProcessingStepResult::Skipped, - write_config: ProcessingStepResult::Skipped, - }, - result: None, - } -} - /// Re-scan files, infer field changes, and update `mdvs.toml`. /// Pure inference — no build step. #[instrument(name = "update", skip_all)] @@ -305,85 +17,82 @@ pub async fn run( reinfer: &[String], reinfer_all: bool, dry_run: bool, - verbose: bool, -) -> UpdateCommandOutput { - // Pre-check: flag conflict (lands on read_config) + _verbose: bool, +) -> Step { + use crate::pipeline::infer::run_infer; + use crate::pipeline::read_config::run_read_config; + use crate::pipeline::scan::run_scan; + + let start = Instant::now(); + let mut substeps = Vec::new(); + + // Pre-check: flag conflict if !reinfer.is_empty() && reinfer_all { - return fail_at_read_config("cannot use --reinfer and --reinfer-all together".to_string()); + return fail_early( + substeps, + start, + ErrorKind::User, + "cannot use --reinfer and --reinfer-all together".into(), + 4, + ); } - // 1. read_config - let (read_config_step, config) = run_read_config(path); + // 1. 
Read config + let (read_config_result, config) = run_read_config(path); + substeps.push(from_pipeline_result(read_config_result, |o| { + Outcome::ReadConfig(ReadConfigOutcome { + config_path: o.config_path.clone(), + }) + })); + let mut config = match config { Some(c) => c, - None => { - return UpdateCommandOutput { - process: UpdateProcessOutput { - read_config: read_config_step, - scan: ProcessingStepResult::Skipped, - infer: ProcessingStepResult::Skipped, - write_config: ProcessingStepResult::Skipped, - }, - result: None, - }; - } + None => return fail_from_last_substep(&mut substeps, start, 3), }; - // Pre-check: reinfer field names exist in config (lands on read_config) + // Pre-check: reinfer field names exist for name in reinfer { if !config.fields.field.iter().any(|f| f.name == *name) { - return UpdateCommandOutput { - process: UpdateProcessOutput { - read_config: ProcessingStepResult::Failed(ProcessingStepError { - kind: ErrorKind::User, - message: format!("field '{name}' is not in mdvs.toml"), - }), - scan: ProcessingStepResult::Skipped, - infer: ProcessingStepResult::Skipped, - write_config: ProcessingStepResult::Skipped, - }, - result: None, - }; + return fail_early( + std::mem::take(&mut substeps), + start, + ErrorKind::User, + format!("field '{name}' is not in mdvs.toml"), + 3, + ); } } - // 2. scan - let (scan_step, scanned) = run_scan(path, &config.scan); + // 2. Scan + let (scan_result, scanned) = run_scan(path, &config.scan); + substeps.push(from_pipeline_result(scan_result, |o| { + Outcome::Scan(ScanOutcome { + files_found: o.files_found, + glob: o.glob.clone(), + }) + })); + let scanned = match scanned { Some(s) => s, - None => { - return UpdateCommandOutput { - process: UpdateProcessOutput { - read_config: read_config_step, - scan: scan_step, - infer: ProcessingStepResult::Skipped, - write_config: ProcessingStepResult::Skipped, - }, - result: None, - }; - } + None => return fail_from_last_substep(&mut substeps, start, 2), }; - // 3. 
infer - let (infer_step, schema) = run_infer(&scanned); + // 3. Infer + let (infer_result, schema) = run_infer(&scanned); + substeps.push(from_pipeline_result(infer_result, |o| { + Outcome::Infer(InferOutcome { + fields_inferred: o.fields_inferred, + }) + })); + + let schema = match schema { Some(s) => s, - None => { - return UpdateCommandOutput { - process: UpdateProcessOutput { - read_config: read_config_step, - scan: scan_step, - infer: infer_step, - write_config: ProcessingStepResult::Skipped, - }, - result: None, - }; - } + None => return fail_from_last_substep(&mut substeps, start, 1), }; let total_files = scanned.files.len(); - // --- Field comparison logic (inline) --- + // --- Field comparison logic (inline, unchanged from original) --- let (protected, targets): (Vec<_>, Vec<_>) = if reinfer_all { (vec![], config.fields.field.drain(..).collect()) } else if !reinfer.is_empty() { @@ -393,7 +102,6 @@ pub async fn run( .drain(..) .partition(|f| !reinfer.contains(&f.name)) } else { - // Default mode: all existing are protected, no targets (config.fields.field.drain(..).collect(), vec![]) }; @@ -458,7 +166,8 @@ pub async fn run( } new_fields.push(toml_field); } else { - added.push(inf.to_discovered(total_files, verbose)); + // Always collect full detail (verbose=true) — the full outcome carries everything + added.push(inf.to_discovered(total_files, true)); new_fields.push(toml_field); } } @@ -468,11 +177,8 @@ pub async fn run( .filter(|(name, _)| !schema.fields.iter().any(|f| f.name == **name)) .map(|(name, old_field)| RemovedField { name: name.to_string(), - allowed: if verbose { - Some(old_field.allowed.clone()) - } else { - None - }, + // Always collect full detail + allowed: Some(old_field.allowed.clone()), }) .collect(); removed.sort_by(|a, b| a.name.cmp(&b.name)); @@ -484,74 +190,151 @@ pub async fn run( "update complete" ); - let update_result = UpdateResult { - files_scanned: total_files, - added, - changed, - removed, - unchanged, - dry_run, - }; + let
has_changes = !added.is_empty() || !changed.is_empty() || !removed.is_empty(); - // 4. write_config (Skipped if dry_run or no changes) - let write_config_step = if dry_run || !update_result.has_changes() { - ProcessingStepResult::Skipped + // 4. Write config (Skipped if dry_run or no changes) + if dry_run || !has_changes { + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Skipped, + }); } else { - let start = Instant::now(); + let write_start = Instant::now(); let config_path = path.join("mdvs.toml"); - config.fields.field = new_fields; + if let Err(e) = config.write(&config_path) { - return UpdateCommandOutput { - process: UpdateProcessOutput { - read_config: read_config_step, - scan: scan_step, - infer: infer_step, - write_config: ProcessingStepResult::Failed(ProcessingStepError { + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Complete { + result: Err(StepError { kind: ErrorKind::Application, message: e.to_string(), }), + elapsed_ms: write_start.elapsed().as_millis() as u64, + }, + }); + return Step { + substeps, + outcome: StepOutcome::Complete { + result: Err(StepError { + kind: ErrorKind::Application, + message: "failed to write config".into(), + }), + elapsed_ms: start.elapsed().as_millis() as u64, }, - result: None, }; } - ProcessingStepResult::Completed(ProcessingStep { - elapsed_ms: start.elapsed().as_millis() as u64, - output: WriteConfigOutput { - config_path: config_path.display().to_string(), - fields_written: config.fields.field.len(), + substeps.push(from_pipeline_result( + crate::pipeline::ProcessingStepResult::Completed(crate::pipeline::ProcessingStep { + elapsed_ms: write_start.elapsed().as_millis() as u64, + output: crate::pipeline::write_config::WriteConfigOutput { + config_path: config_path.display().to_string(), + fields_written: config.fields.field.len(), + }, + }), + |o| { + Outcome::WriteConfig(WriteConfigOutcome { + config_path: o.config_path.clone(), + fields_written: o.fields_written, + }) }, - }) - 
}; + )); + } - UpdateCommandOutput { - process: UpdateProcessOutput { - read_config: read_config_step, - scan: scan_step, - infer: infer_step, - write_config: write_config_step, + Step { + substeps, + outcome: StepOutcome::Complete { + result: Ok(Outcome::Update(Box::new(UpdateOutcome { + files_scanned: total_files, + added, + changed, + removed, + unchanged, + dry_run, + }))), + elapsed_ms: start.elapsed().as_millis() as u64, + }, + } +} + +fn fail_early( + mut substeps: Vec<Step<Outcome>>, + start: Instant, + kind: ErrorKind, + message: String, + skipped_count: usize, +) -> Step<Outcome> { + for _ in 0..skipped_count { + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Skipped, + }); + } + Step { + substeps, + outcome: StepOutcome::Complete { + result: Err(StepError { kind, message }), + elapsed_ms: start.elapsed().as_millis() as u64, + }, + } +} + +fn fail_from_last_substep( + substeps: &mut Vec<Step<Outcome>>, + start: Instant, + skipped_count: usize, +) -> Step<Outcome> { + let msg = match substeps.last().map(|s| &s.outcome) { + Some(StepOutcome::Complete { result: Err(e), .. }) => e.message.clone(), + _ => "step failed".into(), + }; + for _ in 0..skipped_count { + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Skipped, + }); + } + Step { + substeps: std::mem::take(substeps), + outcome: StepOutcome::Complete { + result: Err(StepError { + kind: ErrorKind::Application, + message: msg, + }), + elapsed_ms: start.elapsed().as_millis() as u64, + }, - result: Some(update_result), } } #[cfg(test)] mod tests { use super::*; - use crate::schema::config::MdvsToml; + use crate::outcome::commands::UpdateOutcome; + use crate::output::ViolationKind; + use crate::schema::config::{FieldsConfig, MdvsToml, UpdateConfig}; + use crate::schema::shared::{FieldTypeSerde, ScanConfig}; use std::fs; + fn unwrap_update(step: &Step<Outcome>) -> &UpdateOutcome { + match &step.outcome { + StepOutcome::Complete { + result: Ok(Outcome::Update(o)), + ..
+ } => o, + other => panic!("expected Ok(Update), got: {other:?}"), + } + } + fn create_test_vault(dir: &Path) { let blog_dir = dir.join("blog"); fs::create_dir_all(&blog_dir).unwrap(); - fs::write( blog_dir.join("post1.md"), "---\ntitle: Hello\ntags:\n - rust\n - code\ndraft: false\n---\n# Hello\nBody text.", ) .unwrap(); - fs::write( blog_dir.join("post2.md"), "---\ntitle: World\ndraft: true\n---\n# World\nMore text.", @@ -560,12 +343,8 @@ mod tests { } fn init_no_build(dir: &Path) { - let output = crate::cmd::init::run( - dir, "**", false, false, true, // ignore bare files - false, // skip_gitignore - false, // verbose - ); - assert!(!output.has_failed_step()); + let step = crate::cmd::init::run(dir, "**", false, false, true, false, false); + assert!(!crate::step::has_failed(&step)); } #[tokio::test] @@ -574,11 +353,13 @@ mod tests { create_test_vault(tmp.path()); init_no_build(tmp.path()); - let output = run(tmp.path(), &[], false, false, false).await; - assert!(!output.has_failed_step()); - let result = output.result.unwrap(); + let step = run(tmp.path(), &[], false, false, false).await; + assert!(!crate::step::has_failed(&step)); + let result = unwrap_update(&step); - assert!(!result.has_changes()); + assert!(result.added.is_empty()); + assert!(result.changed.is_empty()); + assert!(result.removed.is_empty()); assert_eq!(result.files_scanned, 2); assert_eq!(result.unchanged, 3); // title, tags, draft } @@ -589,16 +370,15 @@ mod tests { create_test_vault(tmp.path()); init_no_build(tmp.path()); - // Add a file with a new field fs::write( tmp.path().join("blog/post3.md"), "---\ntitle: New\nauthor: Alice\n---\n# New\nContent.", ) .unwrap(); - let output = run(tmp.path(), &[], false, false, false).await; - assert!(!output.has_failed_step()); - let result = output.result.unwrap(); + let step = run(tmp.path(), &[], false, false, false).await; + assert!(!crate::step::has_failed(&step)); + let result = unwrap_update(&step); assert_eq!(result.added.len(), 1); 
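These tests lean on `crate::step::has_failed`, which the patch calls but never shows in this hunk. A minimal self-contained sketch of how such a recursive check over the TODO-0119 Step tree could work; the simplified types below (a plain `String` error instead of the crate's `StepError`/`ErrorKind`) are assumptions for illustration, not the project's actual definitions:

```rust
// Minimal stand-ins for the TODO-0119 Step tree (illustrative only; the real
// crate's `Step`, `StepOutcome`, and error types carry more data).
pub enum StepOutcome<T> {
    /// Step was skipped (dry run, or an earlier step already failed).
    Skipped,
    /// Step ran to completion, successfully or not.
    Complete {
        result: Result<T, String>,
        elapsed_ms: u64,
    },
}

pub struct Step<T> {
    pub substeps: Vec<Step<T>>,
    pub outcome: StepOutcome<T>,
}

/// A step tree has failed if its own outcome is an `Err`,
/// or if any substep (recursively) has failed.
pub fn has_failed<T>(step: &Step<T>) -> bool {
    matches!(
        step.outcome,
        StepOutcome::Complete { result: Err(_), .. }
    ) || step.substeps.iter().any(|s| has_failed(s))
}
```

Because failure is detected recursively, `main` only ever inspects the root `Step` to choose an exit code, no matter how deep the failing substep sits.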
assert_eq!(result.added[0].name, "author"); @@ -607,9 +387,8 @@ mod tests { assert_eq!(result.added[0].total_files, 3); assert!(result.changed.is_empty()); assert!(result.removed.is_empty()); - assert_eq!(result.unchanged, 3); // title, tags, draft + assert_eq!(result.unchanged, 3); - // Verify toml was updated let toml = MdvsToml::read(&tmp.path().join("mdvs.toml")).unwrap(); assert!(toml.fields.field.iter().any(|f| f.name == "author")); } @@ -620,25 +399,22 @@ mod tests { create_test_vault(tmp.path()); init_no_build(tmp.path()); - // Replace files so tags becomes a string instead of array fs::write( tmp.path().join("blog/post1.md"), "---\ntitle: Hello\ntags: single-tag\ndraft: false\n---\n# Hello\nBody text.", ) .unwrap(); - let output = run(tmp.path(), &["tags".to_string()], false, false, false).await; - assert!(!output.has_failed_step()); - let result = output.result.unwrap(); + let step = run(tmp.path(), &["tags".to_string()], false, false, false).await; + assert!(!crate::step::has_failed(&step)); + let result = unwrap_update(&step); assert_eq!(result.changed.len(), 1); assert_eq!(result.changed[0].name, "tags"); - assert!(result.changed[0].changes.iter().any(|c| matches!( - c, - FieldChange::Type { new, .. } if new == "String" - ))); - assert!(result.added.is_empty()); - assert!(result.removed.is_empty()); + assert!(result.changed[0] + .changes + .iter() + .any(|c| matches!(c, FieldChange::Type { new, .. 
} if new == "String"))); } #[tokio::test] @@ -647,23 +423,19 @@ mod tests { create_test_vault(tmp.path()); init_no_build(tmp.path()); - // Remove tags from all files fs::write( tmp.path().join("blog/post1.md"), "---\ntitle: Hello\ndraft: false\n---\n# Hello\nBody text.", ) .unwrap(); - let output = run(tmp.path(), &["tags".to_string()], false, false, false).await; - assert!(!output.has_failed_step()); - let result = output.result.unwrap(); + let step = run(tmp.path(), &["tags".to_string()], false, false, false).await; + assert!(!crate::step::has_failed(&step)); + let result = unwrap_update(&step); assert_eq!(result.removed.len(), 1); assert_eq!(result.removed[0].name, "tags"); - assert!(result.changed.is_empty()); - assert!(result.added.is_empty()); - // Verify toml no longer has tags let toml = MdvsToml::read(&tmp.path().join("mdvs.toml")).unwrap(); assert!(!toml.fields.field.iter().any(|f| f.name == "tags")); } @@ -674,7 +446,7 @@ mod tests { create_test_vault(tmp.path()); init_no_build(tmp.path()); - let output = run( + let step = run( tmp.path(), &["nonexistent".to_string()], false, @@ -682,13 +454,7 @@ mod tests { false, ) .await; - - assert!(output.has_failed_step()); - let msg = match &output.process.read_config { - ProcessingStepResult::Failed(err) => &err.message, - _ => panic!("expected read_config step to fail"), - }; - assert!(msg.contains("field 'nonexistent' is not in mdvs.toml")); + assert!(crate::step::has_failed(&step)); } #[tokio::test] @@ -697,21 +463,8 @@ mod tests { create_test_vault(tmp.path()); init_no_build(tmp.path()); - let output = run( - tmp.path(), - &["tags".to_string()], - true, // reinfer_all - false, - false, - ) - .await; - - assert!(output.has_failed_step()); - let msg = match &output.process.read_config { - ProcessingStepResult::Failed(err) => &err.message, - _ => panic!("expected read_config step to fail"), - }; - assert!(msg.contains("cannot use --reinfer and --reinfer-all together")); + let step = run(tmp.path(), 
&["tags".to_string()], true, false, false).await; + assert!(crate::step::has_failed(&step)); } #[tokio::test] @@ -722,23 +475,17 @@ mod tests { let toml_before = MdvsToml::read(&tmp.path().join("mdvs.toml")).unwrap(); - let output = run(tmp.path(), &[], true, false, false).await; - assert!(!output.has_failed_step()); - let result = output.result.unwrap(); + let step = run(tmp.path(), &[], true, false, false).await; + assert!(!crate::step::has_failed(&step)); + let result = unwrap_update(&step); - // All fields are re-inferred with same types → unchanged assert_eq!(result.unchanged, 3); assert!(result.added.is_empty()); assert!(result.changed.is_empty()); assert!(result.removed.is_empty()); - // Non-field config is preserved let toml_after = MdvsToml::read(&tmp.path().join("mdvs.toml")).unwrap(); assert_eq!(toml_before.scan, toml_after.scan); - assert_eq!(toml_before.update, toml_after.update); - assert_eq!(toml_before.embedding_model, toml_after.embedding_model); - assert_eq!(toml_before.chunking, toml_after.chunking); - assert_eq!(toml_before.search, toml_after.search); } #[tokio::test] @@ -747,7 +494,6 @@ mod tests { create_test_vault(tmp.path()); init_no_build(tmp.path()); - // Add a new file fs::write( tmp.path().join("blog/post3.md"), "---\ntitle: New\nauthor: Alice\n---\n# New\nContent.", @@ -756,20 +502,13 @@ mod tests { let toml_before = fs::read_to_string(tmp.path().join("mdvs.toml")).unwrap(); - let output = run(tmp.path(), &[], false, true, false).await; - assert!(!output.has_failed_step()); - let result = output.result.unwrap(); + let step = run(tmp.path(), &[], false, true, false).await; + assert!(!crate::step::has_failed(&step)); + let result = unwrap_update(&step); assert!(result.dry_run); assert_eq!(result.added.len(), 1); - // write_config should be Skipped - assert!(matches!( - output.process.write_config, - ProcessingStepResult::Skipped - )); - - // Toml unchanged let toml_after = fs::read_to_string(tmp.path().join("mdvs.toml")).unwrap(); 
assert_eq!(toml_before, toml_after); } @@ -778,29 +517,16 @@ mod tests { async fn build_override_false() { let tmp = tempfile::tempdir().unwrap(); create_test_vault(tmp.path()); + init_no_build(tmp.path()); - // Init (schema only) - let output = crate::cmd::init::run( - tmp.path(), - "**", - false, - false, - true, - false, // skip_gitignore - false, // verbose - ); - assert!(!output.has_failed_step()); - - // Add a new field fs::write( tmp.path().join("blog/post3.md"), "---\ntitle: New\nauthor: Alice\n---\n# New\nContent.", ) .unwrap(); - // Update is pure inference — never builds - let output = run(tmp.path(), &[], false, false, false).await; - assert!(!output.has_failed_step()); + let step = run(tmp.path(), &[], false, false, false).await; + assert!(!crate::step::has_failed(&step)); assert!(!tmp.path().join(".mdvs").exists()); } @@ -810,7 +536,6 @@ mod tests { let blog_dir = tmp.path().join("blog"); fs::create_dir_all(&blog_dir).unwrap(); - // Two files with frontmatter + one bare file fs::write( blog_dir.join("post1.md"), "---\ntitle: Hello\n---\n# Hello\nBody.", @@ -823,17 +548,8 @@ mod tests { .unwrap(); fs::write(blog_dir.join("bare.md"), "# No frontmatter\nJust content.").unwrap(); - // Init with ignore_bare_files=true → title required=["**"] - let output = crate::cmd::init::run( - tmp.path(), - "**", - false, - false, - true, // ignore bare files - false, // skip_gitignore - false, // verbose - ); - assert!(!output.has_failed_step()); + let step = crate::cmd::init::run(tmp.path(), "**", false, false, true, false, false); + assert!(!crate::step::has_failed(&step)); let toml_before = MdvsToml::read(&tmp.path().join("mdvs.toml")).unwrap(); let title_before = toml_before @@ -844,20 +560,17 @@ mod tests { .unwrap(); assert_eq!(title_before.required, vec!["**"]); - // Flip include_bare_files to true let mut config = toml_before; config.scan.include_bare_files = true; config.write(&tmp.path().join("mdvs.toml")).unwrap(); - // Reinfer all — globs should change 
even though types don't - let output = run(tmp.path(), &[], true, false, false).await; - assert!(!output.has_failed_step()); - let result = output.result.unwrap(); - assert!(result.has_changes()); - assert_eq!(result.changed.len(), 1); - assert_eq!(result.changed[0].name, "title"); + let step = run(tmp.path(), &[], true, false, false).await; + assert!(!crate::step::has_failed(&step)); + let result = unwrap_update(&step); + assert!( + !result.added.is_empty() || !result.changed.is_empty() || !result.removed.is_empty() + ); - // Toml rewritten with narrower required let toml_after = MdvsToml::read(&tmp.path().join("mdvs.toml")).unwrap(); let title_after = toml_after .fields @@ -874,19 +587,16 @@ mod tests { create_test_vault(tmp.path()); init_no_build(tmp.path()); - // Remove tags from all files fs::write( tmp.path().join("blog/post1.md"), "---\ntitle: Hello\ndraft: false\n---\n# Hello\nBody text.", ) .unwrap(); - // Default mode: tags should stay in toml even though it disappeared - let output = run(tmp.path(), &[], false, false, false).await; - assert!(!output.has_failed_step()); - let result = output.result.unwrap(); - - assert!(!result.has_changes()); + let step = run(tmp.path(), &[], false, false, false).await; + assert!(!crate::step::has_failed(&step)); + let result = unwrap_update(&step); + assert!(result.added.is_empty() && result.changed.is_empty() && result.removed.is_empty()); let toml = MdvsToml::read(&tmp.path().join("mdvs.toml")).unwrap(); assert!(toml.fields.field.iter().any(|f| f.name == "tags")); diff --git a/src/main.rs b/src/main.rs index 1fedfa3..e5a6185 100644 --- a/src/main.rs +++ b/src/main.rs @@ -1,5 +1,5 @@ use clap::{Parser, Subcommand}; -use mdvs::output::CommandOutput; +use mdvs::block::Render; use std::path::PathBuf; /// Stderr logging level for `--logs`. 
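Every subcommand arm in the `main.rs` hunks that follow repeats the same four-way match over `(OutputFormat, verbose)`. If that duplication survives review (TODO-0132/0133 track the deferred macro automation), a small generic helper could collapse it. A sketch with hypothetical stand-in names: `Renderable` and `format_output` are not the crate's API, and the real `Render` trait produces a block tree that `mdvs::render::format_text` walks; this sketch flattens that into two methods so the dispatch shape is visible.

```rust
// Stand-ins for illustration only (not the crate's real types).
enum OutputFormat {
    Text,
    Json,
}

trait Renderable {
    fn render_text(&self) -> String;
    fn render_json(&self) -> String;
    /// Result-only view used by non-verbose output.
    fn to_compact(&self) -> Self;
}

/// One helper replaces the four-arm match duplicated in every subcommand.
fn format_output<T: Renderable>(step: &T, format: &OutputFormat, verbose: bool) -> String {
    let compact;
    // Verbose mode renders the full tree; compact mode renders the reduced view.
    let view = if verbose {
        step
    } else {
        compact = step.to_compact();
        &compact
    };
    match format {
        OutputFormat::Text => view.render_text(),
        OutputFormat::Json => view.render_json(),
    }
}
```

Each arm would then shrink to one `print!("{}", format_output(&step, &cli.output, cli.verbose));` plus its exit-code checks.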
@@ -167,7 +167,7 @@ async fn main() -> anyhow::Result<()> { ignore_bare_files, skip_gitignore, } => { - let output = mdvs::cmd::init::run( + let step = mdvs::cmd::init::run( &path, &glob, force, @@ -176,8 +176,23 @@ async fn main() -> anyhow::Result<()> { skip_gitignore, cli.verbose, ); - output.print(&cli.output, cli.verbose); - if output.has_failed_step() { + let failed = mdvs::step::has_failed(&step); + let output_str = match (&cli.output, cli.verbose) { + (mdvs::output::OutputFormat::Text, true) => { + mdvs::render::format_text(&step.render()) + } + (mdvs::output::OutputFormat::Text, false) => { + mdvs::render::format_text(&step.to_compact().render()) + } + (mdvs::output::OutputFormat::Json, true) => { + serde_json::to_string_pretty(&step).unwrap() + } + (mdvs::output::OutputFormat::Json, false) => { + serde_json::to_string_pretty(&step.to_compact()).unwrap() + } + }; + print!("{output_str}"); + if failed { std::process::exit(2); } Ok(()) @@ -190,7 +205,7 @@ async fn main() -> anyhow::Result<()> { force, no_update, } => { - let output = mdvs::cmd::build::run( + let step = mdvs::cmd::build::run( &path, set_model.as_deref(), set_revision.as_deref(), @@ -200,11 +215,27 @@ async fn main() -> anyhow::Result<()> { cli.verbose, ) .await; - output.print(&cli.output, cli.verbose); - if output.has_failed_step() { + let failed = mdvs::step::has_failed(&step); + let violations = mdvs::step::has_violations(&step); + let output_str = match (&cli.output, cli.verbose) { + (mdvs::output::OutputFormat::Text, true) => { + mdvs::render::format_text(&step.render()) + } + (mdvs::output::OutputFormat::Text, false) => { + mdvs::render::format_text(&step.to_compact().render()) + } + (mdvs::output::OutputFormat::Json, true) => { + serde_json::to_string_pretty(&step).unwrap() + } + (mdvs::output::OutputFormat::Json, false) => { + serde_json::to_string_pretty(&step.to_compact()).unwrap() + } + }; + print!("{output_str}"); + if failed { std::process::exit(2); } - if output.has_violations() 
{ + if violations { std::process::exit(1); } Ok(()) @@ -217,7 +248,7 @@ async fn main() -> anyhow::Result<()> { no_update, no_build, } => { - let output = mdvs::cmd::search::run( + let step = mdvs::cmd::search::run( &path, &query, limit, @@ -227,19 +258,50 @@ async fn main() -> anyhow::Result<()> { cli.verbose, ) .await; - output.print(&cli.output, cli.verbose); - if output.has_failed_step() { + let failed = mdvs::step::has_failed(&step); + let output_str = match (&cli.output, cli.verbose) { + (mdvs::output::OutputFormat::Text, true) => { + mdvs::render::format_text(&step.render()) + } + (mdvs::output::OutputFormat::Text, false) => { + mdvs::render::format_text(&step.to_compact().render()) + } + (mdvs::output::OutputFormat::Json, true) => { + serde_json::to_string_pretty(&step).unwrap() + } + (mdvs::output::OutputFormat::Json, false) => { + serde_json::to_string_pretty(&step.to_compact()).unwrap() + } + }; + print!("{output_str}"); + if failed { std::process::exit(2); } Ok(()) } Command::Check { path, no_update } => { - let output = mdvs::cmd::check::run(&path, no_update, cli.verbose).await; - output.print(&cli.output, cli.verbose); - if output.has_failed_step() { + let step = mdvs::cmd::check::run(&path, no_update, cli.verbose).await; + let failed = mdvs::step::has_failed(&step); + let violations = mdvs::step::has_violations(&step); + let output_str = match (&cli.output, cli.verbose) { + (mdvs::output::OutputFormat::Text, true) => { + mdvs::render::format_text(&step.render()) + } + (mdvs::output::OutputFormat::Text, false) => { + mdvs::render::format_text(&step.to_compact().render()) + } + (mdvs::output::OutputFormat::Json, true) => { + serde_json::to_string_pretty(&step).unwrap() + } + (mdvs::output::OutputFormat::Json, false) => { + serde_json::to_string_pretty(&step.to_compact()).unwrap() + } + }; + print!("{output_str}"); + if failed { std::process::exit(2); } - if output.result.as_ref().is_some_and(|r| r.has_violations()) { + if violations { 
std::process::exit(1); } Ok(()) @@ -250,26 +312,71 @@ async fn main() -> anyhow::Result<()> { reinfer_all, dry_run, } => { - let output = + let step = mdvs::cmd::update::run(&path, &reinfer, reinfer_all, dry_run, cli.verbose).await; - output.print(&cli.output, cli.verbose); - if output.has_failed_step() { + let failed = mdvs::step::has_failed(&step); + let output_str = match (&cli.output, cli.verbose) { + (mdvs::output::OutputFormat::Text, true) => { + mdvs::render::format_text(&step.render()) + } + (mdvs::output::OutputFormat::Text, false) => { + mdvs::render::format_text(&step.to_compact().render()) + } + (mdvs::output::OutputFormat::Json, true) => { + serde_json::to_string_pretty(&step).unwrap() + } + (mdvs::output::OutputFormat::Json, false) => { + serde_json::to_string_pretty(&step.to_compact()).unwrap() + } + }; + print!("{output_str}"); + if failed { std::process::exit(2); } Ok(()) } Command::Clean { path } => { - let output = mdvs::cmd::clean::run(&path); - output.print(&cli.output, cli.verbose); - if output.has_failed_step() { + let step = mdvs::cmd::clean::run(&path); + let failed = mdvs::step::has_failed(&step); + let output_str = match (&cli.output, cli.verbose) { + (mdvs::output::OutputFormat::Text, true) => { + mdvs::render::format_text(&step.render()) + } + (mdvs::output::OutputFormat::Text, false) => { + mdvs::render::format_text(&step.to_compact().render()) + } + (mdvs::output::OutputFormat::Json, true) => { + serde_json::to_string_pretty(&step).unwrap() + } + (mdvs::output::OutputFormat::Json, false) => { + serde_json::to_string_pretty(&step.to_compact()).unwrap() + } + }; + print!("{output_str}"); + if failed { std::process::exit(2); } Ok(()) } Command::Info { path } => { - let output = mdvs::cmd::info::run(&path, cli.verbose); - output.print(&cli.output, cli.verbose); - if output.has_failed_step() { + let step = mdvs::cmd::info::run(&path, cli.verbose); + let failed = mdvs::step::has_failed(&step); + let output_str = match (&cli.output, 
cli.verbose) { + (mdvs::output::OutputFormat::Text, true) => { + mdvs::render::format_text(&step.render()) + } + (mdvs::output::OutputFormat::Text, false) => { + mdvs::render::format_text(&step.to_compact().render()) + } + (mdvs::output::OutputFormat::Json, true) => { + serde_json::to_string_pretty(&step).unwrap() + } + (mdvs::output::OutputFormat::Json, false) => { + serde_json::to_string_pretty(&step.to_compact()).unwrap() + } + }; + print!("{output_str}"); + if failed { std::process::exit(2); } Ok(()) diff --git a/src/output.rs b/src/output.rs index 363e38f..95f3e85 100644 --- a/src/output.rs +++ b/src/output.rs @@ -83,6 +83,33 @@ pub struct DiscoveredField { pub hints: Vec<String>, } +/// Compact version of [`DiscoveredField`] — summary only, no globs or hints. +#[derive(Debug, Serialize)] +pub struct DiscoveredFieldCompact { + /// Field name. + pub name: String, + /// Inferred type. + pub field_type: String, + /// Number of files containing this field. + pub files_found: usize, + /// Total scanned files. + pub total_files: usize, + /// Whether null values are accepted. + pub nullable: bool, +} + +impl From<&DiscoveredField> for DiscoveredFieldCompact { + fn from(f: &DiscoveredField) -> Self { + Self { + name: f.name.clone(), + field_type: f.field_type.clone(), + files_found: f.files_found, + total_files: f.total_files, + nullable: f.nullable, + } + } +} + /// A field whose definition changed between the previous and current scan. #[derive(Debug, Serialize)] pub struct ChangedField { @@ -168,6 +195,39 @@ pub struct RemovedField { pub allowed: Option<Vec<String>>, } +/// Compact version of [`ChangedField`] — aspect labels only, no old/new values. +#[derive(Debug, Serialize)] +pub struct ChangedFieldCompact { + /// Field name. + pub name: String, + /// Labels of aspects that changed (e.g. `["type", "allowed"]`).
+ pub aspects: Vec<String>, +} + +impl From<&ChangedField> for ChangedFieldCompact { + fn from(f: &ChangedField) -> Self { + Self { + name: f.name.clone(), + aspects: f.changes.iter().map(|c| c.label().to_string()).collect(), + } + } +} + +/// Compact version of [`RemovedField`] — name only, no globs. +#[derive(Debug, Serialize)] +pub struct RemovedFieldCompact { + /// Field name. + pub name: String, +} + +impl From<&RemovedField> for RemovedFieldCompact { + fn from(f: &RemovedField) -> Self { + Self { + name: f.name.clone(), + } + } +} + /// Category of a frontmatter validation failure. #[derive(Debug, Clone, PartialEq, Eq, Hash, Serialize)] pub enum ViolationKind { @@ -182,7 +242,7 @@ pub enum ViolationKind { } /// A single file that failed a particular field validation rule. -#[derive(Debug, Serialize)] +#[derive(Debug, Clone, Serialize)] pub struct ViolatingFile { /// Path to the offending markdown file. pub path: PathBuf, @@ -191,7 +251,7 @@ pub struct ViolatingFile { } /// Groups all files that violate a specific validation rule on a single field. -#[derive(Debug, Serialize)] +#[derive(Debug, Clone, Serialize)] pub struct FieldViolation { /// Name of the frontmatter field. pub field: String, @@ -204,7 +264,7 @@ pub struct FieldViolation { } /// A frontmatter field found during check that is not yet tracked in `mdvs.toml`. -#[derive(Debug, Serialize)] +#[derive(Debug, Clone, Serialize)] pub struct NewField { /// Field name. pub name: String, @@ -215,6 +275,54 @@ pub struct NewField { pub files: Option<Vec<PathBuf>>, } +/// Compact version of [`FieldViolation`] — summary counts, no file paths. +#[derive(Debug, Serialize)] +pub struct FieldViolationCompact { + /// Name of the frontmatter field. + pub field: String, + /// What kind of violation occurred. + pub kind: ViolationKind, + /// Number of files that triggered this violation.
+ pub file_count: usize, +} + +impl From<&FieldViolation> for FieldViolationCompact { + fn from(v: &FieldViolation) -> Self { + Self { + field: v.field.clone(), + kind: v.kind.clone(), + file_count: v.files.len(), + } + } +} + +/// Compact version of [`NewField`] — name and count only, no file paths. +#[derive(Debug, Serialize)] +pub struct NewFieldCompact { + /// Field name. + pub name: String, + /// Number of files containing this field. + pub files_found: usize, +} + +impl From<&NewField> for NewFieldCompact { + fn from(nf: &NewField) -> Self { + Self { + name: nf.name.clone(), + files_found: nf.files_found, + } + } +} + +/// Per-file chunk count for build output. +#[derive(Debug, Serialize)] +pub struct BuildFileDetail { + /// Relative path of the file. + pub filename: String, + /// Number of chunks produced for this file. + pub chunks: usize, +} + /// Format a file count with correct pluralization: `"1 file"` / `"3 files"`. pub fn format_file_count(n: usize) -> String { if n == 1 { @@ -240,56 +348,6 @@ pub fn format_size(bytes: u64) -> String { } } -/// Shared interface for command result structs, providing text and JSON rendering. -/// -/// Every command collects its results into a struct that implements this trait. -/// JSON output is derived automatically via `Serialize`; commands only need to -/// implement `format_text`. -pub trait CommandOutput: Serialize { - /// Render this result as human-readable text (tables, summaries). - /// When `verbose` is true, output includes expanded details and a metadata footer. - fn format_text(&self, verbose: bool) -> String; - - /// Render this result as JSON. Default serializes the full struct. - /// Command output wrappers override this to omit `process` in compact mode. - /// Infallible: all CommandOutput types derive Serialize with simple field types - /// (strings, numbers, vecs, options). serde_json only fails on non-string map keys - /// or infinite recursion, neither of which applies here. 
- fn format_json(&self, verbose: bool) -> String { - let _ = verbose; - serde_json::to_string_pretty(self).expect("CommandOutput types must be JSON-serializable") - } - - /// Print to stdout in the requested format. - /// Default implementation handles dispatch — commands don't need to override this. - fn print(&self, format: &OutputFormat, verbose: bool) { - match format { - OutputFormat::Text => print!("{}", self.format_text(verbose)), - OutputFormat::Json => print!("{}", self.format_json(verbose)), - } - } -} - -/// Serialize command output as JSON, omitting process steps in compact mode. -/// -/// In compact mode (not verbose), only the result is emitted: `{"result": ...}`. -/// In verbose mode or when the result is absent (error), the full struct is serialized. -/// -/// Infallible: same reasoning as `format_json` — all types are simple Serialize derivations. -pub fn format_json_compact<T: Serialize, R: Serialize>( - full: &T, - result: Option<&R>, - verbose: bool, -) -> String { - if result.is_some() && !verbose { - serde_json::to_string_pretty(&serde_json::json!({ "result": result })) - .expect("CommandOutput types must be JSON-serializable") - } else { - serde_json::to_string_pretty(full).expect("CommandOutput types must be JSON-serializable") - } -} - #[cfg(test)] mod tests { use super::*; From 6d64c189c53069620812fa85adde04368048178d Mon Sep 17 00:00:00 2001 From: edoch Date: Thu, 19 Mar 2026 23:30:06 +0100 Subject: [PATCH 05/35] =?UTF-8?q?refactor:=20begin=20pipeline=20cleanup=20?= =?UTF-8?q?=E2=80=94=20delete=20delete=5Findex,=20move=20BuildFileDetail?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Delete src/pipeline/delete_index.rs (clean now calls core functions directly) - Move BuildFileDetail from pipeline/write_index.rs to output.rs - Update imports in pipeline/classify.rs, pipeline/embed.rs, outcome/commands/build.rs First pipeline module deleted.
Part of TODO-0131 wave 1-2. --- src/pipeline/classify.rs | 2 +- src/pipeline/delete_index.rs | 118 ----------------------------------- src/pipeline/embed.rs | 2 +- src/pipeline/mod.rs | 1 - src/pipeline/write_index.rs | 10 +-- 5 files changed, 4 insertions(+), 129 deletions(-) delete mode 100644 src/pipeline/delete_index.rs diff --git a/src/pipeline/classify.rs b/src/pipeline/classify.rs index 06658d3..052bf18 100644 --- a/src/pipeline/classify.rs +++ b/src/pipeline/classify.rs @@ -9,7 +9,7 @@ use crate::index::storage::{content_hash, ChunkRow, FileIndexEntry}; use crate::output::format_file_count; use crate::pipeline::{ProcessingStep, ProcessingStepResult, StepOutput}; -use super::write_index::BuildFileDetail; +use crate::output::BuildFileDetail; /// Output record for the classify step. #[derive(Debug, Serialize)] diff --git a/src/pipeline/delete_index.rs b/src/pipeline/delete_index.rs deleted file mode 100644 index 316b1bc..0000000 --- a/src/pipeline/delete_index.rs +++ /dev/null @@ -1,118 +0,0 @@ -//! Delete index step — removes the `.mdvs/` directory. - -use serde::Serialize; -use std::path::Path; -use std::time::Instant; - -use crate::index::backend::Backend; -use crate::output::{format_file_count, format_size}; -use crate::pipeline::{ - ErrorKind, ProcessingStep, ProcessingStepError, ProcessingStepResult, StepOutput, -}; - -/// Output record for the delete index step. -#[derive(Debug, Serialize)] -pub struct DeleteIndexOutput { - /// Whether the `.mdvs/` directory existed and was removed. - pub removed: bool, - /// Path to the `.mdvs/` directory. - pub path: String, - /// Number of files removed. - pub files_removed: usize, - /// Total bytes freed. 
- pub size_bytes: u64, -} - -impl StepOutput for DeleteIndexOutput { - fn format_line(&self) -> String { - if self.removed { - format!( - "\"{}\" ({}, {})", - self.path, - format_file_count(self.files_removed), - format_size(self.size_bytes) - ) - } else { - format!("\"{}\" does not exist", self.path) - } - } -} - -/// Count files and sum their sizes in a directory (recursively). -fn walk_dir_stats(dir: &Path) -> anyhow::Result<(usize, u64)> { - let mut count = 0usize; - let mut size = 0u64; - for entry in std::fs::read_dir(dir)? { - let entry = entry?; - let meta = entry.metadata()?; - if meta.is_dir() { - let (c, s) = walk_dir_stats(&entry.path())?; - count += c; - size += s; - } else { - count += 1; - size += meta.len(); - } - } - Ok((count, size)) -} - -/// Delete the `.mdvs/` index directory if it exists. -/// -/// Returns just `ProcessingStepResult` — there is no data -/// to pass forward to subsequent steps. -pub fn run_delete_index(path: &Path) -> ProcessingStepResult { - let start = Instant::now(); - let mdvs_dir = path.join(".mdvs"); - - // Symlink check → User error - if mdvs_dir.is_symlink() { - return ProcessingStepResult::Failed(ProcessingStepError { - kind: ErrorKind::User, - message: format!( - "'{}' is a symlink — refusing to delete for safety", - mdvs_dir.display() - ), - }); - } - - if mdvs_dir.exists() { - let (files_removed, size_bytes) = match walk_dir_stats(&mdvs_dir) { - Ok(stats) => stats, - Err(e) => { - return ProcessingStepResult::Failed(ProcessingStepError { - kind: ErrorKind::Application, - message: e.to_string(), - }); - } - }; - - let backend = Backend::parquet(path); - if let Err(e) = backend.clean() { - return ProcessingStepResult::Failed(ProcessingStepError { - kind: ErrorKind::Application, - message: e.to_string(), - }); - } - - ProcessingStepResult::Completed(ProcessingStep { - elapsed_ms: start.elapsed().as_millis() as u64, - output: DeleteIndexOutput { - removed: true, - path: mdvs_dir.display().to_string(), - files_removed, - 
size_bytes, - }, - }) - } else { - ProcessingStepResult::Completed(ProcessingStep { - elapsed_ms: start.elapsed().as_millis() as u64, - output: DeleteIndexOutput { - removed: false, - path: mdvs_dir.display().to_string(), - files_removed: 0, - size_bytes: 0, - }, - }) - } -} diff --git a/src/pipeline/embed.rs b/src/pipeline/embed.rs index 73d9dca..d38a7fe 100644 --- a/src/pipeline/embed.rs +++ b/src/pipeline/embed.rs @@ -8,8 +8,8 @@ use crate::index::chunk::{extract_plain_text, Chunks}; use crate::index::embed::Embedder; use crate::index::storage::ChunkRow; use crate::output::format_file_count; +use crate::output::BuildFileDetail; use crate::pipeline::classify::FileToEmbed; -use crate::pipeline::write_index::BuildFileDetail; use crate::pipeline::{ProcessingStep, ProcessingStepResult, StepOutput}; /// Output record for the embed query step. diff --git a/src/pipeline/mod.rs b/src/pipeline/mod.rs index ec12a4a..97e1ae0 100644 --- a/src/pipeline/mod.rs +++ b/src/pipeline/mod.rs @@ -5,7 +5,6 @@ //! compose. pub mod classify; -pub mod delete_index; pub mod embed; pub mod execute_search; pub mod infer; diff --git a/src/pipeline/write_index.rs b/src/pipeline/write_index.rs index eae7658..bb7785c 100644 --- a/src/pipeline/write_index.rs +++ b/src/pipeline/write_index.rs @@ -11,14 +11,8 @@ use crate::pipeline::{ ErrorKind, ProcessingStep, ProcessingStepError, ProcessingStepResult, StepOutput, }; -/// Per-file chunk count for verbose build output. -#[derive(Debug, Serialize)] -pub struct BuildFileDetail { - /// Relative path of the file. - pub filename: String, - /// Number of chunks produced for this file. - pub chunks: usize, -} +// BuildFileDetail moved to crate::output +pub use crate::output::BuildFileDetail; /// Output record for the write index step. 
#[derive(Debug, Serialize)] From c64118ab492c41fd6429e7936d5beeb242fb64a8 Mon Sep 17 00:00:00 2001 From: edoch Date: Thu, 19 Mar 2026 23:30:27 +0100 Subject: [PATCH 06/35] docs: update TODO status for Step tree implementation Mark TODOs 0120-0130 as done. Update TODO-0122 incremental checklists. Restructure TODO-0131 into 7 waves for full pipeline cleanup. --- docs/spec/todos/TODO-0120.md | 5 +- docs/spec/todos/TODO-0121.md | 5 +- docs/spec/todos/TODO-0122.md | 44 +++++------ docs/spec/todos/TODO-0123.md | 5 +- docs/spec/todos/TODO-0124.md | 4 +- docs/spec/todos/TODO-0125.md | 4 +- docs/spec/todos/TODO-0126.md | 5 +- docs/spec/todos/TODO-0127.md | 5 +- docs/spec/todos/TODO-0128.md | 5 +- docs/spec/todos/TODO-0129.md | 5 +- docs/spec/todos/TODO-0130.md | 104 ++++++++++++++----------- docs/spec/todos/TODO-0131.md | 143 ++++++++++++++++++++++++++--------- docs/spec/todos/index.md | 20 ++--- 13 files changed, 233 insertions(+), 121 deletions(-) diff --git a/docs/spec/todos/TODO-0120.md b/docs/spec/todos/TODO-0120.md index 4436e64..9188a4c 100644 --- a/docs/spec/todos/TODO-0120.md +++ b/docs/spec/todos/TODO-0120.md @@ -1,11 +1,14 @@ --- id: 120 title: "Step tree: core types — Step, StepOutcome, StepError" -status: todo +status: done priority: high created: 2026-03-19 +completed: 2026-03-19 depends_on: [119] blocks: [121, 122, 125, 126, 127, 128, 129, 130, 131] +files_created: [src/step.rs] +files_updated: [src/lib.rs] --- # TODO-0120: Step tree: core types — Step, StepOutcome, StepError diff --git a/docs/spec/todos/TODO-0121.md b/docs/spec/todos/TODO-0121.md index cf0b491..fc228e2 100644 --- a/docs/spec/todos/TODO-0121.md +++ b/docs/spec/todos/TODO-0121.md @@ -1,11 +1,14 @@ --- id: 121 title: "Step tree: Block enum and Render trait" -status: todo +status: done priority: high created: 2026-03-19 +completed: 2026-03-19 depends_on: [120] blocks: [123, 125, 126, 127, 128, 129, 130, 131] +files_created: [src/block.rs] +files_updated: [src/step.rs, src/lib.rs] --- # 
TODO-0121: Step tree: Block enum and Render trait diff --git a/docs/spec/todos/TODO-0122.md b/docs/spec/todos/TODO-0122.md index 3fa8f1e..ab20fc4 100644 --- a/docs/spec/todos/TODO-0122.md +++ b/docs/spec/todos/TODO-0122.md @@ -19,36 +19,36 @@ Create the `Outcome` and `CompactOutcome` enums with all named outcome structs, The enums start with a `_ => todo!()` catch-all arm in `to_compact()` and `contains_violations()`. Variants are added as commands are converted. Each checkbox corresponds to a command conversion TODO. ### Phase 1: minimal (for clean — TODO-0125) -- [ ] Create `src/outcome/mod.rs` with `Outcome` and `CompactOutcome` enums (initially: `DeleteIndex` + `Clean` variants only, `_ => todo!()` for the rest) -- [ ] `Outcome::to_compact()` and `Outcome::contains_violations()` with catch-all -- [ ] `DeleteIndexOutcome` / `DeleteIndexOutcomeCompact` + `From` + `Render` -- [ ] `CleanOutcome` / `CleanOutcomeCompact` + `From` + `Render` +- [x] Create `src/outcome/mod.rs` with `Outcome` and `CompactOutcome` enums (initially: `DeleteIndex` + `Clean` variants only) +- [x] `Outcome::to_compact()` and `Outcome::contains_violations()` +- [x] `DeleteIndexOutcome` / `DeleteIndexOutcomeCompact` + `From` + `Render` +- [x] `CleanOutcome` / `CleanOutcomeCompact` + `From` + `Render` ### Phase 2: info (TODO-0126) -- [ ] Add `ReadConfig`, `Scan`, `ReadIndex`, `Info` variants to both enums -- [ ] `ReadConfigOutcome` / `ReadConfigOutcomeCompact` + `From` + `Render` -- [ ] `ScanOutcome` / `ScanOutcomeCompact` + `From` + `Render` -- [ ] `ReadIndexOutcome` / `ReadIndexOutcomeCompact` + `From` + `Render` -- [ ] `InfoOutcome` / `InfoOutcomeCompact` + `From` + `Render` +- [x] Add `ReadConfig`, `Scan`, `ReadIndex`, `Info` variants to both enums +- [x] `ReadConfigOutcome` / `ReadConfigOutcomeCompact` + `From` + `Render` +- [x] `ScanOutcome` / `ScanOutcomeCompact` + `From` + `Render` +- [x] `ReadIndexOutcome` / `ReadIndexOutcomeCompact` + `From` + `Render` +- [x] `InfoOutcome` / 
`InfoOutcomeCompact` + `From` + `Render` ### Phase 3: check (TODO-0127) -- [ ] Add `Validate`, `Check` variants to both enums -- [ ] `ValidateOutcome` / `ValidateOutcomeCompact` + `From` + `Render` -- [ ] `CheckOutcome` / `CheckOutcomeCompact` + `From` + `Render` -- [ ] `FieldViolationCompact`, `NewFieldCompact` sub-structure compacts + `From` -- [ ] Update `Outcome::contains_violations()` for Validate and Check variants +- [x] Add `Validate`, `Check` variants to both enums +- [x] `ValidateOutcome` / `ValidateOutcomeCompact` + `From` + `Render` +- [x] `CheckOutcome` / `CheckOutcomeCompact` + `From` + `Render` +- [x] `FieldViolationCompact`, `NewFieldCompact` sub-structure compacts + `From` +- [x] Update `Outcome::contains_violations()` for Validate and Check variants ### Phase 4: init (TODO-0128) -- [ ] Add `Infer`, `WriteConfig`, `Init` variants to both enums -- [ ] `InferOutcome` / `InferOutcomeCompact` + `From` + `Render` -- [ ] `WriteConfigOutcome` / `WriteConfigOutcomeCompact` + `From` + `Render` -- [ ] `InitOutcome` / `InitOutcomeCompact` + `From` + `Render` -- [ ] `DiscoveredFieldCompact` sub-structure compact + `From` +- [x] Add `Infer`, `WriteConfig`, `Init` variants to both enums +- [x] `InferOutcome` / `InferOutcomeCompact` + `From` + `Render` +- [x] `WriteConfigOutcome` / `WriteConfigOutcomeCompact` + `From` + `Render` +- [x] `InitOutcome` / `InitOutcomeCompact` + `From` + `Render` +- [x] `DiscoveredFieldCompact` sub-structure compact + `From` ### Phase 5: update (TODO-0129) -- [ ] Add `Update` variant to both enums -- [ ] `UpdateOutcome` / `UpdateOutcomeCompact` + `From` + `Render` -- [ ] `ChangedFieldCompact`, `RemovedFieldCompact` sub-structure compacts + `From` +- [x] Add `Update` variant to both enums +- [x] `UpdateOutcome` / `UpdateOutcomeCompact` + `From` + `Render` +- [x] `ChangedFieldCompact`, `RemovedFieldCompact` sub-structure compacts + `From` ### Phase 6: build + search (TODO-0130) - [ ] Add `MutateConfig`, `ReadIndexMetadata`, 
`CheckConfigChanged`, `Classify`, `LoadModel`, `EmbedFiles`, `WriteIndex`, `Build` variants to both enums diff --git a/docs/spec/todos/TODO-0123.md b/docs/spec/todos/TODO-0123.md index 453d4e0..4e93c92 100644 --- a/docs/spec/todos/TODO-0123.md +++ b/docs/spec/todos/TODO-0123.md @@ -1,11 +1,14 @@ --- id: 123 title: "Step tree: shared formatters (format_text, format_markdown)" -status: todo +status: done priority: high created: 2026-03-19 +completed: 2026-03-19 depends_on: [121] blocks: [125, 126, 127, 128, 129, 130, 131] +files_created: [src/render.rs] +files_updated: [src/lib.rs] --- # TODO-0123: Step tree: shared formatters (format_text, format_markdown) diff --git a/docs/spec/todos/TODO-0124.md b/docs/spec/todos/TODO-0124.md index a473177..9b57f74 100644 --- a/docs/spec/todos/TODO-0124.md +++ b/docs/spec/todos/TODO-0124.md @@ -1,11 +1,13 @@ --- id: 124 title: "Step tree: custom Serialize on Step and StepOutcome" -status: todo +status: done priority: high created: 2026-03-19 +completed: 2026-03-19 depends_on: [120, 122] blocks: [125, 126, 127, 128, 129, 130, 131] +files_updated: [src/step.rs] --- # TODO-0124: Step tree: custom Serialize on Step and StepOutcome diff --git a/docs/spec/todos/TODO-0125.md b/docs/spec/todos/TODO-0125.md index 1c5ac0a..2ee796c 100644 --- a/docs/spec/todos/TODO-0125.md +++ b/docs/spec/todos/TODO-0125.md @@ -1,11 +1,13 @@ --- id: 125 title: "Step tree: convert clean command" -status: todo +status: done priority: high created: 2026-03-19 +completed: 2026-03-19 depends_on: [120, 121, 122, 123, 124] blocks: [129] +files_updated: [src/cmd/clean.rs, src/main.rs, src/step.rs] --- # TODO-0125: Step tree: convert clean command diff --git a/docs/spec/todos/TODO-0126.md b/docs/spec/todos/TODO-0126.md index 145b345..e21e913 100644 --- a/docs/spec/todos/TODO-0126.md +++ b/docs/spec/todos/TODO-0126.md @@ -1,11 +1,14 @@ --- id: 126 title: "Step tree: convert info command" -status: todo +status: done priority: high created: 2026-03-19 +completed: 
2026-03-19 depends_on: [120, 121, 122, 123, 124] blocks: [129] +files_created: [src/outcome/scan.rs, src/outcome/config.rs] +files_updated: [src/outcome/index.rs, src/outcome/commands.rs, src/outcome/mod.rs, src/cmd/info.rs, src/main.rs] --- # TODO-0126: Step tree: convert info command diff --git a/docs/spec/todos/TODO-0127.md b/docs/spec/todos/TODO-0127.md index 9ed0b68..4e0be91 100644 --- a/docs/spec/todos/TODO-0127.md +++ b/docs/spec/todos/TODO-0127.md @@ -1,11 +1,14 @@ --- id: 127 title: "Step tree: convert check command" -status: todo +status: done priority: high created: 2026-03-19 +completed: 2026-03-19 depends_on: [120, 121, 122, 123, 124] blocks: [129] +files_created: [src/outcome/validate.rs] +files_updated: [src/outcome/commands.rs, src/outcome/mod.rs, src/output.rs, src/cmd/check.rs, src/step.rs, src/main.rs] --- # TODO-0127: Step tree: convert check command diff --git a/docs/spec/todos/TODO-0128.md b/docs/spec/todos/TODO-0128.md index 859f63d..34c23e6 100644 --- a/docs/spec/todos/TODO-0128.md +++ b/docs/spec/todos/TODO-0128.md @@ -1,11 +1,14 @@ --- id: 128 title: "Step tree: convert init command" -status: todo +status: done priority: high created: 2026-03-19 +completed: 2026-03-19 depends_on: [120, 121, 122, 123, 124] blocks: [129] +files_created: [src/outcome/infer.rs, src/outcome/commands/init.rs] +files_updated: [src/outcome/config.rs, src/outcome/commands/mod.rs, src/outcome/mod.rs, src/output.rs, src/cmd/init.rs, src/cmd/info.rs, src/step.rs, src/main.rs] --- # TODO-0128: Step tree: convert init command diff --git a/docs/spec/todos/TODO-0129.md b/docs/spec/todos/TODO-0129.md index 9a0a422..d46f826 100644 --- a/docs/spec/todos/TODO-0129.md +++ b/docs/spec/todos/TODO-0129.md @@ -1,11 +1,14 @@ --- id: 129 title: "Step tree: convert update command" -status: todo +status: done priority: high created: 2026-03-19 +completed: 2026-03-19 depends_on: [125, 126, 127, 128] blocks: [130] +files_created: [src/outcome/commands/update.rs] +files_updated: 
[src/outcome/commands/mod.rs, src/outcome/mod.rs, src/output.rs, src/cmd/update.rs, src/cmd/build.rs, src/main.rs] --- # TODO-0129: Step tree: convert update command diff --git a/docs/spec/todos/TODO-0130.md b/docs/spec/todos/TODO-0130.md index f2d83c6..ef17207 100644 --- a/docs/spec/todos/TODO-0130.md +++ b/docs/spec/todos/TODO-0130.md @@ -12,61 +12,77 @@ blocks: [131] ## Summary -Convert `build` and `search` commands together to the new Step tree architecture. These are tightly coupled — search nests build (auto-build), build nests update (auto-update). Most complex piece of the migration. - -## Checklist - -### Outcome structs — build (part of TODO-0122 phase 6) -- [ ] Add `MutateConfig`, `ReadIndexMetadata`, `CheckConfigChanged`, `Classify`, `LoadModel`, `EmbedFiles`, `WriteIndex`, `Build` variants to both enums -- [ ] Create `MutateConfigOutcome` / `MutateConfigOutcomeCompact` + `From` + `Render` -- [ ] Create `ReadIndexMetadataOutcome` / `ReadIndexMetadataOutcomeCompact` + `From` + `Render` -- [ ] Create `CheckConfigChangedOutcome` / `CheckConfigChangedOutcomeCompact` + `From` + `Render` -- [ ] Create `ClassifyOutcome` / `ClassifyOutcomeCompact` + `From` + `Render` -- [ ] Create `LoadModelOutcome` / `LoadModelOutcomeCompact` + `From` + `Render` -- [ ] Create `EmbedFilesOutcome` / `EmbedFilesOutcomeCompact` + `From` + `Render` -- [ ] Create `WriteIndexOutcome` / `WriteIndexOutcomeCompact` + `From` + `Render` -- [ ] Create `BuildOutcome` / `BuildOutcomeCompact` + `From` + `Render` -- [ ] Create `BuildFileDetailCompact` + `From<&BuildFileDetail>` - -### Outcome structs — search (part of TODO-0122 phase 6) -- [ ] Add `EmbedQuery`, `ExecuteSearch`, `ReadChunkText`, `Search` variants to both enums -- [ ] Create `EmbedQueryOutcome` / `EmbedQueryOutcomeCompact` + `From` + `Render` -- [ ] Create `ExecuteSearchOutcome` / `ExecuteSearchOutcomeCompact` + `From` + `Render` -- [ ] Create `ReadChunkTextOutcome` / `ReadChunkTextOutcomeCompact` + `From` + `Render` -- [ ] 
Create `SearchOutcome` / `SearchOutcomeCompact` + `From` + `Render` -- [ ] Create `SearchHitCompact` + `From<&SearchHit>` -- [ ] Remove `_ => todo!()` catch-all arms — all variants now covered +Convert `build` and `search` commands to the new Step tree architecture. Split into two waves: build first (nests update, already converted), then search (nests build). + +## Wave 1: Build command + +### Outcome structs (part of TODO-0122 phase 6) +- [x] Add `Classify`, `LoadModel`, `EmbedFiles`, `WriteIndex`, `Build` variants to both enums +- [x] Create `ClassifyOutcome` / `ClassifyOutcomeCompact` + `From` + `Render` +- [x] Create `LoadModelOutcome` / `LoadModelOutcomeCompact` + `From` + `Render` +- [x] Create `EmbedFilesOutcome` / `EmbedFilesOutcomeCompact` + `From` + `Render` +- [x] Create `WriteIndexOutcome` / `WriteIndexOutcomeCompact` + `From` + `Render` +- [x] Create `BuildOutcome` / `BuildOutcomeCompact` + `From` + `Render` +- Skipped: MutateConfig, ReadIndexMetadata, CheckConfigChanged, BuildFileDetailCompact — pre-checks stay inline ### Build command rewrite -- [ ] Rewrite `build::run()` → returns `Step` -- [ ] New leaf steps: MutateConfig, ReadIndexMetadata, CheckConfigChanged -- [ ] Auto-update nesting: `update::run()` returns `Step`, included as substep -- [ ] Violations: Validate succeeds with Ok data, Build aborts with Err -- [ ] Delete `BuildProcessOutput`, `BuildCommandOutput`, `BuildResult` structs +- [x] Rewrite `build::run()` → returns `Step` +- [x] Pre-checks stay inline (MutateConfig, config change, dimension mismatch land on next step) +- [x] Auto-update nesting: `update::run()` returns `Step`, included as substep +- [x] Violations: Validate substep carries violation data, Build aborts with Err +- [x] Delete `BuildProcessOutput`, `BuildCommandOutput`, `BuildResult` structs + +### main.rs +- [x] Update build match arm: `Step` dispatch + +### Tests +- [x] Rewrite build tests with helpers (end-to-end, violations, incremental, model mismatch) + +### 
Verification +- [x] `cargo test` — all 313 tests pass +- [x] `cargo clippy` + `cargo fmt` +- [x] Manual: test against `example_kb/` with all output formats + +## Wave 2: Search command + +### Outcome structs (part of TODO-0122 phase 6) +- [x] Add `EmbedQuery`, `ExecuteSearch`, `Search` variants to both enums +- [x] Create `EmbedQueryOutcome` / `EmbedQueryOutcomeCompact` + `From` + `Render` +- [x] Create `ExecuteSearchOutcome` / `ExecuteSearchOutcomeCompact` + `From` + `Render` +- [x] Create `SearchOutcome` / `SearchOutcomeCompact` + `From` + `Render` +- [x] Create `SearchHitCompact` + `From<&SearchHit>` +- Skipped: ReadChunkText — chunk text populated inline, not a separate step ### Search command rewrite -- [ ] Rewrite `search::run()` → returns `Step` -- [ ] Auto-build nesting: `build::run()` returns `Step`, included as substep -- [ ] ReadChunkText step: Skipped if !verbose -- [ ] Delete `SearchProcessOutput`, `SearchCommandOutput`, `SearchResult` structs +- [x] Rewrite `search::run()` → returns `Step` +- [x] Auto-build nesting: `build::run()` returns `Step`, included as substep +- [x] Chunk text always populated, compact drops it +- [x] Delete `SearchProcessOutput`, `SearchCommandOutput`, `SearchResult` structs ### main.rs -- [ ] Update build match arm: `Step` dispatch -- [ ] Update search match arm: `Step` dispatch +- [x] Update search match arm: `Step` dispatch +- [x] Removed unused `CommandOutput` import (all commands now use Step) ### Tests -- [ ] Rewrite build tests with helpers (end-to-end, violations, incremental, model mismatch) -- [ ] Rewrite search tests with helpers (end-to-end, missing config, where clause, model mismatch) +- [x] Rewrite search tests with helpers -### Verification -- [ ] `cargo test` — ALL 245+ tests pass -- [ ] `cargo clippy` + `cargo fmt` -- [ ] Manual: test against `example_kb/` with all output formats -- [ ] Manual: test search with auto-build (nested Step tree renders correctly) +### Final verification +- [x] `cargo test` — all 
313 tests pass +- [x] `cargo clippy` — clean +- [x] `cargo fmt` — formatted ## Files +### Wave 1 - `src/cmd/build.rs` (rewrite run(), delete old structs, rewrite tests) +- `src/outcome/commands/build.rs` (new) +- `src/outcome/` leaf files (new outcomes for MutateConfig, ReadIndexMetadata, CheckConfigChanged, Classify, LoadModel, EmbedFiles, WriteIndex) +- `src/outcome/commands/mod.rs` + `src/outcome/mod.rs` (add variants) +- `src/main.rs` (update build match arm) + +### Wave 2 - `src/cmd/search.rs` (rewrite run(), delete old structs, rewrite tests) -- `src/outcome/` (add all remaining variants and structs) -- `src/main.rs` (update build + search match arms) +- `src/outcome/commands/search.rs` (new) +- `src/outcome/` leaf files (new outcomes for EmbedQuery, ExecuteSearch, ReadChunkText) +- `src/outcome/commands/mod.rs` + `src/outcome/mod.rs` (add variants, remove catch-alls) +- `src/main.rs` (update search match arm) diff --git a/docs/spec/todos/TODO-0131.md b/docs/spec/todos/TODO-0131.md index 8ac141f..82d8e1e 100644 --- a/docs/spec/todos/TODO-0131.md +++ b/docs/spec/todos/TODO-0131.md @@ -1,6 +1,6 @@ --- id: 131 -title: "Step tree: delete old pipeline, update main.rs, simplify output.rs" +title: "Step tree: delete old pipeline, full cleanup" status: todo priority: high created: 2026-03-19 @@ -8,59 +8,130 @@ depends_on: [130] blocks: [132, 133] --- -# TODO-0131: Step tree: delete old pipeline, update main.rs, simplify output.rs +# TODO-0131: Step tree: delete old pipeline, full cleanup ## Summary -Final cleanup after all commands are converted. Delete the old pipeline modules, simplify output.rs by removing the obsolete CommandOutput trait, and finalize main.rs dispatch. +Full cleanup: rewrite all 7 commands to call core functions directly (replacing `from_pipeline_result(run_*(...))` with `ScannedFiles::scan()` + timing + Step construction). Delete entire `src/pipeline/` directory. Remove all migration glue from `step.rs`. Clean up `output.rs`. 
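The replacement described in the summary can be sketched minimally. Everything below is a hypothetical stand-in, not the crate's real API: `Step`, `ErrorKind`, the `scan` core function, and the `Step::leaf` signature are simplified placeholders showing the shape of the change — the command times the core call itself and builds a leaf step inline, with no pipeline `run_*()` wrapper or `from_pipeline_result` conversion in between.

```rust
use std::time::Instant;

// Simplified stand-ins — the real crate's Step/Outcome/ErrorKind differ.
#[derive(Debug, PartialEq)]
enum ErrorKind {
    Application,
}

#[derive(Debug)]
struct Step {
    name: &'static str,
    elapsed_ms: u64,
    // Stand-in for Outcome / StepError: a file count or a kinded error.
    outcome: Result<usize, (ErrorKind, String)>,
}

impl Step {
    /// Hypothetical `Step::leaf()` helper (the Wave 1 constructor) that
    /// cuts the boilerplate of assembling a leaf step.
    fn leaf(
        name: &'static str,
        outcome: Result<usize, (ErrorKind, String)>,
        elapsed_ms: u64,
    ) -> Self {
        Step { name, elapsed_ms, outcome }
    }
}

/// Stand-in for a core function such as `ScannedFiles::scan()`.
fn scan(paths: &[&str]) -> Result<usize, String> {
    Ok(paths.len())
}

/// After the cleanup: the command calls the core function directly,
/// measures elapsed time, and constructs the leaf Step inline.
fn run_scan_step(paths: &[&str]) -> Step {
    let start = Instant::now();
    let outcome = scan(paths).map_err(|e| (ErrorKind::Application, e));
    Step::leaf("scan", outcome, start.elapsed().as_millis() as u64)
}

fn main() {
    let step = run_scan_step(&["a.md", "b.md"]);
    assert_eq!(step.name, "scan");
    assert_eq!(step.outcome, Ok(2));
}
```

The point of the shape is that timing and error mapping live in the command, next to the call they describe, so the per-step `ProcessingStepResult` wrapper layer has nothing left to do and can be deleted.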
-## Incremental checklist +## Strategy -main.rs is updated incrementally as each command is converted. By the time this TODO runs, main.rs already has a mix of old and new dispatch. This TODO finalizes it. +Each pipeline `run_*()` function wraps a core function with timing + `ProcessingStepResult`. Commands currently call `from_pipeline_result(run_*(), ...)` to convert. The cleanup eliminates both layers — commands call core functions directly and construct Steps inline. -### main.rs dispatch (tracked per command conversion) -- [ ] clean match arm uses Step (TODO-0125) -- [ ] info match arm uses Step (TODO-0126) -- [ ] check match arm uses Step (TODO-0127) -- [ ] init match arm uses Step (TODO-0128) -- [ ] update match arm uses Step (TODO-0129) -- [ ] build match arm uses Step (TODO-0130) -- [ ] search match arm uses Step (TODO-0130) +A `Step::leaf()` constructor helper reduces the boilerplate of building leaf steps. -### Pipeline module deletion (after all commands converted) -- [ ] Delete `src/pipeline/scan.rs` -- [ ] Delete `src/pipeline/infer.rs` -- [ ] Delete `src/pipeline/validate.rs` +## Wave 1: Infrastructure + dead code + +- [ ] Add `Step::leaf(outcome, elapsed_ms)` and `Step::failed(kind, message, elapsed_ms)` constructors to `step.rs` +- [ ] Move `BuildFileDetail` from `src/pipeline/write_index.rs` to `src/output.rs` +- [ ] Delete dead code from `src/output.rs`: `CommandOutput` trait, `format_json_compact()` +- [ ] Delete `StepOutput` trait from `src/pipeline/mod.rs` (if unused outside pipeline) + +### Verification +- [ ] `cargo build` + `cargo test` + `cargo clippy` + +## Wave 2: clean + info (simplest commands) + +### clean +- [ ] Rewrite to call `Backend::clean()` / `walk_dir_stats()` directly (from `pipeline/delete_index.rs`) +- [ ] Delete `src/pipeline/delete_index.rs` + +### info +- [ ] Rewrite to call `MdvsToml::read()` directly (from `pipeline/read_config.rs`) +- [ ] Call `ScannedFiles::scan()` directly (from `pipeline/scan.rs`) +- [ ] Call 
`backend.read_*()` directly (from `pipeline/read_index.rs`) +- [ ] Delete pipeline modules IF no other command still uses them (likely not yet — scan/read_config used by many) + +### Verification +- [ ] `cargo build` + `cargo test` + `cargo clippy` + +## Wave 3: init + update (scan/infer/write_config users) + +### init +- [ ] Call `ScannedFiles::scan()` directly +- [ ] Call `DirectoryTree::infer()` directly (from `pipeline/infer.rs`) +- [ ] Call `MdvsToml::write()` directly (from `pipeline/write_config.rs`) + +### update +- [ ] Same core functions as init + field comparison inline + +### Delete pipeline modules +- [ ] Delete `src/pipeline/infer.rs` (no more callers) +- [ ] Delete `src/pipeline/write_config.rs` (no more callers) + +### Verification +- [ ] `cargo build` + `cargo test` + `cargo clippy` + +## Wave 4: check (scan/validate user) + +### check +- [ ] Call `ScannedFiles::scan()` directly +- [ ] Call `check::validate()` directly (already a core function, not a pipeline wrapper — just remove `run_validate` indirection) + +### Delete pipeline modules +- [ ] Delete `src/pipeline/validate.rs` (no more callers) + +### Verification +- [ ] `cargo build` + `cargo test` + `cargo clippy` + +## Wave 5: build (most complex — scan/validate/classify/load_model/embed/write_index) + +### build +- [ ] Call `ScannedFiles::scan()` directly +- [ ] Call `check::validate()` directly +- [ ] Call `classify_files()` directly (from `pipeline/classify.rs` — move core logic) +- [ ] Call `Embedder::load()` directly (from `pipeline/load_model.rs`) +- [ ] Call `embed_files()` directly (from `pipeline/embed.rs` — move core logic) +- [ ] Call write_index logic directly (from `pipeline/write_index.rs` — move core logic) + +### Delete pipeline modules - [ ] Delete `src/pipeline/classify.rs` - [ ] Delete `src/pipeline/load_model.rs` - [ ] Delete `src/pipeline/embed.rs` -- [ ] Delete `src/pipeline/read_config.rs` -- [ ] Delete `src/pipeline/write_config.rs` -- [ ] Delete 
`src/pipeline/read_index.rs` - [ ] Delete `src/pipeline/write_index.rs` + +### Verification +- [ ] `cargo build` + `cargo test` + `cargo clippy` + +## Wave 6: search (read_index/load_model/embed_query/execute_search) + +### search +- [ ] Call `MdvsToml::read()` directly +- [ ] Call `backend.read_*()` directly +- [ ] Call `Embedder::load()` directly +- [ ] Call `embedder.embed()` directly (from `pipeline/embed.rs` run_embed_query) +- [ ] Call `SearchContext::execute()` directly (from `pipeline/execute_search.rs`) + +### Delete pipeline modules - [ ] Delete `src/pipeline/execute_search.rs` -- [ ] Delete `src/pipeline/delete_index.rs` -- [ ] Delete `src/pipeline/mod.rs` +- [ ] Delete `src/pipeline/read_index.rs` +- [ ] Delete `src/pipeline/read_config.rs` +- [ ] Delete `src/pipeline/scan.rs` -### output.rs simplification -- [ ] Remove `CommandOutput` trait -- [ ] Remove `OutputFormat` enum (moved to main.rs or render.rs) -- [ ] Remove `format_json_compact` helper -- [ ] Keep shared sub-types (`FieldViolation`, `SearchHit`, `NewField`, `DiscoveredField`, etc.) 
+### Verification +- [ ] `cargo build` + `cargo test` + `cargo clippy` -### lib.rs cleanup -- [ ] Remove `pub mod pipeline` -- [ ] Confirm `pub mod step`, `pub mod block`, `pub mod outcome`, `pub mod render` are present +## Wave 7: Final deletion + +- [ ] Delete `src/pipeline/mod.rs` +- [ ] Remove `pub mod pipeline` from `src/lib.rs` +- [ ] Delete migration helpers from `src/step.rs`: `from_pipeline_result`, `from_pipeline_result_with_data`, `convert_error_kind`, `has_failed_step()` compat method +- [ ] Remove `OutputFormat` from `output.rs` if moved elsewhere, or keep if still used +- [ ] `cargo fmt` ### Final verification - [ ] `cargo test` — all tests pass - [ ] `cargo clippy` — no warnings - [ ] `cargo fmt` — formatted -- [ ] Manual end-to-end: test all 7 commands against `example_kb/` with all output formats (text compact, text verbose, JSON compact, JSON verbose) +- [ ] Manual end-to-end: test all 7 commands against `example_kb/` with all output formats + +## Notes + +- Pipeline modules that contain meaningful core logic (classify's `classify_files()`, embed's batch logic, write_index's parquet assembly) need their logic moved to `src/index/` or kept as free functions in the command. Don't lose the logic — just unwrap it from the `ProcessingStepResult` wrapping. +- `read_config` and `scan` are the most widely shared — they'll be the last pipeline modules deleted (Wave 6). +- `check::validate()` is already a core function in `src/cmd/check.rs` — `pipeline/validate.rs` is just a thin wrapper. Easy delete. +- Some pipeline modules (`read_config.rs`, `scan.rs`) call core functions that are trivial (`MdvsToml::read()`, `ScannedFiles::scan()`). These are one-liners to inline. 
-## Files +## Files affected -- `src/pipeline/` (delete entire directory) -- `src/output.rs` (simplify) -- `src/main.rs` (finalize dispatch) -- `src/lib.rs` (update module declarations) +All `src/cmd/*.rs`, all `src/pipeline/*.rs` (deleted), `src/step.rs`, `src/output.rs`, `src/lib.rs` diff --git a/docs/spec/todos/index.md b/docs/spec/todos/index.md index 096c2b6..b52095d 100644 --- a/docs/spec/todos/index.md +++ b/docs/spec/todos/index.md @@ -121,17 +121,17 @@ | [0117](TODO-0117.md) | Fix null values skipping Disallowed and NullNotAllowed checks | done | high | 2026-03-17 | | [0118](TODO-0118.md) | Rework README and book intro to show directory-aware schema | todo | high | 2026-03-17 | | [0119](TODO-0119.md) | Unified Step tree architecture — replace pipeline/command output split | todo | high | 2026-03-18 | -| [0120](TODO-0120.md) | Step tree: core types — Step, StepOutcome, StepError | todo | high | 2026-03-19 | -| [0121](TODO-0121.md) | Step tree: Block enum and Render trait | todo | high | 2026-03-19 | +| [0120](TODO-0120.md) | Step tree: core types — Step, StepOutcome, StepError | done | high | 2026-03-19 | +| [0121](TODO-0121.md) | Step tree: Block enum and Render trait | done | high | 2026-03-19 | | [0122](TODO-0122.md) | Step tree: Outcome enums and all outcome structs | todo | high | 2026-03-19 | -| [0123](TODO-0123.md) | Step tree: shared formatters (format_text, format_markdown) | todo | high | 2026-03-19 | -| [0124](TODO-0124.md) | Step tree: custom Serialize on Step and StepOutcome | todo | high | 2026-03-19 | -| [0125](TODO-0125.md) | Step tree: convert clean command | todo | high | 2026-03-19 | -| [0126](TODO-0126.md) | Step tree: convert info command | todo | high | 2026-03-19 | -| [0127](TODO-0127.md) | Step tree: convert check command | todo | high | 2026-03-19 | -| [0128](TODO-0128.md) | Step tree: convert init command | todo | high | 2026-03-19 | -| [0129](TODO-0129.md) | Step tree: convert update command | todo | high | 2026-03-19 | -| 
[0130](TODO-0130.md) | Step tree: convert build + search commands | todo | high | 2026-03-19 | +| [0123](TODO-0123.md) | Step tree: shared formatters (format_text, format_markdown) | done | high | 2026-03-19 | +| [0124](TODO-0124.md) | Step tree: custom Serialize on Step and StepOutcome | done | high | 2026-03-19 | +| [0125](TODO-0125.md) | Step tree: convert clean command | done | high | 2026-03-19 | +| [0126](TODO-0126.md) | Step tree: convert info command | done | high | 2026-03-19 | +| [0127](TODO-0127.md) | Step tree: convert check command | done | high | 2026-03-19 | +| [0128](TODO-0128.md) | Step tree: convert init command | done | high | 2026-03-19 | +| [0129](TODO-0129.md) | Step tree: convert update command | done | high | 2026-03-19 | +| [0130](TODO-0130.md) | Step tree: convert build + search commands | done | high | 2026-03-19 | | [0131](TODO-0131.md) | Step tree: delete old pipeline, update main.rs, simplify output.rs | todo | high | 2026-03-19 | | [0132](TODO-0132.md) | Macro for compact struct generation (crabtime) | todo | low | 2026-03-19 | | [0133](TODO-0133.md) | Macro for step pipeline boilerplate (early-return pattern) | todo | low | 2026-03-19 | From 78c7a0f28114c991a003614bbc912d95bbfca71f Mon Sep 17 00:00:00 2001 From: edoch Date: Fri, 20 Mar 2026 19:00:26 +0100 Subject: [PATCH 07/35] refactor: inline all pipeline calls into commands, remove migration helpers All 7 commands now call core functions directly (MdvsToml::read, ScannedFiles::scan, check::validate, Backend methods, Embedder::load, etc.) instead of going through pipeline wrappers. 
- Waves 2-6 of TODO-0131: info, init, update, check, build, search all pipeline-free - classify_files() and embed_file() logic moved to build.rs - validate_where_clause() and IndexData moved to search.rs - Migration helpers removed from step.rs (from_pipeline_result, convert_error_kind, has_failed_step compat method) - Skipped padding removed from all error paths (TODO-0135) - Auto-update output attached as substep in check.rs Co-Authored-By: Claude --- src/cmd/build.rs | 862 ++++++++++++++++++++++++++++++++-------------- src/cmd/check.rs | 108 +++--- src/cmd/info.rs | 4 +- src/cmd/init.rs | 128 ++++--- src/cmd/search.rs | 369 ++++++++++++-------- src/cmd/update.rs | 196 +++++------ src/step.rs | 103 ------ 7 files changed, 1046 insertions(+), 724 deletions(-) diff --git a/src/cmd/build.rs b/src/cmd/build.rs index 705f90b..1e2339d 100644 --- a/src/cmd/build.rs +++ b/src/cmd/build.rs @@ -1,22 +1,19 @@ use crate::discover::field_type::FieldType; +use crate::discover::scan::{ScannedFile, ScannedFiles}; use crate::index::backend::Backend; -use crate::index::storage::{content_hash, BuildMetadata, FileRow}; +use crate::index::chunk::{extract_plain_text, Chunks}; +use crate::index::embed::{Embedder, ModelConfig}; +use crate::index::storage::{content_hash, BuildMetadata, ChunkRow, FileIndexEntry, FileRow}; use crate::outcome::commands::BuildOutcome; use crate::outcome::{ ClassifyOutcome, EmbedFilesOutcome, LoadModelOutcome, Outcome, ReadConfigOutcome, ScanOutcome, ValidateOutcome, WriteIndexOutcome, }; -use crate::pipeline::classify::run_classify; -use crate::pipeline::embed::run_embed_files; -use crate::pipeline::load_model::run_load_model; -use crate::pipeline::read_config::run_read_config; -use crate::pipeline::scan::run_scan; -use crate::pipeline::validate::run_validate; -use crate::pipeline::write_index::run_write_index; -use crate::pipeline::ProcessingStepResult; +use crate::output::BuildFileDetail; use crate::schema::config::{BuildConfig, MdvsToml, SearchConfig}; 
use crate::schema::shared::{ChunkingConfig, EmbeddingModelConfig}; -use crate::step::{from_pipeline_result, ErrorKind, Step, StepError, StepOutcome}; +use crate::step::{ErrorKind, Step, StepError, StepOutcome}; +use std::collections::{HashMap, HashSet}; use std::path::Path; use std::time::Instant; use tracing::instrument; @@ -24,6 +21,156 @@ use tracing::instrument; const DEFAULT_MODEL: &str = "minishlab/potion-base-8M"; const DEFAULT_CHUNK_SIZE: usize = 1024; +// ============================================================================ +// Classification types + logic (moved from pipeline/classify.rs) +// ============================================================================ + +/// A file that needs chunking and embedding. +struct FileToEmbed<'a> { + /// Unique file identifier (preserved for edited files, new UUID for new files). + file_id: String, + /// Reference to the scanned file data. + scanned: &'a ScannedFile, +} + +/// Data produced by classification, carried forward to embed and write_index steps. +struct ClassifyData<'a> { + /// Whether this is a full rebuild. + full_rebuild: bool, + /// Files that need chunking + embedding (new or edited). + needs_embedding: Vec<FileToEmbed<'a>>, + /// Maps filename → file_id for ALL current files (new, edited, unchanged). + file_id_map: HashMap<String, String>, + /// Chunks retained from unchanged files. + retained_chunks: Vec<ChunkRow>, + /// Number of files removed since previous build. + removed_count: usize, + /// Number of chunks dropped from removed files. + chunks_removed: usize, + /// Per-file chunk counts for removed files (for verbose output).
+ removed_details: Vec<BuildFileDetail>, +} + +struct FileClassification<'a> { + needs_embedding: Vec<FileToEmbed<'a>>, + file_id_map: HashMap<String, String>, + unchanged_file_ids: HashSet<String>, + removed_count: usize, + removed_file_ids: HashSet<String>, + removed_filenames: Vec<String>, +} + +fn classify_files<'a>( + scanned: &'a ScannedFiles, + existing_index: &[FileIndexEntry], +) -> FileClassification<'a> { + let existing: HashMap<&str, (&str, &str)> = existing_index + .iter() + .map(|e| { + ( + e.filename.as_str(), + (e.file_id.as_str(), e.content_hash.as_str()), + ) + }) + .collect(); + + let mut needs_embedding = Vec::new(); + let mut file_id_map = HashMap::new(); + let mut unchanged_file_ids = HashSet::new(); + let mut seen_filenames = HashSet::new(); + + for file in &scanned.files { + let filename = file.path.display().to_string(); + let hash = content_hash(&file.content); + + if let Some(&(old_id, old_hash)) = existing.get(filename.as_str()) { + seen_filenames.insert(filename.clone()); + if hash == old_hash { + file_id_map.insert(filename, old_id.to_string()); + unchanged_file_ids.insert(old_id.to_string()); + } else { + let file_id = old_id.to_string(); + file_id_map.insert(filename, file_id.clone()); + needs_embedding.push(FileToEmbed { + file_id, + scanned: file, + }); + } + } else { + let file_id = uuid::Uuid::new_v4().to_string(); + file_id_map.insert(filename, file_id.clone()); + needs_embedding.push(FileToEmbed { + file_id, + scanned: file, + }); + } + } + + let mut removed_file_ids = HashSet::new(); + let mut removed_filenames = Vec::new(); + for entry in existing_index { + if !seen_filenames.contains(entry.filename.as_str()) { + removed_file_ids.insert(entry.file_id.clone()); + removed_filenames.push(entry.filename.clone()); + } + } + let removed_count = removed_filenames.len(); + + FileClassification { + needs_embedding, + file_id_map, + unchanged_file_ids, + removed_count, + removed_file_ids, + removed_filenames, + } +} + +// ============================================================================ +// Embed
logic (moved from pipeline/embed.rs) +// ============================================================================ + +/// Data produced by the embed files step. +struct EmbedFilesData { + /// Chunk rows for newly embedded files. + chunk_rows: Vec<ChunkRow>, + /// Per-file chunk counts (for verbose output). + details: Vec<BuildFileDetail>, +} + +/// Chunk, extract plain text, embed, and produce chunk rows for a single file. +async fn embed_file( + file_id: &str, + file: &ScannedFile, + max_chunk_size: usize, + embedder: &Embedder, +) -> Vec<ChunkRow> { + let chunks = Chunks::new(&file.content, max_chunk_size); + let plain_texts: Vec<String> = chunks + .iter() + .map(|c| extract_plain_text(&c.plain_text)) + .collect(); + let text_refs: Vec<&str> = plain_texts.iter().map(|s| s.as_str()).collect(); + let embeddings = if text_refs.is_empty() { + vec![] + } else { + embedder.embed_batch(&text_refs).await + }; + + chunks + .iter() + .zip(embeddings) + .map(|(chunk, embedding)| ChunkRow { + chunk_id: uuid::Uuid::new_v4().to_string(), + file_id: file_id.to_string(), + chunk_index: chunk.chunk_index as i32, + start_line: chunk.start_line as i32, + end_line: chunk.end_line as i32, + embedding, + }) + .collect() +} + // ============================================================================ // run() // ============================================================================ @@ -42,46 +189,88 @@ pub async fn run( let start = Instant::now(); let mut substeps = Vec::new(); - // 1. Read config - let (read_config_result, config) = run_read_config(path); - substeps.push(from_pipeline_result(read_config_result, |o| { - Outcome::ReadConfig(ReadConfigOutcome { - config_path: o.config_path.clone(), - }) - })); + // 1.
Read config — calls MdvsToml::read() + validate() directly + let config_start = Instant::now(); + let config_path_buf = path.join("mdvs.toml"); + let config = match MdvsToml::read(&config_path_buf) { + Ok(cfg) => match cfg.validate() { + Ok(()) => { + substeps.push(Step::leaf( + Outcome::ReadConfig(ReadConfigOutcome { + config_path: config_path_buf.display().to_string(), + }), + config_start.elapsed().as_millis() as u64, + )); + Some(cfg) + } + Err(e) => { + substeps.push(Step::failed( + ErrorKind::User, + format!("mdvs.toml is invalid: {e} — fix the file or run 'mdvs init --force'"), + config_start.elapsed().as_millis() as u64, + )); + None + } + }, + Err(e) => { + substeps.push(Step::failed( + ErrorKind::User, + e.to_string(), + config_start.elapsed().as_millis() as u64, + )); + None + } + }; let config = match config { Some(c) => c, - None => return fail_from_last(&mut substeps, start, 7), + None => return fail_from_last(&mut substeps, start), }; // 2. Auto-update (conditional) let should_update = !no_update && config.build.as_ref().is_some_and(|b| b.auto_update); if should_update { let update_step = crate::cmd::update::run(path, &[], false, false, false).await; - if update_step.has_failed_step() { + if crate::step::has_failed(&update_step) { substeps.push(update_step); - return fail_msg( - &mut substeps, - start, - ErrorKind::User, - "auto-update failed", - 6, - ); + return fail_msg(&mut substeps, start, ErrorKind::User, "auto-update failed"); } substeps.push(update_step); } // Re-read config if auto-update ran let mut config = if should_update { - let (re_read, cfg) = run_read_config(path); - substeps.push(from_pipeline_result(re_read, |o| { - Outcome::ReadConfig(ReadConfigOutcome { - config_path: o.config_path.clone(), - }) - })); - match cfg { - Some(c) => c, - None => return fail_from_last(&mut substeps, start, 6), + let re_read_start = Instant::now(); + let re_read_path = path.join("mdvs.toml"); + match MdvsToml::read(&re_read_path) { + Ok(cfg) => match 
cfg.validate() { + Ok(()) => { + substeps.push(Step::leaf( + Outcome::ReadConfig(ReadConfigOutcome { + config_path: re_read_path.display().to_string(), + }), + re_read_start.elapsed().as_millis() as u64, + )); + cfg + } + Err(e) => { + substeps.push(Step::failed( + ErrorKind::User, + format!( + "mdvs.toml is invalid: {e} — fix the file or run 'mdvs init --force'" + ), + re_read_start.elapsed().as_millis() as u64, + )); + return fail_from_last(&mut substeps, start); + } + }, + Err(e) => { + substeps.push(Step::failed( + ErrorKind::User, + e.to_string(), + re_read_start.elapsed().as_millis() as u64, + )); + return fail_from_last(&mut substeps, start); + } } } else { config @@ -109,19 +298,29 @@ pub async fn run( elapsed_ms: 0, }, }); - return fail_from_last(&mut substeps, start, 5); + return fail_from_last(&mut substeps, start); } - let (scan_result, scanned) = run_scan(path, &config.scan); - substeps.push(from_pipeline_result(scan_result, |o| { - Outcome::Scan(ScanOutcome { - files_found: o.files_found, - glob: o.glob.clone(), - }) - })); - let scanned = match scanned { - Some(s) => s, - None => return fail_from_last(&mut substeps, start, 5), + let scan_start = Instant::now(); + let scanned = match ScannedFiles::scan(path, &config.scan) { + Ok(s) => { + substeps.push(Step::leaf( + Outcome::Scan(ScanOutcome { + files_found: s.files.len(), + glob: config.scan.glob.clone(), + }), + scan_start.elapsed().as_millis() as u64, + )); + s + } + Err(e) => { + substeps.push(Step::failed( + ErrorKind::Application, + e.to_string(), + scan_start.elapsed().as_millis() as u64, + )); + return fail_from_last(&mut substeps, start); + } }; // 4. 
Validate @@ -136,56 +335,34 @@ pub async fn run( elapsed_ms: 0, }, }); - return fail_from_last(&mut substeps, start, 4); + return fail_from_last(&mut substeps, start); } - let (validate_result, validation_data) = run_validate(&scanned, &config, false); - let (violations, new_fields) = match validation_data { - Some((v, nf)) => (v, nf), - None => { - substeps.push(from_pipeline_result(validate_result, |o| { - Outcome::Validate(ValidateOutcome { - files_checked: o.files_checked, - violations: vec![], - new_fields: vec![], - }) - })); - return fail_from_last(&mut substeps, start, 4); + let validate_start = Instant::now(); + let check_result = match crate::cmd::check::validate(&scanned, &config, false) { + Ok(r) => r, + Err(e) => { + substeps.push(Step::failed( + ErrorKind::Application, + e.to_string(), + validate_start.elapsed().as_millis() as u64, + )); + return fail_from_last(&mut substeps, start); } }; - - // Build validate substep with actual violation/new_fields data - let validate_step = Step { - substeps: vec![], - outcome: match validate_result { - ProcessingStepResult::Completed(step) => StepOutcome::Complete { - result: Ok(Outcome::Validate(ValidateOutcome { - files_checked: step.output.files_checked, - violations: violations.clone(), - new_fields: new_fields.clone(), - })), - elapsed_ms: step.elapsed_ms, - }, - ProcessingStepResult::Failed(err) => StepOutcome::Complete { - result: Err(StepError { - kind: crate::step::convert_error_kind(err.kind), - message: err.message, - }), - elapsed_ms: 0, - }, - ProcessingStepResult::Skipped => StepOutcome::Skipped, - }, - }; - substeps.push(validate_step); + substeps.push(Step::leaf( + Outcome::Validate(ValidateOutcome { + files_checked: check_result.files_checked, + violations: check_result.field_violations.clone(), + new_fields: check_result.new_fields.clone(), + }), + validate_start.elapsed().as_millis() as u64, + )); + let violations = check_result.field_violations; + let new_fields = check_result.new_fields; // 
Abort on violations if !violations.is_empty() { - for _ in 0..4 { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }); - } return Step { substeps, outcome: StepOutcome::Complete { @@ -225,7 +402,7 @@ pub async fn run( elapsed_ms: 0, }, }); - return fail_from_last(&mut substeps, start, 3); + return fail_from_last(&mut substeps, start); } }; @@ -242,7 +419,7 @@ pub async fn run( elapsed_ms: 0, }, }); - return fail_from_last(&mut substeps, start, 3); + return fail_from_last(&mut substeps, start); } }; let chunking = match config.chunking.as_ref() { @@ -258,7 +435,7 @@ pub async fn run( elapsed_ms: 0, }, }); - return fail_from_last(&mut substeps, start, 3); + return fail_from_last(&mut substeps, start); } }; let backend = Backend::parquet(path); @@ -278,7 +455,7 @@ pub async fn run( elapsed_ms: 0, }, }); - return fail_from_last(&mut substeps, start, 3); + return fail_from_last(&mut substeps, start); } let existing_index = if full_rebuild { @@ -297,7 +474,7 @@ pub async fn run( elapsed_ms: 0, }, }); - return fail_from_last(&mut substeps, start, 3); + return fail_from_last(&mut substeps, start); } } }; @@ -317,40 +494,139 @@ pub async fn run( elapsed_ms: 0, }, }); - return fail_from_last(&mut substeps, start, 3); + return fail_from_last(&mut substeps, start); } } }; - let (classify_result, classify_data) = - run_classify(&scanned, &existing_index, existing_chunks, full_rebuild); - substeps.push(from_pipeline_result(classify_result, |o| { - Outcome::Classify(ClassifyOutcome { - full_rebuild: o.full_rebuild, - needs_embedding: o.needs_embedding, - unchanged: o.unchanged, - removed: o.removed, - }) - })); - let classify_data = match classify_data { - Some(d) => d, - None => return fail_from_last(&mut substeps, start, 3), + let classify_start = Instant::now(); + let classify_data = if full_rebuild { + let mut file_id_map = HashMap::new(); + let needs_embedding: Vec<FileToEmbed<'_>> = scanned + .files + .iter() + .map(|f| { + let file_id =
uuid::Uuid::new_v4().to_string(); + let filename = f.path.display().to_string(); + file_id_map.insert(filename, file_id.clone()); + FileToEmbed { + file_id, + scanned: f, + } + }) + .collect(); + let count = needs_embedding.len(); + substeps.push(Step::leaf( + Outcome::Classify(ClassifyOutcome { + full_rebuild: true, + needs_embedding: count, + unchanged: 0, + removed: 0, + }), + classify_start.elapsed().as_millis() as u64, + )); + ClassifyData { + full_rebuild: true, + needs_embedding, + file_id_map, + retained_chunks: vec![], + removed_count: 0, + chunks_removed: 0, + removed_details: vec![], + } + } else { + let classification = classify_files(&scanned, &existing_index); + + let mut removed_chunk_counts: HashMap<&str, usize> = HashMap::new(); + for c in &existing_chunks { + if classification.removed_file_ids.contains(&c.file_id) { + *removed_chunk_counts.entry(c.file_id.as_str()).or_default() += 1; + } + } + let chunks_removed: usize = removed_chunk_counts.values().sum(); + + let filename_to_id: HashMap<&str, &str> = existing_index + .iter() + .map(|e| (e.filename.as_str(), e.file_id.as_str())) + .collect(); + let mut removed_details = Vec::new(); + for filename in &classification.removed_filenames { + let file_id = filename_to_id.get(filename.as_str()).copied().unwrap_or(""); + let chunk_count = removed_chunk_counts.get(file_id).copied().unwrap_or(0); + removed_details.push(BuildFileDetail { + filename: filename.clone(), + chunks: chunk_count, + }); + } + + let retained_chunks: Vec<ChunkRow> = existing_chunks + .into_iter() + .filter(|c| classification.unchanged_file_ids.contains(&c.file_id)) + .collect(); + + let needs_count = classification.needs_embedding.len(); + let unchanged_count = classification.unchanged_file_ids.len(); + let removed_count = classification.removed_count; + + substeps.push(Step::leaf( + Outcome::Classify(ClassifyOutcome { + full_rebuild: false, + needs_embedding: needs_count, + unchanged: unchanged_count, + removed: removed_count, + }), +
classify_start.elapsed().as_millis() as u64, + )); + ClassifyData { + full_rebuild: false, + needs_embedding: classification.needs_embedding, + file_id_map: classification.file_id_map, + retained_chunks, + removed_count, + chunks_removed, + removed_details, + } }; let needs_embedding = !classify_data.needs_embedding.is_empty(); - // 6. Load model - let (load_model_result, embedder) = if needs_embedding { - run_load_model(embedding) + // 6. Load model — calls ModelConfig::try_from() + Embedder::load() directly + let embedder = if needs_embedding { + let model_start = Instant::now(); + match ModelConfig::try_from(embedding) { + Ok(mc) => match Embedder::load(&mc) { + Ok(emb) => { + substeps.push(Step::leaf( + Outcome::LoadModel(LoadModelOutcome { + model_name: embedding.name.clone(), + dimension: emb.dimension(), + }), + model_start.elapsed().as_millis() as u64, + )); + Some(emb) + } + Err(e) => { + substeps.push(Step::failed( + ErrorKind::Application, + e.to_string(), + model_start.elapsed().as_millis() as u64, + )); + None + } + }, + Err(e) => { + substeps.push(Step::failed( + ErrorKind::Application, + e.to_string(), + model_start.elapsed().as_millis() as u64, + )); + None + } + } } else { - (ProcessingStepResult::Skipped, None) + substeps.push(Step::skipped()); + None }; - substeps.push(from_pipeline_result(load_model_result, |o| { - Outcome::LoadModel(LoadModelOutcome { - model_name: o.model_name.clone(), - dimension: o.dimension, - }) - })); // Dimension check let dim_error = if full_rebuild { @@ -375,18 +651,11 @@ pub async fn run( }; if needs_embedding && embedder.is_none() { - for _ in 0..2 { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }); - } return fail_msg( &mut substeps, start, ErrorKind::Application, "model loading failed", - 0, ); } @@ -395,46 +664,38 @@ pub async fn run( let built_at = chrono::Utc::now().timestamp_micros(); if let Some(msg) = dim_error { - substeps.push(Step { - substeps: vec![], - outcome: 
StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::User, - message: msg, - }), - elapsed_ms: 0, - }, - }); - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }); // write_index - return fail_from_last_skip(&mut substeps, start, 0); + substeps.push(Step::failed(ErrorKind::User, msg, 0)); + return fail_from_last(&mut substeps, start); } - let (embed_result, embed_data) = if needs_embedding { + let embed_data = if needs_embedding { + let embed_start = Instant::now(); let emb = embedder.as_ref().unwrap(); - run_embed_files(&classify_data.needs_embedding, emb, max_chunk_size).await + let mut embed_chunk_rows = Vec::new(); + let mut details = Vec::new(); + for fte in &classify_data.needs_embedding { + let crs = embed_file(&fte.file_id, fte.scanned, max_chunk_size, emb).await; + details.push(BuildFileDetail { + filename: fte.scanned.path.display().to_string(), + chunks: crs.len(), + }); + embed_chunk_rows.extend(crs); + } + substeps.push(Step::leaf( + Outcome::EmbedFiles(EmbedFilesOutcome { + files_embedded: classify_data.needs_embedding.len(), + chunks_produced: embed_chunk_rows.len(), + }), + embed_start.elapsed().as_millis() as u64, + )); + Some(EmbedFilesData { + chunk_rows: embed_chunk_rows, + details, + }) } else { - (ProcessingStepResult::Skipped, None) + substeps.push(Step::skipped()); + None }; - substeps.push(from_pipeline_result(embed_result, |o| { - Outcome::EmbedFiles(EmbedFilesOutcome { - files_embedded: o.files_embedded, - chunks_produced: o.chunks_produced, - }) - })); - - if needs_embedding - && embed_data.is_none() - && !matches!(substeps.last().unwrap().outcome, StepOutcome::Skipped) - { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }); // write_index - return fail_from_last_skip(&mut substeps, start, 0); - } // 8. 
Write index let file_rows: Vec<FileRow> = scanned @@ -467,22 +728,25 @@ pub async fn run( built_at: chrono::Utc::now().to_rfc3339(), }; - let write_index_result = run_write_index( - &backend, - &schema_fields, - &file_rows, - &chunk_rows, - build_meta, - ); - substeps.push(from_pipeline_result(write_index_result, |o| { - Outcome::WriteIndex(WriteIndexOutcome { - files_written: o.files_written, - chunks_written: o.chunks_written, - }) - })); - - if crate::step::has_failed(substeps.last().unwrap()) { - return fail_from_last_skip(&mut substeps, start, 0); + let write_start = Instant::now(); + match backend.write_index(&schema_fields, &file_rows, &chunk_rows, build_meta) { + Ok(()) => { + substeps.push(Step::leaf( + Outcome::WriteIndex(WriteIndexOutcome { + files_written: file_rows.len(), + chunks_written: chunk_rows.len(), + }), + write_start.elapsed().as_millis() as u64, + )); + } + Err(e) => { + substeps.push(Step::failed( + ErrorKind::Application, + e.to_string(), + write_start.elapsed().as_millis() as u64, + )); + return fail_from_last(&mut substeps, start); + } } // Assemble BuildOutcome @@ -512,22 +776,15 @@ pub async fn run( } } -/// Push N Skipped substeps, extract error from last substep, return failed command Step. -fn fail_from_last( - substeps: &mut Vec<Step<Outcome>>, - start: Instant, - skipped: usize, -) -> Step<Outcome> { - let msg = match substeps.last().map(|s| &s.outcome) { - Some(StepOutcome::Complete { result: Err(e), .. }) => e.message.clone(), - _ => "step failed".into(), +/// Extract error from last failed substep and return a failed command Step. +fn fail_from_last(substeps: &mut Vec<Step<Outcome>>, start: Instant) -> Step<Outcome> { + let msg = match substeps.iter().rev().find_map(|s| match &s.outcome { + StepOutcome::Complete { result: Err(e), ..
} => Some(e.message.clone()), + _ => None, + }) { + Some(m) => m, + None => "step failed".into(), }; - for _ in 0..skipped { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }); - } Step { substeps: std::mem::take(substeps), outcome: StepOutcome::Complete { @@ -540,20 +797,13 @@ fn fail_from_last( } } -/// Push N Skipped substeps, return failed command Step with a specific message. +/// Return a failed command Step with a specific message. fn fail_msg( substeps: &mut Vec<Step<Outcome>>, start: Instant, kind: ErrorKind, msg: &str, - skipped: usize, ) -> Step<Outcome> { - for _ in 0..skipped { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }); - } Step { substeps: std::mem::take(substeps), outcome: StepOutcome::Complete { @@ -566,31 +816,6 @@ pub async fn run( } } -/// Return failed command Step (no additional Skipped substeps). -fn fail_from_last_skip( - substeps: &mut Vec<Step<Outcome>>, - start: Instant, - _skipped: usize, -) -> Step<Outcome> { - let msg = match substeps.iter().rev().find_map(|s| match &s.outcome { - StepOutcome::Complete { result: Err(e), ..
} => Some(e.message.clone()), - _ => None, - }) { - Some(m) => m, - None => "step failed".into(), - }; - Step { - substeps: std::mem::take(substeps), - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::Application, - message: msg, - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }, - } -} - // ============================================================================ // Helpers // ============================================================================ @@ -784,7 +1009,7 @@ mod tests { async fn missing_config() { let tmp = tempfile::tempdir().unwrap(); let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!(output.has_failed_step()); + assert!(crate::step::has_failed(&output)); } #[tokio::test] @@ -802,20 +1027,24 @@ mod tests { false, // skip_gitignore false, // verbose ); - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); // Build the index let output = run(tmp.path(), None, None, None, false, true, false).await; assert!( - !output.has_failed_step(), + !crate::step::has_failed(&output), "first build failed: {:#?}", output ); // Run build again (tests standalone rebuild) let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!(!output.has_failed_step(), "build failed: {:#?}", output); - assert!(!output.has_failed_step()); + assert!( + !crate::step::has_failed(&output), + "build failed: {:#?}", + output + ); + assert!(!crate::step::has_failed(&output)); // Verify Parquet files exist let files_path = tmp.path().join(".mdvs/files.parquet"); @@ -870,11 +1099,11 @@ mod tests { false, // skip_gitignore false, // verbose ); - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); // Build the index let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); // Overwrite chunks.parquet with wrong dimension (2 instead of 
actual) let bad_chunks = vec![ChunkRow { @@ -897,7 +1126,7 @@ mod tests { // Build should fail with dimension mismatch when model loads let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!(output.has_failed_step()); + assert!(crate::step::has_failed(&output)); let err = unwrap_error(&output); assert!(err.message.contains("dimension mismatch")); } @@ -919,11 +1148,11 @@ mod tests { false, // skip_gitignore false, // verbose ); - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); // Build the index let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); // Overwrite chunks.parquet with wrong dimension (2 instead of actual) let bad_chunks = vec![ChunkRow { @@ -940,10 +1169,10 @@ mod tests { // Build with --force should succeed despite dimension mismatch let output = run(tmp.path(), None, None, None, true, true, false).await; assert!( - !output.has_failed_step(), + !crate::step::has_failed(&output), "expected success with --force, got failed step" ); - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); let result = unwrap_build(&output); assert!(result.full_rebuild); } @@ -963,7 +1192,7 @@ mod tests { false, // skip_gitignore false, // verbose ); - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); // Verify no model/chunking sections (auto-flag sections are present from init) let config = MdvsToml::read(&tmp.path().join("mdvs.toml")).unwrap(); @@ -972,7 +1201,11 @@ mod tests { // Build should fill defaults and succeed let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!(!output.has_failed_step(), "build failed: {:#?}", output); + assert!( + !crate::step::has_failed(&output), + "build failed: {:#?}", + output + ); // Verify sections were written let config = MdvsToml::read(&tmp.path().join("mdvs.toml")).unwrap(); @@ 
-1004,11 +1237,11 @@ mod tests { false, // skip_gitignore false, // verbose ); - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); // Build the index (creates build sections) let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); // Try to change model without --force let output = run( @@ -1021,7 +1254,7 @@ mod tests { false, ) .await; - assert!(output.has_failed_step()); + assert!(crate::step::has_failed(&output)); let err = unwrap_error(&output); assert!(err.message.contains("--force")); } @@ -1044,10 +1277,10 @@ mod tests { // Build the index (creates build sections) let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); let output = run(tmp.path(), None, None, Some(512), false, true, false).await; - assert!(output.has_failed_step()); + assert!(crate::step::has_failed(&output)); let err = unwrap_error(&output); assert!(err.message.contains("--force")); } @@ -1070,12 +1303,12 @@ mod tests { // Build the index (creates build sections) let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); // Change chunk size with --force (same model so no dimension mismatch) let output = run(tmp.path(), None, None, Some(512), true, true, false).await; assert!( - !output.has_failed_step(), + !crate::step::has_failed(&output), "build with --force failed: {:#?}", output ); @@ -1102,7 +1335,7 @@ mod tests { // Build the index (creates build sections + parquets) let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); // Manually change chunk_size in toml (simulates user editing) let mut config = MdvsToml::read(&tmp.path().join("mdvs.toml")).unwrap(); 
@@ -1111,7 +1344,7 @@ mod tests { // Build without --force should error let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!(output.has_failed_step()); + assert!(crate::step::has_failed(&output)); let err = unwrap_error(&output); assert!(err.message.contains("config changed since last build")); assert!(err.message.contains("chunk_size")); @@ -1119,7 +1352,7 @@ mod tests { // Build with --force should succeed let output = run(tmp.path(), None, None, None, true, true, false).await; assert!( - !output.has_failed_step(), + !crate::step::has_failed(&output), "build with --force failed: {:#?}", output ); @@ -1321,7 +1554,7 @@ mod tests { // Build should succeed despite unknown "author" field let output = run(tmp.path(), None, None, None, false, true, false).await; assert!( - !output.has_failed_step(), + !crate::step::has_failed(&output), "build should succeed with new fields: {:#?}", output ); @@ -1368,13 +1601,13 @@ mod tests { // Build the index let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); let (files_before, chunks_before) = read_index_state(tmp.path()); // Build again with no changes let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); let (files_after, chunks_after) = read_index_state(tmp.path()); @@ -1410,7 +1643,7 @@ mod tests { // Build the index let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); let (files_before, chunks_before) = read_index_state(tmp.path()); // Add a new file @@ -1421,7 +1654,7 @@ mod tests { .unwrap(); let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); let (files_after, chunks_after) = 
read_index_state(tmp.path()); @@ -1464,7 +1697,7 @@ mod tests { // Build the index let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); let (files_before, chunks_before) = read_index_state(tmp.path()); let post1_id = files_before["blog/post1.md"].clone(); @@ -1489,7 +1722,7 @@ mod tests { ).unwrap(); let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); let (files_after, chunks_after) = read_index_state(tmp.path()); @@ -1537,7 +1770,7 @@ mod tests { // Build the index let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); let (files_before, _) = read_index_state(tmp.path()); assert_eq!(files_before.len(), 2); @@ -1546,7 +1779,7 @@ mod tests { fs::remove_file(tmp.path().join("blog/post2.md")).unwrap(); let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); let (files_after, chunks_after) = read_index_state(tmp.path()); @@ -1577,7 +1810,7 @@ mod tests { // Build the index let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); let (_, chunks_before) = read_index_state(tmp.path()); let old_chunk_ids: HashSet<String> = @@ -1590,7 +1823,7 @@ ).unwrap(); let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); let (_, chunks_after) = read_index_state(tmp.path()); let new_chunk_ids: HashSet<String> = @@ -1618,7 +1851,7 @@ mod tests { // Build the index let output = run(tmp.path(), None, None, None, false, true, false).await; - assert!(!output.has_failed_step()); +
assert!(!crate::step::has_failed(&output)); let (files_before, chunks_before) = read_index_state(tmp.path()); let old_file_ids: HashSet<String> = files_before.values().cloned().collect(); @@ -1627,7 +1860,7 @@ mod tests { // Force rebuild — should generate all new IDs let output = run(tmp.path(), None, None, None, true, true, false).await; - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); let (files_after, chunks_after) = read_index_state(tmp.path()); let new_file_ids: HashSet<String> = files_after.values().cloned().collect(); @@ -1657,4 +1890,111 @@ mod tests { Some("abc123def".to_string()) ); } + + // ======================================================================== + // classify_files unit tests (moved from pipeline/classify.rs) + // ======================================================================== + + use crate::discover::scan::ScannedFile; + + fn make_scanned_files(files: Vec<(&str, &str)>) -> ScannedFiles { + ScannedFiles { + files: files + .into_iter() + .map(|(path, body)| ScannedFile { + path: std::path::PathBuf::from(path), + data: None, + content: body.to_string(), + }) + .collect(), + } + } + + #[test] + fn classify_all_new() { + let scanned = make_scanned_files(vec![("a.md", "hello"), ("b.md", "world")]); + let existing: Vec<FileIndexEntry> = vec![]; + let c = classify_files(&scanned, &existing); + + assert_eq!(c.needs_embedding.len(), 2); + assert_eq!(c.unchanged_file_ids.len(), 0); + assert_eq!(c.removed_count, 0); + assert_eq!(c.file_id_map.len(), 2); + } + + #[test] + fn classify_all_unchanged() { + let scanned = make_scanned_files(vec![("a.md", "hello"), ("b.md", "world")]); + let existing = vec![ + FileIndexEntry { + file_id: "f1".into(), + filename: "a.md".into(), + content_hash: content_hash("hello"), + }, + FileIndexEntry { + file_id: "f2".into(), + filename: "b.md".into(), + content_hash: content_hash("world"), + }, + ]; + let c = classify_files(&scanned, &existing); + + assert_eq!(c.needs_embedding.len(), 0); +
assert_eq!(c.unchanged_file_ids.len(), 2); + assert!(c.unchanged_file_ids.contains("f1")); + assert!(c.unchanged_file_ids.contains("f2")); + assert_eq!(c.removed_count, 0); + assert_eq!(c.file_id_map["a.md"], "f1"); + assert_eq!(c.file_id_map["b.md"], "f2"); + } + + #[test] + fn classify_mixed() { + let scanned = make_scanned_files(vec![ + ("a.md", "same content"), + ("b.md", "new body"), + ("c.md", "brand new"), + ]); + let existing = vec![ + FileIndexEntry { + file_id: "f1".into(), + filename: "a.md".into(), + content_hash: content_hash("same content"), + }, + FileIndexEntry { + file_id: "f2".into(), + filename: "b.md".into(), + content_hash: content_hash("old body"), + }, + FileIndexEntry { + file_id: "f3".into(), + filename: "d.md".into(), + content_hash: content_hash("deleted"), + }, + ]; + let c = classify_files(&scanned, &existing); + + assert!(c.unchanged_file_ids.contains("f1")); + assert_eq!(c.file_id_map["a.md"], "f1"); + + assert_eq!(c.needs_embedding.len(), 2); + let edited = c + .needs_embedding + .iter() + .find(|f| f.scanned.path.to_str() == Some("b.md")) + .unwrap(); + assert_eq!(edited.file_id, "f2"); + + let new = c + .needs_embedding + .iter() + .find(|f| f.scanned.path.to_str() == Some("c.md")) + .unwrap(); + assert_ne!(new.file_id, "f1"); + assert_ne!(new.file_id, "f2"); + assert_ne!(new.file_id, "f3"); + + assert_eq!(c.removed_count, 1); + assert!(!c.file_id_map.contains_key("d.md")); + } } diff --git a/src/cmd/check.rs b/src/cmd/check.rs index e401449..dcba8b6 100644 --- a/src/cmd/check.rs +++ b/src/cmd/check.rs @@ -5,7 +5,7 @@ use crate::outcome::{Outcome, ReadConfigOutcome, ScanOutcome, ValidateOutcome}; use crate::output::{FieldViolation, NewField, ViolatingFile, ViolationKind}; use crate::schema::config::MdvsToml; use crate::schema::shared::FieldTypeSerde; -use crate::step::{from_pipeline_result, ErrorKind, Step, StepError, StepOutcome}; +use crate::step::{ErrorKind, Step, StepError, StepOutcome}; use globset::Glob; use serde::Serialize; 
use serde_json::Value; @@ -43,31 +43,45 @@ impl CheckResult { /// Read config, optionally auto-update, scan files, and validate frontmatter. #[instrument(name = "check", skip_all)] pub async fn run(path: &Path, no_update: bool, verbose: bool) -> Step { - use crate::pipeline::read_config::run_read_config; - use crate::pipeline::scan::run_scan; - let start = std::time::Instant::now(); let mut substeps = Vec::new(); - // 1. Read config - let (read_config_result, config) = run_read_config(path); - substeps.push(from_pipeline_result(read_config_result, |o| { - Outcome::ReadConfig(ReadConfigOutcome { - config_path: o.config_path.clone(), - }) - })); + // 1. Read config — calls MdvsToml::read() + validate() directly + let config_start = std::time::Instant::now(); + let config_path_buf = path.join("mdvs.toml"); + let config = match MdvsToml::read(&config_path_buf) { + Ok(cfg) => match cfg.validate() { + Ok(()) => { + substeps.push(Step::leaf( + Outcome::ReadConfig(ReadConfigOutcome { + config_path: config_path_buf.display().to_string(), + }), + config_start.elapsed().as_millis() as u64, + )); + Some(cfg) + } + Err(e) => { + substeps.push(Step::failed( + ErrorKind::User, + format!("mdvs.toml is invalid: {e} — fix the file or run 'mdvs init --force'"), + config_start.elapsed().as_millis() as u64, + )); + None + } + }, + Err(e) => { + substeps.push(Step::failed( + ErrorKind::User, + e.to_string(), + config_start.elapsed().as_millis() as u64, + )); + None + } + }; let config = match config { Some(c) => c, None => { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }); // scan - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }); // validate let msg = match &substeps[0].outcome { StepOutcome::Complete { result: Err(e), .. } => e.message.clone(), _ => "failed to read config".into(), @@ -85,19 +99,13 @@ pub async fn run(path: &Path, no_update: bool, verbose: bool) -> Step { } }; - // 2. 
Auto-update (calls old update::run(), does not nest as substep during migration) + // 2. Auto-update let should_update = !no_update && config.check.as_ref().is_some_and(|c| c.auto_update); if should_update { let update_output = crate::cmd::update::run(path, &[], false, false, verbose).await; - if update_output.has_failed_step() { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }); // scan - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }); // validate + let failed = crate::step::has_failed(&update_output); + substeps.push(update_output); + if failed { return Step { substeps, outcome: StepOutcome::Complete { @@ -116,14 +124,6 @@ pub async fn run(path: &Path, no_update: bool, verbose: bool) -> Step { match MdvsToml::read(&path.join("mdvs.toml")) { Ok(c) => c, Err(e) => { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }); // scan - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }); // validate return Step { substeps, outcome: StepOutcome::Complete { @@ -140,22 +140,32 @@ pub async fn run(path: &Path, no_update: bool, verbose: bool) -> Step { config }; - // 3. Scan - let (scan_result, scanned) = run_scan(path, &config.scan); - substeps.push(from_pipeline_result(scan_result, |o| { - Outcome::Scan(ScanOutcome { - files_found: o.files_found, - glob: o.glob.clone(), - }) - })); + // 3. 
Scan — calls ScannedFiles::scan() directly + let scan_start = std::time::Instant::now(); + let scanned = match ScannedFiles::scan(path, &config.scan) { + Ok(s) => { + substeps.push(Step::leaf( + Outcome::Scan(ScanOutcome { + files_found: s.files.len(), + glob: config.scan.glob.clone(), + }), + scan_start.elapsed().as_millis() as u64, + )); + Some(s) + } + Err(e) => { + substeps.push(Step::failed( + ErrorKind::Application, + e.to_string(), + scan_start.elapsed().as_millis() as u64, + )); + None + } + }; let scanned = match scanned { Some(s) => s, None => { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }); // validate let msg = match &substeps.last().unwrap().outcome { StepOutcome::Complete { result: Err(e), .. } => e.message.clone(), _ => "scan failed".into(), diff --git a/src/cmd/info.rs b/src/cmd/info.rs index b92e3a4..e34ba93 100644 --- a/src/cmd/info.rs +++ b/src/cmd/info.rs @@ -100,8 +100,6 @@ pub fn run(path: &Path, _verbose: bool) -> Step { let config = match config { Some(c) => c, None => { - substeps.push(Step::skipped()); // scan - substeps.push(Step::skipped()); // read_index let msg = match &substeps[0].outcome { StepOutcome::Complete { result: Err(e), .. 
} => e.message.clone(), _ => "failed to read config".into(), @@ -346,7 +344,7 @@ mod tests { let step = crate::cmd::init::run(dir, "**", false, false, true, false, false); assert!(!crate::step::has_failed(&step)); let output = crate::cmd::build::run(dir, None, None, None, false, true, false).await; - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); } #[test] diff --git a/src/cmd/init.rs b/src/cmd/init.rs index 63185b3..d7ff6f6 100644 --- a/src/cmd/init.rs +++ b/src/cmd/init.rs @@ -1,8 +1,11 @@ +use crate::discover::infer::InferredSchema; +use crate::discover::scan::ScannedFiles; use crate::outcome::commands::InitOutcome; use crate::outcome::{InferOutcome, Outcome, ScanOutcome, WriteConfigOutcome}; use crate::output::DiscoveredField; +use crate::schema::config::MdvsToml; use crate::schema::shared::ScanConfig; -use crate::step::{from_pipeline_result, ErrorKind, Step, StepError, StepOutcome}; +use crate::step::{ErrorKind, Step, StepError, StepOutcome}; use std::path::Path; use std::time::Instant; use tracing::{info, instrument}; @@ -19,10 +22,6 @@ pub fn run( skip_gitignore: bool, _verbose: bool, ) -> Step { - use crate::pipeline::infer::run_infer; - use crate::pipeline::scan::run_scan; - use crate::pipeline::write_config::run_write_config; - let start = Instant::now(); let mut substeps = Vec::new(); @@ -35,7 +34,6 @@ pub fn run( start, ErrorKind::User, format!("'{}' is not a directory", path.display()), - 3, // scan + infer + write_config ); } @@ -50,11 +48,9 @@ pub fn run( "mdvs is already initialized in '{}' (use --force to reinitialize)", path.display() ), - 3, ); } - // --force: delete existing artifacts if force { if config_path.exists() { let _ = std::fs::remove_file(&config_path); @@ -64,24 +60,31 @@ pub fn run( } } - // 1. Scan + // 1. 
Scan — calls ScannedFiles::scan() directly let scan_config = ScanConfig { glob: glob.to_string(), include_bare_files: !ignore_bare_files, skip_gitignore, }; - let (scan_result, scanned) = run_scan(path, &scan_config); - substeps.push(from_pipeline_result(scan_result, |o| { - Outcome::Scan(ScanOutcome { - files_found: o.files_found, - glob: o.glob.clone(), - }) - })); - - let scanned = match scanned { - Some(s) => s, - None => { - return fail_from_last_substep(&mut substeps, start, 2); // infer + write_config + let scan_start = Instant::now(); + let scanned = match ScannedFiles::scan(path, &scan_config) { + Ok(s) => { + substeps.push(Step::leaf( + Outcome::Scan(ScanOutcome { + files_found: s.files.len(), + glob: scan_config.glob.clone(), + }), + scan_start.elapsed().as_millis() as u64, + )); + s + } + Err(e) => { + substeps.push(Step::failed( + ErrorKind::Application, + e.to_string(), + scan_start.elapsed().as_millis() as u64, + )); + return fail_from_last_substep(&mut substeps, start); } }; @@ -97,10 +100,6 @@ pub fn run( elapsed_ms: 0, }, }); - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }); // write_config return Step { substeps, outcome: StepOutcome::Complete { @@ -113,19 +112,15 @@ pub fn run( }; } - let (infer_result, schema) = run_infer(&scanned); - substeps.push(from_pipeline_result(infer_result, |o| { + // 2b. 
Infer — InferredSchema::infer() is infallible + let infer_start = Instant::now(); + let schema = InferredSchema::infer(&scanned); + substeps.push(Step::leaf( Outcome::Infer(InferOutcome { - fields_inferred: o.fields_inferred, - }) - })); - - let schema = match schema { - Some(s) => s, - None => { - return fail_from_last_substep(&mut substeps, start, 1); // write_config - } - }; + fields_inferred: schema.fields.len(), + }), + infer_start.elapsed().as_millis() as u64, + )); let total_files = scanned.files.len(); info!(fields = schema.fields.len(), "schema inferred"); @@ -137,20 +132,30 @@ pub fn run( .map(|f| f.to_discovered(total_files, true)) .collect(); - // 3. Write config (Skipped if dry_run) + // 3. Write config — MdvsToml::from_inferred() + write() directly if dry_run { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }); + substeps.push(Step::skipped()); } else { - let (write_result, _config) = run_write_config(path, &schema, scan_config); - substeps.push(from_pipeline_result(write_result, |o| { - Outcome::WriteConfig(WriteConfigOutcome { - config_path: o.config_path.clone(), - fields_written: o.fields_written, - }) - })); + let write_start = Instant::now(); + let toml_doc = MdvsToml::from_inferred(&schema, scan_config); + match toml_doc.write(&config_path) { + Ok(()) => { + substeps.push(Step::leaf( + Outcome::WriteConfig(WriteConfigOutcome { + config_path: config_path.display().to_string(), + fields_written: schema.fields.len(), + }), + write_start.elapsed().as_millis() as u64, + )); + } + Err(e) => { + substeps.push(Step::failed( + ErrorKind::Application, + e.to_string(), + write_start.elapsed().as_millis() as u64, + )); + } + } } Step { @@ -167,20 +172,13 @@ pub fn run( } } -/// Helper: push N Skipped substeps and return a failed Step. +/// Helper: return a failed Step with the given error. 
fn fail_early( - mut substeps: Vec<Step>, + substeps: Vec<Step>, start: Instant, kind: ErrorKind, message: String, - skipped_count: usize, ) -> Step { - for _ in 0..skipped_count { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }); - } Step { substeps, outcome: StepOutcome::Complete { @@ -190,22 +188,12 @@ fn fail_early( } } -/// Helper: extract error from last substep, push N Skipped, return failed Step. -fn fail_from_last_substep( - substeps: &mut Vec<Step>, - start: Instant, - skipped_count: usize, -) -> Step { +/// Helper: extract error from last substep and return a failed Step. +fn fail_from_last_substep(substeps: &mut Vec<Step>, start: Instant) -> Step { let msg = match substeps.last().map(|s| &s.outcome) { Some(StepOutcome::Complete { result: Err(e), .. }) => e.message.clone(), _ => "step failed".into(), }; - for _ in 0..skipped_count { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }); - } Step { substeps: std::mem::take(substeps), outcome: StepOutcome::Complete { diff --git a/src/cmd/search.rs b/src/cmd/search.rs index fe054ec..3fb1a1f 100644 --- a/src/cmd/search.rs +++ b/src/cmd/search.rs @@ -1,10 +1,13 @@ use crate::index::backend::Backend; +use crate::index::embed::{Embedder, ModelConfig}; +use crate::index::storage::BuildMetadata; use crate::outcome::commands::SearchOutcome; use crate::outcome::{ EmbedQueryOutcome, ExecuteSearchOutcome, LoadModelOutcome, Outcome, ReadConfigOutcome, ReadIndexOutcome, }; -use crate::step::{from_pipeline_result, ErrorKind, Step, StepError, StepOutcome}; +use crate::schema::config::MdvsToml; +use crate::step::{ErrorKind, Step, StepError, StepOutcome}; use std::path::Path; use std::time::Instant; use tracing::{instrument, warn}; @@ -21,6 +24,27 @@ fn read_lines(path: &Path, start: i32, end: i32) -> Option<String> { Some(lines[start..end].join("\n")) } +/// Index metadata, used for model mismatch check. 
+struct IndexData { + metadata: BuildMetadata, +} + +/// Validate --where clause for unmatched quotes. +fn validate_where_clause(w: &str) -> Result<(), String> { + if w.chars().filter(|&c| c == '\'').count() % 2 != 0 { + return Err( + "unmatched single quote in --where clause — escape with '' (e.g. O''Brien)".into(), + ); + } + if w.chars().filter(|&c| c == '"').count() % 2 != 0 { + return Err( + "unmatched double quote in --where clause — escape with \"\" (e.g. \"\"field\"\")" + .into(), + ); + } + Ok(()) +} + /// Embed a query, search the index, and return ranked results. #[instrument(name = "search", skip_all)] pub async fn run( @@ -32,22 +56,41 @@ pub async fn run( no_build: bool, _verbose: bool, ) -> Step { - use crate::pipeline::embed::run_embed_query; - use crate::pipeline::execute_search::run_execute_search; - use crate::pipeline::load_model::run_load_model; - use crate::pipeline::read_config::run_read_config; - use crate::pipeline::read_index::run_read_index; - let start = Instant::now(); let mut substeps = Vec::new(); - // 1. Read config - let (read_config_result, config) = run_read_config(path); - substeps.push(from_pipeline_result(read_config_result, |o| { - Outcome::ReadConfig(ReadConfigOutcome { - config_path: o.config_path.clone(), - }) - })); + // 1. 
Read config — calls MdvsToml::read() + validate() directly + let config_start = Instant::now(); + let config_path_buf = path.join("mdvs.toml"); + let config = match MdvsToml::read(&config_path_buf) { + Ok(cfg) => match cfg.validate() { + Ok(()) => { + substeps.push(Step::leaf( + Outcome::ReadConfig(ReadConfigOutcome { + config_path: config_path_buf.display().to_string(), + }), + config_start.elapsed().as_millis() as u64, + )); + Some(cfg) + } + Err(e) => { + substeps.push(Step::failed( + ErrorKind::User, + format!("mdvs.toml is invalid: {e} — fix the file or run 'mdvs init --force'"), + config_start.elapsed().as_millis() as u64, + )); + None + } + }, + Err(e) => { + substeps.push(Step::failed( + ErrorKind::User, + e.to_string(), + config_start.elapsed().as_millis() as u64, + )); + None + } + }; // Auto-build: run build before searching if configured let auto_build = if let Some(ref cfg) = config { @@ -56,15 +99,9 @@ pub async fn run( let build_no_update = no_update || !cfg.search.as_ref().is_some_and(|s| s.auto_update); let build_step = crate::cmd::build::run(path, None, None, None, false, build_no_update, false).await; - if build_step.has_failed_step() { + if crate::step::has_failed(&build_step) { substeps.push(build_step); - return fail_msg( - &mut substeps, - start, - ErrorKind::User, - "auto-build failed", - 4, - ); + return fail_msg(&mut substeps, start, ErrorKind::User, "auto-build failed"); } Some(build_step) } else { @@ -81,16 +118,38 @@ pub async fn run( // Re-read config if auto-build ran let config = if substeps.len() > 1 { - // auto-build ran, re-read - let (result, cfg) = run_read_config(path); - substeps.push(from_pipeline_result(result, |o| { - Outcome::ReadConfig(ReadConfigOutcome { - config_path: o.config_path.clone(), - }) - })); - match cfg { - Some(c) => Some(c), - None => return fail_from_last(&mut substeps, start, 3), + let re_read_start = Instant::now(); + let re_read_path = path.join("mdvs.toml"); + match MdvsToml::read(&re_read_path) { + 
Ok(cfg) => match cfg.validate() { + Ok(()) => { + substeps.push(Step::leaf( + Outcome::ReadConfig(ReadConfigOutcome { + config_path: re_read_path.display().to_string(), + }), + re_read_start.elapsed().as_millis() as u64, + )); + Some(cfg) + } + Err(e) => { + substeps.push(Step::failed( + ErrorKind::User, + format!( + "mdvs.toml is invalid: {e} — fix the file or run 'mdvs init --force'" + ), + re_read_start.elapsed().as_millis() as u64, + )); + return fail_from_last(&mut substeps, start); + } + }, + Err(e) => { + substeps.push(Step::failed( + ErrorKind::User, + e.to_string(), + re_read_start.elapsed().as_millis() as u64, + )); + return fail_from_last(&mut substeps, start); + } } } else { config @@ -98,28 +157,58 @@ pub async fn run( let embedding = config.as_ref().and_then(|c| c.embedding_model.as_ref()); - // 2. Read index - let (read_index_result, index_data) = match &config { - Some(_) => run_read_index(path), + // 2. Read index — calls Backend methods directly + let index_data = match &config { + Some(_) => { + let index_start = Instant::now(); + let backend = Backend::parquet(path); + if !backend.exists() { + substeps.push(Step::leaf( + Outcome::ReadIndex(ReadIndexOutcome { + exists: false, + files_indexed: 0, + chunks: 0, + }), + index_start.elapsed().as_millis() as u64, + )); + None + } else { + let build_meta = backend.read_metadata().ok().flatten(); + let idx_stats = backend.stats().ok().flatten(); + match (build_meta, idx_stats) { + (Some(metadata), Some(stats)) => { + substeps.push(Step::leaf( + Outcome::ReadIndex(ReadIndexOutcome { + exists: true, + files_indexed: stats.files_indexed, + chunks: stats.chunks, + }), + index_start.elapsed().as_millis() as u64, + )); + Some(IndexData { metadata }) + } + _ => { + substeps.push(Step::leaf( + Outcome::ReadIndex(ReadIndexOutcome { + exists: false, + files_indexed: 0, + chunks: 0, + }), + index_start.elapsed().as_millis() as u64, + )); + None + } + } + } + } None => { - substeps.push(Step { - substeps: vec![], - 
outcome: StepOutcome::Skipped, - }); - return fail_from_last(&mut substeps, start, 3); + return fail_from_last(&mut substeps, start); } }; - substeps.push(from_pipeline_result(read_index_result, |o| { - Outcome::ReadIndex(ReadIndexOutcome { - exists: o.exists, - files_indexed: o.files_indexed, - chunks: o.chunks, - }) - })); // Pre-checks before loading model let pre_check_error: Option<String> = match (config.as_ref(), embedding, index_data.as_ref()) { - (None, _, _) => None, // already failed + (None, _, _) => None, (_, None, _) => { Some("missing [embedding_model] in mdvs.toml (run `mdvs build` first)".to_string()) } @@ -137,85 +226,106 @@ } }; - // 3. Load model + // 3. Load model — calls ModelConfig::try_from() + Embedder::load() directly if let Some(msg) = pre_check_error { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::User, - message: msg, - }), - elapsed_ms: 0, - }, - }); - return fail_from_last(&mut substeps, start, 2); + substeps.push(Step::failed(ErrorKind::User, msg, 0)); + return fail_from_last(&mut substeps, start); } let emb_config = embedding.unwrap(); - let (load_model_result, embedder) = run_load_model(emb_config); - substeps.push(from_pipeline_result(load_model_result, |o| { - Outcome::LoadModel(LoadModelOutcome { - model_name: o.model_name.clone(), - dimension: o.dimension, - }) - })); - let embedder = match embedder { - Some(e) => e, - None => return fail_from_last(&mut substeps, start, 2), + let model_start = Instant::now(); + let embedder = match ModelConfig::try_from(emb_config) { + Ok(mc) => match Embedder::load(&mc) { + Ok(emb) => { + substeps.push(Step::leaf( + Outcome::LoadModel(LoadModelOutcome { + model_name: emb_config.name.clone(), + dimension: emb.dimension(), + }), + model_start.elapsed().as_millis() as u64, + )); + emb + } + Err(e) => { + substeps.push(Step::failed( + ErrorKind::Application, + e.to_string(), + model_start.elapsed().as_millis() 
as u64, + )); + return fail_from_last(&mut substeps, start); + } + }, + Err(e) => { + substeps.push(Step::failed( + ErrorKind::Application, + e.to_string(), + model_start.elapsed().as_millis() as u64, + )); + return fail_from_last(&mut substeps, start); + } }; - // 4. Embed query - let (embed_query_result, query_embedding) = run_embed_query(&embedder, query).await; - substeps.push(from_pipeline_result(embed_query_result, |o| { + // 4. Embed query — calls embedder.embed() directly (infallible) + let embed_start = Instant::now(); + let query_embedding = embedder.embed(query).await; + substeps.push(Step::leaf( Outcome::EmbedQuery(EmbedQueryOutcome { - query: o.query.clone(), - }) - })); - let query_embedding = match query_embedding { - Some(qe) => qe, - None => return fail_from_last(&mut substeps, start, 1), - }; + query: query.to_string(), + }), + embed_start.elapsed().as_millis() as u64, + )); - // 5. Execute search + // 5. Execute search — calls backend.search() directly with quote validation let cfg = config.as_ref().unwrap(); let backend = Backend::parquet(path); let (prefix, aliases) = match &cfg.search { Some(sc) => (sc.internal_prefix.as_str(), &sc.aliases), None => ("", &std::collections::HashMap::new()), }; - let (execute_result, hits) = run_execute_search( - &backend, - query_embedding, - where_clause, - limit, - prefix, - aliases, - ) - .await; - substeps.push(from_pipeline_result(execute_result, |o| { - Outcome::ExecuteSearch(ExecuteSearchOutcome { hits: o.hits }) - })); - - // Build result with chunk text populated (always — full outcome carries all data) - let hits = match hits { - Some(mut hits) => { - for hit in &mut hits { - if let (Some(s), Some(e)) = (hit.start_line, hit.end_line) { - match read_lines(&path.join(&hit.filename), s, e) { - Some(text) => hit.chunk_text = Some(text), - None => warn!( - file = %hit.filename, - "could not read chunk text (file may have changed since build)" - ), - } - } - } + + if let Some(w) = where_clause { + if let 
Err(msg) = validate_where_clause(w) { + substeps.push(Step::failed(ErrorKind::User, msg, 0)); + return fail_from_last(&mut substeps, start); + } + } + + let search_start = Instant::now(); + let hits = match backend + .search(query_embedding, where_clause, limit, prefix, aliases) + .await + { + Ok(hits) => { + substeps.push(Step::leaf( + Outcome::ExecuteSearch(ExecuteSearchOutcome { hits: hits.len() }), + search_start.elapsed().as_millis() as u64, + )); hits } - None => return fail_from_last(&mut substeps, start, 0), + Err(e) => { + substeps.push(Step::failed( + ErrorKind::Application, + e.to_string(), + search_start.elapsed().as_millis() as u64, + )); + return fail_from_last(&mut substeps, start); + } }; + // Populate chunk text for each hit + let mut hits = hits; + for hit in &mut hits { + if let (Some(s), Some(e)) = (hit.start_line, hit.end_line) { + match read_lines(&path.join(&hit.filename), s, e) { + Some(text) => hit.chunk_text = Some(text), + None => warn!( + file = %hit.filename, + "could not read chunk text (file may have changed since build)" + ), + } + } + } + let model_name = emb_config.name.clone(); Step { substeps, @@ -231,11 +341,7 @@ pub async fn run( } } -fn fail_from_last( - substeps: &mut Vec<Step>, - start: Instant, - skipped: usize, -) -> Step { +fn fail_from_last(substeps: &mut Vec<Step>, start: Instant) -> Step { let msg = match substeps.iter().rev().find_map(|s| match &s.outcome { StepOutcome::Complete { result: Err(e), .. 
} => Some(e.message.clone()), _ => None, @@ -243,12 +349,6 @@ fn fail_from_last( Some(m) => m, None => "step failed".into(), }; - for _ in 0..skipped { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }); - } Step { substeps: std::mem::take(substeps), outcome: StepOutcome::Complete { @@ -266,14 +366,7 @@ fn fail_msg( start: Instant, kind: ErrorKind, msg: &str, - skipped: usize, ) -> Step { - for _ in 0..skipped { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }); - } Step { substeps: std::mem::take(substeps), outcome: StepOutcome::Complete { @@ -364,14 +457,14 @@ mod tests { let step = crate::cmd::init::run(dir, "**", false, false, true, false, false); assert!(!crate::step::has_failed(&step)); let output = crate::cmd::build::run(dir, None, None, None, false, true, false).await; - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); } #[tokio::test] async fn missing_config() { let tmp = tempfile::tempdir().unwrap(); let output = run(tmp.path(), "test query", 10, None, true, true, false).await; - assert!(output.has_failed_step()); + assert!(crate::step::has_failed(&output)); } #[tokio::test] @@ -380,7 +473,7 @@ mod tests { write_config(tmp.path(), "test-model"); let output = run(tmp.path(), "test query", 10, None, true, true, false).await; - assert!(output.has_failed_step()); + assert!(crate::step::has_failed(&output)); let err = unwrap_error(&output); assert!(err.message.contains("index not found")); } @@ -392,7 +485,11 @@ mod tests { init_and_build(tmp.path()).await; let output = run(tmp.path(), "rust programming", 10, None, true, true, false).await; - assert!(!output.has_failed_step(), "search failed: {:?}", output); + assert!( + !crate::step::has_failed(&output), + "search failed: {:?}", + output + ); let result = unwrap_search(&output); assert_eq!(result.query, "rust programming"); assert!(!result.model_name.is_empty()); @@ -410,7 +507,7 @@ mod tests { 
init_and_build(tmp.path()).await; let output = run(tmp.path(), "rust programming", 10, None, true, true, true).await; - assert!(!output.has_failed_step()); + assert!(!crate::step::has_failed(&output)); let result = unwrap_search(&output); assert!(!result.hits.is_empty()); assert!(result.hits[0].chunk_text.is_some()); @@ -482,7 +579,7 @@ mod tests { config.write(&tmp.path().join("mdvs.toml")).unwrap(); let output = run(tmp.path(), "test query", 10, None, true, true, false).await; - assert!(output.has_failed_step()); + assert!(crate::step::has_failed(&output)); let err = unwrap_error(&output); assert!(err.message.contains("model mismatch")); } @@ -503,7 +600,7 @@ mod tests { false, ) .await; - assert!(output.has_failed_step()); + assert!(crate::step::has_failed(&output)); let err = unwrap_error(&output); assert!(err.message.contains("unmatched single quote")); } @@ -515,7 +612,7 @@ mod tests { init_and_build(tmp.path()).await; let output = run(tmp.path(), "test", 10, Some("x = \"bad"), true, true, false).await; - assert!(output.has_failed_step()); + assert!(crate::step::has_failed(&output)); let err = unwrap_error(&output); assert!(err.message.contains("unmatched double quote")); } @@ -536,7 +633,7 @@ mod tests { false, ) .await; - assert!(output.has_failed_step()); + assert!(crate::step::has_failed(&output)); } #[tokio::test] diff --git a/src/cmd/update.rs b/src/cmd/update.rs index bf7863d..dad2518 100644 --- a/src/cmd/update.rs +++ b/src/cmd/update.rs @@ -1,9 +1,11 @@ +use crate::discover::infer::InferredSchema; +use crate::discover::scan::ScannedFiles; use crate::outcome::commands::UpdateOutcome; use crate::outcome::{InferOutcome, Outcome, ReadConfigOutcome, ScanOutcome, WriteConfigOutcome}; use crate::output::{ChangedField, FieldChange, RemovedField}; -use crate::schema::config::TomlField; +use crate::schema::config::{MdvsToml, TomlField}; use crate::schema::shared::FieldTypeSerde; -use crate::step::{from_pipeline_result, ErrorKind, Step, StepError, StepOutcome}; 
+use crate::step::{ErrorKind, Step, StepError, StepOutcome}; use std::collections::HashMap; use std::path::Path; use std::time::Instant; @@ -19,10 +21,6 @@ pub async fn run( dry_run: bool, _verbose: bool, ) -> Step { - use crate::pipeline::infer::run_infer; - use crate::pipeline::read_config::run_read_config; - use crate::pipeline::scan::run_scan; - let start = Instant::now(); let mut substeps = Vec::new(); @@ -33,21 +31,40 @@ pub async fn run( start, ErrorKind::User, "cannot use --reinfer and --reinfer-all together".into(), - 4, ); } - // 1. Read config - let (read_config_result, config) = run_read_config(path); - substeps.push(from_pipeline_result(read_config_result, |o| { - Outcome::ReadConfig(ReadConfigOutcome { - config_path: o.config_path.clone(), - }) - })); - - let mut config = match config { - Some(c) => c, - None => return fail_from_last_substep(&mut substeps, start, 3), + // 1. Read config — MdvsToml::read() + validate() directly + let config_start = Instant::now(); + let config_path_buf = path.join("mdvs.toml"); + let mut config = match MdvsToml::read(&config_path_buf) { + Ok(cfg) => match cfg.validate() { + Ok(()) => { + substeps.push(Step::leaf( + Outcome::ReadConfig(ReadConfigOutcome { + config_path: config_path_buf.display().to_string(), + }), + config_start.elapsed().as_millis() as u64, + )); + cfg + } + Err(e) => { + substeps.push(Step::failed( + ErrorKind::User, + format!("mdvs.toml is invalid: {e} — fix the file or run 'mdvs init --force'"), + config_start.elapsed().as_millis() as u64, + )); + return fail_from_last_substep(&mut substeps, start); + } + }, + Err(e) => { + substeps.push(Step::failed( + ErrorKind::User, + e.to_string(), + config_start.elapsed().as_millis() as u64, + )); + return fail_from_last_substep(&mut substeps, start); + } }; // Pre-check: reinfer field names exist @@ -58,37 +75,42 @@ pub async fn run( start, ErrorKind::User, format!("field '{name}' is not in mdvs.toml"), - 3, ); } } - // 2. 
Scan - let (scan_result, scanned) = run_scan(path, &config.scan); - substeps.push(from_pipeline_result(scan_result, |o| { - Outcome::Scan(ScanOutcome { - files_found: o.files_found, - glob: o.glob.clone(), - }) - })); - - let scanned = match scanned { - Some(s) => s, - None => return fail_from_last_substep(&mut substeps, start, 2), + // 2. Scan — ScannedFiles::scan() directly + let scan_start = Instant::now(); + let scanned = match ScannedFiles::scan(path, &config.scan) { + Ok(s) => { + substeps.push(Step::leaf( + Outcome::Scan(ScanOutcome { + files_found: s.files.len(), + glob: config.scan.glob.clone(), + }), + scan_start.elapsed().as_millis() as u64, + )); + s + } + Err(e) => { + substeps.push(Step::failed( + ErrorKind::Application, + e.to_string(), + scan_start.elapsed().as_millis() as u64, + )); + return fail_from_last_substep(&mut substeps, start); + } }; - // 3. Infer - let (infer_result, schema) = run_infer(&scanned); - substeps.push(from_pipeline_result(infer_result, |o| { + // 3. Infer — InferredSchema::infer() is infallible + let infer_start = Instant::now(); + let schema = InferredSchema::infer(&scanned); + substeps.push(Step::leaf( Outcome::Infer(InferOutcome { - fields_inferred: o.fields_inferred, - }) - })); - - let schema = match schema { - Some(s) => s, - None => return fail_from_last_substep(&mut substeps, start, 1), - }; + fields_inferred: schema.fields.len(), + }), + infer_start.elapsed().as_millis() as u64, + )); let total_files = scanned.files.len(); @@ -194,53 +216,40 @@ pub async fn run( // 4. 
Write config (Skipped if dry_run or no changes) if dry_run || !has_changes { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }); + substeps.push(Step::skipped()); } else { let write_start = Instant::now(); - let config_path = path.join("mdvs.toml"); + let write_path = path.join("mdvs.toml"); config.fields.field = new_fields; - if let Err(e) = config.write(&config_path) { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::Application, - message: e.to_string(), + match config.write(&write_path) { + Ok(()) => { + substeps.push(Step::leaf( + Outcome::WriteConfig(WriteConfigOutcome { + config_path: write_path.display().to_string(), + fields_written: config.fields.field.len(), }), - elapsed_ms: write_start.elapsed().as_millis() as u64, - }, - }); - return Step { - substeps, - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::Application, - message: "failed to write config".into(), - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }, - }; + write_start.elapsed().as_millis() as u64, + )); + } + Err(e) => { + substeps.push(Step::failed( + ErrorKind::Application, + e.to_string(), + write_start.elapsed().as_millis() as u64, + )); + return Step { + substeps, + outcome: StepOutcome::Complete { + result: Err(StepError { + kind: ErrorKind::Application, + message: "failed to write config".into(), + }), + elapsed_ms: start.elapsed().as_millis() as u64, + }, + }; + } } - - substeps.push(from_pipeline_result( - crate::pipeline::ProcessingStepResult::Completed(crate::pipeline::ProcessingStep { - elapsed_ms: write_start.elapsed().as_millis() as u64, - output: crate::pipeline::write_config::WriteConfigOutput { - config_path: config_path.display().to_string(), - fields_written: config.fields.field.len(), - }, - }), - |o| { - Outcome::WriteConfig(WriteConfigOutcome { - config_path: o.config_path.clone(), - fields_written: o.fields_written, - }) - }, - 
)); } Step { @@ -260,18 +269,11 @@ pub async fn run( } fn fail_early( - mut substeps: Vec<Step>, + substeps: Vec<Step>, start: Instant, kind: ErrorKind, message: String, - skipped_count: usize, ) -> Step { - for _ in 0..skipped_count { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }); - } Step { substeps, outcome: StepOutcome::Complete { @@ -281,21 +283,11 @@ fn fail_early( } } -fn fail_from_last_substep( - substeps: &mut Vec<Step>, - start: Instant, - skipped_count: usize, -) -> Step { +fn fail_from_last_substep(substeps: &mut Vec<Step>, start: Instant) -> Step { let msg = match substeps.last().map(|s| &s.outcome) { Some(StepOutcome::Complete { result: Err(e), .. }) => e.message.clone(), _ => "step failed".into(), }; - for _ in 0..skipped_count { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }); - } Step { substeps: std::mem::take(substeps), outcome: StepOutcome::Complete { diff --git a/src/step.rs b/src/step.rs index 7ab014f..fcc12d2 100644 --- a/src/step.rs +++ b/src/step.rs @@ -136,12 +136,6 @@ impl Step { outcome: StepOutcome::Skipped, } } - - /// Compatibility method during migration — delegates to `has_failed()` free function. - /// TODO: remove after all commands are converted (TODO-0131 wave 7). - pub fn has_failed_step(&self) -> bool { - has_failed(self) - } } // --- Free functions --- @@ -164,103 +158,6 @@ pub fn has_violations(step: &Step) -> bool { } } -// --- Pipeline migration helper (temporary, deleted in TODO-0131) --- - -/// Convert an old `ProcessingStepResult` into a new `Step`. -/// -/// This is migration glue: during the transition, commands still call old -/// `run_*()` pipeline functions that return `ProcessingStepResult`. This -/// generic helper converts the result into a `Step` leaf node, -/// using the provided closure to map the output to an `Outcome` variant. -/// -/// Deleted when the old pipeline modules are removed (TODO-0131).
-pub fn from_pipeline_result<T, F>( - result: crate::pipeline::ProcessingStepResult<T>, - to_outcome: F, -) -> Step -where - F: FnOnce(&T) -> Outcome, -{ - match result { - crate::pipeline::ProcessingStepResult::Completed(step) => Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Ok(to_outcome(&step.output)), - elapsed_ms: step.elapsed_ms, - }, - }, - crate::pipeline::ProcessingStepResult::Failed(err) => Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: convert_error_kind(err.kind), - message: err.message, - }), - elapsed_ms: 0, - }, - }, - crate::pipeline::ProcessingStepResult::Skipped => Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }, - } -} - -/// Like [`from_pipeline_result`], but also returns the raw output data -/// (needed when subsequent steps consume data from a prior step). -pub fn from_pipeline_result_with_data<T, F>( - result: crate::pipeline::ProcessingStepResult<T>, - to_outcome: F, -) -> (Step, Option<T>) -where - F: FnOnce(&T) -> Outcome, -{ - match result { - crate::pipeline::ProcessingStepResult::Completed(step) => { - let outcome = to_outcome(&step.output); - ( - Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Ok(outcome), - elapsed_ms: step.elapsed_ms, - }, - }, - Some(step.output), - ) - } - crate::pipeline::ProcessingStepResult::Failed(err) => ( - Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: convert_error_kind(err.kind), - message: err.message, - }), - elapsed_ms: 0, - }, - }, - None, - ), - crate::pipeline::ProcessingStepResult::Skipped => ( - Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }, - None, - ), - } -} - -/// Convert old pipeline ErrorKind to new step ErrorKind. -/// Temporary migration helper (deleted in TODO-0131).
-pub fn convert_error_kind(kind: crate::pipeline::ErrorKind) -> ErrorKind { - match kind { - crate::pipeline::ErrorKind::User => ErrorKind::User, - crate::pipeline::ErrorKind::Application => ErrorKind::Application, - } -} - // --- Serialize impls (hand-written, not derived) --- impl Serialize for Step { From 96f0974268187767ace4327ccdef2592952c6625 Mon Sep 17 00:00:00 2001 From: edoch Date: Fri, 20 Mar 2026 19:00:37 +0100 Subject: [PATCH 08/35] refactor: delete src/pipeline/ directory MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit All pipeline wrapper modules removed — no command imports from pipeline anymore. Deletes 12 files (1,318 lines): mod.rs, classify.rs, embed.rs, execute_search.rs, infer.rs, load_model.rs, read_config.rs, read_index.rs, scan.rs, validate.rs, write_config.rs, write_index.rs. Co-Authored-By: Claude --- src/lib.rs | 2 - src/pipeline/classify.rs | 378 --------------------------------- src/pipeline/embed.rs | 142 ------------- src/pipeline/execute_search.rs | 88 -------- src/pipeline/infer.rs | 43 ---- src/pipeline/load_model.rs | 65 ------ src/pipeline/mod.rs | 181 ---------------- src/pipeline/read_config.rs | 60 ------ src/pipeline/read_index.rs | 110 ---------- src/pipeline/scan.rs | 57 ----- src/pipeline/validate.rs | 74 ------- src/pipeline/write_config.rs | 60 ------ src/pipeline/write_index.rs | 60 ------ 13 files changed, 1320 deletions(-) delete mode 100644 src/pipeline/classify.rs delete mode 100644 src/pipeline/embed.rs delete mode 100644 src/pipeline/execute_search.rs delete mode 100644 src/pipeline/infer.rs delete mode 100644 src/pipeline/load_model.rs delete mode 100644 src/pipeline/mod.rs delete mode 100644 src/pipeline/read_config.rs delete mode 100644 src/pipeline/read_index.rs delete mode 100644 src/pipeline/scan.rs delete mode 100644 src/pipeline/validate.rs delete mode 100644 src/pipeline/write_config.rs delete mode 100644 src/pipeline/write_index.rs diff --git a/src/lib.rs 
b/src/lib.rs index 706ccf4..2acf9b9 100644 --- a/src/lib.rs +++ b/src/lib.rs @@ -17,8 +17,6 @@ pub mod index; pub mod outcome; /// Output formatting types and the `CommandOutput` trait. pub mod output; -/// Core pipeline abstractions for structured command output. -pub mod pipeline; /// Shared formatters (`format_text`, `format_markdown`) that consume `Vec`. pub mod render; /// Configuration file types (`mdvs.toml`) and shared data structures. diff --git a/src/pipeline/classify.rs b/src/pipeline/classify.rs deleted file mode 100644 index 052bf18..0000000 --- a/src/pipeline/classify.rs +++ /dev/null @@ -1,378 +0,0 @@ -//! Classify step — classifies files as new/edited/unchanged/removed for incremental builds. - -use serde::Serialize; -use std::collections::{HashMap, HashSet}; -use std::time::Instant; - -use crate::discover::scan::{ScannedFile, ScannedFiles}; -use crate::index::storage::{content_hash, ChunkRow, FileIndexEntry}; -use crate::output::format_file_count; -use crate::pipeline::{ProcessingStep, ProcessingStepResult, StepOutput}; - -use crate::output::BuildFileDetail; - -/// Output record for the classify step. -#[derive(Debug, Serialize)] -pub struct ClassifyOutput { - /// Whether this is a full rebuild (force or first build). - pub full_rebuild: bool, - /// Number of files that need embedding (new + edited). - pub needs_embedding: usize, - /// Number of files unchanged from previous build. - pub unchanged: usize, - /// Number of files removed since previous build. - pub removed: usize, -} - -impl StepOutput for ClassifyOutput { - fn format_line(&self) -> String { - if self.full_rebuild { - format!("{} (full rebuild)", format_file_count(self.needs_embedding)) - } else { - format!( - "{} ({} to embed, {} unchanged, {} removed)", - format_file_count(self.needs_embedding + self.unchanged), - self.needs_embedding, - self.unchanged, - self.removed - ) - } - } -} - -/// A file that needs chunking and embedding. 
-pub(crate) struct FileToEmbed<'a> { - /// Unique file identifier (preserved for edited files, new UUID for new files). - pub file_id: String, - /// Reference to the scanned file data. - pub scanned: &'a ScannedFile, -} - -/// Data produced by classification, carried forward to embed and write_index steps. -pub(crate) struct ClassifyData<'a> { - /// Whether this is a full rebuild. - pub full_rebuild: bool, - /// Files that need chunking + embedding (new or edited). - pub needs_embedding: Vec<FileToEmbed<'a>>, - /// Maps filename → file_id for ALL current files (new, edited, unchanged). - pub file_id_map: HashMap<String, String>, - /// Chunks retained from unchanged files. - pub retained_chunks: Vec<ChunkRow>, - /// Number of files removed since previous build. - pub removed_count: usize, - /// Number of chunks dropped from removed files. - pub chunks_removed: usize, - /// Per-file chunk counts for removed files (for verbose output). - pub removed_details: Vec<BuildFileDetail>, -} - -/// Classify files for incremental build. -/// -/// For full rebuilds (`force` or no existing index), all scanned files go to -/// `needs_embedding`. For incremental builds, files are compared against the -/// existing index by content hash. -/// -/// Returns the step result and the classification data for subsequent steps.
-pub(crate) fn run_classify<'a>( - scanned: &'a ScannedFiles, - existing_index: &[FileIndexEntry], - existing_chunks: Vec<ChunkRow>, - full_rebuild: bool, -) -> ( - ProcessingStepResult<ClassifyOutput>, - Option<ClassifyData<'a>>, -) { - let start = Instant::now(); - - if full_rebuild { - // Full rebuild: all files need embedding, no retained chunks - let mut file_id_map = HashMap::new(); - let needs_embedding: Vec<FileToEmbed<'a>> = scanned - .files - .iter() - .map(|f| { - let file_id = uuid::Uuid::new_v4().to_string(); - let filename = f.path.display().to_string(); - file_id_map.insert(filename, file_id.clone()); - FileToEmbed { - file_id, - scanned: f, - } - }) - .collect(); - - let count = needs_embedding.len(); - let step = ProcessingStep { - elapsed_ms: start.elapsed().as_millis() as u64, - output: ClassifyOutput { - full_rebuild: true, - needs_embedding: count, - unchanged: 0, - removed: 0, - }, - }; - let data = ClassifyData { - full_rebuild: true, - needs_embedding, - file_id_map, - retained_chunks: vec![], - removed_count: 0, - chunks_removed: 0, - removed_details: vec![], - }; - (ProcessingStepResult::Completed(step), Some(data)) - } else { - // Incremental: classify by comparing content hashes - let classification = classify_files(scanned, existing_index); - - // Count removed chunks and build removed file details - let mut removed_chunk_counts: HashMap<&str, usize> = HashMap::new(); - for c in &existing_chunks { - if classification.removed_file_ids.contains(&c.file_id) { - *removed_chunk_counts.entry(c.file_id.as_str()).or_default() += 1; - } - } - let chunks_removed: usize = removed_chunk_counts.values().sum(); - - // Build removed file details (map file_id back to filename) - let filename_to_id: HashMap<&str, &str> = existing_index - .iter() - .map(|e| (e.filename.as_str(), e.file_id.as_str())) - .collect(); - let mut removed_details = Vec::new(); - for filename in &classification.removed_filenames { - let file_id = filename_to_id.get(filename.as_str()).copied().unwrap_or(""); - let chunk_count =
removed_chunk_counts.get(file_id).copied().unwrap_or(0); - removed_details.push(BuildFileDetail { - filename: filename.clone(), - chunks: chunk_count, - }); - } - - // Retain chunks from unchanged files - let retained_chunks: Vec<ChunkRow> = existing_chunks - .into_iter() - .filter(|c| classification.unchanged_file_ids.contains(&c.file_id)) - .collect(); - - let needs_count = classification.needs_embedding.len(); - let unchanged_count = classification.unchanged_file_ids.len(); - let removed_count = classification.removed_count; - - let step = ProcessingStep { - elapsed_ms: start.elapsed().as_millis() as u64, - output: ClassifyOutput { - full_rebuild: false, - needs_embedding: needs_count, - unchanged: unchanged_count, - removed: removed_count, - }, - }; - let data = ClassifyData { - full_rebuild: false, - needs_embedding: classification.needs_embedding, - file_id_map: classification.file_id_map, - retained_chunks, - removed_count, - chunks_removed, - removed_details, - }; - (ProcessingStepResult::Completed(step), Some(data)) - } -} - -struct FileClassification<'a> { - /// Files that need chunking + embedding (new or edited). - needs_embedding: Vec<FileToEmbed<'a>>, - /// Maps filename → file_id for ALL current files (new, edited, unchanged). - file_id_map: HashMap<String, String>, - /// file_ids whose existing chunks should be retained. - unchanged_file_ids: HashSet<String>, - /// Number of files in the old index that no longer exist. - removed_count: usize, - /// file_ids of removed files (for chunk counting). - removed_file_ids: HashSet<String>, - /// Filenames of removed files (for verbose output).
- removed_filenames: Vec<String>, -} - -fn classify_files<'a>( - scanned: &'a ScannedFiles, - existing_index: &[FileIndexEntry], -) -> FileClassification<'a> { - let existing: HashMap<&str, (&str, &str)> = existing_index - .iter() - .map(|e| { - ( - e.filename.as_str(), - (e.file_id.as_str(), e.content_hash.as_str()), - ) - }) - .collect(); - - let mut needs_embedding = Vec::new(); - let mut file_id_map = HashMap::new(); - let mut unchanged_file_ids = HashSet::new(); - let mut seen_filenames = HashSet::new(); - - for file in &scanned.files { - let filename = file.path.display().to_string(); - let hash = content_hash(&file.content); - - if let Some(&(old_id, old_hash)) = existing.get(filename.as_str()) { - seen_filenames.insert(filename.clone()); - if hash == old_hash { - // Unchanged — keep existing chunks - file_id_map.insert(filename, old_id.to_string()); - unchanged_file_ids.insert(old_id.to_string()); - } else { - // Edited — re-embed, keep file_id - let file_id = old_id.to_string(); - file_id_map.insert(filename, file_id.clone()); - needs_embedding.push(FileToEmbed { - file_id, - scanned: file, - }); - } - } else { - // New file - let file_id = uuid::Uuid::new_v4().to_string(); - file_id_map.insert(filename, file_id.clone()); - needs_embedding.push(FileToEmbed { - file_id, - scanned: file, - }); - } - } - - let mut removed_file_ids = HashSet::new(); - let mut removed_filenames = Vec::new(); - for entry in existing_index { - if !seen_filenames.contains(entry.filename.as_str()) { - removed_file_ids.insert(entry.file_id.clone()); - removed_filenames.push(entry.filename.clone()); - } - } - let removed_count = removed_filenames.len(); - - FileClassification { - needs_embedding, - file_id_map, - unchanged_file_ids, - removed_count, - removed_file_ids, - removed_filenames, - } -} - -#[cfg(test)] -mod tests { - use super::*; - use crate::discover::scan::ScannedFile; - - fn make_scanned_files(files: Vec<(&str, &str)>) -> ScannedFiles { - ScannedFiles { - files: files -
.into_iter() - .map(|(path, body)| ScannedFile { - path: std::path::PathBuf::from(path), - data: None, - content: body.to_string(), - }) - .collect(), - } - } - - #[test] - fn classify_all_new() { - let scanned = make_scanned_files(vec![("a.md", "hello"), ("b.md", "world")]); - let existing: Vec<FileIndexEntry> = vec![]; - let c = classify_files(&scanned, &existing); - - assert_eq!(c.needs_embedding.len(), 2); - assert_eq!(c.unchanged_file_ids.len(), 0); - assert_eq!(c.removed_count, 0); - assert_eq!(c.file_id_map.len(), 2); - } - - #[test] - fn classify_all_unchanged() { - let scanned = make_scanned_files(vec![("a.md", "hello"), ("b.md", "world")]); - let existing = vec![ - FileIndexEntry { - file_id: "f1".into(), - filename: "a.md".into(), - content_hash: content_hash("hello"), - }, - FileIndexEntry { - file_id: "f2".into(), - filename: "b.md".into(), - content_hash: content_hash("world"), - }, - ]; - let c = classify_files(&scanned, &existing); - - assert_eq!(c.needs_embedding.len(), 0); - assert_eq!(c.unchanged_file_ids.len(), 2); - assert!(c.unchanged_file_ids.contains("f1")); - assert!(c.unchanged_file_ids.contains("f2")); - assert_eq!(c.removed_count, 0); - assert_eq!(c.file_id_map["a.md"], "f1"); - assert_eq!(c.file_id_map["b.md"], "f2"); - } - - #[test] - fn classify_mixed() { - // a.md: unchanged, b.md: edited, c.md: new, d.md: removed - let scanned = make_scanned_files(vec![ - ("a.md", "same content"), - ("b.md", "new body"), - ("c.md", "brand new"), - ]); - let existing = vec![ - FileIndexEntry { - file_id: "f1".into(), - filename: "a.md".into(), - content_hash: content_hash("same content"), - }, - FileIndexEntry { - file_id: "f2".into(), - filename: "b.md".into(), - content_hash: content_hash("old body"), - }, - FileIndexEntry { - file_id: "f3".into(), - filename: "d.md".into(), - content_hash: content_hash("deleted"), - }, - ]; - let c = classify_files(&scanned, &existing); - - // a.md unchanged - assert!(c.unchanged_file_ids.contains("f1")); -
assert_eq!(c.file_id_map["a.md"], "f1"); - - // b.md edited — needs embedding, keeps file_id - assert_eq!(c.needs_embedding.len(), 2); // b.md + c.md - let edited = c - .needs_embedding - .iter() - .find(|f| f.scanned.path.to_str() == Some("b.md")) - .unwrap(); - assert_eq!(edited.file_id, "f2"); - - // c.md new — needs embedding, new UUID - let new = c - .needs_embedding - .iter() - .find(|f| f.scanned.path.to_str() == Some("c.md")) - .unwrap(); - assert_ne!(new.file_id, "f1"); - assert_ne!(new.file_id, "f2"); - assert_ne!(new.file_id, "f3"); - - // d.md removed - assert_eq!(c.removed_count, 1); - assert!(!c.file_id_map.contains_key("d.md")); - } -} diff --git a/src/pipeline/embed.rs b/src/pipeline/embed.rs deleted file mode 100644 index d38a7fe..0000000 --- a/src/pipeline/embed.rs +++ /dev/null @@ -1,142 +0,0 @@ -//! Embed step — embeds queries and files using the loaded model. - -use serde::Serialize; -use std::time::Instant; - -use crate::discover::scan::ScannedFile; -use crate::index::chunk::{extract_plain_text, Chunks}; -use crate::index::embed::Embedder; -use crate::index::storage::ChunkRow; -use crate::output::format_file_count; -use crate::output::BuildFileDetail; -use crate::pipeline::classify::FileToEmbed; -use crate::pipeline::{ProcessingStep, ProcessingStepResult, StepOutput}; - -/// Output record for the embed query step. -#[derive(Debug, Serialize)] -pub struct EmbedQueryOutput { - /// The query that was embedded. - pub query: String, -} - -impl StepOutput for EmbedQueryOutput { - fn format_line(&self) -> String { - format!("\"{}\"", self.query) - } -} - -/// Embed a query string using the loaded model. -/// -/// Returns the step result and the query embedding vector. 
-pub async fn run_embed_query( - embedder: &Embedder, - query: &str, -) -> (ProcessingStepResult<EmbedQueryOutput>, Option<Vec<f32>>) { - let start = Instant::now(); - let embedding = embedder.embed(query).await; - let step = ProcessingStep { - elapsed_ms: start.elapsed().as_millis() as u64, - output: EmbedQueryOutput { - query: query.to_string(), - }, - }; - (ProcessingStepResult::Completed(step), Some(embedding)) -} - -/// Output record for the embed files step. -#[derive(Debug, Serialize)] -pub struct EmbedFilesOutput { - /// Number of files that were embedded. - pub files_embedded: usize, - /// Total number of chunks produced. - pub chunks_produced: usize, -} - -impl StepOutput for EmbedFilesOutput { - fn format_line(&self) -> String { - format!( - "{} ({} chunks)", - format_file_count(self.files_embedded), - self.chunks_produced - ) - } -} - -/// Data produced by the embed files step. -pub(crate) struct EmbedFilesData { - /// Chunk rows for newly embedded files. - pub chunk_rows: Vec<ChunkRow>, - /// Per-file chunk counts (for verbose output). - pub details: Vec<BuildFileDetail>, -} - -/// Embed a batch of files: chunk, extract plain text, embed, produce rows. -/// -/// Returns the step result and the embed data for the write_index step.
-pub(crate) async fn run_embed_files( - files: &[FileToEmbed<'_>], - embedder: &Embedder, - max_chunk_size: usize, -) -> ( - ProcessingStepResult<EmbedFilesOutput>, - Option<EmbedFilesData>, -) { - let start = Instant::now(); - let mut chunk_rows = Vec::new(); - let mut details = Vec::new(); - - for fte in files { - let crs = embed_file(&fte.file_id, fte.scanned, max_chunk_size, embedder).await; - details.push(BuildFileDetail { - filename: fte.scanned.path.display().to_string(), - chunks: crs.len(), - }); - chunk_rows.extend(crs); - } - - let step = ProcessingStep { - elapsed_ms: start.elapsed().as_millis() as u64, - output: EmbedFilesOutput { - files_embedded: files.len(), - chunks_produced: chunk_rows.len(), - }, - }; - let data = EmbedFilesData { - chunk_rows, - details, - }; - (ProcessingStepResult::Completed(step), Some(data)) -} - -/// Chunk, extract plain text, embed, and produce chunk rows for a single file. -async fn embed_file( - file_id: &str, - file: &ScannedFile, - max_chunk_size: usize, - embedder: &Embedder, -) -> Vec<ChunkRow> { - let chunks = Chunks::new(&file.content, max_chunk_size); - let plain_texts: Vec<String> = chunks - .iter() - .map(|c| extract_plain_text(&c.plain_text)) - .collect(); - let text_refs: Vec<&str> = plain_texts.iter().map(|s| s.as_str()).collect(); - let embeddings = if text_refs.is_empty() { - vec![] - } else { - embedder.embed_batch(&text_refs).await - }; - - chunks - .iter() - .zip(embeddings) - .map(|(chunk, embedding)| ChunkRow { - chunk_id: uuid::Uuid::new_v4().to_string(), - file_id: file_id.to_string(), - chunk_index: chunk.chunk_index as i32, - start_line: chunk.start_line as i32, - end_line: chunk.end_line as i32, - embedding, - }) - .collect() -} diff --git a/src/pipeline/execute_search.rs b/src/pipeline/execute_search.rs deleted file mode 100644 index 1ba76c2..0000000 --- a/src/pipeline/execute_search.rs +++ /dev/null @@ -1,88 +0,0 @@ -//! Execute search step — runs the search query via backend.
- -use serde::Serialize; -use std::time::Instant; - -use crate::index::backend::{Backend, SearchHit}; -use crate::pipeline::{ - ErrorKind, ProcessingStep, ProcessingStepError, ProcessingStepResult, StepOutput, -}; - -/// Output record for the execute search step. -#[derive(Debug, Serialize)] -pub struct ExecuteSearchOutput { - /// Number of search hits returned. - pub hits: usize, -} - -impl StepOutput for ExecuteSearchOutput { - fn format_line(&self) -> String { - let word = if self.hits == 1 { "hit" } else { "hits" }; - format!("{} {word}", self.hits) - } -} - -/// Execute a search query against the index. -/// -/// Validates the `--where` clause (quote parity) before running the SQL. -/// Returns the step result and the search hits. -pub(crate) async fn run_execute_search( - backend: &Backend, - query_embedding: Vec<f32>, - where_clause: Option<&str>, - limit: usize, - internal_prefix: &str, - aliases: &std::collections::HashMap<String, String>, -) -> ( - ProcessingStepResult<ExecuteSearchOutput>, - Option<Vec<SearchHit>>, -) { - // Validate --where clause: unmatched quotes indicate unescaped special characters - if let Some(w) = where_clause { - if w.chars().filter(|&c| c == '\'').count() % 2 != 0 { - let err = ProcessingStepError { - kind: ErrorKind::User, - message: - "unmatched single quote in --where clause — escape with '' (e.g. O''Brien)" - .to_string(), - }; - return (ProcessingStepResult::Failed(err), None); - } - if w.chars().filter(|&c| c == '"').count() % 2 != 0 { - let err = ProcessingStepError { - kind: ErrorKind::User, - message: - "unmatched double quote in --where clause — escape with \"\" (e.g.
\"\"field\"\")" - .to_string(), - }; - return (ProcessingStepResult::Failed(err), None); - } - } - - let start = Instant::now(); - match backend - .search( - query_embedding, - where_clause, - limit, - internal_prefix, - aliases, - ) - .await - { - Ok(hits) => { - let step = ProcessingStep { - elapsed_ms: start.elapsed().as_millis() as u64, - output: ExecuteSearchOutput { hits: hits.len() }, - }; - (ProcessingStepResult::Completed(step), Some(hits)) - } - Err(e) => { - let err = ProcessingStepError { - kind: ErrorKind::Application, - message: e.to_string(), - }; - (ProcessingStepResult::Failed(err), None) - } - } -} diff --git a/src/pipeline/infer.rs b/src/pipeline/infer.rs deleted file mode 100644 index 18ebf20..0000000 --- a/src/pipeline/infer.rs +++ /dev/null @@ -1,43 +0,0 @@ -//! Infer step — infers field types and glob patterns from scanned files. - -use serde::Serialize; -use std::time::Instant; - -use crate::discover::infer::InferredSchema; -use crate::discover::scan::ScannedFiles; -use crate::pipeline::{ProcessingStep, ProcessingStepResult, StepOutput}; - -/// Output record for the infer step. -#[derive(Debug, Serialize)] -pub struct InferOutput { - /// Number of fields inferred from frontmatter. - pub fields_inferred: usize, -} - -impl StepOutput for InferOutput { - fn format_line(&self) -> String { - if self.fields_inferred == 0 { - "no fields found".to_string() - } else { - format!("{} field(s)", self.fields_inferred) - } - } -} - -/// Infer field types and glob patterns from scanned files. -/// -/// Inference is infallible (pure computation), so always returns Completed. -/// Returns the step result and the inferred schema for subsequent steps. 
-pub fn run_infer( - scanned: &ScannedFiles, -) -> (ProcessingStepResult<InferOutput>, Option<InferredSchema>) { - let start = Instant::now(); - let schema = InferredSchema::infer(scanned); - let step = ProcessingStep { - elapsed_ms: start.elapsed().as_millis() as u64, - output: InferOutput { - fields_inferred: schema.fields.len(), - }, - }; - (ProcessingStepResult::Completed(step), Some(schema)) -} diff --git a/src/pipeline/load_model.rs b/src/pipeline/load_model.rs deleted file mode 100644 index 7bf962f..0000000 --- a/src/pipeline/load_model.rs +++ /dev/null @@ -1,65 +0,0 @@ -//! Load model step — loads the embedding model. - -use serde::Serialize; -use std::time::Instant; - -use crate::index::embed::{Embedder, ModelConfig}; -use crate::pipeline::{ - ErrorKind, ProcessingStep, ProcessingStepError, ProcessingStepResult, StepOutput, -}; -use crate::schema::shared::EmbeddingModelConfig; - -/// Output record for the load model step. -#[derive(Debug, Serialize)] -pub struct LoadModelOutput { - /// Name of the loaded model. - pub model_name: String, - /// Embedding dimension. - pub dimension: usize, -} - -impl StepOutput for LoadModelOutput { - fn format_line(&self) -> String { - format!("\"{}\" ({}d)", self.model_name, self.dimension) - } -} - -/// Load the embedding model from config. -/// -/// Returns the step result and the loaded embedder (for subsequent steps).
-pub fn run_load_model( - embedding: &EmbeddingModelConfig, -) -> (ProcessingStepResult<LoadModelOutput>, Option<Embedder>) { - let start = Instant::now(); - - let model_config = match ModelConfig::try_from(embedding) { - Ok(mc) => mc, - Err(e) => { - let err = ProcessingStepError { - kind: ErrorKind::Application, - message: e.to_string(), - }; - return (ProcessingStepResult::Failed(err), None); - } - }; - - match Embedder::load(&model_config) { - Ok(embedder) => { - let step = ProcessingStep { - elapsed_ms: start.elapsed().as_millis() as u64, - output: LoadModelOutput { - model_name: embedding.name.clone(), - dimension: embedder.dimension(), - }, - }; - (ProcessingStepResult::Completed(step), Some(embedder)) - } - Err(e) => { - let err = ProcessingStepError { - kind: ErrorKind::Application, - message: e.to_string(), - }; - (ProcessingStepResult::Failed(err), None) - } - } -} diff --git a/src/pipeline/mod.rs b/src/pipeline/mod.rs deleted file mode 100644 index 97e1ae0..0000000 --- a/src/pipeline/mod.rs +++ /dev/null @@ -1,181 +0,0 @@ -//! Core pipeline abstractions for structured command output. -//! -//! Every command is a flat sequence of processing steps, each producing a typed -//! result. These types are the building blocks that step modules and commands -//! compose. - -pub mod classify; -pub mod embed; -pub mod execute_search; -pub mod infer; -pub mod load_model; -pub mod read_config; -pub mod read_index; -pub mod scan; -pub mod validate; -pub mod write_config; -pub mod write_index; - -use serde::Serialize; - -/// Trait for step output types — provides a one-liner text description. -/// -/// Each step output struct implements this to describe its result in a single -/// line, e.g. "Scanned 5 files" or "Loaded model minishlab/potion-base-8M". -pub trait StepOutput { - /// One-liner description of the step's result. - fn format_line(&self) -> String; -} - -/// A record of a single pipeline step's execution.
-#[derive(Debug, Serialize)] -pub struct ProcessingStep<T> { - /// Wall-clock time for this step in milliseconds. - pub elapsed_ms: u64, - /// Step-specific typed output. - pub output: T, -} - -/// The three states a step can be in. -/// -/// Serialized with a `status` discriminator field (internally tagged): -/// - `{ "status": "completed", "elapsed_ms": 42, "output": { ... } }` -/// - `{ "status": "failed", "kind": "user", "message": "..." }` -/// - `{ "status": "skipped" }` -#[derive(Debug, Serialize)] -#[serde(tag = "status", rename_all = "snake_case")] -pub enum ProcessingStepResult<T> { - /// The step ran and produced output. - Completed(ProcessingStep<T>), - /// The step ran but hit an actual error. - Failed(ProcessingStepError), - /// The step didn't run because a previous step's outcome made it unnecessary. - Skipped, -} - -impl<T: StepOutput> ProcessingStepResult<T> { - /// Render this step as a single line for text output. - /// - /// The `label` identifies the step (e.g. "Scan", "Load model") and is - /// prepended to the description: `"Scan: 43 files"`, `"Load model: skipped"`. - pub fn format_line(&self, label: &str) -> String { - match self { - Self::Completed(step) => format!("{label}: {}", step.output.format_line()), - Self::Failed(err) => format!("{label}: failed — {}", err.message), - Self::Skipped => format!("{label}: skipped"), - } - } -} - -/// An error that occurred during a processing step. -#[derive(Debug, Serialize)] -pub struct ProcessingStepError { - /// Whether this is a user error (bad input) or application error (internal failure). - pub kind: ErrorKind, - /// Human-readable error message. - pub message: String, -} - -/// Error categorization (HTTP analogy). -#[derive(Debug, Serialize)] -#[serde(rename_all = "snake_case")] -pub enum ErrorKind { - /// Bad input: config not found, model mismatch, invalid flags (4xx). - User, - /// Unexpected internal failure: I/O errors, parquet corruption (5xx).
- Application, -} - -#[cfg(test)] -mod tests { - use super::*; - - /// Dummy step output for testing. - #[derive(Debug, Serialize)] - struct DummyOutput { - files: usize, - } - - impl StepOutput for DummyOutput { - fn format_line(&self) -> String { - format!("{} files", self.files) - } - } - - #[test] - fn completed_json_shape() { - let result: ProcessingStepResult = - ProcessingStepResult::Completed(ProcessingStep { - elapsed_ms: 42, - output: DummyOutput { files: 5 }, - }); - let json = serde_json::to_value(&result).unwrap(); - assert_eq!(json["status"], "completed"); - assert_eq!(json["elapsed_ms"], 42); - assert_eq!(json["output"]["files"], 5); - } - - #[test] - fn failed_json_shape() { - let result: ProcessingStepResult = - ProcessingStepResult::Failed(ProcessingStepError { - kind: ErrorKind::User, - message: "config not found".into(), - }); - let json = serde_json::to_value(&result).unwrap(); - assert_eq!(json["status"], "failed"); - assert_eq!(json["kind"], "user"); - assert_eq!(json["message"], "config not found"); - } - - #[test] - fn failed_application_error_json_shape() { - let result: ProcessingStepResult = - ProcessingStepResult::Failed(ProcessingStepError { - kind: ErrorKind::Application, - message: "I/O error reading parquet".into(), - }); - let json = serde_json::to_value(&result).unwrap(); - assert_eq!(json["status"], "failed"); - assert_eq!(json["kind"], "application"); - assert_eq!(json["message"], "I/O error reading parquet"); - } - - #[test] - fn skipped_json_shape() { - let result: ProcessingStepResult = ProcessingStepResult::Skipped; - let json = serde_json::to_value(&result).unwrap(); - assert_eq!(json["status"], "skipped"); - // No other fields - assert_eq!(json.as_object().unwrap().len(), 1); - } - - #[test] - fn format_line_completed() { - let result: ProcessingStepResult = - ProcessingStepResult::Completed(ProcessingStep { - elapsed_ms: 100, - output: DummyOutput { files: 3 }, - }); - assert_eq!(result.format_line("Process"), "Process: 3 
files"); - } - - #[test] - fn format_line_failed() { - let result: ProcessingStepResult = - ProcessingStepResult::Failed(ProcessingStepError { - kind: ErrorKind::User, - message: "model not found".into(), - }); - assert_eq!( - result.format_line("Load model"), - "Load model: failed — model not found" - ); - } - - #[test] - fn format_line_skipped() { - let result: ProcessingStepResult = ProcessingStepResult::Skipped; - assert_eq!(result.format_line("Embed"), "Embed: skipped"); - } -} diff --git a/src/pipeline/read_config.rs b/src/pipeline/read_config.rs deleted file mode 100644 index e2cbe2d..0000000 --- a/src/pipeline/read_config.rs +++ /dev/null @@ -1,60 +0,0 @@ -//! Read config step — loads and parses `mdvs.toml`. - -use serde::Serialize; -use std::path::Path; -use std::time::Instant; - -use crate::pipeline::{ - ErrorKind, ProcessingStep, ProcessingStepError, ProcessingStepResult, StepOutput, -}; -use crate::schema::config::MdvsToml; - -/// Output record for the read config step. -#[derive(Debug, Serialize)] -pub struct ReadConfigOutput { - /// Path to the config file that was read. - pub config_path: String, -} - -impl StepOutput for ReadConfigOutput { - fn format_line(&self) -> String { - self.config_path.clone() - } -} - -/// Read and parse `mdvs.toml` from the given project path. -/// -/// Returns the step result (for the pipeline record) and the parsed config -/// (for the next step to consume). The config is `None` if reading failed. 
-pub fn run_read_config(path: &Path) -> (ProcessingStepResult, Option) { - let start = Instant::now(); - let config_path = path.join("mdvs.toml"); - match MdvsToml::read(&config_path) { - Ok(config) => { - if let Err(e) = config.validate() { - let err = ProcessingStepError { - kind: ErrorKind::User, - message: format!( - "mdvs.toml is invalid: {} — fix the file or run 'mdvs init --force'", - e - ), - }; - return (ProcessingStepResult::Failed(err), None); - } - let step = ProcessingStep { - elapsed_ms: start.elapsed().as_millis() as u64, - output: ReadConfigOutput { - config_path: config_path.display().to_string(), - }, - }; - (ProcessingStepResult::Completed(step), Some(config)) - } - Err(e) => { - let err = ProcessingStepError { - kind: ErrorKind::User, - message: e.to_string(), - }; - (ProcessingStepResult::Failed(err), None) - } - } -} diff --git a/src/pipeline/read_index.rs b/src/pipeline/read_index.rs deleted file mode 100644 index 1b0f9d7..0000000 --- a/src/pipeline/read_index.rs +++ /dev/null @@ -1,110 +0,0 @@ -//! Read index step — reads backend metadata and statistics. - -use serde::Serialize; -use std::path::Path; -use std::time::Instant; - -use crate::index::backend::{Backend, IndexStats}; -use crate::index::storage::BuildMetadata; -use crate::pipeline::{ - ErrorKind, ProcessingStep, ProcessingStepError, ProcessingStepResult, StepOutput, -}; - -/// Output record for the read index step. -#[derive(Debug, Serialize)] -pub struct ReadIndexOutput { - /// Whether a built index was found. - pub exists: bool, - /// Number of files in the index (0 if no index). - pub files_indexed: usize, - /// Number of chunks in the index (0 if no index). - pub chunks: usize, -} - -impl StepOutput for ReadIndexOutput { - fn format_line(&self) -> String { - if self.exists { - format!("{} files, {} chunks", self.files_indexed, self.chunks) - } else { - "no index found".to_string() - } - } -} - -/// Data passed forward from read_index — full metadata and statistics. 
-pub struct IndexData { - /// Build metadata from parquet key-value metadata. - pub metadata: BuildMetadata, - /// Index statistics (file and chunk counts). - pub stats: IndexStats, -} - -/// Read index metadata and statistics from the `.mdvs/` directory. -/// -/// Returns the step result and the full index data (for the command result). -/// "No index found" is a normal `Completed` output with `exists=false`. -pub fn run_read_index(path: &Path) -> (ProcessingStepResult, Option) { - let start = Instant::now(); - let backend = Backend::parquet(path); - - if !backend.exists() { - let step = ProcessingStep { - elapsed_ms: start.elapsed().as_millis() as u64, - output: ReadIndexOutput { - exists: false, - files_indexed: 0, - chunks: 0, - }, - }; - return (ProcessingStepResult::Completed(step), None); - } - - let build_meta = match backend.read_metadata() { - Ok(m) => m, - Err(e) => { - let err = ProcessingStepError { - kind: ErrorKind::Application, - message: e.to_string(), - }; - return (ProcessingStepResult::Failed(err), None); - } - }; - - let idx_stats = match backend.stats() { - Ok(s) => s, - Err(e) => { - let err = ProcessingStepError { - kind: ErrorKind::Application, - message: e.to_string(), - }; - return (ProcessingStepResult::Failed(err), None); - } - }; - - match (build_meta, idx_stats) { - (Some(metadata), Some(stats)) => { - let step = ProcessingStep { - elapsed_ms: start.elapsed().as_millis() as u64, - output: ReadIndexOutput { - exists: true, - files_indexed: stats.files_indexed, - chunks: stats.chunks, - }, - }; - let data = IndexData { metadata, stats }; - (ProcessingStepResult::Completed(step), Some(data)) - } - _ => { - // Index directory exists but metadata/stats missing — treat as no index - let step = ProcessingStep { - elapsed_ms: start.elapsed().as_millis() as u64, - output: ReadIndexOutput { - exists: false, - files_indexed: 0, - chunks: 0, - }, - }; - (ProcessingStepResult::Completed(step), None) - } - } -} diff --git a/src/pipeline/scan.rs 
b/src/pipeline/scan.rs deleted file mode 100644 index 6d5832d..0000000 --- a/src/pipeline/scan.rs +++ /dev/null @@ -1,57 +0,0 @@ -//! Scan step — walks the filesystem and parses frontmatter. - -use serde::Serialize; -use std::path::Path; -use std::time::Instant; - -use crate::discover::scan::ScannedFiles; -use crate::output::format_file_count; -use crate::pipeline::{ - ErrorKind, ProcessingStep, ProcessingStepError, ProcessingStepResult, StepOutput, -}; -use crate::schema::shared::ScanConfig; - -/// Output record for the scan step. -#[derive(Debug, Serialize)] -pub struct ScanOutput { - /// Number of markdown files found. - pub files_found: usize, - /// Glob pattern used for scanning. - pub glob: String, -} - -impl StepOutput for ScanOutput { - fn format_line(&self) -> String { - format_file_count(self.files_found) - } -} - -/// Scan the project directory for markdown files and parse their frontmatter. -/// -/// Returns the step result (for the pipeline record) and the scanned files -/// (for the next step to consume). The scanned files are `None` if scanning failed. -pub fn run_scan( - path: &Path, - config: &ScanConfig, -) -> (ProcessingStepResult, Option) { - let start = Instant::now(); - match ScannedFiles::scan(path, config) { - Ok(scanned) => { - let step = ProcessingStep { - elapsed_ms: start.elapsed().as_millis() as u64, - output: ScanOutput { - files_found: scanned.files.len(), - glob: config.glob.clone(), - }, - }; - (ProcessingStepResult::Completed(step), Some(scanned)) - } - Err(e) => { - let err = ProcessingStepError { - kind: ErrorKind::Application, - message: e.to_string(), - }; - (ProcessingStepResult::Failed(err), None) - } - } -} diff --git a/src/pipeline/validate.rs b/src/pipeline/validate.rs deleted file mode 100644 index 3e08413..0000000 --- a/src/pipeline/validate.rs +++ /dev/null @@ -1,74 +0,0 @@ -//! Validate step — checks frontmatter against the schema. 
- -use serde::Serialize; -use std::time::Instant; - -use crate::discover::scan::ScannedFiles; -use crate::output::{format_file_count, FieldViolation, NewField}; -use crate::pipeline::{ - ErrorKind, ProcessingStep, ProcessingStepError, ProcessingStepResult, StepOutput, -}; -use crate::schema::config::MdvsToml; - -/// Output record for the validate step. -#[derive(Debug, Serialize)] -pub struct ValidateOutput { - /// Number of markdown files validated. - pub files_checked: usize, - /// Number of schema violations found. - pub violation_count: usize, - /// Number of new (unknown) fields found. - pub new_field_count: usize, -} - -impl StepOutput for ValidateOutput { - fn format_line(&self) -> String { - if self.violation_count == 0 { - format!("{} — no violations", format_file_count(self.files_checked)) - } else { - format!( - "{} — {} violation(s)", - format_file_count(self.files_checked), - self.violation_count - ) - } - } -} - -/// Validate scanned files against the schema. -/// -/// Full validation data passed forward to the command result. -pub type ValidationData = (Vec, Vec); - -/// Returns the step result (for the pipeline record) and the full validation -/// data (violations and new fields, for the command result). The validation -/// data is `None` if validation itself failed (not if violations were found — -/// violations are normal output). 
-pub fn run_validate( - scanned: &ScannedFiles, - config: &MdvsToml, - verbose: bool, -) -> (ProcessingStepResult, Option) { - let start = Instant::now(); - match crate::cmd::check::validate(scanned, config, verbose) { - Ok(check_result) => { - let step = ProcessingStep { - elapsed_ms: start.elapsed().as_millis() as u64, - output: ValidateOutput { - files_checked: check_result.files_checked, - violation_count: check_result.field_violations.len(), - new_field_count: check_result.new_fields.len(), - }, - }; - let data = (check_result.field_violations, check_result.new_fields); - (ProcessingStepResult::Completed(step), Some(data)) - } - Err(e) => { - let err = ProcessingStepError { - kind: ErrorKind::Application, - message: e.to_string(), - }; - (ProcessingStepResult::Failed(err), None) - } - } -} diff --git a/src/pipeline/write_config.rs b/src/pipeline/write_config.rs deleted file mode 100644 index 1019fbd..0000000 --- a/src/pipeline/write_config.rs +++ /dev/null @@ -1,60 +0,0 @@ -//! Write config step — constructs and writes `mdvs.toml` from inferred schema. - -use serde::Serialize; -use std::path::Path; -use std::time::Instant; - -use crate::discover::infer::InferredSchema; -use crate::pipeline::{ - ErrorKind, ProcessingStep, ProcessingStepError, ProcessingStepResult, StepOutput, -}; -use crate::schema::config::MdvsToml; -use crate::schema::shared::ScanConfig; - -/// Output record for the write_config step. -#[derive(Debug, Serialize)] -pub struct WriteConfigOutput { - /// Path where `mdvs.toml` was written. - pub config_path: String, - /// Number of fields written to the config. - pub fields_written: usize, -} - -impl StepOutput for WriteConfigOutput { - fn format_line(&self) -> String { - self.config_path.clone() - } -} - -/// Construct `MdvsToml` from inferred schema and write to disk. -/// Schema-only — no build sections are written. 
-pub fn run_write_config( - path: &Path, - schema: &InferredSchema, - scan_config: ScanConfig, -) -> (ProcessingStepResult, Option) { - let start = Instant::now(); - let config_path = path.join("mdvs.toml"); - - let toml_doc = MdvsToml::from_inferred(schema, scan_config); - - if let Err(e) = toml_doc.write(&config_path) { - return ( - ProcessingStepResult::Failed(ProcessingStepError { - kind: ErrorKind::Application, - message: e.to_string(), - }), - None, - ); - } - - let fields_written = schema.fields.len(); - let step = ProcessingStep { - elapsed_ms: start.elapsed().as_millis() as u64, - output: WriteConfigOutput { - config_path: config_path.display().to_string(), - fields_written, - }, - }; - (ProcessingStepResult::Completed(step), Some(toml_doc)) -} diff --git a/src/pipeline/write_index.rs b/src/pipeline/write_index.rs deleted file mode 100644 index bb7785c..0000000 --- a/src/pipeline/write_index.rs +++ /dev/null @@ -1,60 +0,0 @@ -//! Write index step — writes file and chunk rows to the Parquet index. - -use serde::Serialize; -use std::time::Instant; - -use crate::discover::field_type::FieldType; -use crate::index::backend::Backend; -use crate::index::storage::{BuildMetadata, ChunkRow, FileRow}; -use crate::output::format_file_count; -use crate::pipeline::{ - ErrorKind, ProcessingStep, ProcessingStepError, ProcessingStepResult, StepOutput, -}; - -// BuildFileDetail moved to crate::output -pub use crate::output::BuildFileDetail; - -/// Output record for the write index step. -#[derive(Debug, Serialize)] -pub struct WriteIndexOutput { - /// Number of files written to the index. - pub files_written: usize, - /// Number of chunks written to the index. - pub chunks_written: usize, -} - -impl StepOutput for WriteIndexOutput { - fn format_line(&self) -> String { - format!( - "{}, {} chunks", - format_file_count(self.files_written), - self.chunks_written - ) - } -} - -/// Write file and chunk rows to the Parquet index. -/// -/// Returns the step result. 
-pub(crate) fn run_write_index( - backend: &Backend, - schema_fields: &[(String, FieldType)], - file_rows: &[FileRow], - chunk_rows: &[ChunkRow], - metadata: BuildMetadata, -) -> ProcessingStepResult { - let start = Instant::now(); - match backend.write_index(schema_fields, file_rows, chunk_rows, metadata) { - Ok(()) => ProcessingStepResult::Completed(ProcessingStep { - elapsed_ms: start.elapsed().as_millis() as u64, - output: WriteIndexOutput { - files_written: file_rows.len(), - chunks_written: chunk_rows.len(), - }, - }), - Err(e) => ProcessingStepResult::Failed(ProcessingStepError { - kind: ErrorKind::Application, - message: e.to_string(), - }), - } -} From b4f2c54ff4f51f594697aa9746a78142040718b3 Mon Sep 17 00:00:00 2001 From: edoch Date: Fri, 20 Mar 2026 19:00:48 +0100 Subject: [PATCH 09/35] docs: add post-migration cleanup TODOs (0134, 0135) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit TODO-0134: Step tree post-migration cleanup — unused imports, missing unit tests for moved private functions. TODO-0135: Skipped padding removal design notes — documents the architectural decision and remaining variable-length substep concerns. Co-Authored-By: Claude --- docs/spec/todos/TODO-0134.md | 46 +++++++++++++++++++++++++++++++++++ docs/spec/todos/TODO-0135.md | 47 ++++++++++++++++++++++++++++++++++++ docs/spec/todos/index.md | 2 ++ 3 files changed, 95 insertions(+) create mode 100644 docs/spec/todos/TODO-0134.md create mode 100644 docs/spec/todos/TODO-0135.md diff --git a/docs/spec/todos/TODO-0134.md b/docs/spec/todos/TODO-0134.md new file mode 100644 index 0000000..f50ab25 --- /dev/null +++ b/docs/spec/todos/TODO-0134.md @@ -0,0 +1,46 @@ +--- +id: 134 +title: Step tree post-migration cleanup +status: todo +priority: medium +created: 2026-03-20 +depends_on: [131] +blocks: [] +--- + +# TODO-0134: Step tree post-migration cleanup + +## Summary + +Audit findings from the Step tree migration (TODO-0131). 
Covers dead parameter cleanup, unused imports, and missing unit tests for moved functions. + +## Details + +### 1. `fail_from_last_skip()` unused parameter (build.rs) + +The function accepts `_skipped: usize` but ignores it. Remove the parameter — callers always pass `0`. (The broader Skipped-padding question is tracked in TODO-0135.) + +### 2. Unused test imports (update.rs) + +Test module imports `ViolationKind`, `FieldsConfig`, `UpdateConfig`, `FieldTypeSerde`, `ScanConfig` but doesn't use them. Lint cleanup. + +### 3. Missing unit tests for private functions + +Functions moved during the pipeline cleanup that lack isolation tests: + +| Function | File | Currently tested via | +|----------|------|---------------------| +| `validate_where_clause()` | `src/cmd/search.rs` | Integration only | +| `read_lines()` | `src/cmd/search.rs` | Not tested | +| `has_failed()` | `src/step.rs` | Indirect only | +| `has_violations()` | `src/step.rs` | Indirect only | +| `type_matches()` | `src/cmd/check.rs` | Integration only | +| `matches_any_glob()` | `src/cmd/check.rs` | Integration only | + +`embed_file()` in build.rs is excluded — requires a real Embedder and is well-covered by integration tests. + +### Deferred to TODO-0135 + +- Auto-update output dropped on success in check.rs +- Variable-length substep trees from auto-update/auto-build nesting +- Skipped padding removal from error paths diff --git a/docs/spec/todos/TODO-0135.md b/docs/spec/todos/TODO-0135.md new file mode 100644 index 0000000..08d9143 --- /dev/null +++ b/docs/spec/todos/TODO-0135.md @@ -0,0 +1,47 @@ +--- +id: 135 +title: Remove Skipped padding from Step tree error paths +status: todo +priority: medium +created: 2026-03-20 +depends_on: [131] +blocks: [] +--- + +# TODO-0135: Remove Skipped padding from Step tree error paths + +## Summary + +Commands currently pad their substep list with `Step::skipped()` entries on early failure so the tree always has a fixed number of substeps. 
This is a leftover from the old fixed-position pipeline model and is unnecessary with the Step tree architecture, where each step is identified by its Outcome type, not its position. + +## Details + +### Current behavior + +When a command fails early, it pushes anonymous Skipped entries for every step that won't run: + +```rust +// build fails after scan → pad with 5 Skipped +return fail_from_last(&mut substeps, start, 5); +// result: [ReadConfig, Scan(failed), Skipped, Skipped, Skipped, Skipped, Skipped] +``` + +Every error path in every command has a hardcoded skip count. These counts are fragile (the audit found inconsistencies when auto-update/auto-build nesting changes the total). + +### Proposed behavior + +Stop padding. A failed-after-scan tree is just `[ReadConfig, Scan(failed)]`. The renderer shows what happened and stops. Each step is identified by its `Outcome` variant, not by position. + +### Scope + +1. **Remove skip-count parameters** from `fail_from_last`, `fail_msg`, `fail_early` helpers in all commands +2. **Stop pushing Skipped substeps** on error paths +3. **Update rendering** to handle variable-length substep lists (currently rendering uses `Render` trait on each substep's Outcome — should already work, but verify) +4. **Attach auto-update/auto-build steps** as substeps in check.rs (currently dropped on success), matching build.rs and search.rs patterns +5. 
**Verify JSON output** still makes sense without padding (compact and verbose) + +### Related + +- check.rs drops auto-update output on success (fix as part of this) +- build.rs/search.rs auto-update/auto-build nesting complicates skip counts (resolved by removing padding) +- `fail_from_last_skip()` in build.rs has unused `_skipped` parameter (resolved by this TODO) diff --git a/docs/spec/todos/index.md b/docs/spec/todos/index.md index b52095d..3554cb3 100644 --- a/docs/spec/todos/index.md +++ b/docs/spec/todos/index.md @@ -135,3 +135,5 @@ | [0131](TODO-0131.md) | Step tree: delete old pipeline, update main.rs, simplify output.rs | todo | high | 2026-03-19 | | [0132](TODO-0132.md) | Macro for compact struct generation (crabtime) | todo | low | 2026-03-19 | | [0133](TODO-0133.md) | Macro for step pipeline boilerplate (early-return pattern) | todo | low | 2026-03-19 | +| [0134](TODO-0134.md) | Step tree post-migration cleanup | todo | medium | 2026-03-20 | +| [0135](TODO-0135.md) | Remove Skipped padding from Step tree error paths | todo | medium | 2026-03-20 | From 6fa92100c09fe65f2a1ceb63bdf9f72e45e62e1f Mon Sep 17 00:00:00 2001 From: edoch Date: Fri, 20 Mar 2026 23:07:17 +0100 Subject: [PATCH 10/35] refactor: inline auto-update and auto-build to eliminate redundant reads Commands no longer call other commands' run() as black boxes. Instead, they share data (config, scanned files, embedder) directly. - check: inlines update logic (infer + write config) after scan, reuses scanned files for validate. Now sync (no async needed). - build: extracts build_core() shared function, inlines auto-update after scan. mutate_config made pub(crate). - search: calls build_core() directly instead of build::run(), reuses embedder from build for query embedding. Before: check = 3 config reads + 2 scans; search = 5+ reads + 3+ scans + 2 model loads. After: each command reads config once, scans once, loads model once. 
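The data-sharing shape this refactor introduces can be sketched as follows (a hypothetical, heavily simplified model — the real `build_core` also threads substeps, the config path, and force/auto-update flags, and `Embedder`/`MdvsToml` are the project's real types, stubbed here):

```rust
// Stand-ins for the real types (MdvsToml, fastembed model).
struct Config;
struct Embedder;

// Each call models one read of mdvs.toml; the counter proves how often it runs.
fn read_config(read_counter: &mut u32) -> Config {
    *read_counter += 1;
    Config
}

/// Shared pipeline sketch: returns its outcome plus the loaded Embedder so
/// the caller can reuse it instead of loading the model a second time.
fn build_core(_config: &mut Config) -> (u32, Option<Embedder>) {
    // (files indexed, embedder handed back for reuse)
    (42, Some(Embedder))
}

/// Before this patch, search called build::run() as a black box (its own
/// config read + model load), then read the config and loaded the model
/// again to embed the query. After: one read, one load, embedder reused.
fn search(read_counter: &mut u32) -> u32 {
    let mut config = read_config(read_counter); // the only config read
    let (_outcome, embedder) = build_core(&mut config);
    let _query_embedder = embedder.expect("build loaded the model");
    *read_counter
}
```

The counter makes the "each command reads config once" claim above checkable: `search` drives the whole pipeline off a single `read_config` call.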
Co-Authored-By: Claude --- src/cmd/build.rs | 280 ++++++++++++++++++++++++++-------------------- src/cmd/check.rs | 232 +++++++++++++++++++++----------------- src/cmd/search.rs | 134 ++++++++++------------ src/main.rs | 2 +- 4 files changed, 347 insertions(+), 301 deletions(-) diff --git a/src/cmd/build.rs b/src/cmd/build.rs index 1e2339d..8c046c5 100644 --- a/src/cmd/build.rs +++ b/src/cmd/build.rs @@ -1,4 +1,5 @@ use crate::discover::field_type::FieldType; +use crate::discover::infer::InferredSchema; use crate::discover::scan::{ScannedFile, ScannedFiles}; use crate::index::backend::Backend; use crate::index::chunk::{extract_plain_text, Chunks}; @@ -6,12 +7,12 @@ use crate::index::embed::{Embedder, ModelConfig}; use crate::index::storage::{content_hash, BuildMetadata, ChunkRow, FileIndexEntry, FileRow}; use crate::outcome::commands::BuildOutcome; use crate::outcome::{ - ClassifyOutcome, EmbedFilesOutcome, LoadModelOutcome, Outcome, ReadConfigOutcome, ScanOutcome, - ValidateOutcome, WriteIndexOutcome, + ClassifyOutcome, EmbedFilesOutcome, InferOutcome, LoadModelOutcome, Outcome, ReadConfigOutcome, + ScanOutcome, ValidateOutcome, WriteConfigOutcome, WriteIndexOutcome, }; use crate::output::BuildFileDetail; -use crate::schema::config::{BuildConfig, MdvsToml, SearchConfig}; -use crate::schema::shared::{ChunkingConfig, EmbeddingModelConfig}; +use crate::schema::config::{BuildConfig, MdvsToml, SearchConfig, TomlField}; +use crate::schema::shared::{ChunkingConfig, EmbeddingModelConfig, FieldTypeSerde}; use crate::step::{ErrorKind, Step, StepError, StepOutcome}; use std::collections::{HashMap, HashSet}; use std::path::Path; @@ -221,62 +222,14 @@ pub async fn run( None } }; - let config = match config { + let mut config = match config { Some(c) => c, None => return fail_from_last(&mut substeps, start), }; - // 2. 
Auto-update (conditional) let should_update = !no_update && config.build.as_ref().is_some_and(|b| b.auto_update); - if should_update { - let update_step = crate::cmd::update::run(path, &[], false, false, false).await; - if crate::step::has_failed(&update_step) { - substeps.push(update_step); - return fail_msg(&mut substeps, start, ErrorKind::User, "auto-update failed"); - } - substeps.push(update_step); - } - // Re-read config if auto-update ran - let mut config = if should_update { - let re_read_start = Instant::now(); - let re_read_path = path.join("mdvs.toml"); - match MdvsToml::read(&re_read_path) { - Ok(cfg) => match cfg.validate() { - Ok(()) => { - substeps.push(Step::leaf( - Outcome::ReadConfig(ReadConfigOutcome { - config_path: re_read_path.display().to_string(), - }), - re_read_start.elapsed().as_millis() as u64, - )); - cfg - } - Err(e) => { - substeps.push(Step::failed( - ErrorKind::User, - format!( - "mdvs.toml is invalid: {e} — fix the file or run 'mdvs init --force'" - ), - re_read_start.elapsed().as_millis() as u64, - )); - return fail_from_last(&mut substeps, start); - } - }, - Err(e) => { - substeps.push(Step::failed( - ErrorKind::User, - e.to_string(), - re_read_start.elapsed().as_millis() as u64, - )); - return fail_from_last(&mut substeps, start); - } - } - } else { - config - }; - - // Mutate config (inline, not a step) + // 2. Mutate config (inline, not a step) let mutation_error = mutate_config( &mut config, path, @@ -286,7 +239,6 @@ pub async fn run( force, ); - // 3. Scan (mutation errors land here) if let Some(msg) = mutation_error { substeps.push(Step { substeps: vec![], @@ -301,6 +253,48 @@ pub async fn run( return fail_from_last(&mut substeps, start); } + // 3. 
Core build pipeline (scan → auto-update → validate → classify → embed → write) + let (build_outcome, _embedder) = match build_core( + path, + &mut config, + &config_path_buf, + force, + should_update, + &mut substeps, + ) + .await + { + Ok(result) => result, + Err(()) => return fail_from_last(&mut substeps, start), + }; + + Step { + substeps, + outcome: StepOutcome::Complete { + result: Ok(Outcome::Build(Box::new(build_outcome))), + elapsed_ms: start.elapsed().as_millis() as u64, + }, + } +} + +// ============================================================================ +// build_core() — shared pipeline, called by build::run() and search::run() +// ============================================================================ + +/// Core build pipeline: scan → auto-update → validate → classify → embed → write index. +/// +/// Returns `BuildOutcome` + optional `Embedder` (for reuse by search) on success. +/// On failure, pushes error substeps and returns `Err(())` — the caller constructs +/// the failed command Step from the substeps. +pub(crate) async fn build_core( + path: &Path, + config: &mut MdvsToml, + config_path: &Path, + force: bool, + auto_update: bool, + substeps: &mut Vec>, +) -> Result<(BuildOutcome, Option), ()> { + // 1. Scan let scan_start = Instant::now(); let scanned = match ScannedFiles::scan(path, &config.scan) { Ok(s) => { @@ -319,11 +313,71 @@ pub async fn run( e.to_string(), scan_start.elapsed().as_millis() as u64, )); - return fail_from_last(&mut substeps, start); + return Err(()); } }; - // 4. Validate + // 2. 
Auto-update: infer new fields, write config if changed + if auto_update { + let infer_start = Instant::now(); + let schema = InferredSchema::infer(&scanned); + substeps.push(Step::leaf( + Outcome::Infer(InferOutcome { + fields_inferred: schema.fields.len(), + }), + infer_start.elapsed().as_millis() as u64, + )); + + let existing: HashSet<&str> = config + .fields + .field + .iter() + .map(|f| f.name.as_str()) + .collect(); + let new_toml_fields: Vec = schema + .fields + .iter() + .filter(|f| !existing.contains(f.name.as_str())) + .filter(|f| !config.fields.ignore.contains(&f.name)) + .map(|f| TomlField { + name: f.name.clone(), + field_type: FieldTypeSerde::from(&f.field_type), + allowed: f.allowed.clone(), + required: f.required.clone(), + nullable: f.nullable, + }) + .collect(); + + if !new_toml_fields.is_empty() { + config.fields.field.extend(new_toml_fields); + let write_start = Instant::now(); + match config.write(config_path) { + Ok(()) => { + substeps.push(Step::leaf( + Outcome::WriteConfig(WriteConfigOutcome { + config_path: config_path.display().to_string(), + fields_written: config.fields.field.len(), + }), + write_start.elapsed().as_millis() as u64, + )); + // Re-read to pick up normalized TOML + if let Ok(c) = MdvsToml::read(config_path) { + *config = c; + } + } + Err(e) => { + substeps.push(Step::failed( + ErrorKind::Application, + e.to_string(), + write_start.elapsed().as_millis() as u64, + )); + return Err(()); + } + } + } + } + + // 3. 
Validate if scanned.files.is_empty() { substeps.push(Step { substeps: vec![], @@ -335,11 +389,11 @@ pub async fn run( elapsed_ms: 0, }, }); - return fail_from_last(&mut substeps, start); + return Err(()); } let validate_start = Instant::now(); - let check_result = match crate::cmd::check::validate(&scanned, &config, false) { + let check_result = match crate::cmd::check::validate(&scanned, config, false) { Ok(r) => r, Err(e) => { substeps.push(Step::failed( @@ -347,7 +401,7 @@ pub async fn run( e.to_string(), validate_start.elapsed().as_millis() as u64, )); - return fail_from_last(&mut substeps, start); + return Err(()); } }; substeps.push(Step::leaf( @@ -361,10 +415,9 @@ pub async fn run( let violations = check_result.field_violations; let new_fields = check_result.new_fields; - // Abort on violations if !violations.is_empty() { - return Step { - substeps, + substeps.push(Step { + substeps: vec![], outcome: StepOutcome::Complete { result: Err(StepError { kind: ErrorKind::User, @@ -373,12 +426,13 @@ pub async fn run( violations.len() ), }), - elapsed_ms: start.elapsed().as_millis() as u64, + elapsed_ms: 0, }, - }; + }); + return Err(()); } - // Pre-checks for classify + // 4. Pre-checks for classify let schema_fields: Vec<(String, FieldType)> = match config .fields .field @@ -402,7 +456,7 @@ pub async fn run( elapsed_ms: 0, }, }); - return fail_from_last(&mut substeps, start); + return Err(()); } }; @@ -419,7 +473,7 @@ pub async fn run( elapsed_ms: 0, }, }); - return fail_from_last(&mut substeps, start); + return Err(()); } }; let chunking = match config.chunking.as_ref() { @@ -435,11 +489,11 @@ pub async fn run( elapsed_ms: 0, }, }); - return fail_from_last(&mut substeps, start); + return Err(()); } }; let backend = Backend::parquet(path); - let config_change_error = detect_config_changes(&backend, embedding, chunking, &config, force); + let config_change_error = detect_config_changes(&backend, embedding, chunking, config, force); // 5. 
Classify let full_rebuild = force || !backend.exists(); @@ -455,7 +509,7 @@ pub async fn run( elapsed_ms: 0, }, }); - return fail_from_last(&mut substeps, start); + return Err(()); } let existing_index = if full_rebuild { @@ -474,7 +528,7 @@ pub async fn run( elapsed_ms: 0, }, }); - return fail_from_last(&mut substeps, start); + return Err(()); } } }; @@ -494,7 +548,7 @@ pub async fn run( elapsed_ms: 0, }, }); - return fail_from_last(&mut substeps, start); + return Err(()); } } }; @@ -590,7 +644,7 @@ pub async fn run( let needs_embedding = !classify_data.needs_embedding.is_empty(); - // 6. Load model — calls ModelConfig::try_from() + Embedder::load() directly + // 6. Load model let embedder = if needs_embedding { let model_start = Instant::now(); match ModelConfig::try_from(embedding) { @@ -651,12 +705,17 @@ pub async fn run( }; if needs_embedding && embedder.is_none() { - return fail_msg( - &mut substeps, - start, - ErrorKind::Application, - "model loading failed", - ); + substeps.push(Step { + substeps: vec![], + outcome: StepOutcome::Complete { + result: Err(StepError { + kind: ErrorKind::Application, + message: "model loading failed".into(), + }), + elapsed_ms: 0, + }, + }); + return Err(()); } // 7. 
Embed files @@ -665,7 +724,7 @@ pub async fn run( if let Some(msg) = dim_error { substeps.push(Step::failed(ErrorKind::User, msg, 0)); - return fail_from_last(&mut substeps, start); + return Err(()); } let embed_data = if needs_embedding { @@ -745,7 +804,7 @@ pub async fn run( e.to_string(), write_start.elapsed().as_millis() as u64, )); - return fail_from_last(&mut substeps, start); + return Err(()); } } @@ -754,26 +813,22 @@ pub async fn run( let chunks_total = chunk_rows.len(); let chunks_unchanged = chunks_total - chunks_embedded; - Step { - substeps, - outcome: StepOutcome::Complete { - result: Ok(Outcome::Build(Box::new(BuildOutcome { - full_rebuild: classify_data.full_rebuild, - files_total: file_rows.len(), - files_embedded: classify_data.needs_embedding.len(), - files_unchanged: file_rows.len() - classify_data.needs_embedding.len(), - files_removed: classify_data.removed_count, - chunks_total, - chunks_embedded, - chunks_unchanged, - chunks_removed: classify_data.chunks_removed, - new_fields, - embedded_files: embedded_details, - removed_files: classify_data.removed_details, - }))), - elapsed_ms: start.elapsed().as_millis() as u64, - }, - } + let outcome = BuildOutcome { + full_rebuild: classify_data.full_rebuild, + files_total: file_rows.len(), + files_embedded: classify_data.needs_embedding.len(), + files_unchanged: file_rows.len() - classify_data.needs_embedding.len(), + files_removed: classify_data.removed_count, + chunks_total, + chunks_embedded, + chunks_unchanged, + chunks_removed: classify_data.chunks_removed, + new_fields, + embedded_files: embedded_details, + removed_files: classify_data.removed_details, + }; + + Ok((outcome, embedder)) } /// Extract error from last failed substep and return a failed command Step. 
@@ -797,25 +852,6 @@ fn fail_from_last(substeps: &mut Vec<Step>, start: Instant) -> Step { -fn fail_msg( - substeps: &mut Vec<Step>, - start: Instant, - kind: ErrorKind, - msg: &str, -) -> Step { - Step { - substeps: std::mem::take(substeps), - outcome: StepOutcome::Complete { - result: Err(StepError { - kind, - message: msg.into(), - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }, - } -} - // ============================================================================ // Helpers // ============================================================================ @@ -831,7 +867,7 @@ fn normalize_revision(s: &str) -> Option<String> { /// Apply config mutations: fill missing build sections, apply --set-* flags. /// Returns `Some(error_message)` if a flag requires --force but wasn't given. -fn mutate_config( +pub(crate) fn mutate_config( config: &mut MdvsToml, path: &Path, set_model: Option<&str>, diff --git a/src/cmd/check.rs b/src/cmd/check.rs index dcba8b6..4677254 100644 --- a/src/cmd/check.rs +++ b/src/cmd/check.rs @@ -1,9 +1,12 @@ use crate::discover::field_type::FieldType; +use crate::discover::infer::InferredSchema; use crate::discover::scan::ScannedFiles; use crate::outcome::commands::CheckOutcome; -use crate::outcome::{Outcome, ReadConfigOutcome, ScanOutcome, ValidateOutcome}; +use crate::outcome::{ + InferOutcome, Outcome, ReadConfigOutcome, ScanOutcome, ValidateOutcome, WriteConfigOutcome, +}; use crate::output::{FieldViolation, NewField, ViolatingFile, ViolationKind}; -use crate::schema::config::MdvsToml; +use crate::schema::config::{MdvsToml, TomlField}; use crate::schema::shared::FieldTypeSerde; use crate::step::{ErrorKind, Step, StepError, StepOutcome}; use globset::Glob; @@ -11,6 +14,7 @@ use serde::Serialize; use serde_json::Value; use std::collections::{BTreeMap, HashMap, HashSet}; use std::path::{Path, PathBuf}; +use std::time::Instant; use tracing::{info, instrument}; // ============================================================================ @@ -42,8 +46,8 @@ impl CheckResult { /// Read
config, optionally auto-update, scan files, and validate frontmatter. #[instrument(name = "check", skip_all)] -pub async fn run(path: &Path, no_update: bool, verbose: bool) -> Step { - let start = std::time::Instant::now(); +pub fn run(path: &Path, no_update: bool, verbose: bool) -> Step { + let start = Instant::now(); let mut substeps = Vec::new(); // 1. Read config — calls MdvsToml::read() + validate() directly @@ -99,49 +103,8 @@ pub async fn run(path: &Path, no_update: bool, verbose: bool) -> Step { } }; - // 2. Auto-update - let should_update = !no_update && config.check.as_ref().is_some_and(|c| c.auto_update); - if should_update { - let update_output = crate::cmd::update::run(path, &[], false, false, verbose).await; - let failed = crate::step::has_failed(&update_output); - substeps.push(update_output); - if failed { - return Step { - substeps, - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::User, - message: "auto-update failed".into(), - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }, - }; - } - } - - // Re-read config if auto-update ran - let config = if should_update { - match MdvsToml::read(&path.join("mdvs.toml")) { - Ok(c) => c, - Err(e) => { - return Step { - substeps, - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::Application, - message: format!("failed to re-read config: {e}"), - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }, - }; - } - } - } else { - config - }; - - // 3. Scan — calls ScannedFiles::scan() directly - let scan_start = std::time::Instant::now(); + // 2. 
Scan (once — shared between auto-update and validate) + let scan_start = Instant::now(); let scanned = match ScannedFiles::scan(path, &config.scan) { Ok(s) => { substeps.push(Step::leaf( @@ -151,7 +114,7 @@ pub async fn run(path: &Path, no_update: bool, verbose: bool) -> Step { }), scan_start.elapsed().as_millis() as u64, )); - Some(s) + s } Err(e) => { substeps.push(Step::failed( @@ -159,17 +122,7 @@ pub async fn run(path: &Path, no_update: bool, verbose: bool) -> Step { e.to_string(), scan_start.elapsed().as_millis() as u64, )); - None - } - }; - - let scanned = match scanned { - Some(s) => s, - None => { - let msg = match &substeps.last().unwrap().outcome { - StepOutcome::Complete { result: Err(e), .. } => e.message.clone(), - _ => "scan failed".into(), - }; + let msg = e.to_string(); return Step { substeps, outcome: StepOutcome::Complete { @@ -183,6 +136,83 @@ pub async fn run(path: &Path, no_update: bool, verbose: bool) -> Step { } }; + // 3. Auto-update: infer new fields, write config if changed + let should_update = !no_update && config.check.as_ref().is_some_and(|c| c.auto_update); + let config = if should_update { + let infer_start = Instant::now(); + let schema = InferredSchema::infer(&scanned); + substeps.push(Step::leaf( + Outcome::Infer(InferOutcome { + fields_inferred: schema.fields.len(), + }), + infer_start.elapsed().as_millis() as u64, + )); + + // Find truly new fields (not in config, not ignored) + let existing: HashSet<&str> = config + .fields + .field + .iter() + .map(|f| f.name.as_str()) + .collect(); + let new_toml_fields: Vec<TomlField> = schema + .fields + .iter() + .filter(|f| !existing.contains(f.name.as_str())) + .filter(|f| !config.fields.ignore.contains(&f.name)) + .map(|f| TomlField { + name: f.name.clone(), + field_type: FieldTypeSerde::from(&f.field_type), + allowed: f.allowed.clone(), + required: f.required.clone(), + nullable: f.nullable, + }) + .collect(); + + if new_toml_fields.is_empty() { + config + } else { + let mut config = config;
config.fields.field.extend(new_toml_fields); + let write_start = Instant::now(); + match config.write(&config_path_buf) { + Ok(()) => { + substeps.push(Step::leaf( + Outcome::WriteConfig(WriteConfigOutcome { + config_path: config_path_buf.display().to_string(), + fields_written: config.fields.field.len(), + }), + write_start.elapsed().as_millis() as u64, + )); + // Re-read to pick up normalized TOML + match MdvsToml::read(&config_path_buf) { + Ok(c) => c, + Err(_) => config, + } + } + Err(e) => { + substeps.push(Step::failed( + ErrorKind::Application, + e.to_string(), + write_start.elapsed().as_millis() as u64, + )); + return Step { + substeps, + outcome: StepOutcome::Complete { + result: Err(StepError { + kind: ErrorKind::Application, + message: "auto-update failed to write config".into(), + }), + elapsed_ms: start.elapsed().as_millis() as u64, + }, + }; + } + } + } + } else { + config + }; + // 4. Validate let validate_start = std::time::Instant::now(); let check_result = match validate(&scanned, &config, verbose) { @@ -559,8 +589,8 @@ mod tests { } } - #[tokio::test] - async fn clean_check() { + #[test] + fn clean_check() { let tmp = tempfile::tempdir().unwrap(); create_test_vault(tmp.path()); write_toml( @@ -581,7 +611,7 @@ mod tests { vec![], ); - let step = run(tmp.path(), true, false).await; + let step = run(tmp.path(), true, false); let result = unwrap_check(&step); assert!(result.violations.is_empty()); @@ -589,8 +619,8 @@ mod tests { assert_eq!(result.files_checked, 2); } - #[tokio::test] - async fn missing_required() { + #[test] + fn missing_required() { let tmp = tempfile::tempdir().unwrap(); create_test_vault(tmp.path()); @@ -612,7 +642,7 @@ mod tests { vec![], ); - let step = run(tmp.path(), true, false).await; + let step = run(tmp.path(), true, false); let result = unwrap_check(&step); assert!(!result.violations.is_empty()); @@ -623,8 +653,8 @@ mod tests { assert_eq!(v.files[0].path.display().to_string(), "blog/post2.md"); } - #[tokio::test] - async 
fn wrong_type() { + #[test] + fn wrong_type() { let tmp = tempfile::tempdir().unwrap(); let blog_dir = tmp.path().join("blog"); fs::create_dir_all(&blog_dir).unwrap(); @@ -641,7 +671,7 @@ mod tests { vec![], ); - let step = run(tmp.path(), true, false).await; + let step = run(tmp.path(), true, false); let result = unwrap_check(&step); assert!(!result.violations.is_empty()); @@ -651,8 +681,8 @@ mod tests { assert_eq!(v.files[0].detail.as_deref(), Some("got String")); } - #[tokio::test] - async fn wrong_type_int_in_float_lenient() { + #[test] + fn wrong_type_int_in_float_lenient() { let tmp = tempfile::tempdir().unwrap(); fs::create_dir_all(tmp.path().join("blog")).unwrap(); @@ -674,13 +704,13 @@ mod tests { vec![], ); - let step = run(tmp.path(), true, false).await; + let step = run(tmp.path(), true, false); let result = unwrap_check(&step); assert!(result.violations.is_empty()); } - #[tokio::test] - async fn disallowed_field() { + #[test] + fn disallowed_field() { let tmp = tempfile::tempdir().unwrap(); let notes_dir = tmp.path().join("notes"); fs::create_dir_all(&notes_dir).unwrap(); @@ -703,7 +733,7 @@ mod tests { vec![], ); - let step = run(tmp.path(), true, false).await; + let step = run(tmp.path(), true, false); let result = unwrap_check(&step); assert!(!result.violations.is_empty()); @@ -713,14 +743,14 @@ mod tests { assert_eq!(v.files[0].path.display().to_string(), "notes/idea.md"); } - #[tokio::test] - async fn new_fields_informational() { + #[test] + fn new_fields_informational() { let tmp = tempfile::tempdir().unwrap(); create_test_vault(tmp.path()); write_toml(tmp.path(), vec![string_field("title")], vec![]); - let step = run(tmp.path(), true, false).await; + let step = run(tmp.path(), true, false); let result = unwrap_check(&step); assert!(result.violations.is_empty()); @@ -729,8 +759,8 @@ mod tests { assert!(result.new_fields.iter().any(|f| f.name == "tags")); } - #[tokio::test] - async fn string_top_type_accepts_any_value() { + #[test] + fn
string_top_type_accepts_any_value() { let tmp = tempfile::tempdir().unwrap(); let blog_dir = tmp.path().join("blog"); fs::create_dir_all(&blog_dir).unwrap(); @@ -754,15 +784,15 @@ mod tests { write_toml(tmp.path(), vec![string_field("field")], vec![]); - let step = run(tmp.path(), true, false).await; + let step = run(tmp.path(), true, false); let result = unwrap_check(&step); assert!(result.violations.is_empty()); assert_eq!(result.files_checked, 4); } - #[tokio::test] - async fn bare_files_trigger_required() { + #[test] + fn bare_files_trigger_required() { let tmp = tempfile::tempdir().unwrap(); fs::create_dir_all(tmp.path().join("blog")).unwrap(); @@ -797,13 +827,13 @@ mod tests { }; config.write(&tmp.path().join("mdvs.toml")).unwrap(); - let step = run(tmp.path(), true, false).await; + let step = run(tmp.path(), true, false); let result = unwrap_check(&step); assert!(!result.violations.is_empty()); } - #[tokio::test] - async fn ignored_fields_skipped() { + #[test] + fn ignored_fields_skipped() { let tmp = tempfile::tempdir().unwrap(); create_test_vault(tmp.path()); @@ -813,15 +843,15 @@ mod tests { vec!["draft".into(), "tags".into()], ); - let step = run(tmp.path(), true, false).await; + let step = run(tmp.path(), true, false); let result = unwrap_check(&step); assert!(result.violations.is_empty()); assert!(result.new_fields.is_empty()); } - #[tokio::test] - async fn multiple_violations() { + #[test] + fn multiple_violations() { let tmp = tempfile::tempdir().unwrap(); let blog_dir = tmp.path().join("blog"); let notes_dir = tmp.path().join("notes"); @@ -870,15 +900,15 @@ mod tests { vec![], ); - let step = run(tmp.path(), true, false).await; + let step = run(tmp.path(), true, false); let result = unwrap_check(&step); assert!(!result.violations.is_empty()); assert!(result.violations.len() >= 3); } - #[tokio::test] - async fn null_on_non_nullable_non_required_field() { + #[test] + fn null_on_non_nullable_non_required_field() { let tmp = 
tempfile::tempdir().unwrap(); fs::create_dir_all(tmp.path().join("notes")).unwrap(); @@ -903,7 +933,7 @@ mod tests { vec![], ); - let step = run(tmp.path(), true, false).await; + let step = run(tmp.path(), true, false); let result = unwrap_check(&step); assert!(!result.violations.is_empty()); let v = result @@ -914,8 +944,8 @@ mod tests { assert!(matches!(v.kind, ViolationKind::NullNotAllowed)); } - #[tokio::test] - async fn null_on_nullable_non_required_field() { + #[test] + fn null_on_nullable_non_required_field() { let tmp = tempfile::tempdir().unwrap(); fs::create_dir_all(tmp.path().join("notes")).unwrap(); @@ -940,13 +970,13 @@ mod tests { vec![], ); - let step = run(tmp.path(), true, false).await; + let step = run(tmp.path(), true, false); let result = unwrap_check(&step); assert!(result.violations.is_empty()); } - #[tokio::test] - async fn null_on_disallowed_path() { + #[test] + fn null_on_disallowed_path() { let tmp = tempfile::tempdir().unwrap(); fs::create_dir_all(tmp.path().join("notes")).unwrap(); @@ -971,7 +1001,7 @@ mod tests { vec![], ); - let step = run(tmp.path(), true, false).await; + let step = run(tmp.path(), true, false); let result = unwrap_check(&step); assert!(!result.violations.is_empty()); let v = result @@ -982,8 +1012,8 @@ mod tests { assert!(matches!(v.kind, ViolationKind::Disallowed)); } - #[tokio::test] - async fn null_on_disallowed_path_and_not_nullable() { + #[test] + fn null_on_disallowed_path_and_not_nullable() { let tmp = tempfile::tempdir().unwrap(); fs::create_dir_all(tmp.path().join("notes")).unwrap(); @@ -1008,7 +1038,7 @@ mod tests { vec![], ); - let step = run(tmp.path(), true, false).await; + let step = run(tmp.path(), true, false); let result = unwrap_check(&step); assert!(!result.violations.is_empty()); diff --git a/src/cmd/search.rs b/src/cmd/search.rs index 3fb1a1f..62bf35e 100644 --- a/src/cmd/search.rs +++ b/src/cmd/search.rs @@ -62,7 +62,7 @@ pub async fn run( // 1. 
Read config — calls MdvsToml::read() + validate() directly let config_start = Instant::now(); let config_path_buf = path.join("mdvs.toml"); - let config = match MdvsToml::read(&config_path_buf) { + let mut config = match MdvsToml::read(&config_path_buf) { Ok(cfg) => match cfg.validate() { Ok(()) => { substeps.push(Step::leaf( @@ -92,68 +92,36 @@ pub async fn run( } }; - // Auto-build: run build before searching if configured - let auto_build = if let Some(ref cfg) = config { + // Auto-build: run build core pipeline before searching if configured + let mut build_embedder: Option<Embedder> = None; + if let Some(ref mut cfg) = config { let should_build = !no_build && cfg.search.as_ref().is_some_and(|s| s.auto_build); if should_build { let build_no_update = no_update || !cfg.search.as_ref().is_some_and(|s| s.auto_update); - let build_step = - crate::cmd::build::run(path, None, None, None, false, build_no_update, false).await; - if crate::step::has_failed(&build_step) { - substeps.push(build_step); - return fail_msg(&mut substeps, start, ErrorKind::User, "auto-build failed"); - } - Some(build_step) - } else { - None - } - } else { - None - }; - - // Push auto-build substep if it ran - if let Some(build_step) = auto_build { - substeps.push(build_step); - } - - // Re-read config if auto-build ran - let config = if substeps.len() > 1 { - let re_read_start = Instant::now(); - let re_read_path = path.join("mdvs.toml"); - match MdvsToml::read(&re_read_path) { - Ok(cfg) => match cfg.validate() { - Ok(()) => { - substeps.push(Step::leaf( - Outcome::ReadConfig(ReadConfigOutcome { - config_path: re_read_path.display().to_string(), - }), - re_read_start.elapsed().as_millis() as u64, - )); - Some(cfg) + let auto_update = !build_no_update && cfg.build.as_ref().is_some_and(|b| b.auto_update); + + // Fill missing build sections (embedding_model, chunking, search, build) + crate::cmd::build::mutate_config(cfg, path, None, None, None, false); + + match crate::cmd::build::build_core( + path,
cfg, + &config_path_buf, + false, + auto_update, + &mut substeps, + ) + .await + { + Ok((_build_outcome, embedder)) => { + build_embedder = embedder; } - Err(e) => { - substeps.push(Step::failed( - ErrorKind::User, - format!( - "mdvs.toml is invalid: {e} — fix the file or run 'mdvs init --force'" - ), - re_read_start.elapsed().as_millis() as u64, - )); - return fail_from_last(&mut substeps, start); + Err(()) => { + return fail_msg(&mut substeps, start, ErrorKind::User, "auto-build failed"); } - }, - Err(e) => { - substeps.push(Step::failed( - ErrorKind::User, - e.to_string(), - re_read_start.elapsed().as_millis() as u64, - )); - return fail_from_last(&mut substeps, start); } } - } else { - config - }; + } let embedding = config.as_ref().and_then(|c| c.embedding_model.as_ref()); @@ -232,20 +200,40 @@ pub async fn run( return fail_from_last(&mut substeps, start); } + // 3. Load model (reuse from build if available) let emb_config = embedding.unwrap(); - let model_start = Instant::now(); - let embedder = match ModelConfig::try_from(emb_config) { - Ok(mc) => match Embedder::load(&mc) { - Ok(emb) => { - substeps.push(Step::leaf( - Outcome::LoadModel(LoadModelOutcome { - model_name: emb_config.name.clone(), - dimension: emb.dimension(), - }), - model_start.elapsed().as_millis() as u64, - )); - emb - } + let embedder = if let Some(emb) = build_embedder { + substeps.push(Step::leaf( + Outcome::LoadModel(LoadModelOutcome { + model_name: emb_config.name.clone(), + dimension: emb.dimension(), + }), + 0, // already loaded during build + )); + emb + } else { + let model_start = Instant::now(); + match ModelConfig::try_from(emb_config) { + Ok(mc) => match Embedder::load(&mc) { + Ok(emb) => { + substeps.push(Step::leaf( + Outcome::LoadModel(LoadModelOutcome { + model_name: emb_config.name.clone(), + dimension: emb.dimension(), + }), + model_start.elapsed().as_millis() as u64, + )); + emb + } + Err(e) => { + substeps.push(Step::failed( + ErrorKind::Application, + e.to_string(), + 
model_start.elapsed().as_millis() as u64, + )); + return fail_from_last(&mut substeps, start); + } + }, Err(e) => { substeps.push(Step::failed( ErrorKind::Application, @@ -254,14 +242,6 @@ pub async fn run( )); return fail_from_last(&mut substeps, start); } - }, - Err(e) => { - substeps.push(Step::failed( - ErrorKind::Application, - e.to_string(), - model_start.elapsed().as_millis() as u64, - )); - return fail_from_last(&mut substeps, start); } }; diff --git a/src/main.rs b/src/main.rs index e5a6185..f3d47ad 100644 --- a/src/main.rs +++ b/src/main.rs @@ -280,7 +280,7 @@ async fn main() -> anyhow::Result<()> { Ok(()) } Command::Check { path, no_update } => { - let step = mdvs::cmd::check::run(&path, no_update, cli.verbose).await; + let step = mdvs::cmd::check::run(&path, no_update, cli.verbose); let failed = mdvs::step::has_failed(&step); let violations = mdvs::step::has_violations(&step); let output_str = match (&cli.output, cli.verbose) { From bfd20e5cb04d8775e5b6e802707e951db06b1082 Mon Sep 17 00:00:00 2001 From: edoch Date: Fri, 20 Mar 2026 23:07:40 +0100 Subject: [PATCH 11/35] docs: add TODO-0136 (inline auto-update/auto-build) and Justfile TODO-0136 tracks the three waves of inlining subcommand logic to eliminate redundant config reads and directory scans. Justfile provides a convenience wrapper for running mdvs against example_kb via the release binary. 
Co-Authored-By: Claude --- Justfile | 2 + docs/spec/todos/TODO-0136.md | 103 +++++++++++++++++++++++++++++++++++ docs/spec/todos/index.md | 1 + 3 files changed, 106 insertions(+) create mode 100644 Justfile create mode 100644 docs/spec/todos/TODO-0136.md diff --git a/Justfile b/Justfile new file mode 100644 index 0000000..b0a01e4 --- /dev/null +++ b/Justfile @@ -0,0 +1,2 @@ +mdvs *args: + ./target/release/mdvs {{args}} diff --git a/docs/spec/todos/TODO-0136.md b/docs/spec/todos/TODO-0136.md new file mode 100644 index 0000000..0421401 --- /dev/null +++ b/docs/spec/todos/TODO-0136.md @@ -0,0 +1,103 @@ +--- +id: 136 +title: Inline auto-update and auto-build logic to eliminate redundant reads +status: todo +priority: high +created: 2026-03-20 +depends_on: [131] +blocks: [] +--- + +# TODO-0136: Inline auto-update and auto-build logic to eliminate redundant reads + +## Summary + +Commands that call other commands' `run()` functions (check → update, build → update, search → build) cause redundant config reads and directory scans. Inline the subcommand logic so each command reads config once, scans once, and passes data forward. + +## Problem + +Current behavior with auto flags enabled: + +- `mdvs check`: 3 config reads, 2 full scans +- `mdvs build`: 3 config reads, 2 full scans +- `mdvs search` (auto_build + auto_update): 5+ config reads, 3+ scans + +Each command calls another command's `run()` as a black box, which starts from scratch. + +## Solution + +Commands inline the subcommand logic using shared data. Core functions (`ScannedFiles::scan()`, `InferredSchema::infer()`, `check::validate()`, etc.) are already available — commands just need to wire them together without redundant reads. + +## Wave 1: check (auto_update) + +**Current flow:** +1. Read config +2. Call `update::run(path, ...)` → reads config, scans, infers, writes config +3. Re-read config +4. Scan +5. Validate + +**New flow:** +1. Read config +2. Scan (once) +3. 
If auto_update: infer → compare fields → write config if changed → re-read config +4. Validate (using same scanned files) + +**Result:** 1 config read (+ 1 re-read if config changed), 1 scan. + +**Files:** `src/cmd/check.rs` + +## Wave 2: build (auto_update) + +**Current flow:** +1. Read config +2. Call `update::run(path, ...)` → reads config, scans, infers, writes config +3. Re-read config +4. Scan +5. Validate → classify → load model → embed → write index + +**New flow:** +1. Read config +2. Scan (once) +3. If auto_update: infer → compare fields → write config if changed → re-read config +4. Validate (same scanned files) → classify → load model → embed → write index + +**Result:** 1 config read (+ 1 re-read if config changed), 1 scan. + +**Files:** `src/cmd/build.rs` + +## Wave 3: search (auto_build) + +**Current flow:** +1. Read config +2. Call `build::run(path, ...)` → reads config, calls update::run(), re-reads config, scans, validates, classifies, embeds, writes index +3. Re-read config +4. Read index → load model → embed query → execute search + +**New flow:** +1. Read config +2. If auto_build: + a. Scan (once) + b. If auto_update: infer → compare fields → write config if changed → re-read config + c. Validate (same scanned files) → classify → load model → embed → write index +3. Read index → load model (reuse if already loaded) → embed query → execute search + +**Result:** 1 config read (+ 1 re-read if config changed), 1 scan, 1 model load (shared between build embed and search query embed). 
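
The three wave flows above share one shape. A minimal sketch of that shape — all types and function names here are hypothetical stand-ins, not the actual mdvs API — showing one config read and one scan, with the scanned data reused by both auto-update and validate:

```rust
// Hypothetical stand-ins for illustration only (not the real mdvs types).
struct Config { auto_update: bool, fields: Vec<String> }
struct Scanned { files: Vec<&'static str> }

fn read_config() -> Config { Config { auto_update: true, fields: vec![] } }
fn scan() -> Scanned { Scanned { files: vec!["a.md", "b.md"] } }
fn infer(s: &Scanned) -> Vec<String> { s.files.iter().map(|f| f.to_string()).collect() }
fn validate(cfg: &Config, s: &Scanned) -> usize { let _ = cfg; s.files.len() }

fn check(no_update: bool) -> usize {
    let mut cfg = read_config();   // 1 config read
    let scanned = scan();          // 1 scan, reused below
    if !no_update && cfg.auto_update {
        // auto-update reuses `scanned` instead of scanning again
        for name in infer(&scanned) {
            if !cfg.fields.contains(&name) {
                cfg.fields.push(name);
            }
        }
    }
    validate(&cfg, &scanned)       // validate reuses the same scan
}
```

The key property is that `scanned` flows through the whole pipeline; no stage goes back to disk for data an earlier stage already produced.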
+ +**Files:** `src/cmd/search.rs` + +## Core functions available (no new code needed) + +| Function | Module | Used by | +|----------|--------|---------| +| `MdvsToml::read()` + `.validate()` | `schema::config` | all | +| `ScannedFiles::scan()` | `discover::scan` | all | +| `InferredSchema::infer()` | `discover::infer` | check, build (auto_update) | +| `check::validate()` | `cmd::check` | check, build | +| `classify_files()` | `cmd::build` (private) | build, search (auto_build) | +| `embed_file()` | `cmd::build` (private) | build, search (auto_build) | +| `ModelConfig::try_from()` + `Embedder::load()` | `index::embed` | build, search | +| `Backend::write_index()` | `index::backend` | build, search (auto_build) | +| `Backend::search()` | `index::backend` | search | + +**Note:** `classify_files()` and `embed_file()` are currently private to build.rs. For wave 3 (search inlining build logic), these may need to move to a shared location or search must duplicate them. diff --git a/docs/spec/todos/index.md b/docs/spec/todos/index.md index 3554cb3..dd652c8 100644 --- a/docs/spec/todos/index.md +++ b/docs/spec/todos/index.md @@ -137,3 +137,4 @@ | [0133](TODO-0133.md) | Macro for step pipeline boilerplate (early-return pattern) | todo | low | 2026-03-19 | | [0134](TODO-0134.md) | Step tree post-migration cleanup | todo | medium | 2026-03-20 | | [0135](TODO-0135.md) | Remove Skipped padding from Step tree error paths | todo | medium | 2026-03-20 | +| [0136](TODO-0136.md) | Inline auto-update and auto-build logic to eliminate redundant reads | todo | high | 2026-03-20 | From e6b24582619fd43816c67bcab2cd033841b8aa39 Mon Sep 17 00:00:00 2001 From: edoch Date: Mon, 23 Mar 2026 11:48:28 +0100 Subject: [PATCH 12/35] fix(check): deduplicate NullNotAllowed violation for required+non-nullable fields check_required_fields() emitted NullNotAllowed for null values on required non-nullable fields, but check_field_values() already catches this case. 
Result: same violation appeared twice in output. Fix: check_required_fields() now only emits MissingRequired (field absent entirely). Null-on-non-nullable is handled exclusively by check_field_values(). Adds test: null_on_required_non_nullable_produces_single_violation. Co-Authored-By: Claude --- src/cmd/check.rs | 78 +++++++++++++++++++++++++++++++++--------------- 1 file changed, 54 insertions(+), 24 deletions(-) diff --git a/src/cmd/check.rs b/src/cmd/check.rs index 4677254..4da2e12 100644 --- a/src/cmd/check.rs +++ b/src/cmd/check.rs @@ -424,30 +424,17 @@ fn check_required_fields( .and_then(|v| v.as_object()) .and_then(|map| map.get(&toml_field.name)); - match value { - None => { - let key = ViolationKey { - field: toml_field.name.clone(), - kind: ViolationKind::MissingRequired, - rule: format!("required in {:?}", toml_field.required), - }; - violations.entry(key).or_default().push(ViolatingFile { - path: file.path.clone(), - detail: None, - }); - } - Some(v) if v.is_null() && !toml_field.nullable => { - let key = ViolationKey { - field: toml_field.name.clone(), - kind: ViolationKind::NullNotAllowed, - rule: format!("not nullable, required in {:?}", toml_field.required), - }; - violations.entry(key).or_default().push(ViolatingFile { - path: file.path.clone(), - detail: None, - }); - } - _ => {} + // Null on non-nullable is caught by check_field_values — only check absence here + if value.is_none() { + let key = ViolationKey { + field: toml_field.name.clone(), + kind: ViolationKind::MissingRequired, + rule: format!("required in {:?}", toml_field.required), + }; + violations.entry(key).or_default().push(ViolatingFile { + path: file.path.clone(), + detail: None, + }); } } } @@ -1054,4 +1041,47 @@ mod tests { assert!(has_disallowed, "expected Disallowed for draft"); assert!(has_null_not_allowed, "expected NullNotAllowed for draft"); } + + #[test] + fn null_on_required_non_nullable_produces_single_violation() { + let tmp = tempfile::tempdir().unwrap(); + 
fs::create_dir_all(tmp.path().join("notes")).unwrap(); + + fs::write( + tmp.path().join("notes/note1.md"), + "---\ntitle: Hello\nstatus:\n---\n# Hello\nBody.", + ) + .unwrap(); + + write_toml( + tmp.path(), + vec![ + string_field("title"), + TomlField { + name: "status".into(), + field_type: FieldTypeSerde::Scalar("String".into()), + allowed: vec!["**".into()], + required: vec!["**".into()], + nullable: false, + }, + ], + vec![], + ); + + let step = run(tmp.path(), true, false); + let result = unwrap_check(&step); + + // Should produce exactly 1 NullNotAllowed — not duplicated by check_required_fields + let null_violations: Vec<_> = result + .violations + .iter() + .filter(|v| v.field == "status" && matches!(v.kind, ViolationKind::NullNotAllowed)) + .collect(); + assert_eq!( + null_violations.len(), + 1, + "expected exactly 1 NullNotAllowed, got {}", + null_violations.len() + ); + } } From fe4bf555bf2721e8825e161fe01b1841b4e7f406 Mon Sep 17 00:00:00 2001 From: edoch Date: Mon, 23 Mar 2026 11:48:44 +0100 Subject: [PATCH 13/35] docs: add TODO-0137 (flatten Step tree into steps + result) Simplify the recursive Step tree into a flat steps + result structure. Deletes CompactOutcome (~1,000 lines), fixes double outcome nesting in JSON, makes compact/verbose a rendering choice rather than a type choice. 
Co-Authored-By: Claude --- docs/spec/todos/TODO-0137.md | 122 +++++++++++++++++++++++++++++++++++ docs/spec/todos/index.md | 1 + 2 files changed, 123 insertions(+) create mode 100644 docs/spec/todos/TODO-0137.md diff --git a/docs/spec/todos/TODO-0137.md b/docs/spec/todos/TODO-0137.md new file mode 100644 index 0000000..9110f7f --- /dev/null +++ b/docs/spec/todos/TODO-0137.md @@ -0,0 +1,122 @@ +--- +id: 137 +title: Flatten Step tree into steps + result structure +status: todo +priority: high +created: 2026-03-21 +depends_on: [136] +blocks: [] +--- + +# TODO-0137: Flatten Step tree into steps + result structure + +## Summary + +After TODO-0136 (inlining auto-update/auto-build), no command nests another command. The recursive `Step { substeps, outcome }` tree is always flat — substeps are always leaf nodes, never nested. Simplify the architecture to match reality: a flat list of process steps + a command result. + +## Problem + +Current JSON output has awkward nesting: +```json +{ + "substeps": [ + { "substeps": [], "outcome": { "status": "complete", "elapsed_ms": 5, "outcome": { "ReadConfig": { ... } } } }, + ... + ], + "outcome": { "status": "complete", "elapsed_ms": 97, "outcome": { "Check": { ... } } } +} +``` + +Issues: +- `substeps` always contains leaf nodes (empty `substeps: []`) — the recursive structure is unused +- `outcome` wraps another `outcome` — double nesting from `StepOutcome::Complete { result: Ok(Outcome::...) }` +- Verbose for consumers who just want the result + +## Design decisions (settled) + +### Compact vs verbose = "do you see the process steps or not?" + +- **Compact** (default): command output only — summary line, tables, details. Everything the command produces. +- **Verbose** (`-v`): process step lines (ReadConfig, Scan, Validate, etc.) shown above the command output. + +The `-v` flag controls visibility of process steps. It does NOT change what data the command outcome contains. 
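
A minimal sketch of this decision (the types `StepLine` and `CommandView` are illustrative only, not the actual mdvs structs): the command builds the same data either way, and `-v` only decides whether the process steps are rendered above the result.

```rust
// Hypothetical types for illustration — verbosity is a rendering choice,
// not a type choice.
struct StepLine { name: &'static str, elapsed_ms: u64 }
struct CommandView { steps: Vec<StepLine>, summary: String }

fn render(view: &CommandView, verbose: bool) -> String {
    let mut out = String::new();
    if verbose {
        // process steps appear above the command output
        for s in &view.steps {
            out.push_str(&format!("{} ({} ms)\n", s.name, s.elapsed_ms));
        }
    }
    out.push_str(&view.summary); // the command output itself is always shown
    out
}
```
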
+ +### Text and JSON show the same information + +Whatever is visible in text is also present in JSON, and vice versa. No hidden data in either format. + +### Error mode is always verbose + +When a command fails with a technical error (not validation), output is always verbose regardless of `-v`. The user needs to see which step failed. + +### CompactOutcome is deleted + +No separate compact enum. Verbose just means "also serialize/render the steps." The command result struct is always the same — compact JSON is just the result struct serialized directly. + +This kills ~1,000 lines: all compact outcome structs, From impls, to_compact() conversion, and duplicate Render impls. + +### Render trait stays + +Every outcome type implements `Render` → `Vec`. In verbose mode, each step's outcome is rendered. In compact mode, only the command outcome is rendered. The Render trait is how each step self-manages its own output. + +### Failed steps include elapsed_ms + +Useful for debugging (tells you if the step timed out or failed instantly): +```json +{ "type": "Scan", "status": "failed", "error": "permission denied", "elapsed_ms": 0 } +``` + +## Proposed JSON structure + +### Compact (default) + +Just the command result — no wrapper: +```json +{ "files_checked": 43, "violations": [], "new_fields": [] } +``` + +### Verbose (-v) + +Steps + result + timing: +```json +{ + "steps": [ + { "type": "ReadConfig", "elapsed_ms": 5, "config_path": "example_kb/mdvs.toml" }, + { "type": "Scan", "elapsed_ms": 4, "files_found": 43, "glob": "**" }, + { "type": "Validate", "elapsed_ms": 87, "files_checked": 43, "violations": [...] 
} + ], + "result": { "files_checked": 43, "violations": [], "new_fields": [] }, + "elapsed_ms": 97 +} +``` + +### Error (always verbose) + +```json +{ + "steps": [ + { "type": "ReadConfig", "elapsed_ms": 5, "config_path": "example_kb/mdvs.toml" }, + { "type": "Scan", "status": "failed", "error": "permission denied", "elapsed_ms": 0 } + ], + "error": "scan failed", + "elapsed_ms": 12 +} +``` + +## Scope + +1. Delete `CompactOutcome` enum + all compact outcome structs + `to_compact()` + compact Render impls (~1,000 lines) +2. Replace `Step { substeps, outcome }` with flat `CommandResult { steps, result, elapsed_ms }` +3. Replace `StepOutcome` with simpler per-step serialization (type tag + fields) +4. Update custom Serialize: compact JSON = just result struct; verbose JSON = steps + result + elapsed +5. Update `has_failed()` / `has_violations()` for flat structure +6. Update all 7 commands to build `Vec` + command result +7. Update main.rs dispatch (compact vs verbose controls what to serialize/render) +8. Update Render impls (remove compact duplicates, keep full impls) +9. 
Update all tests + +## Subsumed TODOs + +This TODO likely subsumes: +- **TODO-0132** (compact struct generation macro) — no longer needed if CompactOutcome is deleted +- **TODO-0135** (Skipped padding removal) — flat structure has no padding concept diff --git a/docs/spec/todos/index.md b/docs/spec/todos/index.md index dd652c8..9aae7fc 100644 --- a/docs/spec/todos/index.md +++ b/docs/spec/todos/index.md @@ -138,3 +138,4 @@ | [0134](TODO-0134.md) | Step tree post-migration cleanup | todo | medium | 2026-03-20 | | [0135](TODO-0135.md) | Remove Skipped padding from Step tree error paths | todo | medium | 2026-03-20 | | [0136](TODO-0136.md) | Inline auto-update and auto-build logic to eliminate redundant reads | todo | high | 2026-03-20 | +| [0137](TODO-0137.md) | Flatten Step tree into steps + result structure | todo | high | 2026-03-21 | From 4623b5e11b921709d6c5886e54317b1f0f27b9fd Mon Sep 17 00:00:00 2001 From: edoch Date: Mon, 23 Mar 2026 12:42:04 +0100 Subject: [PATCH 14/35] refactor: flatten Step tree into CommandResult, delete CompactOutcome Replace recursive Step { substeps, outcome: StepOutcome } with flat CommandResult { steps: Vec, result, elapsed_ms }. 
- Delete CompactOutcome enum + all 20 compact structs + From impls + compact Render impls (~1,300 lines removed)
- Delete compact output types from output.rs
- New types: CommandResult, StepEntry, ProcessStep, FailedStep
- Compact = command outcome only; verbose = steps + command outcome
- Compact JSON is bare result struct; verbose JSON has steps array
- No double "outcome" nesting in JSON
- Errors force verbose output

Co-Authored-By: Claude
---
 src/cmd/build.rs               | 247 ++------
 src/cmd/check.rs               | 153 +++---
 src/cmd/clean.rs               | 128 +++---
 src/cmd/info.rs                |  75 ++--
 src/cmd/init.rs                | 123 +++---
 src/cmd/search.rs              | 131 +++---
 src/cmd/update.rs              | 119 +++---
 src/main.rs                    | 145 ++++---
 src/outcome/classify.rs        |  30 --
 src/outcome/commands/build.rs  |  91 +---
 src/outcome/commands/check.rs  |  97 +----
 src/outcome/commands/clean.rs  |  58 ---
 src/outcome/commands/info.rs   |  51 ---
 src/outcome/commands/init.rs   |  82 +---
 src/outcome/commands/mod.rs    |  17 +-
 src/outcome/commands/search.rs |  77 ----
 src/outcome/commands/update.rs |  50 ---
 src/outcome/config.rs          |  50 +--
 src/outcome/embed.rs           |  47 ---
 src/outcome/index.rs           | 112 +----
 src/outcome/infer.rs           |  21 -
 src/outcome/mod.rs             | 274 +-----
 src/outcome/model.rs           |  24 --
 src/outcome/scan.rs            |  24 --
 src/outcome/search.rs          |  19 -
 src/outcome/validate.rs        |  35 +-
 src/output.rs                  |  99 -----
 src/step.rs                    | 740 ++++++++++++++-------------
 28 files changed, 838 insertions(+), 2281 deletions(-)

diff --git a/src/cmd/build.rs b/src/cmd/build.rs
index 8c046c5..c5d3c1e 100644
--- a/src/cmd/build.rs
+++ b/src/cmd/build.rs
@@ -13,7 +13,7 @@ use crate::outcome::{
 use crate::output::BuildFileDetail;
 use crate::schema::config::{BuildConfig, MdvsToml, SearchConfig, TomlField};
 use crate::schema::shared::{ChunkingConfig, EmbeddingModelConfig, FieldTypeSerde};
-use crate::step::{ErrorKind, Step, StepError, StepOutcome};
+use crate::step::{CommandResult, ErrorKind, StepEntry, StepError};
 use std::collections::{HashMap, HashSet};
 use std::path::Path;
use std::time::Instant; @@ -186,9 +186,9 @@ pub async fn run( force: bool, no_update: bool, _verbose: bool, -) -> Step { +) -> CommandResult { let start = Instant::now(); - let mut substeps = Vec::new(); + let mut steps = Vec::new(); // 1. Read config — calls MdvsToml::read() + validate() directly let config_start = Instant::now(); @@ -196,7 +196,7 @@ pub async fn run( let config = match MdvsToml::read(&config_path_buf) { Ok(cfg) => match cfg.validate() { Ok(()) => { - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::ReadConfig(ReadConfigOutcome { config_path: config_path_buf.display().to_string(), }), @@ -205,7 +205,7 @@ pub async fn run( Some(cfg) } Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::User, format!("mdvs.toml is invalid: {e} — fix the file or run 'mdvs init --force'"), config_start.elapsed().as_millis() as u64, @@ -214,7 +214,7 @@ pub async fn run( } }, Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::User, e.to_string(), config_start.elapsed().as_millis() as u64, @@ -224,7 +224,7 @@ pub async fn run( }; let mut config = match config { Some(c) => c, - None => return fail_from_last(&mut substeps, start), + None => return fail_from_last(&mut steps, start), }; let should_update = !no_update && config.build.as_ref().is_some_and(|b| b.auto_update); @@ -240,17 +240,8 @@ pub async fn run( ); if let Some(msg) = mutation_error { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::User, - message: msg, - }), - elapsed_ms: 0, - }, - }); - return fail_from_last(&mut substeps, start); + steps.push(StepEntry::err(ErrorKind::User, msg, 0)); + return fail_from_last(&mut steps, start); } // 3. 
Core build pipeline (scan → auto-update → validate → classify → embed → write) @@ -260,20 +251,18 @@ pub async fn run( &config_path_buf, force, should_update, - &mut substeps, + &mut steps, ) .await { Ok(result) => result, - Err(()) => return fail_from_last(&mut substeps, start), + Err(()) => return fail_from_last(&mut steps, start), }; - Step { - substeps, - outcome: StepOutcome::Complete { - result: Ok(Outcome::Build(Box::new(build_outcome))), - elapsed_ms: start.elapsed().as_millis() as u64, - }, + CommandResult { + steps, + result: Ok(Outcome::Build(Box::new(build_outcome))), + elapsed_ms: start.elapsed().as_millis() as u64, } } @@ -284,21 +273,21 @@ pub async fn run( /// Core build pipeline: scan → auto-update → validate → classify → embed → write index. /// /// Returns `BuildOutcome` + optional `Embedder` (for reuse by search) on success. -/// On failure, pushes error substeps and returns `Err(())` — the caller constructs -/// the failed command Step from the substeps. +/// On failure, pushes error steps and returns `Err(())` — the caller constructs +/// the failed CommandResult from the steps. pub(crate) async fn build_core( path: &Path, config: &mut MdvsToml, config_path: &Path, force: bool, auto_update: bool, - substeps: &mut Vec>, + steps: &mut Vec, ) -> Result<(BuildOutcome, Option), ()> { // 1. 
Scan let scan_start = Instant::now(); let scanned = match ScannedFiles::scan(path, &config.scan) { Ok(s) => { - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::Scan(ScanOutcome { files_found: s.files.len(), glob: config.scan.glob.clone(), @@ -308,7 +297,7 @@ pub(crate) async fn build_core( s } Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::Application, e.to_string(), scan_start.elapsed().as_millis() as u64, @@ -321,7 +310,7 @@ pub(crate) async fn build_core( if auto_update { let infer_start = Instant::now(); let schema = InferredSchema::infer(&scanned); - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::Infer(InferOutcome { fields_inferred: schema.fields.len(), }), @@ -353,7 +342,7 @@ pub(crate) async fn build_core( let write_start = Instant::now(); match config.write(config_path) { Ok(()) => { - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::WriteConfig(WriteConfigOutcome { config_path: config_path.display().to_string(), fields_written: config.fields.field.len(), @@ -366,7 +355,7 @@ pub(crate) async fn build_core( } } Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::Application, e.to_string(), write_start.elapsed().as_millis() as u64, @@ -379,16 +368,11 @@ pub(crate) async fn build_core( // 3. 
Validate if scanned.files.is_empty() { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::User, - message: format!("no markdown files found in '{}'", path.display()), - }), - elapsed_ms: 0, - }, - }); + steps.push(StepEntry::err( + ErrorKind::User, + format!("no markdown files found in '{}'", path.display()), + 0, + )); return Err(()); } @@ -396,7 +380,7 @@ pub(crate) async fn build_core( let check_result = match crate::cmd::check::validate(&scanned, config, false) { Ok(r) => r, Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::Application, e.to_string(), validate_start.elapsed().as_millis() as u64, @@ -404,7 +388,7 @@ pub(crate) async fn build_core( return Err(()); } }; - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::Validate(ValidateOutcome { files_checked: check_result.files_checked, violations: check_result.field_violations.clone(), @@ -416,19 +400,14 @@ pub(crate) async fn build_core( let new_fields = check_result.new_fields; if !violations.is_empty() { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::User, - message: format!( - "{} violation(s) found. Run `mdvs check` for details.", - violations.len() - ), - }), - elapsed_ms: 0, - }, - }); + steps.push(StepEntry::err( + ErrorKind::User, + format!( + "{} violation(s) found. 
Run `mdvs check` for details.", + violations.len() + ), + 0, + )); return Err(()); } @@ -446,16 +425,7 @@ pub(crate) async fn build_core( { Ok(sf) => sf, Err(msg) => { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::Application, - message: msg, - }), - elapsed_ms: 0, - }, - }); + steps.push(StepEntry::err(ErrorKind::Application, msg, 0)); return Err(()); } }; @@ -463,32 +433,22 @@ pub(crate) async fn build_core( let embedding = match config.embedding_model.as_ref() { Some(e) => e, None => { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::User, - message: "missing [embedding_model] in mdvs.toml".into(), - }), - elapsed_ms: 0, - }, - }); + steps.push(StepEntry::err( + ErrorKind::User, + "missing [embedding_model] in mdvs.toml".into(), + 0, + )); return Err(()); } }; let chunking = match config.chunking.as_ref() { Some(c) => c, None => { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::User, - message: "missing [chunking] in mdvs.toml".into(), - }), - elapsed_ms: 0, - }, - }); + steps.push(StepEntry::err( + ErrorKind::User, + "missing [chunking] in mdvs.toml".into(), + 0, + )); return Err(()); } }; @@ -499,16 +459,7 @@ pub(crate) async fn build_core( let full_rebuild = force || !backend.exists(); if let Some(msg) = config_change_error { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::User, - message: msg, - }), - elapsed_ms: 0, - }, - }); + steps.push(StepEntry::err(ErrorKind::User, msg, 0)); return Err(()); } @@ -518,16 +469,7 @@ pub(crate) async fn build_core( match backend.read_file_index() { Ok(idx) => idx, Err(e) => { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::Application, - message: 
e.to_string(), - }), - elapsed_ms: 0, - }, - }); + steps.push(StepEntry::err(ErrorKind::Application, e.to_string(), 0)); return Err(()); } } @@ -538,16 +480,7 @@ pub(crate) async fn build_core( match backend.read_chunk_rows() { Ok(crs) => crs, Err(e) => { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::Application, - message: e.to_string(), - }), - elapsed_ms: 0, - }, - }); + steps.push(StepEntry::err(ErrorKind::Application, e.to_string(), 0)); return Err(()); } } @@ -570,7 +503,7 @@ pub(crate) async fn build_core( }) .collect(); let count = needs_embedding.len(); - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::Classify(ClassifyOutcome { full_rebuild: true, needs_embedding: count, @@ -622,7 +555,7 @@ pub(crate) async fn build_core( let unchanged_count = classification.unchanged_file_ids.len(); let removed_count = classification.removed_count; - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::Classify(ClassifyOutcome { full_rebuild: false, needs_embedding: needs_count, @@ -650,7 +583,7 @@ pub(crate) async fn build_core( match ModelConfig::try_from(embedding) { Ok(mc) => match Embedder::load(&mc) { Ok(emb) => { - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::LoadModel(LoadModelOutcome { model_name: embedding.name.clone(), dimension: emb.dimension(), @@ -660,7 +593,7 @@ pub(crate) async fn build_core( Some(emb) } Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::Application, e.to_string(), model_start.elapsed().as_millis() as u64, @@ -669,7 +602,7 @@ pub(crate) async fn build_core( } }, Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::Application, e.to_string(), model_start.elapsed().as_millis() as u64, @@ -678,7 +611,7 @@ pub(crate) async fn build_core( } } } else { - substeps.push(Step::skipped()); + steps.push(StepEntry::skipped()); None }; @@ -705,16 +638,11 @@ pub(crate) async 
fn build_core( }; if needs_embedding && embedder.is_none() { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::Application, - message: "model loading failed".into(), - }), - elapsed_ms: 0, - }, - }); + steps.push(StepEntry::err( + ErrorKind::Application, + "model loading failed".into(), + 0, + )); return Err(()); } @@ -723,7 +651,7 @@ pub(crate) async fn build_core( let built_at = chrono::Utc::now().timestamp_micros(); if let Some(msg) = dim_error { - substeps.push(Step::failed(ErrorKind::User, msg, 0)); + steps.push(StepEntry::err(ErrorKind::User, msg, 0)); return Err(()); } @@ -740,7 +668,7 @@ pub(crate) async fn build_core( }); embed_chunk_rows.extend(crs); } - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::EmbedFiles(EmbedFilesOutcome { files_embedded: classify_data.needs_embedding.len(), chunks_produced: embed_chunk_rows.len(), @@ -752,7 +680,7 @@ pub(crate) async fn build_core( details, }) } else { - substeps.push(Step::skipped()); + steps.push(StepEntry::skipped()); None }; @@ -790,7 +718,7 @@ pub(crate) async fn build_core( let write_start = Instant::now(); match backend.write_index(&schema_fields, &file_rows, &chunk_rows, build_meta) { Ok(()) => { - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::WriteIndex(WriteIndexOutcome { files_written: file_rows.len(), chunks_written: chunk_rows.len(), @@ -799,7 +727,7 @@ pub(crate) async fn build_core( )); } Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::Application, e.to_string(), write_start.elapsed().as_millis() as u64, @@ -831,24 +759,22 @@ pub(crate) async fn build_core( Ok((outcome, embedder)) } -/// Extract error from last failed substep and return a failed command Step. -fn fail_from_last(substeps: &mut Vec>, start: Instant) -> Step { - let msg = match substeps.iter().rev().find_map(|s| match &s.outcome { - StepOutcome::Complete { result: Err(e), .. 
} => Some(e.message.clone()), +/// Extract error from last failed step and return a failed CommandResult. +fn fail_from_last(steps: &mut Vec, start: Instant) -> CommandResult { + let msg = match steps.iter().rev().find_map(|s| match s { + StepEntry::Failed(f) => Some(f.message.clone()), _ => None, }) { Some(m) => m, None => "step failed".into(), }; - Step { - substeps: std::mem::take(substeps), - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::Application, - message: msg, - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }, + CommandResult { + steps: std::mem::take(steps), + result: Err(StepError { + kind: ErrorKind::Application, + message: msg, + }), + elapsed_ms: start.elapsed().as_millis() as u64, } } @@ -1007,19 +933,16 @@ mod tests { use std::collections::{HashMap, HashSet}; use std::fs; - fn unwrap_build(step: &Step) -> &BuildOutcome { - match &step.outcome { - StepOutcome::Complete { - result: Ok(Outcome::Build(o)), - .. - } => o, + fn unwrap_build(result: &CommandResult) -> &BuildOutcome { + match &result.result { + Ok(Outcome::Build(o)) => o, other => panic!("expected Ok(Build), got: {other:?}"), } } - fn unwrap_error(step: &Step) -> &StepError { - match &step.outcome { - StepOutcome::Complete { result: Err(e), .. 
} => e, + fn unwrap_error(result: &CommandResult) -> &StepError { + match &result.result { + Err(e) => e, other => panic!("expected Err, got: {other:?}"), } } diff --git a/src/cmd/check.rs b/src/cmd/check.rs index 4da2e12..98984df 100644 --- a/src/cmd/check.rs +++ b/src/cmd/check.rs @@ -8,7 +8,7 @@ use crate::outcome::{ use crate::output::{FieldViolation, NewField, ViolatingFile, ViolationKind}; use crate::schema::config::{MdvsToml, TomlField}; use crate::schema::shared::FieldTypeSerde; -use crate::step::{ErrorKind, Step, StepError, StepOutcome}; +use crate::step::{CommandResult, ErrorKind, StepEntry, StepError}; use globset::Glob; use serde::Serialize; use serde_json::Value; @@ -46,9 +46,9 @@ impl CheckResult { /// Read config, optionally auto-update, scan files, and validate frontmatter. #[instrument(name = "check", skip_all)] -pub fn run(path: &Path, no_update: bool, verbose: bool) -> Step { +pub fn run(path: &Path, no_update: bool, verbose: bool) -> CommandResult { let start = Instant::now(); - let mut substeps = Vec::new(); + let mut steps = Vec::new(); // 1. 
Read config — calls MdvsToml::read() + validate() directly let config_start = std::time::Instant::now(); @@ -56,7 +56,7 @@ pub fn run(path: &Path, no_update: bool, verbose: bool) -> Step { let config = match MdvsToml::read(&config_path_buf) { Ok(cfg) => match cfg.validate() { Ok(()) => { - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::ReadConfig(ReadConfigOutcome { config_path: config_path_buf.display().to_string(), }), @@ -65,7 +65,7 @@ pub fn run(path: &Path, no_update: bool, verbose: bool) -> Step { Some(cfg) } Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::User, format!("mdvs.toml is invalid: {e} — fix the file or run 'mdvs init --force'"), config_start.elapsed().as_millis() as u64, @@ -74,7 +74,7 @@ pub fn run(path: &Path, no_update: bool, verbose: bool) -> Step { } }, Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::User, e.to_string(), config_start.elapsed().as_millis() as u64, @@ -86,19 +86,17 @@ pub fn run(path: &Path, no_update: bool, verbose: bool) -> Step { let config = match config { Some(c) => c, None => { - let msg = match &substeps[0].outcome { - StepOutcome::Complete { result: Err(e), .. 
} => e.message.clone(), + let msg = match &steps[0] { + StepEntry::Failed(f) => f.message.clone(), _ => "failed to read config".into(), }; - return Step { - substeps, - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::User, - message: msg, - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }, + return CommandResult { + steps, + result: Err(StepError { + kind: ErrorKind::User, + message: msg, + }), + elapsed_ms: start.elapsed().as_millis() as u64, }; } }; @@ -107,7 +105,7 @@ pub fn run(path: &Path, no_update: bool, verbose: bool) -> Step { let scan_start = Instant::now(); let scanned = match ScannedFiles::scan(path, &config.scan) { Ok(s) => { - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::Scan(ScanOutcome { files_found: s.files.len(), glob: config.scan.glob.clone(), @@ -117,21 +115,19 @@ pub fn run(path: &Path, no_update: bool, verbose: bool) -> Step { s } Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::Application, e.to_string(), scan_start.elapsed().as_millis() as u64, )); let msg = e.to_string(); - return Step { - substeps, - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::Application, - message: msg, - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }, + return CommandResult { + steps, + result: Err(StepError { + kind: ErrorKind::Application, + message: msg, + }), + elapsed_ms: start.elapsed().as_millis() as u64, }; } }; @@ -141,7 +137,7 @@ pub fn run(path: &Path, no_update: bool, verbose: bool) -> Step { let config = if should_update { let infer_start = Instant::now(); let schema = InferredSchema::infer(&scanned); - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::Infer(InferOutcome { fields_inferred: schema.fields.len(), }), @@ -177,7 +173,7 @@ pub fn run(path: &Path, no_update: bool, verbose: bool) -> Step { let write_start = Instant::now(); match config.write(&config_path_buf) { Ok(()) => { - substeps.push(Step::leaf( + 
steps.push(StepEntry::ok( Outcome::WriteConfig(WriteConfigOutcome { config_path: config_path_buf.display().to_string(), fields_written: config.fields.field.len(), @@ -191,20 +187,18 @@ pub fn run(path: &Path, no_update: bool, verbose: bool) -> Step { } } Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::Application, e.to_string(), write_start.elapsed().as_millis() as u64, )); - return Step { - substeps, - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::Application, - message: "auto-update failed to write config".into(), - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }, + return CommandResult { + steps, + result: Err(StepError { + kind: ErrorKind::Application, + message: "auto-update failed to write config".into(), + }), + elapsed_ms: start.elapsed().as_millis() as u64, }; } } @@ -218,53 +212,41 @@ pub fn run(path: &Path, no_update: bool, verbose: bool) -> Step { let check_result = match validate(&scanned, &config, verbose) { Ok(r) => r, Err(e) => { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::Application, - message: e.to_string(), - }), - elapsed_ms: validate_start.elapsed().as_millis() as u64, - }, - }); - return Step { - substeps, - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::Application, - message: "validation failed".into(), - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }, + steps.push(StepEntry::err( + ErrorKind::Application, + e.to_string(), + validate_start.elapsed().as_millis() as u64, + )); + return CommandResult { + steps, + result: Err(StepError { + kind: ErrorKind::Application, + message: "validation failed".into(), + }), + elapsed_ms: start.elapsed().as_millis() as u64, }; } }; - // Push validate substep - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Ok(Outcome::Validate(ValidateOutcome { - files_checked: 
check_result.files_checked, - violations: check_result.field_violations.clone(), - new_fields: check_result.new_fields.clone(), - })), - elapsed_ms: validate_start.elapsed().as_millis() as u64, - }, - }); + // Push validate step + steps.push(StepEntry::ok( + Outcome::Validate(ValidateOutcome { + files_checked: check_result.files_checked, + violations: check_result.field_violations.clone(), + new_fields: check_result.new_fields.clone(), + }), + validate_start.elapsed().as_millis() as u64, + )); // Build command outcome - Step { - substeps, - outcome: StepOutcome::Complete { - result: Ok(Outcome::Check(Box::new(CheckOutcome { - files_checked: check_result.files_checked, - violations: check_result.field_violations, - new_fields: check_result.new_fields, - }))), - elapsed_ms: start.elapsed().as_millis() as u64, - }, + CommandResult { + steps, + result: Ok(Outcome::Check(Box::new(CheckOutcome { + files_checked: check_result.files_checked, + violations: check_result.field_violations, + new_fields: check_result.new_fields, + }))), + elapsed_ms: start.elapsed().as_millis() as u64, } } @@ -508,12 +490,9 @@ mod tests { use crate::schema::shared::ScanConfig; use std::fs; - fn unwrap_check(step: &Step) -> &CheckOutcome { - match &step.outcome { - StepOutcome::Complete { - result: Ok(Outcome::Check(o)), - .. 
- } => o, + fn unwrap_check(result: &CommandResult) -> &CheckOutcome { + match &result.result { + Ok(Outcome::Check(o)) => o, other => panic!("expected Ok(Check), got: {other:?}"), } } diff --git a/src/cmd/clean.rs b/src/cmd/clean.rs index a2cfa00..aa79149 100644 --- a/src/cmd/clean.rs +++ b/src/cmd/clean.rs @@ -1,7 +1,7 @@ use crate::index::backend::Backend; use crate::outcome::commands::CleanOutcome; use crate::outcome::{DeleteIndexOutcome, Outcome}; -use crate::step::{ErrorKind, Step, StepError, StepOutcome}; +use crate::step::{CommandResult, ErrorKind, StepEntry, StepError}; use std::path::{Path, PathBuf}; use std::time::Instant; use tracing::instrument; @@ -27,36 +27,31 @@ fn walk_dir_stats(dir: &Path) -> anyhow::Result<(usize, u64)> { /// Delete the `.mdvs/` index directory if it exists. #[instrument(name = "clean", skip_all)] -pub fn run(path: &Path) -> Step { +pub fn run(path: &Path) -> CommandResult { let start = Instant::now(); - let mut substeps = Vec::new(); + let mut steps = Vec::new(); // Delete index step — inlined from pipeline/delete_index.rs let delete_start = Instant::now(); let mdvs_dir = path.join(".mdvs"); if mdvs_dir.is_symlink() { - substeps.push(Step::failed( - ErrorKind::User, - format!( - "'{}' is a symlink — refusing to delete for safety", - mdvs_dir.display() - ), - delete_start.elapsed().as_millis() as u64, - )); let msg = format!( "'{}' is a symlink — refusing to delete for safety", mdvs_dir.display() ); - return Step { - substeps, - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::User, - message: msg, - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }, + steps.push(StepEntry::err( + ErrorKind::User, + msg.clone(), + delete_start.elapsed().as_millis() as u64, + )); + return CommandResult { + steps, + result: Err(StepError { + kind: ErrorKind::User, + message: msg, + }), + elapsed_ms: start.elapsed().as_millis() as u64, }; } @@ -64,40 +59,36 @@ pub fn run(path: &Path) -> Step { let (files_removed, 
size_bytes) = match walk_dir_stats(&mdvs_dir) { Ok(stats) => stats, Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::Application, e.to_string(), delete_start.elapsed().as_millis() as u64, )); - return Step { - substeps, - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::Application, - message: e.to_string(), - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }, + return CommandResult { + steps, + result: Err(StepError { + kind: ErrorKind::Application, + message: e.to_string(), + }), + elapsed_ms: start.elapsed().as_millis() as u64, }; } }; let backend = Backend::parquet(path); if let Err(e) = backend.clean() { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::Application, e.to_string(), delete_start.elapsed().as_millis() as u64, )); - return Step { - substeps, - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::Application, - message: e.to_string(), - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }, + return CommandResult { + steps, + result: Err(StepError { + kind: ErrorKind::Application, + message: e.to_string(), + }), + elapsed_ms: start.elapsed().as_millis() as u64, }; } @@ -111,7 +102,7 @@ pub fn run(path: &Path) -> Step { (false, mdvs_dir.display().to_string(), 0, 0) }; - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::DeleteIndex(DeleteIndexOutcome { removed, path: path_str.clone(), @@ -121,17 +112,15 @@ pub fn run(path: &Path) -> Step { delete_start.elapsed().as_millis() as u64, )); - Step { - substeps, - outcome: StepOutcome::Complete { - result: Ok(Outcome::Clean(CleanOutcome { - removed, - path: PathBuf::from(&path_str), - files_removed, - size_bytes, - })), - elapsed_ms: start.elapsed().as_millis() as u64, - }, + CommandResult { + steps, + result: Ok(Outcome::Clean(CleanOutcome { + removed, + path: PathBuf::from(&path_str), + files_removed, + size_bytes, + })), + elapsed_ms: start.elapsed().as_millis() as u64, } } 
@@ -139,15 +128,12 @@ pub fn run(path: &Path) -> Step { mod tests { use super::*; use crate::outcome::Outcome; - use crate::step::StepOutcome; + use crate::step::CommandResult; use std::fs; - fn unwrap_clean(step: &Step) -> &CleanOutcome { - match &step.outcome { - StepOutcome::Complete { - result: Ok(Outcome::Clean(o)), - .. - } => o, + fn unwrap_clean(result: &CommandResult) -> &CleanOutcome { + match &result.result { + Ok(Outcome::Clean(o)) => o, other => panic!("expected Ok(Clean), got: {other:?}"), } } @@ -160,14 +146,14 @@ mod tests { fs::create_dir_all(&mdvs_dir).unwrap(); fs::write(mdvs_dir.join("files.parquet"), "dummy").unwrap(); - let step = run(tmp.path()); - assert!(!crate::step::has_failed(&step)); + let result = run(tmp.path()); + assert!(!crate::step::has_failed(&result)); - let result = unwrap_clean(&step); - assert!(result.removed); + let outcome = unwrap_clean(&result); + assert!(outcome.removed); assert!(!mdvs_dir.exists()); - assert_eq!(result.files_removed, 1); - assert_eq!(result.size_bytes, 5); + assert_eq!(outcome.files_removed, 1); + assert_eq!(outcome.size_bytes, 5); assert!(tmp.path().join("mdvs.toml").exists()); } @@ -175,12 +161,12 @@ mod tests { fn clean_nothing_to_clean() { let tmp = tempfile::tempdir().unwrap(); - let step = run(tmp.path()); - assert!(!crate::step::has_failed(&step)); + let result = run(tmp.path()); + assert!(!crate::step::has_failed(&result)); - let result = unwrap_clean(&step); - assert!(!result.removed); - assert_eq!(result.files_removed, 0); - assert_eq!(result.size_bytes, 0); + let outcome = unwrap_clean(&result); + assert!(!outcome.removed); + assert_eq!(outcome.files_removed, 0); + assert_eq!(outcome.size_bytes, 0); } } diff --git a/src/cmd/info.rs b/src/cmd/info.rs index e34ba93..f07a26f 100644 --- a/src/cmd/info.rs +++ b/src/cmd/info.rs @@ -4,7 +4,7 @@ use crate::outcome::commands::InfoOutcome; use crate::outcome::{Outcome, ReadConfigOutcome, ReadIndexOutcome, ScanOutcome}; use crate::output::{field_hints, 
FieldHint}; use crate::schema::config::MdvsToml; -use crate::step::{ErrorKind, Step, StepError, StepOutcome}; +use crate::step::{CommandResult, ErrorKind, StepEntry, StepError}; use serde::Serialize; use serde_json::Value; use std::collections::HashMap; @@ -60,9 +60,9 @@ pub struct IndexInfo { /// Read config, scan files, and read index metadata. #[instrument(name = "info", skip_all)] -pub fn run(path: &Path, _verbose: bool) -> Step { +pub fn run(path: &Path, _verbose: bool) -> CommandResult { let start = Instant::now(); - let mut substeps = Vec::new(); + let mut steps = Vec::new(); // 1. Read config — calls MdvsToml::read() + validate() directly let config_start = Instant::now(); @@ -70,7 +70,7 @@ pub fn run(path: &Path, _verbose: bool) -> Step { let config = match MdvsToml::read(&config_path_buf) { Ok(cfg) => match cfg.validate() { Ok(()) => { - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::ReadConfig(ReadConfigOutcome { config_path: config_path_buf.display().to_string(), }), @@ -79,7 +79,7 @@ pub fn run(path: &Path, _verbose: bool) -> Step { Some(cfg) } Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::User, format!("mdvs.toml is invalid: {e} — fix the file or run 'mdvs init --force'"), config_start.elapsed().as_millis() as u64, @@ -88,7 +88,7 @@ pub fn run(path: &Path, _verbose: bool) -> Step { } }, Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::User, e.to_string(), config_start.elapsed().as_millis() as u64, @@ -100,19 +100,17 @@ pub fn run(path: &Path, _verbose: bool) -> Step { let config = match config { Some(c) => c, None => { - let msg = match &substeps[0].outcome { - StepOutcome::Complete { result: Err(e), .. 
} => e.message.clone(), + let msg = match &steps[0] { + StepEntry::Failed(f) => f.message.clone(), _ => "failed to read config".into(), }; - return Step { - substeps, - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::User, - message: msg, - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }, + return CommandResult { + steps, + result: Err(StepError { + kind: ErrorKind::User, + message: msg, + }), + elapsed_ms: start.elapsed().as_millis() as u64, }; } }; @@ -121,7 +119,7 @@ pub fn run(path: &Path, _verbose: bool) -> Step { let scan_start = Instant::now(); let scanned = match ScannedFiles::scan(path, &config.scan) { Ok(s) => { - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::Scan(ScanOutcome { files_found: s.files.len(), glob: config.scan.glob.clone(), @@ -131,7 +129,7 @@ pub fn run(path: &Path, _verbose: bool) -> Step { Some(s) } Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::Application, e.to_string(), scan_start.elapsed().as_millis() as u64, @@ -144,7 +142,7 @@ pub fn run(path: &Path, _verbose: bool) -> Step { let index_start = Instant::now(); let backend = Backend::parquet(path); let index_data = if !backend.exists() { - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::ReadIndex(ReadIndexOutcome { exists: false, files_indexed: 0, @@ -158,7 +156,7 @@ pub fn run(path: &Path, _verbose: bool) -> Step { let idx_stats = backend.stats().ok().flatten(); match (build_meta, idx_stats) { (Some(metadata), Some(stats)) => { - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::ReadIndex(ReadIndexOutcome { exists: true, files_indexed: stats.files_indexed, @@ -169,7 +167,7 @@ pub fn run(path: &Path, _verbose: bool) -> Step { Some((metadata, stats)) } _ => { - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::ReadIndex(ReadIndexOutcome { exists: false, files_indexed: 0, @@ -234,18 +232,16 @@ pub fn run(path: &Path, _verbose: bool) -> Step { } }); - Step { 
- substeps, - outcome: StepOutcome::Complete { - result: Ok(Outcome::Info(Box::new(InfoOutcome { - scan_glob: config.scan.glob.clone(), - files_on_disk: total_files, - fields, - ignored_fields: config.fields.ignore.clone(), - index, - }))), - elapsed_ms: start.elapsed().as_millis() as u64, - }, + CommandResult { + steps, + result: Ok(Outcome::Info(Box::new(InfoOutcome { + scan_glob: config.scan.glob.clone(), + files_on_disk: total_files, + fields, + ignored_fields: config.fields.ignore.clone(), + index, + }))), + elapsed_ms: start.elapsed().as_millis() as u64, } } @@ -255,15 +251,12 @@ mod tests { use crate::outcome::Outcome; use crate::schema::config::{FieldsConfig, MdvsToml, SearchConfig, UpdateConfig}; use crate::schema::shared::{ChunkingConfig, EmbeddingModelConfig, FieldTypeSerde, ScanConfig}; - use crate::step::StepOutcome; + use crate::step::CommandResult; use std::fs; - fn unwrap_info(step: &Step) -> &InfoOutcome { - match &step.outcome { - StepOutcome::Complete { - result: Ok(Outcome::Info(o)), - .. 
- } => o, + fn unwrap_info(result: &CommandResult) -> &InfoOutcome { + match &result.result { + Ok(Outcome::Info(o)) => o, other => panic!("expected Ok(Info), got: {other:?}"), } } diff --git a/src/cmd/init.rs b/src/cmd/init.rs index d7ff6f6..cbdad96 100644 --- a/src/cmd/init.rs +++ b/src/cmd/init.rs @@ -5,7 +5,7 @@ use crate::outcome::{InferOutcome, Outcome, ScanOutcome, WriteConfigOutcome}; use crate::output::DiscoveredField; use crate::schema::config::MdvsToml; use crate::schema::shared::ScanConfig; -use crate::step::{ErrorKind, Step, StepError, StepOutcome}; +use crate::step::{CommandResult, ErrorKind, StepEntry, StepError}; use std::path::Path; use std::time::Instant; use tracing::{info, instrument}; @@ -21,16 +21,16 @@ pub fn run( ignore_bare_files: bool, skip_gitignore: bool, _verbose: bool, -) -> Step { +) -> CommandResult { let start = Instant::now(); - let mut substeps = Vec::new(); + let mut steps = Vec::new(); info!(path = %path.display(), "initializing"); // Pre-checks if !path.is_dir() { return fail_early( - substeps, + steps, start, ErrorKind::User, format!("'{}' is not a directory", path.display()), @@ -41,7 +41,7 @@ pub fn run( let mdvs_dir = path.join(".mdvs"); if !force && (config_path.exists() || mdvs_dir.exists()) { return fail_early( - substeps, + steps, start, ErrorKind::User, format!( @@ -69,7 +69,7 @@ pub fn run( let scan_start = Instant::now(); let scanned = match ScannedFiles::scan(path, &scan_config) { Ok(s) => { - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::Scan(ScanOutcome { files_found: s.files.len(), glob: scan_config.glob.clone(), @@ -79,43 +79,33 @@ pub fn run( s } Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::Application, e.to_string(), scan_start.elapsed().as_millis() as u64, )); - return fail_from_last_substep(&mut substeps, start); + return fail_from_last_substep(&mut steps, start); } }; // 2. 
Infer if scanned.files.is_empty() { - substeps.push(Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::User, - message: format!("no markdown files found in '{}'", path.display()), - }), - elapsed_ms: 0, - }, - }); - return Step { - substeps, - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::User, - message: format!("no markdown files found in '{}'", path.display()), - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }, + let msg = format!("no markdown files found in '{}'", path.display()); + steps.push(StepEntry::err(ErrorKind::User, msg.clone(), 0)); + return CommandResult { + steps, + result: Err(StepError { + kind: ErrorKind::User, + message: msg, + }), + elapsed_ms: start.elapsed().as_millis() as u64, }; } // 2b. Infer — InferredSchema::infer() is infallible let infer_start = Instant::now(); let schema = InferredSchema::infer(&scanned); - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::Infer(InferOutcome { fields_inferred: schema.fields.len(), }), @@ -134,13 +124,13 @@ pub fn run( // 3. 
Write config — MdvsToml::from_inferred() + write() directly if dry_run { - substeps.push(Step::skipped()); + steps.push(StepEntry::skipped()); } else { let write_start = Instant::now(); let toml_doc = MdvsToml::from_inferred(&schema, scan_config); match toml_doc.write(&config_path) { Ok(()) => { - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::WriteConfig(WriteConfigOutcome { config_path: config_path.display().to_string(), fields_written: schema.fields.len(), @@ -149,7 +139,7 @@ pub fn run( )); } Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::Application, e.to_string(), write_start.elapsed().as_millis() as u64, @@ -158,51 +148,45 @@ pub fn run( } } - Step { - substeps, - outcome: StepOutcome::Complete { - result: Ok(Outcome::Init(Box::new(InitOutcome { - path: path.to_path_buf(), - files_scanned: total_files, - fields, - dry_run, - }))), - elapsed_ms: start.elapsed().as_millis() as u64, - }, + CommandResult { + steps, + result: Ok(Outcome::Init(Box::new(InitOutcome { + path: path.to_path_buf(), + files_scanned: total_files, + fields, + dry_run, + }))), + elapsed_ms: start.elapsed().as_millis() as u64, } } -/// Helper: return a failed Step with the given error. +/// Helper: return a failed CommandResult with the given error. fn fail_early( - substeps: Vec<Step>, + steps: Vec<StepEntry>, start: Instant, kind: ErrorKind, message: String, -) -> Step { - Step { - substeps, - outcome: StepOutcome::Complete { - result: Err(StepError { kind, message }), - elapsed_ms: start.elapsed().as_millis() as u64, - }, +) -> CommandResult { + CommandResult { + steps, + result: Err(StepError { kind, message }), + elapsed_ms: start.elapsed().as_millis() as u64, } } -/// Helper: extract error from last substep and return a failed Step. -fn fail_from_last_substep(substeps: &mut Vec<Step>, start: Instant) -> Step { - let msg = match substeps.last().map(|s| &s.outcome) { - Some(StepOutcome::Complete { result: Err(e), ..
}) => e.message.clone(), +/// Helper: extract error from last step and return a failed CommandResult. +fn fail_from_last_substep(steps: &mut Vec<StepEntry>, start: Instant) -> CommandResult { + let msg = match steps.last() { + Some(StepEntry::Failed(f)) => f.message.clone(), _ => "step failed".into(), }; - Step { - substeps: std::mem::take(substeps), - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::Application, - message: msg, - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }, + CommandResult { + steps: std::mem::take(steps), + result: Err(StepError { + kind: ErrorKind::Application, + message: msg, + }), + elapsed_ms: start.elapsed().as_millis() as u64, } } @@ -211,15 +195,12 @@ mod tests { use super::*; use crate::outcome::Outcome; use crate::output::FieldHint; - use crate::step::StepOutcome; + use crate::step::CommandResult; use std::fs; - fn unwrap_init(step: &Step) -> &InitOutcome { - match &step.outcome { - StepOutcome::Complete { - result: Ok(Outcome::Init(o)), - .. - } => o, + fn unwrap_init(result: &CommandResult) -> &InitOutcome { + match &result.result { + Ok(Outcome::Init(o)) => o, other => panic!("expected Ok(Init), got: {other:?}"), } } diff --git a/src/cmd/search.rs b/src/cmd/search.rs index 62bf35e..640e831 100644 --- a/src/cmd/search.rs +++ b/src/cmd/search.rs @@ -7,7 +7,7 @@ use crate::outcome::{ ReadIndexOutcome, }; use crate::schema::config::MdvsToml; -use crate::step::{ErrorKind, Step, StepError, StepOutcome}; +use crate::step::{CommandResult, ErrorKind, StepEntry, StepError}; use std::path::Path; use std::time::Instant; use tracing::{instrument, warn}; @@ -55,9 +55,9 @@ pub async fn run( no_update: bool, no_build: bool, _verbose: bool, -) -> Step { +) -> CommandResult { let start = Instant::now(); - let mut substeps = Vec::new(); + let mut steps = Vec::new(); // 1.
Read config — calls MdvsToml::read() + validate() directly let config_start = Instant::now(); @@ -65,7 +65,7 @@ pub async fn run( let mut config = match MdvsToml::read(&config_path_buf) { Ok(cfg) => match cfg.validate() { Ok(()) => { - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::ReadConfig(ReadConfigOutcome { config_path: config_path_buf.display().to_string(), }), @@ -74,7 +74,7 @@ pub async fn run( Some(cfg) } Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::User, format!("mdvs.toml is invalid: {e} — fix the file or run 'mdvs init --force'"), config_start.elapsed().as_millis() as u64, @@ -83,7 +83,7 @@ pub async fn run( } }, Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::User, e.to_string(), config_start.elapsed().as_millis() as u64, @@ -109,7 +109,7 @@ pub async fn run( &config_path_buf, false, auto_update, - &mut substeps, + &mut steps, ) .await { @@ -117,7 +117,7 @@ pub async fn run( build_embedder = embedder; } Err(()) => { - return fail_msg(&mut substeps, start, ErrorKind::User, "auto-build failed"); + return fail_msg(&mut steps, start, ErrorKind::User, "auto-build failed"); } } } @@ -131,7 +131,7 @@ pub async fn run( let index_start = Instant::now(); let backend = Backend::parquet(path); if !backend.exists() { - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::ReadIndex(ReadIndexOutcome { exists: false, files_indexed: 0, @@ -145,7 +145,7 @@ pub async fn run( let idx_stats = backend.stats().ok().flatten(); match (build_meta, idx_stats) { (Some(metadata), Some(stats)) => { - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::ReadIndex(ReadIndexOutcome { exists: true, files_indexed: stats.files_indexed, @@ -156,7 +156,7 @@ pub async fn run( Some(IndexData { metadata }) } _ => { - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::ReadIndex(ReadIndexOutcome { exists: false, files_indexed: 0, @@ -170,7 +170,7 @@ pub async fn run( } } None 
=> { - return fail_from_last(&mut substeps, start); + return fail_from_last(&mut steps, start); } }; @@ -196,14 +196,14 @@ pub async fn run( // 3. Load model — calls ModelConfig::try_from() + Embedder::load() directly if let Some(msg) = pre_check_error { - substeps.push(Step::failed(ErrorKind::User, msg, 0)); - return fail_from_last(&mut substeps, start); + steps.push(StepEntry::err(ErrorKind::User, msg, 0)); + return fail_from_last(&mut steps, start); } // 3. Load model (reuse from build if available) let emb_config = embedding.unwrap(); let embedder = if let Some(emb) = build_embedder { - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::LoadModel(LoadModelOutcome { model_name: emb_config.name.clone(), dimension: emb.dimension(), @@ -216,7 +216,7 @@ pub async fn run( match ModelConfig::try_from(emb_config) { Ok(mc) => match Embedder::load(&mc) { Ok(emb) => { - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::LoadModel(LoadModelOutcome { model_name: emb_config.name.clone(), dimension: emb.dimension(), @@ -226,21 +226,21 @@ pub async fn run( emb } Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::Application, e.to_string(), model_start.elapsed().as_millis() as u64, )); - return fail_from_last(&mut substeps, start); + return fail_from_last(&mut steps, start); } }, Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::Application, e.to_string(), model_start.elapsed().as_millis() as u64, )); - return fail_from_last(&mut substeps, start); + return fail_from_last(&mut steps, start); } } }; @@ -248,7 +248,7 @@ pub async fn run( // 4. 
Embed query — calls embedder.embed() directly (infallible) let embed_start = Instant::now(); let query_embedding = embedder.embed(query).await; - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::EmbedQuery(EmbedQueryOutcome { query: query.to_string(), }), @@ -265,8 +265,8 @@ pub async fn run( if let Some(w) = where_clause { if let Err(msg) = validate_where_clause(w) { - substeps.push(Step::failed(ErrorKind::User, msg, 0)); - return fail_from_last(&mut substeps, start); + steps.push(StepEntry::err(ErrorKind::User, msg, 0)); + return fail_from_last(&mut steps, start); } } @@ -276,19 +276,19 @@ pub async fn run( .await { Ok(hits) => { - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::ExecuteSearch(ExecuteSearchOutcome { hits: hits.len() }), search_start.elapsed().as_millis() as u64, )); hits } Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::Application, e.to_string(), search_start.elapsed().as_millis() as u64, )); - return fail_from_last(&mut substeps, start); + return fail_from_last(&mut steps, start); } }; @@ -307,55 +307,49 @@ pub async fn run( } let model_name = emb_config.name.clone(); - Step { - substeps, - outcome: StepOutcome::Complete { - result: Ok(Outcome::Search(Box::new(SearchOutcome { - query: query.to_string(), - hits, - model_name, - limit, - }))), - elapsed_ms: start.elapsed().as_millis() as u64, - }, + CommandResult { + steps, + result: Ok(Outcome::Search(Box::new(SearchOutcome { + query: query.to_string(), + hits, + model_name, + limit, + }))), + elapsed_ms: start.elapsed().as_millis() as u64, } } -fn fail_from_last(substeps: &mut Vec<Step>, start: Instant) -> Step { - let msg = match substeps.iter().rev().find_map(|s| match &s.outcome { - StepOutcome::Complete { result: Err(e), ..
} => Some(e.message.clone()), +fn fail_from_last(steps: &mut Vec<StepEntry>, start: Instant) -> CommandResult { + let msg = match steps.iter().rev().find_map(|s| match s { + StepEntry::Failed(f) => Some(f.message.clone()), _ => None, }) { Some(m) => m, None => "step failed".into(), }; - Step { - substeps: std::mem::take(substeps), - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::Application, - message: msg, - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }, + CommandResult { + steps: std::mem::take(steps), + result: Err(StepError { + kind: ErrorKind::Application, + message: msg, + }), + elapsed_ms: start.elapsed().as_millis() as u64, } } fn fail_msg( - substeps: &mut Vec<Step>, + steps: &mut Vec<StepEntry>, start: Instant, kind: ErrorKind, msg: &str, -) -> Step { - Step { - substeps: std::mem::take(substeps), - outcome: StepOutcome::Complete { - result: Err(StepError { - kind, - message: msg.into(), - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }, +) -> CommandResult { + CommandResult { + steps: std::mem::take(steps), + result: Err(StepError { + kind, + message: msg.into(), + }), + elapsed_ms: start.elapsed().as_millis() as u64, } } @@ -368,19 +362,16 @@ mod tests { use crate::schema::shared::{ChunkingConfig, EmbeddingModelConfig, ScanConfig}; use std::fs; - fn unwrap_search(step: &Step) -> &SearchOutcome { - match &step.outcome { - StepOutcome::Complete { - result: Ok(Outcome::Search(o)), - .. - } => o, + fn unwrap_search(result: &CommandResult) -> &SearchOutcome { + match &result.result { + Ok(Outcome::Search(o)) => o, other => panic!("expected Ok(Search), got: {other:?}"), } } - fn unwrap_error(step: &Step) -> &StepError { - match &step.outcome { - StepOutcome::Complete { result: Err(e), ..
} => e, + fn unwrap_error(result: &CommandResult) -> &StepError { + match &result.result { + Err(e) => e, other => panic!("expected Err, got: {other:?}"), } } @@ -633,7 +624,7 @@ mod tests { ) .await; // Should not fail with quote parity error - if let StepOutcome::Complete { result: Err(e), .. } = &output.outcome { + if let Err(e) = &output.result { assert!( !e.message.contains("unmatched"), "balanced quotes should not trigger parity check" diff --git a/src/cmd/update.rs b/src/cmd/update.rs index dad2518..709374d 100644 --- a/src/cmd/update.rs +++ b/src/cmd/update.rs @@ -5,7 +5,7 @@ use crate::outcome::{InferOutcome, Outcome, ReadConfigOutcome, ScanOutcome, Writ use crate::output::{ChangedField, FieldChange, RemovedField}; use crate::schema::config::{MdvsToml, TomlField}; use crate::schema::shared::FieldTypeSerde; -use crate::step::{ErrorKind, Step, StepError, StepOutcome}; +use crate::step::{CommandResult, ErrorKind, StepEntry, StepError}; use std::collections::HashMap; use std::path::Path; use std::time::Instant; @@ -20,14 +20,14 @@ pub async fn run( reinfer_all: bool, dry_run: bool, _verbose: bool, -) -> Step { +) -> CommandResult { let start = Instant::now(); - let mut substeps = Vec::new(); + let mut steps = Vec::new(); // Pre-check: flag conflict if !reinfer.is_empty() && reinfer_all { return fail_early( - substeps, + steps, start, ErrorKind::User, "cannot use --reinfer and --reinfer-all together".into(), @@ -40,7 +40,7 @@ pub async fn run( let mut config = match MdvsToml::read(&config_path_buf) { Ok(cfg) => match cfg.validate() { Ok(()) => { - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::ReadConfig(ReadConfigOutcome { config_path: config_path_buf.display().to_string(), }), @@ -49,21 +49,21 @@ pub async fn run( cfg } Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::User, format!("mdvs.toml is invalid: {e} — fix the file or run 'mdvs init --force'"), config_start.elapsed().as_millis() as u64, )); - return 
fail_from_last_substep(&mut substeps, start); + return fail_from_last_substep(&mut steps, start); } }, Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::User, e.to_string(), config_start.elapsed().as_millis() as u64, )); - return fail_from_last_substep(&mut substeps, start); + return fail_from_last_substep(&mut steps, start); } }; @@ -71,7 +71,7 @@ pub async fn run( for name in reinfer { if !config.fields.field.iter().any(|f| f.name == *name) { return fail_early( - std::mem::take(&mut substeps), + std::mem::take(&mut steps), start, ErrorKind::User, format!("field '{name}' is not in mdvs.toml"), @@ -83,7 +83,7 @@ pub async fn run( let scan_start = Instant::now(); let scanned = match ScannedFiles::scan(path, &config.scan) { Ok(s) => { - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::Scan(ScanOutcome { files_found: s.files.len(), glob: config.scan.glob.clone(), @@ -93,19 +93,19 @@ pub async fn run( s } Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::Application, e.to_string(), scan_start.elapsed().as_millis() as u64, )); - return fail_from_last_substep(&mut substeps, start); + return fail_from_last_substep(&mut steps, start); } }; // 3. Infer — InferredSchema::infer() is infallible let infer_start = Instant::now(); let schema = InferredSchema::infer(&scanned); - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::Infer(InferOutcome { fields_inferred: schema.fields.len(), }), @@ -216,7 +216,7 @@ pub async fn run( // 4. 
Write config (Skipped if dry_run or no changes) if dry_run || !has_changes { - substeps.push(Step::skipped()); + steps.push(StepEntry::skipped()); } else { let write_start = Instant::now(); let write_path = path.join("mdvs.toml"); @@ -224,7 +224,7 @@ pub async fn run( match config.write(&write_path) { Ok(()) => { - substeps.push(Step::leaf( + steps.push(StepEntry::ok( Outcome::WriteConfig(WriteConfigOutcome { config_path: write_path.display().to_string(), fields_written: config.fields.field.len(), @@ -233,70 +233,62 @@ pub async fn run( )); } Err(e) => { - substeps.push(Step::failed( + steps.push(StepEntry::err( ErrorKind::Application, e.to_string(), write_start.elapsed().as_millis() as u64, )); - return Step { - substeps, - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::Application, - message: "failed to write config".into(), - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }, + return CommandResult { + steps, + result: Err(StepError { + kind: ErrorKind::Application, + message: "failed to write config".into(), + }), + elapsed_ms: start.elapsed().as_millis() as u64, }; } } } - Step { - substeps, - outcome: StepOutcome::Complete { - result: Ok(Outcome::Update(Box::new(UpdateOutcome { - files_scanned: total_files, - added, - changed, - removed, - unchanged, - dry_run, - }))), - elapsed_ms: start.elapsed().as_millis() as u64, - }, + CommandResult { + steps, + result: Ok(Outcome::Update(Box::new(UpdateOutcome { + files_scanned: total_files, + added, + changed, + removed, + unchanged, + dry_run, + }))), + elapsed_ms: start.elapsed().as_millis() as u64, } } fn fail_early( - substeps: Vec<Step>, + steps: Vec<StepEntry>, start: Instant, kind: ErrorKind, message: String, -) -> Step { - Step { - substeps, - outcome: StepOutcome::Complete { - result: Err(StepError { kind, message }), - elapsed_ms: start.elapsed().as_millis() as u64, - }, +) -> CommandResult { + CommandResult { + steps, + result: Err(StepError { kind, message }), + elapsed_ms:
start.elapsed().as_millis() as u64, } } -fn fail_from_last_substep(substeps: &mut Vec<Step>, start: Instant) -> Step { - let msg = match substeps.last().map(|s| &s.outcome) { - Some(StepOutcome::Complete { result: Err(e), .. }) => e.message.clone(), +fn fail_from_last_substep(steps: &mut Vec<StepEntry>, start: Instant) -> CommandResult { + let msg = match steps.last() { + Some(StepEntry::Failed(f)) => f.message.clone(), _ => "step failed".into(), }; - Step { - substeps: std::mem::take(substeps), - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::Application, - message: msg, - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }, + CommandResult { + steps: std::mem::take(steps), + result: Err(StepError { + kind: ErrorKind::Application, + message: msg, + }), + elapsed_ms: start.elapsed().as_millis() as u64, } } @@ -309,12 +301,9 @@ mod tests { use crate::schema::shared::{FieldTypeSerde, ScanConfig}; use std::fs; - fn unwrap_update(step: &Step) -> &UpdateOutcome { - match &step.outcome { - StepOutcome::Complete { - result: Ok(Outcome::Update(o)), - .. - } => o, + fn unwrap_update(result: &CommandResult) -> &UpdateOutcome { + match &result.result { + Ok(Outcome::Update(o)) => o, other => panic!("expected Ok(Update), got: {other:?}"), } } diff --git a/src/main.rs b/src/main.rs index f3d47ad..4ee7f05 100644 --- a/src/main.rs +++ b/src/main.rs @@ -1,5 +1,4 @@ use clap::{Parser, Subcommand}; -use mdvs::block::Render; use std::path::PathBuf; /// Stderr logging level for `--logs`.
@@ -167,7 +166,7 @@ async fn main() -> anyhow::Result<()> { ignore_bare_files, skip_gitignore, } => { - let step = mdvs::cmd::init::run( + let result = mdvs::cmd::init::run( &path, &glob, force, @@ -176,20 +175,22 @@ async fn main() -> anyhow::Result<()> { skip_gitignore, cli.verbose, ); - let failed = mdvs::step::has_failed(&step); - let output_str = match (&cli.output, cli.verbose) { + let failed = mdvs::step::has_failed(&result); + let verbose = cli.verbose || failed; + let output_str = match (&cli.output, verbose) { (mdvs::output::OutputFormat::Text, true) => { - mdvs::render::format_text(&step.render()) + mdvs::render::format_text(&result.render_verbose()) } (mdvs::output::OutputFormat::Text, false) => { - mdvs::render::format_text(&step.to_compact().render()) + mdvs::render::format_text(&result.render_compact()) } (mdvs::output::OutputFormat::Json, true) => { - serde_json::to_string_pretty(&step).unwrap() - } - (mdvs::output::OutputFormat::Json, false) => { - serde_json::to_string_pretty(&step.to_compact()).unwrap() + serde_json::to_string_pretty(&result).unwrap() } + (mdvs::output::OutputFormat::Json, false) => match result.result_value() { + Some(outcome) => serde_json::to_string_pretty(outcome).unwrap(), + None => serde_json::to_string_pretty(&result).unwrap(), + }, }; print!("{output_str}"); if failed { @@ -205,7 +206,7 @@ async fn main() -> anyhow::Result<()> { force, no_update, } => { - let step = mdvs::cmd::build::run( + let result = mdvs::cmd::build::run( &path, set_model.as_deref(), set_revision.as_deref(), @@ -215,21 +216,23 @@ async fn main() -> anyhow::Result<()> { cli.verbose, ) .await; - let failed = mdvs::step::has_failed(&step); - let violations = mdvs::step::has_violations(&step); - let output_str = match (&cli.output, cli.verbose) { + let failed = mdvs::step::has_failed(&result); + let violations = mdvs::step::has_violations(&result); + let verbose = cli.verbose || failed; + let output_str = match (&cli.output, verbose) { 
(mdvs::output::OutputFormat::Text, true) => { - mdvs::render::format_text(&step.render()) + mdvs::render::format_text(&result.render_verbose()) } (mdvs::output::OutputFormat::Text, false) => { - mdvs::render::format_text(&step.to_compact().render()) + mdvs::render::format_text(&result.render_compact()) } (mdvs::output::OutputFormat::Json, true) => { - serde_json::to_string_pretty(&step).unwrap() - } - (mdvs::output::OutputFormat::Json, false) => { - serde_json::to_string_pretty(&step.to_compact()).unwrap() + serde_json::to_string_pretty(&result).unwrap() } + (mdvs::output::OutputFormat::Json, false) => match result.result_value() { + Some(outcome) => serde_json::to_string_pretty(outcome).unwrap(), + None => serde_json::to_string_pretty(&result).unwrap(), + }, }; print!("{output_str}"); if failed { @@ -248,7 +251,7 @@ async fn main() -> anyhow::Result<()> { no_update, no_build, } => { - let step = mdvs::cmd::search::run( + let result = mdvs::cmd::search::run( &path, &query, limit, @@ -258,20 +261,22 @@ async fn main() -> anyhow::Result<()> { cli.verbose, ) .await; - let failed = mdvs::step::has_failed(&step); - let output_str = match (&cli.output, cli.verbose) { + let failed = mdvs::step::has_failed(&result); + let verbose = cli.verbose || failed; + let output_str = match (&cli.output, verbose) { (mdvs::output::OutputFormat::Text, true) => { - mdvs::render::format_text(&step.render()) + mdvs::render::format_text(&result.render_verbose()) } (mdvs::output::OutputFormat::Text, false) => { - mdvs::render::format_text(&step.to_compact().render()) + mdvs::render::format_text(&result.render_compact()) } (mdvs::output::OutputFormat::Json, true) => { - serde_json::to_string_pretty(&step).unwrap() - } - (mdvs::output::OutputFormat::Json, false) => { - serde_json::to_string_pretty(&step.to_compact()).unwrap() + serde_json::to_string_pretty(&result).unwrap() } + (mdvs::output::OutputFormat::Json, false) => match result.result_value() { + Some(outcome) => 
serde_json::to_string_pretty(outcome).unwrap(), + None => serde_json::to_string_pretty(&result).unwrap(), + }, }; print!("{output_str}"); if failed { @@ -280,22 +285,24 @@ async fn main() -> anyhow::Result<()> { Ok(()) } Command::Check { path, no_update } => { - let step = mdvs::cmd::check::run(&path, no_update, cli.verbose); - let failed = mdvs::step::has_failed(&step); - let violations = mdvs::step::has_violations(&step); - let output_str = match (&cli.output, cli.verbose) { + let result = mdvs::cmd::check::run(&path, no_update, cli.verbose); + let failed = mdvs::step::has_failed(&result); + let violations = mdvs::step::has_violations(&result); + let verbose = cli.verbose || failed; + let output_str = match (&cli.output, verbose) { (mdvs::output::OutputFormat::Text, true) => { - mdvs::render::format_text(&step.render()) + mdvs::render::format_text(&result.render_verbose()) } (mdvs::output::OutputFormat::Text, false) => { - mdvs::render::format_text(&step.to_compact().render()) + mdvs::render::format_text(&result.render_compact()) } (mdvs::output::OutputFormat::Json, true) => { - serde_json::to_string_pretty(&step).unwrap() - } - (mdvs::output::OutputFormat::Json, false) => { - serde_json::to_string_pretty(&step.to_compact()).unwrap() + serde_json::to_string_pretty(&result).unwrap() } + (mdvs::output::OutputFormat::Json, false) => match result.result_value() { + Some(outcome) => serde_json::to_string_pretty(outcome).unwrap(), + None => serde_json::to_string_pretty(&result).unwrap(), + }, }; print!("{output_str}"); if failed { @@ -312,22 +319,24 @@ async fn main() -> anyhow::Result<()> { reinfer_all, dry_run, } => { - let step = + let result = mdvs::cmd::update::run(&path, &reinfer, reinfer_all, dry_run, cli.verbose).await; - let failed = mdvs::step::has_failed(&step); - let output_str = match (&cli.output, cli.verbose) { + let failed = mdvs::step::has_failed(&result); + let verbose = cli.verbose || failed; + let output_str = match (&cli.output, verbose) { 
(mdvs::output::OutputFormat::Text, true) => { - mdvs::render::format_text(&step.render()) + mdvs::render::format_text(&result.render_verbose()) } (mdvs::output::OutputFormat::Text, false) => { - mdvs::render::format_text(&step.to_compact().render()) + mdvs::render::format_text(&result.render_compact()) } (mdvs::output::OutputFormat::Json, true) => { - serde_json::to_string_pretty(&step).unwrap() - } - (mdvs::output::OutputFormat::Json, false) => { - serde_json::to_string_pretty(&step.to_compact()).unwrap() + serde_json::to_string_pretty(&result).unwrap() } + (mdvs::output::OutputFormat::Json, false) => match result.result_value() { + Some(outcome) => serde_json::to_string_pretty(outcome).unwrap(), + None => serde_json::to_string_pretty(&result).unwrap(), + }, }; print!("{output_str}"); if failed { @@ -336,21 +345,23 @@ async fn main() -> anyhow::Result<()> { Ok(()) } Command::Clean { path } => { - let step = mdvs::cmd::clean::run(&path); - let failed = mdvs::step::has_failed(&step); - let output_str = match (&cli.output, cli.verbose) { + let result = mdvs::cmd::clean::run(&path); + let failed = mdvs::step::has_failed(&result); + let verbose = cli.verbose || failed; + let output_str = match (&cli.output, verbose) { (mdvs::output::OutputFormat::Text, true) => { - mdvs::render::format_text(&step.render()) + mdvs::render::format_text(&result.render_verbose()) } (mdvs::output::OutputFormat::Text, false) => { - mdvs::render::format_text(&step.to_compact().render()) + mdvs::render::format_text(&result.render_compact()) } (mdvs::output::OutputFormat::Json, true) => { - serde_json::to_string_pretty(&step).unwrap() - } - (mdvs::output::OutputFormat::Json, false) => { - serde_json::to_string_pretty(&step.to_compact()).unwrap() + serde_json::to_string_pretty(&result).unwrap() } + (mdvs::output::OutputFormat::Json, false) => match result.result_value() { + Some(outcome) => serde_json::to_string_pretty(outcome).unwrap(), + None => serde_json::to_string_pretty(&result).unwrap(), 
+ }, }; print!("{output_str}"); if failed { @@ -359,21 +370,23 @@ async fn main() -> anyhow::Result<()> { Ok(()) } Command::Info { path } => { - let step = mdvs::cmd::info::run(&path, cli.verbose); - let failed = mdvs::step::has_failed(&step); - let output_str = match (&cli.output, cli.verbose) { + let result = mdvs::cmd::info::run(&path, cli.verbose); + let failed = mdvs::step::has_failed(&result); + let verbose = cli.verbose || failed; + let output_str = match (&cli.output, verbose) { (mdvs::output::OutputFormat::Text, true) => { - mdvs::render::format_text(&step.render()) + mdvs::render::format_text(&result.render_verbose()) } (mdvs::output::OutputFormat::Text, false) => { - mdvs::render::format_text(&step.to_compact().render()) + mdvs::render::format_text(&result.render_compact()) } (mdvs::output::OutputFormat::Json, true) => { - serde_json::to_string_pretty(&step).unwrap() - } - (mdvs::output::OutputFormat::Json, false) => { - serde_json::to_string_pretty(&step.to_compact()).unwrap() + serde_json::to_string_pretty(&result).unwrap() } + (mdvs::output::OutputFormat::Json, false) => match result.result_value() { + Some(outcome) => serde_json::to_string_pretty(outcome).unwrap(), + None => serde_json::to_string_pretty(&result).unwrap(), + }, }; print!("{output_str}"); if failed { diff --git a/src/outcome/classify.rs b/src/outcome/classify.rs index 951543f..d5a9afb 100644 --- a/src/outcome/classify.rs +++ b/src/outcome/classify.rs @@ -33,33 +33,3 @@ impl Render for ClassifyOutcome { } } } - -/// Compact outcome for the classify step (identical — no verbose-only fields). -#[derive(Debug, Serialize)] -pub struct ClassifyOutcomeCompact { - /// Whether this is a full rebuild. - pub full_rebuild: bool, - /// Number of files that need embedding. - pub needs_embedding: usize, - /// Number of files unchanged. - pub unchanged: usize, - /// Number of files removed. 
- pub removed: usize, -} - -impl Render for ClassifyOutcomeCompact { - fn render(&self) -> Vec<Block> { - vec![] - } -} - -impl From<&ClassifyOutcome> for ClassifyOutcomeCompact { - fn from(o: &ClassifyOutcome) -> Self { - Self { - full_rebuild: o.full_rebuild, - needs_embedding: o.needs_embedding, - unchanged: o.unchanged, - removed: o.removed, - } - } -} diff --git a/src/outcome/commands/build.rs b/src/outcome/commands/build.rs index 9dda255..e2b5736 100644 --- a/src/outcome/commands/build.rs +++ b/src/outcome/commands/build.rs @@ -73,7 +73,7 @@ impl Render for BuildOutcome { format_chunk_count(self.chunks_total) ))); - // Verbose: record tables per category with file-by-file detail + // Record tables per category with file-by-file detail if self.files_embedded > 0 { let detail = self .embedded_files @@ -133,92 +133,3 @@ impl Render for BuildOutcome { blocks } } - -/// Compact outcome for the build command. -#[derive(Debug, Serialize)] -pub struct BuildOutcomeCompact { - /// Whether this was a full rebuild. - pub full_rebuild: bool, - /// Total files in the final index. - pub files_total: usize, - /// Files embedded this run. - pub files_embedded: usize, - /// Files unchanged from previous build. - pub files_unchanged: usize, - /// Files removed since last build. - pub files_removed: usize, - /// Total chunks in the final index. - pub chunks_total: usize, - /// Chunks produced by new embeddings. - pub chunks_embedded: usize, - /// Chunks retained from unchanged files. - pub chunks_unchanged: usize, - /// Chunks dropped from removed files.
- pub chunks_removed: usize, -} - -impl Render for BuildOutcomeCompact { - fn render(&self) -> Vec { - let mut blocks = vec![]; - - let rebuild_suffix = if self.full_rebuild { - " (full rebuild)" - } else { - "" - }; - blocks.push(Block::Line(format!( - "Built index — {}, {}{rebuild_suffix}", - format_file_count(self.files_total), - format_chunk_count(self.chunks_total) - ))); - - // Compact stats table - let mut rows = vec![]; - if self.files_embedded > 0 { - rows.push(vec![ - "embedded".to_string(), - format_file_count(self.files_embedded), - format_chunk_count(self.chunks_embedded), - ]); - } - if self.files_unchanged > 0 { - rows.push(vec![ - "unchanged".to_string(), - format_file_count(self.files_unchanged), - format_chunk_count(self.chunks_unchanged), - ]); - } - if self.files_removed > 0 { - rows.push(vec![ - "removed".to_string(), - format_file_count(self.files_removed), - format_chunk_count(self.chunks_removed), - ]); - } - if !rows.is_empty() { - blocks.push(Block::Table { - headers: None, - rows, - style: TableStyle::Compact, - }); - } - - blocks - } -} - -impl From<&BuildOutcome> for BuildOutcomeCompact { - fn from(o: &BuildOutcome) -> Self { - Self { - full_rebuild: o.full_rebuild, - files_total: o.files_total, - files_embedded: o.files_embedded, - files_unchanged: o.files_unchanged, - files_removed: o.files_removed, - chunks_total: o.chunks_total, - chunks_embedded: o.chunks_embedded, - chunks_unchanged: o.chunks_unchanged, - chunks_removed: o.chunks_removed, - } - } -} diff --git a/src/outcome/commands/check.rs b/src/outcome/commands/check.rs index 279bac2..d13ee5e 100644 --- a/src/outcome/commands/check.rs +++ b/src/outcome/commands/check.rs @@ -3,10 +3,7 @@ use serde::Serialize; use crate::block::{Block, Render, TableStyle}; -use crate::output::{ - format_file_count, FieldViolation, FieldViolationCompact, NewField, NewFieldCompact, - ViolationKind, -}; +use crate::output::{format_file_count, FieldViolation, NewField, ViolationKind}; /// Full 
outcome for the check command. #[derive(Debug, Serialize)] @@ -104,95 +101,3 @@ impl Render for CheckOutcome { blocks } } - -/// Compact outcome for the check command. -#[derive(Debug, Serialize)] -pub struct CheckOutcomeCompact { - /// Number of markdown files checked. - pub files_checked: usize, - /// Compact violations (field + kind + count). - pub violations: Vec, - /// Compact new fields (name + count). - pub new_fields: Vec, -} - -impl Render for CheckOutcomeCompact { - fn render(&self) -> Vec { - let mut blocks = vec![]; - - let violation_part = if self.violations.is_empty() { - "no violations".to_string() - } else { - format!("{} violation(s)", self.violations.len()) - }; - let new_field_part = if self.new_fields.is_empty() { - String::new() - } else { - format!(", {} new field(s)", self.new_fields.len()) - }; - blocks.push(Block::Line(format!( - "Checked {} — {violation_part}{new_field_part}", - format_file_count(self.files_checked), - ))); - - if !self.violations.is_empty() { - let rows: Vec> = self - .violations - .iter() - .map(|v| { - let kind_str = match v.kind { - ViolationKind::MissingRequired => "MissingRequired", - ViolationKind::WrongType => "WrongType", - ViolationKind::Disallowed => "Disallowed", - ViolationKind::NullNotAllowed => "NullNotAllowed", - }; - vec![ - format!("\"{}\"", v.field), - kind_str.to_string(), - format_file_count(v.file_count), - ] - }) - .collect(); - blocks.push(Block::Table { - headers: None, - rows, - style: TableStyle::Compact, - }); - } - - if !self.new_fields.is_empty() { - let rows: Vec> = self - .new_fields - .iter() - .map(|nf| { - vec![ - format!("\"{}\"", nf.name), - "new".to_string(), - format_file_count(nf.files_found), - ] - }) - .collect(); - blocks.push(Block::Table { - headers: None, - rows, - style: TableStyle::Compact, - }); - } - - blocks - } -} - -impl From<&CheckOutcome> for CheckOutcomeCompact { - fn from(o: &CheckOutcome) -> Self { - Self { - files_checked: o.files_checked, - violations: o - 
.violations - .iter() - .map(FieldViolationCompact::from) - .collect(), - new_fields: o.new_fields.iter().map(NewFieldCompact::from).collect(), - } - } -} diff --git a/src/outcome/commands/clean.rs b/src/outcome/commands/clean.rs index b64414b..d7f6be8 100644 --- a/src/outcome/commands/clean.rs +++ b/src/outcome/commands/clean.rs @@ -40,37 +40,6 @@ impl Render for CleanOutcome { } } -/// Compact outcome for the clean command. -#[derive(Debug, Serialize)] -pub struct CleanOutcomeCompact { - /// Whether `.mdvs/` was actually removed. - pub removed: bool, - /// Path to the `.mdvs/` directory. - pub path: PathBuf, -} - -impl Render for CleanOutcomeCompact { - fn render(&self) -> Vec { - if self.removed { - vec![Block::Line(format!("Cleaned \"{}\"", self.path.display()))] - } else { - vec![Block::Line(format!( - "Nothing to clean — \"{}\" does not exist", - self.path.display() - ))] - } - } -} - -impl From<&CleanOutcome> for CleanOutcomeCompact { - fn from(o: &CleanOutcome) -> Self { - Self { - removed: o.removed, - path: o.path.clone(), - } - } -} - #[cfg(test)] mod tests { use super::*; @@ -110,31 +79,4 @@ mod tests { _ => panic!("expected Line"), } } - - #[test] - fn clean_compact_removed() { - let outcome = CleanOutcomeCompact { - removed: true, - path: PathBuf::from(".mdvs"), - }; - let blocks = outcome.render(); - assert_eq!(blocks.len(), 1); - match &blocks[0] { - Block::Line(s) => assert_eq!(s, "Cleaned \".mdvs\""), - _ => panic!("expected Line"), - } - } - - #[test] - fn clean_compact_from_full() { - let full = CleanOutcome { - removed: true, - path: PathBuf::from(".mdvs"), - files_removed: 5, - size_bytes: 4096, - }; - let compact = CleanOutcomeCompact::from(&full); - assert!(compact.removed); - assert_eq!(compact.path, PathBuf::from(".mdvs")); - } } diff --git a/src/outcome/commands/info.rs b/src/outcome/commands/info.rs index b87430a..448efbd 100644 --- a/src/outcome/commands/info.rs +++ b/src/outcome/commands/info.rs @@ -94,54 +94,3 @@ impl Render for 
InfoOutcome { blocks } } - -/// Compact outcome for the info command. -#[derive(Debug, Serialize)] -pub struct InfoOutcomeCompact { - /// Glob pattern from `[scan]` config. - pub scan_glob: String, - /// Number of markdown files matching the scan pattern. - pub files_on_disk: usize, - /// Number of fields defined. - pub field_count: usize, - /// Number of ignored fields. - pub ignored_count: usize, - /// Whether an index exists. - pub has_index: bool, - /// Brief index summary. - #[serde(skip_serializing_if = "Option::is_none")] - pub index_summary: Option, -} - -impl Render for InfoOutcomeCompact { - fn render(&self) -> Vec { - let one_liner = if let Some(ref summary) = self.index_summary { - format!( - "{} files, {} fields, {summary}", - self.files_on_disk, self.field_count, - ) - } else { - format!("{} files, {} fields", self.files_on_disk, self.field_count) - }; - vec![Block::Line(one_liner)] - } -} - -impl From<&InfoOutcome> for InfoOutcomeCompact { - fn from(o: &InfoOutcome) -> Self { - let index_summary = o.index.as_ref().map(|idx| { - format!( - "{} files, {} chunks, model: {}", - idx.files_indexed, idx.chunks, idx.model, - ) - }); - Self { - scan_glob: o.scan_glob.clone(), - files_on_disk: o.files_on_disk, - field_count: o.fields.len(), - ignored_count: o.ignored_fields.len(), - has_index: o.index.is_some(), - index_summary, - } - } -} diff --git a/src/outcome/commands/init.rs b/src/outcome/commands/init.rs index ed02993..ca11705 100644 --- a/src/outcome/commands/init.rs +++ b/src/outcome/commands/init.rs @@ -5,7 +5,7 @@ use std::path::PathBuf; use serde::Serialize; use crate::block::{Block, Render, TableStyle}; -use crate::output::{format_file_count, format_hints, DiscoveredField, DiscoveredFieldCompact}; +use crate::output::{format_file_count, format_hints, DiscoveredField}; /// Full outcome for the init command. #[derive(Debug, Serialize)] @@ -89,83 +89,3 @@ impl Render for InitOutcome { blocks } } - -/// Compact outcome for the init command. 
-#[derive(Debug, Serialize)] -pub struct InitOutcomeCompact { - /// Directory where `mdvs.toml` was written. - pub path: PathBuf, - /// Number of markdown files scanned. - pub files_scanned: usize, - /// Number of fields inferred. - pub field_count: usize, - /// Whether this was a dry run. - pub dry_run: bool, - /// Compact field summaries. - pub fields: Vec, -} - -impl Render for InitOutcomeCompact { - fn render(&self) -> Vec { - let mut blocks = vec![]; - - let field_summary = if self.fields.is_empty() { - "no fields found".to_string() - } else { - format!("{} field(s)", self.field_count) - }; - let dry_run_suffix = if self.dry_run { " (dry run)" } else { "" }; - blocks.push(Block::Line(format!( - "Initialized {} — {field_summary}{dry_run_suffix}", - format_file_count(self.files_scanned) - ))); - - // Compact fields table - if !self.fields.is_empty() { - let rows: Vec> = self - .fields - .iter() - .map(|f| { - let type_str = if f.nullable { - format!("{}?", f.field_type) - } else { - f.field_type.clone() - }; - vec![ - format!("\"{}\"", f.name), - type_str, - format!("{}/{}", f.files_found, f.total_files), - ] - }) - .collect(); - blocks.push(Block::Table { - headers: None, - rows, - style: TableStyle::Compact, - }); - } - - if self.dry_run { - blocks.push(Block::Line("(dry run, nothing written)".into())); - } else { - blocks.push(Block::Line(format!( - "Initialized mdvs in '{}'", - self.path.display() - ))); - } - - blocks - } -} - -impl From<&InitOutcome> for InitOutcomeCompact { - fn from(o: &InitOutcome) -> Self { - Self { - path: o.path.clone(), - files_scanned: o.files_scanned, - field_count: o.fields.len(), - dry_run: o.dry_run, - fields: o.fields.iter().map(DiscoveredFieldCompact::from).collect(), - } - } -} diff --git a/src/outcome/commands/mod.rs b/src/outcome/commands/mod.rs index 4201b6c..6f6191f 100644 --- a/src/outcome/commands/mod.rs +++ b/src/outcome/commands/mod.rs @@ -1,7 +1,6 @@ //! Outcome types for command-level results. //! -//! 
One file per command. Each defines a full + compact outcome pair -//! with `Render` and `From` impls. +//! One file per command, each defining the outcome struct and its `Render` impl. pub mod build; pub mod check; @@ -11,10 +10,10 @@ pub mod init; pub mod search; pub mod update; -pub use build::{BuildOutcome, BuildOutcomeCompact}; -pub use check::{CheckOutcome, CheckOutcomeCompact}; -pub use clean::{CleanOutcome, CleanOutcomeCompact}; -pub use info::{InfoOutcome, InfoOutcomeCompact}; -pub use init::{InitOutcome, InitOutcomeCompact}; -pub use search::{SearchOutcome, SearchOutcomeCompact}; -pub use update::{UpdateOutcome, UpdateOutcomeCompact}; +pub use build::BuildOutcome; +pub use check::CheckOutcome; +pub use clean::CleanOutcome; +pub use info::InfoOutcome; +pub use init::InitOutcome; +pub use search::SearchOutcome; +pub use update::UpdateOutcome; diff --git a/src/outcome/commands/search.rs b/src/outcome/commands/search.rs index 583fbbd..689b15a 100644 --- a/src/outcome/commands/search.rs +++ b/src/outcome/commands/search.rs @@ -81,80 +81,3 @@ impl Render for SearchOutcome { blocks } } - -/// Compact search hit — filename and score only. -#[derive(Debug, Serialize)] -pub struct SearchHitCompact { - /// Filename of the matched file. - pub filename: String, - /// Cosine similarity score. - pub score: f64, -} - -impl From<&SearchHit> for SearchHitCompact { - fn from(h: &SearchHit) -> Self { - Self { - filename: h.filename.clone(), - score: h.score, - } - } -} - -/// Compact outcome for the search command. -#[derive(Debug, Serialize)] -pub struct SearchOutcomeCompact { - /// The query string. - pub query: String, - /// Compact hits (filename + score only). - pub hits: Vec, - /// Name of the embedding model used. - pub model_name: String, - /// Result limit that was applied. 
- pub limit: usize, -} - -impl Render for SearchOutcomeCompact { - fn render(&self) -> Vec { - let mut blocks = vec![]; - - let hit_word = if self.hits.len() == 1 { "hit" } else { "hits" }; - blocks.push(Block::Line(format!( - "Searched \"{}\" — {} {hit_word}", - self.query, - self.hits.len() - ))); - - if !self.hits.is_empty() { - let rows: Vec> = self - .hits - .iter() - .enumerate() - .map(|(i, h)| { - vec![ - format!("{}", i + 1), - format!("\"{}\"", h.filename), - format!("{:.3}", h.score), - ] - }) - .collect(); - blocks.push(Block::Table { - headers: None, - rows, - style: TableStyle::Compact, - }); - } - - blocks - } -} - -impl From<&SearchOutcome> for SearchOutcomeCompact { - fn from(o: &SearchOutcome) -> Self { - Self { - query: o.query.clone(), - hits: o.hits.iter().map(SearchHitCompact::from).collect(), - model_name: o.model_name.clone(), - limit: o.limit, - } - } -} diff --git a/src/outcome/commands/update.rs b/src/outcome/commands/update.rs index 1ac6ada..1662989 100644 --- a/src/outcome/commands/update.rs +++ b/src/outcome/commands/update.rs @@ -32,7 +32,6 @@ impl Render for UpdateOutcome { fn render(&self) -> Vec { let mut blocks = vec![]; - // One-liner let total_changes = self.added.len() + self.changed.len() + self.removed.len(); let summary = if total_changes == 0 { "no changes".to_string() @@ -49,7 +48,6 @@ impl Render for UpdateOutcome { return blocks; } - // Per-added: record tables for field in &self.added { let mut detail_lines = Vec::new(); if let Some(ref globs) = field.allowed { @@ -80,7 +78,6 @@ impl Render for UpdateOutcome { }); } - // Per-changed: compact table with aspect columns for field in &self.changed { let mut rows = vec![vec![ "field".into(), @@ -104,7 +101,6 @@ impl Render for UpdateOutcome { }); } - // Per-removed: record tables for field in &self.removed { let detail = match &field.allowed { Some(globs) => { @@ -135,49 +131,3 @@ impl Render for UpdateOutcome { blocks } } - -/// Compact outcome for the update command. 
-#[derive(Debug, Serialize)] -pub struct UpdateOutcomeCompact { - /// Number of markdown files scanned. - pub files_scanned: usize, - /// Number of newly discovered fields. - pub added_count: usize, - /// Number of changed fields. - pub changed_count: usize, - /// Number of removed fields. - pub removed_count: usize, - /// Number of unchanged fields. - pub unchanged: usize, - /// Whether this was a dry run. - pub dry_run: bool, -} - -impl Render for UpdateOutcomeCompact { - fn render(&self) -> Vec { - let total = self.added_count + self.changed_count + self.removed_count; - let summary = if total == 0 { - "no changes".to_string() - } else { - format!("{total} field(s) changed") - }; - let dry_run_suffix = if self.dry_run { " (dry run)" } else { "" }; - vec![Block::Line(format!( - "Scanned {} — {summary}{dry_run_suffix}", - format_file_count(self.files_scanned) - ))] - } -} - -impl From<&UpdateOutcome> for UpdateOutcomeCompact { - fn from(o: &UpdateOutcome) -> Self { - Self { - files_scanned: o.files_scanned, - added_count: o.added.len(), - changed_count: o.changed.len(), - removed_count: o.removed.len(), - unchanged: o.unchanged, - dry_run: o.dry_run, - } - } -} diff --git a/src/outcome/config.rs b/src/outcome/config.rs index c864f14..47f944a 100644 --- a/src/outcome/config.rs +++ b/src/outcome/config.rs @@ -1,7 +1,4 @@ -//! Outcome types for config-related leaf steps (ReadConfig, WriteConfig, etc.). -//! -//! Only ReadConfig is defined initially. WriteConfig, MutateConfig, and -//! CheckConfigChanged are added when init/build commands are converted. +//! Outcome types for config-related leaf steps (ReadConfig, WriteConfig). use serde::Serialize; @@ -20,27 +17,6 @@ impl Render for ReadConfigOutcome { } } -/// Compact outcome for the read_config step (identical — no verbose-only fields). -#[derive(Debug, Serialize)] -pub struct ReadConfigOutcomeCompact { - /// Path to the config file that was read. 
- pub config_path: String, -} - -impl Render for ReadConfigOutcomeCompact { - fn render(&self) -> Vec { - vec![] // Leaf compact outcomes are silent - } -} - -impl From<&ReadConfigOutcome> for ReadConfigOutcomeCompact { - fn from(o: &ReadConfigOutcome) -> Self { - Self { - config_path: o.config_path.clone(), - } - } -} - /// Full outcome for the write_config step. #[derive(Debug, Serialize)] pub struct WriteConfigOutcome { @@ -55,27 +31,3 @@ impl Render for WriteConfigOutcome { vec![Block::Line(format!("Write config: {}", self.config_path))] } } - -/// Compact outcome for the write_config step (identical — no verbose-only fields). -#[derive(Debug, Serialize)] -pub struct WriteConfigOutcomeCompact { - /// Path to the config file that was written. - pub config_path: String, - /// Number of fields written to the config. - pub fields_written: usize, -} - -impl Render for WriteConfigOutcomeCompact { - fn render(&self) -> Vec { - vec![] // Leaf compact outcomes are silent - } -} - -impl From<&WriteConfigOutcome> for WriteConfigOutcomeCompact { - fn from(o: &WriteConfigOutcome) -> Self { - Self { - config_path: o.config_path.clone(), - fields_written: o.fields_written, - } - } -} diff --git a/src/outcome/embed.rs b/src/outcome/embed.rs index 3030702..d305845 100644 --- a/src/outcome/embed.rs +++ b/src/outcome/embed.rs @@ -1,6 +1,4 @@ //! Outcome types for the embed leaf steps (EmbedFiles, EmbedQuery). -//! -//! Only EmbedFiles is defined initially. EmbedQuery added when search is converted. use serde::Serialize; @@ -24,21 +22,6 @@ impl Render for EmbedFilesOutcome { } } -/// Compact outcome for the embed_files step (identical). -#[derive(Debug, Serialize)] -pub struct EmbedFilesOutcomeCompact { - /// Number of files embedded. - pub files_embedded: usize, - /// Number of chunks produced. - pub chunks_produced: usize, -} - -impl Render for EmbedFilesOutcomeCompact { - fn render(&self) -> Vec { - vec![] - } -} - /// Full outcome for the embed_query step. 
#[derive(Debug, Serialize)] pub struct EmbedQueryOutcome { @@ -51,33 +34,3 @@ impl Render for EmbedQueryOutcome { vec![Block::Line(format!("Embed query: \"{}\"", self.query))] } } - -/// Compact outcome for the embed_query step (identical). -#[derive(Debug, Serialize)] -pub struct EmbedQueryOutcomeCompact { - /// The query string that was embedded. - pub query: String, -} - -impl Render for EmbedQueryOutcomeCompact { - fn render(&self) -> Vec { - vec![] - } -} - -impl From<&EmbedQueryOutcome> for EmbedQueryOutcomeCompact { - fn from(o: &EmbedQueryOutcome) -> Self { - Self { - query: o.query.clone(), - } - } -} - -impl From<&EmbedFilesOutcome> for EmbedFilesOutcomeCompact { - fn from(o: &EmbedFilesOutcome) -> Self { - Self { - files_embedded: o.files_embedded, - chunks_produced: o.chunks_produced, - } - } -} diff --git a/src/outcome/index.rs b/src/outcome/index.rs index 0946695..a089db0 100644 --- a/src/outcome/index.rs +++ b/src/outcome/index.rs @@ -1,7 +1,4 @@ -//! Outcome types for index-related leaf steps (DeleteIndex, ReadIndex, etc.). -//! -//! Only DeleteIndex is defined initially. Other index outcomes are added -//! incrementally as commands are converted. +//! Outcome types for index-related leaf steps (DeleteIndex, ReadIndex, WriteIndex). use serde::Serialize; @@ -39,36 +36,6 @@ impl Render for DeleteIndexOutcome { } } -/// Compact outcome for the delete_index step (identical fields — leaf step). -#[derive(Debug, Serialize)] -pub struct DeleteIndexOutcomeCompact { - /// Whether `.mdvs/` existed and was removed. - pub removed: bool, - /// Path to the `.mdvs/` directory. - pub path: String, - /// Number of files removed. - pub files_removed: usize, - /// Total bytes freed. 
- pub size_bytes: u64, -} - -impl Render for DeleteIndexOutcomeCompact { - fn render(&self) -> Vec { - vec![] // Leaf compact outcomes are silent - } -} - -impl From<&DeleteIndexOutcome> for DeleteIndexOutcomeCompact { - fn from(o: &DeleteIndexOutcome) -> Self { - Self { - removed: o.removed, - path: o.path.clone(), - files_removed: o.files_removed, - size_bytes: o.size_bytes, - } - } -} - /// Full outcome for the read_index step. #[derive(Debug, Serialize)] pub struct ReadIndexOutcome { @@ -93,33 +60,6 @@ impl Render for ReadIndexOutcome { } } -/// Compact outcome for the read_index step (identical — no verbose-only fields). -#[derive(Debug, Serialize)] -pub struct ReadIndexOutcomeCompact { - /// Whether the index exists. - pub exists: bool, - /// Number of files in the index. - pub files_indexed: usize, - /// Number of chunks in the index. - pub chunks: usize, -} - -impl Render for ReadIndexOutcomeCompact { - fn render(&self) -> Vec { - vec![] // Leaf compact outcomes are silent - } -} - -impl From<&ReadIndexOutcome> for ReadIndexOutcomeCompact { - fn from(o: &ReadIndexOutcome) -> Self { - Self { - exists: o.exists, - files_indexed: o.files_indexed, - chunks: o.chunks, - } - } -} - /// Full outcome for the write_index step. #[derive(Debug, Serialize)] pub struct WriteIndexOutcome { @@ -138,30 +78,6 @@ impl Render for WriteIndexOutcome { } } -/// Compact outcome for the write_index step (identical). -#[derive(Debug, Serialize)] -pub struct WriteIndexOutcomeCompact { - /// Number of files written. - pub files_written: usize, - /// Number of chunks written. 
- pub chunks_written: usize, -} - -impl Render for WriteIndexOutcomeCompact { - fn render(&self) -> Vec { - vec![] - } -} - -impl From<&WriteIndexOutcome> for WriteIndexOutcomeCompact { - fn from(o: &WriteIndexOutcome) -> Self { - Self { - files_written: o.files_written, - chunks_written: o.chunks_written, - } - } -} - #[cfg(test)] mod tests { use super::*; @@ -197,30 +113,4 @@ mod tests { _ => panic!("expected Line"), } } - - #[test] - fn delete_index_compact_is_silent() { - let outcome = DeleteIndexOutcomeCompact { - removed: true, - path: ".mdvs".into(), - files_removed: 2, - size_bytes: 1024, - }; - assert!(outcome.render().is_empty()); - } - - #[test] - fn delete_index_from_full() { - let full = DeleteIndexOutcome { - removed: true, - path: ".mdvs".into(), - files_removed: 3, - size_bytes: 2048, - }; - let compact = DeleteIndexOutcomeCompact::from(&full); - assert_eq!(compact.removed, true); - assert_eq!(compact.path, ".mdvs"); - assert_eq!(compact.files_removed, 3); - assert_eq!(compact.size_bytes, 2048); - } } diff --git a/src/outcome/infer.rs b/src/outcome/infer.rs index 2c93a4b..2091b0a 100644 --- a/src/outcome/infer.rs +++ b/src/outcome/infer.rs @@ -19,24 +19,3 @@ impl Render for InferOutcome { ))] } } - -/// Compact outcome for the infer step (identical — no verbose-only fields). -#[derive(Debug, Serialize)] -pub struct InferOutcomeCompact { - /// Number of fields inferred from frontmatter. - pub fields_inferred: usize, -} - -impl Render for InferOutcomeCompact { - fn render(&self) -> Vec { - vec![] // Leaf compact outcomes are silent - } -} - -impl From<&InferOutcome> for InferOutcomeCompact { - fn from(o: &InferOutcome) -> Self { - Self { - fields_inferred: o.fields_inferred, - } - } -} diff --git a/src/outcome/mod.rs b/src/outcome/mod.rs index 1fd1876..719cbf3 100644 --- a/src/outcome/mod.rs +++ b/src/outcome/mod.rs @@ -1,8 +1,6 @@ //! Outcome types for all pipeline steps and commands. //! -//! 
The `Outcome` and `CompactOutcome` enums contain one variant per step/command. -//! Variants are added incrementally as commands are converted to the Step tree -//! architecture. +//! The `Outcome` enum contains one variant per step/command. pub mod classify; pub mod commands; @@ -18,35 +16,26 @@ pub mod validate; use serde::Serialize; use crate::block::{Block, Render}; -use crate::step::Step; -pub use classify::{ClassifyOutcome, ClassifyOutcomeCompact}; +pub use classify::ClassifyOutcome; pub use commands::{ - BuildOutcome, BuildOutcomeCompact, CheckOutcome, CheckOutcomeCompact, CleanOutcome, - CleanOutcomeCompact, InfoOutcome, InfoOutcomeCompact, InitOutcome, InitOutcomeCompact, - SearchOutcome, SearchOutcomeCompact, UpdateOutcome, UpdateOutcomeCompact, + BuildOutcome, CheckOutcome, CleanOutcome, InfoOutcome, InitOutcome, SearchOutcome, + UpdateOutcome, }; -pub use config::{ - ReadConfigOutcome, ReadConfigOutcomeCompact, WriteConfigOutcome, WriteConfigOutcomeCompact, -}; -pub use embed::{ - EmbedFilesOutcome, EmbedFilesOutcomeCompact, EmbedQueryOutcome, EmbedQueryOutcomeCompact, -}; -pub use index::{ - DeleteIndexOutcome, DeleteIndexOutcomeCompact, ReadIndexOutcome, ReadIndexOutcomeCompact, - WriteIndexOutcome, WriteIndexOutcomeCompact, -}; -pub use infer::{InferOutcome, InferOutcomeCompact}; -pub use model::{LoadModelOutcome, LoadModelOutcomeCompact}; -pub use scan::{ScanOutcome, ScanOutcomeCompact}; -pub use search::{ExecuteSearchOutcome, ExecuteSearchOutcomeCompact}; -pub use validate::{ValidateOutcome, ValidateOutcomeCompact}; - -/// Full outcome for all steps and commands. 
+pub use config::{ReadConfigOutcome, WriteConfigOutcome}; +pub use embed::{EmbedFilesOutcome, EmbedQueryOutcome}; +pub use index::{DeleteIndexOutcome, ReadIndexOutcome, WriteIndexOutcome}; +pub use infer::InferOutcome; +pub use model::LoadModelOutcome; +pub use scan::ScanOutcome; +pub use search::ExecuteSearchOutcome; +pub use validate::ValidateOutcome; + +/// Outcome for all steps and commands. /// /// Each variant wraps a named outcome struct carrying all data needed for -/// verbose rendering and JSON serialization. Command-level outcomes are -/// `Box`ed to avoid bloating the enum. +/// rendering and JSON serialization. Command-level outcomes are `Box`ed +/// to avoid bloating the enum. #[derive(Debug, Serialize)] pub enum Outcome { /// Delete the `.mdvs/` directory. @@ -119,136 +108,12 @@ impl Render for Outcome { } impl Outcome { - /// Convert this full outcome to its compact counterpart. - /// - /// Command outcomes may read `substeps` to derive summary data. - /// Leaf outcomes ignore `substeps`. 
- pub fn to_compact(&self, _substeps: &[Step]) -> CompactOutcome { - match self { - Self::DeleteIndex(o) => CompactOutcome::DeleteIndex(o.into()), - Self::ReadConfig(o) => CompactOutcome::ReadConfig(o.into()), - Self::Scan(o) => CompactOutcome::Scan(o.into()), - Self::ReadIndex(o) => CompactOutcome::ReadIndex(o.into()), - Self::Validate(o) => CompactOutcome::Validate(o.into()), - Self::Infer(o) => CompactOutcome::Infer(o.into()), - Self::WriteConfig(o) => CompactOutcome::WriteConfig(o.into()), - Self::Clean(o) => CompactOutcome::Clean(o.into()), - Self::Check(o) => CompactOutcome::Check(Box::new(o.as_ref().into())), - Self::Info(o) => CompactOutcome::Info(Box::new(o.as_ref().into())), - Self::Classify(o) => CompactOutcome::Classify(o.into()), - Self::LoadModel(o) => CompactOutcome::LoadModel(o.into()), - Self::EmbedFiles(o) => CompactOutcome::EmbedFiles(o.into()), - Self::WriteIndex(o) => CompactOutcome::WriteIndex(o.into()), - Self::Init(o) => CompactOutcome::Init(Box::new(o.as_ref().into())), - Self::Update(o) => CompactOutcome::Update(Box::new(o.as_ref().into())), - Self::EmbedQuery(o) => CompactOutcome::EmbedQuery(o.into()), - Self::ExecuteSearch(o) => CompactOutcome::ExecuteSearch(o.into()), - Self::Build(o) => CompactOutcome::Build(Box::new(o.as_ref().into())), - Self::Search(o) => CompactOutcome::Search(Box::new(o.as_ref().into())), - } - } - /// Returns `true` if this outcome contains validation violations. - /// - /// Used for exit code logic. Only Validate and Check outcomes can - /// return `true` — added when those variants are implemented. 
pub fn contains_violations(&self) -> bool { match self { Self::Validate(v) => !v.violations.is_empty(), Self::Check(c) => !c.violations.is_empty(), - Self::DeleteIndex(_) - | Self::ReadConfig(_) - | Self::Scan(_) - | Self::ReadIndex(_) - | Self::Infer(_) - | Self::WriteConfig(_) - | Self::Clean(_) - | Self::Classify(_) - | Self::LoadModel(_) - | Self::EmbedFiles(_) - | Self::WriteIndex(_) - | Self::Info(_) - | Self::Init(_) - | Self::EmbedQuery(_) - | Self::ExecuteSearch(_) - | Self::Update(_) - | Self::Build(_) - | Self::Search(_) => false, - } - } -} - -/// Compact outcome for all steps and commands. -/// -/// Mirrors `Outcome` with compact counterpart structs. Leaf compact outcomes -/// render to empty vecs (silent). Command compact outcomes render summaries. -#[derive(Debug, Serialize)] -pub enum CompactOutcome { - /// Delete the `.mdvs/` directory (compact). - DeleteIndex(DeleteIndexOutcomeCompact), - /// Read and parse `mdvs.toml` (compact). - ReadConfig(ReadConfigOutcomeCompact), - /// Scan the project directory (compact). - Scan(ScanOutcomeCompact), - /// Read the existing index (compact). - ReadIndex(ReadIndexOutcomeCompact), - /// Validate frontmatter (compact). - Validate(ValidateOutcomeCompact), - /// Clean command (compact). - Clean(CleanOutcomeCompact), - /// Check command (compact). - Check(Box<CheckOutcomeCompact>), - /// Infer (compact). - Infer(InferOutcomeCompact), - /// Write config (compact). - WriteConfig(WriteConfigOutcomeCompact), - /// Info command (compact). - Info(Box<InfoOutcomeCompact>), - /// Init command (compact). - Init(Box<InitOutcomeCompact>), - /// Classify (compact). - Classify(ClassifyOutcomeCompact), - /// Load model (compact). - LoadModel(LoadModelOutcomeCompact), - /// Embed files (compact). - EmbedFiles(EmbedFilesOutcomeCompact), - /// Write index (compact). - WriteIndex(WriteIndexOutcomeCompact), - /// Update command (compact). - Update(Box<UpdateOutcomeCompact>), - /// Embed query (compact). - EmbedQuery(EmbedQueryOutcomeCompact), - /// Execute search (compact).
- ExecuteSearch(ExecuteSearchOutcomeCompact), - /// Build command (compact). - Build(Box<BuildOutcomeCompact>), - /// Search command (compact). - Search(Box<SearchOutcomeCompact>), -} - -impl Render for CompactOutcome { - fn render(&self) -> Vec<Block> { - match self { - Self::DeleteIndex(o) => o.render(), - Self::ReadConfig(o) => o.render(), - Self::Scan(o) => o.render(), - Self::ReadIndex(o) => o.render(), - Self::Validate(o) => o.render(), - Self::Infer(o) => o.render(), - Self::WriteConfig(o) => o.render(), - Self::Clean(o) => o.render(), - Self::Check(o) => o.render(), - Self::Classify(o) => o.render(), - Self::LoadModel(o) => o.render(), - Self::EmbedFiles(o) => o.render(), - Self::WriteIndex(o) => o.render(), - Self::Info(o) => o.render(), - Self::Init(o) => o.render(), - Self::Update(o) => o.render(), - Self::EmbedQuery(o) => o.render(), - Self::ExecuteSearch(o) => o.render(), - Self::Build(o) => o.render(), - Self::Search(o) => o.render(), + _ => false, } } } @@ -256,7 +121,6 @@ impl Render for CompactOutcome { #[cfg(test)] mod tests { use super::*; - use crate::step::{ErrorKind, StepError, StepOutcome}; use std::path::PathBuf; #[test] @@ -271,45 +135,6 @@ mod tests { assert_eq!(blocks.len(), 2); } - - #[test] - fn compact_outcome_render_delegates() { - let outcome = CompactOutcome::Clean(CleanOutcomeCompact { - removed: true, - path: PathBuf::from(".mdvs"), - }); - let blocks = outcome.render(); - assert_eq!(blocks.len(), 1); // Compact command renders summary - } - - #[test] - fn compact_leaf_is_silent() { - let outcome = CompactOutcome::DeleteIndex(DeleteIndexOutcomeCompact { - removed: true, - path: ".mdvs".into(), - files_removed: 1, - size_bytes: 100, - }); - assert!(outcome.render().is_empty()); - } - - #[test] - fn to_compact_roundtrip() { - let outcome = Outcome::Clean(CleanOutcome { - removed: true, - path: PathBuf::from(".mdvs"), - files_removed: 3, - size_bytes: 2048, - }); - let compact = outcome.to_compact(&[]); - match &compact { - CompactOutcome::Clean(c) => { - assert!(c.removed); -
assert_eq!(c.path, PathBuf::from(".mdvs")); - } - _ => panic!("expected Clean compact"), - } - } - #[test] fn contains_violations_false_for_clean() { let outcome = Outcome::Clean(CleanOutcome { @@ -320,69 +145,4 @@ mod tests { }); assert!(!outcome.contains_violations()); } - - #[test] - fn step_to_compact_full_tree() { - let leaf = Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Ok(Outcome::DeleteIndex(DeleteIndexOutcome { - removed: true, - path: ".mdvs".into(), - files_removed: 2, - size_bytes: 1024, - })), - elapsed_ms: 5, - }, - }; - let command = Step { - substeps: vec![leaf], - outcome: StepOutcome::Complete { - result: Ok(Outcome::Clean(CleanOutcome { - removed: true, - path: PathBuf::from(".mdvs"), - files_removed: 2, - size_bytes: 1024, - })), - elapsed_ms: 5, - }, - }; - - let compact = command.to_compact(); - assert_eq!(compact.substeps.len(), 1); - // Leaf compact renders silent - assert!(compact.substeps[0].render().is_empty()); - // Command compact renders summary - let blocks = compact.render(); - assert!(!blocks.is_empty()); - } - - #[test] - fn step_to_compact_error_preserved() { - let step: Step = Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::User, - message: "test error".into(), - }), - elapsed_ms: 1, - }, - }; - let compact = step.to_compact(); - match &compact.outcome { - StepOutcome::Complete { result: Err(e), .. 
} => assert_eq!(e.message, "test error"), - _ => panic!("expected error preserved"), - } - } - - #[test] - fn step_to_compact_skipped_preserved() { - let step: Step = Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }; - let compact = step.to_compact(); - assert!(matches!(compact.outcome, StepOutcome::Skipped)); - } } diff --git a/src/outcome/model.rs b/src/outcome/model.rs index 12f2bfb..034a876 100644 --- a/src/outcome/model.rs +++ b/src/outcome/model.rs @@ -18,27 +18,3 @@ impl Render for LoadModelOutcome { vec![Block::Line(format!("Load model: {}", self.model_name))] } } - -/// Compact outcome for the load_model step (identical). -#[derive(Debug, Serialize)] -pub struct LoadModelOutcomeCompact { - /// Name of the embedding model loaded. - pub model_name: String, - /// Embedding dimension. - pub dimension: usize, -} - -impl Render for LoadModelOutcomeCompact { - fn render(&self) -> Vec { - vec![] - } -} - -impl From<&LoadModelOutcome> for LoadModelOutcomeCompact { - fn from(o: &LoadModelOutcome) -> Self { - Self { - model_name: o.model_name.clone(), - dimension: o.dimension, - } - } -} diff --git a/src/outcome/scan.rs b/src/outcome/scan.rs index d9534bb..d82887f 100644 --- a/src/outcome/scan.rs +++ b/src/outcome/scan.rs @@ -22,27 +22,3 @@ impl Render for ScanOutcome { ))] } } - -/// Compact outcome for the scan step (identical — no verbose-only fields). -#[derive(Debug, Serialize)] -pub struct ScanOutcomeCompact { - /// Number of markdown files found. - pub files_found: usize, - /// Glob pattern used for scanning. 
- pub glob: String, -} - -impl Render for ScanOutcomeCompact { - fn render(&self) -> Vec { - vec![] // Leaf compact outcomes are silent - } -} - -impl From<&ScanOutcome> for ScanOutcomeCompact { - fn from(o: &ScanOutcome) -> Self { - Self { - files_found: o.files_found, - glob: o.glob.clone(), - } - } -} diff --git a/src/outcome/search.rs b/src/outcome/search.rs index 13fdf0a..06aadd1 100644 --- a/src/outcome/search.rs +++ b/src/outcome/search.rs @@ -16,22 +16,3 @@ impl Render for ExecuteSearchOutcome { vec![Block::Line(format!("Execute search: {} hits", self.hits))] } } - -/// Compact outcome for the execute_search step (identical). -#[derive(Debug, Serialize)] -pub struct ExecuteSearchOutcomeCompact { - /// Number of hits found. - pub hits: usize, -} - -impl Render for ExecuteSearchOutcomeCompact { - fn render(&self) -> Vec { - vec![] - } -} - -impl From<&ExecuteSearchOutcome> for ExecuteSearchOutcomeCompact { - fn from(o: &ExecuteSearchOutcome) -> Self { - Self { hits: o.hits } - } -} diff --git a/src/outcome/validate.rs b/src/outcome/validate.rs index 98039b5..cf051fd 100644 --- a/src/outcome/validate.rs +++ b/src/outcome/validate.rs @@ -3,9 +3,7 @@ use serde::Serialize; use crate::block::{Block, Render}; -use crate::output::{ - format_file_count, FieldViolation, FieldViolationCompact, NewField, NewFieldCompact, -}; +use crate::output::{format_file_count, FieldViolation, NewField}; /// Full outcome for the validate step. #[derive(Debug, Serialize)] @@ -31,34 +29,3 @@ impl Render for ValidateOutcome { ))] } } - -/// Compact outcome for the validate step. -#[derive(Debug, Serialize)] -pub struct ValidateOutcomeCompact { - /// Number of markdown files validated. - pub files_checked: usize, - /// Compact violations (count only, no file paths). - pub violations: Vec, - /// Compact new fields (count only, no file paths). 
- pub new_fields: Vec, -} - -impl Render for ValidateOutcomeCompact { - fn render(&self) -> Vec { - vec![] // Leaf compact outcomes are silent - } -} - -impl From<&ValidateOutcome> for ValidateOutcomeCompact { - fn from(o: &ValidateOutcome) -> Self { - Self { - files_checked: o.files_checked, - violations: o - .violations - .iter() - .map(FieldViolationCompact::from) - .collect(), - new_fields: o.new_fields.iter().map(NewFieldCompact::from).collect(), - } - } -} diff --git a/src/output.rs b/src/output.rs index 95f3e85..33733af 100644 --- a/src/output.rs +++ b/src/output.rs @@ -83,33 +83,6 @@ pub struct DiscoveredField { pub hints: Vec, } -/// Compact version of [`DiscoveredField`] — summary only, no globs or hints. -#[derive(Debug, Serialize)] -pub struct DiscoveredFieldCompact { - /// Field name. - pub name: String, - /// Inferred type. - pub field_type: String, - /// Number of files containing this field. - pub files_found: usize, - /// Total scanned files. - pub total_files: usize, - /// Whether null values are accepted. - pub nullable: bool, -} - -impl From<&DiscoveredField> for DiscoveredFieldCompact { - fn from(f: &DiscoveredField) -> Self { - Self { - name: f.name.clone(), - field_type: f.field_type.clone(), - files_found: f.files_found, - total_files: f.total_files, - nullable: f.nullable, - } - } -} - /// A field whose definition changed between the previous and current scan. #[derive(Debug, Serialize)] pub struct ChangedField { @@ -195,39 +168,6 @@ pub struct RemovedField { pub allowed: Option>, } -/// Compact version of [`ChangedField`] — aspect labels only, no old/new values. -#[derive(Debug, Serialize)] -pub struct ChangedFieldCompact { - /// Field name. - pub name: String, - /// Labels of aspects that changed (e.g. `["type", "allowed"]`). 
- pub aspects: Vec, -} - -impl From<&ChangedField> for ChangedFieldCompact { - fn from(f: &ChangedField) -> Self { - Self { - name: f.name.clone(), - aspects: f.changes.iter().map(|c| c.label().to_string()).collect(), - } - } -} - -/// Compact version of [`RemovedField`] — name only, no globs. -#[derive(Debug, Serialize)] -pub struct RemovedFieldCompact { - /// Field name. - pub name: String, -} - -impl From<&RemovedField> for RemovedFieldCompact { - fn from(f: &RemovedField) -> Self { - Self { - name: f.name.clone(), - } - } -} - /// Category of a frontmatter validation failure. #[derive(Debug, Clone, PartialEq, Eq, Hash, Serialize)] pub enum ViolationKind { @@ -275,45 +215,6 @@ pub struct NewField { pub files: Option>, } -/// Compact version of [`FieldViolation`] — summary counts, no file paths. -#[derive(Debug, Serialize)] -pub struct FieldViolationCompact { - /// Name of the frontmatter field. - pub field: String, - /// What kind of violation occurred. - pub kind: ViolationKind, - /// Number of files that triggered this violation. - pub file_count: usize, -} - -impl From<&FieldViolation> for FieldViolationCompact { - fn from(v: &FieldViolation) -> Self { - Self { - field: v.field.clone(), - kind: v.kind.clone(), - file_count: v.files.len(), - } - } -} - -/// Compact version of [`NewField`] — name and count only, no file paths. -#[derive(Debug, Serialize)] -pub struct NewFieldCompact { - /// Field name. - pub name: String, - /// Number of files containing this field. - pub files_found: usize, -} - -impl From<&NewField> for NewFieldCompact { - fn from(nf: &NewField) -> Self { - Self { - name: nf.name.clone(), - files_found: nf.files_found, - } - } -} - /// Per-file chunk count for build output. #[derive(Debug, Serialize)] pub struct BuildFileDetail { diff --git a/src/step.rs b/src/step.rs index fcc12d2..75d9d04 100644 --- a/src/step.rs +++ b/src/step.rs @@ -1,93 +1,69 @@ -//! Core Step tree types for structured command output. +//! 
Flat command result types for structured output. //! -//! Every command returns a `Step` tree where `O` is the outcome type. -//! Leaf steps (scan, validate, etc.) have empty substeps. Commands (build, -//! search, etc.) have populated substeps forming a tree that mirrors the -//! execution pipeline. -//! -//! Two instantiations: `Step` (full data, verbose) and -//! `Step` (summary data, compact). Conversion between -//! them is recursive via `to_compact()`. +//! Every command returns a `CommandResult` with a flat list of process steps +//! and a final result. No recursive nesting — steps are always leaf entries. use crate::block::{Block, Render}; -use crate::outcome::{CompactOutcome, Outcome}; +use crate::outcome::Outcome; use serde::ser::SerializeMap; use serde::{Serialize, Serializer}; -/// A Step tree with full outcome data (verbose mode). -pub type FullStep = Step; - -/// A Step tree with compact outcome data (compact mode). -pub type CompactStep = Step; +/// A process step that completed successfully. +#[derive(Debug)] +pub struct ProcessStep { + /// The step's typed outcome data. + pub outcome: Outcome, + /// Wall-clock time for this step in milliseconds. + pub elapsed_ms: u64, +} -/// A node in the execution tree. -/// -/// Leaf steps have `substeps: vec![]`. Commands have populated substeps -/// representing the pipeline stages that ran. +/// A process step that failed. #[derive(Debug)] -pub struct Step { - /// Child steps that ran as part of this step's pipeline. - pub substeps: Vec>, - /// The outcome of this step itself. - pub outcome: StepOutcome, +pub struct FailedStep { + /// Whether this is a user error or application error. + pub kind: ErrorKind, + /// Human-readable error message. + pub message: String, + /// Wall-clock time before failure in milliseconds. + pub elapsed_ms: u64, } -/// The result of executing a step. +/// An entry in the command's step list. 
#[derive(Debug)] -pub enum StepOutcome { - /// The step ran, producing a successful outcome or an error. - Complete { - /// The step's result: `Ok` with outcome data, or `Err` with error details. - result: Result, - /// Wall-clock time for this step in milliseconds. - elapsed_ms: u64, - }, - /// The step was skipped (upstream failure, not needed, !verbose, etc.). +pub enum StepEntry { + /// The step ran successfully. + Completed(ProcessStep), + /// The step failed. + Failed(FailedStep), + /// The step was skipped (not needed based on command logic). Skipped, } -impl StepOutcome { - /// Returns the elapsed time if the step completed, `None` if skipped. - pub fn elapsed_ms(&self) -> Option { - match self { - Self::Complete { elapsed_ms, .. } => Some(*elapsed_ms), - Self::Skipped => None, - } +impl StepEntry { + /// Create a successful step entry. + pub fn ok(outcome: Outcome, elapsed_ms: u64) -> Self { + Self::Completed(ProcessStep { + outcome, + elapsed_ms, + }) } -} -impl Step { - /// Recursively convert the full tree to a compact tree. - /// - /// Each outcome is converted via `Outcome::to_compact()`, which may - /// read substep data for command-level summaries. Errors and Skipped - /// outcomes are preserved as-is. - pub fn to_compact(&self) -> Step { - let compact_outcome = match &self.outcome { - StepOutcome::Complete { - result: Ok(outcome), - elapsed_ms, - } => StepOutcome::Complete { - result: Ok(outcome.to_compact(&self.substeps)), - elapsed_ms: *elapsed_ms, - }, - StepOutcome::Complete { - result: Err(e), - elapsed_ms, - } => StepOutcome::Complete { - result: Err(e.clone()), - elapsed_ms: *elapsed_ms, - }, - StepOutcome::Skipped => StepOutcome::Skipped, - }; - Step { - substeps: self.substeps.iter().map(|s| s.to_compact()).collect(), - outcome: compact_outcome, - } + /// Create a failed step entry. 
+ pub fn err(kind: ErrorKind, message: String, elapsed_ms: u64) -> Self { + Self::Failed(FailedStep { + kind, + message, + elapsed_ms, + }) + } + + /// Create a skipped step entry. + pub fn skipped() -> Self { + Self::Skipped } } -/// An error that occurred during a step. +/// An error that occurred during a step or command. #[derive(Debug, Clone, Serialize)] pub struct StepError { /// Whether this is a user error (bad input) or application error (internal failure). @@ -106,90 +82,145 @@ pub enum ErrorKind { Application, } -impl Step { - /// Create a leaf step (no substeps) with a successful outcome. - pub fn leaf(outcome: O, elapsed_ms: u64) -> Self { - Self { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Ok(outcome), - elapsed_ms, - }, - } +/// Result of running a command. +/// +/// Contains a flat list of process steps and a final result (success or error). +#[derive(Debug)] +pub struct CommandResult { + /// Process steps that ran (or were skipped) during this command. + pub steps: Vec, + /// The command's final result: `Ok` with outcome data, or `Err` with error details. + pub result: Result, + /// Total wall-clock time for the entire command in milliseconds. + pub elapsed_ms: u64, +} + +impl CommandResult { + /// Returns a reference to the successful outcome value, if any. + pub fn result_value(&self) -> Option<&Outcome> { + self.result.as_ref().ok() } - /// Create a leaf step with a failed outcome. - pub fn failed(kind: ErrorKind, message: String, elapsed_ms: u64) -> Self { - Self { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Err(StepError { kind, message }), - elapsed_ms, - }, + /// Render verbose output: process step lines + command outcome. 
+ pub fn render_verbose(&self) -> Vec { + let mut blocks = vec![]; + + // Render each step with timing + for entry in &self.steps { + blocks.extend(entry.render()); + } + + // Render command outcome + match &self.result { + Ok(outcome) => blocks.extend(outcome.render()), + Err(e) => blocks.push(Block::Line(format!("Error: {}", e.message))), } + + blocks } - /// Create a skipped step. - pub fn skipped() -> Self { - Self { - substeps: vec![], - outcome: StepOutcome::Skipped, + /// Render compact output: command outcome only. + pub fn render_compact(&self) -> Vec { + match &self.result { + Ok(outcome) => outcome.render(), + Err(e) => vec![Block::Line(format!("Error: {}", e.message))], } } } // --- Free functions --- -/// Returns `true` if any step in the tree failed (has `Err` outcome). -pub fn has_failed(step: &Step) -> bool { - step.substeps.iter().any(|s| has_failed(s)) - || matches!(step.outcome, StepOutcome::Complete { result: Err(_), .. }) +/// Returns `true` if the command or any step failed. +pub fn has_failed(result: &CommandResult) -> bool { + result.result.is_err() + || result + .steps + .iter() + .any(|s| matches!(s, StepEntry::Failed(_))) } -/// Returns `true` if any step in the tree contains validation violations. -pub fn has_violations(step: &Step) -> bool { - step.substeps.iter().any(has_violations) - || match &step.outcome { - StepOutcome::Complete { - result: Ok(outcome), - .. - } => outcome.contains_violations(), +/// Returns `true` if any outcome contains validation violations. 
+pub fn has_violations(result: &CommandResult) -> bool { + let command_violations = match &result.result { + Ok(outcome) => outcome.contains_violations(), + Err(_) => false, + }; + command_violations + || result.steps.iter().any(|s| match s { + StepEntry::Completed(ps) => ps.outcome.contains_violations(), _ => false, + }) +} + +// --- Render impl for StepEntry --- + +impl Render for StepEntry { + fn render(&self) -> Vec { + match self { + Self::Completed(ps) => { + let outcome_blocks = ps.outcome.render(); + // Inject timing into the first Block::Line + let mut result = vec![]; + let mut injected = false; + for block in outcome_blocks { + if !injected { + if let Block::Line(text) = block { + result.push(Block::Line(format!("{text} ({}ms)", ps.elapsed_ms))); + injected = true; + continue; + } + } + result.push(block); + } + result + } + Self::Failed(fs) => { + vec![Block::Line(format!( + "Error: {} ({}ms)", + fs.message, fs.elapsed_ms + ))] + } + Self::Skipped => vec![], } + } } -// --- Serialize impls (hand-written, not derived) --- +// --- Serialize impls --- -impl Serialize for Step { +impl Serialize for CommandResult { fn serialize(&self, serializer: S) -> Result { - let mut map = serializer.serialize_map(Some(2))?; - map.serialize_entry("substeps", &self.substeps)?; - map.serialize_entry("outcome", &self.outcome)?; + let mut map = serializer.serialize_map(None)?; + map.serialize_entry("steps", &self.steps)?; + match &self.result { + Ok(outcome) => map.serialize_entry("result", outcome)?, + Err(error) => map.serialize_entry("error", error)?, + } + map.serialize_entry("elapsed_ms", &self.elapsed_ms)?; map.end() } } -impl Serialize for StepOutcome { +impl Serialize for StepEntry { fn serialize(&self, serializer: S) -> Result { match self { - Self::Complete { - result: Ok(outcome), - elapsed_ms, - } => { + Self::Completed(ps) => { let mut map = serializer.serialize_map(Some(3))?; map.serialize_entry("status", "complete")?; - map.serialize_entry("elapsed_ms", 
elapsed_ms)?; - map.serialize_entry("outcome", outcome)?; + map.serialize_entry("elapsed_ms", &ps.elapsed_ms)?; + map.serialize_entry("outcome", &ps.outcome)?; map.end() } - Self::Complete { - result: Err(error), - elapsed_ms, - } => { + Self::Failed(fs) => { let mut map = serializer.serialize_map(Some(3))?; map.serialize_entry("status", "failed")?; - map.serialize_entry("elapsed_ms", elapsed_ms)?; - map.serialize_entry("error", error)?; + map.serialize_entry("elapsed_ms", &fs.elapsed_ms)?; + map.serialize_entry( + "error", + &StepError { + kind: fs.kind.clone(), + message: fs.message.clone(), + }, + )?; map.end() } Self::Skipped => { @@ -201,153 +232,102 @@ impl Serialize for StepOutcome { } } -// --- Render impls --- - -impl Render for Step { - fn render(&self) -> Vec { - let mut blocks = vec![]; - - // Render all substeps first - for substep in &self.substeps { - blocks.extend(substep.render()); - } - - // Render own outcome — with timing injection for leaf steps - let outcome_blocks = self.outcome.render(); - if self.substeps.is_empty() { - // Leaf step: inject elapsed_ms into the first Block::Line - if let Some(elapsed) = self.outcome.elapsed_ms() { - let mut injected = false; - for block in outcome_blocks { - if !injected { - if let Block::Line(text) = block { - blocks.push(Block::Line(format!("{text} ({elapsed}ms)"))); - injected = true; - continue; - } - } - blocks.push(block); - } - } else { - blocks.extend(outcome_blocks); - } - } else { - // Command step: no timing injection, outcome renders as-is - blocks.extend(outcome_blocks); - } - - blocks - } -} - -impl Render for StepOutcome { - fn render(&self) -> Vec { - match self { - Self::Complete { - result: Ok(outcome), - .. - } => outcome.render(), - Self::Complete { result: Err(e), .. 
} => { - vec![Block::Line(format!("Error: {}", e.message))] - } - Self::Skipped => vec![], - } - } -} - #[cfg(test)] mod tests { use super::*; + use crate::outcome::{CleanOutcome, DeleteIndexOutcome, Outcome}; + use std::path::PathBuf; #[test] - fn leaf_step_complete() { - let step: Step = Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Ok("scanned 5 files".to_string()), - elapsed_ms: 42, - }, - }; - assert_eq!(step.outcome.elapsed_ms(), Some(42)); - assert!(step.substeps.is_empty()); + fn step_entry_ok() { + let entry = StepEntry::ok( + Outcome::Scan(crate::outcome::ScanOutcome { + files_found: 5, + glob: "**".into(), + }), + 42, + ); + assert!(matches!(entry, StepEntry::Completed(_))); } #[test] - fn leaf_step_failed() { - let step: Step = Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::User, - message: "config not found".into(), - }), - elapsed_ms: 2, - }, - }; - assert_eq!(step.outcome.elapsed_ms(), Some(2)); - match &step.outcome { - StepOutcome::Complete { result: Err(e), .. 
} => { - assert_eq!(e.message, "config not found"); - } - _ => panic!("expected failed step"), - } + fn step_entry_err() { + let entry = StepEntry::err(ErrorKind::User, "not found".into(), 2); + assert!(matches!(entry, StepEntry::Failed(_))); } #[test] - fn skipped_step() { - let step: Step = Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }; - assert_eq!(step.outcome.elapsed_ms(), None); + fn step_entry_skipped() { + let entry = StepEntry::skipped(); + assert!(matches!(entry, StepEntry::Skipped)); } #[test] - fn step_tree_with_substeps() { - let leaf1: Step = Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Ok("scan".into()), - elapsed_ms: 10, - }, - }; - let leaf2: Step = Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Ok("validate".into()), - elapsed_ms: 5, - }, + fn command_result_has_failed_on_err() { + let result = CommandResult { + steps: vec![], + result: Err(StepError { + kind: ErrorKind::User, + message: "config not found".into(), + }), + elapsed_ms: 2, }; - let command: Step = Step { - substeps: vec![leaf1, leaf2], - outcome: StepOutcome::Complete { - result: Ok("build complete".into()), - elapsed_ms: 15, - }, - }; - assert_eq!(command.substeps.len(), 2); - assert_eq!(command.outcome.elapsed_ms(), Some(15)); + assert!(has_failed(&result)); } - // --- Render tests --- - - /// Implement Render for String so we can test Step rendering. 
- impl Render for String { - fn render(&self) -> Vec { - vec![Block::Line(self.clone())] - } + #[test] + fn command_result_has_failed_on_step_failure() { + let result = CommandResult { + steps: vec![StepEntry::err( + ErrorKind::Application, + "scan failed".into(), + 0, + )], + result: Ok(Outcome::Clean(CleanOutcome { + removed: true, + path: PathBuf::from(".mdvs"), + files_removed: 1, + size_bytes: 100, + })), + elapsed_ms: 5, + }; + assert!(has_failed(&result)); } #[test] - fn render_leaf_step_injects_timing() { - let step: Step = Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Ok("Scan: 43 files".into()), - elapsed_ms: 15, - }, + fn command_result_success() { + let result = CommandResult { + steps: vec![StepEntry::ok( + Outcome::DeleteIndex(DeleteIndexOutcome { + removed: true, + path: ".mdvs".into(), + files_removed: 2, + size_bytes: 1024, + }), + 3, + )], + result: Ok(Outcome::Clean(CleanOutcome { + removed: true, + path: PathBuf::from(".mdvs"), + files_removed: 2, + size_bytes: 1024, + })), + elapsed_ms: 5, }; - let blocks = step.render(); + assert!(!has_failed(&result)); + assert!(result.result_value().is_some()); + } + + #[test] + fn render_step_entry_with_timing() { + let entry = StepEntry::ok( + Outcome::Scan(crate::outcome::ScanOutcome { + files_found: 43, + glob: "**".into(), + }), + 15, + ); + let blocks = entry.render(); assert_eq!(blocks.len(), 1); match &blocks[0] { Block::Line(s) => assert_eq!(s, "Scan: 43 files (15ms)"), @@ -356,209 +336,127 @@ mod tests { } #[test] - fn render_command_step_no_timing() { - let leaf: Step = Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Ok("Scan: 5 files".into()), - elapsed_ms: 10, - }, - }; - let command: Step = Step { - substeps: vec![leaf], - outcome: StepOutcome::Complete { - result: Ok("Built index".into()), - elapsed_ms: 100, - }, - }; - let blocks = command.render(); - assert_eq!(blocks.len(), 2); - // First block: substep with timing + fn render_failed_step() 
{ + let entry = StepEntry::err(ErrorKind::User, "config not found".into(), 2); + let blocks = entry.render(); + assert_eq!(blocks.len(), 1); match &blocks[0] { - Block::Line(s) => assert_eq!(s, "Scan: 5 files (10ms)"), - _ => panic!("expected Line"), - } - // Second block: command outcome WITHOUT timing - match &blocks[1] { - Block::Line(s) => assert_eq!(s, "Built index"), + Block::Line(s) => assert_eq!(s, "Error: config not found (2ms)"), _ => panic!("expected Line"), } } #[test] fn render_skipped_step_empty() { - let step: Step = Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }; - let blocks = step.render(); - assert!(blocks.is_empty()); + let entry = StepEntry::skipped(); + assert!(entry.render().is_empty()); } #[test] - fn render_error_step() { - let step: Step = Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::User, - message: "config not found".into(), + fn render_verbose() { + let result = CommandResult { + steps: vec![StepEntry::ok( + Outcome::Scan(crate::outcome::ScanOutcome { + files_found: 5, + glob: "**".into(), }), - elapsed_ms: 2, - }, + 10, + )], + result: Ok(Outcome::Clean(CleanOutcome { + removed: true, + path: PathBuf::from(".mdvs"), + files_removed: 1, + size_bytes: 100, + })), + elapsed_ms: 15, }; - let blocks = step.render(); - assert_eq!(blocks.len(), 1); + let blocks = result.render_verbose(); + assert_eq!(blocks.len(), 3); // step line + 2 clean lines match &blocks[0] { - Block::Line(s) => assert_eq!(s, "Error: config not found (2ms)"), + Block::Line(s) => assert!(s.contains("Scan") && s.contains("(10ms)")), _ => panic!("expected Line"), } } #[test] - fn render_empty_outcome_no_timing_crash() { - // An outcome that renders to empty vec — timing injection does nothing - struct EmptyOutcome; - impl Render for EmptyOutcome { - fn render(&self) -> Vec { - vec![] - } - } - let step: Step = Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: 
Ok(EmptyOutcome), - elapsed_ms: 5, - }, - }; - let blocks = step.render(); - assert!(blocks.is_empty()); - } - - #[test] - fn step_error_is_clone() { - let err = StepError { - kind: ErrorKind::Application, - message: "I/O error".into(), - }; - let cloned = err.clone(); - assert_eq!(cloned.message, "I/O error"); - } - - // --- Serialize tests --- - - #[test] - fn serialize_step_complete_ok() { - use crate::outcome::{CleanOutcome, Outcome}; - use std::path::PathBuf; - - let step = Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Ok(Outcome::Clean(CleanOutcome { - removed: true, - path: PathBuf::from(".mdvs"), - files_removed: 2, - size_bytes: 1024, - })), - elapsed_ms: 5, - }, - }; - let json = serde_json::to_value(&step).unwrap(); - assert_eq!(json["outcome"]["status"], "complete"); - assert_eq!(json["outcome"]["elapsed_ms"], 5); - assert!(json["outcome"]["outcome"]["Clean"].is_object()); - assert_eq!(json["outcome"]["outcome"]["Clean"]["removed"], true); - assert_eq!(json["substeps"].as_array().unwrap().len(), 0); - } - - #[test] - fn serialize_step_complete_err() { - let step: Step = Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Err(StepError { - kind: ErrorKind::User, - message: "config not found".into(), + fn render_compact_no_steps() { + let result = CommandResult { + steps: vec![StepEntry::ok( + Outcome::Scan(crate::outcome::ScanOutcome { + files_found: 5, + glob: "**".into(), }), - elapsed_ms: 2, - }, + 10, + )], + result: Ok(Outcome::Clean(CleanOutcome { + removed: true, + path: PathBuf::from(".mdvs"), + files_removed: 1, + size_bytes: 100, + })), + elapsed_ms: 15, }; - let json = serde_json::to_value(&step).unwrap(); - assert_eq!(json["outcome"]["status"], "failed"); - assert_eq!(json["outcome"]["elapsed_ms"], 2); - assert_eq!(json["outcome"]["error"]["kind"], "user"); - assert_eq!(json["outcome"]["error"]["message"], "config not found"); + let blocks = result.render_compact(); + assert_eq!(blocks.len(), 2); 
// 2 clean lines, no step line } #[test] - fn serialize_step_skipped() { - let step: Step = Step { - substeps: vec![], - outcome: StepOutcome::Skipped, - }; - let json = serde_json::to_value(&step).unwrap(); - assert_eq!(json["outcome"]["status"], "skipped"); - assert!(json["outcome"].get("elapsed_ms").is_none()); - } - - #[test] - fn serialize_step_tree_recursive() { - use crate::outcome::{DeleteIndexOutcome, Outcome}; - - let leaf = Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Ok(Outcome::DeleteIndex(DeleteIndexOutcome { + fn serialize_verbose_json() { + let result = CommandResult { + steps: vec![StepEntry::ok( + Outcome::DeleteIndex(DeleteIndexOutcome { removed: true, path: ".mdvs".into(), files_removed: 1, size_bytes: 512, - })), - elapsed_ms: 3, - }, + }), + 3, + )], + result: Ok(Outcome::Clean(CleanOutcome { + removed: true, + path: PathBuf::from(".mdvs"), + files_removed: 1, + size_bytes: 512, + })), + elapsed_ms: 5, }; - let command = Step { - substeps: vec![leaf], - outcome: StepOutcome::Complete { - result: Ok(Outcome::Clean(crate::outcome::CleanOutcome { - removed: true, - path: std::path::PathBuf::from(".mdvs"), - files_removed: 1, - size_bytes: 512, - })), - elapsed_ms: 3, - }, + let json = serde_json::to_value(&result).unwrap(); + assert_eq!(json["steps"].as_array().unwrap().len(), 1); + assert_eq!(json["steps"][0]["status"], "complete"); + assert!(json["steps"][0]["outcome"]["DeleteIndex"].is_object()); + assert!(json["result"]["Clean"].is_object()); + assert_eq!(json["elapsed_ms"], 5); + assert!(json.get("error").is_none()); + } + + #[test] + fn serialize_error_json() { + let result = CommandResult { + steps: vec![StepEntry::err( + ErrorKind::User, + "config not found".into(), + 2, + )], + result: Err(StepError { + kind: ErrorKind::User, + message: "config not found".into(), + }), + elapsed_ms: 2, }; - let json = serde_json::to_value(&command).unwrap(); - assert_eq!(json["substeps"].as_array().unwrap().len(), 1); - let sub 
= &json["substeps"][0]; - assert_eq!(sub["outcome"]["status"], "complete"); - assert!(sub["outcome"]["outcome"]["DeleteIndex"].is_object()); + let json = serde_json::to_value(&result).unwrap(); + assert!(json.get("result").is_none()); + assert_eq!(json["error"]["kind"], "user"); + assert_eq!(json["error"]["message"], "config not found"); + assert_eq!(json["steps"][0]["status"], "failed"); } #[test] - fn serialize_compact_step() { - use crate::outcome::{CleanOutcome, Outcome}; - use std::path::PathBuf; - - let step = Step { - substeps: vec![], - outcome: StepOutcome::Complete { - result: Ok(Outcome::Clean(CleanOutcome { - removed: true, - path: PathBuf::from(".mdvs"), - files_removed: 2, - size_bytes: 1024, - })), - elapsed_ms: 5, - }, + fn step_error_is_clone() { + let err = StepError { + kind: ErrorKind::Application, + message: "I/O error".into(), }; - let compact = step.to_compact(); - let json = serde_json::to_value(&compact).unwrap(); - assert_eq!(json["outcome"]["status"], "complete"); - assert!(json["outcome"]["outcome"]["Clean"].is_object()); - // Compact Clean has removed + path, no files_removed/size_bytes - assert_eq!(json["outcome"]["outcome"]["Clean"]["removed"], true); + let cloned = err.clone(); + assert_eq!(cloned.message, "I/O error"); } } From cb14a57f2def816c3dec63583db7218397f33d26 Mon Sep 17 00:00:00 2001 From: edoch Date: Mon, 23 Mar 2026 12:43:21 +0100 Subject: [PATCH 15/35] docs: mark TODOs 0131, 0132, 0135, 0136, 0137 as done - 0131: pipeline cleanup complete (all 7 waves) - 0132: subsumed by 0137 (CompactOutcome deleted, no macro needed) - 0135: subsumed by 0137 (flat structure has no padding concept) - 0136: inline auto-update/auto-build complete (all 3 waves) - 0137: flatten Step tree complete (both waves) Co-Authored-By: Claude --- docs/spec/todos/TODO-0131.md | 3 ++- docs/spec/todos/TODO-0132.md | 3 ++- docs/spec/todos/TODO-0135.md | 3 ++- docs/spec/todos/TODO-0136.md | 3 ++- docs/spec/todos/TODO-0137.md | 3 ++- docs/spec/todos/index.md 
| 10 +++++----- 6 files changed, 15 insertions(+), 10 deletions(-) diff --git a/docs/spec/todos/TODO-0131.md b/docs/spec/todos/TODO-0131.md index 82d8e1e..749047c 100644 --- a/docs/spec/todos/TODO-0131.md +++ b/docs/spec/todos/TODO-0131.md @@ -1,11 +1,12 @@ --- id: 131 title: "Step tree: delete old pipeline, full cleanup" -status: todo +status: done priority: high created: 2026-03-19 depends_on: [130] blocks: [132, 133] +completed: 2026-03-23 --- # TODO-0131: Step tree: delete old pipeline, full cleanup diff --git a/docs/spec/todos/TODO-0132.md b/docs/spec/todos/TODO-0132.md index 5fc56b9..ff8ef31 100644 --- a/docs/spec/todos/TODO-0132.md +++ b/docs/spec/todos/TODO-0132.md @@ -1,11 +1,12 @@ --- id: 132 title: "Macro for compact struct generation (crabtime)" -status: todo +status: done priority: low created: 2026-03-19 depends_on: [131] blocks: [] +completed: 2026-03-23 --- # TODO-0132: Macro for compact struct generation (crabtime) diff --git a/docs/spec/todos/TODO-0135.md b/docs/spec/todos/TODO-0135.md index 08d9143..f1dfc6f 100644 --- a/docs/spec/todos/TODO-0135.md +++ b/docs/spec/todos/TODO-0135.md @@ -1,11 +1,12 @@ --- id: 135 title: Remove Skipped padding from Step tree error paths -status: todo +status: done priority: medium created: 2026-03-20 depends_on: [131] blocks: [] +completed: 2026-03-23 --- # TODO-0135: Remove Skipped padding from Step tree error paths diff --git a/docs/spec/todos/TODO-0136.md b/docs/spec/todos/TODO-0136.md index 0421401..1dad124 100644 --- a/docs/spec/todos/TODO-0136.md +++ b/docs/spec/todos/TODO-0136.md @@ -1,11 +1,12 @@ --- id: 136 title: Inline auto-update and auto-build logic to eliminate redundant reads -status: todo +status: done priority: high created: 2026-03-20 depends_on: [131] blocks: [] +completed: 2026-03-23 --- # TODO-0136: Inline auto-update and auto-build logic to eliminate redundant reads diff --git a/docs/spec/todos/TODO-0137.md b/docs/spec/todos/TODO-0137.md index 9110f7f..08dbffd 100644 --- 
a/docs/spec/todos/TODO-0137.md +++ b/docs/spec/todos/TODO-0137.md @@ -1,11 +1,12 @@ --- id: 137 title: Flatten Step tree into steps + result structure -status: todo +status: done priority: high created: 2026-03-21 depends_on: [136] blocks: [] +completed: 2026-03-23 --- # TODO-0137: Flatten Step tree into steps + result structure diff --git a/docs/spec/todos/index.md b/docs/spec/todos/index.md index 9aae7fc..04297e7 100644 --- a/docs/spec/todos/index.md +++ b/docs/spec/todos/index.md @@ -132,10 +132,10 @@ | [0128](TODO-0128.md) | Step tree: convert init command | done | high | 2026-03-19 | | [0129](TODO-0129.md) | Step tree: convert update command | done | high | 2026-03-19 | | [0130](TODO-0130.md) | Step tree: convert build + search commands | done | high | 2026-03-19 | -| [0131](TODO-0131.md) | Step tree: delete old pipeline, update main.rs, simplify output.rs | todo | high | 2026-03-19 | -| [0132](TODO-0132.md) | Macro for compact struct generation (crabtime) | todo | low | 2026-03-19 | +| [0131](TODO-0131.md) | Step tree: delete old pipeline, update main.rs, simplify output.rs | done | high | 2026-03-19 | +| [0132](TODO-0132.md) | Macro for compact struct generation (crabtime) | done (subsumed by 0137) | low | 2026-03-19 | | [0133](TODO-0133.md) | Macro for step pipeline boilerplate (early-return pattern) | todo | low | 2026-03-19 | | [0134](TODO-0134.md) | Step tree post-migration cleanup | todo | medium | 2026-03-20 | -| [0135](TODO-0135.md) | Remove Skipped padding from Step tree error paths | todo | medium | 2026-03-20 | -| [0136](TODO-0136.md) | Inline auto-update and auto-build logic to eliminate redundant reads | todo | high | 2026-03-20 | -| [0137](TODO-0137.md) | Flatten Step tree into steps + result structure | todo | high | 2026-03-21 | +| [0135](TODO-0135.md) | Remove Skipped padding from Step tree error paths | done | medium | 2026-03-20 | +| [0136](TODO-0136.md) | Inline auto-update and auto-build logic to eliminate redundant reads | done | high | 
2026-03-20 | +| [0137](TODO-0137.md) | Flatten Step tree into steps + result structure | done | high | 2026-03-21 | From 5a962f828db6a30aeb506066b0e7ffdffa4748ac Mon Sep 17 00:00:00 2001 From: edoch Date: Mon, 23 Mar 2026 16:23:35 +0100 Subject: [PATCH 16/35] refactor: use untagged serialization for Outcome enum Add #[serde(untagged)] so Outcome variants serialize their inner struct directly without the variant name wrapper. JSON output changes from { "Check": { ... } } to { "files_checked": 43, ... }. Co-Authored-By: Claude --- src/outcome/mod.rs | 1 + src/step.rs | 107 ++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 106 insertions(+), 2 deletions(-) diff --git a/src/outcome/mod.rs b/src/outcome/mod.rs index 719cbf3..07295a3 100644 --- a/src/outcome/mod.rs +++ b/src/outcome/mod.rs @@ -37,6 +37,7 @@ pub use validate::ValidateOutcome; /// rendering and JSON serialization. Command-level outcomes are `Box`ed /// to avoid bloating the enum. #[derive(Debug, Serialize)] +#[serde(untagged)] pub enum Outcome { /// Delete the `.mdvs/` directory. DeleteIndex(DeleteIndexOutcome), diff --git a/src/step.rs b/src/step.rs index 75d9d04..7280dd6 100644 --- a/src/step.rs +++ b/src/step.rs @@ -7,6 +7,7 @@ use crate::block::{Block, Render}; use crate::outcome::Outcome; use serde::ser::SerializeMap; use serde::{Serialize, Serializer}; +use std::time::Instant; /// A process step that completed successfully. #[derive(Debug)] @@ -119,6 +120,35 @@ impl CommandResult { blocks } + /// Create a failed result by extracting the error message from the last failed step. 
+ pub fn failed_from_steps(steps: Vec<StepEntry>, start: Instant) -> Self { + let msg = steps + .iter() + .rev() + .find_map(|s| match s { + StepEntry::Failed(f) => Some(f.message.clone()), + _ => None, + }) + .unwrap_or_else(|| "step failed".into()); + Self { + steps, + result: Err(StepError { + kind: ErrorKind::Application, + message: msg, + }), + elapsed_ms: start.elapsed().as_millis() as u64, + } + } + + /// Create a failed result with an explicit error. + pub fn failed(steps: Vec<StepEntry>, kind: ErrorKind, message: String, start: Instant) -> Self { + Self { + steps, + result: Err(StepError { kind, message }), + elapsed_ms: start.elapsed().as_millis() as u64, + } + } + /// Render compact output: command outcome only. pub fn render_compact(&self) -> Vec<Block> { match &self.result { @@ -423,8 +453,8 @@ mod tests { let json = serde_json::to_value(&result).unwrap(); assert_eq!(json["steps"].as_array().unwrap().len(), 1); assert_eq!(json["steps"][0]["status"], "complete"); - assert!(json["steps"][0]["outcome"]["DeleteIndex"].is_object()); - assert!(json["result"]["Clean"].is_object()); + assert!(json["steps"][0]["outcome"]["removed"].is_boolean()); // untagged: fields directly + assert!(json["result"]["removed"].is_boolean()); // untagged: no "Clean" wrapper assert_eq!(json["elapsed_ms"], 5); assert!(json.get("error").is_none()); } @@ -459,4 +489,77 @@ mod tests { let cloned = err.clone(); assert_eq!(cloned.message, "I/O error"); } + + // --- has_violations tests --- + + #[test] + fn has_violations_none() { + let result = CommandResult { + steps: vec![], + result: Ok(Outcome::Clean(CleanOutcome { + removed: true, + path: PathBuf::from(".mdvs"), + files_removed: 1, + size_bytes: 100, + })), + elapsed_ms: 5, + }; + assert!(!has_violations(&result)); + } + + #[test] + fn has_violations_in_result() { + use crate::outcome::CheckOutcome; + use crate::output::{FieldViolation, ViolatingFile, ViolationKind}; + + let result = CommandResult { + steps: vec![], + result:
Ok(Outcome::Check(Box::new(CheckOutcome { + files_checked: 1, + violations: vec![FieldViolation { + field: "draft".into(), + kind: ViolationKind::WrongType, + rule: "type Boolean".into(), + files: vec![ViolatingFile { + path: "post.md".into(), + detail: None, + }], + }], + new_fields: vec![], + }))), + elapsed_ms: 5, + }; + assert!(has_violations(&result)); + } + + #[test] + fn has_violations_in_step() { + use crate::outcome::ValidateOutcome; + use crate::output::{FieldViolation, ViolatingFile, ViolationKind}; + + let result = CommandResult { + steps: vec![StepEntry::ok( + Outcome::Validate(ValidateOutcome { + files_checked: 1, + violations: vec![FieldViolation { + field: "title".into(), + kind: ViolationKind::MissingRequired, + rule: "required".into(), + files: vec![ViolatingFile { + path: "bare.md".into(), + detail: None, + }], + }], + new_fields: vec![], + }), + 10, + )], + result: Err(StepError { + kind: ErrorKind::User, + message: "violations found".into(), + }), + elapsed_ms: 15, + }; + assert!(has_violations(&result)); + } } From 6ade7ebc404f3537c3419ed466aa2b348cb569f0 Mon Sep 17 00:00:00 2001 From: edoch Date: Mon, 23 Mar 2026 16:23:56 +0100 Subject: [PATCH 17/35] refactor: sort fields in mdvs.toml, add unit tests, unify fail helpers - Sort [[fields.field]] alphabetically by name in MdvsToml::write() - Add 19 unit tests for private functions: validate_where_clause, read_lines, has_violations, type_matches, matches_any_glob - Remove unused test imports in update.rs - Unify per-command fail helpers into shared CommandResult::failed() and CommandResult::failed_from_steps() constructors - Delete 7 per-command helpers (fail_from_last, fail_early, fail_from_last_substep, fail_msg) Co-Authored-By: Claude --- src/cmd/build.rs | 40 ++++----------- src/cmd/check.rs | 120 +++++++++++++++++++++++++++++-------------- src/cmd/clean.rs | 29 ++--------- src/cmd/info.rs | 21 ++------ src/cmd/init.rs | 53 +++---------------- src/cmd/search.rs | 117 
+++++++++++++++++++++++++---------------- src/cmd/update.rs | 59 +++++---------------- src/schema/config.rs | 19 +++---- 8 files changed, 206 insertions(+), 252 deletions(-) diff --git a/src/cmd/build.rs b/src/cmd/build.rs index c5d3c1e..7a50f1c 100644 --- a/src/cmd/build.rs +++ b/src/cmd/build.rs @@ -13,7 +13,7 @@ use crate::outcome::{ use crate::output::BuildFileDetail; use crate::schema::config::{BuildConfig, MdvsToml, SearchConfig, TomlField}; use crate::schema::shared::{ChunkingConfig, EmbeddingModelConfig, FieldTypeSerde}; -use crate::step::{CommandResult, ErrorKind, StepEntry, StepError}; +use crate::step::{CommandResult, ErrorKind, StepEntry}; use std::collections::{HashMap, HashSet}; use std::path::Path; use std::time::Instant; @@ -224,7 +224,7 @@ pub async fn run( }; let mut config = match config { Some(c) => c, - None => return fail_from_last(&mut steps, start), + None => return CommandResult::failed_from_steps(std::mem::take(&mut steps), start), }; let should_update = !no_update && config.build.as_ref().is_some_and(|b| b.auto_update); @@ -241,7 +241,7 @@ pub async fn run( if let Some(msg) = mutation_error { steps.push(StepEntry::err(ErrorKind::User, msg, 0)); - return fail_from_last(&mut steps, start); + return CommandResult::failed_from_steps(std::mem::take(&mut steps), start); } // 3. Core build pipeline (scan → auto-update → validate → classify → embed → write) @@ -256,7 +256,7 @@ pub async fn run( .await { Ok(result) => result, - Err(()) => return fail_from_last(&mut steps, start), + Err(()) => return CommandResult::failed_from_steps(std::mem::take(&mut steps), start), }; CommandResult { @@ -759,25 +759,6 @@ pub(crate) async fn build_core( Ok((outcome, embedder)) } -/// Extract error from last failed step and return a failed CommandResult. 
-fn fail_from_last(steps: &mut Vec<StepEntry>, start: Instant) -> CommandResult { - let msg = match steps.iter().rev().find_map(|s| match s { - StepEntry::Failed(f) => Some(f.message.clone()), - _ => None, - }) { - Some(m) => m, - None => "step failed".into(), - }; - CommandResult { - steps: std::mem::take(steps), - result: Err(StepError { - kind: ErrorKind::Application, - message: msg, - }), - elapsed_ms: start.elapsed().as_millis() as u64, - } -} - // ============================================================================ // Helpers // ============================================================================ @@ -929,6 +910,7 @@ mod tests { read_build_metadata, read_chunk_rows, read_file_index, read_parquet, }; use crate::schema::config::MdvsToml; + use crate::step::StepError; use datafusion::arrow::datatypes::DataType; use std::collections::{HashMap, HashSet}; use std::fs; @@ -1154,7 +1136,7 @@ mod tests { assert!(!crate::step::has_failed(&output)); // Verify no model/chunking sections (auto-flag sections are present from init) - let config = MdvsToml::read(&tmp.path().join("mdvs.toml")).unwrap(); + let mut config = MdvsToml::read(&tmp.path().join("mdvs.toml")).unwrap(); assert!(config.embedding_model.is_none()); assert!(config.chunking.is_none()); @@ -1167,7 +1149,7 @@ ); // Verify sections were written - let config = MdvsToml::read(&tmp.path().join("mdvs.toml")).unwrap(); + let mut config = MdvsToml::read(&tmp.path().join("mdvs.toml")).unwrap(); assert_eq!(config.embedding_model.as_ref().unwrap().name, DEFAULT_MODEL); assert!(config.embedding_model.as_ref().unwrap().revision.is_none()); assert_eq!( @@ -1272,7 +1254,7 @@ mod tests { output ); - let config = MdvsToml::read(&tmp.path().join("mdvs.toml")).unwrap(); + let mut config = MdvsToml::read(&tmp.path().join("mdvs.toml")).unwrap(); assert_eq!(config.chunking.as_ref().unwrap().max_chunk_size, 512); } @@ -1330,7 +1312,7 @@ ) .unwrap(); - let config = MdvsToml { + let mut config = MdvsToml
{ scan: crate::schema::shared::ScanConfig { glob: "**".into(), include_bare_files: false, @@ -1402,7 +1384,7 @@ mod tests { ) .unwrap(); - let config = MdvsToml { + let mut config = MdvsToml { scan: crate::schema::shared::ScanConfig { glob: "**".into(), include_bare_files: false, @@ -1473,7 +1455,7 @@ mod tests { ) .unwrap(); - let config = MdvsToml { + let mut config = MdvsToml { scan: crate::schema::shared::ScanConfig { glob: "**".into(), include_bare_files: false, diff --git a/src/cmd/check.rs b/src/cmd/check.rs index 98984df..88e0cf6 100644 --- a/src/cmd/check.rs +++ b/src/cmd/check.rs @@ -8,7 +8,7 @@ use crate::outcome::{ use crate::output::{FieldViolation, NewField, ViolatingFile, ViolationKind}; use crate::schema::config::{MdvsToml, TomlField}; use crate::schema::shared::FieldTypeSerde; -use crate::step::{CommandResult, ErrorKind, StepEntry, StepError}; +use crate::step::{CommandResult, ErrorKind, StepEntry}; use globset::Glob; use serde::Serialize; use serde_json::Value; @@ -86,18 +86,7 @@ pub fn run(path: &Path, no_update: bool, verbose: bool) -> CommandResult { let config = match config { Some(c) => c, None => { - let msg = match &steps[0] { - StepEntry::Failed(f) => f.message.clone(), - _ => "failed to read config".into(), - }; - return CommandResult { - steps, - result: Err(StepError { - kind: ErrorKind::User, - message: msg, - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }; + return CommandResult::failed_from_steps(steps, start); } }; @@ -120,15 +109,7 @@ pub fn run(path: &Path, no_update: bool, verbose: bool) -> CommandResult { e.to_string(), scan_start.elapsed().as_millis() as u64, )); - let msg = e.to_string(); - return CommandResult { - steps, - result: Err(StepError { - kind: ErrorKind::Application, - message: msg, - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }; + return CommandResult::failed_from_steps(steps, start); } }; @@ -192,14 +173,12 @@ pub fn run(path: &Path, no_update: bool, verbose: bool) -> CommandResult { 
e.to_string(), write_start.elapsed().as_millis() as u64, )); - return CommandResult { + return CommandResult::failed( steps, - result: Err(StepError { - kind: ErrorKind::Application, - message: "auto-update failed to write config".into(), - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }; + ErrorKind::Application, + "auto-update failed to write config".into(), + start, + ); } } } @@ -217,14 +196,12 @@ pub fn run(path: &Path, no_update: bool, verbose: bool) -> CommandResult { e.to_string(), validate_start.elapsed().as_millis() as u64, )); - return CommandResult { + return CommandResult::failed( steps, - result: Err(StepError { - kind: ErrorKind::Application, - message: "validation failed".into(), - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }; + ErrorKind::Application, + "validation failed".into(), + start, + ); } }; @@ -515,7 +492,7 @@ mod tests { } fn write_toml(dir: &Path, fields: Vec<TomlField>, ignore: Vec<String>) { - let config = MdvsToml { + let mut config = MdvsToml { scan: ScanConfig { glob: "**".into(), include_bare_files: false, @@ -768,7 +745,7 @@ ) .unwrap(); - let config = MdvsToml { + let mut config = MdvsToml { scan: ScanConfig { glob: "**".into(), include_bare_files: true, @@ -1063,4 +1040,71 @@ null_violations.len() ); } + + // --- Unit tests for type_matches --- + + #[test] + fn string_matches_anything() { + use serde_json::json; + assert!(type_matches(&FieldType::String, &json!(true))); + assert!(type_matches(&FieldType::String, &json!(42))); + assert!(type_matches(&FieldType::String, &json!("hello"))); + assert!(type_matches(&FieldType::String, &json!([1, 2]))); + assert!(type_matches(&FieldType::String, &json!({"a": 1}))); + } + + #[test] + fn bool_matches_bool() { + use serde_json::json; + assert!(type_matches(&FieldType::Boolean, &json!(true))); + assert!(type_matches(&FieldType::Boolean, &json!(false))); + assert!(!type_matches(&FieldType::Boolean, &json!("yes"))); + } + + #[test] + fn int_matches_int() { + use
serde_json::json; + assert!(type_matches(&FieldType::Integer, &json!(42))); + assert!(!type_matches(&FieldType::Integer, &json!(1.5))); + assert!(!type_matches(&FieldType::Integer, &json!("42"))); + } + + #[test] + fn float_matches_any_number() { + use serde_json::json; + assert!(type_matches(&FieldType::Float, &json!(1.5))); + assert!(type_matches(&FieldType::Float, &json!(42))); // int-in-float lenient + assert!(!type_matches(&FieldType::Float, &json!("1.5"))); + } + + #[test] + fn array_checks_elements() { + use serde_json::json; + let arr_str = FieldType::Array(Box::new(FieldType::String)); + assert!(type_matches(&arr_str, &json!(["a", "b"]))); + assert!(type_matches(&arr_str, &json!([1, 2]))); // String matches anything + + let arr_int = FieldType::Array(Box::new(FieldType::Integer)); + assert!(type_matches(&arr_int, &json!([1, 2]))); + assert!(!type_matches(&arr_int, &json!([1.5, 2.5]))); + } + + // --- Unit tests for matches_any_glob --- + + #[test] + fn glob_star_star_matches_all() { + assert!(matches_any_glob(&["**".into()], "blog/post.md")); + assert!(matches_any_glob(&["**".into()], "deep/nested/path.md")); + } + + #[test] + fn glob_specific_path() { + assert!(matches_any_glob(&["blog/**".into()], "blog/post.md")); + assert!(!matches_any_glob(&["blog/**".into()], "notes/idea.md")); + } + + #[test] + fn glob_empty_patterns() { + assert!(!matches_any_glob(&[], "anything.md")); + } } diff --git a/src/cmd/clean.rs b/src/cmd/clean.rs index aa79149..5b7f8bb 100644 --- a/src/cmd/clean.rs +++ b/src/cmd/clean.rs @@ -1,7 +1,7 @@ use crate::index::backend::Backend; use crate::outcome::commands::CleanOutcome; use crate::outcome::{DeleteIndexOutcome, Outcome}; -use crate::step::{CommandResult, ErrorKind, StepEntry, StepError}; +use crate::step::{CommandResult, ErrorKind, StepEntry}; use std::path::{Path, PathBuf}; use std::time::Instant; use tracing::instrument; @@ -45,14 +45,7 @@ pub fn run(path: &Path) -> CommandResult { msg.clone(), 
delete_start.elapsed().as_millis() as u64, )); - return CommandResult { - steps, - result: Err(StepError { - kind: ErrorKind::User, - message: msg, - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }; + return CommandResult::failed(steps, ErrorKind::User, msg, start); } let (removed, path_str, files_removed, size_bytes) = if mdvs_dir.exists() { @@ -64,14 +57,7 @@ pub fn run(path: &Path) -> CommandResult { e.to_string(), delete_start.elapsed().as_millis() as u64, )); - return CommandResult { - steps, - result: Err(StepError { - kind: ErrorKind::Application, - message: e.to_string(), - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }; + return CommandResult::failed_from_steps(steps, start); } }; @@ -82,14 +68,7 @@ pub fn run(path: &Path) -> CommandResult { e.to_string(), delete_start.elapsed().as_millis() as u64, )); - return CommandResult { - steps, - result: Err(StepError { - kind: ErrorKind::Application, - message: e.to_string(), - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }; + return CommandResult::failed_from_steps(steps, start); } ( diff --git a/src/cmd/info.rs b/src/cmd/info.rs index f07a26f..a0d64be 100644 --- a/src/cmd/info.rs +++ b/src/cmd/info.rs @@ -4,7 +4,7 @@ use crate::outcome::commands::InfoOutcome; use crate::outcome::{Outcome, ReadConfigOutcome, ReadIndexOutcome, ScanOutcome}; use crate::output::{field_hints, FieldHint}; use crate::schema::config::MdvsToml; -use crate::step::{CommandResult, ErrorKind, StepEntry, StepError}; +use crate::step::{CommandResult, ErrorKind, StepEntry}; use serde::Serialize; use serde_json::Value; use std::collections::HashMap; @@ -100,18 +100,7 @@ pub fn run(path: &Path, _verbose: bool) -> CommandResult { let config = match config { Some(c) => c, None => { - let msg = match &steps[0] { - StepEntry::Failed(f) => f.message.clone(), - _ => "failed to read config".into(), - }; - return CommandResult { - steps, - result: Err(StepError { - kind: ErrorKind::User, - message: msg, - }), - elapsed_ms: 
start.elapsed().as_millis() as u64, - }; + return CommandResult::failed_from_steps(steps, start); } }; @@ -277,7 +266,7 @@ mod tests { } fn write_config(dir: &Path) { - let config = MdvsToml { + let mut config = MdvsToml { scan: ScanConfig { glob: "**".into(), include_bare_files: false, @@ -351,7 +340,7 @@ mod tests { assert_eq!(result.scan_glob, "**"); assert_eq!(result.files_on_disk, 2); assert_eq!(result.fields.len(), 3); - assert_eq!(result.fields[0].name, "title"); + assert_eq!(result.fields[0].name, "draft"); // alphabetically sorted assert_eq!(result.ignored_fields, vec!["internal_id"]); assert!(result.index.is_none()); } @@ -399,7 +388,7 @@ mod tests { "---\nauthor's_note: hello\n---\n# Note\nBody.", ) .unwrap(); - let config = MdvsToml { + let mut config = MdvsToml { scan: ScanConfig { glob: "**".into(), include_bare_files: false, diff --git a/src/cmd/init.rs b/src/cmd/init.rs index cbdad96..7fb7300 100644 --- a/src/cmd/init.rs +++ b/src/cmd/init.rs @@ -5,7 +5,7 @@ use crate::outcome::{InferOutcome, Outcome, ScanOutcome, WriteConfigOutcome}; use crate::output::DiscoveredField; use crate::schema::config::MdvsToml; use crate::schema::shared::ScanConfig; -use crate::step::{CommandResult, ErrorKind, StepEntry, StepError}; +use crate::step::{CommandResult, ErrorKind, StepEntry}; use std::path::Path; use std::time::Instant; use tracing::{info, instrument}; @@ -29,25 +29,25 @@ pub fn run( // Pre-checks if !path.is_dir() { - return fail_early( + return CommandResult::failed( steps, - start, ErrorKind::User, format!("'{}' is not a directory", path.display()), + start, ); } let config_path = path.join("mdvs.toml"); let mdvs_dir = path.join(".mdvs"); if !force && (config_path.exists() || mdvs_dir.exists()) { - return fail_early( + return CommandResult::failed( steps, - start, ErrorKind::User, format!( "mdvs is already initialized in '{}' (use --force to reinitialize)", path.display() ), + start, ); } @@ -84,7 +84,7 @@ pub fn run( e.to_string(), 
scan_start.elapsed().as_millis() as u64, )); - return fail_from_last_substep(&mut steps, start); + return CommandResult::failed_from_steps(std::mem::take(&mut steps), start); } }; @@ -92,14 +92,7 @@ if scanned.files.is_empty() { let msg = format!("no markdown files found in '{}'", path.display()); steps.push(StepEntry::err(ErrorKind::User, msg.clone(), 0)); - return CommandResult { - steps, - result: Err(StepError { - kind: ErrorKind::User, - message: msg, - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }; + return CommandResult::failed(steps, ErrorKind::User, msg, start); } // 2b. Infer — InferredSchema::infer() is infallible @@ -127,7 +120,7 @@ steps.push(StepEntry::skipped()); } else { let write_start = Instant::now(); - let toml_doc = MdvsToml::from_inferred(&schema, scan_config); + let mut toml_doc = MdvsToml::from_inferred(&schema, scan_config); match toml_doc.write(&config_path) { Ok(()) => { steps.push(StepEntry::ok( @@ -160,36 +153,6 @@ } } -/// Helper: return a failed CommandResult with the given error. -fn fail_early( - steps: Vec<StepEntry>, - start: Instant, - kind: ErrorKind, - message: String, -) -> CommandResult { - CommandResult { - steps, - result: Err(StepError { kind, message }), - elapsed_ms: start.elapsed().as_millis() as u64, - } -} - -/// Helper: extract error from last step and return a failed CommandResult.
-fn fail_from_last_substep(steps: &mut Vec<StepEntry>, start: Instant) -> CommandResult { - let msg = match steps.last() { - Some(StepEntry::Failed(f)) => f.message.clone(), - _ => "step failed".into(), - }; - CommandResult { - steps: std::mem::take(steps), - result: Err(StepError { - kind: ErrorKind::Application, - message: msg, - }), - elapsed_ms: start.elapsed().as_millis() as u64, - } -} - #[cfg(test)] mod tests { use super::*; diff --git a/src/cmd/search.rs b/src/cmd/search.rs index 640e831..0a60831 100644 --- a/src/cmd/search.rs +++ b/src/cmd/search.rs @@ -7,7 +7,7 @@ use crate::outcome::{ ReadIndexOutcome, }; use crate::schema::config::MdvsToml; -use crate::step::{CommandResult, ErrorKind, StepEntry, StepError}; +use crate::step::{CommandResult, ErrorKind, StepEntry}; use std::path::Path; use std::time::Instant; use tracing::{instrument, warn}; @@ -117,7 +117,12 @@ build_embedder = embedder; } Err(()) => { - return fail_msg(&mut steps, start, ErrorKind::User, "auto-build failed"); + return CommandResult::failed( + std::mem::take(&mut steps), + ErrorKind::User, + "auto-build failed".into(), + start, + ); } } } @@ -170,7 +175,7 @@ } } None => { - return fail_from_last(&mut steps, start); + return CommandResult::failed_from_steps(std::mem::take(&mut steps), start); } }; @@ -197,7 +202,7 @@ // 3. Load model — calls ModelConfig::try_from() + Embedder::load() directly if let Some(msg) = pre_check_error { steps.push(StepEntry::err(ErrorKind::User, msg, 0)); - return fail_from_last(&mut steps, start); + return CommandResult::failed_from_steps(std::mem::take(&mut steps), start); } // 3.
Load model (reuse from build if available) @@ -231,7 +236,7 @@ e.to_string(), model_start.elapsed().as_millis() as u64, )); - return fail_from_last(&mut steps, start); + return CommandResult::failed_from_steps(std::mem::take(&mut steps), start); } }, Err(e) => { @@ -240,7 +245,7 @@ e.to_string(), model_start.elapsed().as_millis() as u64, )); - return fail_from_last(&mut steps, start); + return CommandResult::failed_from_steps(std::mem::take(&mut steps), start); } } }; @@ -266,7 +271,7 @@ if let Some(w) = where_clause { if let Err(msg) = validate_where_clause(w) { steps.push(StepEntry::err(ErrorKind::User, msg, 0)); - return fail_from_last(&mut steps, start); + return CommandResult::failed_from_steps(std::mem::take(&mut steps), start); } } @@ -288,7 +293,7 @@ e.to_string(), search_start.elapsed().as_millis() as u64, )); - return fail_from_last(&mut steps, start); + return CommandResult::failed_from_steps(std::mem::take(&mut steps), start); } }; @@ -319,40 +324,6 @@ } } -fn fail_from_last(steps: &mut Vec<StepEntry>, start: Instant) -> CommandResult { - let msg = match steps.iter().rev().find_map(|s| match s { - StepEntry::Failed(f) => Some(f.message.clone()), - _ => None, - }) { - Some(m) => m, - None => "step failed".into(), - }; - CommandResult { - steps: std::mem::take(steps), - result: Err(StepError { - kind: ErrorKind::Application, - message: msg, - }), - elapsed_ms: start.elapsed().as_millis() as u64, - } -} - -fn fail_msg( - steps: &mut Vec<StepEntry>, - start: Instant, - kind: ErrorKind, - msg: &str, -) -> CommandResult { - CommandResult { - steps: std::mem::take(steps), - result: Err(StepError { - kind, - message: msg.into(), - }), - elapsed_ms: start.elapsed().as_millis() as u64, - } -} - #[cfg(test)] mod tests { use super::*; @@ -360,6 +331,7 @@ use crate::outcome::commands::SearchOutcome; use crate::schema::config::{FieldsConfig, MdvsToml, SearchConfig, UpdateConfig}; use
crate::schema::shared::{ChunkingConfig, EmbeddingModelConfig, ScanConfig}; + use crate::step::StepError; use std::fs; fn unwrap_search(result: &CommandResult) -> &SearchOutcome { @@ -392,7 +364,7 @@ mod tests { } fn write_config(dir: &Path, model_name: &str) { - let config = MdvsToml { + let mut config = MdvsToml { scan: ScanConfig { glob: "**".into(), include_bare_files: false, @@ -490,7 +462,7 @@ mod tests { create_test_vault(tmp.path()); init_and_build(tmp.path()).await; - let config = MdvsToml::read(&tmp.path().join("mdvs.toml")).unwrap(); + let mut config = MdvsToml::read(&tmp.path().join("mdvs.toml")).unwrap(); let backend = Backend::parquet(tmp.path()); let embedding = config.embedding_model.as_ref().unwrap(); let model_config = ModelConfig::try_from(embedding).unwrap(); @@ -516,7 +488,7 @@ mod tests { create_test_vault(tmp.path()); init_and_build(tmp.path()).await; - let config = MdvsToml::read(&tmp.path().join("mdvs.toml")).unwrap(); + let mut config = MdvsToml::read(&tmp.path().join("mdvs.toml")).unwrap(); let backend = Backend::parquet(tmp.path()); let embedding = config.embedding_model.as_ref().unwrap(); let model_config = ModelConfig::try_from(embedding).unwrap(); @@ -631,4 +603,59 @@ mod tests { ); } } + + // --- Unit tests for validate_where_clause --- + + #[test] + fn validate_where_valid() { + assert!(validate_where_clause("draft = false").is_ok()); + } + + #[test] + fn validate_where_empty() { + assert!(validate_where_clause("").is_ok()); + } + + #[test] + fn validate_where_unmatched_single() { + assert!(validate_where_clause("name = 'O'Brien'").is_err()); + } + + #[test] + fn validate_where_unmatched_double() { + assert!(validate_where_clause("x = \"bad").is_err()); + } + + #[test] + fn validate_where_balanced_quotes() { + assert!(validate_where_clause("name = 'O''Brien'").is_ok()); + } + + // --- Unit tests for read_lines --- + + #[test] + fn read_lines_valid_range() { + let tmp = tempfile::tempdir().unwrap(); + let file = 
tmp.path().join("test.md"); + std::fs::write(&file, "line1\nline2\nline3\nline4\n").unwrap(); + let result = read_lines(&file, 2, 3); + assert_eq!(result, Some("line2\nline3".to_string())); + } + + #[test] + fn read_lines_out_of_bounds() { + let tmp = tempfile::tempdir().unwrap(); + let file = tmp.path().join("test.md"); + std::fs::write(&file, "line1\n").unwrap(); + assert!(read_lines(&file, 10, 20).is_none()); + } + + #[test] + fn read_lines_single_line() { + let tmp = tempfile::tempdir().unwrap(); + let file = tmp.path().join("test.md"); + std::fs::write(&file, "only\n").unwrap(); + let result = read_lines(&file, 1, 1); + assert_eq!(result, Some("only".to_string())); + } } diff --git a/src/cmd/update.rs b/src/cmd/update.rs index 709374d..fb0f68d 100644 --- a/src/cmd/update.rs +++ b/src/cmd/update.rs @@ -5,7 +5,7 @@ use crate::outcome::{InferOutcome, Outcome, ReadConfigOutcome, ScanOutcome, Writ use crate::output::{ChangedField, FieldChange, RemovedField}; use crate::schema::config::{MdvsToml, TomlField}; use crate::schema::shared::FieldTypeSerde; -use crate::step::{CommandResult, ErrorKind, StepEntry, StepError}; +use crate::step::{CommandResult, ErrorKind, StepEntry}; use std::collections::HashMap; use std::path::Path; use std::time::Instant; @@ -26,11 +26,11 @@ pub async fn run( // Pre-check: flag conflict if !reinfer.is_empty() && reinfer_all { - return fail_early( + return CommandResult::failed( steps, - start, ErrorKind::User, "cannot use --reinfer and --reinfer-all together".into(), + start, ); } @@ -54,7 +54,7 @@ pub async fn run( format!("mdvs.toml is invalid: {e} — fix the file or run 'mdvs init --force'"), config_start.elapsed().as_millis() as u64, )); - return fail_from_last_substep(&mut steps, start); + return CommandResult::failed_from_steps(std::mem::take(&mut steps), start); } }, Err(e) => { @@ -63,18 +63,18 @@ pub async fn run( e.to_string(), config_start.elapsed().as_millis() as u64, )); - return fail_from_last_substep(&mut steps, start); + 
return CommandResult::failed_from_steps(std::mem::take(&mut steps), start); } }; // Pre-check: reinfer field names exist for name in reinfer { if !config.fields.field.iter().any(|f| f.name == *name) { - return fail_early( + return CommandResult::failed( std::mem::take(&mut steps), - start, ErrorKind::User, format!("field '{name}' is not in mdvs.toml"), + start, ); } } @@ -98,7 +98,7 @@ e.to_string(), scan_start.elapsed().as_millis() as u64, )); - return fail_from_last_substep(&mut steps, start); + return CommandResult::failed_from_steps(std::mem::take(&mut steps), start); } }; @@ -238,14 +238,12 @@ e.to_string(), write_start.elapsed().as_millis() as u64, )); - return CommandResult { + return CommandResult::failed( steps, - result: Err(StepError { - kind: ErrorKind::Application, - message: "failed to write config".into(), - }), - elapsed_ms: start.elapsed().as_millis() as u64, - }; + ErrorKind::Application, + "failed to write config".into(), + start, + ); } } } @@ -264,40 +262,11 @@ } } -fn fail_early( - steps: Vec<StepEntry>, - start: Instant, - kind: ErrorKind, - message: String, -) -> CommandResult { - CommandResult { - steps, - result: Err(StepError { kind, message }), - elapsed_ms: start.elapsed().as_millis() as u64, - } -} - -fn fail_from_last_substep(steps: &mut Vec<StepEntry>, start: Instant) -> CommandResult { - let msg = match steps.last() { - Some(StepEntry::Failed(f)) => f.message.clone(), - _ => "step failed".into(), - }; - CommandResult { - steps: std::mem::take(steps), - result: Err(StepError { - kind: ErrorKind::Application, - message: msg, - }), - elapsed_ms: start.elapsed().as_millis() as u64, - } -} - #[cfg(test)] mod tests { use super::*; use crate::outcome::commands::UpdateOutcome; - use crate::output::ViolationKind; - use crate::schema::config::{FieldsConfig, MdvsToml, UpdateConfig}; + use crate::schema::config::MdvsToml; use crate::schema::shared::{FieldTypeSerde, ScanConfig}; use std::fs; diff --git
a/src/schema/config.rs b/src/schema/config.rs index 2f4faca..7c10e45 100644 --- a/src/schema/config.rs +++ b/src/schema/config.rs @@ -225,7 +225,8 @@ impl MdvsToml { /// Serialize this config to TOML and write it to disk. /// Complex field types are post-processed into inline tables for readability. #[instrument(name = "write_config", skip_all, level = "debug")] - pub fn write(&self, path: &Path) -> anyhow::Result<()> { + pub fn write(&mut self, path: &Path) -> anyhow::Result<()> { + self.fields.field.sort_by(|a, b| a.name.cmp(&b.name)); let content = toml::to_string(self)?; let content = inline_field_types(&content)?; fs::write(path, content)?; @@ -365,7 +366,7 @@ mod tests { #[test] fn mdvs_toml_roundtrip() { - let toml_doc = MdvsToml { + let mut toml_doc = MdvsToml { scan: ScanConfig { glob: "**".into(), include_bare_files: false, @@ -498,7 +499,7 @@ default_limit = 10 #[test] fn empty_fields_list_roundtrip() { - let doc = full_toml(vec![]); + let mut doc = full_toml(vec![]); let toml_str = toml::to_string(&doc).unwrap(); let parsed: MdvsToml = toml::from_str(&toml_str).unwrap(); assert_eq!(parsed.fields.field.len(), 0); @@ -540,7 +541,7 @@ default_limit = 10 include_bare_files: false, skip_gitignore: false, }; - let toml_doc = MdvsToml::from_inferred(&schema, scan); + let mut toml_doc = MdvsToml::from_inferred(&schema, scan); assert_eq!(toml_doc.scan.glob, "**"); assert!(!toml_doc.scan.include_bare_files); @@ -573,7 +574,7 @@ default_limit = 10 include_bare_files: true, skip_gitignore: false, }; - let toml_doc = MdvsToml::from_inferred(&schema, scan); + let mut toml_doc = MdvsToml::from_inferred(&schema, scan); assert_eq!(toml_doc.scan.glob, "docs/**"); assert!(toml_doc.scan.include_bare_files); assert!(toml_doc.embedding_model.is_none()); @@ -588,7 +589,7 @@ default_limit = 10 include_bare_files: false, skip_gitignore: false, }; - let toml_doc = MdvsToml::from_inferred(&schema, scan); + let mut toml_doc = MdvsToml::from_inferred(&schema, scan); 
assert!(toml_doc.embedding_model.is_none()); assert!(toml_doc.chunking.is_none()); assert!(toml_doc.build.is_some()); @@ -613,7 +614,7 @@ default_limit = 10 include_bare_files: false, skip_gitignore: false, }; - let toml_doc = MdvsToml::from_inferred(&schema, scan); + let mut toml_doc = MdvsToml::from_inferred(&schema, scan); let dir = tempfile::tempdir().unwrap(); let path = dir.path().join("mdvs.toml"); @@ -625,7 +626,7 @@ default_limit = 10 #[test] fn write_uses_inline_tables_for_type() { - let doc = full_toml(vec![ + let mut doc = full_toml(vec![ TomlField { name: "tags".into(), field_type: FieldTypeSerde::Array { @@ -675,7 +676,7 @@ default_limit = 10 #[test] fn validation_only_roundtrip() { - let doc = MdvsToml { + let mut doc = MdvsToml { scan: ScanConfig { glob: "**".into(), include_bare_files: false, From eb0440f5bcb6b846fe8a369afd9588166ffc9a5d Mon Sep 17 00:00:00 2001 From: edoch Date: Mon, 23 Mar 2026 16:24:10 +0100 Subject: [PATCH 18/35] docs: close TODOs 0119, 0122, 0133, 0134, 0138, 0139; add 0138, 0139; update 0100 Step tree architecture complete: - 0119 (umbrella), 0122 (outcome enums): done - 0133 (macro): subsumed by 0139 (shared fail helpers) - 0134 (post-migration cleanup): done - 0138 (untagged JSON): done - 0139 (unify fail helpers): done - 0100 (text output redesign): updated with current state Co-Authored-By: Claude --- docs/spec/todos/TODO-0100.md | 64 +++++++++++++++++------------ docs/spec/todos/TODO-0119.md | 3 +- docs/spec/todos/TODO-0122.md | 3 +- docs/spec/todos/TODO-0133.md | 48 ++++------------------ docs/spec/todos/TODO-0134.md | 3 +- docs/spec/todos/TODO-0138.md | 71 ++++++++++++++++++++++++++++++++ docs/spec/todos/TODO-0139.md | 78 ++++++++++++++++++++++++++++++++++++ docs/spec/todos/index.md | 10 +++-- 8 files changed, 207 insertions(+), 73 deletions(-) create mode 100644 docs/spec/todos/TODO-0138.md create mode 100644 docs/spec/todos/TODO-0139.md diff --git a/docs/spec/todos/TODO-0100.md b/docs/spec/todos/TODO-0100.md index 
33efd53..4297966 100644 --- a/docs/spec/todos/TODO-0100.md +++ b/docs/spec/todos/TODO-0100.md @@ -12,52 +12,66 @@ blocks: [] ## Summary -The current text output tables across all commands are chaotic and inconsistent. Tables have missing headers, empty columns, no standard structure, and generally don't make sense. Redesign the text output format for every command with a coherent, uniform style. +The text output tables across all commands are inconsistent. Tables lack headers, have empty columns, and each command's layout was implemented independently. Redesign the text output format for every command with a coherent, uniform style. -## Problems +## Current state (post TODO-0119/0137) -- **No headers**: tables lack column headers, making it hard to understand what each column means -- **Empty columns**: some tables have columns that are frequently blank (e.g., hints column in init/update when no special characters) -- **Inconsistent structure**: each command's output was implemented independently with different table layouts, column counts, and conventions -- **Compact vs verbose gap**: the relationship between compact and verbose output varies wildly across commands +The output architecture is now clean: +- `CommandResult` with flat `Vec<StepEntry>` + `Result<Outcome, StepError>` +- Each outcome struct implements `Render` → `Vec<Block>` (Block::Line, Block::Table, Block::Section) +- `format_text()` in `render.rs` consumes blocks and produces terminal output via `tabled` +- Compact = command outcome only; verbose = step lines + command outcome +- Errors force verbose + +**What's settled:** +- Compact/verbose dispatch model (no more CompactOutcome) +- Step lines show timing: `"Read config: example_kb/mdvs.toml (5ms)"` +- JSON output is clean (`#[serde(untagged)]`, flat steps array) + +**What's still wrong — the Render impls themselves:** +- No table headers — columns are unlabeled +- Empty columns (e.g., hints column in init when no special characters) +- Inconsistent table structure across commands +-
No column width strategy (tables don't adapt to terminal width proportionally) +- Record-style detail rows are hard to read ## Scope -All 7 commands need their text output redesigned. After TODO-0099 and TODO-0110, init is schema-only (no build stats), and check/build/search have optional auto-step preamble lines (rendered by the recursive output architecture from TODO-0110). +All 7 commands' Render impls need redesigning: -- `init` — discovery table only, no build section (compact + verbose) -- `check` — optional auto-update summary line + violations table, new fields table (compact + verbose) -- `update` — added/changed/removed tables (compact + verbose) -- `build` — optional auto-update summary line + classification + embedding summary (compact + verbose) -- `search` — optional auto-update/auto-build summary lines + results table (compact + verbose) -- `info` — config + index status (compact + verbose) -- `clean` — deletion summary (compact + verbose) +| Command | Current output | Issues | +|---------|---------------|--------| +| `init` | Per-field record tables with detail rows | Too verbose for compact, detail rows hard to scan | +| `check` | Violation record tables + new field tables | No headers, kind column is raw enum name | +| `update` | Added/changed/removed record tables | Change tables have 4 columns, hard to read | +| `build` | Embedded/unchanged/removed tables | Detail rows list every file — noisy | +| `search` | Per-hit record tables with chunk text | Good structure, but no headers | +| `info` | Index status table + per-field record tables | Two different table styles mixed | +| `clean` | Two plain lines | Fine as-is | ## Approach 1. Define a standard output style guide: header conventions, column alignment, empty column handling, summary line format -2. Design each command's output on paper first (both compact and verbose) -3. Implement across all commands in a single pass for consistency +2. Design each command's output on paper first +3. 
Implement across all commands in a single pass 4. Update book pages to match ## Column width strategy -Use percentage-based column widths relative to terminal width. `tabled`'s `Width::list()` accepts absolute widths, so compute them at render time: +Use percentage-based column widths relative to terminal width via `tabled`'s `Width::list()`: ```rust let total = terminal_size(); -// e.g., 20%, 50%, 30% let widths = [total * 20 / 100, total * 50 / 100, total * 30 / 100]; table.with(Width::list(widths)); ``` -This keeps the table dynamic (adapts to terminal size) while giving each column a consistent proportion. Add a helper like `width_percent(total, &[20, 50, 30])` in `table.rs` so all commands use the same mechanism. - -Each command defines its own column proportions, but the helper and rendering style are shared. +Add a helper like `width_percent(total, &[20, 50, 30])` in render.rs so all commands use the same mechanism. ## Files -- `src/cmd/*.rs` — all command output formatting -- `src/table.rs` — table style helpers, add `width_percent()` helper -- `src/output.rs` — shared output structs -- `book/src/commands/*.md` — book pages with output examples +- `src/outcome/commands/*.rs` — all 7 command Render impls +- `src/outcome/*.rs` — leaf step Render impls (minor — step lines are already fine) +- `src/render.rs` — add width helpers, possibly update `format_text()` +- `src/block.rs` — possibly extend Block::Table with header support +- `book/src/commands/*.md` — update output examples diff --git a/docs/spec/todos/TODO-0119.md b/docs/spec/todos/TODO-0119.md index 7e2ecb6..a6ec53e 100644 --- a/docs/spec/todos/TODO-0119.md +++ b/docs/spec/todos/TODO-0119.md @@ -1,9 +1,10 @@ --- id: 119 title: "Unified Step tree architecture — replace pipeline/command output split" -status: todo +status: done priority: high created: 2026-03-18 +completed: 2026-03-23 depends_on: [] blocks: [100, 101] supersedes: [110] diff --git a/docs/spec/todos/TODO-0122.md 
b/docs/spec/todos/TODO-0122.md index ab20fc4..f66f862 100644 --- a/docs/spec/todos/TODO-0122.md +++ b/docs/spec/todos/TODO-0122.md @@ -1,9 +1,10 @@ --- id: 122 title: "Step tree: Outcome enums and all outcome structs" -status: todo +status: done priority: high created: 2026-03-19 +completed: 2026-03-23 depends_on: [120, 121] blocks: [124, 125, 126, 127, 128, 129, 130, 131] --- diff --git a/docs/spec/todos/TODO-0133.md b/docs/spec/todos/TODO-0133.md index 9a97f23..9bb7c54 100644 --- a/docs/spec/todos/TODO-0133.md +++ b/docs/spec/todos/TODO-0133.md @@ -1,55 +1,21 @@ --- id: 133 title: "Macro for step pipeline boilerplate (early-return pattern)" -status: todo +status: done priority: low created: 2026-03-19 +completed: 2026-03-23 depends_on: [131] blocks: [] +subsumed_by: 139 --- # TODO-0133: Macro for step pipeline boilerplate (early-return pattern) -## Summary +## Original Scope -Create a declarative macro to reduce the repetitive early-return pattern in command `run()` functions. Deferred until the manual pattern from TODO-0119 is proven stable. +Create a declarative macro to reduce the repetitive early-return pattern in command `run()` functions. 
-## Details +## Resolution -The pattern that repeats ~10 times in `build::run()`: - -```rust -let (result, data) = run_something(args); -substeps.push(Step { substeps: vec![], outcome: convert(result) }); -let data = match data { - Some(d) => d, - None => { - substeps.push(Step { substeps: vec![], outcome: StepOutcome::Skipped }); // remaining1 - substeps.push(Step { substeps: vec![], outcome: StepOutcome::Skipped }); // remaining2 - return Step { - substeps, - outcome: StepOutcome::Complete { - result: Err(StepError { kind: ErrorKind::User, message: "...".into() }), - elapsed_ms: start.elapsed().as_millis() as u64, - }, - }; - } -}; -``` - -A macro could reduce each step to one line: - -```rust -let data = run_step!(substeps, start, run_something(args), ["remaining1", "remaining2"]); -``` - -### Prerequisites - -- All commands fully converted to Step tree (TODO-0131 complete) -- The pattern is identical across all commands (confirm during implementation) -- The Skipped step count per early-return is predictable - -### Not in scope - -- Async step handling (may need different macro or helper function) -- Macro for the `to_compact()` match statement (separate concern) +Subsumed by [TODO-0139](TODO-0139.md). Analysis showed a macro is not justified — the 39 pattern instances vary too much (different error kinds, data-dependent outcome construction, nested matches, different return types). A shared fail helper on `CommandResult` is the better approach. 
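The error-extraction strategy at the heart of the preferred fail helper can be sketched in isolation with stand-in types (the real `StepEntry` and `CommandResult` live in `src/step.rs`; this is a simplified illustration, not the project code):

```rust
// Stand-in type; the real StepEntry in src/step.rs carries more data.
enum StepEntry {
    Complete(String),
    Failed { message: String },
}

/// Reverse-search the step list for the most recent failure message,
/// the strategy used by the build/search helpers
/// (`.iter().rev().find_map()`), falling back to a default message.
fn last_failure_message(steps: &[StepEntry]) -> String {
    steps
        .iter()
        .rev()
        .find_map(|s| match s {
            StepEntry::Failed { message } => Some(message.clone()),
            _ => None,
        })
        .unwrap_or_else(|| "step failed".into())
}

fn main() {
    let steps = vec![
        StepEntry::Complete("read config".into()),
        StepEntry::Failed { message: "scan failed".into() },
    ];
    assert_eq!(last_failure_message(&steps), "scan failed");
    assert_eq!(last_failure_message(&[]), "step failed");
    println!("{}", last_failure_message(&steps));
}
```

Reverse search matters when a command records further (e.g. skipped or cleanup) entries after the failing one: `.last()` would miss the failure, the reverse `find_map` does not.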
diff --git a/docs/spec/todos/TODO-0134.md b/docs/spec/todos/TODO-0134.md index f50ab25..4655470 100644 --- a/docs/spec/todos/TODO-0134.md +++ b/docs/spec/todos/TODO-0134.md @@ -1,9 +1,10 @@ --- id: 134 title: Step tree post-migration cleanup -status: todo +status: done priority: medium created: 2026-03-20 +completed: 2026-03-23 depends_on: [131] blocks: [] --- diff --git a/docs/spec/todos/TODO-0138.md b/docs/spec/todos/TODO-0138.md new file mode 100644 index 0000000..a437dee --- /dev/null +++ b/docs/spec/todos/TODO-0138.md @@ -0,0 +1,71 @@ +--- +id: 138 +title: Remove enum variant wrapper from JSON output +status: done +priority: high +created: 2026-03-23 +completed: 2026-03-23 +depends_on: [137] +blocks: [] +--- + +# TODO-0138: Remove enum variant wrapper from JSON output + +## Summary + +The `Outcome` enum serializes with variant names as JSON keys: `{ "Check": { "files_checked": 43, ... } }`. The consumer already knows which command they ran — the wrapper is noise. Serialize the inner struct directly: `{ "files_checked": 43, ... }`. + +## Problem + +Current verbose JSON: +```json +{ + "steps": [ + { "status": "complete", "elapsed_ms": 5, "outcome": { "ReadConfig": { "config_path": "..." } } }, + { "status": "complete", "elapsed_ms": 3, "outcome": { "Scan": { "files_found": 43, "glob": "**" } } } + ], + "result": { "Check": { "files_checked": 43, "violations": [], "new_fields": [] } }, + "elapsed_ms": 97 +} +``` + +Desired: +```json +{ + "steps": [ + { "status": "complete", "elapsed_ms": 5, "outcome": { "config_path": "..." } }, + { "status": "complete", "elapsed_ms": 3, "outcome": { "files_found": 43, "glob": "**" } } + ], + "result": { "files_checked": 43, "violations": [], "new_fields": [] }, + "elapsed_ms": 97 +} +``` + +Same for compact JSON — currently `{ "Check": { ... } }`, should be `{ "files_checked": 43, ... }`. + +## Approach + +Add `#[serde(untagged)]` to the `Outcome` enum in `src/outcome/mod.rs`. 
This tells serde to serialize the inner struct directly without the variant name wrapper. + +## Impact + +- Changes verbose JSON: step outcomes lose variant name keys +- Changes compact JSON: result loses variant name key +- Consumers parsing `json["result"]["Check"]` must change to `json["result"]` +- Step type is no longer identifiable from JSON alone (but each struct has distinct fields) + +## Design question + +Without the variant name, a consumer can't tell which step type produced an outcome just from the JSON. Is that acceptable? Each step's outcome struct has unique fields (e.g., `files_found` + `glob` = Scan, `config_path` = ReadConfig), but it's implicit rather than explicit. + +Alternative: keep a `"type"` field alongside the flattened fields: +```json +{ "status": "complete", "elapsed_ms": 5, "type": "ReadConfig", "config_path": "..." } +``` + +This would require custom serialization on `StepEntry` to inject the type name. + +## Files to modify + +- `src/outcome/mod.rs` — add `#[serde(untagged)]` to `Outcome` enum +- Tests that assert JSON structure with variant keys diff --git a/docs/spec/todos/TODO-0139.md b/docs/spec/todos/TODO-0139.md new file mode 100644 index 0000000..91243fa --- /dev/null +++ b/docs/spec/todos/TODO-0139.md @@ -0,0 +1,78 @@ +--- +id: 139 +title: Unify fail helpers across commands +status: done +priority: low +created: 2026-03-23 +completed: 2026-03-23 +depends_on: [137] +blocks: [] +--- + +# TODO-0139: Unify fail helpers across commands + +## Summary + +Each command defines its own fail helper(s) for constructing a failed `CommandResult` from the steps list. These are copy-pasted with minor variations. Extract a shared helper into `step.rs`. 
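The TODO-0138 design question above (can a consumer still identify the step type once the variant key is gone?) can be illustrated with a hypothetical consumer-side check. `infer_step_type` is invented for illustration; only the field names come from the JSON examples in TODO-0138:

```rust
/// Hypothetical consumer heuristic: with `#[serde(untagged)]` outcomes,
/// the step type must be inferred from which JSON fields are present.
/// Field names taken from the TODO-0138 examples; the mapping is not
/// part of the project, it only demonstrates that the fields are
/// distinctive enough to recover the type.
fn infer_step_type(keys: &[&str]) -> Option<&'static str> {
    if keys.contains(&"config_path") {
        Some("ReadConfig")
    } else if keys.contains(&"files_found") && keys.contains(&"glob") {
        Some("Scan")
    } else {
        None
    }
}

fn main() {
    assert_eq!(infer_step_type(&["config_path"]), Some("ReadConfig"));
    assert_eq!(infer_step_type(&["files_found", "glob"]), Some("Scan"));
    assert_eq!(infer_step_type(&["elapsed_ms"]), None);
    println!("ok");
}
```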
+ +## Problem + +Current state — 5 separate implementations across commands: + +| File | Helpers | Error extraction | +|------|---------|-----------------| +| `init.rs` | `fail_early`, `fail_from_last_substep` | `.last()` | +| `update.rs` | `fail_early`, `fail_from_last_substep` | `.last()` (identical to init) | +| `build.rs` | `fail_from_last` | `.iter().rev().find_map()` | +| `search.rs` | `fail_from_last`, `fail_msg` | `.iter().rev().find_map()` | +| `check.rs`, `info.rs`, `clean.rs` | none (inline) | inline construction | + +The init/update helpers are byte-for-byte identical. The build/search helpers are nearly identical but use a better error extraction strategy (reverse search vs last). + +## Solution + +Add to `step.rs`: + +```rust +impl CommandResult { + /// Create a failed CommandResult by extracting the error message from the last + /// failed step entry. If no failed step is found, uses a default message. + pub fn failed_from_steps(steps: Vec<StepEntry>, start: Instant) -> Self { + let msg = steps.iter().rev().find_map(|s| match s { + StepEntry::Failed(f) => Some(f.message.clone()), + _ => None, + }).unwrap_or_else(|| "step failed".into()); + Self { + steps, + result: Err(StepError { kind: ErrorKind::Application, message: msg }), + elapsed_ms: start.elapsed().as_millis() as u64, + } + } + + /// Create a failed CommandResult with a specific error message.
+ pub fn failed(steps: Vec<StepEntry>, kind: ErrorKind, message: String, start: Instant) -> Self { + Self { + steps, + result: Err(StepError { kind, message }), + elapsed_ms: start.elapsed().as_millis() as u64, + } + } +} +``` + +Then replace per-command helpers: +- `fail_from_last(&mut steps, start)` → `CommandResult::failed_from_steps(std::mem::take(&mut steps), start)` +- `fail_early(steps, start, kind, msg)` → `CommandResult::failed(steps, kind, msg, start)` +- `fail_msg(&mut steps, start, kind, msg)` → `CommandResult::failed(std::mem::take(&mut steps), kind, msg.into(), start)` +- Inline constructions → use the appropriate method + +## Files to modify + +- `src/step.rs` — add `failed_from_steps` and `failed` constructors on `CommandResult` +- `src/cmd/init.rs` — delete `fail_early`, `fail_from_last_substep`, use shared methods +- `src/cmd/update.rs` — same +- `src/cmd/build.rs` — delete `fail_from_last`, use shared method +- `src/cmd/search.rs` — delete `fail_from_last`, `fail_msg`, use shared methods +- `src/cmd/check.rs` — replace inline constructions +- `src/cmd/info.rs` — replace inline construction +- `src/cmd/clean.rs` — replace inline construction (if any) diff --git a/docs/spec/todos/index.md b/docs/spec/todos/index.md index 04297e7..55b50a3 100644 --- a/docs/spec/todos/index.md +++ b/docs/spec/todos/index.md @@ -120,10 +120,10 @@ | [0116](TODO-0116.md) | Trim DataFusion default features to reduce binary size | todo | medium | 2026-03-17 | | [0117](TODO-0117.md) | Fix null values skipping Disallowed and NullNotAllowed checks | done | high | 2026-03-17 | | [0118](TODO-0118.md) | Rework README and book intro to show directory-aware schema | todo | high | 2026-03-17 | -| [0119](TODO-0119.md) | Unified Step tree architecture — replace pipeline/command output split | todo | high | 2026-03-18 | +| [0119](TODO-0119.md) | Unified Step tree architecture — replace pipeline/command output split | done | high | 2026-03-18 | | [0120](TODO-0120.md) | Step tree: core types —
Step, StepOutcome, StepError | done | high | 2026-03-19 | | [0121](TODO-0121.md) | Step tree: Block enum and Render trait | done | high | 2026-03-19 | -| [0122](TODO-0122.md) | Step tree: Outcome enums and all outcome structs | todo | high | 2026-03-19 | +| [0122](TODO-0122.md) | Step tree: Outcome enums and all outcome structs | done | high | 2026-03-19 | | [0123](TODO-0123.md) | Step tree: shared formatters (format_text, format_markdown) | done | high | 2026-03-19 | | [0124](TODO-0124.md) | Step tree: custom Serialize on Step and StepOutcome | done | high | 2026-03-19 | | [0125](TODO-0125.md) | Step tree: convert clean command | done | high | 2026-03-19 | @@ -134,8 +134,10 @@ | [0130](TODO-0130.md) | Step tree: convert build + search commands | done | high | 2026-03-19 | | [0131](TODO-0131.md) | Step tree: delete old pipeline, update main.rs, simplify output.rs | done | high | 2026-03-19 | | [0132](TODO-0132.md) | Macro for compact struct generation (crabtime) | done (subsumed by 0137) | low | 2026-03-19 | -| [0133](TODO-0133.md) | Macro for step pipeline boilerplate (early-return pattern) | todo | low | 2026-03-19 | -| [0134](TODO-0134.md) | Step tree post-migration cleanup | todo | medium | 2026-03-20 | +| [0133](TODO-0133.md) | Macro for step pipeline boilerplate (early-return pattern) | done (subsumed by 0139) | low | 2026-03-19 | +| [0134](TODO-0134.md) | Step tree post-migration cleanup | done | medium | 2026-03-20 | | [0135](TODO-0135.md) | Remove Skipped padding from Step tree error paths | done | medium | 2026-03-20 | | [0136](TODO-0136.md) | Inline auto-update and auto-build logic to eliminate redundant reads | done | high | 2026-03-20 | | [0137](TODO-0137.md) | Flatten Step tree into steps + result structure | done | high | 2026-03-21 | +| [0138](TODO-0138.md) | Remove enum variant wrapper from JSON output | done | high | 2026-03-23 | +| [0139](TODO-0139.md) | Unify fail helpers across commands | done | low | 2026-03-23 | From 
dc42fdeca66ab2252bf6ef9d626b8ff048de6bd6 Mon Sep 17 00:00:00 2001 From: edoch Date: Tue, 24 Mar 2026 23:44:43 +0100 Subject: [PATCH 19/35] refactor: use Panel for detail rows, fix column width proportions Replace ColumnSpan with Panel::horizontal for Record-style detail rows. Panel inserts spanning rows without inflating column width calculation. Apply fixed 40/30/30 column proportions via per-column Modify + Width::wrap/increase. Fixes inconsistent column widths across per-field tables. Co-Authored-By: Claude --- src/render.rs | 94 ++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 70 insertions(+), 24 deletions(-) diff --git a/src/render.rs b/src/render.rs index 04745ee..0270d35 100644 --- a/src/render.rs +++ b/src/render.rs @@ -5,8 +5,7 @@ //! format means writing one function here — no command code changes needed. use tabled::settings::{ - object::Cell, peaker::PriorityMax, span::ColumnSpan, style::Style, themes::BorderCorrection, - width::Width, Modify, + object::Column, style::Style, themes::BorderCorrection, width::Width, Modify, Panel, }; use crate::block::{Block, TableStyle}; @@ -34,39 +33,86 @@ fn format_text_block(block: &Block, out: &mut String, indent: usize) { rows, style, } => { - let mut builder = Builder::default(); - if let Some(hdrs) = headers { - builder.push_record(hdrs.iter().map(String::as_str)); - } - for row in rows { - builder.push_record(row.iter().map(String::as_str)); - } - let mut table = builder.build(); - - match style { + let mut table = match style { TableStyle::Compact => { + let mut builder = Builder::default(); + if let Some(hdrs) = headers { + builder.push_record(hdrs.iter().map(String::as_str)); + } + for row in rows { + builder.push_record(row.iter().map(String::as_str)); + } + let mut table = builder.build(); style_compact(&mut table); + table } TableStyle::Record { detail_rows } => { - let col_count = headers - .as_ref() - .map(|h| h.len()) - .or_else(|| rows.first().map(|r| r.len())) - .unwrap_or(1) as 
isize; + // Build table with only non-detail rows so detail text + // doesn't inflate column widths + let mut builder = Builder::default(); + if let Some(hdrs) = headers { + builder.push_record(hdrs.iter().map(String::as_str)); + } + for (i, row) in rows.iter().enumerate() { + if !detail_rows.contains(&i) { + builder.push_record(row.iter().map(String::as_str)); + } + } + let mut table = builder.build(); let w = term_width(); let header_offset = if headers.is_some() { 1 } else { 0 }; table.with(Style::rounded()); + + // Insert detail rows as Panels (spanning rows that don't + // affect column width calculation). + // Panel::horizontal(n, text) inserts a new row at position n. + // We insert after the data row that precedes each detail row. + let mut panels_inserted = 0; for &row_idx in detail_rows { - let actual_row = row_idx + header_offset; - table.with( - Modify::new(Cell::new(actual_row, 0)).with(ColumnSpan::new(col_count)), - ); + let detail_text = &rows[row_idx][0]; + if !detail_text.is_empty() { + // Count non-detail rows before this detail row + let data_rows_before = + (0..row_idx).filter(|i| !detail_rows.contains(i)).count(); + // Insert position: after the last data row + header + previously inserted panels + let pos = data_rows_before + header_offset + panels_inserted; + table.with(Panel::horizontal(pos, detail_text)); + panels_inserted += 1; + } } + table.with(BorderCorrection {}); - table.with(Width::increase(w)); - table.with(Width::wrap(w).priority(PriorityMax::left())); + // Fixed proportional column widths via per-column Modify + let col_count = headers + .as_ref() + .map(|h| h.len()) + .or_else(|| rows.first().map(|r| r.len())) + .unwrap_or(1); + // Overhead: borders (col_count + 1 chars) + padding (2 per col) + let overhead = (col_count + 1) + (col_count * 2); + let available = w.saturating_sub(overhead); + if col_count == 3 { + // 40% / 30% / 30% + let c0 = available * 40 / 100; + let c1 = available * 30 / 100; + let c2 = available - c0 - c1; + 
table.with(Modify::new(Column::from(0)).with(Width::wrap(c0))); + table.with(Modify::new(Column::from(1)).with(Width::wrap(c1))); + table.with(Modify::new(Column::from(2)).with(Width::wrap(c2))); + table.with(Modify::new(Column::from(0)).with(Width::increase(c0))); + table.with(Modify::new(Column::from(1)).with(Width::increase(c1))); + table.with(Modify::new(Column::from(2)).with(Width::increase(c2))); + } else { + // Fallback: distribute evenly + let each = available / col_count.max(1); + for i in 0..col_count { + table.with(Modify::new(Column::from(i)).with(Width::wrap(each))); + table.with(Modify::new(Column::from(i)).with(Width::increase(each))); + } + } + table } - } + }; let rendered = table.to_string(); if indent > 0 { From 81733d17f1352901f2dcfc7c089810ccb61609d4 Mon Sep 17 00:00:00 2001 From: edoch Date: Tue, 24 Mar 2026 23:45:01 +0100 Subject: [PATCH 20/35] =?UTF-8?q?refactor:=20redesign=20init=20output=20?= =?UTF-8?q?=E2=80=94=20skip=20default=20constraints,=20no=20headers?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Extract has_non_default_constraints() and field_detail_lines() helpers. Fields with only allowed: ** and no required/nullable skip the detail row entirely. Remove column headers (Field/Type/Files) — column meaning is obvious from context, consistent with info and other commands. Co-Authored-By: Claude --- src/outcome/commands/init.rs | 101 ++++++++++++++++++++++------------- 1 file changed, 64 insertions(+), 37 deletions(-) diff --git a/src/outcome/commands/init.rs b/src/outcome/commands/init.rs index ca11705..158c0d9 100644 --- a/src/outcome/commands/init.rs +++ b/src/outcome/commands/init.rs @@ -20,11 +20,50 @@ pub struct InitOutcome { pub dry_run: bool, } +/// Check if a field has only default constraints (allowed: ** only, not nullable, no required). 
+fn has_non_default_constraints(field: &DiscoveredField) -> bool { + let has_required = field.required.as_ref().is_some_and(|r| !r.is_empty()); + let has_non_default_allowed = field + .allowed + .as_ref() + .is_some_and(|a| !(a.len() == 1 && a[0] == "**")); + has_required || has_non_default_allowed || field.nullable || !field.hints.is_empty() +} + +/// Build detail lines for a field's constraints (only non-default values). +fn field_detail_lines(field: &DiscoveredField) -> Vec { + let mut lines = Vec::new(); + if let Some(ref req) = field.required { + if !req.is_empty() { + lines.push(" required:".to_string()); + for g in req { + lines.push(format!(" {g}")); + } + } + } + if let Some(ref allowed) = field.allowed { + // Skip if only "**" (the default) + if !(allowed.len() == 1 && allowed[0] == "**") { + lines.push(" allowed:".to_string()); + for g in allowed { + lines.push(format!(" {g}")); + } + } + } + if field.nullable { + lines.push(" nullable: true".to_string()); + } + if !field.hints.is_empty() { + lines.push(format!(" hints: {}", format_hints(&field.hints))); + } + lines +} + impl Render for InitOutcome { fn render(&self) -> Vec { let mut blocks = vec![]; - // One-liner + // Summary line let field_summary = if self.fields.is_empty() { "no fields found".to_string() } else { @@ -36,44 +75,32 @@ impl Render for InitOutcome { format_file_count(self.files_scanned) ))); - // Per-field record tables + // Per-field tables for field in &self.fields { - let mut detail_lines = Vec::new(); - if let Some(ref req) = field.required { - if !req.is_empty() { - detail_lines.push(" required:".to_string()); - for g in req { - detail_lines.push(format!(" - \"{g}\"")); - } - } - } - if let Some(ref allowed) = field.allowed { - detail_lines.push(" allowed:".to_string()); - for g in allowed { - detail_lines.push(format!(" - \"{g}\"")); - } - } - if field.nullable { - detail_lines.push(" nullable: true".to_string()); - } - if !field.hints.is_empty() { - detail_lines.push(format!(" 
hints: {}", format_hints(&field.hints))); - } + let data_row = vec![ + field.name.clone(), + field.field_type.clone(), + format!("{}/{}", field.files_found, field.total_files), + ]; - blocks.push(Block::Table { - headers: None, - rows: vec![ - vec![ - format!("\"{}\"", field.name), - field.field_type.clone(), - format!("{}/{}", field.files_found, field.total_files), - ], - vec![detail_lines.join("\n"), String::new(), String::new()], - ], - style: TableStyle::Record { - detail_rows: vec![1], - }, - }); + if has_non_default_constraints(field) { + let detail = field_detail_lines(field).join("\n"); + blocks.push(Block::Table { + headers: None, + rows: vec![data_row, vec![detail, String::new(), String::new()]], + style: TableStyle::Record { + detail_rows: vec![1], + }, + }); + } else { + blocks.push(Block::Table { + headers: None, + rows: vec![data_row], + style: TableStyle::Record { + detail_rows: vec![], + }, + }); + } } // Footer From a7f4b099e3f29eb66344c8e55ba6ae502e5ef71d Mon Sep 17 00:00:00 2001 From: edoch Date: Sat, 28 Mar 2026 22:12:14 +0100 Subject: [PATCH 21/35] refactor: add KeyValue table style, redesign init output MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add TableStyle::KeyValue { title } — two-column key-value table with item name on top border (LineText), horizontal separators between all rows (Style::modern), fixed 50/50 column widths (per-column Modify). Redesign init Render impl: each field rendered as a KeyValue table with explicit properties (type, files, nullable, required, allowed). No abbreviations, all values shown including defaults. 
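The fixed 50/50 split for the KeyValue style can be checked with the arithmetic alone. A sketch of the width computation (the constant 7 is the chrome for a two-column table, 3 vertical borders plus 2 padding characters per column, matching the comment in render.rs; the actual application goes through tabled's per-column Modify + Width):

```rust
/// Compute the column width for each of the two KeyValue columns:
/// subtract the fixed table chrome, then split the remainder in half.
/// On odd remainders the table undershoots the terminal by one column.
fn key_value_widths(term_width: usize) -> (usize, usize) {
    let available = term_width.saturating_sub(7); // 3 borders + 4 padding
    let half = available / 2;
    (half, half)
}

fn main() {
    assert_eq!(key_value_widths(80), (36, 36)); // 73 available, halved
    assert_eq!(key_value_widths(7), (0, 0));    // saturating_sub floor
    println!("{:?}", key_value_widths(80));
}
```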
Co-Authored-By: Claude --- src/block.rs | 8 +++- src/outcome/commands/init.rs | 91 ++++++++++++------------------------ src/render.rs | 37 ++++++++++++++- 3 files changed, 73 insertions(+), 63 deletions(-) diff --git a/src/block.rs b/src/block.rs index f4fbbb8..18911d4 100644 --- a/src/block.rs +++ b/src/block.rs @@ -34,12 +34,18 @@ pub enum Block { pub enum TableStyle { /// No internal horizontal separators. For compact summary tables. Compact, - /// Detail rows span all columns (via ColumnSpan in text formatter). + /// Detail rows span all columns (via Panel in text formatter). /// For per-item record tables with expandable detail. Record { /// Zero-based row indices that should span all columns as detail rows. detail_rows: Vec, }, + /// Two-column key-value table with item name on top border. + /// Horizontal separators between all rows. Fixed 50/50 column widths. + KeyValue { + /// Item name displayed on the top border. + title: String, + }, } /// Trait for types that render themselves as a sequence of blocks. diff --git a/src/outcome/commands/init.rs b/src/outcome/commands/init.rs index 158c0d9..97a02a1 100644 --- a/src/outcome/commands/init.rs +++ b/src/outcome/commands/init.rs @@ -20,45 +20,6 @@ pub struct InitOutcome { pub dry_run: bool, } -/// Check if a field has only default constraints (allowed: ** only, not nullable, no required). -fn has_non_default_constraints(field: &DiscoveredField) -> bool { - let has_required = field.required.as_ref().is_some_and(|r| !r.is_empty()); - let has_non_default_allowed = field - .allowed - .as_ref() - .is_some_and(|a| !(a.len() == 1 && a[0] == "**")); - has_required || has_non_default_allowed || field.nullable || !field.hints.is_empty() -} - -/// Build detail lines for a field's constraints (only non-default values). 
-fn field_detail_lines(field: &DiscoveredField) -> Vec<String> { - let mut lines = Vec::new(); - if let Some(ref req) = field.required { - if !req.is_empty() { - lines.push(" required:".to_string()); - for g in req { - lines.push(format!(" {g}")); - } - } - } - if let Some(ref allowed) = field.allowed { - // Skip if only "**" (the default) - if !(allowed.len() == 1 && allowed[0] == "**") { - lines.push(" allowed:".to_string()); - for g in allowed { - lines.push(format!(" {g}")); - } - } - } - if field.nullable { - lines.push(" nullable: true".to_string()); - } - if !field.hints.is_empty() { - lines.push(format!(" hints: {}", format_hints(&field.hints))); - } - lines -} - impl Render for InitOutcome { fn render(&self) -> Vec<Block> { let mut blocks = vec![]; @@ -75,32 +36,40 @@ impl Render for InitOutcome { format_file_count(self.files_scanned) ))); - // Per-field tables + // Per-field key-value tables for field in &self.fields { - let data_row = vec![ - field.name.clone(), - field.field_type.clone(), - format!("{}/{}", field.files_found, field.total_files), + let mut rows = vec![ + vec!["type".into(), field.field_type.clone()], + vec![ + "files".into(), + format!("{} out of {}", field.files_found, field.total_files), + ], + vec!["nullable".into(), field.nullable.to_string()], ]; - if has_non_default_constraints(field) { - let detail = field_detail_lines(field).join("\n"); - blocks.push(Block::Table { - headers: None, - rows: vec![data_row, vec![detail, String::new(), String::new()]], - style: TableStyle::Record { - detail_rows: vec![1], - }, - }); - } else { - blocks.push(Block::Table { - headers: None, - rows: vec![data_row], - style: TableStyle::Record { - detail_rows: vec![], - }, - }); + let req_val = match &field.required { + Some(r) if !r.is_empty() => r.join("\n"), + _ => "(none)".into(), + }; + rows.push(vec!["required".into(), req_val]); + + let allow_val = match &field.allowed { + Some(a) if !a.is_empty() => a.join("\n"), + _ => "**".into(), + }; +
rows.push(vec!["allowed".into(), allow_val]); + + if !field.hints.is_empty() { + rows.push(vec!["hints".into(), format_hints(&field.hints)]); } + + blocks.push(Block::Table { + headers: None, + rows, + style: TableStyle::KeyValue { + title: field.name.clone(), + }, + }); } // Footer diff --git a/src/render.rs b/src/render.rs index 0270d35..77846cb 100644 --- a/src/render.rs +++ b/src/render.rs @@ -5,7 +5,11 @@ //! format means writing one function here — no command code changes needed. use tabled::settings::{ - object::Column, style::Style, themes::BorderCorrection, width::Width, Modify, Panel, + object::{Column, Rows}, + style::{HorizontalLine, LineText, Style}, + themes::BorderCorrection, + width::Width, + Modify, Panel, }; use crate::block::{Block, TableStyle}; @@ -112,6 +116,37 @@ fn format_text_block(block: &Block, out: &mut String, indent: usize) { } table } + TableStyle::KeyValue { title } => { + let mut builder = Builder::default(); + for row in rows { + builder.push_record(row.iter().map(String::as_str)); + } + let mut table = builder.build(); + + let w = term_width(); + let available = w.saturating_sub(7); // 3 borders + 4 padding + let half = available / 2; + + // modern() has horizontal lines between ALL rows + table.with(Style::modern()); + + // Fixed 50/50 column widths + table.with( + Modify::new(Column::from(0)).with(Width::increase(half)), + ); + table.with(Modify::new(Column::from(0)).with(Width::wrap(half))); + table.with( + Modify::new(Column::from(1)).with(Width::increase(half)), + ); + table.with(Modify::new(Column::from(1)).with(Width::wrap(half))); + + // Item name on top border + table.with( + LineText::new(format!(" {title} "), Rows::first()).offset(1), + ); + + table + } }; let rendered = table.to_string(); From 6c1eb65b1b554b553a32883dfc817082072371a2 Mon Sep 17 00:00:00 2001 From: edoch Date: Sat, 28 Mar 2026 22:12:27 +0100 Subject: [PATCH 22/35] docs: update README output examples to key-value table format Update init, check, and 
search output examples to match the new KeyValue table design. Add scripts/test_kv_table.rs prototype. Co-Authored-By: Claude --- README.md | 56 +++++++++++++++++++++++------ scripts/test_kv_table.rs | 78 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 123 insertions(+), 11 deletions(-) create mode 100644 scripts/test_kv_table.rs diff --git a/README.md b/README.md index 8f514eb..ab18c0b 100644 --- a/README.md +++ b/README.md @@ -93,14 +93,34 @@ mdvs scans every file, extracts frontmatter, and infers which fields belong wher ``` Initialized 5 files — 7 field(s) - - "title" String 5/5 required everywhere - "draft" Boolean 2/5 only in blog/ - "tags" String[] 1/5 only in blog/ - "role" String 2/5 required in team/ - "email" String 1/5 only in team/ - "date" String 1/5 only in meetings/ - "attendees" String[] 1/5 only in meetings/ +┌ title ────────────┬───────────────────┐ +│ type │ String │ +├───────────────────┼───────────────────┤ +│ files │ 5 out of 5 │ +├───────────────────┼───────────────────┤ +│ required │ ** │ +├───────────────────┼───────────────────┤ +│ allowed │ ** │ +└───────────────────┴───────────────────┘ +┌ draft ────────────┬───────────────────┐ +│ type │ Boolean │ +├───────────────────┼───────────────────┤ +│ files │ 2 out of 5 │ +├───────────────────┼───────────────────┤ +│ required │ (none) │ +├───────────────────┼───────────────────┤ +│ allowed │ blog/** │ +└───────────────────┴───────────────────┘ +┌ role ─────────────┬───────────────────┐ +│ type │ String │ +├───────────────────┼───────────────────┤ +│ files │ 2 out of 5 │ +├───────────────────┼───────────────────┤ +│ required │ team/** │ +├───────────────────┼───────────────────┤ +│ allowed │ team/** │ +└───────────────────┴───────────────────┘ + ... ``` `draft` belongs in `blog/`. `role` belongs in `team/`. The directory structure is the schema. 
@@ -123,7 +143,12 @@ mdvs check notes/ ``` ``` -1 violation — "role" MissingRequired in team/charlie.md +Checked 7 files — 1 violation(s) +┌ role ─────────────┬───────────────────┐ +│ kind │ Missing required │ +├───────────────────┼───────────────────┤ +│ files │ team/charlie.md │ +└───────────────────┴───────────────────┘ ``` `charlie.md` is missing `role` — but `new-post.md` isn't flagged. mdvs knows `role` belongs in `team/`, not in `blog/`. @@ -135,8 +160,17 @@ mdvs search "weekly sync" notes/ ``` ``` -1 meetings/weekly.md 0.82 -2 team/alice.md 0.45 +Searched "weekly sync" — 2 hits +┌ #1 ───────────────┬───────────────────┐ +│ file │ meetings/weekly.md│ +├───────────────────┼───────────────────┤ +│ score │ 0.820 │ +└───────────────────┴───────────────────┘ +┌ #2 ───────────────┬───────────────────┐ +│ file │ team/alice.md │ +├───────────────────┼───────────────────┤ +│ score │ 0.450 │ +└───────────────────┴───────────────────┘ ``` Filter with SQL on frontmatter fields: diff --git a/scripts/test_kv_table.rs b/scripts/test_kv_table.rs new file mode 100644 index 0000000..c957fe7 --- /dev/null +++ b/scripts/test_kv_table.rs @@ -0,0 +1,78 @@ +#!/usr/bin/env -S cargo +nightly -Zscript +--- +[dependencies] +tabled = "0.20" +terminal_size = "0.4" +--- + +use tabled::builder::Builder; +use tabled::settings::{ + object::{Column, Rows}, + style::{HorizontalLine, LineText, Style}, + width::Width, + Modify, +}; + +fn term_width() -> usize { + terminal_size::terminal_size() + .map(|(terminal_size::Width(w), _)| w as usize) + .unwrap_or(80) +} + +fn main() { + let w = term_width(); + let available = w.saturating_sub(7); + let half = available / 2; + + // Field with constraints + let mut b = Builder::default(); + b.push_record(["type", "Array(String)"]); + b.push_record(["files", "9 out of 43"]); + b.push_record(["nullable", "true"]); + b.push_record(["required", "meetings/all-hands/**\nprojects/alpha/meetings/**\nprojects/beta/meetings/**"]); + b.push_record(["allowed", 
"meetings/**\nprojects/alpha/meetings/**\nprojects/beta/meetings/**"]); + + let sep = HorizontalLine::inherit(Style::modern()); + let mut table = b.build(); + table.with( + Style::rounded().horizontals([ + (1, sep.clone()), + (2, sep.clone()), + (3, sep.clone()), + (4, sep.clone()), + ]) + ); + table.with(Modify::new(Column::from(0)).with(Width::increase(half))); + table.with(Modify::new(Column::from(0)).with(Width::wrap(half))); + table.with(Modify::new(Column::from(1)).with(Width::increase(half))); + table.with(Modify::new(Column::from(1)).with(Width::wrap(half))); + table.with(LineText::new(" action_items ", Rows::first()).offset(1)); + + println!("{table}"); + println!(); + + // Simple field + let mut b2 = Builder::default(); + b2.push_record(["type", "String"]); + b2.push_record(["files", "43 out of 43"]); + b2.push_record(["nullable", "false"]); + b2.push_record(["required", "(none)"]); + b2.push_record(["allowed", "**"]); + + let mut table2 = b2.build(); + table2.with( + Style::rounded().horizontals([ + (1, sep.clone()), + (2, sep.clone()), + (3, sep.clone()), + (4, sep.clone()), + ]) + ); + table2.with(Modify::new(Column::from(0)).with(Width::increase(half))); + table2.with(Modify::new(Column::from(0)).with(Width::wrap(half))); + table2.with(Modify::new(Column::from(1)).with(Width::increase(half))); + table2.with(Modify::new(Column::from(1)).with(Width::wrap(half))); + table2.with(LineText::new(" title ", Rows::first()).offset(1)); + + println!("{table2}"); +} From 8763b4b2d9c4ff46b712dc056c43c0cbca62600d Mon Sep 17 00:00:00 2001 From: edoch Date: Sat, 28 Mar 2026 23:07:12 +0100 Subject: [PATCH 23/35] refactor: redesign info output, tweak KeyValue rendering - Info: KeyValue tables for index metadata + per-field details, with "Index:" and "N fields:" section labels - KeyValue: 1/3 + 2/3 column widths, blank line between tables, skip LineText when title is empty - Init: blank line before first field table Co-Authored-By: Claude --- 
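The "1/3 + 2/3 column widths" above boil down to a small width calculation; a minimal standalone sketch (the helper name `kv_column_widths` is illustrative; the constant 7 matches the "3 borders + 4 padding" comment in render.rs):

```rust
// Split the terminal width into key (1/3) and value (2/3) columns,
// reserving 7 chars for 3 vertical borders + 4 spaces of cell padding.
fn kv_column_widths(term_width: usize) -> (usize, usize) {
    let available = term_width.saturating_sub(7);
    let col0 = available / 3;     // key column
    let col1 = available - col0;  // value column takes the remainder
    (col0, col1)
}

fn main() {
    // 80-column terminal: 73 usable, 24 for keys, 49 for values.
    assert_eq!(kv_column_widths(80), (24, 49));
    // Degenerate terminals never underflow thanks to saturating_sub.
    assert_eq!(kv_column_widths(5), (0, 0));
}
```

Giving the value column the remainder (rather than `available / 3 * 2`) means no usable column is lost to integer division.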
src/outcome/commands/info.rs | 76 +++++++++++++++--------------- src/outcome/commands/init.rs | 3 ++ src/render.rs | 33 ++++++++-------- 3 files changed, 65 insertions(+), 47 deletions(-) diff --git a/src/outcome/commands/info.rs b/src/outcome/commands/info.rs index 448efbd..a53cfd9 100644 --- a/src/outcome/commands/info.rs +++ b/src/outcome/commands/info.rs @@ -25,6 +25,7 @@ impl Render for InfoOutcome { fn render(&self) -> Vec<Block> { let mut blocks = vec![]; + // Summary line let one_liner = match &self.index { Some(idx) => format!( "{} files, {} fields, {} chunks", @@ -35,58 +36,71 @@ impl Render for InfoOutcome { None => format!("{} files, {} fields", self.files_on_disk, self.fields.len()), }; blocks.push(Block::Line(one_liner)); + blocks.push(Block::Line(String::new())); + // Index metadata if let Some(idx) = &self.index { let rev = idx.revision.as_deref().unwrap_or("none"); let rows = vec![ - vec!["model:".into(), idx.model.clone()], - vec!["revision:".into(), rev.to_string()], - vec!["chunk size:".into(), idx.chunk_size.to_string()], - vec!["built:".into(), idx.built_at.clone()], - vec!["config:".into(), idx.config_status.clone()], + vec!["model".into(), idx.model.clone()], + vec!["revision".into(), rev.to_string()], + vec!["chunk size".into(), idx.chunk_size.to_string()], + vec!["built".into(), idx.built_at.clone()], + vec!["config".into(), idx.config_status.clone()], vec![ - "files:".into(), - format!("{}/{}", idx.files_indexed, idx.files_on_disk), + "files".into(), + format!("{} out of {}", idx.files_indexed, idx.files_on_disk), ], ]; + blocks.push(Block::Line("Index:".into())); blocks.push(Block::Table { headers: None, rows, - style: TableStyle::Compact, + style: TableStyle::KeyValue { + title: String::new(), + }, }); } + // Per-field KeyValue tables + if !self.fields.is_empty() { + blocks.push(Block::Line(format!("{} fields:", self.fields.len()))); + } for f in &self.fields { - let count_str = match (f.count, f.total_files) { - (Some(c), Some(t)) =>
format!("{c}/{t}"), + let files_str = match (f.count, f.total_files) { + (Some(c), Some(t)) => format!("{c} out of {t}"), _ => String::new(), }; - let mut detail_lines = Vec::new(); - if !f.required.is_empty() { - detail_lines.push(" required:".to_string()); - for g in &f.required { - detail_lines.push(format!(" - \"{g}\"")); - } - } - detail_lines.push(" allowed:".to_string()); - for g in &f.allowed { - detail_lines.push(format!(" - \"{g}\"")); - } - if f.nullable { - detail_lines.push(" nullable: true".to_string()); - } + + let mut rows = vec![ + vec!["type".into(), f.field_type.clone()], + vec!["files".into(), files_str], + vec!["nullable".into(), f.nullable.to_string()], + ]; + + let req_val = if f.required.is_empty() { + "(none)".into() + } else { + f.required.join("\n") + }; + rows.push(vec!["required".into(), req_val]); + + let allow_val = if f.allowed.is_empty() { + "**".into() + } else { + f.allowed.join("\n") + }; + rows.push(vec!["allowed".into(), allow_val]); + if !f.hints.is_empty() { - detail_lines.push(format!(" hints: {}", format_hints(&f.hints))); + rows.push(vec!["hints".into(), format_hints(&f.hints)]); } blocks.push(Block::Table { headers: None, - rows: vec![ - vec![format!("\"{}\"", f.name), f.field_type.clone(), count_str], - vec![detail_lines.join("\n"), String::new(), String::new()], - ], - style: TableStyle::Record { - detail_rows: vec![1], + rows, + style: TableStyle::KeyValue { + title: f.name.clone(), }, }); } diff --git a/src/outcome/commands/init.rs b/src/outcome/commands/init.rs index 97a02a1..637854d 100644 --- a/src/outcome/commands/init.rs +++ b/src/outcome/commands/init.rs @@ -37,6 +37,9 @@ impl Render for InitOutcome { ))); // Per-field key-value tables + if !self.fields.is_empty() { + blocks.push(Block::Line(String::new())); + } for field in &self.fields { let mut rows = vec![ vec!["type".into(), field.field_type.clone()], diff --git a/src/render.rs b/src/render.rs index 77846cb..8961835 100644 --- a/src/render.rs +++ 
b/src/render.rs @@ -6,7 +6,7 @@ use tabled::settings::{ object::{Column, Rows}, - style::{HorizontalLine, LineText, Style}, + style::{LineText, Style}, themes::BorderCorrection, width::Width, Modify, Panel, @@ -37,7 +37,7 @@ fn format_text_block(block: &Block, out: &mut String, indent: usize) { rows, style, } => { - let mut table = match style { + let table = match style { TableStyle::Compact => { let mut builder = Builder::default(); if let Some(hdrs) = headers { @@ -125,31 +125,29 @@ fn format_text_block(block: &Block, out: &mut String, indent: usize) { let w = term_width(); let available = w.saturating_sub(7); // 3 borders + 4 padding - let half = available / 2; + let col0 = available / 3; + let col1 = available - col0; // modern() has horizontal lines between ALL rows table.with(Style::modern()); - // Fixed 50/50 column widths - table.with( - Modify::new(Column::from(0)).with(Width::increase(half)), - ); - table.with(Modify::new(Column::from(0)).with(Width::wrap(half))); - table.with( - Modify::new(Column::from(1)).with(Width::increase(half)), - ); - table.with(Modify::new(Column::from(1)).with(Width::wrap(half))); + // Fixed 1/3 and 2/3 column widths + table.with(Modify::new(Column::from(0)).with(Width::increase(col0))); + table.with(Modify::new(Column::from(0)).with(Width::wrap(col0))); + table.with(Modify::new(Column::from(1)).with(Width::increase(col1))); + table.with(Modify::new(Column::from(1)).with(Width::wrap(col1))); - // Item name on top border - table.with( - LineText::new(format!(" {title} "), Rows::first()).offset(1), - ); + // Item name on top border (skip if empty) + if !title.is_empty() { + table.with(LineText::new(format!(" {title} "), Rows::first()).offset(1)); + } table } }; let rendered = table.to_string(); + let extra_newline = matches!(style, TableStyle::KeyValue { .. 
}); if indent > 0 { for line in rendered.lines() { out.push_str(&prefix); @@ -160,6 +158,9 @@ fn format_text_block(block: &Block, out: &mut String, indent: usize) { out.push_str(&rendered); out.push('\n'); } + if extra_newline { + out.push('\n'); + } } Block::Section { label, children } => { out.push_str(&prefix); From dd0bd159c2babdf9a904d62ed6fb81610f146498 Mon Sep 17 00:00:00 2001 From: edoch Date: Sat, 28 Mar 2026 23:14:44 +0100 Subject: [PATCH 24/35] docs: add TODO-0140 (global --dry-run flag) Co-Authored-By: Claude --- docs/spec/todos/TODO-0140.md | 29 +++++++++++++++++++++++++++++ docs/spec/todos/index.md | 1 + 2 files changed, 30 insertions(+) create mode 100644 docs/spec/todos/TODO-0140.md diff --git a/docs/spec/todos/TODO-0140.md b/docs/spec/todos/TODO-0140.md new file mode 100644 index 0000000..f49ad41 --- /dev/null +++ b/docs/spec/todos/TODO-0140.md @@ -0,0 +1,29 @@ +--- +id: 140 +title: Global --dry-run flag +status: todo +priority: medium +created: 2026-03-28 +depends_on: [] +blocks: [] +--- + +# TODO-0140: Global --dry-run flag + +## Summary + +Make `--dry-run` a global CLI flag (`#[arg(global = true)]`) available to all commands. Commands that write (init, update, build, clean) respect it. Read-only commands (check, search, info) ignore it silently. + +## Current state + +- `init` has `--dry-run` (per-command flag) +- `update` has `--dry-run` (per-command flag) +- `build` and `clean` don't have it + +## Changes + +1. Move `--dry-run` from init/update per-command args to the global `Cli` struct +2. Pass it down to all commands +3. Implement for `build`: validate + classify but skip embedding and index write +4. Implement for `clean`: show what would be deleted without deleting +5. 
Read-only commands: ignore the flag (no behavior change) diff --git a/docs/spec/todos/index.md b/docs/spec/todos/index.md index 55b50a3..64a3856 100644 --- a/docs/spec/todos/index.md +++ b/docs/spec/todos/index.md @@ -141,3 +141,4 @@ | [0137](TODO-0137.md) | Flatten Step tree into steps + result structure | done | high | 2026-03-21 | | [0138](TODO-0138.md) | Remove enum variant wrapper from JSON output | done | high | 2026-03-23 | | [0139](TODO-0139.md) | Unify fail helpers across commands | done | low | 2026-03-23 | +| [0140](TODO-0140.md) | Global --dry-run flag | todo | medium | 2026-03-28 | From e7ce7fa0e71ad080732ed8ec88f07f0e5e34a13d Mon Sep 17 00:00:00 2001 From: edoch Date: Sat, 28 Mar 2026 23:28:46 +0100 Subject: [PATCH 25/35] refactor: redesign update output to KeyValue style MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Added fields use same format as init (type, files, nullable, required, allowed). Changed fields show each aspect as "old → new". Removed fields show previously allowed globs. Section labels "Added:", "Changed:", "Removed:" before each group. 
Co-Authored-By: Claude --- src/outcome/commands/update.rs | 145 +++++++++++++++++---------------- 1 file changed, 75 insertions(+), 70 deletions(-) diff --git a/src/outcome/commands/update.rs b/src/outcome/commands/update.rs index 1662989..126a744 100644 --- a/src/outcome/commands/update.rs +++ b/src/outcome/commands/update.rs @@ -32,6 +32,7 @@ impl Render for UpdateOutcome { fn render(&self) -> Vec<Block> { let mut blocks = vec![]; + // Summary line let total_changes = self.added.len() + self.changed.len() + self.removed.len(); let summary = if total_changes == 0 { "no changes".to_string() @@ -48,84 +49,88 @@ impl Render for UpdateOutcome { return blocks; } - for field in &self.added { - let mut detail_lines = Vec::new(); - if let Some(ref globs) = field.allowed { - detail_lines.push(" found in:".to_string()); - for g in globs { - detail_lines.push(format!(" - \"{g}\"")); - } - } - if field.nullable { - detail_lines.push(" nullable: true".to_string()); - } - if !field.hints.is_empty() { - detail_lines.push(format!(" hints: {}", format_hints(&field.hints))); - } - blocks.push(Block::Table { - headers: None, - rows: vec![ + // Added fields — same KeyValue format as init + if !self.added.is_empty() { + blocks.push(Block::Line(String::new())); + blocks.push(Block::Line(format!("Added ({}):", self.added.len()))); + for field in &self.added { + let mut rows = vec![ + vec!["type".into(), field.field_type.clone()], vec![ - format!("\"{}\"", field.name), - "added".to_string(), - field.field_type.clone(), + "files".into(), + format!("{} out of {}", field.files_found, field.total_files), ], - vec![detail_lines.join("\n"), String::new(), String::new()], - ], - style: TableStyle::Record { - detail_rows: vec![1], - }, - }); - } + vec!["nullable".into(), field.nullable.to_string()], + ]; - for field in &self.changed { - let mut rows = vec![vec![ - "field".into(), - "aspect".into(), - "old".into(), - "new".into(), - ]]; - for (i, change) in field.changes.iter().enumerate() { - let
name_col = if i == 0 { - format!("\"{}\"", field.name) - } else { - String::new() + let req_val = match &field.required { + Some(r) if !r.is_empty() => r.join("\n"), + _ => "(none)".into(), }; - let (old, new) = change.format_old_new(); - rows.push(vec![name_col, change.label().to_string(), old, new]); + rows.push(vec!["required".into(), req_val]); + + let allow_val = match &field.allowed { + Some(a) if !a.is_empty() => a.join("\n"), + _ => "**".into(), + }; + rows.push(vec!["allowed".into(), allow_val]); + + if !field.hints.is_empty() { + rows.push(vec!["hints".into(), format_hints(&field.hints)]); + } + + blocks.push(Block::Table { + headers: None, + rows, + style: TableStyle::KeyValue { + title: field.name.clone(), + }, + }); } - blocks.push(Block::Table { - headers: None, - rows, - style: TableStyle::Compact, - }); } - for field in &self.removed { - let detail = match &field.allowed { - Some(globs) => { - let mut lines = vec![" previously in:".to_string()]; - for g in globs { - lines.push(format!(" - \"{g}\"")); + // Changed fields — one row per changed aspect with old → new + if !self.changed.is_empty() { + blocks.push(Block::Line(String::new())); + blocks.push(Block::Line(format!("Changed ({}):", self.changed.len()))); + for field in &self.changed { + let rows: Vec<Vec<String>> = field + .changes + .iter() + .map(|c| { + let (old, new) = c.format_old_new(); + vec![c.label().to_string(), format!("{old} \u{2192} {new}")] + }) + .collect(); + blocks.push(Block::Table { + headers: None, + rows, + style: TableStyle::KeyValue { + title: field.name.clone(), + }, + }); + } + } + + // Removed fields + if !self.removed.is_empty() { + blocks.push(Block::Line(String::new())); + blocks.push(Block::Line(format!("Removed ({}):", self.removed.len()))); + for field in &self.removed { + let rows = match &field.allowed { + Some(globs) if !globs.is_empty() => { + vec![vec!["previously allowed".into(), globs.join("\n")]] } - lines.join("\n") - } - None => String::new(), - }; -
blocks.push(Block::Table { - headers: None, - rows: vec![ - vec![ - format!("\"{}\"", field.name), - "removed".to_string(), - String::new(), - ], - vec![detail, String::new(), String::new()], - ], - style: TableStyle::Record { - detail_rows: vec![1], - }, - }); + _ => vec![vec!["status".into(), "removed".into()]], + }; + blocks.push(Block::Table { + headers: None, + rows, + style: TableStyle::KeyValue { + title: field.name.clone(), + }, + }); + } } blocks From daf77c059a81128aa51f19cffc5cfbbc7378383c Mon Sep 17 00:00:00 2001 From: edoch Date: Sun, 29 Mar 2026 12:30:06 +0200 Subject: [PATCH 26/35] refactor: redesign check output to KeyValue style Violations as KeyValue tables with human-readable kind names (Missing required, Wrong type, Not allowed, Null value not allowed). New fields as KeyValue tables with status and file count. Section labels "Violations (N):" and "New fields (N):" before each group. Co-Authored-By: Claude --- src/outcome/commands/check.rs | 125 ++++++++++++++++++---------------- 1 file changed, 68 insertions(+), 57 deletions(-) diff --git a/src/outcome/commands/check.rs b/src/outcome/commands/check.rs index d13ee5e..18e5956 100644 --- a/src/outcome/commands/check.rs +++ b/src/outcome/commands/check.rs @@ -16,10 +16,21 @@ pub struct CheckOutcome { pub new_fields: Vec, } +/// Human-readable violation kind name. 
+fn kind_display(kind: &ViolationKind) -> &'static str { + match kind { + ViolationKind::MissingRequired => "Missing required", + ViolationKind::WrongType => "Wrong type", + ViolationKind::Disallowed => "Not allowed", + ViolationKind::NullNotAllowed => "Null value not allowed", + } +} + impl Render for CheckOutcome { fn render(&self) -> Vec<Block> { let mut blocks = vec![]; + // Summary line let violation_part = if self.violations.is_empty() { "no violations".to_string() } else { @@ -35,67 +46,67 @@ impl Render for CheckOutcome { format_file_count(self.files_checked), ))); - for v in &self.violations { - let kind_str = match v.kind { - ViolationKind::MissingRequired => "MissingRequired", - ViolationKind::WrongType => "WrongType", - ViolationKind::Disallowed => "Disallowed", - ViolationKind::NullNotAllowed => "NullNotAllowed", - }; - let detail_text = v - .files - .iter() - .map(|f| match &f.detail { - Some(d) => format!(" - \"{}\" ({d})", f.path.display()), - None => format!(" - \"{}\"", f.path.display()), - }) - .collect::<Vec<_>>() - .join("\n"); + // Violations section + if !self.violations.is_empty() { + blocks.push(Block::Line(String::new())); + blocks.push(Block::Line(format!( + "Violations ({}):", + self.violations.len() + ))); + for v in &self.violations { + let files_str = v + .files + .iter() + .map(|f| match &f.detail { + Some(d) => format!("{} ({})", f.path.display(), d), + None => f.path.display().to_string(), + }) + .collect::<Vec<_>>() + .join("\n"); - blocks.push(Block::Table { - headers: None, - rows: vec![ - vec![ - format!("\"{}\"", v.field), - kind_str.to_string(), - format_file_count(v.files.len()), - ], - vec![detail_text, String::new(), String::new()], - ], - style: TableStyle::Record { - detail_rows: vec![1], - }, - }); + let rows = vec![ + vec!["kind".into(), kind_display(&v.kind).into()], + vec!["rule".into(), v.rule.clone()], + vec!["files".into(), files_str], + ]; + blocks.push(Block::Table { + headers: None, + rows, + style: TableStyle::KeyValue { + title:
v.field.clone(), + }, + }); + } } - for nf in &self.new_fields { - let detail_text = match &nf.files { - Some(files) => files - .iter() - .map(|p| format!(" - \"{}\"", p.display())) - .collect::<Vec<_>>() - .join("\n"), - None => String::new(), - }; - let mut rows = vec![vec![ - format!("\"{}\"", nf.name), - "new".to_string(), - format_file_count(nf.files_found), - ]]; - if !detail_text.is_empty() { - rows.push(vec![detail_text, String::new(), String::new()]); + // New fields section + if !self.new_fields.is_empty() { + blocks.push(Block::Line(String::new())); + blocks.push(Block::Line(format!( + "New fields ({}):", + self.new_fields.len() + ))); + for nf in &self.new_fields { + let found_in = match &nf.files { + Some(files) if !files.is_empty() => files + .iter() + .map(|p| p.display().to_string()) + .collect::<Vec<_>>() + .join("\n"), + _ => format_file_count(nf.files_found), + }; + let rows = vec![ + vec!["status".into(), "new (not in mdvs.toml)".into()], + vec!["found in".into(), found_in], + ]; + blocks.push(Block::Table { + headers: None, + rows, + style: TableStyle::KeyValue { + title: nf.name.clone(), + }, + }); } - blocks.push(Block::Table { - headers: None, - rows: rows.clone(), - style: if rows.len() > 1 { - TableStyle::Record { - detail_rows: vec![1], - } - } else { - TableStyle::Compact - }, - }); } blocks From 0552f454d72f4b75d707d60333a68b6bae55cb72 Mon Sep 17 00:00:00 2001 From: edoch Date: Sun, 29 Mar 2026 13:03:32 +0200 Subject: [PATCH 27/35] =?UTF-8?q?refactor:=20fill=20text=20output=20gaps?= =?UTF-8?q?=20=E2=80=94=20match=20JSON=20fields=20exactly?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Info: add Config section with scan_glob and ignored_fields, shown even when no index exists. Update: add unchanged count to summary line ("37 unchanged"). Ensures text output renders the same data as compact JSON.
Co-Authored-By: Claude --- src/outcome/commands/info.rs | 18 ++++++++++++++++++ src/outcome/commands/update.rs | 7 ++++++- 2 files changed, 24 insertions(+), 1 deletion(-) diff --git a/src/outcome/commands/info.rs b/src/outcome/commands/info.rs index a53cfd9..1719b22 100644 --- a/src/outcome/commands/info.rs +++ b/src/outcome/commands/info.rs @@ -38,6 +38,24 @@ impl Render for InfoOutcome { blocks.push(Block::Line(one_liner)); blocks.push(Block::Line(String::new())); + // Config section (always shown) + let ignored_str = if self.ignored_fields.is_empty() { + "(none)".into() + } else { + self.ignored_fields.join("\n") + }; + blocks.push(Block::Line("Config:".into())); + blocks.push(Block::Table { + headers: None, + rows: vec![ + vec!["scan glob".into(), self.scan_glob.clone()], + vec!["ignored fields".into(), ignored_str], + ], + style: TableStyle::KeyValue { + title: String::new(), + }, + }); + // Index metadata if let Some(idx) = &self.index { let rev = idx.revision.as_deref().unwrap_or("none"); diff --git a/src/outcome/commands/update.rs b/src/outcome/commands/update.rs index 126a744..591f1a8 100644 --- a/src/outcome/commands/update.rs +++ b/src/outcome/commands/update.rs @@ -39,9 +39,14 @@ impl Render for UpdateOutcome { } else { format!("{total_changes} field(s) changed") }; + let unchanged_suffix = if self.unchanged > 0 { + format!(" ({} unchanged)", self.unchanged) + } else { + String::new() + }; let dry_run_suffix = if self.dry_run { " (dry run)" } else { "" }; blocks.push(Block::Line(format!( - "Scanned {} — {summary}{dry_run_suffix}", + "Scanned {} — {summary}{unchanged_suffix}{dry_run_suffix}", format_file_count(self.files_scanned) ))); From 730fbb1198bbd610b8532eae2ca193a092b5fde8 Mon Sep 17 00:00:00 2001 From: edoch Date: Sun, 29 Mar 2026 13:45:25 +0200 Subject: [PATCH 28/35] refactor: redesign build output to KeyValue style All 12 JSON fields rendered as key-value rows in a single table. 
Lists (embedded_files, removed_files, new_fields) use multi-line cells. Empty lists show "(none)". Co-Authored-By: Claude --- src/outcome/commands/build.rs | 118 +++++++++++++++------------------- 1 file changed, 51 insertions(+), 67 deletions(-) diff --git a/src/outcome/commands/build.rs b/src/outcome/commands/build.rs index e2b5736..b986cca 100644 --- a/src/outcome/commands/build.rs +++ b/src/outcome/commands/build.rs @@ -47,21 +47,7 @@ impl Render for BuildOutcome { fn render(&self) -> Vec<Block> { let mut blocks = vec![]; - // New fields (shown before stats) - for nf in &self.new_fields { - blocks.push(Block::Line(format!( - " new field: {} ({})", - nf.name, - format_file_count(nf.files_found) - ))); - } - if !self.new_fields.is_empty() { - blocks.push(Block::Line( - "Run 'mdvs update' to incorporate new fields.".into(), - )); - } - - // One-liner + // Summary line let rebuild_suffix = if self.full_rebuild { " (full rebuild)" } else { @@ -72,63 +58,61 @@ impl Render for BuildOutcome { format_file_count(self.files_total), format_chunk_count(self.chunks_total) ))); + blocks.push(Block::Line(String::new())); - // Record tables per category with file-by-file detail - if self.files_embedded > 0 { - let detail = self - .embedded_files + // All JSON fields as key-value rows + let new_fields_str = if self.new_fields.is_empty() { + "(none)".into() + } else { + self.new_fields .iter() - .map(|f| format!(" - \"{}\" ({})", f.filename, format_chunk_count(f.chunks))) + .map(|nf| format!("{} ({})", nf.name, format_file_count(nf.files_found))) .collect::<Vec<_>>() - .join("\n"); - blocks.push(Block::Table { - headers: None, - rows: vec![ - vec![ - "embedded".to_string(), - format_file_count(self.files_embedded), - format_chunk_count(self.chunks_embedded), - ], - vec![detail, String::new(), String::new()], - ], - style: TableStyle::Record { - detail_rows: vec![1], - }, - }); - } - if self.files_unchanged > 0 { - blocks.push(Block::Table { - headers: None, - rows: vec![vec![ -
"unchanged".to_string(), - format_file_count(self.files_unchanged), - format_chunk_count(self.chunks_unchanged), - ]], - style: TableStyle::Compact, - }); - } - if self.files_removed > 0 { - let detail = self - .removed_files + let embedded_files_str = if self.embedded_files.is_empty() { + "(none)".into() + } else { + self.embedded_files .iter() - .map(|f| format!(" - \"{}\" ({})", f.filename, format_chunk_count(f.chunks))) + .map(|f| format!("{} ({})", f.filename, format_chunk_count(f.chunks))) .collect::<Vec<_>>() - .join("\n"); - blocks.push(Block::Table { - headers: None, - rows: vec![ - vec![ - "removed".to_string(), - format_file_count(self.files_removed), - format_chunk_count(self.chunks_removed), - ], - vec![detail, String::new(), String::new()], - ], - style: TableStyle::Record { - detail_rows: vec![1], - }, - }); - } + .join("\n") + }; + + let removed_files_str = if self.removed_files.is_empty() { + "(none)".into() + } else { + self.removed_files + .iter() + .map(|f| format!("{} ({})", f.filename, format_chunk_count(f.chunks))) + .collect::<Vec<_>>() + .join("\n") + }; + + let rows = vec![ + vec!["full rebuild".into(), self.full_rebuild.to_string()], + vec!["files total".into(), self.files_total.to_string()], + vec!["files embedded".into(), self.files_embedded.to_string()], + vec!["files unchanged".into(), self.files_unchanged.to_string()], + vec!["files removed".into(), self.files_removed.to_string()], + vec!["chunks total".into(), self.chunks_total.to_string()], + vec!["chunks embedded".into(), self.chunks_embedded.to_string()], + vec!["chunks unchanged".into(), self.chunks_unchanged.to_string()], + vec!["chunks removed".into(), self.chunks_removed.to_string()], + vec!["new fields".into(), new_fields_str], + vec!["embedded files".into(), embedded_files_str], + vec!["removed files".into(), removed_files_str], + ]; + + blocks.push(Block::Table { + headers: None, + rows, + style: TableStyle::KeyValue { + title: String::new(), + }, + }); blocks }
From 50293e96b1ca0ff97c8ade7b0b4b060071fc2bf9 Mon Sep 17 00:00:00 2001 From: edoch Date: Sun, 29 Mar 2026 13:45:59 +0200 Subject: [PATCH 29/35] refactor: redesign search output to KeyValue style Top-level table for query/model/limit. Per-hit KeyValue tables with #N title, showing file, score, lines, text. All JSON fields represented. Co-Authored-By: Claude --- src/outcome/commands/search.rs | 64 ++++++++++++++-------------------- 1 file changed, 27 insertions(+), 37 deletions(-) diff --git a/src/outcome/commands/search.rs b/src/outcome/commands/search.rs index 689b15a..1ea11c8 100644 --- a/src/outcome/commands/search.rs +++ b/src/outcome/commands/search.rs @@ -22,62 +22,52 @@ impl Render for SearchOutcome { fn render(&self) -> Vec { let mut blocks = vec![]; + // Summary line let hit_word = if self.hits.len() == 1 { "hit" } else { "hits" }; blocks.push(Block::Line(format!( "Searched \"{}\" — {} {hit_word}", self.query, self.hits.len() ))); + blocks.push(Block::Line(String::new())); - if self.hits.is_empty() { - return blocks; - } + // Top-level fields + blocks.push(Block::Table { + headers: None, + rows: vec![ + vec!["query".into(), self.query.clone()], + vec!["model".into(), self.model_name.clone()], + vec!["limit".into(), self.limit.to_string()], + ], + style: TableStyle::KeyValue { + title: String::new(), + }, + }); - // Per-hit record tables with chunk text + // Per-hit KeyValue tables for (i, hit) in self.hits.iter().enumerate() { - let idx = format!("{}", i + 1); - let path = format!("\"{}\"", hit.filename); - let score = format!("{:.3}", hit.score); + let mut rows = vec![ + vec!["file".into(), hit.filename.clone()], + vec!["score".into(), format!("{:.3}", hit.score)], + ]; - let detail = match (&hit.chunk_text, hit.start_line, hit.end_line) { - (Some(text), Some(start), Some(end)) => { - let indented: String = text - .lines() - .map(|l| format!(" {l}")) - .collect::>() - .join("\n"); - format!(" lines {start}-{end}:\n{indented}") - } - (None, Some(start), 
Some(end)) => format!(" lines {start}-{end}"), - _ => String::new(), - }; + if let (Some(start), Some(end)) = (hit.start_line, hit.end_line) { + rows.push(vec!["lines".into(), format!("{start}-{end}")]); + } - let mut rows = vec![vec![idx, path, score]]; - if !detail.is_empty() { - rows.push(vec![detail, String::new(), String::new()]); + if let Some(ref text) = hit.chunk_text { + rows.push(vec!["text".into(), text.trim().to_string()]); } blocks.push(Block::Table { headers: None, - rows: rows.clone(), - style: if rows.len() > 1 { - TableStyle::Record { - detail_rows: vec![1], - } - } else { - TableStyle::Compact + rows, + style: TableStyle::KeyValue { + title: format!("#{}", i + 1), }, }); } - // Footer - blocks.push(Block::Line(format!( - "{} {hit_word} | model: \"{}\" | limit: {}", - self.hits.len(), - self.model_name, - self.limit, - ))); - blocks } } From ab6b493d74e1c4f0a4e2163a573b8dc4cb506faa Mon Sep 17 00:00:00 2001 From: edoch Date: Sun, 29 Mar 2026 13:46:17 +0200 Subject: [PATCH 30/35] docs: add TODO-0141 (global --quiet flag) Co-Authored-By: Claude --- docs/spec/todos/TODO-0141.md | 28 ++++++++++++++++++++++++++++ docs/spec/todos/index.md | 1 + 2 files changed, 29 insertions(+) create mode 100644 docs/spec/todos/TODO-0141.md diff --git a/docs/spec/todos/TODO-0141.md b/docs/spec/todos/TODO-0141.md new file mode 100644 index 0000000..3fbe038 --- /dev/null +++ b/docs/spec/todos/TODO-0141.md @@ -0,0 +1,28 @@ +--- +id: 141 +title: Global --quiet flag to suppress output on success +status: todo +priority: medium +created: 2026-03-29 +depends_on: [] +blocks: [] +--- + +# TODO-0141: Global --quiet flag to suppress output on success + +## Summary + +Add a global `--quiet` / `-q` flag that suppresses all output when a command succeeds. On error or violations, output is shown normally. Useful for CI/scripting where only the exit code matters. 
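The success/violations/error matrix described below reduces to a small predicate. A sketch of the suppression rule — the `should_print` name and signature are assumptions for illustration, not code from the repo:

```rust
/// Decide whether to print command output under --quiet.
/// Output is suppressed only on clean success; violations and
/// errors are always shown so their details stay visible.
fn should_print(quiet: bool, failed: bool, has_violations: bool) -> bool {
    !quiet || failed || has_violations
}

fn main() {
    assert!(!should_print(true, false, false)); // success + quiet: silent, exit 0
    assert!(should_print(true, false, true));   // violations + quiet: shown, exit 1
    assert!(should_print(true, true, false));   // error + quiet: shown, exit 2
    assert!(should_print(false, false, false)); // without --quiet: always shown
}
```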
+ +## Behavior + +- Success + quiet: no output, exit 0 +- Violations + quiet: show output (violations need to be visible), exit 1 +- Error + quiet: show output (errors need to be visible), exit 2 +- Quiet + verbose: verbose wins (or error — conflicting flags) + +## Implementation + +1. Add `#[arg(global = true, short = 'q', long)]` `quiet: bool` to `Cli` struct +2. In main.rs dispatch: if `quiet && !failed && !violations` → skip printing +3. Conflicts with `--verbose` — clap can enforce mutual exclusivity via `conflicts_with` diff --git a/docs/spec/todos/index.md b/docs/spec/todos/index.md index 64a3856..71485e3 100644 --- a/docs/spec/todos/index.md +++ b/docs/spec/todos/index.md @@ -142,3 +142,4 @@ | [0138](TODO-0138.md) | Remove enum variant wrapper from JSON output | done | high | 2026-03-23 | | [0139](TODO-0139.md) | Unify fail helpers across commands | done | low | 2026-03-23 | | [0140](TODO-0140.md) | Global --dry-run flag | todo | medium | 2026-03-28 | +| [0141](TODO-0141.md) | Global --quiet flag to suppress output on success | todo | medium | 2026-03-29 | From 0bdfe7d4ff9bb89a5b4229dfb87b417acd691cbc Mon Sep 17 00:00:00 2001 From: edoch Date: Sun, 29 Mar 2026 15:47:40 +0200 Subject: [PATCH 31/35] refactor: redesign clean output to KeyValue style All 4 JSON fields (removed, path, files_removed, size) rendered as key-value rows. Update related test assertions for new block count. 
Co-Authored-By: Claude --- src/outcome/commands/clean.rs | 46 +++++++++++++++++++++-------------- src/outcome/mod.rs | 2 +- src/step.rs | 4 +-- 3 files changed, 31 insertions(+), 21 deletions(-) diff --git a/src/outcome/commands/clean.rs b/src/outcome/commands/clean.rs index d7f6be8..1965673 100644 --- a/src/outcome/commands/clean.rs +++ b/src/outcome/commands/clean.rs @@ -4,8 +4,8 @@ use std::path::PathBuf; use serde::Serialize; -use crate::block::{Block, Render}; -use crate::output::{format_file_count, format_size}; +use crate::block::{Block, Render, TableStyle}; +use crate::output::format_size; /// Full outcome for the clean command. #[derive(Debug, Serialize)] @@ -22,21 +22,35 @@ pub struct CleanOutcome { impl Render for CleanOutcome { fn render(&self) -> Vec { + let mut blocks = vec![]; + + // Summary line if self.removed { - vec![ - Block::Line(format!("Cleaned \"{}\"", self.path.display())), - Block::Line(format!( - "{} | {}", - format_file_count(self.files_removed), - format_size(self.size_bytes), - )), - ] + blocks.push(Block::Line(format!("Cleaned \"{}\"", self.path.display()))); } else { - vec![Block::Line(format!( + blocks.push(Block::Line(format!( "Nothing to clean — \"{}\" does not exist", self.path.display() - ))] + ))); } + blocks.push(Block::Line(String::new())); + + // All JSON fields as key-value rows + let rows = vec![ + vec!["removed".into(), self.removed.to_string()], + vec!["path".into(), self.path.display().to_string()], + vec!["files removed".into(), self.files_removed.to_string()], + vec!["size".into(), format_size(self.size_bytes)], + ]; + blocks.push(Block::Table { + headers: None, + rows, + style: TableStyle::KeyValue { + title: String::new(), + }, + }); + + blocks } } @@ -53,15 +67,12 @@ mod tests { size_bytes: 1024, }; let blocks = outcome.render(); - assert_eq!(blocks.len(), 2); match &blocks[0] { Block::Line(s) => assert_eq!(s, "Cleaned \".mdvs\""), _ => panic!("expected Line"), } - match &blocks[1] { - Block::Line(s) => 
assert!(s.contains("2 files") && s.contains("1.0 KB")), - _ => panic!("expected Line"), - } + // Table is present + assert!(blocks.iter().any(|b| matches!(b, Block::Table { .. }))); } #[test] @@ -73,7 +84,6 @@ mod tests { size_bytes: 0, }; let blocks = outcome.render(); - assert_eq!(blocks.len(), 1); match &blocks[0] { Block::Line(s) => assert!(s.contains("Nothing to clean")), _ => panic!("expected Line"), diff --git a/src/outcome/mod.rs b/src/outcome/mod.rs index 07295a3..cbbb3da 100644 --- a/src/outcome/mod.rs +++ b/src/outcome/mod.rs @@ -133,7 +133,7 @@ mod tests { size_bytes: 100, }); let blocks = outcome.render(); - assert_eq!(blocks.len(), 2); + assert_eq!(blocks.len(), 3); // summary line + empty line + table } #[test] diff --git a/src/step.rs b/src/step.rs index 7280dd6..2d0e814 100644 --- a/src/step.rs +++ b/src/step.rs @@ -401,7 +401,7 @@ mod tests { elapsed_ms: 15, }; let blocks = result.render_verbose(); - assert_eq!(blocks.len(), 3); // step line + 2 clean lines + assert_eq!(blocks.len(), 4); // step line + summary + empty line + table match &blocks[0] { Block::Line(s) => assert!(s.contains("Scan") && s.contains("(10ms)")), _ => panic!("expected Line"), @@ -427,7 +427,7 @@ mod tests { elapsed_ms: 15, }; let blocks = result.render_compact(); - assert_eq!(blocks.len(), 2); // 2 clean lines, no step line + assert_eq!(blocks.len(), 3); // summary + empty line + table, no step line } #[test] From b21c2727f2c1159e20b32bfa13411dd937e27e93 Mon Sep 17 00:00:00 2001 From: edoch Date: Sun, 29 Mar 2026 15:48:33 +0200 Subject: [PATCH 32/35] docs: add TODO-0142 (fix chunk line numbers to exclude frontmatter) Chunk start_line/end_line reference the full file including frontmatter. When read back, YAML frontmatter appears in chunk text. Line numbers should be offset by frontmatter line count. 
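The failure mode described above can be reproduced with a toy line reader. The `read_lines` name is borrowed from `src/cmd/search.rs`, but this body is an assumption sketched for illustration:

```rust
/// Read 1-indexed lines [start, end] from `text`, the way the
/// search path reads chunk text back from the source file.
fn read_lines(text: &str, start: usize, end: usize) -> String {
    text.lines()
        .skip(start - 1)
        .take(end - start + 1)
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let full = "---\ntitle: \"Project Gamma — Post-Mortem\"\nstatus: archived\n---\n\nWhat went wrong.";
    // Body-relative start_line 1, applied to the full file, lands
    // on the frontmatter delimiter instead of the body.
    assert_eq!(read_lines(full, 1, 1), "---");
    // The body actually starts at line 6 of the full file.
    assert_eq!(read_lines(full, 6, 6), "What went wrong.");
}
```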
Co-Authored-By: Claude --- docs/spec/todos/TODO-0142.md | 43 ++++++++++++++++++++++++++++++++++++ docs/spec/todos/index.md | 1 + 2 files changed, 44 insertions(+) create mode 100644 docs/spec/todos/TODO-0142.md diff --git a/docs/spec/todos/TODO-0142.md b/docs/spec/todos/TODO-0142.md new file mode 100644 index 0000000..4b8cd73 --- /dev/null +++ b/docs/spec/todos/TODO-0142.md @@ -0,0 +1,43 @@ +--- +id: 142 +title: Fix chunk line numbers to exclude frontmatter +status: todo +priority: high +created: 2026-03-29 +depends_on: [] +blocks: [] +--- + +# TODO-0142: Fix chunk line numbers to exclude frontmatter + +## Summary + +Chunk `start_line`/`end_line` values reference the full file (including frontmatter). When `read_lines()` in search.rs reads chunk text using these line numbers, frontmatter YAML is included in the output. The line numbers should be offset to reference the body only (after the `---` closing delimiter). + +## Problem + +The chunker (`text-splitter`) operates on the body text extracted by `gray_matter`. It produces line numbers relative to the body. But when these are stored in `chunks.parquet`, they're stored as-is — not offset by the frontmatter's line count. When a chunk starts at the beginning of the body, `start_line` is 1, which in the full file is the `---` line. + +Example: `projects/archived/gamma/post-mortem.md` has `start_line: 1, end_line: 11`. The chunk text read back includes the frontmatter: +``` +--- +title: "Project Gamma — Post-Mortem" +status: archived +... +``` + +## Fix options + +1. **Offset during chunking**: When building chunks, add the frontmatter line count to `start_line`/`end_line` before storing. The stored values then reference the full file correctly. + +2. **Offset during read**: In `read_lines()`, detect and skip the frontmatter before reading. But this is fragile — the frontmatter might have changed since build time. + +3. 
**Store body-relative lines + frontmatter offset**: Store `start_line`/`end_line` as body-relative, plus a `frontmatter_lines` field. Convert when reading. + +Option 1 is simplest and most correct — the stored line numbers should reference the actual file. + +## Files to investigate + +- `src/cmd/build.rs` — `embed_file()` function, where `start_line`/`end_line` are set +- `src/index/chunk.rs` — `Chunks::new()`, how line numbers are computed +- `src/cmd/search.rs` — `read_lines()`, where chunk text is read back diff --git a/docs/spec/todos/index.md b/docs/spec/todos/index.md index 71485e3..14c6268 100644 --- a/docs/spec/todos/index.md +++ b/docs/spec/todos/index.md @@ -143,3 +143,4 @@ | [0139](TODO-0139.md) | Unify fail helpers across commands | done | low | 2026-03-23 | | [0140](TODO-0140.md) | Global --dry-run flag | todo | medium | 2026-03-28 | | [0141](TODO-0141.md) | Global --quiet flag to suppress output on success | todo | medium | 2026-03-29 | +| [0142](TODO-0142.md) | Fix chunk line numbers to exclude frontmatter | todo | high | 2026-03-29 | From 9870ca5540b42c433a3ffc176c56d89fc8b45d71 Mon Sep 17 00:00:00 2001 From: edoch Date: Sun, 29 Mar 2026 15:58:18 +0200 Subject: [PATCH 33/35] fix: offset chunk line numbers by frontmatter length Add body_line_offset to ScannedFile, computed during scan as the difference between raw file lines and body lines. Apply offset in embed_file() so chunk start_line/end_line reference the full file, not the body-only text. Fixes chunk text in search results including YAML frontmatter when a chunk starts at the beginning of the document body. Existing indexes need --force rebuild to pick up the fix. 
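The offset computation this patch adds to `scan.rs` can be exercised in isolation. A minimal sketch, assuming the body has already been extracted and trimmed the way `gray_matter` plus `.trim()` produce it:

```rust
/// Number of lines before the body in the original file
/// (frontmatter, delimiters, and surrounding blank lines),
/// mirroring: raw.lines().count().saturating_sub(content.lines().count())
fn body_line_offset(raw: &str, body: &str) -> usize {
    raw.lines().count().saturating_sub(body.lines().count())
}

fn main() {
    let raw = "---\ntitle: x\nstatus: archived\n---\n\nFirst body line.\nSecond body line.";
    let body = "First body line.\nSecond body line.";
    let offset = body_line_offset(raw, body);
    // Four frontmatter lines plus one blank line precede the body.
    assert_eq!(offset, 5);
    // A chunk starting at body-relative line 1 now maps to line 6
    // of the full file, past the frontmatter.
    assert_eq!(1 + offset, 6);
}
```

This is why existing indexes need a `--force` rebuild: the stored `start_line`/`end_line` values only pick up the offset when `embed_file()` runs again.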
Co-Authored-By: Claude --- src/cmd/build.rs | 5 +++-- src/discover/infer.rs | 1 + src/discover/scan.rs | 6 ++++++ 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/src/cmd/build.rs b/src/cmd/build.rs index 7a50f1c..46f3728 100644 --- a/src/cmd/build.rs +++ b/src/cmd/build.rs @@ -165,8 +165,8 @@ async fn embed_file( chunk_id: uuid::Uuid::new_v4().to_string(), file_id: file_id.to_string(), chunk_index: chunk.chunk_index as i32, - start_line: chunk.start_line as i32, - end_line: chunk.end_line as i32, + start_line: (chunk.start_line + file.body_line_offset) as i32, + end_line: (chunk.end_line + file.body_line_offset) as i32, embedding, }) .collect() @@ -1846,6 +1846,7 @@ mod tests { path: std::path::PathBuf::from(path), data: None, content: body.to_string(), + body_line_offset: 0, }) .collect(), } diff --git a/src/discover/infer.rs b/src/discover/infer.rs index f4bce57..71f995d 100644 --- a/src/discover/infer.rs +++ b/src/discover/infer.rs @@ -446,6 +446,7 @@ mod tests { path: PathBuf::from(path), data, content: content.to_string(), + body_line_offset: 0, } } diff --git a/src/discover/scan.rs b/src/discover/scan.rs index 94cdc8c..9641f76 100644 --- a/src/discover/scan.rs +++ b/src/discover/scan.rs @@ -36,6 +36,9 @@ pub struct ScannedFile { pub data: Option, /// Markdown body (after frontmatter extraction), trimmed. pub content: String, + /// Number of lines before the body in the original file (frontmatter + delimiters). + /// Used to offset chunk line numbers so they reference the full file. + pub body_line_offset: usize, } /// Collection of scanned markdown files from a directory walk. 
@@ -115,6 +118,7 @@ impl ScannedFiles { path: rel_path, data: None, content: raw.trim().to_string(), + body_line_offset: 0, }); continue; }; @@ -154,11 +158,13 @@ impl ScannedFiles { } let content = parsed.content.trim().to_string(); + let body_line_offset = raw.lines().count().saturating_sub(content.lines().count()); files.push(ScannedFile { path: rel_path, data, content, + body_line_offset, }); } From e7c383c145cf23f5e3a61004e98a660306b53adf Mon Sep 17 00:00:00 2001 From: edoch Date: Sun, 29 Mar 2026 17:47:55 +0200 Subject: [PATCH 34/35] docs: update all book output examples to KeyValue table format Co-Authored-By: Claude --- book/src/commands/build.md | 91 +++++++++------ book/src/commands/check.md | 97 ++++++++-------- book/src/commands/clean.md | 37 +++++- book/src/commands/info.md | 131 ++++++++++++---------- book/src/commands/init.md | 120 +++++++++++--------- book/src/commands/search.md | 114 ++++++++++++------- book/src/commands/update.md | 88 +++++++-------- book/src/getting-started.md | 211 +++++++++++++++++++++-------------- book/src/search-guide.md | 82 ++++++++++---- docs/spec/todos/TODO-0114.md | 126 ++++++++++++++------- 10 files changed, 670 insertions(+), 427 deletions(-) diff --git a/book/src/commands/build.md b/book/src/commands/build.md index 890846f..ae92b22 100644 --- a/book/src/commands/build.md +++ b/book/src/commands/build.md @@ -83,25 +83,36 @@ On the first build (no existing `.mdvs/`), `--force` is never needed. 
### Compact (default) -Incremental build with one new file: - -``` -Built index — 44 files, 60 chunks - -╭──────────────────────────┬─────────────────────────┬─────────────────────────╮ -│ embedded │ 1 file │ 1 chunk │ -│ unchanged │ 43 files │ 59 chunks │ -╰──────────────────────────┴─────────────────────────┴─────────────────────────╯ -``` - -When nothing needs embedding: +When nothing needs embedding (incremental build, all files unchanged): ``` Built index — 43 files, 59 chunks -╭──────────────────────────┬─────────────────────────┬─────────────────────────╮ -│ unchanged │ 43 files │ 59 chunks │ -╰──────────────────────────┴─────────────────────────┴─────────────────────────╯ +┌──────────────────────────┬───────────────────────────────────────────────────┐ +│ full rebuild │ false │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files total │ 43 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files embedded │ 0 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files unchanged │ 43 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files removed │ 0 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ chunks total │ 59 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ chunks embedded │ 0 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ chunks unchanged │ 59 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ chunks removed │ 0 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ new fields │ (none) │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ embedded files │ (none) │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ removed files │ (none) │ 
+└──────────────────────────┴───────────────────────────────────────────────────┘ ``` When violations are found, the build aborts: @@ -112,29 +123,39 @@ Build aborted — 6 violation(s) found. Run `mdvs check` for details. ### Verbose (`-v`) +Verbose output adds pipeline timing lines before the result: + ``` -Read config: example_kb/mdvs.toml -Scan: 44 files -Validate: 44 files — no violations -Classify: 44 files (full rebuild) -Load model: "minishlab/potion-base-8M" (256d) -Embed: 44 files (60 chunks) -Write index: 44 files, 60 chunks - -Built index — 44 files, 60 chunks (full rebuild) - -╭─────────────────────────┬─────────────────────────┬──────────────────────────╮ -│ embedded │ 44 files │ 60 chunks │ -├─────────────────────────┴─────────────────────────┴──────────────────────────┤ -│ - "README.md" (7 chunks) │ -│ - "blog/drafts/grant-ideas.md" (2 chunks) │ -│ - "blog/drafts/upcoming-talk.md" (1 chunk) │ -│ ... │ -│ - "scratch.md" (1 chunk) │ -╰──────────────────────────────────────────────────────────────────────────────╯ +Read config: example_kb/mdvs.toml (4ms) +Scan: 43 files (4ms) +Infer: 37 field(s) (0ms) +Validate: 43 files — no violations (87ms) +Classify: 43 files (full rebuild) (0ms) +Load model: minishlab/potion-base-8M (24ms) +Embed: 43 files, 59 chunks (12ms) +Write index: 43 files, 59 chunks (1ms) +Built index — 43 files, 59 chunks (full rebuild) + +┌──────────────────────────┬───────────────────────────────────────────────────┐ +│ full rebuild │ true │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files total │ 43 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files embedded │ 43 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files unchanged │ 0 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ ... 
│ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ embedded files │ README.md (7 chunks) │ +│ │ blog/drafts/grant-ideas.md (2 chunks) │ +│ │ ... │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ removed files │ (none) │ +└──────────────────────────┴───────────────────────────────────────────────────┘ ``` -Verbose output shows each pipeline step with its result, and expands embedded files with per-file chunk counts. +The key-value table is identical in both modes — verbose only adds the step lines showing processing times. When files are embedded, the `embedded files` row lists each file with its chunk count. ## Exit codes diff --git a/book/src/commands/check.md b/book/src/commands/check.md index d3168fa..e112a2f 100644 --- a/book/src/commands/check.md +++ b/book/src/commands/check.md @@ -48,60 +48,69 @@ mdvs check example_kb Checked 43 files — no violations ``` -When violations are found: +When violations are found, each violation is shown as a key-value table with the field name, violation kind, the violated rule, and the affected files: ``` -Checked 43 files — 3 violation(s), 1 new field(s) - -╭──────────────────────────┬─────────────────────────────┬─────────────────────╮ -│ "drift_rate" │ NullNotAllowed │ 1 file │ -│ "priority" │ WrongType │ 2 files │ -│ "title" │ MissingRequired │ 6 files │ -╰──────────────────────────┴─────────────────────────────┴─────────────────────╯ - -╭──────────────────────────────┬─────────────────────┬─────────────────────────╮ -│ "algorithm" │ new │ 2 files │ -╰──────────────────────────────┴─────────────────────┴─────────────────────────╯ +Checked 43 files — 3 violation(s) + +Violations (3): +┌ drift_rate ──────────────┬───────────────────────────────────────────────────┐ +│ kind │ Null value not allowed │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ rule │ not nullable │ 
+├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files │ projects/alpha/notes/experiment-2.md │ +└──────────────────────────┴───────────────────────────────────────────────────┘ + +┌ priority ────────────────┬───────────────────────────────────────────────────┐ +│ kind │ Wrong type │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ rule │ type Integer │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files │ projects/beta/notes/initial-findings.md (got Stri │ +│ │ ng) │ +│ │ projects/beta/overview.md (got String) │ +└──────────────────────────┴───────────────────────────────────────────────────┘ + +┌ title ───────────────────┬───────────────────────────────────────────────────┐ +│ kind │ Missing required │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ rule │ required in ["**"] │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files │ README.md │ +│ │ lab-values.md │ +│ │ reference/glossary.md │ +│ │ reference/quick-start.md │ +│ │ reference/tools.md │ +│ │ scratch.md │ +└──────────────────────────┴───────────────────────────────────────────────────┘ ``` -Each violation row shows the field name, violation kind, and how many files are affected. New fields appear in a separate table below. +`WrongType` violations include the actual type in parentheses (e.g., `got String`). 
### Verbose (`-v`) +Verbose output adds pipeline timing lines before the result: + ``` -Checked 43 files — 3 violation(s), 1 new field(s) - -╭────────────────────────────┬────────────────────────────┬────────────────────╮ -│ "drift_rate" │ NullNotAllowed │ 1 file │ -├────────────────────────────┴────────────────────────────┴────────────────────┤ -│ - "projects/alpha/notes/experiment-2.md" │ -╰──────────────────────────────────────────────────────────────────────────────╯ -╭────────────────────────────┬─────────────────────────┬───────────────────────╮ -│ "priority" │ WrongType │ 2 files │ -├────────────────────────────┴─────────────────────────┴───────────────────────┤ -│ - "projects/beta/notes/initial-findings.md" (got String) │ -│ - "projects/beta/overview.md" (got String) │ -╰──────────────────────────────────────────────────────────────────────────────╯ -╭───────────────────────┬───────────────────────────────┬──────────────────────╮ -│ "title" │ MissingRequired │ 6 files │ -├───────────────────────┴───────────────────────────────┴──────────────────────┤ -│ - "README.md" │ -│ - "lab-values.md" │ -│ - "reference/glossary.md" │ -│ - "reference/quick-start.md" │ -│ - "reference/tools.md" │ -│ - "scratch.md" │ -╰──────────────────────────────────────────────────────────────────────────────╯ - -╭──────────────────────────────┬─────────────────────┬─────────────────────────╮ -│ "algorithm" │ new │ 2 files │ -├──────────────────────────────┴─────────────────────┴─────────────────────────┤ -│ - "projects/beta/notes/initial-findings.md" │ -│ - "projects/beta/notes/replication.md" │ -╰──────────────────────────────────────────────────────────────────────────────╯ +Read config: example_kb/mdvs.toml (3ms) +Scan: 43 files (2ms) +Validate: 43 files — 3 violation(s) (78ms) +Checked 43 files — 3 violation(s) + +Violations (3): +┌ drift_rate ──────────────┬───────────────────────────────────────────────────┐ +│ kind │ Null value not allowed │ 
+├──────────────────────────┼───────────────────────────────────────────────────┤ +│ rule │ not nullable │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files │ projects/alpha/notes/experiment-2.md │ +└──────────────────────────┴───────────────────────────────────────────────────┘ + +... ``` -Verbose output expands each violation into a record with the offending file paths. `WrongType` violations include the actual type in parentheses (e.g., `got String`). +The violation tables are identical in both modes — verbose only adds the step lines showing processing times. ## Exit codes diff --git a/book/src/commands/clean.md b/book/src/commands/clean.md index f67a004..043f190 100644 --- a/book/src/commands/clean.md +++ b/book/src/commands/clean.md @@ -32,26 +32,53 @@ mdvs clean example_kb ``` Cleaned "example_kb/.mdvs" + +┌──────────────────────────┬───────────────────────────────────────────────────┐ +│ removed │ true │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ path │ example_kb/.mdvs │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files removed │ 2 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ size │ 113.7 KB │ +└──────────────────────────┴───────────────────────────────────────────────────┘ ``` When there's nothing to clean: ``` Nothing to clean — "example_kb/.mdvs" does not exist + +┌──────────────────────────┬───────────────────────────────────────────────────┐ +│ removed │ false │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ path │ example_kb/.mdvs │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files removed │ 0 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ size │ 0 B │ +└──────────────────────────┴───────────────────────────────────────────────────┘ ``` ### Verbose (`-v`) -``` -Delete 
index: "example_kb/.mdvs" (2 files, 113.6 KB) +Verbose output adds pipeline timing lines before the result: +``` +Delete index: example_kb/.mdvs (2 files, 113.8 KB) (0ms) Cleaned "example_kb/.mdvs" -2 files | 113.6 KB +┌──────────────────────────┬───────────────────────────────────────────────────┐ +│ removed │ true │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ path │ example_kb/.mdvs │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files removed │ 2 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ size │ 113.8 KB │ +└──────────────────────────┴───────────────────────────────────────────────────┘ ``` -Verbose output shows the file count and total size of the deleted directory. - ## Exit codes | Code | Meaning | diff --git a/book/src/commands/info.md b/book/src/commands/info.md index bdb5993..026852d 100644 --- a/book/src/commands/info.md +++ b/book/src/commands/info.md @@ -30,83 +30,102 @@ Use it to check which fields are configured, whether the index is up to date, or mdvs info example_kb ``` +The output is organized into sections: Config, Index (if built), and one key-value table per field. Only a few fields are shown here: + ``` 43 files, 37 fields, 59 chunks -╭──────────────────────────────┬───────────────────────────────────────────────╮ -│ model: │ minishlab/potion-base-8M │ -│ config: │ match │ -│ files: │ 43/43 │ -╰──────────────────────────────┴───────────────────────────────────────────────╯ - -╭──────────────┬───────────────┬───────────────┬───────────────┬───────────────╮ -│ "title" │ String │ required: "bl │ allowed: "blo │ │ -│ │ │ og/**", ... │ g/**", ... │ │ -│ "tags" │ String[] │ required: "bl │ allowed: "blo │ │ -│ │ │ og/published/ │ g/**", ... │ │ -│ │ │ **", ... │ │ │ -│ "draft" │ Boolean │ required: "bl │ allowed: "blo │ │ -│ │ │ og/**" │ g/**" │ │ -│ "drift_rate" │ Float? 
│ required: "pr │ allowed: "pro │ │ -│ │ │ ojects/alpha/ │ jects/alpha/n │ │ -│ │ │ notes/**" │ otes/**" │ │ -│ ... │ │ │ │ │ -╰──────────────┴───────────────┴───────────────┴───────────────┴───────────────╯ +Config: +┌──────────────────────────┬───────────────────────────────────────────────────┐ +│ scan glob │ ** │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ ignored fields │ (none) │ +└──────────────────────────┴───────────────────────────────────────────────────┘ + +Index: +┌──────────────────────────┬───────────────────────────────────────────────────┐ +│ model │ minishlab/potion-base-8M │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ revision │ none │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ chunk size │ 1024 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ built │ 2026-03-29T15:22:21.347671+00:00 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ config │ match │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files │ 43 out of 43 │ +└──────────────────────────┴───────────────────────────────────────────────────┘ + +37 fields: +┌ action_items ────────────┬───────────────────────────────────────────────────┐ +│ type │ String[] │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files │ 9 out of 43 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ nullable │ false │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ required │ meetings/all-hands/** │ +│ │ projects/alpha/meetings/** │ +│ │ projects/beta/meetings/** │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ allowed │ meetings/** │ +│ │ projects/alpha/meetings/** │ +│ │ projects/beta/meetings/** │ 
+└──────────────────────────┴───────────────────────────────────────────────────┘ + +... + +┌ drift_rate ──────────────┬───────────────────────────────────────────────────┐ +│ type │ Float │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files │ 3 out of 43 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ nullable │ true │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ required │ projects/alpha/notes/** │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ allowed │ projects/alpha/notes/** │ +└──────────────────────────┴───────────────────────────────────────────────────┘ + +... ``` -The summary line shows files on disk, field count, and chunk count. The index block shows the embedding model, whether the config matches the index (`match` or `changed`), and how many files are indexed vs on disk. The field table lists every `[[fields.field]]` entry with its type, required patterns, and allowed patterns. +The `config` row shows `match` when `mdvs.toml` matches the index metadata, or `changed` when the config has been modified since the last build. The `files` row shows indexed files vs files on disk. When no index has been built: ``` 43 files, 37 fields + +Config: +┌──────────────────────────┬───────────────────────────────────────────────────┐ +│ scan glob │ ** │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ ignored fields │ (none) │ +└──────────────────────────┴───────────────────────────────────────────────────┘ + +37 fields: +... ``` -The index block is omitted and the summary shows only files and fields. +The Index section is omitted and the summary shows only files and fields (no chunk count). 
### Verbose (`-v`) -``` -Read config: example_kb/mdvs.toml -Scan: 43 files -Read index: 43 files, 59 chunks +Verbose output adds pipeline timing lines before the result: +``` +Read config: example_kb/mdvs.toml (2ms) +Scan: 43 files (3ms) +Read index: 43 files, 59 chunks (2ms) 43 files, 37 fields, 59 chunks -╭────────────────────────────┬─────────────────────────────────────────────────╮ -│ model: │ minishlab/potion-base-8M │ -│ revision: │ none │ -│ chunk size: │ 1024 │ -│ built: │ 2026-03-13T22:46:02.902129+00:00 │ -│ config: │ match │ -│ files: │ 43/43 │ -╰────────────────────────────┴─────────────────────────────────────────────────╯ - -╭────────────────────────────────┬────────────────────────┬────────────────────╮ -│ "action_items" │ String[] │ 9/43 │ -├────────────────────────────────┴────────────────────────┴────────────────────┤ -│ required: │ -│ - "meetings/all-hands/**" │ -│ - "projects/alpha/meetings/**" │ -│ - "projects/beta/meetings/**" │ -│ allowed: │ -│ - "meetings/**" │ -│ - "projects/alpha/meetings/**" │ -│ - "projects/beta/meetings/**" │ -╰──────────────────────────────────────────────────────────────────────────────╯ -╭──────────────────────────────┬────────────────────────┬──────────────────────╮ -│ "drift_rate" │ Float? │ 3/43 │ -├──────────────────────────────┴────────────────────────┴──────────────────────┤ -│ required: │ -│ - "projects/alpha/notes/**" │ -│ allowed: │ -│ - "projects/alpha/notes/**" │ -│ nullable: true │ -╰──────────────────────────────────────────────────────────────────────────────╯ +Config: ... ``` -Verbose output adds pipeline steps, the full index details (revision, chunk size, build timestamp), and expands each field into a record showing its glob patterns. The count column (e.g., `9/43`) shows how many scanned files contain the field. +The tables are identical in both modes — verbose only adds the step lines showing processing times. 
## Exit codes diff --git a/book/src/commands/init.md b/book/src/commands/init.md index 8f0669d..eb686a0 100644 --- a/book/src/commands/init.md +++ b/book/src/commands/init.md @@ -46,68 +46,90 @@ Both re-infer the schema from scratch, but they differ in scope: mdvs init example_kb ``` +Each discovered field is shown as its own key-value table with the field name on the top border. Only a few fields are shown here — the full output includes all 37: + ``` Initialized 43 files — 37 field(s) -╭─────────────────────┬───────────────────────┬───────┬────────────────────────╮ -│ "action_items" │ String[] │ 9/43 │ │ -│ "algorithm" │ String │ 2/43 │ │ -│ "ambient_humidity" │ Float │ 1/43 │ │ -│ ... │ │ │ │ -│ "drift_rate" │ Float? │ 3/43 │ │ -│ ... │ │ │ │ -│ "lab section" │ String │ 4/43 │ use "field name" in -- │ -│ │ │ │ where │ -│ ... │ │ │ │ -│ "title" │ String │ 37/43 │ │ -│ "wavelength_nm" │ Float │ 3/43 │ │ -╰─────────────────────┴───────────────────────┴───────┴────────────────────────╯ +┌ action_items ────────────┬───────────────────────────────────────────────────┐ +│ type │ String[] │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files │ 9 out of 43 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ nullable │ false │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ required │ meetings/all-hands/** │ +│ │ projects/alpha/meetings/** │ +│ │ projects/beta/meetings/** │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ allowed │ meetings/** │ +│ │ projects/alpha/meetings/** │ +│ │ projects/beta/meetings/** │ +└──────────────────────────┴───────────────────────────────────────────────────┘ + +... 
+ +┌ drift_rate ──────────────┬───────────────────────────────────────────────────┐ +│ type │ Float │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files │ 3 out of 43 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ nullable │ true │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ required │ projects/alpha/notes/** │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ allowed │ projects/alpha/notes/** │ +└──────────────────────────┴───────────────────────────────────────────────────┘ + +... + +┌ title ───────────────────┬───────────────────────────────────────────────────┐ +│ type │ String │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files │ 37 out of 43 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ nullable │ false │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ required │ blog/** │ +│ │ meetings/** │ +│ │ people/** │ +│ │ projects/** │ +│ │ reference/protocols/** │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ allowed │ blog/** │ +│ │ meetings/** │ +│ │ people/** │ +│ │ projects/** │ +│ │ reference/protocols/** │ +└──────────────────────────┴───────────────────────────────────────────────────┘ Initialized mdvs in 'example_kb' ``` -Each row shows the field name, inferred type, how many files contain it (e.g., `9/43`), and optional hints for `--where` syntax (see [Search Guide](../search-guide.md) for details on quoting and escaping). The `?` suffix on a type (e.g., `Float?`) means the field is nullable. +Each table shows the inferred type, file count, nullable status, and inferred `required`/`allowed` glob patterns. 
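The `type` and `nullable` rows fall out of looking at every value the field takes across the scanned files: nulls set `nullable`, and the non-null values decide the type. A minimal sketch of that idea, using Python types as stand-ins for the mdvs type names (hypothetical helper, not mdvs's actual inference code):

```python
# Hypothetical sketch of type inference (not mdvs's implementation): collect
# every value a field takes across files, then widen to the narrowest type
# that fits all non-null values.

def infer_field(values):
    nullable = any(v is None for v in values)
    non_null = [v for v in values if v is not None]
    if all(isinstance(v, bool) for v in non_null):      # bool before int: bool is an int in Python
        type_name = "Boolean"
    elif all(isinstance(v, int) for v in non_null):
        type_name = "Integer"
    elif all(isinstance(v, (int, float)) for v in non_null):
        type_name = "Float"
    elif all(isinstance(v, list) for v in non_null):
        type_name = "String[]"
    else:
        type_name = "String"
    return type_name, nullable

# drift_rate appears in 3 files, one of them with a null value:
print(infer_field([0.02, None, 0.05]))  # ('Float', True)
```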
Fields with special characters in their name (e.g., `lab section`) include a `hints` row with `--where` syntax advice (see [Search Guide](../search-guide.md)). ### Verbose (`-v`) +Verbose output adds pipeline timing lines before the result: + ```bash mdvs init example_kb -v ``` ``` +Scan: 43 files (5ms) +Infer: 37 field(s) (0ms) +Write config: example_kb/mdvs.toml (0ms) Initialized 43 files — 37 field(s) -╭────────────────────────────────┬────────────────────────┬────────────────────╮ -│ "action_items" │ String[] │ 9/43 │ -├────────────────────────────────┴────────────────────────┴────────────────────┤ -│ required: │ -│ - "meetings/all-hands/**" │ -│ - "projects/alpha/meetings/**" │ -│ - "projects/beta/meetings/**" │ -│ allowed: │ -│ - "meetings/**" │ -│ - "projects/alpha/meetings/**" │ -│ - "projects/beta/meetings/**" │ -╰──────────────────────────────────────────────────────────────────────────────╯ -╭───────────────────────────────────┬─────────────────────┬────────────────────╮ -│ "ambient_humidity" │ Float │ 1/43 │ -├───────────────────────────────────┴─────────────────────┴────────────────────┤ -│ allowed: │ -│ - "projects/alpha/notes/**" │ -╰──────────────────────────────────────────────────────────────────────────────╯ -╭──────────────────────────────┬──────────────────────────┬────────────────────╮ -│ "drift_rate" │ Float? │ 3/43 │ -├──────────────────────────────┴──────────────────────────┴────────────────────┤ -│ required: │ -│ - "projects/alpha/notes/**" │ -│ allowed: │ -│ - "projects/alpha/notes/**" │ -│ nullable: true │ -╰──────────────────────────────────────────────────────────────────────────────╯ +┌ action_items ────────────┬───────────────────────────────────────────────────┐ +│ type │ String[] │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files │ 9 out of 43 │ ... ``` -Verbose output shows each field as a record with its `required` and `allowed` glob patterns. 
Fields with `required = []` omit the required line. Nullable fields show `nullable: true`. +The field tables are identical in both modes — verbose only adds the step lines showing processing times. ## Examples @@ -123,23 +145,13 @@ Nothing is written — the output shows the same discovery table, followed by `( ### Exclude bare files -By default, files without frontmatter are included in the scan. This affects field counts — a bare file at the root means `title` appears in 37/43 files instead of 37/37: +By default, files without frontmatter are included in the scan. This affects field counts — a bare file at the root means `title` appears in `37 out of 43` files instead of `37 out of 37`: ```bash mdvs init example_kb --dry-run --force --ignore-bare-files ``` -``` -Initialized 37 files — 37 field(s) (dry run) - -╭─────────────────────┬───────────────────────┬───────┬────────────────────────╮ -│ ... │ │ │ │ -│ "title" │ String │ 37/37 │ │ -│ ... │ │ │ │ -╰─────────────────────┴───────────────────────┴───────┴────────────────────────╯ -``` - -With `--ignore-bare-files`, only 37 files are scanned and `title` becomes 37/37. This also affects the inferred `required` patterns — without bare files diluting the counts, more fields can be required in broader paths. +With `--ignore-bare-files`, only 37 files are scanned. The `files` row for `title` becomes `37 out of 37`. This also affects the inferred `required` patterns — without bare files diluting the counts, more fields can be required in broader paths. 
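The arithmetic behind the dilution is simple. A sketch using the numbers from the example above (illustrative only, not mdvs code):

```python
# Illustrative arithmetic for --ignore-bare-files (not mdvs's code): a bare
# file (no frontmatter) counts toward the denominator unless it is excluded.

files_with_title = 37
bare_files = 6          # markdown files with no frontmatter at all
total_files = files_with_title + bare_files

print(f"{files_with_title} out of {total_files}")               # 37 out of 43
print(f"{files_with_title} out of {total_files - bare_files}")  # 37 out of 37
```

With the bare files excluded, `title` is present in every scanned file, which is what lets broader paths qualify as `required`.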
## Errors diff --git a/book/src/commands/search.md b/book/src/commands/search.md index 6262ae4..3d6615f 100644 --- a/book/src/commands/search.md +++ b/book/src/commands/search.md @@ -77,25 +77,58 @@ See [Search Guide](../search-guide.md) for the full `--where` reference, includi ### Compact (default) ```bash -mdvs search "experiment" example_kb +mdvs search "experiment" example_kb -n 3 ``` +A header table shows the query metadata, followed by one key-value table per hit numbered `#1`, `#2`, etc. Each hit includes the file, similarity score, line range, and the best-matching chunk text: + ``` -Searched "experiment" — 10 hits - -╭───────────┬────────────────────────────────────────────────────┬─────────────╮ -│ 1 │ "projects/archived/gamma/lessons-learned.md" │ 0.487 │ -│ 2 │ "blog/published/2031/founding-story.md" │ 0.470 │ -│ 3 │ "projects/archived/gamma/post-mortem.md" │ 0.457 │ -│ 4 │ "projects/alpha/notes/experiment-3.md" │ 0.420 │ -│ 5 │ "blog/drafts/grant-ideas.md" │ 0.406 │ -│ ... │ │ │ -╰───────────┴────────────────────────────────────────────────────┴─────────────╯ -``` +Searched "experiment" — 3 hits -Each row shows rank, filename, and cosine similarity score. 
+┌──────────────────────────┬───────────────────────────────────────────────────┐ +│ query │ experiment │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ model │ minishlab/potion-base-8M │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ limit │ 3 │ +└──────────────────────────┴───────────────────────────────────────────────────┘ + +┌ #1 ──────────────────────┬───────────────────────────────────────────────────┐ +│ file │ projects/archived/gamma/lessons-learned.md │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ score │ 0.487 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ lines │ 26-28 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ text │ ## On REMO │ +│ │ │ +│ │ REMO's environmental monitoring data from the out │ +│ │ door tests was the most useful output of the enti │ +│ │ re project. ... │ +└──────────────────────────┴───────────────────────────────────────────────────┘ + +┌ #2 ──────────────────────┬───────────────────────────────────────────────────┐ +│ file │ blog/published/2031/founding-story.md │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ score │ 0.470 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ lines │ 21-21 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ text │ We are a small lab and we intend to stay small... 
│ +└──────────────────────────┴───────────────────────────────────────────────────┘ + +┌ #3 ──────────────────────┬───────────────────────────────────────────────────┐ +│ file │ projects/archived/gamma/post-mortem.md │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ score │ 0.457 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ lines │ 11-21 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ text │ # Project Gamma — Post-Mortem ... │ +└──────────────────────────┴───────────────────────────────────────────────────┘ +``` -With `--where` filtering: +With `--where` filtering, only files matching the SQL clause are included: ```bash mdvs search "experiment" example_kb --where "status = 'active'" -n 5 @@ -104,47 +137,44 @@ mdvs search "experiment" example_kb --where "status = 'active'" -n 5 ``` Searched "experiment" — 3 hits -╭───────────────┬──────────────────────────────────────────┬───────────────────╮ -│ 1 │ "projects/alpha/overview.md" │ 0.391 │ -│ 2 │ "projects/beta/overview.md" │ 0.358 │ -│ 3 │ "projects/alpha/budget.md" │ 0.001 │ -╰───────────────┴──────────────────────────────────────────┴───────────────────╯ +┌──────────────────────────┬───────────────────────────────────────────────────┐ +│ query │ experiment │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ model │ minishlab/potion-base-8M │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ limit │ 5 │ +└──────────────────────────┴───────────────────────────────────────────────────┘ + +┌ #1 ──────────────────────┬───────────────────────────────────────────────────┐ +│ file │ projects/alpha/overview.md │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ score │ 0.391 │ +... 
``` ### Verbose (`-v`) +Verbose output adds pipeline timing lines before the result: + ```bash mdvs search "experiment" example_kb -v -n 3 ``` ``` +Read config: example_kb/mdvs.toml (2ms) +Scan: 43 files (2ms) +... +Load model: minishlab/potion-base-8M (22ms) +Embed query: "experiment" (0ms) +Execute search: 3 hits (5ms) Searched "experiment" — 3 hits -╭──────────┬─────────────────────────────────────────────────────┬─────────────╮ -│ 1 │ "projects/archived/gamma/lessons-learned.md" │ 0.487 │ -├──────────┴─────────────────────────────────────────────────────┴─────────────┤ -│ lines 17-19: │ -│ │ -│ ## On Timelines │ -╰──────────────────────────────────────────────────────────────────────────────╯ -╭────────────┬─────────────────────────────────────────────────┬───────────────╮ -│ 2 │ "blog/published/2031/founding-story.md" │ 0.470 │ -├────────────┴─────────────────────────────────────────────────┴───────────────┤ -│ lines 11-11: │ -│ # How Prismatiq Started │ -╰──────────────────────────────────────────────────────────────────────────────╯ -╭───────────┬──────────────────────────────────────────────────┬───────────────╮ -│ 3 │ "projects/archived/gamma/post-mortem.md" │ 0.457 │ -├───────────┴──────────────────────────────────────────────────┴───────────────┤ -│ lines 1-11: │ -│ --- │ -│ title: "Project Gamma — Post-Mortem" │ -│ ... │ -╰──────────────────────────────────────────────────────────────────────────────╯ -3 hits | model: "minishlab/potion-base-8M" | limit: 10 +┌──────────────────────────┬───────────────────────────────────────────────────┐ +│ query │ experiment │ +... ``` -Verbose output expands each result into a record showing the best-matching chunk text with its line range. The footer shows total hits, model name, and limit. +The hit tables are identical in both modes — verbose only adds the step lines showing processing times. 
## Exit codes diff --git a/book/src/commands/update.md b/book/src/commands/update.md index 2d8ff18..c24a7dd 100644 --- a/book/src/commands/update.md +++ b/book/src/commands/update.md @@ -54,78 +54,68 @@ All other config sections (`[scan]`, `[embedding_model]`, `[chunking]`, `[search When the schema is already up to date: ``` -Scanned 43 files — no changes (dry run) +Scanned 43 files — no changes (37 unchanged) (dry run) ``` -When new fields are discovered: +When new fields are discovered, they appear in an "Added" section with the same key-value format as [init](./init.md): ``` -Scanned 44 files — 1 field(s) changed (dry run) +Scanned 44 files — 1 field(s) changed (37 unchanged) (dry run) -╭────────────────────────┬───────────────────┬───────────────────┬─────────────╮ -│ "category" │ added │ String │ │ -╰────────────────────────┴───────────────────┴───────────────────┴─────────────╯ +Added (1): +┌ category ────────────────┬───────────────────────────────────────────────────┐ +│ type │ String │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files │ 3 out of 44 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ nullable │ false │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ required │ (none) │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ allowed │ projects/alpha/notes/** │ +└──────────────────────────┴───────────────────────────────────────────────────┘ ``` -When `--reinfer` detects a type change: +When `--reinfer` detects a type change, the "Changed" section shows old and new values with an arrow: ``` -Scanned 44 files — 2 field(s) changed (dry run) - -╭────────────────────────┬───────────────────┬───────────────────┬─────────────╮ -│ "category" │ added │ String │ │ -╰────────────────────────┴───────────────────┴───────────────────┴─────────────╯ 
-╭───────────────────────────────────────────┬──────────────────────────────────╮ -│ "drift_rate" │ type │ -╰───────────────────────────────────────────┴──────────────────────────────────╯ +Scanned 43 files — 1 field(s) changed (36 unchanged) + +Changed (1): +┌ drift_rate ──────────────┬───────────────────────────────────────────────────┐ +│ type │ Float → String │ +└──────────────────────────┴───────────────────────────────────────────────────┘ ``` -When a reinferred field no longer exists: +When a reinferred field no longer exists in any file: ``` -Scanned 43 files — 1 field(s) changed (dry run) +Scanned 43 files — 1 field(s) changed (36 unchanged) -╭────────────────────────────────────────┬─────────────────────────────────────╮ -│ "category" │ removed │ -╰────────────────────────────────────────┴─────────────────────────────────────╯ +Removed (1): +┌ category ────────────────┬───────────────────────────────────────────────────┐ +│ previously allowed │ projects/alpha/notes/** │ +└──────────────────────────┴───────────────────────────────────────────────────┘ ``` ### Verbose (`-v`) -Added fields show the inferred path patterns: - -``` -Scanned 44 files — 1 field(s) changed (dry run) - -╭─────────────────────────────┬───────────────────────┬────────────────────────╮ -│ "category" │ added │ String │ -├─────────────────────────────┴───────────────────────┴────────────────────────┤ -│ found in: │ -│ - "projects/alpha/notes/**" │ -╰──────────────────────────────────────────────────────────────────────────────╯ -``` - -Changed fields show old and new values for each aspect that differs: +Verbose output adds pipeline timing lines before the result: ``` -╭────────────────────────┬──────────────────┬────────────────┬─────────────────╮ -│ field │ aspect │ old │ new │ -│ "drift_rate" │ type │ Float │ String │ -╰────────────────────────┴──────────────────┴────────────────┴─────────────────╯ -``` +Read config: example_kb/mdvs.toml (2ms) +Scan: 44 files (3ms) +Infer: 38 
field(s) (0ms) +Write config: example_kb/mdvs.toml (1ms) +Scanned 44 files — 1 field(s) changed (37 unchanged) -Removed fields show where they were previously allowed: - -``` -╭──────────────────────────────┬───────────────────────────┬───────────────────╮ -│ "category" │ removed │ │ -├──────────────────────────────┴───────────────────────────┴───────────────────┤ -│ previously in: │ -│ - "projects/**" │ -╰──────────────────────────────────────────────────────────────────────────────╯ +Added (1): +┌ category ────────────────┬───────────────────────────────────────────────────┐ +│ type │ String │ +... ``` -Verbose output also shows the pipeline steps before the result (Read config, Scan, Infer, Write config, etc.). +The field tables are identical in both modes — verbose only adds the step lines showing processing times. ## Exit codes diff --git a/book/src/getting-started.md b/book/src/getting-started.md index 7f5c5b8..686090b 100644 --- a/book/src/getting-started.md +++ b/book/src/getting-started.md @@ -27,56 +27,58 @@ Run `mdvs init` on the example directory: mdvs init example_kb ``` -mdvs scans every markdown file, extracts frontmatter, and infers a typed schema: +mdvs scans every markdown file, extracts frontmatter, and infers a typed schema. 
Each discovered field is shown as its own key-value table: ``` Initialized 43 files — 37 field(s) -╭─────────────────────┬───────────────────────┬───────┬────────────────────────╮ -│ "action_items" │ String[] │ 9/43 │ │ -│ "algorithm" │ String │ 2/43 │ │ -│ "ambient_humidity" │ Float │ 1/43 │ │ -│ "approved_by" │ String │ 4/43 │ │ -│ "attendees" │ String[] │ 10/43 │ │ -│ "author" │ String │ 18/43 │ │ -│ "author's_note" │ String │ 3/43 │ ' → '' in --where │ -│ "calibration" │ {adjusted: {intensity │ 2/43 │ │ -│ │ : Float, wavelength: │ │ │ -│ │ Float}, baseline: {in │ │ │ -│ │ tensity: Float, notes │ │ │ -│ │ : String, wavelength: │ │ │ -│ │ Float}} │ │ │ -│ "commission_date" │ String │ 1/43 │ │ -│ "convergence_ms" │ Integer │ 1/43 │ │ -│ "dataset" │ String │ 2/43 │ │ -│ "date" │ String │ 17/43 │ │ -│ "draft" │ Boolean │ 8/43 │ │ -│ "drift_rate" │ Float? │ 3/43 │ │ -│ "duration_minutes" │ Integer │ 10/43 │ │ -│ "email" │ String │ 4/43 │ │ -│ "equipment_id" │ String │ 2/43 │ │ -│ "firmware_version" │ String │ 1/43 │ │ -│ "joined" │ String │ 5/43 │ │ -│ "lab section" │ String │ 4/43 │ use "field name" in -- │ -│ │ │ │ where │ -│ "last_reviewed" │ String │ 4/43 │ │ -│ "notes"v2"" │ Boolean │ 1/43 │ " → "" in --where │ -│ "observation_notes" │ String │ 1/43 │ │ -│ "priority" │ String │ 7/43 │ │ -│ "project" │ String │ 4/43 │ │ -│ "publications" │ Integer │ 2/43 │ │ -│ "review_score" │ String? 
│ 1/43 │ │ -│ "role" │ String │ 5/43 │ │ -│ "sample_count" │ Integer │ 3/43 │ │ -│ "sensor_type" │ String │ 3/43 │ │ -│ "specialization" │ String │ 2/43 │ │ -│ "status" │ String │ 17/43 │ │ -│ "tags" │ String[] │ 16/43 │ │ -│ "title" │ String │ 37/43 │ │ -│ "unit_id" │ String │ 1/43 │ │ -│ "version" │ String │ 4/43 │ │ -│ "wavelength_nm" │ Float │ 3/43 │ │ -╰─────────────────────┴───────────────────────┴───────┴────────────────────────╯ +┌ draft ───────────────────┬───────────────────────────────────────────────────┐ +│ type │ Boolean │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files │ 8 out of 43 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ nullable │ false │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ required │ blog/** │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ allowed │ blog/** │ +└──────────────────────────┴───────────────────────────────────────────────────┘ + +... + +┌ sensor_type ─────────────┬───────────────────────────────────────────────────┐ +│ type │ String │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files │ 3 out of 43 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ nullable │ false │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ required │ projects/alpha/notes/** │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ allowed │ projects/alpha/notes/** │ +└──────────────────────────┴───────────────────────────────────────────────────┘ + +... 
+ +┌ title ───────────────────┬───────────────────────────────────────────────────┐ +│ type │ String │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files │ 37 out of 43 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ nullable │ false │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ required │ blog/** │ +│ │ meetings/** │ +│ │ people/** │ +│ │ projects/** │ +│ │ reference/protocols/** │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ allowed │ blog/** │ +│ │ meetings/** │ +│ │ people/** │ +│ │ projects/** │ +│ │ reference/protocols/** │ +└──────────────────────────┴───────────────────────────────────────────────────┘ Initialized mdvs in 'example_kb' ``` @@ -87,7 +89,7 @@ That command did three things: 2. **Inferred** 37 typed fields — strings, integers, floats, booleans, arrays, even a nested object (`calibration`) 3. **Wrote** `mdvs.toml` with the inferred schema -Notice the third column: `draft` appears in 8/43 files — all in `blog/`. `sensor_type` in 3/43 — all in `projects/alpha/notes/`. mdvs captured not just the types, but *where* each field belongs. Run `mdvs init example_kb -v` to see the full path patterns. +Notice the `files` row: `draft` appears in 8 out of 43 files — all in `blog/`. `sensor_type` in 3 out of 43 — all in `projects/alpha/notes/`. mdvs captured not just the types, but *where* each field belongs, via the `required` and `allowed` glob patterns. 
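Under a simplified reading, a trailing `/**` in an `allowed` pattern means "this directory and everything below it". A hypothetical sketch of that check (mdvs's real glob matching may be richer; `allowed_in` is an illustrative helper, not part of mdvs):

```python
# Hypothetical simplification of the allowed-pattern check (not mdvs's
# matcher): "dir/**" matches any path under dir/, and the bare "**" scan
# glob matches everything.

def allowed_in(path, patterns):
    for pat in patterns:
        if pat == "**":
            return True
        if pat.endswith("/**") and path.startswith(pat[:-2]):
            return True
    return False

print(allowed_in("blog/drafts/grant-ideas.md", ["blog/**"]))  # True
print(allowed_in("people/remo.md", ["blog/**"]))              # False
```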
Here's what a field definition looks like in `mdvs.toml`: @@ -136,24 +138,52 @@ mdvs check example_kb ``` Checked 43 files — 4 violation(s) -╭───────────────────────────────┬───────────────────────────┬──────────────────╮ -│ "convergence_ms" │ WrongType │ 1 file │ -│ "drift_rate" │ NullNotAllowed │ 1 file │ -│ "firmware_version" │ Disallowed │ 1 file │ -│ "observation_notes" │ MissingRequired │ 2 files │ -╰───────────────────────────────┴───────────────────────────┴──────────────────╯ +Violations (4): +┌ convergence_ms ──────────┬───────────────────────────────────────────────────┐ +│ kind │ Wrong type │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ rule │ type Boolean │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files │ projects/beta/notes/initial-findings.md (got Inte │ +│ │ ger) │ +└──────────────────────────┴───────────────────────────────────────────────────┘ + +┌ drift_rate ──────────────┬───────────────────────────────────────────────────┐ +│ kind │ Null value not allowed │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ rule │ not nullable │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files │ projects/alpha/notes/experiment-2.md │ +└──────────────────────────┴───────────────────────────────────────────────────┘ + +┌ firmware_version ────────┬───────────────────────────────────────────────────┐ +│ kind │ Not allowed │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ rule │ allowed in ["people/interns/**"] │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files │ people/remo.md │ +└──────────────────────────┴───────────────────────────────────────────────────┘ + +┌ observation_notes ───────┬───────────────────────────────────────────────────┐ +│ kind │ Missing required │ 
+├──────────────────────────┼───────────────────────────────────────────────────┤ +│ rule │ required in ["projects/alpha/notes/**"] │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ files │ projects/alpha/notes/experiment-1.md │ +│ │ projects/alpha/notes/experiment-2.md │ +└──────────────────────────┴───────────────────────────────────────────────────┘ ``` Four violation types, each catching a different kind of problem: | Violation | Meaning | |---|---| -| `MissingRequired` | A file in a required path is missing the field | -| `WrongType` | The value doesn't match the declared type | -| `NullNotAllowed` | The field is present but `null`, and `nullable` is `false` | -| `Disallowed` | The field appears in a file outside its `allowed` paths | +| `Missing required` | A file in a required path is missing the field | +| `Wrong type` | The value doesn't match the declared type | +| `Null value not allowed` | The field is present but `null`, and `nullable` is `false` | +| `Not allowed` | The field appears in a file outside its `allowed` paths | -This is the compact output — it groups violations by field. Add `-v` for verbose output showing every affected file and the specific value that caused the violation. See [check](./commands/check.md) for the full reference. +Each violation table shows the field name, the kind of violation, the violated rule, and the affected files. See [check](./commands/check.md) for the full reference. Revert your changes to `mdvs.toml` before continuing (or re-run `mdvs init example_kb --force` to regenerate it). 
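The four kinds correspond to four independent checks on each field in each file. A toy classifier showing how they relate (the names `MISSING` and `check_field` are hypothetical, not mdvs's API):

```python
# Toy version of the four violation checks (not mdvs's implementation).
# `rule` holds the declared schema for one field; `value` is that field's
# frontmatter value in one file, with MISSING meaning "field absent".

MISSING = object()

def check_field(value, rule, in_required_path, in_allowed_path):
    if value is MISSING:
        return "Missing required" if in_required_path else None
    if not in_allowed_path:
        return "Not allowed"
    if value is None:
        return None if rule["nullable"] else "Null value not allowed"
    if not isinstance(value, rule["py_type"]):
        return "Wrong type"
    return None  # no violation

rule = {"py_type": float, "nullable": False}
print(check_field(None, rule, True, True))    # Null value not allowed
print(check_field("fast", rule, True, True))  # Wrong type
```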
@@ -168,29 +198,36 @@ mdvs search "calibration" example_kb ``` ``` -Built index — 43 files, 59 chunks (full rebuild) - -╭─────────────────────────┬─────────────────────────┬──────────────────────────╮ -│ embedded │ 43 files │ 59 chunks │ -╰─────────────────────────┴─────────────────────────┴──────────────────────────╯ - Searched "calibration" — 10 hits -╭────────────┬──────────────────────────────────────────────────┬──────────────╮ -│ 1 │ "projects/alpha/meetings/2031-06-15.md" │ 0.585 │ -│ 2 │ "projects/alpha/meetings/2031-10-10.md" │ 0.501 │ -│ 3 │ "projects/alpha/notes/experiment-1.md" │ 0.478 │ -│ 4 │ "blog/drafts/upcoming-talk.md" │ 0.470 │ -│ 5 │ "blog/published/2032/q1/new-equipment.md" │ 0.466 │ -│ 6 │ "meetings/all-hands/2032-01.md" │ 0.465 │ -│ 7 │ "projects/alpha/overview.md" │ 0.462 │ -│ 8 │ "projects/beta/overview.md" │ 0.449 │ -│ 9 │ "reference/tools.md" │ 0.445 │ -│ 10 │ "people/remo.md" │ 0.437 │ -╰────────────┴──────────────────────────────────────────────────┴──────────────╯ +┌──────────────────────────┬───────────────────────────────────────────────────┐ +│ query │ calibration │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ model │ minishlab/potion-base-8M │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ limit │ 10 │ +└──────────────────────────┴───────────────────────────────────────────────────┘ + +┌ #1 ──────────────────────┬───────────────────────────────────────────────────┐ +│ file │ projects/alpha/meetings/2031-06-15.md │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ score │ 0.585 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ lines │ 14-22 │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ text │ # Alpha Kickoff — Calibration Campaign ... 
│ +└──────────────────────────┴───────────────────────────────────────────────────┘ + +┌ #2 ──────────────────────┬───────────────────────────────────────────────────┐ +│ file │ projects/alpha/meetings/2031-10-10.md │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ score │ 0.501 │ +... + +... ``` -Results are ranked by semantic similarity — not keyword matching. The score column is cosine similarity (higher means more similar). +Results are ranked by semantic similarity — not keyword matching. The `score` is cosine similarity (higher means more similar). The `text` row shows the best-matching chunk from each file. ### Filtering with `--where` @@ -203,11 +240,21 @@ mdvs search "quantum" example_kb --where "status = 'active'" ``` Searched "quantum" — 3 hits -╭───────────────┬──────────────────────────────────────────┬───────────────────╮ -│ 1 │ "projects/beta/overview.md" │ 0.123 │ -│ 2 │ "projects/alpha/overview.md" │ 0.101 │ -│ 3 │ "projects/alpha/budget.md" │ 0.055 │ -╰───────────────┴──────────────────────────────────────────┴───────────────────╯ +┌──────────────────────────┬───────────────────────────────────────────────────┐ +│ query │ quantum │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ model │ minishlab/potion-base-8M │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ limit │ 10 │ +└──────────────────────────┴───────────────────────────────────────────────────┘ + +┌ #1 ──────────────────────┬───────────────────────────────────────────────────┐ +│ file │ projects/beta/overview.md │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ score │ 0.123 │ +... + +... ``` Only files with `status: active` in their frontmatter are included. The `--where` clause supports any SQL expression — boolean logic, comparisons, array functions, and more. See the [Search Guide](./search-guide.md) for the full syntax. 
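For reference, cosine similarity is the standard formula `dot(a, b) / (|a| * |b|)` applied to the query embedding and each chunk embedding. A small self-contained sketch (generic math, not mdvs-specific code):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical directions score 1.0; orthogonal directions score 0.0.
print(round(cosine_similarity([1.0, 0.0], [1.0, 0.0]), 3))  # 1.0
print(round(cosine_similarity([1.0, 0.0], [0.0, 1.0]), 3))  # 0.0
```

A low but positive score (like the `0.055` above) means the chunk's embedding points in nearly the same direction as nothing in the query, yet still passed the `--where` filter.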
diff --git a/book/src/search-guide.md b/book/src/search-guide.md index da63a9f..521602c 100644 --- a/book/src/search-guide.md +++ b/book/src/search-guide.md @@ -28,10 +28,21 @@ mdvs search "experiment" --where "wavelength_nm BETWEEN 600 AND 800" ``` Searched "experiment" — 2 hits -╭────────────┬─────────────────────────────────────────────────┬───────────────╮ -│ 1 │ "projects/alpha/notes/experiment-3.md" │ 0.420 │ -│ 2 │ "projects/alpha/notes/experiment-1.md" │ 0.356 │ -╰────────────┴─────────────────────────────────────────────────┴───────────────╯ +┌──────────────────────────┬───────────────────────────────────────────────────┐ +│ query │ experiment │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ model │ minishlab/potion-base-8M │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ limit │ 10 │ +└──────────────────────────┴───────────────────────────────────────────────────┘ + +┌ #1 ──────────────────────┬───────────────────────────────────────────────────┐ +│ file │ projects/alpha/notes/experiment-3.md │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ score │ 0.420 │ +... + +... 
``` ### Boolean @@ -69,13 +80,23 @@ mdvs search "calibration" --where "array_has(tags, 'calibration')" ``` ``` -Searched "calibration" — 3 hits +Searched "calibration" — 4 hits + +┌──────────────────────────┬───────────────────────────────────────────────────┐ +│ query │ calibration │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ model │ minishlab/potion-base-8M │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ limit │ 10 │ +└──────────────────────────┴───────────────────────────────────────────────────┘ -╭────────────┬─────────────────────────────────────────────────┬───────────────╮ -│ 1 │ "projects/alpha/notes/experiment-1.md" │ 0.478 │ -│ 2 │ "projects/alpha/overview.md" │ 0.462 │ -│ 3 │ "projects/alpha/notes/experiment-3.md" │ 0.424 │ -╰────────────┴─────────────────────────────────────────────────┴───────────────╯ +┌ #1 ──────────────────────┬───────────────────────────────────────────────────┐ +│ file │ projects/alpha/notes/experiment-1.md │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ score │ 0.478 │ +... + +... 
``` The SQL-standard `ANY` syntax also works: @@ -107,13 +128,23 @@ mdvs search "experiment" --where "filepath LIKE 'projects/alpha/%'" ``` ``` -Searched "experiment" — 3 hits +Searched "experiment" — 8 hits + +┌──────────────────────────┬───────────────────────────────────────────────────┐ +│ query │ experiment │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ model │ minishlab/potion-base-8M │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ limit │ 10 │ +└──────────────────────────┴───────────────────────────────────────────────────┘ + +┌ #1 ──────────────────────┬───────────────────────────────────────────────────┐ +│ file │ projects/alpha/notes/experiment-3.md │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ score │ 0.420 │ +... -╭────────────┬─────────────────────────────────────────────────┬───────────────╮ -│ 1 │ "projects/alpha/notes/experiment-3.md" │ 0.420 │ -│ 2 │ "projects/alpha/overview.md" │ 0.391 │ -│ 3 │ "projects/alpha/meetings/2031-08-20.md" │ 0.386 │ -╰────────────┴─────────────────────────────────────────────────┴───────────────╯ +... 
``` File paths are stored as relative paths (e.g., `projects/alpha/notes/experiment-1.md`), so use `LIKE` with `%` for path prefix matching: @@ -140,10 +171,21 @@ mdvs search "sensor" --where "calibration['baseline']['wavelength'] > 600" ``` Searched "sensor" — 2 hits -╭────────────┬─────────────────────────────────────────────────┬───────────────╮ -│ 1 │ "projects/alpha/notes/experiment-2.md" │ 0.414 │ -│ 2 │ "projects/alpha/notes/experiment-1.md" │ 0.362 │ -╰────────────┴─────────────────────────────────────────────────┴───────────────╯ +┌──────────────────────────┬───────────────────────────────────────────────────┐ +│ query │ sensor │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ model │ minishlab/potion-base-8M │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ limit │ 10 │ +└──────────────────────────┴───────────────────────────────────────────────────┘ + +┌ #1 ──────────────────────┬───────────────────────────────────────────────────┐ +│ file │ projects/alpha/notes/experiment-2.md │ +├──────────────────────────┼───────────────────────────────────────────────────┤ +│ score │ 0.414 │ +... + +... ``` The top-level field name (`calibration`) can be used bare. 
Only the nested access needs brackets: diff --git a/docs/spec/todos/TODO-0114.md b/docs/spec/todos/TODO-0114.md index d841288..0e82c30 100644 --- a/docs/spec/todos/TODO-0114.md +++ b/docs/spec/todos/TODO-0114.md @@ -1,6 +1,6 @@ --- id: 114 -title: "Auto-generate CLI output examples in mdBook with mdbook-cmdrun" +title: "Auto-generate CLI output examples in mdBook" status: todo priority: medium created: 2026-03-17 @@ -8,45 +8,91 @@ depends_on: [] blocks: [] --- -# TODO-0114: Auto-generate CLI output examples in mdBook with mdbook-cmdrun +# TODO-0114: Auto-generate CLI output examples in mdBook ## Summary -Use `mdbook-cmdrun` to auto-generate CLI output examples in the book by running `mdvs` commands against `example_kb/` at build time. This keeps book examples always in sync with actual tool output. - -## Details - -### Setup - -- Add `mdbook-cmdrun` as a preprocessor in `book.toml` -- Install `mdbook-cmdrun` in the GitHub Pages deploy workflow (TODO-0095) - -### Usage - -Replace hand-written output blocks in book pages with `cmdrun` directives: - -```markdown - -``` - -During `mdbook build`, the preprocessor runs the command and injects stdout into the page. 
- -### Scope - -- Identify all book pages with CLI output examples -- Replace static output with `cmdrun` directives where feasible -- Some examples may need to stay static (e.g., error examples, hypothetical output for features not yet implemented) - -### Considerations - -- `example_kb/` must be accessible from the book build directory — may need to adjust the working directory or use relative paths -- Build time increases since commands run during `mdbook build` -- CI workflow needs `mdvs` binary available (build it before `mdbook build`) -- Commands that require a built index (search, info) need a prior `mdvs build` step in CI -- Consider a `scripts/book-setup.sh` that runs `mdvs init` + `mdvs build` on `example_kb/` before `mdbook build` - -### Files - -- `book.toml` — add `mdbook-cmdrun` preprocessor config -- `book/src/commands/*.md` — replace static output with `cmdrun` directives -- `.github/workflows/deploy-book.yml` — install `mdbook-cmdrun`, build `mdvs`, run setup before `mdbook build` +Keep book output examples in sync with actual CLI behavior. A generation script runs `mdvs` commands against `example_kb/`, saves stdout to `book/src/generated/`, and book pages pull them in with `{{#include generated/...}}`. + +## Approach + +**Include-file pattern** (not mdbook-cmdrun): a shell script generates output files, book pages include them. Simpler, no extra preprocessor dependency, works with plain `mdbook build`. + +### Generated files + +- Live in `book/src/generated/` (committed, so local `mdbook build` works without running the script) +- Naming: `-.txt` (e.g. `init-compact.txt`, `check-violations.txt`, `search-where.txt`) +- Script is idempotent — re-running produces identical output if nothing changed +- PRs that change output show diffs in the generated files + +### Error examples + +Some pages show violation/error output. Two options: +1. 
**Fixture directory** (`book/fixtures/`) with markdown files + mdvs.toml designed to trigger specific violations +2. **Temp manipulation** in the script (copy example_kb, break something, capture output, clean up) + +Option 1 is more maintainable — the fixtures are version-controlled and reviewable. + +### Embedding model in CI + +Search/build/info-with-index examples need the model (~30MB). CI caches it between runs. Validation-layer commands (init, check, update, clean) don't need it. + +## Pages with output examples + +| Page | Commands shown | +|------|---------------| +| `commands/init.md` | init compact, verbose, dry-run | +| `commands/check.md` | check clean, with violations, verbose | +| `commands/build.md` | build compact, verbose, incremental | +| `commands/search.md` | search basic, --where, verbose | +| `commands/update.md` | update compact, verbose, dry-run | +| `commands/info.md` | info compact, verbose | +| `commands/clean.md` | clean compact, verbose | +| `introduction.md` | init | +| `getting-started.md` | init, check, search | +| `search-guide.md` | search with various --where filters | +| `concepts/validation.md` | check with violations | +| `recipes/obsidian.md` | init, check | + +## Plan + +### Wave 1: Infrastructure +- [ ] Create `scripts/generate-book-examples.sh` +- [ ] Create `book/src/generated/` directory +- [ ] Create `book/fixtures/` with markdown files + mdvs.toml for error examples +- [ ] Generate validation-layer outputs (init, check clean, check violations, update, clean, info without index) +- [ ] Generate search-layer outputs (build, search, info with index) +- [ ] Verify all generated files match current CLI output + +### Wave 2: Update command pages +- [ ] `commands/init.md` — replace static blocks with `{{#include}}` +- [ ] `commands/check.md` +- [ ] `commands/build.md` +- [ ] `commands/search.md` +- [ ] `commands/update.md` +- [ ] `commands/info.md` +- [ ] `commands/clean.md` +- [ ] Verify `mdbook build` produces correct HTML 
+
+### Wave 3: Update non-command pages
+- [ ] `introduction.md`
+- [ ] `getting-started.md`
+- [ ] `search-guide.md`
+- [ ] `concepts/validation.md`
+- [ ] `recipes/obsidian.md`
+- [ ] Verify `mdbook build` produces correct HTML
+
+### Wave 4: CI integration
+- [ ] Update `.github/workflows/book.yml` to build mdvs binary
+- [ ] Add model download + caching step
+- [ ] Run `scripts/generate-book-examples.sh` before `mdbook build`
+- [ ] Optional: CI check that generated files are up-to-date (fail if script produces different output than committed)
+
+## Files
+
+- `scripts/generate-book-examples.sh` — generation script
+- `book/src/generated/*.txt` — generated output files (committed)
+- `book/fixtures/` — error fixture files
+- `book/src/commands/*.md` — updated to use `{{#include}}`
+- `book/src/introduction.md`, `getting-started.md`, etc. — updated
+- `.github/workflows/book.yml` — CI integration

From a32a0ea92111b1ea6b95178a3e676de4fb43eabb Mon Sep 17 00:00:00 2001
From: edoch
Date: Sun, 29 Mar 2026 17:48:03 +0200
Subject: [PATCH 35/35] chore: add book serve target to Justfile

Co-Authored-By: Claude
---
 Justfile | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Justfile b/Justfile
index b0a01e4..da22bec 100644
--- a/Justfile
+++ b/Justfile
@@ -1,2 +1,5 @@
 mdvs *args:
     ./target/release/mdvs {{args}}
+
+book:
+    mdbook serve book/ --open
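For reviewers, the generation script that TODO-0114's plan calls for could look like the following minimal sketch. The script path, output directory, and `<command>-<variant>.txt` naming come from the TODO text; the exact `mdvs` invocations, the `capture` helper, and the `MDVS`/`OUT` environment overrides are assumptions for illustration.

```shell
#!/usr/bin/env sh
# Sketch of scripts/generate-book-examples.sh (per TODO-0114).
# Assumption: mdvs subcommands are invoked as shown in the book pages;
# output files follow the <command>-<variant>.txt naming scheme.
set -eu

MDVS="${MDVS:-./target/release/mdvs}"   # assumed binary location
OUT="${OUT:-book/src/generated}"        # committed output directory
mkdir -p "$OUT"

# capture NAME CMD... : run CMD and save its stdout as $OUT/NAME.txt,
# so re-running is idempotent when the CLI output has not changed.
capture() {
  name="$1"
  shift
  "$@" > "$OUT/$name.txt"
}

if [ -x "$MDVS" ]; then
  # Validation-layer examples (no embedding model required)
  capture init-compact "$MDVS" init example_kb
  capture check-clean "$MDVS" check example_kb
  # Search-layer examples (require the model and a built index)
  capture build-compact "$MDVS" build example_kb
  capture search-basic "$MDVS" search "calibration" example_kb
else
  echo "note: $MDVS not built; skipping generation" >&2
fi
```

In CI (Wave 4), the workflow would build `mdvs`, restore the cached model, and run this script before `mdbook build`; pages then pull the files in with e.g. `{{#include ../generated/search-basic.txt}}`, since mdBook resolves include paths relative to the including file.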