Skip to content

refactor: redesign output layer — Step tree, KeyValue tables, pipeline cleanup#21

Merged
edochi merged 35 commits intomainfrom
refactor/step-tree
Mar 29, 2026
Merged

refactor: redesign output layer — Step tree, KeyValue tables, pipeline cleanup#21
edochi merged 35 commits intomainfrom
refactor/step-tree

Conversation

@edochi
Copy link
Copy Markdown
Owner

@edochi edochi commented Mar 29, 2026

Summary

Complete rewrite of the output and pipeline layers:

  • Step tree architecture: Replaced ad-hoc command output with CommandResult { steps, result, elapsed_ms }. Compact/verbose is now a rendering choice, not a type choice — ~1,300 lines of CompactOutcome boilerplate deleted.
  • Pipeline cleanup: Inlined all src/pipeline/ wrappers into commands, then deleted the directory. Each command now reads config once, scans once, passes data forward — no more redundant reads.
  • KeyValue table output: All 7 commands redesigned to use consistent two-column key-value tables (tabled with Style::modern(), field names on top border via LineText, 1/3 + 2/3 column widths).
  • Bug fixes: Chunk line numbers now offset by frontmatter length (TODO-0142), NullNotAllowed dedup fix, field sorting in mdvs.toml.
  • Book updated: All 38 static output examples across 9 book pages updated to match the new format.

35 commits, 87 files changed. 315 tests passing, clippy clean.

Test plan

  • cargo test — 315 tests passing
  • cargo clippy — clean
  • mdbook build book/ — builds without errors
  • mdvs init example_kb --forcemdvs build --forcemdvs search "experiment" — full pipeline works
  • mdvs check example_kb — no violations
  • Verified chunk text no longer includes frontmatter (TODO-0142 fix)
  • Compact vs verbose output verified for all 7 commands
  • JSON output (-o json) unchanged in structure

edochi and others added 30 commits March 19, 2026 15:01
TODO-0119 defines the unified Step<O> tree architecture that replaces
the current three-layer output model (ProcessOutput + Result +
CommandOutput). Reviewed through 6 rounds of design analysis.

TODOs 0120-0131 break the implementation into phased work:
infrastructure types, command conversions, and cleanup.
TODOs 0132-0133 track deferred macro automation.

Supersedes TODO-0110 (recursive output architecture).
TODOs 0122, 0125-0131 now have checklists tracking incremental
progress. Outcome variants are added per-command conversion phase.
main.rs dispatch updated per-command. Pipeline deletion tracked
per-module.
New modules for the unified Step<O> tree architecture (TODO-0119):
- step.rs: generic Step<O>, StepOutcome<O>, StepError, has_failed/has_violations,
  custom Serialize, Render impls, migration helpers (from_pipeline_result)
- block.rs: Block enum (Line/Table/Section), TableStyle, Render trait
- render.rs: format_text and format_markdown shared formatters
- outcome/: Outcome + CompactOutcome enums with all leaf and command
  outcome structs, compact counterparts, From impls, and Render impls

Implements TODOs 0120-0124 and outcome structs from TODO-0122.
Rewrite all commands to return Step<Outcome> instead of *CommandOutput:
- clean, info, check, init, update, build, search
- main.rs dispatch uses Step rendering + to_compact()
- Delete all *ProcessOutput, *CommandOutput, *Result structs
- Delete CommandOutput trait and format_json_compact from output.rs
- Add BuildFileDetail, FieldViolationCompact, NewFieldCompact,
  DiscoveredFieldCompact, ChangedFieldCompact, RemovedFieldCompact
  to output.rs
- Add Clone derives on FieldViolation, ViolatingFile, NewField

Net -913 lines across commands. Implements TODOs 0125-0130.
…eDetail

- Delete src/pipeline/delete_index.rs (clean now calls core functions directly)
- Move BuildFileDetail from pipeline/write_index.rs to output.rs
- Update imports in pipeline/classify.rs, pipeline/embed.rs, outcome/commands/build.rs

First pipeline module deleted. Part of TODO-0131 wave 1-2.
Mark TODOs 0120-0130 as done. Update TODO-0122 incremental checklists.
Restructure TODO-0131 into 7 waves for full pipeline cleanup.
…elpers

All 7 commands now call core functions directly (MdvsToml::read,
ScannedFiles::scan, check::validate, Backend methods, Embedder::load,
etc.) instead of going through pipeline wrappers.

- Waves 2-6 of TODO-0131: info, init, update, check, build, search
  all pipeline-free
- classify_files() and embed_file() logic moved to build.rs
- validate_where_clause() and IndexData moved to search.rs
- Migration helpers removed from step.rs (from_pipeline_result,
  convert_error_kind, has_failed_step compat method)
- Skipped padding removed from all error paths (TODO-0135)
- Auto-update output attached as substep in check.rs

Co-Authored-By: Claude <noreply@anthropic.com>
All pipeline wrapper modules removed — no command imports from
pipeline anymore. Deletes 12 files (1,318 lines): mod.rs, classify.rs,
embed.rs, execute_search.rs, infer.rs, load_model.rs, read_config.rs,
read_index.rs, scan.rs, validate.rs, write_config.rs, write_index.rs.

Co-Authored-By: Claude <noreply@anthropic.com>
TODO-0134: Step tree post-migration cleanup — unused imports,
missing unit tests for moved private functions.

TODO-0135: Skipped padding removal design notes — documents the
architectural decision and remaining variable-length substep concerns.

Co-Authored-By: Claude <noreply@anthropic.com>
Commands no longer call other commands' run() as black boxes.
Instead, they share data (config, scanned files, embedder) directly.

- check: inlines update logic (infer + write config) after scan,
  reuses scanned files for validate. Now sync (no async needed).
- build: extracts build_core() shared function, inlines auto-update
  after scan. mutate_config made pub(crate).
- search: calls build_core() directly instead of build::run(),
  reuses embedder from build for query embedding.

Before: check = 3 config reads + 2 scans; search = 5+ reads + 3+ scans + 2 model loads.
After: each command reads config once, scans once, loads model once.

Co-Authored-By: Claude <noreply@anthropic.com>
TODO-0136 tracks the three waves of inlining subcommand logic to
eliminate redundant config reads and directory scans.

Justfile provides a convenience wrapper for running mdvs against
example_kb via the release binary.

Co-Authored-By: Claude <noreply@anthropic.com>
…lable fields

check_required_fields() emitted NullNotAllowed for null values on
required non-nullable fields, but check_field_values() already catches
this case. Result: same violation appeared twice in output.

Fix: check_required_fields() now only emits MissingRequired (field
absent entirely). Null-on-non-nullable is handled exclusively by
check_field_values().

Adds test: null_on_required_non_nullable_produces_single_violation.

Co-Authored-By: Claude <noreply@anthropic.com>
Simplify the recursive Step<O> tree into a flat steps + result
structure. Deletes CompactOutcome (~1,000 lines), fixes double
outcome nesting in JSON, makes compact/verbose a rendering choice
rather than a type choice.

Co-Authored-By: Claude <noreply@anthropic.com>
Replace recursive Step<O> { substeps, outcome: StepOutcome<O> } with
flat CommandResult { steps: Vec<StepEntry>, result, elapsed_ms }.

- Delete CompactOutcome enum + all 20 compact structs + From impls +
  compact Render impls (~1,300 lines removed)
- Delete compact output types from output.rs
- New types: CommandResult, StepEntry, ProcessStep, FailedStep
- Compact = command outcome only; verbose = steps + command outcome
- Compact JSON is bare result struct; verbose JSON has steps array
- No double "outcome" nesting in JSON
- Errors force verbose output

Co-Authored-By: Claude <noreply@anthropic.com>
- 0131: pipeline cleanup complete (all 7 waves)
- 0132: subsumed by 0137 (CompactOutcome deleted, no macro needed)
- 0135: subsumed by 0137 (flat structure has no padding concept)
- 0136: inline auto-update/auto-build complete (all 3 waves)
- 0137: flatten Step tree complete (both waves)

Co-Authored-By: Claude <noreply@anthropic.com>
Add #[serde(untagged)] so Outcome variants serialize their inner
struct directly without the variant name wrapper. JSON output
changes from { "Check": { ... } } to { "files_checked": 43, ... }.

Co-Authored-By: Claude <noreply@anthropic.com>
- Sort [[fields.field]] alphabetically by name in MdvsToml::write()
- Add 19 unit tests for private functions: validate_where_clause,
  read_lines, has_violations, type_matches, matches_any_glob
- Remove unused test imports in update.rs
- Unify per-command fail helpers into shared CommandResult::failed()
  and CommandResult::failed_from_steps() constructors
- Delete 7 per-command helpers (fail_from_last, fail_early,
  fail_from_last_substep, fail_msg)

Co-Authored-By: Claude <noreply@anthropic.com>
… update 0100

Step tree architecture complete:
- 0119 (umbrella), 0122 (outcome enums): done
- 0133 (macro): subsumed by 0139 (shared fail helpers)
- 0134 (post-migration cleanup): done
- 0138 (untagged JSON): done
- 0139 (unify fail helpers): done
- 0100 (text output redesign): updated with current state

Co-Authored-By: Claude <noreply@anthropic.com>
Replace ColumnSpan with Panel::horizontal for Record-style detail
rows. Panel inserts spanning rows without inflating column width
calculation. Apply fixed 40/30/30 column proportions via per-column
Modify + Width::wrap/increase.

Fixes inconsistent column widths across per-field tables.

Co-Authored-By: Claude <noreply@anthropic.com>
Extract has_non_default_constraints() and field_detail_lines() helpers.
Fields with only allowed: ** and no required/nullable skip the detail
row entirely. Remove column headers (Field/Type/Files) — column
meaning is obvious from context, consistent with info and other
commands.

Co-Authored-By: Claude <noreply@anthropic.com>
Add TableStyle::KeyValue { title } — two-column key-value table with
item name on top border (LineText), horizontal separators between all
rows (Style::modern), fixed 50/50 column widths (per-column Modify).

Redesign init Render impl: each field rendered as a KeyValue table
with explicit properties (type, files, nullable, required, allowed).
No abbreviations, all values shown including defaults.

Co-Authored-By: Claude <noreply@anthropic.com>
Update init, check, and search output examples to match the new
KeyValue table design. Add scripts/test_kv_table.rs prototype.

Co-Authored-By: Claude <noreply@anthropic.com>
- Info: KeyValue tables for index metadata + per-field details,
  with "Index:" and "N fields:" section labels
- KeyValue: 1/3 + 2/3 column widths, blank line between tables,
  skip LineText when title is empty
- Init: blank line before first field table

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Added fields use same format as init (type, files, nullable,
required, allowed). Changed fields show each aspect as "old → new".
Removed fields show previously allowed globs. Section labels
"Added:", "Changed:", "Removed:" before each group.

Co-Authored-By: Claude <noreply@anthropic.com>
Violations as KeyValue tables with human-readable kind names
(Missing required, Wrong type, Not allowed, Null value not allowed).
New fields as KeyValue tables with status and file count. Section
labels "Violations (N):" and "New fields (N):" before each group.

Co-Authored-By: Claude <noreply@anthropic.com>
Info: add Config section with scan_glob and ignored_fields, shown
even when no index exists.

Update: add unchanged count to summary line ("37 unchanged").

Ensures text output renders the same data as compact JSON.

Co-Authored-By: Claude <noreply@anthropic.com>
All 12 JSON fields rendered as key-value rows in a single table.
Lists (embedded_files, removed_files, new_fields) use multi-line
cells. Empty lists show "(none)".

Co-Authored-By: Claude <noreply@anthropic.com>
Top-level table for query/model/limit. Per-hit KeyValue tables
with #N title, showing file, score, lines, text. All JSON fields
represented.

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
edochi and others added 5 commits March 29, 2026 15:47
All 4 JSON fields (removed, path, files_removed, size) rendered as
key-value rows. Update related test assertions for new block count.

Co-Authored-By: Claude <noreply@anthropic.com>
Chunk start_line/end_line reference the full file including
frontmatter. When read back, YAML frontmatter appears in chunk
text. Line numbers should be offset by frontmatter line count.

Co-Authored-By: Claude <noreply@anthropic.com>
Add body_line_offset to ScannedFile, computed during scan as the
difference between raw file lines and body lines. Apply offset in
embed_file() so chunk start_line/end_line reference the full file,
not the body-only text.

Fixes chunk text in search results including YAML frontmatter
when a chunk starts at the beginning of the document body.

Existing indexes need --force rebuild to pick up the fix.

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
@edochi edochi merged commit 612e33b into main Mar 29, 2026
6 checks passed
@edochi edochi deleted the refactor/step-tree branch March 29, 2026 15:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant