From 795ceeed62fd346d172242a100e90dec0e0daedc Mon Sep 17 00:00:00 2001
From: David Sillman
Date: Mon, 16 Mar 2026 12:03:08 -0400
Subject: [PATCH 1/5] fix: update copilot instructions

---
 .github/copilot-instructions.md | 51 +++++++++++++++++++--------------
 1 file changed, 30 insertions(+), 21 deletions(-)

diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index 85635c7..a6e15f7 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -2,7 +2,7 @@

 ## Project Overview

-**yaml-reference** is a Python library that extends `ruamel.yaml` with cross-file YAML composition using custom tags (`!reference`, `!reference-all`, `!flatten`, `!merge`). It's built to be a reference implementation of the [yaml-reference-specs](https://github.com/dsillman2000/yaml-reference-specs) specification.
+**yaml-reference** is a Python library that extends `ruamel.yaml` with cross-file YAML composition using custom tags (`!reference`, `!reference-all`, `!flatten`, `!merge`, `!ignore`). It's built to be a reference implementation of the [yaml-reference-specs](https://github.com/dsillman2000/yaml-reference-specs) specification.

 ## Build, Test, and Lint

@@ -62,19 +62,21 @@ uv build
 The library is structured in two key parts:

 ### Core Module (`yaml_reference/__init__.py`)
-- **Reference & ReferenceAll classes**: Represent the `!reference` and `!reference-all` YAML tags as Python objects
-- **parse_yaml_with_references()**: Parses YAML, returning Reference/ReferenceAll objects without resolving them (one layer only)
-- **load_yaml_with_references()**: Fully recursively resolves all references, returning a complete Python dict
-- **Flatten & Merge classes**: Represent `!flatten` and `!merge` tag logic
-- **YAML loader setup**: Registers custom constructors with `ruamel.yaml.YAML` for each tag
+- **Reference & ReferenceAll classes**: Represent the `!reference` and `!reference-all` YAML tags as Python objects, supporting both mapping form and scalar shorthand (`!reference path/to/file.yml`, `!reference-all glob/*.yml`)
+- **Ignore, Flatten, and Merge classes**: Represent `!ignore`, `!flatten`, and `!merge` tag logic
+- **parse_yaml_with_references()**: Parses YAML and preserves composition tags as Python objects without resolving cross-file references
+- **load_yaml_with_references()**: Fully resolves references, then prunes ignored content, flattens sequences, and merges mappings to produce the final Python data structure
+- **Helper transforms**: `prune_ignores()`, `flatten_sequences()`, and `merge_mappings()` implement the post-resolution evaluation pipeline
+- **YAML loader setup**: Registers custom constructors with `ruamel.yaml.YAML` for each supported tag

 ### CLI Module (`yaml_reference/cli.py`)
-- Simple entry point that calls the core loading functions
+- Simple entry point that calls the core loading functions for YAML containing any supported composition tags
 - Outputs JSON to stdout (compatible with spec tests)
 - Takes optional `--allow` flag for path restrictions

 ### Test Structure (`tests/unit/`)
 - `test_reference.py`: Tests for `!reference` and `!reference-all` tag resolution
+- `test_ignore.py`: Tests for `!ignore` parsing and pruning behavior
 - `test_flatten.py`: Tests for `!flatten` tag behavior
 - `test_merge.py`: Tests for `!merge` tag behavior
 - `conftest.py`: Pytest fixtures and test utilities
@@ -83,33 +85,40 @@ The library is structured in two key parts:

 ### Security-First Path Handling
 1. **Relative paths only**: All references must use relative paths (e.g., `path: "config/db.yaml"`). Absolute paths raise `ValueError`.
-2. **Path restriction by default**: References can only access files in the same directory or subdirectories (no `..` to escape). Use `allow_paths` parameter to explicitly allow other directory trees.
+2. **Path restriction by default**: The referencing file's parent directory is always allowed. Use `allow_paths` to explicitly allow additional directory trees.
 3. **Security invariant**: Disallowed files are **never opened or read into memory**. Path filtering happens before file I/O.
-4. **Silent omission (for `!reference-all`)**: When a glob pattern matches files outside allowed paths, those files are silently dropped from results and the function returns `rc=0` (not an error).
+4. **Silent omission (for `!reference-all`)**: When a glob pattern matches files outside allowed paths, those files are silently dropped from results. Empty or fully filtered globs resolve to `[]` rather than an error.

 ### YAML Tag Implementation Pattern
 Each custom tag follows this pattern:
 1. Define a class with `yaml_tag` attribute
-2. Implement `@classmethod from_yaml(cls, constructor, node)` to parse from YAML
+2. Implement `@classmethod from_yaml(cls, constructor, node)` to parse from YAML, handling scalar, mapping, or sequence nodes as needed
 3. Register constructor with the YAML loader in `__init__.py`
 4. The class instance persists through `parse_yaml_with_references()`, allowing layer-by-layer resolution

+### Reference Tag Forms
+1. **Scalar shorthand is supported**: `!reference path/to/file.yml` and `!reference-all glob/*.yml` are valid when only `path` or `glob` is needed.
+2. **Mapping form is still required for optional fields**: Use mappings such as `{ path: "file.yml", anchor: "section" }` when specifying `anchor`.
+
 ### Reference Resolution Order
-1. **Circular reference detection** occurs during recursive resolution by tracking a "resolution stack"
+1. **Circular reference detection** occurs during recursive resolution by tracking visited file paths
 2. **Anchors** (optional parameter): If specified, extract only the anchored section from the referenced file
-3. **Recursive expansion**: `load_yaml_with_references()` recursively expands all tags, applying `!flatten` and `!merge` logic as it encounters them
+3. **Recursive expansion**: `load_yaml_with_references()` recursively resolves `!reference` and `!reference-all` first
+4. **Ignore pruning**: `!ignore` content is removed after full reference resolution so ignored values from referenced files can remove their parent keys or list items
+5. **Post-processing**: `!flatten` is evaluated after ignore pruning, and `!merge` is evaluated last

 ### Error Handling
-- **ValueError** for spec violations: absolute paths, circular references, invalid anchors
+- **ValueError** for spec violations: absolute paths, circular references, invalid anchors, malformed merge contents
 - **FileNotFoundError** for missing referenced files
-- **Glob errors**: Return empty list `[]` if glob matches no files (silent omission)
+- **PermissionError** for disallowed `!reference` targets
+- **Glob behavior**: `!reference-all` returns `[]` when a glob matches no files or when all matches are filtered out by path restrictions

 ### Spec Compliance Testing
 The project tests against `yaml-reference-specs`, a Go-based reference implementation. The spec tests verify:
-- Correct expansion of all four tags
+- Correct expansion of all supported tags
 - Proper error detection (bad paths, missing files, circular refs)
 - Path restriction enforcement
-- Edge cases like empty globs and nested composition
+- Edge cases like empty globs, ignored content, shorthand reference syntax, and nested composition

 Run with: `make spec-test` or `scripts/spec-test.sh`

@@ -127,15 +136,15 @@ Install hooks with: `pre-commit install`
 ### Adding a new tag type
 1. Create a class in `yaml_reference/__init__.py` with `yaml_tag` attribute and `from_yaml()` classmethod
 2. Register the constructor after the class definition
-3. Add resolution logic (handle in recursive expansion)
+3. Add resolution or post-processing logic in the appropriate stage (`_recursively_resolve_references()`, `prune_ignores()`, `flatten_sequences()`, or `merge_mappings()`)
 4. Write tests in `tests/unit/test_*.py` following existing patterns
 5. Update README.md with usage example

 ### Debugging a reference resolution issue
-1. Use `parse_yaml_with_references()` to see raw Reference objects before resolution
-2. Add print statements or use a debugger to trace the `_resolve_references()` recursive calls
-3. Check the resolution stack to verify circular reference detection is working
-4. Run a specific test with `-v` flag to see detailed assertion output
+1. Use `parse_yaml_with_references()` to inspect raw `Reference`, `ReferenceAll`, `Ignore`, `Flatten`, and `Merge` objects before evaluation
+2. Trace `_recursively_resolve_references()` to debug cross-file expansion and circular reference handling
+3. Check the post-processing stages in order: `prune_ignores()`, then `flatten_sequences()`, then `merge_mappings()`
+4. Run the most specific unit test with `-v` flag to see detailed assertion output

 ### Updating error messages
 Ensure error messages follow this pattern: include the problematic value, the path of the file where the error occurred, and the specific constraint violated. This helps spec tests verify proper error handling.

From 6783608f585c030a05f370edee1ccda8d7747d58 Mon Sep 17 00:00:00 2001
From: David Sillman
Date: Mon, 16 Mar 2026 12:56:08 -0400
Subject: [PATCH 2/5] feat: Enhance multi-document handling with new tests and error handling. Copilot largely authored it.

---
 README.md                        |   9 ++
 tests/unit/test_flatten.py       |  15 +++
 tests/unit/test_merge.py         |  20 ++++
 tests/unit/test_multidocument.py |  34 ++++++
 tests/unit/test_reference.py     |  74 ++++++++++++
 yaml_reference/__init__.py       | 192 ++++++++++++++++++++++++++-----
 yaml_reference/cli.py            |   6 +
 7 files changed, 323 insertions(+), 27 deletions(-)
 create mode 100644 tests/unit/test_multidocument.py

diff --git a/README.md b/README.md
index e1b6866..7793b59 100644
--- a/README.md
+++ b/README.md
@@ -97,6 +97,15 @@ networks: !reference-all { glob: "networks/*.yaml" }

 Use the mapping form when you need optional arguments such as `anchor`; use the scalar shorthand when you only need `path` or `glob`.

+### Multi-Document YAML
+
+yaml-reference distinguishes between a single YAML document whose root value is a sequence and a YAML file that contains multiple documents separated by `---`.
+
+- `!reference` requires the target file to contain exactly one YAML document. If the referenced file contains multiple documents, loading fails with a `ValueError`.
+- `!reference-all` expands matched files document-by-document. A single-document file contributes one list element, while a multi-document file contributes one element per document in document order.
+- When `anchor` is used with `!reference-all`, the anchored value is extracted from every document in each matched file, preserving file order and then document order.
+- If the root input file contains multiple documents, `load_yaml_with_references()` returns a Python list with one resolved output element per document. Root documents tagged with `!ignore` are omitted entirely.
+
 ### The `!ignore` Tag

 The `!ignore` tag marks YAML content that should be parsed but omitted from the final resolved output. The most common use case is a hidden section of reusable anchors that should remain available for aliases elsewhere in the document without being emitted in the resolved result.
diff --git a/tests/unit/test_flatten.py b/tests/unit/test_flatten.py
index 5bbebcb..31f5407 100644
--- a/tests/unit/test_flatten.py
+++ b/tests/unit/test_flatten.py
@@ -130,6 +130,21 @@ def test_flatten_combined_with_reference_all(stage_files):
     assert data["data"] == [1, 2, 3, 4, 5, 6]


+def test_flatten_combined_with_multi_document_reference_all(stage_files):
+    files = {
+        "main.yml": """
+data: !flatten
+  - !reference-all { glob: ./entries.yml }
+""",
+        "entries.yml": "---\n- [1, 2]\n---\n- [3, 4]\n",
+    }
+    stg = stage_files(files)
+
+    data = load_yaml_with_references(stg / "main.yml")
+
+    assert data["data"] == [1, 2, 3, 4]
+
+
 def test_parse_flatten_tag(stage_files):
     """Test that !flatten tags are parsed correctly without resolution."""
     files = {
diff --git a/tests/unit/test_merge.py b/tests/unit/test_merge.py
index fe6c082..df90f89 100644
--- a/tests/unit/test_merge.py
+++ b/tests/unit/test_merge.py
@@ -169,3 +169,23 @@ def test_flatten_and_merge(stage_files):
     stg = stage_files(files)
     data = load_yaml_with_references(stg / "test.yml")
     assert data["result"] == [{"a": 2}, {"b": 2, "c": 3}]
+
+
+def test_merge_combined_with_multi_document_reference_all(stage_files):
+    files = {
+        "test.yml": """
+result: !merge
+  - {base: true, version: 1}
+  - !reference-all { glob: ./patches.yml }
+""",
+        "patches.yml": "---\nversion: 2\n---\nfeature: enabled\n",
+    }
+    stg = stage_files(files)
+
+    data = load_yaml_with_references(stg / "test.yml")
+
+    assert data["result"] == {
+        "base": True,
+        "version": 2,
+        "feature": "enabled",
+    }
diff --git a/tests/unit/test_multidocument.py b/tests/unit/test_multidocument.py
new file mode 100644
index 0000000..867dcfc
--- /dev/null
+++ b/tests/unit/test_multidocument.py
@@ -0,0 +1,34 @@
+from yaml_reference import load_yaml_with_references
+
+
+def test_multi_document_root_file_loads_as_array(stage_files):
+    files = {
+        "root.yml": """
+---
+service: !reference { path: ./service.yml }
+---
+ignored_only: !ignore true
+--- !ignore
+drop_me: true
+---
+items: !flatten
+  - !reference-all { glob: ./entries.yml }
+---
+config: !merge
+  - {a: 1}
+  - !reference-all { glob: ./patches.yml }
+""",
+        "service.yml": "name: api\n",
+        "entries.yml": "---\n- [1, 2]\n---\n- [3, 4]\n",
+        "patches.yml": "---\na: 2\n---\nb: 3\n",
+    }
+    stg = stage_files(files)
+
+    data = load_yaml_with_references(stg / "root.yml")
+
+    assert data == [
+        {"service": {"name": "api"}},
+        {},
+        {"items": [1, 2, 3, 4]},
+        {"config": {"a": 2, "b": 3}},
+    ]
diff --git a/tests/unit/test_reference.py b/tests/unit/test_reference.py
index 6e5c3bd..53c86c8 100644
--- a/tests/unit/test_reference.py
+++ b/tests/unit/test_reference.py
@@ -57,6 +57,20 @@ def test_reference_load_shorthand(stage_files):
     assert data["contents"]["inner"] == "inner_value"


+def test_reference_rejects_multi_document_target(stage_files):
+    files = {
+        "test.yml": "contents: !reference { path: ./multi.yml }",
+        "multi.yml": "---\nvalue: 1\n---\nvalue: 2\n",
+    }
+    stg = stage_files(files)
+
+    with pytest.raises(
+        ValueError,
+        match="contains multiple YAML documents and cannot be used with !reference",
+    ):
+        load_yaml_with_references(stg / "test.yml")
+
+
 def test_reference_all_load(stage_files):
     files = {
         "test.yml": "hello: world\ncontents: !reference-all { glob: ./chapters/*.yml }",
@@ -103,6 +117,66 @@ def test_reference_all_load_shorthand(stage_files):
     assert {"chapter_value": 3} in data["contents"]


+def test_reference_all_expands_multi_document_file(stage_files):
+    files = {
+        "test.yml": "contents: !reference-all { glob: ./multi.yml }",
+        "multi.yml": "---\nvalue: 1\n---\nvalue: 2\n",
+    }
+    stg = stage_files(files)
+
+    data = load_yaml_with_references(stg / "test.yml")
+
+    assert data["contents"] == [{"value": 1}, {"value": 2}]
+
+
+def test_reference_all_mixed_single_and_multi_document_order(stage_files):
+    files = {
+        "test.yml": "contents: !reference-all { glob: ./parts/*.yml }",
+        "parts/a.yml": "value: a\n",
+        "parts/b.yml": "---\nvalue: b1\n---\nvalue: b2\n",
+        "parts/c.yml": "value: c\n",
+    }
+    stg = stage_files(files)
+
+    data = load_yaml_with_references(stg / "test.yml")
+
+    assert data["contents"] == [
+        {"value": "a"},
+        {"value": "b1"},
+        {"value": "b2"},
+        {"value": "c"},
+    ]
+
+
+def test_reference_all_skips_ignored_root_documents_in_multi_document_file(stage_files):
+    files = {
+        "test.yml": "contents: !reference-all { glob: ./multi.yml }",
+        "multi.yml": "--- !ignore\nignored: true\n---\nvalue: kept\n",
+    }
+    stg = stage_files(files)
+
+    data = load_yaml_with_references(stg / "test.yml")
+
+    assert data["contents"] == [{"value": "kept"}]
+
+
+def test_reference_all_anchor_extracts_from_every_document(stage_files):
+    files = {
+        "test.yml": "contents: !reference-all { glob: ./parts/*.yml, anchor: item }",
+        "parts/a.yml": "---\nroot: &item {value: 1}\n---\nroot: &item {value: 2}\n",
+        "parts/b.yml": "root: &item {value: 3}\n",
+    }
+    stg = stage_files(files)
+
+    data = load_yaml_with_references(stg / "test.yml")
+
+    assert data["contents"] == [
+        {"value": 1},
+        {"value": 2},
+        {"value": 3},
+    ]
+
+
 def test_parse_references(stage_files):
     files = {
         "test.yml": "inner: !reference { path: next/open.yml }\n",
diff --git a/yaml_reference/__init__.py b/yaml_reference/__init__.py
index 16945f9..7257f94 100644
--- a/yaml_reference/__init__.py
+++ b/yaml_reference/__init__.py
@@ -1,6 +1,7 @@
 import io
 import os
 from collections import defaultdict
+from dataclasses import dataclass
 from pathlib import Path
 from typing import IO, Any, Optional, Sequence, Union

@@ -252,9 +253,32 @@ def from_yaml(cls, constructor, node):
         return cls(seq)


+@dataclass
+class MultiDocument:
+    documents: list[Any]
+    is_multi_document: bool
+
+    def __repr__(self):
+        return (
+            "MultiDocument("
+            f"documents={self.documents!r}, is_multi_document={self.is_multi_document!r}"
+            ")"
+        )
+
+
 PathLike = Union[str, Path, os.PathLike]


+def _build_yaml_loader() -> YAML:
+    yaml = YAML(typ="safe")
+    yaml.register_class(Reference)
+    yaml.register_class(ReferenceAll)
+    yaml.register_class(Flatten)
+    yaml.register_class(Merge)
+    yaml.register_class(Ignore)
+    return yaml
+
+
 def _check_file_path(path: PathLike, allow_paths: Sequence[PathLike]) -> Path:
     if not isinstance(path, Path):
         path = Path(path)
@@ -273,11 +297,35 @@ def _check_file_path(path: PathLike, allow_paths: Sequence[PathLike]) -> Path:
     raise PermissionError(f"File '{path}' is not allowed.")


-def _extract_anchor_from_parser_events(yaml: YAML, stream: IO, anchor: str) -> Any:
+def _collect_document_event_streams(yaml: YAML, stream: IO) -> list[list[events.Event]]:
+    document_streams = []
+    current_document = None
+    for event in yaml.parse(stream):
+        if isinstance(event, events.DocumentStartEvent):
+            current_document = [events.StreamStartEvent(), event]
+        elif isinstance(event, events.DocumentEndEvent):
+            if current_document is None:
+                current_document = [
+                    events.StreamStartEvent(),
+                    events.DocumentStartEvent(),
+                ]
+            current_document.append(event)
+            current_document.append(events.StreamEndEvent())
+            document_streams.append(current_document)
+            current_document = None
+        elif not isinstance(event, (events.StreamStartEvent, events.StreamEndEvent)):
+            if current_document is not None:
+                current_document.append(event)
+    return document_streams
+
+
+def _extract_anchor_from_parser_events(
+    yaml: YAML, parsed_events: Sequence[events.Event], anchor: str
+) -> Any:
     anchor_lookup = dict()
     level_lookup = defaultdict(int)
     _nonzero_keys = lambda dd: [key for key, value in dd.items() if value > 0]  # noqa: E731
-    for event in yaml.parse(stream):
+    for event in parsed_events:
         if (
             hasattr(event, "anchor")
             and event.anchor is not None
@@ -360,10 +408,47 @@ def _resolve_aliases(my_events: list[events.Event]) -> list[events.Event]:
         )
         raise ValueError(msg)
     strio.seek(0)
-    document = yaml.load(strio)
+    document = _build_yaml_loader().load(strio)
     return document


+def _parse_yaml_documents(
+    file_path: PathLike,
+    anchor: Optional[str] = None,
+    allow_paths: Sequence[PathLike] = [],
+) -> MultiDocument:
+    if not allow_paths:
+        allow_paths = [Path(file_path).parent.absolute()]
+    path: Path = _check_file_path(file_path, allow_paths=allow_paths)
+
+    if anchor is None:
+        yaml = _build_yaml_loader()
+        with path.open("r") as f:
+            parsed_documents = list(yaml.load_all(f))
+    else:
+        yaml = _build_yaml_loader()
+        with path.open("r") as f:
+            document_streams = _collect_document_event_streams(yaml, f)
+        if not document_streams:
+            raise ValueError(f"Anchor '{anchor}' not found in the YAML document.")
+        parsed_documents = [
+            _extract_anchor_from_parser_events(yaml, document_stream, anchor)
+            for document_stream in document_streams
+        ]
+
+    if not parsed_documents:
+        parsed_documents = [None]
+
+    parsed_documents = [
+        _recursively_attribute_location_to_references(document, path)
+        for document in parsed_documents
+    ]
+    return MultiDocument(
+        documents=parsed_documents,
+        is_multi_document=len(parsed_documents) > 1,
+    )
+
+
 def parse_yaml_with_references(
     file_path: PathLike,
     anchor: Optional[str] = None,
@@ -386,29 +471,25 @@ def parse_yaml_with_references(
         ValueError: If the file is not a valid YAML file.

     """
-    if not allow_paths:
-        allow_paths = [Path(file_path).parent.absolute()]
-    path: Path = _check_file_path(file_path, allow_paths=allow_paths)
-
-    yaml = YAML(typ="safe")
-    yaml.register_class(Reference)
-    yaml.register_class(ReferenceAll)
-    yaml.register_class(Flatten)
-    yaml.register_class(Merge)
-    yaml.register_class(Ignore)
-
-    if not anchor:
-        with path.open("r") as f:
-            parsed = yaml.load(f)
-    else:
-        with path.open("r") as f:
-            parsed = _extract_anchor_from_parser_events(yaml, f, anchor)
-
-    parsed = _recursively_attribute_location_to_references(parsed, path)
+    parsed = _parse_yaml_documents(
+        file_path,
+        anchor=anchor,
+        allow_paths=allow_paths,
+    )
+    if not parsed.is_multi_document and len(parsed.documents) == 1:
+        return parsed.documents[0]
     return parsed


 def _recursively_attribute_location_to_references(data: Any, base_path: Path):
+    if isinstance(data, MultiDocument):
+        return MultiDocument(
+            documents=[
+                _recursively_attribute_location_to_references(item, base_path)
+                for item in data.documents
+            ],
+            is_multi_document=data.is_multi_document,
+        )
     if isinstance(data, Flatten):
         return Flatten(
             sequence=[
@@ -514,6 +595,17 @@ def _recursively_resolve_references(
     if visited_paths is None:
         visited_paths = set()

+    if isinstance(data, MultiDocument):
+        return MultiDocument(
+            documents=[
+                _recursively_resolve_references(
+                    item, allow_paths=allow_paths, visited_paths=visited_paths
+                )
+                for item in data.documents
+            ],
+            is_multi_document=data.is_multi_document,
+        )
+
     if isinstance(data, Flatten):
         return Flatten(
             sequence=[
@@ -547,11 +639,18 @@ def _recursively_resolve_references(
         # Check for circular reference and track path
         _check_and_track_path(abs_path, visited_paths)

-        parsed = parse_yaml_with_references(
+        parsed = _parse_yaml_documents(
             abs_path, anchor=data.anchor, allow_paths=allow_paths
         )
+
+        if len(parsed.documents) != 1:
+            visited_paths.remove(abs_path)
+            raise ValueError(
+                f"Referenced file '{abs_path}' contains multiple YAML documents and cannot be used with !reference."
+            )
+
         resolved = _recursively_resolve_references(
-            parsed, allow_paths=allow_paths, visited_paths=visited_paths
+            parsed.documents[0], allow_paths=allow_paths, visited_paths=visited_paths
         )

         # Remove current path from visited set after processing
@@ -587,13 +686,16 @@ def _recursively_resolve_references(
             # Check for circular reference and track path
             _check_and_track_path(path, visited_paths)

-            parsed = parse_yaml_with_references(
+            parsed = _parse_yaml_documents(
                 path, anchor=data.anchor, allow_paths=allow_paths
             )
             resolved = _recursively_resolve_references(
                 parsed, allow_paths=allow_paths, visited_paths=visited_paths
             )
-            resolved_items.append(resolved)
+            if isinstance(resolved, MultiDocument):
+                resolved_items.extend(resolved.documents)
+            else:
+                resolved_items.append(resolved)

             # Remove current path from visited set after processing
             visited_paths.remove(path)
@@ -623,6 +725,11 @@ def flatten_sequences(data: Any) -> Any:
     Given an object which may contain Flatten(...) objects which was parsed from a YAML document containing !flatten
     tags, return the object without any Flatten(...) objects, but having flattened all sequences marked with them.
     """
+    if isinstance(data, MultiDocument):
+        return MultiDocument(
+            documents=[flatten_sequences(item) for item in data.documents],
+            is_multi_document=data.is_multi_document,
+        )
     if isinstance(data, Flatten):
         return data.flattened()
     if isinstance(data, Merge):
@@ -641,6 +748,11 @@ def merge_mappings(data: Any) -> Any:
     Given an object which may contain Merge(...) objects which was parsed from a YAML document containing !merge
     tags, return the object without any Merge(...) objects, but having merged all mappings marked with them.
     """
+    if isinstance(data, MultiDocument):
+        return MultiDocument(
+            documents=[merge_mappings(item) for item in data.documents],
+            is_multi_document=data.is_multi_document,
+        )
     if isinstance(data, Merge):
         return merge_mappings(data.merged())
     if isinstance(data, list):
@@ -658,6 +770,23 @@ def prune_ignores(data: Any) -> Any:
     removed from the list. If an Ignore(...) object is found as a value in a dict, the key-value pair is removed from the
     dict. If an Ignore(...) object is found as a value which is not in a list or dict, it is replaced with None.
     """
+    if isinstance(data, MultiDocument):
+        if not data.is_multi_document:
+            if not data.documents:
+                return MultiDocument(documents=[None], is_multi_document=False)
+            return MultiDocument(
+                documents=[prune_ignores(data.documents[0])],
+                is_multi_document=False,
+            )
+
+        pruned_documents = []
+        for item in data.documents:
+            if isinstance(item, Ignore):
+                continue
+            pruned_item = prune_ignores(item)
+            if pruned_item is not None:
+                pruned_documents.append(pruned_item)
+        return MultiDocument(documents=pruned_documents, is_multi_document=True)
     if isinstance(data, Ignore):
         return None
     if isinstance(data, Flatten):
@@ -715,7 +844,7 @@ def load_yaml_with_references(
         allow_paths = []
     allow_paths += [Path(file_path).parent.absolute()]
     path = _check_file_path(file_path, allow_paths=allow_paths)
-    parsed = parse_yaml_with_references(path, allow_paths=allow_paths)
+    parsed = _parse_yaml_documents(path, allow_paths=allow_paths)

     # Initialize visited paths with the root file to detect self-references
     visited_paths = {path.resolve()}
@@ -732,6 +861,14 @@ def load_yaml_with_references(
     pruned = prune_ignores(resolved)
     flattened = flatten_sequences(pruned)
     merged = merge_mappings(flattened)
+    if isinstance(merged, MultiDocument):
+        if merged.is_multi_document:
+            return merged.documents
+        if not merged.documents:
+            return None
+        if len(merged.documents) == 1:
+            return merged.documents[0]
+        return None
     return merged


@@ -742,6 +879,7 @@ def load_yaml_with_references(
     "Flatten",
     "merge_mappings",
     "Merge",
+    "MultiDocument",
     "prune_ignores",
     "Ignore",
 ]
diff --git a/yaml_reference/cli.py b/yaml_reference/cli.py
index 1ec01a9..a10d71e 100644
--- a/yaml_reference/cli.py
+++ b/yaml_reference/cli.py
@@ -33,6 +33,12 @@ def compile_main(input_file: str, allow_paths: list[str] = []):
             file=sys.stderr,
         )
         sys.exit(1)
+    except (FileNotFoundError, ValueError) as err:
+        print(
+            f'Error: Failed to compile "{input_path}":\n{err}',
+            file=sys.stderr,
+        )
+        sys.exit(1)

     json.dump(data, sys.stdout, sort_keys=True, indent=2)


From 98653460d1de0f9e954790daf6e344b04d46144c Mon Sep 17 00:00:00 2001
From: David Sillman
Date: Mon, 16 Mar 2026 13:09:12 -0400
Subject: [PATCH 3/5] Apply suggestions from code review

Ensure unit tests all still continue to pass.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---
 yaml_reference/__init__.py | 10 ++++++----
 yaml_reference/cli.py      |  3 ++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/yaml_reference/__init__.py b/yaml_reference/__init__.py
index 7257f94..26624c8 100644
--- a/yaml_reference/__init__.py
+++ b/yaml_reference/__init__.py
@@ -415,7 +415,7 @@ def _resolve_aliases(my_events: list[events.Event]) -> list[events.Event]:
 def _parse_yaml_documents(
     file_path: PathLike,
     anchor: Optional[str] = None,
-    allow_paths: Sequence[PathLike] = [],
+    allow_paths: Optional[Sequence[PathLike]] = None,
 ) -> MultiDocument:
     if not allow_paths:
         allow_paths = [Path(file_path).parent.absolute()]
@@ -452,7 +452,7 @@ def _parse_yaml_documents(
 def parse_yaml_with_references(
     file_path: PathLike,
     anchor: Optional[str] = None,
-    allow_paths: Sequence[PathLike] = [],
+    allow_paths: Optional[Sequence[PathLike]] = None,
 ) -> Any:
     """
     Interface method for reading a YAML file into memory which contains references. References are not resolved in the
@@ -781,11 +781,13 @@ def prune_ignores(data: Any) -> Any:

         pruned_documents = []
         for item in data.documents:
+            # For multi-document streams, only omit documents explicitly tagged !ignore.
+            # Preserve documents that prune to None (e.g., explicit null/empty documents)
+            # so that document count and ordering remain stable.
             if isinstance(item, Ignore):
                 continue
             pruned_item = prune_ignores(item)
-            if pruned_item is not None:
-                pruned_documents.append(pruned_item)
+            pruned_documents.append(pruned_item)
         return MultiDocument(documents=pruned_documents, is_multi_document=True)
     if isinstance(data, Ignore):
         return None
diff --git a/yaml_reference/cli.py b/yaml_reference/cli.py
index a10d71e..6fda404 100644
--- a/yaml_reference/cli.py
+++ b/yaml_reference/cli.py
@@ -2,6 +2,7 @@
 import sys
 from pathlib import Path

+from ruamel.yaml.error import YAMLError
 from yaml_reference import load_yaml_with_references


@@ -33,7 +34,7 @@ def compile_main(input_file: str, allow_paths: list[str] = []):
             file=sys.stderr,
         )
         sys.exit(1)
-    except (FileNotFoundError, ValueError) as err:
+    except (FileNotFoundError, ValueError, YAMLError) as err:
         print(
             f'Error: Failed to compile "{input_path}":\n{err}',
             file=sys.stderr,

From 0d7a1d661598c46a9946c3c63ebb6af7121de485 Mon Sep 17 00:00:00 2001
From: David Sillman
Date: Mon, 16 Mar 2026 13:18:51 -0400
Subject: [PATCH 4/5] fix: vscode yaml custom tags setting

---
 .vscode/settings.json | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/.vscode/settings.json b/.vscode/settings.json
index f15c4f2..e083273 100644
--- a/.vscode/settings.json
+++ b/.vscode/settings.json
@@ -1,8 +1,13 @@
 {
   "yaml.customTags": [
     "!reference mapping",
+    "!reference scalar",
     "!reference-all mapping",
+    "!reference-all scalar",
     "!flatten sequence",
-    "!merge sequence"
+    "!merge sequence",
+    "!ignore scalar",
+    "!ignore mapping",
+    "!ignore sequence"
   ]
 }

From 1e8fe0d827bbff88cf49ffa539c9278c12c2ad41 Mon Sep 17 00:00:00 2001
From: David Sillman
Date: Thu, 26 Mar 2026 19:13:05 -0400
Subject: [PATCH 5/5] fix: update readme badge to 0.2.9-0

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 7793b59..e22163e 100644
--- a/README.md
+++ b/README.md
@@ -14,7 +14,7 @@ uv add yaml-reference
 ```

 ## Spec
-![Spec Status](https://img.shields.io/badge/spec%20v0.2.8--1-passing-brightgreen?link=https%3A%2F%2Fgithub.com%2Fdsillman2000%2Fyaml-reference-specs%2Ftree%2Fv0.2.8-1)
+![Spec Status](https://img.shields.io/badge/spec%20v0.2.9--0-passing-brightgreen?link=https%3A%2F%2Fgithub.com%2Fdsillman2000%2Fyaml-reference-specs%2Ftree%2Fv0.2.9-0)

 This Python library implements the YAML specification for cross-file references and YAML composition in YAML files using tags `!reference`, `!reference-all`, `!flatten`, `!merge`, and `!ignore` as defined in the [yaml-reference-specs project](https://github.com/dsillman2000/yaml-reference-specs).
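
Reviewer note: the multi-document rules that [PATCH 2/5] documents in the README and [PATCH 3/5] adjusts (null documents are kept, only `!ignore`-tagged root documents are dropped) can be sketched without the library. This is a minimal illustration under stated assumptions, not the code in `yaml_reference/__init__.py`: `Ignore`, `expand_reference_all`, and `unwrap_root` below are hypothetical stand-ins that operate on already-parsed documents, and visiting matched files in sorted name order is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Ignore:
    """Hypothetical stand-in for a root document tagged `!ignore`."""

    value: Any = None


def expand_reference_all(files: dict[str, list[Any]]) -> list[Any]:
    """Flatten matched files into one list, one element per document.

    `files` maps a file name to its already-parsed documents. Visiting the
    names in sorted order is an assumption made for this sketch.
    """
    items: list[Any] = []
    for name in sorted(files):
        for document in files[name]:
            if isinstance(document, Ignore):
                continue  # ignored root documents contribute nothing
            items.append(document)
    return items


def unwrap_root(documents: list[Any]) -> Any:
    """Mimic the root-file rule: many documents -> list, one -> bare value."""
    if len(documents) > 1:
        # Only documents tagged !ignore are dropped; null documents are kept
        # so that document count and ordering stay stable.
        return [doc for doc in documents if not isinstance(doc, Ignore)]
    if not documents or isinstance(documents[0], Ignore):
        return None
    return documents[0]


# Hypothetical inputs: one single-document file, one two-document file, and
# one file whose first root document is ignored.
parts = {
    "parts/a.yml": [{"value": "a"}],
    "parts/b.yml": [{"value": "b1"}, {"value": "b2"}],
    "parts/c.yml": [Ignore({"ignored": True}), {"value": "c"}],
}
print(expand_reference_all(parts))
print(unwrap_root([{"service": "api"}, Ignore(), None]))
```

The sketch keeps the two rules separate on purpose: `!reference-all` always produces a list regardless of how many documents each file holds, whereas the root file only becomes a list when it actually contains more than one document.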