diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index 85635c7..a6e15f7 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -2,7 +2,7 @@ ## Project Overview -**yaml-reference** is a Python library that extends `ruamel.yaml` with cross-file YAML composition using custom tags (`!reference`, `!reference-all`, `!flatten`, `!merge`). It's built to be a reference implementation of the [yaml-reference-specs](https://github.com/dsillman2000/yaml-reference-specs) specification. +**yaml-reference** is a Python library that extends `ruamel.yaml` with cross-file YAML composition using custom tags (`!reference`, `!reference-all`, `!flatten`, `!merge`, `!ignore`). It's built to be a reference implementation of the [yaml-reference-specs](https://github.com/dsillman2000/yaml-reference-specs) specification. ## Build, Test, and Lint @@ -62,19 +62,21 @@ uv build The library is structured in two key parts: ### Core Module (`yaml_reference/__init__.py`) -- **Reference & ReferenceAll classes**: Represent the `!reference` and `!reference-all` YAML tags as Python objects -- **parse_yaml_with_references()**: Parses YAML, returning Reference/ReferenceAll objects without resolving them (one layer only) -- **load_yaml_with_references()**: Fully recursively resolves all references, returning a complete Python dict -- **Flatten & Merge classes**: Represent `!flatten` and `!merge` tag logic -- **YAML loader setup**: Registers custom constructors with `ruamel.yaml.YAML` for each tag +- **Reference & ReferenceAll classes**: Represent the `!reference` and `!reference-all` YAML tags as Python objects, supporting both mapping form and scalar shorthand (`!reference path/to/file.yml`, `!reference-all glob/*.yml`) +- **Ignore, Flatten, and Merge classes**: Represent `!ignore`, `!flatten`, and `!merge` tag logic +- **parse_yaml_with_references()**: Parses YAML and preserves composition tags as Python objects without resolving cross-file 
references +- **load_yaml_with_references()**: Fully resolves references, then prunes ignored content, flattens sequences, and merges mappings to produce the final Python data structure +- **Helper transforms**: `prune_ignores()`, `flatten_sequences()`, and `merge_mappings()` implement the post-resolution evaluation pipeline +- **YAML loader setup**: Registers custom constructors with `ruamel.yaml.YAML` for each supported tag ### CLI Module (`yaml_reference/cli.py`) -- Simple entry point that calls the core loading functions +- Simple entry point that calls the core loading functions for YAML containing any supported composition tags - Outputs JSON to stdout (compatible with spec tests) - Takes optional `--allow` flag for path restrictions ### Test Structure (`tests/unit/`) - `test_reference.py`: Tests for `!reference` and `!reference-all` tag resolution +- `test_ignore.py`: Tests for `!ignore` parsing and pruning behavior - `test_flatten.py`: Tests for `!flatten` tag behavior - `test_merge.py`: Tests for `!merge` tag behavior - `conftest.py`: Pytest fixtures and test utilities @@ -83,33 +85,40 @@ The library is structured in two key parts: ### Security-First Path Handling 1. **Relative paths only**: All references must use relative paths (e.g., `path: "config/db.yaml"`). Absolute paths raise `ValueError`. -2. **Path restriction by default**: References can only access files in the same directory or subdirectories (no `..` to escape). Use `allow_paths` parameter to explicitly allow other directory trees. +2. **Path restriction by default**: The referencing file's parent directory is always allowed. Use `allow_paths` to explicitly allow additional directory trees. 3. **Security invariant**: Disallowed files are **never opened or read into memory**. Path filtering happens before file I/O. -4. 
**Silent omission (for `!reference-all`)**: When a glob pattern matches files outside allowed paths, those files are silently dropped from results and the function returns `rc=0` (not an error). +4. **Silent omission (for `!reference-all`)**: When a glob pattern matches files outside allowed paths, those files are silently dropped from results. Empty or fully filtered globs resolve to `[]` rather than an error. ### YAML Tag Implementation Pattern Each custom tag follows this pattern: 1. Define a class with `yaml_tag` attribute -2. Implement `@classmethod from_yaml(cls, constructor, node)` to parse from YAML +2. Implement `@classmethod from_yaml(cls, constructor, node)` to parse from YAML, handling scalar, mapping, or sequence nodes as needed 3. Register constructor with the YAML loader in `__init__.py` 4. The class instance persists through `parse_yaml_with_references()`, allowing layer-by-layer resolution +### Reference Tag Forms +1. **Scalar shorthand is supported**: `!reference path/to/file.yml` and `!reference-all glob/*.yml` are valid when only `path` or `glob` is needed. +2. **Mapping form is still required for optional fields**: Use mappings such as `{ path: "file.yml", anchor: "section" }` when specifying `anchor`. + ### Reference Resolution Order -1. **Circular reference detection** occurs during recursive resolution by tracking a "resolution stack" +1. **Circular reference detection** occurs during recursive resolution by tracking visited file paths 2. **Anchors** (optional parameter): If specified, extract only the anchored section from the referenced file -3. **Recursive expansion**: `load_yaml_with_references()` recursively expands all tags, applying `!flatten` and `!merge` logic as it encounters them +3. **Recursive expansion**: `load_yaml_with_references()` recursively resolves `!reference` and `!reference-all` first +4. 
**Ignore pruning**: `!ignore` content is removed after full reference resolution so ignored values from referenced files can remove their parent keys or list items +5. **Post-processing**: `!flatten` is evaluated after ignore pruning, and `!merge` is evaluated last ### Error Handling -- **ValueError** for spec violations: absolute paths, circular references, invalid anchors +- **ValueError** for spec violations: absolute paths, circular references, invalid anchors, malformed merge contents - **FileNotFoundError** for missing referenced files -- **Glob errors**: Return empty list `[]` if glob matches no files (silent omission) +- **PermissionError** for disallowed `!reference` targets +- **Glob behavior**: `!reference-all` returns `[]` when a glob matches no files or when all matches are filtered out by path restrictions ### Spec Compliance Testing The project tests against `yaml-reference-specs`, a Go-based reference implementation. The spec tests verify: -- Correct expansion of all four tags +- Correct expansion of all supported tags - Proper error detection (bad paths, missing files, circular refs) - Path restriction enforcement -- Edge cases like empty globs and nested composition +- Edge cases like empty globs, ignored content, shorthand reference syntax, and nested composition Run with: `make spec-test` or `scripts/spec-test.sh` @@ -127,15 +136,15 @@ Install hooks with: `pre-commit install` ### Adding a new tag type 1. Create a class in `yaml_reference/__init__.py` with `yaml_tag` attribute and `from_yaml()` classmethod 2. Register the constructor after the class definition -3. Add resolution logic (handle in recursive expansion) +3. Add resolution or post-processing logic in the appropriate stage (`_recursively_resolve_references()`, `prune_ignores()`, `flatten_sequences()`, or `merge_mappings()`) 4. Write tests in `tests/unit/test_*.py` following existing patterns 5. Update README.md with usage example ### Debugging a reference resolution issue -1. 
Use `parse_yaml_with_references()` to see raw Reference objects before resolution -2. Add print statements or use a debugger to trace the `_resolve_references()` recursive calls -3. Check the resolution stack to verify circular reference detection is working -4. Run a specific test with `-v` flag to see detailed assertion output +1. Use `parse_yaml_with_references()` to inspect raw `Reference`, `ReferenceAll`, `Ignore`, `Flatten`, and `Merge` objects before evaluation +2. Trace `_recursively_resolve_references()` to debug cross-file expansion and circular reference handling +3. Check the post-processing stages in order: `prune_ignores()`, then `flatten_sequences()`, then `merge_mappings()` +4. Run the most specific unit test with `-v` flag to see detailed assertion output ### Updating error messages Ensure error messages follow this pattern: include the problematic value, the path of the file where the error occurred, and the specific constraint violated. This helps spec tests verify proper error handling. 
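The evaluation order documented above (resolve references, then prune ignores, then flatten, then merge) can be illustrated with a standalone sketch. The marker classes below are hypothetical stand-ins for the library's `Ignore`/`Flatten`/`Merge` objects — the real implementations live in `yaml_reference/__init__.py` — and only the pipeline ordering is the point:

```python
# Illustrative sketch of the post-resolution pipeline order.
# IgnoreMark/FlattenMark/MergeMark are hypothetical stand-ins for the
# library's Ignore/Flatten/Merge classes; the semantics shown here are
# a simplified approximation, not the library's actual code.
from dataclasses import dataclass, field
from typing import Any


@dataclass
class IgnoreMark:
    value: Any = None


@dataclass
class FlattenMark:
    sequence: list = field(default_factory=list)


@dataclass
class MergeMark:
    sequence: list = field(default_factory=list)


def prune_ignores(data):
    # Drop IgnoreMark values from dicts and lists; recurse into markers.
    if isinstance(data, dict):
        return {k: prune_ignores(v) for k, v in data.items()
                if not isinstance(v, IgnoreMark)}
    if isinstance(data, list):
        return [prune_ignores(v) for v in data
                if not isinstance(v, IgnoreMark)]
    if isinstance(data, (FlattenMark, MergeMark)):
        data.sequence = prune_ignores(data.sequence)
        return data
    return None if isinstance(data, IgnoreMark) else data


def flatten_sequences(data):
    # Splice nested lists one level deep inside a FlattenMark.
    if isinstance(data, FlattenMark):
        out = []
        for item in flatten_sequences(data.sequence):
            out.extend(item if isinstance(item, list) else [item])
        return out
    if isinstance(data, dict):
        return {k: flatten_sequences(v) for k, v in data.items()}
    if isinstance(data, list):
        return [flatten_sequences(v) for v in data]
    return data


def merge_mappings(data):
    # Left-to-right dict merge inside a MergeMark.
    if isinstance(data, MergeMark):
        out = {}
        for item in merge_mappings(data.sequence):
            out.update(item)
        return out
    if isinstance(data, dict):
        return {k: merge_mappings(v) for k, v in data.items()}
    if isinstance(data, list):
        return [merge_mappings(v) for v in data]
    return data


doc = {
    "hidden": IgnoreMark({"reusable": True}),
    "items": FlattenMark([[1, 2], [3]]),
    "config": MergeMark([{"a": 1}, {"a": 2, "b": 3}]),
}
result = merge_mappings(flatten_sequences(prune_ignores(doc)))
# result == {"items": [1, 2, 3], "config": {"a": 2, "b": 3}}
```

Applying the stages in this order matters: pruning first ensures ignored entries never participate in flattening or merging, which mirrors the pipeline described in the instructions above.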
diff --git a/.vscode/settings.json b/.vscode/settings.json index f15c4f2..e083273 100644 --- a/.vscode/settings.json +++ b/.vscode/settings.json @@ -1,8 +1,13 @@ { "yaml.customTags": [ "!reference mapping", + "!reference scalar", "!reference-all mapping", + "!reference-all scalar", "!flatten sequence", - "!merge sequence" + "!merge sequence", + "!ignore scalar", + "!ignore mapping", + "!ignore sequence" ] } diff --git a/README.md b/README.md index e1b6866..e22163e 100644 --- a/README.md +++ b/README.md @@ -14,7 +14,7 @@ uv add yaml-reference ``` ## Spec -![Spec Status](https://img.shields.io/badge/spec%20v0.2.8--1-passing-brightgreen?link=https%3A%2F%2Fgithub.com%2Fdsillman2000%2Fyaml-reference-specs%2Ftree%2Fv0.2.8-1) +![Spec Status](https://img.shields.io/badge/spec%20v0.2.9--0-passing-brightgreen?link=https%3A%2F%2Fgithub.com%2Fdsillman2000%2Fyaml-reference-specs%2Ftree%2Fv0.2.9-0) This Python library implements the YAML specification for cross-file references and YAML composition in YAML files using tags `!reference`, `!reference-all`, `!flatten`, `!merge`, and `!ignore` as defined in the [yaml-reference-specs project](https://github.com/dsillman2000/yaml-reference-specs). @@ -97,6 +97,15 @@ networks: !reference-all { glob: "networks/*.yaml" } Use the mapping form when you need optional arguments such as `anchor`; use the scalar shorthand when you only need `path` or `glob`. +### Multi-Document YAML + +yaml-reference distinguishes between a single YAML document whose root value is a sequence and a YAML file that contains multiple documents separated by `---`. + +- `!reference` requires the target file to contain exactly one YAML document. If the referenced file contains multiple documents, loading fails with a `ValueError`. +- `!reference-all` expands matched files document-by-document. A single-document file contributes one list element, while a multi-document file contributes one element per document in document order. 
+- When `anchor` is used with `!reference-all`, the anchored value is extracted from every document in each matched file, preserving file order and then document order. +- If the root input file contains multiple documents, `load_yaml_with_references()` returns a Python list with one resolved output element per document. Root documents tagged with `!ignore` are omitted entirely. + ### The `!ignore` Tag The `!ignore` tag marks YAML content that should be parsed but omitted from the final resolved output. The most common use case is a hidden section of reusable anchors that should remain available for aliases elsewhere in the document without being emitted in the resolved result. diff --git a/tests/unit/test_flatten.py b/tests/unit/test_flatten.py index 5bbebcb..31f5407 100644 --- a/tests/unit/test_flatten.py +++ b/tests/unit/test_flatten.py @@ -130,6 +130,21 @@ def test_flatten_combined_with_reference_all(stage_files): assert data["data"] == [1, 2, 3, 4, 5, 6] +def test_flatten_combined_with_multi_document_reference_all(stage_files): + files = { + "main.yml": """ +data: !flatten + - !reference-all { glob: ./entries.yml } +""", + "entries.yml": "---\n- [1, 2]\n---\n- [3, 4]\n", + } + stg = stage_files(files) + + data = load_yaml_with_references(stg / "main.yml") + + assert data["data"] == [1, 2, 3, 4] + + def test_parse_flatten_tag(stage_files): """Test that !flatten tags are parsed correctly without resolution.""" files = { diff --git a/tests/unit/test_merge.py b/tests/unit/test_merge.py index fe6c082..df90f89 100644 --- a/tests/unit/test_merge.py +++ b/tests/unit/test_merge.py @@ -169,3 +169,23 @@ def test_flatten_and_merge(stage_files): stg = stage_files(files) data = load_yaml_with_references(stg / "test.yml") assert data["result"] == [{"a": 2}, {"b": 2, "c": 3}] + + +def test_merge_combined_with_multi_document_reference_all(stage_files): + files = { + "test.yml": """ +result: !merge + - {base: true, version: 1} + - !reference-all { glob: ./patches.yml } +""", + 
"patches.yml": "---\nversion: 2\n---\nfeature: enabled\n", + } + stg = stage_files(files) + + data = load_yaml_with_references(stg / "test.yml") + + assert data["result"] == { + "base": True, + "version": 2, + "feature": "enabled", + } diff --git a/tests/unit/test_multidocument.py b/tests/unit/test_multidocument.py new file mode 100644 index 0000000..867dcfc --- /dev/null +++ b/tests/unit/test_multidocument.py @@ -0,0 +1,34 @@ +from yaml_reference import load_yaml_with_references + + +def test_multi_document_root_file_loads_as_array(stage_files): + files = { + "root.yml": """ +--- +service: !reference { path: ./service.yml } +--- +ignored_only: !ignore true +--- !ignore +drop_me: true +--- +items: !flatten + - !reference-all { glob: ./entries.yml } +--- +config: !merge + - {a: 1} + - !reference-all { glob: ./patches.yml } +""", + "service.yml": "name: api\n", + "entries.yml": "---\n- [1, 2]\n---\n- [3, 4]\n", + "patches.yml": "---\na: 2\n---\nb: 3\n", + } + stg = stage_files(files) + + data = load_yaml_with_references(stg / "root.yml") + + assert data == [ + {"service": {"name": "api"}}, + {}, + {"items": [1, 2, 3, 4]}, + {"config": {"a": 2, "b": 3}}, + ] diff --git a/tests/unit/test_reference.py b/tests/unit/test_reference.py index 6e5c3bd..53c86c8 100644 --- a/tests/unit/test_reference.py +++ b/tests/unit/test_reference.py @@ -57,6 +57,20 @@ def test_reference_load_shorthand(stage_files): assert data["contents"]["inner"] == "inner_value" +def test_reference_rejects_multi_document_target(stage_files): + files = { + "test.yml": "contents: !reference { path: ./multi.yml }", + "multi.yml": "---\nvalue: 1\n---\nvalue: 2\n", + } + stg = stage_files(files) + + with pytest.raises( + ValueError, + match="contains multiple YAML documents and cannot be used with !reference", + ): + load_yaml_with_references(stg / "test.yml") + + def test_reference_all_load(stage_files): files = { "test.yml": "hello: world\ncontents: !reference-all { glob: ./chapters/*.yml }", @@ -103,6 
+117,66 @@ def test_reference_all_load_shorthand(stage_files): assert {"chapter_value": 3} in data["contents"] +def test_reference_all_expands_multi_document_file(stage_files): + files = { + "test.yml": "contents: !reference-all { glob: ./multi.yml }", + "multi.yml": "---\nvalue: 1\n---\nvalue: 2\n", + } + stg = stage_files(files) + + data = load_yaml_with_references(stg / "test.yml") + + assert data["contents"] == [{"value": 1}, {"value": 2}] + + +def test_reference_all_mixed_single_and_multi_document_order(stage_files): + files = { + "test.yml": "contents: !reference-all { glob: ./parts/*.yml }", + "parts/a.yml": "value: a\n", + "parts/b.yml": "---\nvalue: b1\n---\nvalue: b2\n", + "parts/c.yml": "value: c\n", + } + stg = stage_files(files) + + data = load_yaml_with_references(stg / "test.yml") + + assert data["contents"] == [ + {"value": "a"}, + {"value": "b1"}, + {"value": "b2"}, + {"value": "c"}, + ] + + +def test_reference_all_skips_ignored_root_documents_in_multi_document_file(stage_files): + files = { + "test.yml": "contents: !reference-all { glob: ./multi.yml }", + "multi.yml": "--- !ignore\nignored: true\n---\nvalue: kept\n", + } + stg = stage_files(files) + + data = load_yaml_with_references(stg / "test.yml") + + assert data["contents"] == [{"value": "kept"}] + + +def test_reference_all_anchor_extracts_from_every_document(stage_files): + files = { + "test.yml": "contents: !reference-all { glob: ./parts/*.yml, anchor: item }", + "parts/a.yml": "---\nroot: &item {value: 1}\n---\nroot: &item {value: 2}\n", + "parts/b.yml": "root: &item {value: 3}\n", + } + stg = stage_files(files) + + data = load_yaml_with_references(stg / "test.yml") + + assert data["contents"] == [ + {"value": 1}, + {"value": 2}, + {"value": 3}, + ] + + def test_parse_references(stage_files): files = { "test.yml": "inner: !reference { path: next/open.yml }\n", diff --git a/yaml_reference/__init__.py b/yaml_reference/__init__.py index 16945f9..26624c8 100644 --- a/yaml_reference/__init__.py 
+++ b/yaml_reference/__init__.py @@ -1,6 +1,7 @@ import io import os from collections import defaultdict +from dataclasses import dataclass from pathlib import Path from typing import IO, Any, Optional, Sequence, Union @@ -252,9 +253,32 @@ def from_yaml(cls, constructor, node): return cls(seq) +@dataclass +class MultiDocument: + documents: list[Any] + is_multi_document: bool + + def __repr__(self): + return ( + "MultiDocument(" + f"documents={self.documents!r}, is_multi_document={self.is_multi_document!r}" + ")" + ) + + PathLike = Union[str, Path, os.PathLike] +def _build_yaml_loader() -> YAML: + yaml = YAML(typ="safe") + yaml.register_class(Reference) + yaml.register_class(ReferenceAll) + yaml.register_class(Flatten) + yaml.register_class(Merge) + yaml.register_class(Ignore) + return yaml + + def _check_file_path(path: PathLike, allow_paths: Sequence[PathLike]) -> Path: if not isinstance(path, Path): path = Path(path) @@ -273,11 +297,35 @@ def _check_file_path(path: PathLike, allow_paths: Sequence[PathLike]) -> Path: raise PermissionError(f"File '{path}' is not allowed.") -def _extract_anchor_from_parser_events(yaml: YAML, stream: IO, anchor: str) -> Any: +def _collect_document_event_streams(yaml: YAML, stream: IO) -> list[list[events.Event]]: + document_streams = [] + current_document = None + for event in yaml.parse(stream): + if isinstance(event, events.DocumentStartEvent): + current_document = [events.StreamStartEvent(), event] + elif isinstance(event, events.DocumentEndEvent): + if current_document is None: + current_document = [ + events.StreamStartEvent(), + events.DocumentStartEvent(), + ] + current_document.append(event) + current_document.append(events.StreamEndEvent()) + document_streams.append(current_document) + current_document = None + elif not isinstance(event, (events.StreamStartEvent, events.StreamEndEvent)): + if current_document is not None: + current_document.append(event) + return document_streams + + +def _extract_anchor_from_parser_events( 
+ yaml: YAML, parsed_events: Sequence[events.Event], anchor: str +) -> Any: anchor_lookup = dict() level_lookup = defaultdict(int) _nonzero_keys = lambda dd: [key for key, value in dd.items() if value > 0] # noqa: E731 - for event in yaml.parse(stream): + for event in parsed_events: if ( hasattr(event, "anchor") and event.anchor is not None @@ -360,14 +408,51 @@ def _resolve_aliases(my_events: list[events.Event]) -> list[events.Event]: ) raise ValueError(msg) strio.seek(0) - document = yaml.load(strio) + document = _build_yaml_loader().load(strio) return document +def _parse_yaml_documents( + file_path: PathLike, + anchor: Optional[str] = None, + allow_paths: Optional[Sequence[PathLike]] = None, +) -> MultiDocument: + if not allow_paths: + allow_paths = [Path(file_path).parent.absolute()] + path: Path = _check_file_path(file_path, allow_paths=allow_paths) + + if anchor is None: + yaml = _build_yaml_loader() + with path.open("r") as f: + parsed_documents = list(yaml.load_all(f)) + else: + yaml = _build_yaml_loader() + with path.open("r") as f: + document_streams = _collect_document_event_streams(yaml, f) + if not document_streams: + raise ValueError(f"Anchor '{anchor}' not found in the YAML document.") + parsed_documents = [ + _extract_anchor_from_parser_events(yaml, document_stream, anchor) + for document_stream in document_streams + ] + + if not parsed_documents: + parsed_documents = [None] + + parsed_documents = [ + _recursively_attribute_location_to_references(document, path) + for document in parsed_documents + ] + return MultiDocument( + documents=parsed_documents, + is_multi_document=len(parsed_documents) > 1, + ) + + def parse_yaml_with_references( file_path: PathLike, anchor: Optional[str] = None, - allow_paths: Sequence[PathLike] = [], + allow_paths: Optional[Sequence[PathLike]] = None, ) -> Any: """ Interface method for reading a YAML file into memory which contains references. 
References are not resolved in the @@ -386,29 +471,25 @@ def parse_yaml_with_references( ValueError: If the file is not a valid YAML file. """ - if not allow_paths: - allow_paths = [Path(file_path).parent.absolute()] - path: Path = _check_file_path(file_path, allow_paths=allow_paths) - - yaml = YAML(typ="safe") - yaml.register_class(Reference) - yaml.register_class(ReferenceAll) - yaml.register_class(Flatten) - yaml.register_class(Merge) - yaml.register_class(Ignore) - - if not anchor: - with path.open("r") as f: - parsed = yaml.load(f) - else: - with path.open("r") as f: - parsed = _extract_anchor_from_parser_events(yaml, f, anchor) - - parsed = _recursively_attribute_location_to_references(parsed, path) + parsed = _parse_yaml_documents( + file_path, + anchor=anchor, + allow_paths=allow_paths, + ) + if not parsed.is_multi_document and len(parsed.documents) == 1: + return parsed.documents[0] return parsed def _recursively_attribute_location_to_references(data: Any, base_path: Path): + if isinstance(data, MultiDocument): + return MultiDocument( + documents=[ + _recursively_attribute_location_to_references(item, base_path) + for item in data.documents + ], + is_multi_document=data.is_multi_document, + ) if isinstance(data, Flatten): return Flatten( sequence=[ @@ -514,6 +595,17 @@ def _recursively_resolve_references( if visited_paths is None: visited_paths = set() + if isinstance(data, MultiDocument): + return MultiDocument( + documents=[ + _recursively_resolve_references( + item, allow_paths=allow_paths, visited_paths=visited_paths + ) + for item in data.documents + ], + is_multi_document=data.is_multi_document, + ) + if isinstance(data, Flatten): return Flatten( sequence=[ @@ -547,11 +639,18 @@ def _recursively_resolve_references( # Check for circular reference and track path _check_and_track_path(abs_path, visited_paths) - parsed = parse_yaml_with_references( + parsed = _parse_yaml_documents( abs_path, anchor=data.anchor, allow_paths=allow_paths ) + + if 
len(parsed.documents) != 1: + visited_paths.remove(abs_path) + raise ValueError( + f"Referenced file '{abs_path}' contains multiple YAML documents and cannot be used with !reference." + ) + resolved = _recursively_resolve_references( - parsed, allow_paths=allow_paths, visited_paths=visited_paths + parsed.documents[0], allow_paths=allow_paths, visited_paths=visited_paths ) # Remove current path from visited set after processing @@ -587,13 +686,16 @@ def _recursively_resolve_references( # Check for circular reference and track path _check_and_track_path(path, visited_paths) - parsed = parse_yaml_with_references( + parsed = _parse_yaml_documents( path, anchor=data.anchor, allow_paths=allow_paths ) resolved = _recursively_resolve_references( parsed, allow_paths=allow_paths, visited_paths=visited_paths ) - resolved_items.append(resolved) + if isinstance(resolved, MultiDocument): + resolved_items.extend(resolved.documents) + else: + resolved_items.append(resolved) # Remove current path from visited set after processing visited_paths.remove(path) @@ -623,6 +725,11 @@ def flatten_sequences(data: Any) -> Any: Given an object which may contain Flatten(...) objects which was parsed from a YAML document containing !flatten tags, return the object without any Flatten(...) objects, but having flattened all sequences marked with them. """ + if isinstance(data, MultiDocument): + return MultiDocument( + documents=[flatten_sequences(item) for item in data.documents], + is_multi_document=data.is_multi_document, + ) if isinstance(data, Flatten): return data.flattened() if isinstance(data, Merge): @@ -641,6 +748,11 @@ def merge_mappings(data: Any) -> Any: Given an object which may contain Merge(...) objects which was parsed from a YAML document containing !merge tags, return the object without any Merge(...) objects, but having merged all mappings marked with them. 
""" + if isinstance(data, MultiDocument): + return MultiDocument( + documents=[merge_mappings(item) for item in data.documents], + is_multi_document=data.is_multi_document, + ) if isinstance(data, Merge): return merge_mappings(data.merged()) if isinstance(data, list): @@ -658,6 +770,25 @@ def prune_ignores(data: Any) -> Any: removed from the list. If an Ignore(...) object is found as a value in a dict, the key-value pair is removed from the dict. If an Ignore(...) object is found as a value which is not in a list or dict, it is replaced with None. """ + if isinstance(data, MultiDocument): + if not data.is_multi_document: + if not data.documents: + return MultiDocument(documents=[None], is_multi_document=False) + return MultiDocument( + documents=[prune_ignores(data.documents[0])], + is_multi_document=False, + ) + + pruned_documents = [] + for item in data.documents: + # For multi-document streams, only omit documents explicitly tagged !ignore. + # Preserve documents that prune to None (e.g., explicit null/empty documents) + # so that document count and ordering remain stable. 
+ if isinstance(item, Ignore): + continue + pruned_item = prune_ignores(item) + pruned_documents.append(pruned_item) + return MultiDocument(documents=pruned_documents, is_multi_document=True) if isinstance(data, Ignore): return None if isinstance(data, Flatten): @@ -715,7 +846,7 @@ def load_yaml_with_references( allow_paths = [] allow_paths += [Path(file_path).parent.absolute()] path = _check_file_path(file_path, allow_paths=allow_paths) - parsed = parse_yaml_with_references(path, allow_paths=allow_paths) + parsed = _parse_yaml_documents(path, allow_paths=allow_paths) # Initialize visited paths with the root file to detect self-references visited_paths = {path.resolve()} @@ -732,6 +863,14 @@ def load_yaml_with_references( pruned = prune_ignores(resolved) flattened = flatten_sequences(pruned) merged = merge_mappings(flattened) + if isinstance(merged, MultiDocument): + if merged.is_multi_document: + return merged.documents + if not merged.documents: + return None + if len(merged.documents) == 1: + return merged.documents[0] + return None return merged @@ -742,6 +881,7 @@ def load_yaml_with_references( "Flatten", "merge_mappings", "Merge", + "MultiDocument", "prune_ignores", "Ignore", ] diff --git a/yaml_reference/cli.py b/yaml_reference/cli.py index 1ec01a9..6fda404 100644 --- a/yaml_reference/cli.py +++ b/yaml_reference/cli.py @@ -2,6 +2,7 @@ import sys from pathlib import Path +from ruamel.yaml.error import YAMLError from yaml_reference import load_yaml_with_references @@ -33,6 +34,12 @@ def compile_main(input_file: str, allow_paths: list[str] = []): file=sys.stderr, ) sys.exit(1) + except (FileNotFoundError, ValueError, YAMLError) as err: + print( + f'Error: Failed to compile "{input_path}":\n{err}', + file=sys.stderr, + ) + sys.exit(1) json.dump(data, sys.stdout, sort_keys=True, indent=2)
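The final `MultiDocument` handling added to `load_yaml_with_references()` in this diff can be isolated as a small sketch. `unwrap` is a hypothetical helper name — the diff inlines this logic rather than naming it — but the branching mirrors the added code: multi-document inputs return a list, a single document returns its value, and an empty stream returns `None`:

```python
# Sketch of the MultiDocument collapse at the end of
# load_yaml_with_references(). "unwrap" is a hypothetical name for
# logic the diff writes inline.
from dataclasses import dataclass
from typing import Any


@dataclass
class MultiDocument:
    documents: list
    is_multi_document: bool


def unwrap(merged: Any) -> Any:
    if isinstance(merged, MultiDocument):
        if merged.is_multi_document:
            # Multi-document root file: one output element per document.
            return merged.documents
        if not merged.documents:
            return None
        if len(merged.documents) == 1:
            # Single-document file: return the document's value directly.
            return merged.documents[0]
        return None
    # Already a plain value (no document wrapper involved).
    return merged
```

Note the asymmetry this encodes: callers only ever see a list when the source genuinely contained multiple `---`-separated documents, so existing single-document callers are unaffected by the new multi-document support.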