Merged
51 changes: 30 additions & 21 deletions .github/copilot-instructions.md
@@ -2,7 +2,7 @@

## Project Overview

**yaml-reference** is a Python library that extends `ruamel.yaml` with cross-file YAML composition using custom tags (`!reference`, `!reference-all`, `!flatten`, `!merge`). It's built to be a reference implementation of the [yaml-reference-specs](https://github.com/dsillman2000/yaml-reference-specs) specification.
**yaml-reference** is a Python library that extends `ruamel.yaml` with cross-file YAML composition using custom tags (`!reference`, `!reference-all`, `!flatten`, `!merge`, `!ignore`). It's built to be a reference implementation of the [yaml-reference-specs](https://github.com/dsillman2000/yaml-reference-specs) specification.

## Build, Test, and Lint

@@ -62,19 +62,21 @@ uv build
The library is structured in two key parts:

### Core Module (`yaml_reference/__init__.py`)
- **Reference & ReferenceAll classes**: Represent the `!reference` and `!reference-all` YAML tags as Python objects
- **parse_yaml_with_references()**: Parses YAML, returning Reference/ReferenceAll objects without resolving them (one layer only)
- **load_yaml_with_references()**: Fully recursively resolves all references, returning a complete Python dict
- **Flatten & Merge classes**: Represent `!flatten` and `!merge` tag logic
- **YAML loader setup**: Registers custom constructors with `ruamel.yaml.YAML` for each tag
- **Reference & ReferenceAll classes**: Represent the `!reference` and `!reference-all` YAML tags as Python objects, supporting both mapping form and scalar shorthand (`!reference path/to/file.yml`, `!reference-all glob/*.yml`)
- **Ignore, Flatten, and Merge classes**: Represent `!ignore`, `!flatten`, and `!merge` tag logic
- **parse_yaml_with_references()**: Parses YAML and preserves composition tags as Python objects without resolving cross-file references
- **load_yaml_with_references()**: Fully resolves references, then prunes ignored content, flattens sequences, and merges mappings to produce the final Python data structure
- **Helper transforms**: `prune_ignores()`, `flatten_sequences()`, and `merge_mappings()` implement the post-resolution evaluation pipeline
- **YAML loader setup**: Registers custom constructors with `ruamel.yaml.YAML` for each supported tag

### CLI Module (`yaml_reference/cli.py`)
- Simple entry point that calls the core loading functions
- Simple entry point that calls the core loading functions for YAML containing any supported composition tags
- Outputs JSON to stdout (compatible with spec tests)
- Takes optional `--allow` flag for path restrictions

### Test Structure (`tests/unit/`)
- `test_reference.py`: Tests for `!reference` and `!reference-all` tag resolution
- `test_ignore.py`: Tests for `!ignore` parsing and pruning behavior
- `test_flatten.py`: Tests for `!flatten` tag behavior
- `test_merge.py`: Tests for `!merge` tag behavior
- `conftest.py`: Pytest fixtures and test utilities
@@ -83,33 +85,40 @@ The library is structured in two key parts:

### Security-First Path Handling
1. **Relative paths only**: All references must use relative paths (e.g., `path: "config/db.yaml"`). Absolute paths raise `ValueError`.
2. **Path restriction by default**: References can only access files in the same directory or subdirectories (no `..` to escape). Use `allow_paths` parameter to explicitly allow other directory trees.
2. **Path restriction by default**: The referencing file's parent directory is always allowed. Use `allow_paths` to explicitly allow additional directory trees.
3. **Security invariant**: Disallowed files are **never opened or read into memory**. Path filtering happens before file I/O.
4. **Silent omission (for `!reference-all`)**: When a glob pattern matches files outside allowed paths, those files are silently dropped from results and the function returns `rc=0` (not an error).
4. **Silent omission (for `!reference-all`)**: When a glob pattern matches files outside allowed paths, those files are silently dropped from results. Empty or fully filtered globs resolve to `[]` rather than an error.
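
The path rules above can be sketched with the standard library alone. This is an illustrative sketch, not the library's code: `is_allowed`, `check_reference_path`, and `filter_glob_matches` are hypothetical names, `allow_paths` is assumed to be a list of directories, and `Path.is_relative_to` requires Python 3.9 or later. The point it shows is that every check happens on path objects, before any file is opened.

```python
from pathlib import Path


def is_allowed(target: Path, referencing_file: Path, allow_paths: list[Path]) -> bool:
    """Hypothetical check: is `target` under the referencing file's parent
    directory or under one of the explicitly allowed directory trees?"""
    resolved = target.resolve()  # normalizes "..", does not open the file
    roots = [referencing_file.resolve().parent, *(p.resolve() for p in allow_paths)]
    return any(resolved.is_relative_to(root) for root in roots)


def check_reference_path(path: str, referencing_file: Path, allow_paths: list[Path]) -> Path:
    """Validate a `!reference` target before any file I/O takes place."""
    if Path(path).is_absolute():
        raise ValueError(f"absolute path not allowed: {path!r} (in {referencing_file})")
    target = referencing_file.parent / path
    if not is_allowed(target, referencing_file, allow_paths):
        raise PermissionError(f"path outside allowed directories: {path!r} (in {referencing_file})")
    return target


def filter_glob_matches(matches: list[Path], referencing_file: Path, allow_paths: list[Path]) -> list[Path]:
    """`!reference-all`: silently drop disallowed matches; the result may be []."""
    return [m for m in matches if is_allowed(m, referencing_file, allow_paths)]
```

Under these assumptions, a disallowed `!reference` fails loudly, while a disallowed `!reference-all` match simply disappears from the result list.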

### YAML Tag Implementation Pattern
Each custom tag follows this pattern:
1. Define a class with `yaml_tag` attribute
2. Implement `@classmethod from_yaml(cls, constructor, node)` to parse from YAML
2. Implement `@classmethod from_yaml(cls, constructor, node)` to parse from YAML, handling scalar, mapping, or sequence nodes as needed
3. Register constructor with the YAML loader in `__init__.py`
4. The class instance persists through `parse_yaml_with_references()`, allowing layer-by-layer resolution

### Reference Tag Forms
1. **Scalar shorthand is supported**: `!reference path/to/file.yml` and `!reference-all glob/*.yml` are valid when only `path` or `glob` is needed.
2. **Mapping form is still required for optional fields**: Use mappings such as `{ path: "file.yml", anchor: "section" }` when specifying `anchor`.
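
One way to picture the two forms is as a normalization step that turns either one into the same argument dictionary. The helper below is a hypothetical sketch (the name `normalize_reference_args` and the rule that the mapping form must contain the primary key are assumptions, not the library's API); `value` stands in for the already-constructed node content.

```python
def normalize_reference_args(value, key="path"):
    """Hypothetical helper: reduce scalar shorthand and mapping form to one dict.

    `value` is a str for the scalar shorthand or a dict for the mapping form;
    `key` is "path" for !reference and "glob" for !reference-all.
    """
    if isinstance(value, str):
        # Scalar shorthand: only the primary argument can be given.
        return {key: value}
    if isinstance(value, dict):
        # Mapping form: optional fields such as `anchor` are carried through.
        if key not in value:
            raise ValueError(f"mapping form requires {key!r}, got {value!r}")
        return dict(value)
    raise ValueError(f"expected a scalar or mapping, got {type(value).__name__}")
```

For example, `normalize_reference_args("glob/*.yml", key="glob")` and `normalize_reference_args({"glob": "glob/*.yml"}, key="glob")` produce the same dictionary in this sketch.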

### Reference Resolution Order
1. **Circular reference detection** occurs during recursive resolution by tracking a "resolution stack"
1. **Circular reference detection** occurs during recursive resolution by tracking visited file paths
2. **Anchors** (optional parameter): If specified, extract only the anchored section from the referenced file
3. **Recursive expansion**: `load_yaml_with_references()` recursively expands all tags, applying `!flatten` and `!merge` logic as it encounters them
3. **Recursive expansion**: `load_yaml_with_references()` recursively resolves `!reference` and `!reference-all` first
4. **Ignore pruning**: `!ignore` content is removed after full reference resolution so ignored values from referenced files can remove their parent keys or list items
5. **Post-processing**: `!flatten` is evaluated after ignore pruning, and `!merge` is evaluated last
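
The ordering of the post-resolution stages can be illustrated with a toy model. Everything below is a stand-in written for this explanation: the `Ignore`, `Flatten`, and `Merge` dataclasses and the `prune`, `flatten`, `merge`, and `evaluate` functions are hypothetical and do not reproduce the library's `prune_ignores()`, `flatten_sequences()`, or `merge_mappings()`. The deep flattening and the shallow, later-wins merge are assumptions chosen to agree with the test expectations in this pull request.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Ignore:
    value: Any


@dataclass
class Flatten:
    items: list


@dataclass
class Merge:
    items: list


_DROP = object()  # sentinel: "remove this key or list item"


def prune(node):
    """Stage 1: remove ignored values together with their key or list slot."""
    if isinstance(node, Ignore):
        return _DROP
    if isinstance(node, (Flatten, Merge)):
        return type(node)([c for c in map(prune, node.items) if c is not _DROP])
    if isinstance(node, dict):
        pruned = {k: prune(v) for k, v in node.items()}
        return {k: v for k, v in pruned.items() if v is not _DROP}
    if isinstance(node, list):
        return [c for c in map(prune, node) if c is not _DROP]
    return node


def _deep_flatten(items):
    out = []
    for item in items:
        if isinstance(item, list):
            out.extend(_deep_flatten(item))
        else:
            out.append(item)
    return out


def flatten(node):
    """Stage 2: collapse nested sequences under Flatten markers."""
    if isinstance(node, Flatten):
        return _deep_flatten([flatten(c) for c in node.items])
    if isinstance(node, Merge):
        return Merge([flatten(c) for c in node.items])
    if isinstance(node, dict):
        return {k: flatten(v) for k, v in node.items()}
    if isinstance(node, list):
        return [flatten(c) for c in node]
    return node


def merge(node):
    """Stage 3: combine mappings under Merge markers, later entries winning."""
    if isinstance(node, Merge):
        result = {}
        for item in _deep_flatten([merge(c) for c in node.items]):
            if not isinstance(item, dict):
                raise ValueError(f"!merge expects mappings, got {item!r}")
            result.update(item)
        return result
    if isinstance(node, dict):
        return {k: merge(v) for k, v in node.items()}
    if isinstance(node, list):
        return [merge(c) for c in node]
    return node


def evaluate(resolved):
    """Run the stages in order on data whose references are already resolved."""
    pruned = prune(resolved)
    if pruned is _DROP:
        return None
    return merge(flatten(pruned))
```

Running pruning first matters in this model: an ignored item nested inside a `Flatten` or `Merge` marker is gone before the later stages ever see it.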

### Error Handling
- **ValueError** for spec violations: absolute paths, circular references, invalid anchors
- **ValueError** for spec violations: absolute paths, circular references, invalid anchors, malformed merge contents
- **FileNotFoundError** for missing referenced files
- **Glob errors**: Return empty list `[]` if glob matches no files (silent omission)
- **PermissionError** for disallowed `!reference` targets
- **Glob behavior**: `!reference-all` returns `[]` when a glob matches no files or when all matches are filtered out by path restrictions
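
As a rough illustration of how circular references and missing files surface as the exceptions listed above, the sketch below resolves references over an in-memory dictionary instead of the filesystem. The `Ref` class and the `resolve` and `_expand` functions are hypothetical, and tracking the chain of files currently being resolved is one plausible reading of "tracking visited file paths", not a description of the library's internals.

```python
from dataclasses import dataclass


@dataclass
class Ref:
    """Hypothetical stand-in for a parsed !reference tag."""
    path: str


def resolve(path, files, visiting=()):
    """Resolve `path` against `files`, a dict mapping path -> parsed content."""
    if path in visiting:
        chain = " -> ".join((*visiting, path))
        raise ValueError(f"circular reference detected: {chain}")
    if path not in files:
        raise FileNotFoundError(path)
    return _expand(files[path], files, (*visiting, path))


def _expand(node, files, visiting):
    # Replace each Ref with the resolved content of its target.
    if isinstance(node, Ref):
        return resolve(node.path, files, visiting)
    if isinstance(node, dict):
        return {k: _expand(v, files, visiting) for k, v in node.items()}
    if isinstance(node, list):
        return [_expand(v, files, visiting) for v in node]
    return node
```

Because the chain is passed down per call rather than stored globally, two sibling references to the same file do not trigger a false positive in this sketch; only a genuine cycle does.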

### Spec Compliance Testing
The project tests against `yaml-reference-specs`, a Go-based reference implementation. The spec tests verify:
- Correct expansion of all four tags
- Correct expansion of all supported tags
- Proper error detection (bad paths, missing files, circular refs)
- Path restriction enforcement
- Edge cases like empty globs and nested composition
- Edge cases like empty globs, ignored content, shorthand reference syntax, and nested composition

Run with: `make spec-test` or `scripts/spec-test.sh`

@@ -127,15 +136,15 @@ Install hooks with: `pre-commit install`
### Adding a new tag type
1. Create a class in `yaml_reference/__init__.py` with `yaml_tag` attribute and `from_yaml()` classmethod
2. Register the constructor after the class definition
3. Add resolution logic (handle in recursive expansion)
3. Add resolution or post-processing logic in the appropriate stage (`_recursively_resolve_references()`, `prune_ignores()`, `flatten_sequences()`, or `merge_mappings()`)
4. Write tests in `tests/unit/test_*.py` following existing patterns
5. Update README.md with usage example

### Debugging a reference resolution issue
1. Use `parse_yaml_with_references()` to see raw Reference objects before resolution
2. Add print statements or use a debugger to trace the `_resolve_references()` recursive calls
3. Check the resolution stack to verify circular reference detection is working
4. Run a specific test with `-v` flag to see detailed assertion output
1. Use `parse_yaml_with_references()` to inspect raw `Reference`, `ReferenceAll`, `Ignore`, `Flatten`, and `Merge` objects before evaluation
2. Trace `_recursively_resolve_references()` to debug cross-file expansion and circular reference handling
3. Check the post-processing stages in order: `prune_ignores()`, then `flatten_sequences()`, then `merge_mappings()`
4. Run the most specific unit test with `-v` flag to see detailed assertion output

### Updating error messages
Ensure error messages follow this pattern: include the problematic value, the path of the file where the error occurred, and the specific constraint violated. This helps spec tests verify proper error handling.
7 changes: 6 additions & 1 deletion .vscode/settings.json
@@ -1,8 +1,13 @@
{
"yaml.customTags": [
"!reference mapping",
"!reference scalar",
"!reference-all mapping",
"!reference-all scalar",
"!flatten sequence",
"!merge sequence"
"!merge sequence",
"!ignore scalar",
"!ignore mapping",
"!ignore sequence"
]
}
11 changes: 10 additions & 1 deletion README.md
@@ -14,7 +14,7 @@ uv add yaml-reference
```

## Spec
![Spec Status](https://img.shields.io/badge/spec%20v0.2.8--1-passing-brightgreen?link=https%3A%2F%2Fgithub.com%2Fdsillman2000%2Fyaml-reference-specs%2Ftree%2Fv0.2.8-1)
![Spec Status](https://img.shields.io/badge/spec%20v0.2.9--0-passing-brightgreen?link=https%3A%2F%2Fgithub.com%2Fdsillman2000%2Fyaml-reference-specs%2Ftree%2Fv0.2.9-0)

This Python library implements the YAML specification for cross-file references and YAML composition in YAML files using tags `!reference`, `!reference-all`, `!flatten`, `!merge`, and `!ignore` as defined in the [yaml-reference-specs project](https://github.com/dsillman2000/yaml-reference-specs).

@@ -97,6 +97,15 @@ networks: !reference-all { glob: "networks/*.yaml" }

Use the mapping form when you need optional arguments such as `anchor`; use the scalar shorthand when you only need `path` or `glob`.

### Multi-Document YAML

yaml-reference distinguishes between a single YAML document whose root value is a sequence and a YAML file that contains multiple documents separated by `---`.

- `!reference` requires the target file to contain exactly one YAML document. If the referenced file contains multiple documents, loading fails with a `ValueError`.
- `!reference-all` expands matched files document-by-document. A single-document file contributes one list element, while a multi-document file contributes one element per document in document order.
- When `anchor` is used with `!reference-all`, the anchored value is extracted from every document in each matched file, preserving file order and then document order.
- If the root input file contains multiple documents, `load_yaml_with_references()` returns a Python list with one resolved output element per document. Root documents tagged with `!ignore` are omitted entirely.
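
The difference between the two tags can be sketched by modeling each file as the list of documents it contains. The functions below are hypothetical illustrations, not the library's implementation: `documents_by_file` stands in for already-parsed YAML, every file is assumed to contain at least one document, and sorting the paths is an assumed stand-in for "file order".

```python
def resolve_reference(path, documents_by_file):
    """`!reference`: the target must hold exactly one document."""
    docs = documents_by_file[path]
    if len(docs) > 1:
        raise ValueError(
            f"{path} contains multiple YAML documents and cannot be used with !reference"
        )
    return docs[0]


def resolve_reference_all(paths, documents_by_file):
    """`!reference-all`: one list element per document, files first, then documents."""
    result = []
    for path in sorted(paths):  # assumption: file order means sorted path order
        result.extend(documents_by_file[path])
    return result
```

With `parts/a.yml` holding one document and `parts/b.yml` holding two, `resolve_reference_all` yields three elements in this model, while `resolve_reference("parts/b.yml", ...)` raises `ValueError`.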

### The `!ignore` Tag

The `!ignore` tag marks YAML content that should be parsed but omitted from the final resolved output. The most common use case is a hidden section of reusable anchors that should remain available for aliases elsewhere in the document without being emitted in the resolved result.
15 changes: 15 additions & 0 deletions tests/unit/test_flatten.py
@@ -130,6 +130,21 @@ def test_flatten_combined_with_reference_all(stage_files):
    assert data["data"] == [1, 2, 3, 4, 5, 6]


def test_flatten_combined_with_multi_document_reference_all(stage_files):
    files = {
        "main.yml": """
data: !flatten
- !reference-all { glob: ./entries.yml }
""",
        "entries.yml": "---\n- [1, 2]\n---\n- [3, 4]\n",
    }
    stg = stage_files(files)

    data = load_yaml_with_references(stg / "main.yml")

    assert data["data"] == [1, 2, 3, 4]


def test_parse_flatten_tag(stage_files):
    """Test that !flatten tags are parsed correctly without resolution."""
    files = {
20 changes: 20 additions & 0 deletions tests/unit/test_merge.py
@@ -169,3 +169,23 @@ def test_flatten_and_merge(stage_files):
    stg = stage_files(files)
    data = load_yaml_with_references(stg / "test.yml")
    assert data["result"] == [{"a": 2}, {"b": 2, "c": 3}]


def test_merge_combined_with_multi_document_reference_all(stage_files):
    files = {
        "test.yml": """
result: !merge
- {base: true, version: 1}
- !reference-all { glob: ./patches.yml }
""",
        "patches.yml": "---\nversion: 2\n---\nfeature: enabled\n",
    }
    stg = stage_files(files)

    data = load_yaml_with_references(stg / "test.yml")

    assert data["result"] == {
        "base": True,
        "version": 2,
        "feature": "enabled",
    }
34 changes: 34 additions & 0 deletions tests/unit/test_multidocument.py
@@ -0,0 +1,34 @@
from yaml_reference import load_yaml_with_references


def test_multi_document_root_file_loads_as_array(stage_files):
    files = {
        "root.yml": """
---
service: !reference { path: ./service.yml }
---
ignored_only: !ignore true
--- !ignore
drop_me: true
---
items: !flatten
- !reference-all { glob: ./entries.yml }
---
config: !merge
- {a: 1}
- !reference-all { glob: ./patches.yml }
""",
        "service.yml": "name: api\n",
        "entries.yml": "---\n- [1, 2]\n---\n- [3, 4]\n",
        "patches.yml": "---\na: 2\n---\nb: 3\n",
    }
    stg = stage_files(files)

    data = load_yaml_with_references(stg / "root.yml")

    assert data == [
        {"service": {"name": "api"}},
        {},
        {"items": [1, 2, 3, 4]},
        {"config": {"a": 2, "b": 3}},
    ]
74 changes: 74 additions & 0 deletions tests/unit/test_reference.py
@@ -57,6 +57,20 @@ def test_reference_load_shorthand(stage_files):
    assert data["contents"]["inner"] == "inner_value"


def test_reference_rejects_multi_document_target(stage_files):
    files = {
        "test.yml": "contents: !reference { path: ./multi.yml }",
        "multi.yml": "---\nvalue: 1\n---\nvalue: 2\n",
    }
    stg = stage_files(files)

    with pytest.raises(
        ValueError,
        match="contains multiple YAML documents and cannot be used with !reference",
    ):
        load_yaml_with_references(stg / "test.yml")


def test_reference_all_load(stage_files):
    files = {
        "test.yml": "hello: world\ncontents: !reference-all { glob: ./chapters/*.yml }",
@@ -103,6 +117,66 @@ def test_reference_all_load_shorthand(stage_files):
    assert {"chapter_value": 3} in data["contents"]


def test_reference_all_expands_multi_document_file(stage_files):
    files = {
        "test.yml": "contents: !reference-all { glob: ./multi.yml }",
        "multi.yml": "---\nvalue: 1\n---\nvalue: 2\n",
    }
    stg = stage_files(files)

    data = load_yaml_with_references(stg / "test.yml")

    assert data["contents"] == [{"value": 1}, {"value": 2}]


def test_reference_all_mixed_single_and_multi_document_order(stage_files):
    files = {
        "test.yml": "contents: !reference-all { glob: ./parts/*.yml }",
        "parts/a.yml": "value: a\n",
        "parts/b.yml": "---\nvalue: b1\n---\nvalue: b2\n",
        "parts/c.yml": "value: c\n",
    }
    stg = stage_files(files)

    data = load_yaml_with_references(stg / "test.yml")

    assert data["contents"] == [
        {"value": "a"},
        {"value": "b1"},
        {"value": "b2"},
        {"value": "c"},
    ]


def test_reference_all_skips_ignored_root_documents_in_multi_document_file(stage_files):
    files = {
        "test.yml": "contents: !reference-all { glob: ./multi.yml }",
        "multi.yml": "--- !ignore\nignored: true\n---\nvalue: kept\n",
    }
    stg = stage_files(files)

    data = load_yaml_with_references(stg / "test.yml")

    assert data["contents"] == [{"value": "kept"}]


def test_reference_all_anchor_extracts_from_every_document(stage_files):
    files = {
        "test.yml": "contents: !reference-all { glob: ./parts/*.yml, anchor: item }",
        "parts/a.yml": "---\nroot: &item {value: 1}\n---\nroot: &item {value: 2}\n",
        "parts/b.yml": "root: &item {value: 3}\n",
    }
    stg = stage_files(files)

    data = load_yaml_with_references(stg / "test.yml")

    assert data["contents"] == [
        {"value": 1},
        {"value": 2},
        {"value": 3},
    ]


def test_parse_references(stage_files):
    files = {
        "test.yml": "inner: !reference { path: next/open.yml }\n",