feat: add Root::parse_bytes for non-UTF8 input by philiptaron · Pull Request #179 · nix-community/rnix-parser

philiptaron · 2026-01-29T13:44:30Z

Summary

Add Root::parse_bytes(&[u8]) method that handles non-UTF8 input
Invalid UTF-8 sequences are replaced with U+FFFD (replacement character)

Background

The C++ Nix parser uses %option 8bit in its flex lexer, which allows it to handle arbitrary byte values without UTF-8 validation. The raw bytes are preserved in string literals.

rnix uses Rowan for its syntax tree, which requires valid UTF-8, so arbitrary bytes cannot be preserved exactly. Instead, parse_bytes performs a lossy UTF-8 conversion: each invalid sequence becomes U+FFFD (�).

Behavior comparison:

| Input bytes | nix output | rnix parse_bytes output |
| --- | --- | --- |
| `{ x = "\xff"; }` | `{ x = "\xff"; }` (raw 0xFF preserved) | `{ x = "�"; }` (U+FFFD replacement) |

This is sufficient for most use cases (linting, formatting, analysis) where exact byte preservation isn't required.
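
The lossy conversion described above can be sketched with the standard library alone. This is a minimal illustration that assumes the conversion behaves like `String::from_utf8_lossy`; the PR text does not show how `parse_bytes` is implemented, and `decode_lossy` is a hypothetical helper name used only here.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not part of rnix): decode bytes, replacing each
/// invalid UTF-8 sequence with U+FFFD.
fn decode_lossy(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}

fn main() {
    // `{ x = "\xff"; }` with a raw 0xFF byte inside the string literal.
    let raw: &[u8] = b"{ x = \"\xff\"; }";
    let text = decode_lossy(raw);

    // The invalid byte has been replaced; everything else is unchanged.
    assert_eq!(&*text, "{ x = \"\u{FFFD}\"; }");

    // Input that is already valid UTF-8 is borrowed, not copied.
    let valid = decode_lossy(b"{ x = 1; }");
    assert!(matches!(valid, Cow::Borrowed(_)));

    println!("{text}");
}
```

A `Cow::Owned` result means at least one replacement happened, so a caller that needs exact byte preservation can detect that case and handle it separately.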

Example

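use rnix::Root;

// { x = "\xff"; } with a raw 0xFF byte (invalid UTF-8) inside the string literal
let bytes: &[u8] = &[0x7b, 0x20, 0x78, 0x20, 0x3d, 0x20, 0x22, 0xff, 0x22, 0x3b, 0x20, 0x7d];
let parse = Root::parse_bytes(bytes);
// The lossy conversion happens before parsing, so no parse errors are reported.
assert!(parse.errors().is_empty());
// Output text: { x = "�"; }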
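
One consequence of the replacement is worth illustrating: U+FFFD occupies three bytes in UTF-8, so byte offsets in the decoded text no longer line up with offsets in the original input. The sketch below uses only the standard library and again assumes the conversion behaves like `String::from_utf8_lossy`; `lossy_lengths` is a hypothetical helper, not an rnix API.

```rust
/// Hypothetical helper: returns (input length, decoded length) in bytes.
fn lossy_lengths(bytes: &[u8]) -> (usize, usize) {
    let text = String::from_utf8_lossy(bytes);
    (bytes.len(), text.len())
}

fn main() {
    // Same input as the example above: { x = "\xff"; }
    let bytes: &[u8] = &[0x7b, 0x20, 0x78, 0x20, 0x3d, 0x20, 0x22, 0xff, 0x22, 0x3b, 0x20, 0x7d];
    let text = String::from_utf8_lossy(bytes);

    // One invalid byte became a three-byte replacement character.
    assert_eq!(lossy_lengths(bytes), (12, 14));

    // The replacement starts where the invalid byte was...
    assert_eq!(text.find('\u{FFFD}'), Some(7));

    // ...but everything after it is shifted by two bytes.
    assert_eq!(bytes[8], b'"');
    assert_eq!(text.as_bytes()[10], b'"');
}
```

Tools that map text ranges from the syntax tree back onto the original file may need to account for this shift.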

Test plan

Added test case non_utf8_can_be_parsed_with_parse_bytes_issue173
All existing tests pass

Closes #173

nix (C++) can parse files with non-UTF8 bytes, but rnix currently requires valid UTF-8 because Root::parse takes &str.

Add a new `parse_bytes(&[u8])` method that handles non-UTF8 input by doing lossy UTF-8 conversion. Invalid byte sequences are replaced with U+FFFD (replacement character). The C++ Nix parser also accepts such input, although it preserves the raw bytes instead of replacing them. This allows parsing `.nix` files that contain non-UTF8 bytes, which was previously impossible since `Root::parse` requires `&str`. Closes #173
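
To make the shape of this change concrete, here is a rough sketch of how a bytes entry point can delegate to an existing `&str` parser. It is not the PR's implementation: `parse_bytes_sketch` is a hypothetical name, and the parser is passed in as a closure so the example runs without rnix.

```rust
/// Hypothetical sketch: convert lossily, then hand the text to a `&str` parser.
/// `parse` stands in for a `&str`-only entry point such as `Root::parse`.
fn parse_bytes_sketch<T>(bytes: &[u8], parse: impl FnOnce(&str) -> T) -> T {
    let text = String::from_utf8_lossy(bytes);
    parse(&text)
}

fn main() {
    let bytes: &[u8] = b"{ x = \"\xff\"; }";

    // A strict conversion fails, which is why a `&str`-only API cannot
    // accept this input at all.
    assert!(std::str::from_utf8(bytes).is_err());

    // The lossy path always produces text the parser can consume.
    // Here the stand-in "parser" just counts characters.
    let char_count = parse_bytes_sketch(bytes, |s| s.chars().count());
    assert_eq!(char_count, 12);
}
```

Keeping the conversion in a thin wrapper leaves the existing `&str` path untouched for callers whose input is already valid UTF-8.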

philiptaron added 2 commits January 29, 2026 05:41

test: add test documenting non-UTF8 limitation (issue #173)

8d7c39c

philiptaron marked this pull request as draft January 29, 2026 13:47

style: fix formatting

f8c7001

philiptaron wants to merge 3 commits into master from fix-issue-173
