buffa-yaml: YAML serialization extension crate #101

@iainmcgin

Tracking issue for adding YAML support to buffa, equivalent to what bufbuild/protoyaml-go provides for Go: marshal and unmarshal protobuf messages as YAML, fully compatible with the protobuf-canonical JSON mapping. Useful for hand-authored config files, test fixtures, and anywhere YAML is more readable than JSON.

How protoyaml-go is built (and why buffa is structurally different)

protoyaml-go is asymmetric: encode delegates to protojson (proto → JSON bytes → re-emit as YAML), while decode is a hand-rolled *yaml.Node tree walker driven by protoreflect. The decode side is hand-rolled for two reasons: rich diagnostics (file:line:col + source line + ^ pointer), and lenience extensions beyond protojson (hex/octal/binary int literals, byte-size suffixes like 1Ki, Go-style durations like 1h30m, field-number addressing).

buffa's situation is different. buffa's JSON support is serde-derive-based: the generated Serialize/Deserialize impls and the buffa::json_helpers with-modules encode protobuf-JSON semantics through serde's carrier-agnostic data model. They never call format-specific APIs, except for using serde_json::Value as an opaque scratch type (which any carrier can populate, since Value: Deserialize). This means the equivalents of protoyaml-go's encode and decode paths already exist once the carrier is swapped from serde_json to a YAML serde crate:

let yaml = serde_norway::to_string(&msg)?;            // shared Serialize impl
let msg: MyMessage = serde_norway::from_str(&yaml)?;  // shared Deserialize impl

What this does not provide is protoyaml-go's diagnostic snippets and lenience extensions. Both are explicitly out of scope for Phase 1.

Design: buffa-yaml extension crate

A new published workspace member rather than a feature flag on buffa core:

  1. Decouple YAML-crate churn from buffa's versioning. The carrier crate ecosystem is unstable (serde_yaml deprecated 2024-03, serde_yml is RUSTSEC-2025-0068 unsound + unmaintained, serde_norway/serde_yaml_ng/serde-saphyr are the survivors). If the carrier needs to change, that's a buffa-yaml 0.x.y bump, not a buffa semver event.
  2. Curate the supply chain. Pinning a vetted carrier in a published crate steers users away from the CVE-bearing fork.
  3. Keep buffa core's dependency surface minimal. YAML carriers are std-only; buffa core is no_std-capable.
  4. Room to grow. A future lenience layer (Phase 2) needs a home.
  5. Precedent. The json and text features in buffa core gate runtime infrastructure. YAML doesn't need new runtime infrastructure — it reuses the JSON serde impls. The thing it adds is a carrier dependency and convenience wrappers.

Carrier choice: serde_norway

Empirically validated by round-tripping buffa-test's generated types (json_types, edge_cases non-string maps, ext_json extension-range messages) plus WKTs through both candidates:

  • serde_norway (0.9.42, maintained fork of serde_yaml, wraps unsafe-libyaml-norway): all round-trips clean, zero buffa changes required. Inherits dtolnay's restricted scalar resolution, so name: no arrives as a string (no Norway problem). Float specials (Infinity, .nan) are delivered as visit_f64, which buffa's float/double helpers already accept. !!binary is passed through as the raw text; buffa's single base64 decode produces the right answer.
  • serde-saphyr (0.0.26, pure Rust, span diagnostics, forbid(unsafe_code), anti-DoS budgets): three conflicts found, all addressable but each needs work — (a) it canonicalizes float specials to .nan/.inf strings in deserialize_any (designed so serde_json::Value round-trips them), which buffa's visit_str rejects; (b) it pre-decodes !!binary so buffa's visit_str decodes again (double-decode), needs ignore_binary_tag_for_string: true; (c) defaults to YAML 1.1 lenient bools (Norway problem), needs strict_booleans: true. Also 0.0.x with the Options API mid-deprecation.
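
To make the bool difference between the two candidates concrete, here is a minimal std-only model of the two resolution rules. The function names are hypothetical and this is not either carrier's actual code; the restricted set is assumed to be the case variants of true/false, matching the "accepts case variants" note in the comparison table further down.

```rust
/// YAML 1.1 style resolution: the lenient set behind the "Norway problem".
/// Illustrative model only, not a carrier's implementation.
fn resolve_bool_yaml11(scalar: &str) -> Option<bool> {
    match scalar {
        "y" | "Y" | "yes" | "Yes" | "YES" | "true" | "True" | "TRUE" | "on" | "On" | "ON" => {
            Some(true)
        }
        "n" | "N" | "no" | "No" | "NO" | "false" | "False" | "FALSE" | "off" | "Off" | "OFF" => {
            Some(false)
        }
        _ => None,
    }
}

/// Restricted resolution (assumed): only case variants of true/false are bools,
/// so every other plain scalar falls through and stays a string.
fn resolve_bool_restricted(scalar: &str) -> Option<bool> {
    match scalar {
        "true" | "True" | "TRUE" => Some(true),
        "false" | "False" | "FALSE" => Some(false),
        _ => None,
    }
}
```

Under the lenient rule `name: no` resolves to `false`; under the restricted rule it falls through and stays the string "no", which is the behavior the round-trips relied on.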

serde_norway is the safe Phase 1 default. serde-saphyr is the more interesting long-term carrier — its diagnostics and security posture align with what makes protoyaml-go valuable — but switching to it would need (a) a gated JsonParseOptions::accept_yaml_float_literals option in buffa core (so JSON parsing stays strictly conformant; JsonParseOptions is already #[non_exhaustive]), and (b) the Options API to settle. Track as a future revisit.
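
A rough sketch of what the gating in (a) could look like, assuming the option is a plain bool. `FloatParseOptions` and `parse_float_special` are hypothetical stand-ins, not buffa's actual types or helpers, and only the lowercase `.nan`/`.inf` spellings mentioned above are covered.

```rust
/// Hypothetical stand-in for the proposed option on JsonParseOptions.
#[derive(Clone, Copy, Default)]
struct FloatParseOptions {
    /// Off by default so plain JSON parsing stays strict.
    accept_yaml_float_literals: bool,
}

/// Resolves a string token to a float special, or None if it is not one.
/// Ordinary numeric strings are deliberately out of scope for this sketch.
fn parse_float_special(token: &str, opts: FloatParseOptions) -> Option<f64> {
    // protobuf-JSON spellings are always accepted.
    match token {
        "NaN" => return Some(f64::NAN),
        "Infinity" => return Some(f64::INFINITY),
        "-Infinity" => return Some(f64::NEG_INFINITY),
        _ => {}
    }
    // YAML spellings are accepted only when the option is switched on.
    if opts.accept_yaml_float_literals {
        match token {
            ".nan" => return Some(f64::NAN),
            ".inf" => return Some(f64::INFINITY),
            "-.inf" => return Some(f64::NEG_INFINITY),
            _ => {}
        }
    }
    None
}
```

With the default options `.inf` is rejected exactly as today, so JSON conformance is unaffected unless a YAML entry point opts in.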

A note on #[serde(flatten)]: this was a concern (saphyr documents flatten limitations), but it turns out to be a non-issue. buffa's codegen never uses serde's flatten machinery on the deserialize side — every message that has a flattened oneof or extension wrapper also gets a hand-written visit_map Deserialize impl. The derived #[derive(Serialize)] does use #[serde(flatten)], but serialize-side flatten is just FlatMapSerializer collecting keys into the parent map, which every carrier supports. Confirmed in the spike.

API surface (Phase 1)

Free functions mirroring serde_json / serde_norway conventions, with a Message bound for discoverability and future-proofing:

pub fn to_string<M: buffa::Message + serde::Serialize>(msg: &M) -> Result<String, Error>;
pub fn to_writer<W: io::Write, M: buffa::Message + serde::Serialize>(w: W, msg: &M) -> Result<(), Error>;
pub fn from_str<M: buffa::Message + serde::de::DeserializeOwned>(s: &str) -> Result<M, Error>;
pub fn from_slice<M: buffa::Message + serde::de::DeserializeOwned>(b: &[u8]) -> Result<M, Error>;
pub fn from_reader<R: io::Read, M: buffa::Message + serde::de::DeserializeOwned>(r: R) -> Result<M, Error>;

Error wraps the carrier's error type and exposes its Location (line/col) so callers can render diagnostics.
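
One possible shape for that wrapper, sketched with std only. `CarrierError` and `Location` here are hypothetical stand-ins for the carrier's own types (the real crate would wrap the carrier's error rather than define these), and the Display format is just an example.

```rust
use std::fmt;

/// Hypothetical stand-in for the carrier's source position type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Location {
    pub line: usize,
    pub column: usize,
}

/// Hypothetical stand-in for the carrier's error type.
#[derive(Debug)]
pub struct CarrierError {
    pub message: String,
    pub location: Option<Location>,
}

impl fmt::Display for CarrierError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.message)
    }
}

impl std::error::Error for CarrierError {}

/// The crate's error: opaque, so the carrier can change without an API break.
#[derive(Debug)]
pub struct Error {
    inner: CarrierError,
}

impl Error {
    /// Line/column of the failure, when the carrier reported one.
    pub fn location(&self) -> Option<Location> {
        self.inner.location
    }
}

impl From<CarrierError> for Error {
    fn from(inner: CarrierError) -> Self {
        Error { inner }
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.location() {
            Some(loc) => write!(f, "{} at line {} column {}", self.inner, loc.line, loc.column),
            None => write!(f, "{}", self.inner),
        }
    }
}

impl std::error::Error for Error {
    /// Exposes the carrier error to callers that walk the source chain.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        Some(&self.inner)
    }
}
```

Keeping the carrier error private behind `location()` and `source()` is what would let a later carrier swap stay a patch-level change.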

Layout:

buffa-yaml/
  Cargo.toml      # buffa (default-features=false, features=["json","std"]), serde, serde_norway
  src/
    lib.rs        # docs + re-exports
    encode.rs     # to_string, to_writer
    decode.rs     # from_str, from_slice, from_reader
    error.rs      # Error wrapper preserving Location

Documented behavioral deltas vs protoyaml-go

buffa-yaml Phase 1 targets "protobuf-JSON semantics on a YAML carrier," not full protoyaml-go parity. Differences to document up front:

| Capability | protoyaml-go | buffa-yaml Phase 1 |
| --- | --- | --- |
| protojson-equivalent encode/decode | ✅ | ✅ |
| camelCase + snake_case field names | ✅ | ✅ (#[serde(rename, alias)]) |
| Hex/octal int literals (0x1F) | ✅ | ✅ (carrier scalar resolution) |
| Field number as YAML key (13: true) | ✅ | ❌ |
| Reject YAML 1.1 bool aliases (True/TRUE) | ✅ (only true/false) | ❌ (carrier-defined, accepts case variants) |
| Byte-size suffixes (1Ki, 2Gi) | ✅ | ❌ |
| Go-style duration (1h30m) | ✅ | ❌ (protojson "1.5s" only) |
| Line/column in errors | ✅ | ✅ (Location) |
| ^ pointer + snippet rendering | ✅ | ❌ |
| Validation hooks (protovalidate) | ✅ | n/a |
| Encode from view types | n/a | ❌ (blocked on #83) |

The "free wins" from the carrier (hex literals, RFC3339, multi-form aliases, .inf-as-float) get us partway to protoyaml-go's lenience without writing any code.
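
For reference, prefix-based integer resolution of the kind the carrier is assumed to perform can be modeled in a few lines. This is an illustrative std-only sketch with a hypothetical function name, assuming the `0x` and `0o` prefixes; the exact forms the carrier accepts are what the fixture test in the test plan should pin down.

```rust
/// Resolves a plain scalar to an integer, honoring 0x (hex) and 0o (octal)
/// prefixes. Returns None when the scalar is not an integer in this model.
fn resolve_int(scalar: &str) -> Option<i64> {
    // Split off an optional leading sign.
    let (negative, digits) = match scalar.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, scalar.strip_prefix('+').unwrap_or(scalar)),
    };
    // Pick the radix from the prefix.
    let (radix, body) = if let Some(hex) = digits.strip_prefix("0x") {
        (16, hex)
    } else if let Some(oct) = digits.strip_prefix("0o") {
        (8, oct)
    } else {
        (10, digits)
    };
    // from_str_radix accepts its own sign, so reject a second one here.
    if body.is_empty() || body.starts_with(|c| c == '+' || c == '-') {
        return None;
    }
    // Parse wide, then narrow, so i64::MIN is representable.
    let magnitude = i128::from_str_radix(body, radix).ok()?;
    let value = if negative { -magnitude } else { magnitude };
    i64::try_from(value).ok()
}
```

A byte-size literal such as `1Ki` is not an integer under this rule, which is why it stays a Phase 2 concern rather than a free win.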

Test plan

A #[cfg(test)] module driving buffa-test's generated types through round-trips:

  • Every WKT: Timestamp, Duration, Any, Struct/Value/ListValue, FieldMask, Empty, all wrappers.
  • int64 as quoted string round-trip (>2^53 precision check), double NaN/Inf as string token, base64 bytes, multi-line strings.
  • Oneof field naming (serialize-side #[serde(flatten)]).
  • Extension-range messages (serialize-side flatten of the ext wrapper).
  • Maps with string/int/bool keys (carrier coerces YAML int keys to string, buffa's int-keyed map deserializer parses them — confirmed in spike).
  • Repeated/map enum with unknown-value filtering (the with_json_parse_options thread-local path — works carrier-agnostically since it routes through serde_json::Value).
  • A hand-written .yaml fixture exercising YAML-specific scalar resolution surprises (yes/0x1F/~/<<: merge key/!!binary) and asserting the actual carrier behavior, to lock it in as a documented baseline. This is the test most likely to catch a carrier-version regression.
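
As background for the >2^53 check in the int64 bullet above, the following std-only sketch shows why the quoted-string form matters: an int64 routed through a double loses its low bits, while the decimal-string path is exact. The helper names are hypothetical and stand in for what a carrier might do with an unquoted number.

```rust
/// Models a path that stores the integer in an f64 on the way through.
fn through_f64(v: i64) -> i64 {
    (v as f64) as i64
}

/// Models the protobuf-JSON path: int64 travels as a decimal string.
fn through_string(v: i64) -> Option<i64> {
    v.to_string().parse().ok()
}

/// True when the value survives the f64 path unchanged.
fn survives_f64(v: i64) -> bool {
    through_f64(v) == v
}
```

A round-trip test can use any value just above 2^53, such as `(1 << 53) + 1`, as the canary: it comes back changed from the f64 path and unchanged from the string path.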

Out of scope (potential follow-ups)

  • Lenience layer (byte-size suffixes, Go durations, field-number addressing, snippet diagnostics). These need either a YAML→serde_json::Value bridge with a normalization pass, or runtime reflection (#9, Add linked descriptor types for runtime reflection) for type-directed parsing. Realistically a 600–1000 LOC effort. File only if there's actual demand for 1Ki-style literals.
  • accept_yaml_float_literals parse option in buffa core, gated behind JsonParseOptions. Sketch is ready; only needed if the carrier changes to one that string-canonicalizes float specials (see serde-saphyr notes above).
  • musli rearchitecture. A unified mode-parameterized text-format framework would solve the lenience problem at the type level instead of with ambient state, but it's a buffa 2.0-scale project. Tracked separately in #100 (Discussion: musli as buffa's text-format framework).
  • View encoding. Blocked on #83 (View types lack serde::Serialize, forcing to_owned() for JSON).
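
To give a sense of scale for the lenience layer's normalization pass, here is a minimal sketch of one piece, byte-size suffixes. The function name is hypothetical, and the suffix table (Ki/Mi/Gi/Ti as powers of 1024) is an assumption; which suffixes protoyaml-go actually accepts beyond the 1Ki/2Gi examples would need checking.

```rust
/// Normalizes a byte-size literal such as "1Ki" to a plain integer.
/// Returns None for unknown suffixes or on overflow.
fn parse_byte_size(literal: &str) -> Option<u64> {
    // Assumed suffix table: binary multiples expressed as shift amounts.
    const SUFFIXES: [(&str, u32); 4] = [("Ki", 10), ("Mi", 20), ("Gi", 30), ("Ti", 40)];
    for &(suffix, shift) in SUFFIXES.iter() {
        if let Some(number) = literal.strip_suffix(suffix) {
            let n: u64 = number.parse().ok()?;
            return n.checked_mul(1u64 << shift);
        }
    }
    // No suffix: fall back to a plain decimal integer.
    literal.parse().ok()
}
```

The parsing itself is small; the bulk of the estimated effort would be knowing which fields are integer-typed so the pass only rewrites scalars where it is safe, which is where the bridge or reflection requirement comes from.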
