Tracking issue for adding YAML support to buffa, equivalent to what bufbuild/protoyaml-go provides for Go: marshal and unmarshal protobuf messages as YAML, fully compatible with the protobuf-canonical JSON mapping. Useful for hand-authored config files, test fixtures, and anywhere YAML is more readable than JSON.
How protoyaml-go is built (and why buffa is structurally different)
protoyaml-go is asymmetric: encode delegates to protojson (proto → JSON bytes → re-emit as YAML), decode is a hand-rolled *yaml.Node tree walker driven by protoreflect. The decode side is hand-rolled for two reasons: rich diagnostics (file:line:col + source line + ^ pointer), and lenience extensions beyond protojson (hex/octal/binary int literals, byte-size suffixes like 1Ki, Go-style durations like 1h30m, field-number addressing).
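Here is a minimal std-only Rust sketch of that diagnostic style: a file:line:col header, the source line, and a ^ under the column. render_snippet is a hypothetical name and is not code from protoyaml-go or buffa. It assumes 1-based line and column numbers, counts columns in characters, and does not expand tabs.

```rust
/// Renders a protoyaml-go-style diagnostic: a `file:line:col: message`
/// header, the offending source line, and a `^` under the column.
/// `line` and `col` are 1-based; `col` counts characters, not bytes.
/// Returns `None` if `line` is out of range or either position is 0.
pub fn render_snippet(file: &str, src: &str, line: usize, col: usize, msg: &str) -> Option<String> {
    if line == 0 || col == 0 {
        return None;
    }
    let text = src.lines().nth(line - 1)?;
    // Pad with spaces up to the column so the caret lines up
    // (tabs are not expanded in this sketch).
    let pad = " ".repeat(col - 1);
    Some(format!("{file}:{line}:{col}: {msg}\n{text}\n{pad}^"))
}
```

A caller would get line and column from the parser's error and the source text from the file it just read.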
buffa's situation is different. buffa's JSON support is serde-derive-based: the generated Serialize/Deserialize impls and the buffa::json_helpers with-modules encode protobuf-JSON semantics through serde's carrier-agnostic data model. They never call format-specific APIs, except for using serde_json::Value as an opaque scratch type (which any carrier can populate, since Value: Deserialize). This means the equivalent of protoyaml-go's encode and decode paths already exists if the carrier is swapped from serde_json to a YAML serde crate:
What this approach does not give you is protoyaml-go's diagnostic snippets and lenience extensions. Both are explicitly out of scope for Phase 1.
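The carrier-swap argument can be shown in miniature with a std-only stand-in for serde's data model. In this sketch, a message describes itself once against a small Emitter trait, and two toy carriers render that same description as JSON and as YAML. Emitter, JsonEmitter, YamlEmitter, and Config are all hypothetical. In buffa the role of Emitter is played by serde's Serializer, and the carriers would be serde_json and a YAML serde crate.

```rust
/// Minimal stand-in for serde's `Serializer`: a flat map of string fields.
pub trait Emitter {
    fn field_str(&mut self, key: &str, value: &str);
    fn finish(self) -> String;
}

/// A "generated" message whose serialization logic is written once and
/// never names a concrete format.
pub struct Config {
    pub name: String,
    pub max_bytes: i64,
}

impl Config {
    pub fn emit<E: Emitter>(&self, mut out: E) -> String {
        out.field_str("name", &self.name);
        // Protobuf JSON encodes int64 as a string; that rule lives in the
        // message's impl, not in either carrier.
        out.field_str("maxBytes", &self.max_bytes.to_string());
        out.finish()
    }
}

/// Toy JSON carrier (no escaping: assumes plain ASCII values).
#[derive(Default)]
pub struct JsonEmitter(Vec<String>);

impl Emitter for JsonEmitter {
    fn field_str(&mut self, key: &str, value: &str) {
        self.0.push(format!("\"{key}\":\"{value}\""));
    }
    fn finish(self) -> String {
        format!("{{{}}}", self.0.join(","))
    }
}

/// Toy YAML carrier (block mapping, double-quoted scalars, no escaping).
#[derive(Default)]
pub struct YamlEmitter(Vec<String>);

impl Emitter for YamlEmitter {
    fn field_str(&mut self, key: &str, value: &str) {
        self.0.push(format!("{key}: \"{value}\"\n"));
    }
    fn finish(self) -> String {
        self.0.concat()
    }
}
```

Because the protobuf-JSON rules sit in the message's impl rather than in the carrier, swapping the carrier changes only the surface syntax.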
Design: buffa-yaml extension crate
A new published workspace member rather than a feature flag on buffa core:
Decouple YAML-crate churn from buffa's versioning. The carrier crate ecosystem is unstable (serde_yaml deprecated 2024-03, serde_yml is RUSTSEC-2025-0068 unsound + unmaintained, serde_norway/serde_yaml_ng/serde-saphyr are the survivors). If the carrier needs to change, that's a buffa-yaml 0.x.y bump, not a buffa semver event.
Curate the supply chain. Pinning a vetted carrier in a published crate steers users away from the CVE-bearing fork.
Keep buffa core's dependency surface minimal. YAML carriers are std-only; buffa core is no_std-capable.
Room to grow. A future lenience layer (Phase 2) needs a home.
Precedent. The json and text features in buffa core gate runtime infrastructure. YAML doesn't need new runtime infrastructure — it reuses the JSON serde impls. The thing it adds is a carrier dependency and convenience wrappers.
Carrier choice: serde_norway
Empirically validated by round-tripping buffa-test's generated types (json_types, edge_cases non-string maps, ext_json extension-range messages) plus WKTs through both candidates:
serde_norway (0.9.42, maintained fork of serde_yaml, wraps unsafe-libyaml-norway): all round-trips clean, zero buffa changes required. Inherits dtolnay's restricted scalar resolution, so name: no arrives as a string (no Norway problem). Float specials (Infinity, .nan) are delivered as visit_f64, which buffa's float/double helpers already accept. !!binary is passed through as the raw text; buffa's single base64 decode produces the right answer.
serde-saphyr (0.0.26, pure Rust, span diagnostics, forbid(unsafe_code), anti-DoS budgets): three conflicts found, all addressable but each needs work — (a) it canonicalizes float specials to .nan/.inf strings in deserialize_any (designed so serde_json::Value round-trips them), which buffa's visit_str rejects; (b) it pre-decodes !!binary so buffa's visit_str decodes again (double-decode), needs ignore_binary_tag_for_string: true; (c) defaults to YAML 1.1 lenient bools (Norway problem), needs strict_booleans: true. Also 0.0.x with the Options API mid-deprecation.
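The "Norway problem" comes from YAML 1.1's boolean type. It resolves plain scalars such as no, yes, on, and off to booleans. The YAML 1.2 core schema only resolves true and false, in lower, title, and upper case. The std-only sketch below models the two policies. It is a simplification and not the actual resolver of serde_norway or serde-saphyr. Scalar, BoolPolicy, and resolve_plain are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub enum Scalar {
    Bool(bool),
    Str(String),
}

#[derive(Clone, Copy)]
pub enum BoolPolicy {
    /// YAML 1.2 core schema: true/false in three casings only.
    Core12,
    /// YAML 1.1 bool type: additionally y/n, yes/no, on/off.
    Yaml11,
}

/// Resolves an untagged plain scalar to a bool, or leaves it a string.
pub fn resolve_plain(s: &str, policy: BoolPolicy) -> Scalar {
    let core = match s {
        "true" | "True" | "TRUE" => Some(true),
        "false" | "False" | "FALSE" => Some(false),
        _ => None,
    };
    let resolved = match policy {
        BoolPolicy::Core12 => core,
        BoolPolicy::Yaml11 => core.or(match s {
            "y" | "Y" | "yes" | "Yes" | "YES" | "on" | "On" | "ON" => Some(true),
            "n" | "N" | "no" | "No" | "NO" | "off" | "Off" | "OFF" => Some(false),
            _ => None,
        }),
    };
    match resolved {
        Some(b) => Scalar::Bool(b),
        None => Scalar::Str(s.to_string()),
    }
}
```

Under the YAML 1.1 policy, a string field written as `country: NO` would reach the deserializer as a bool and would likely be rejected. That is why serde-saphyr needs strict_booleans: true.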
serde_norway is the safe Phase 1 default. serde-saphyr is the more interesting long-term carrier — its diagnostics and security posture align with what makes protoyaml-go valuable — but switching to it would need (a) a gated JsonParseOptions::accept_yaml_float_literals option in buffa core (so JSON parsing stays strictly conformant; JsonParseOptions is already #[non_exhaustive]), and (b) the Options API to settle. Track as a future revisit.
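A sketch of how the proposed gate could behave. Protobuf JSON spells float specials as the strings "NaN", "Infinity", and "-Infinity". YAML's core schema uses .nan, .inf, and -.inf instead. ParseOptions and parse_float_token are hypothetical stand-ins, not buffa's JsonParseOptions or its float helper. The only point illustrated is that the YAML spellings are rejected unless the flag is set, so strict JSON parsing is unaffected.

```rust
/// Hypothetical stand-in for buffa's `JsonParseOptions`. `#[non_exhaustive]`
/// stops other crates from using struct-literal syntax, so adding a field
/// later is not a breaking change.
#[derive(Clone, Copy, Debug, Default)]
#[non_exhaustive]
pub struct ParseOptions {
    /// Off by default: strict protobuf-JSON conformance.
    pub accept_yaml_float_literals: bool,
}

impl ParseOptions {
    pub fn with_yaml_float_literals(mut self, on: bool) -> Self {
        self.accept_yaml_float_literals = on;
        self
    }
}

/// Parses a float/double that arrived as a string token.
pub fn parse_float_token(s: &str, opts: ParseOptions) -> Result<f64, String> {
    // Canonical protobuf-JSON spellings: always accepted.
    match s {
        "NaN" => return Ok(f64::NAN),
        "Infinity" => return Ok(f64::INFINITY),
        "-Infinity" => return Ok(f64::NEG_INFINITY),
        _ => {}
    }
    // YAML core-schema spellings: only behind the opt-in flag.
    if opts.accept_yaml_float_literals {
        match s {
            ".nan" | ".NaN" | ".NAN" => return Ok(f64::NAN),
            ".inf" | ".Inf" | ".INF" | "+.inf" | "+.Inf" | "+.INF" => return Ok(f64::INFINITY),
            "-.inf" | "-.Inf" | "-.INF" => return Ok(f64::NEG_INFINITY),
            _ => {}
        }
    }
    // Otherwise allow plain decimal/exponent syntax only, which keeps Rust's
    // own "inf"/"nan" spellings out of the strict path.
    let numeric = !s.is_empty()
        && s.chars().all(|c| c.is_ascii_digit() || matches!(c, '+' | '-' | '.' | 'e' | 'E'));
    if !numeric {
        return Err(format!("invalid float token {s:?}"));
    }
    s.parse::<f64>().map_err(|e| format!("invalid float token {s:?}: {e}"))
}
```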
A note on #[serde(flatten)]: this was a concern (saphyr documents flatten limitations), but it turns out to be a non-issue. buffa's codegen never uses serde's flatten machinery on the deserialize side — every message that has a flattened oneof or extension wrapper also gets a hand-written visit_map Deserialize impl. The derived #[derive(Serialize)] does use #[serde(flatten)], but serialize-side flatten is just FlatMapSerializer collecting keys into the parent map, which every carrier supports. Confirmed in the spike.
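The deserialize side described above can be pictured as a single pass over the map's entries. Each key is routed directly to a regular field or to the oneof, with no buffered flatten map in between. The std-only sketch below works over already-parsed (key, value) string pairs. Msg, Choice, from_entries, and the field names are hypothetical. The sketch accepts both the JSON name and the proto name for a oneof member, and it rejects unknown keys and a second oneof member.

```rust
#[derive(Debug, PartialEq)]
pub enum Choice {
    Email(String),
    UserId(String),
}

#[derive(Debug, PartialEq, Default)]
pub struct Msg {
    pub name: String,
    pub choice: Option<Choice>,
}

/// Single-pass analogue of a hand-written `visit_map`: each key goes
/// straight to its destination.
pub fn from_entries<'a, I>(entries: I) -> Result<Msg, String>
where
    I: IntoIterator<Item = (&'a str, &'a str)>,
{
    let mut msg = Msg::default();
    for (key, value) in entries {
        let arm = match key {
            "name" => {
                msg.name = value.to_string();
                continue;
            }
            // Oneof members: accept both the JSON name and the proto name.
            "email" => Choice::Email(value.to_string()),
            "userId" | "user_id" => Choice::UserId(value.to_string()),
            _ => return Err(format!("unknown field {key:?}")),
        };
        if msg.choice.is_some() {
            return Err(format!("multiple oneof members set (at {key:?})"));
        }
        msg.choice = Some(arm);
    }
    Ok(msg)
}
```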
API surface (Phase 1)
Free functions mirroring serde_json / serde_norway conventions, with a Message bound for discoverability and future-proofing:
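Here is a hypothetical, std-only sketch of the shape such free functions could take. It has to_string and from_str functions that are generic over a message type, with a Message bound next to the codec bound, plus an error type that keeps a line/column Location. Message and YamlCodec are stand-ins. The real bounds would presumably be buffa::Message plus serde's Serialize / DeserializeOwned. Port is a toy message with a hand-written codec so the sketch can run without any crates.

```rust
use std::fmt;

/// Stand-in for `buffa::Message` (assumption: the real trait has methods).
pub trait Message {}

/// Stand-in for serde's Serialize + DeserializeOwned on generated types.
pub trait YamlCodec: Sized {
    fn encode(&self) -> String;
    fn decode(s: &str) -> Result<Self, Error>;
}

/// 1-based position of a failure in the YAML input.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Location {
    pub line: usize,
    pub column: usize,
}

/// Hypothetical error type that keeps the carrier's position, if any.
#[derive(Debug)]
pub struct Error {
    msg: String,
    location: Option<Location>,
}

impl Error {
    pub fn location(&self) -> Option<Location> {
        self.location
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.location {
            Some(l) => write!(f, "{}:{}: {}", l.line, l.column, self.msg),
            None => f.write_str(&self.msg),
        }
    }
}

impl std::error::Error for Error {}

/// Hypothetical `to_string`: the `Message` bound scopes it to protobuf types.
pub fn to_string<M: Message + YamlCodec>(msg: &M) -> Result<String, Error> {
    Ok(msg.encode())
}

/// Hypothetical `from_str`, mirroring the shape of `serde_json::from_str`.
pub fn from_str<M: Message + YamlCodec>(s: &str) -> Result<M, Error> {
    M::decode(s)
}

/// Toy single-field message used only to exercise the functions above.
#[derive(Debug, PartialEq)]
pub struct Port {
    pub port: i32,
}

impl Message for Port {}

impl YamlCodec for Port {
    fn encode(&self) -> String {
        format!("port: {}\n", self.port)
    }
    fn decode(s: &str) -> Result<Self, Error> {
        const KEY: &str = "port: ";
        for (i, line) in s.lines().enumerate() {
            if let Some(v) = line.strip_prefix(KEY) {
                return v.trim().parse().map(|port| Port { port }).map_err(|_| Error {
                    msg: format!("invalid int32 value {v:?}"),
                    location: Some(Location { line: i + 1, column: KEY.len() + 1 }),
                });
            }
        }
        Err(Error { msg: "missing field port".to_string(), location: None })
    }
}
```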
The "free wins" from the carrier (hex literals, RFC3339, multi-form aliases, .inf-as-float) get us partway to protoyaml-go's lenience without writing any code.
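Hex literals come for free because the carrier resolves a plain scalar like 0x1F to an integer before buffa sees it, so buffa's integer helpers receive an ordinary number. The sketch below is a simplified, std-only model of prefix-based integer resolution: an optional sign, then decimal, 0x, 0o, or 0b. It is not serde_norway's actual resolver, and it does not model digit separators or other YAML 1.1 extras. resolve_int is a hypothetical name.

```rust
/// Simplified model of YAML plain-scalar integer resolution.
/// Returns `None` for anything else (the scalar would stay a string).
pub fn resolve_int(s: &str) -> Option<i64> {
    let (negative, body) = match s.as_bytes().first() {
        Some(b'-') => (true, &s[1..]),
        Some(b'+') => (false, &s[1..]),
        _ => (false, s),
    };
    let (radix, digits) = if let Some(d) = body.strip_prefix("0x") {
        (16, d)
    } else if let Some(d) = body.strip_prefix("0o") {
        (8, d)
    } else if let Some(d) = body.strip_prefix("0b") {
        (2, d)
    } else {
        (10, body)
    };
    // u64::from_str_radix accepts a leading '+', so "0x+5" would slip
    // through; reject any sign after the prefix.
    if digits.is_empty() || digits.starts_with(|c: char| c == '+' || c == '-') {
        return None;
    }
    let magnitude = u64::from_str_radix(digits, radix).ok()?;
    if negative {
        if magnitude == 1u64 << 63 {
            Some(i64::MIN)
        } else {
            i64::try_from(magnitude).ok().map(|v| -v)
        }
    } else {
        i64::try_from(magnitude).ok()
    }
}
```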
Test plan
A #[cfg(test)] module driving buffa-test's generated types through round-trips:
Every WKT: Timestamp, Duration, Any, Struct/Value/ListValue, FieldMask, Empty, all wrappers.
int64 as quoted string round-trip (>2^53 precision check), double NaN/Inf as string token, base64 bytes, multi-line strings.
Oneof field naming (serialize-side #[serde(flatten)]).
Extension-range messages (serialize-side flatten of the ext wrapper).
Maps with string/int/bool keys (carrier coerces YAML int keys to string, buffa's int-keyed map deserializer parses them — confirmed in spike).
Repeated/map enum with unknown-value filtering (the with_json_parse_options thread-local path — works carrier-agnostically since it routes through serde_json::Value).
A hand-written .yaml fixture exercising YAML-specific scalar resolution surprises (yes/0x1F/~/<<: merge key/!!binary) and asserting the actual carrier behavior, to lock it in as a documented baseline. This is the test most likely to catch a carrier-version regression.
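Two of the checks above rest on simple facts that can be shown without any carrier. First, int64 values above 2^53 do not survive a trip through an f64, which is why protobuf JSON encodes them as quoted strings. Second, map keys arrive as strings that an int-keyed map deserializer must parse back. The std-only sketch below demonstrates both. survives_f64 and parse_int_keys are hypothetical helpers, not buffa's deserializer.

```rust
use std::collections::BTreeMap;

/// 2^53 + 1: the smallest positive integer an f64 cannot represent exactly.
pub const JUST_ABOVE_2_POW_53: i64 = (1i64 << 53) + 1;

/// True if `v` comes back unchanged from an i64 -> f64 -> i64 round trip.
pub fn survives_f64(v: i64) -> bool {
    (v as f64) as i64 == v
}

/// Parses string map keys back into int32 keys, as protobuf JSON requires
/// for integer-keyed maps (keys are always strings on the wire).
pub fn parse_int_keys(entries: &[(&str, &str)]) -> Result<BTreeMap<i32, String>, String> {
    entries
        .iter()
        .map(|(k, v)| {
            k.parse::<i32>()
                .map(|key| (key, v.to_string()))
                .map_err(|_| format!("invalid int32 map key: {k:?}"))
        })
        .collect()
}
```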
Out of scope (potential follow-ups)
Lenience layer (byte-size suffixes, Go durations, field-number addressing, snippet diagnostics). These need either a YAML→serde_json::Value bridge with a normalization pass, or runtime reflection (Add linked descriptor types for runtime reflection #9) for type-directed parsing. Realistically a 600–1000 LOC effort. File only if there's actual demand for 1Ki-style literals.
accept_yaml_float_literals parse option in buffa core, gated behind JsonParseOptions. Sketch is ready; only needed if the carrier changes to one that string-canonicalizes float specials (see serde-saphyr notes above).
musli rearchitecture. A unified mode-parameterized text-format framework would solve the lenience problem at the type level instead of with ambient state, but it's a buffa 2.0-scale project. Tracked separately in Discussion: musli as buffa's text-format framework #100.
… (View types lack serde::Serialize).
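The sketch below illustrates the "normalization pass" option only roughly and is not a committed design. It rewrites two protoyaml-go-style spellings into forms protobuf JSON already accepts. IEC byte-size suffixes such as 1Ki become plain integers, and Go-style durations such as 1h30m become the canonical seconds string "5400s". The function names are hypothetical. Only a subset of each syntax is handled: whole numbers, Ki/Mi/Gi/Ti suffixes, and h/m/s units.

```rust
/// Expands an IEC byte-size literal ("1Ki", "2Gi") to a byte count.
pub fn parse_byte_size(s: &str) -> Option<u64> {
    const SUFFIXES: [(&str, u32); 4] = [("Ki", 10), ("Mi", 20), ("Gi", 30), ("Ti", 40)];
    for (suffix, shift) in SUFFIXES {
        if let Some(num) = s.strip_suffix(suffix) {
            let n: u64 = num.parse().ok()?;
            return n.checked_mul(1u64 << shift);
        }
    }
    None
}

/// Converts a whole-second Go-style duration ("1h30m", "45s") into the
/// protobuf-JSON Duration form ("5400s"). Fractions and ms/us/ns are rejected.
pub fn go_duration_to_json(s: &str) -> Option<String> {
    let mut total: u64 = 0;
    let mut digits = String::new();
    for c in s.chars() {
        if c.is_ascii_digit() {
            digits.push(c);
            continue;
        }
        let unit: u64 = match c {
            'h' => 3600,
            'm' => 60,
            's' => 1,
            _ => return None,
        };
        if digits.is_empty() {
            return None;
        }
        let n: u64 = digits.parse().ok()?;
        total = total.checked_add(n.checked_mul(unit)?)?;
        digits.clear();
    }
    // Trailing digits without a unit (e.g. "90") and empty input are rejected.
    if !digits.is_empty() || s.is_empty() {
        return None;
    }
    Some(format!("{total}s"))
}
```

Applied blindly to every scalar, such a pass would also rewrite a string field whose value happens to be 1Ki. That is why the lenience item above points to runtime reflection for type-directed parsing.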
Documented behavioral deltas vs protoyaml-go
buffa-yaml Phase 1 targets "protobuf-JSON semantics on a YAML carrier," not full protoyaml-go parity. Differences to document up front:
[Comparison table: protoyaml-go vs buffa-yaml Phase 1. Rows: field names (#[serde(rename, alias)]); hex int literals (0x1F); field-number addressing (13: true); boolean spellings (True/TRUE, true/false); byte-size suffixes (1Ki, 2Gi); Go-style durations (1h30m), where buffa-yaml accepts "1.5s" only; line/col diagnostics (Location); ^ pointer + snippet rendering; validation (protovalidate).]
In buffa-yaml, Error wraps the carrier's error type and exposes its Location (line/col) so callers can render diagnostics.
References
bufbuild/protoyaml-go: the prior art; encode delegates to protojson, decode is a hand-rolled *yaml.Node walker for diagnostics + lenience.
serde_norway: chosen carrier.
serde-saphyr: future carrier candidate; rich diagnostics, forbid(unsafe_code), anti-DoS budgets, but 0.0.x and needs the JsonParseOptions option to round-trip float specials.
… serde_yml must not be the carrier.
… serde::Serialize (blocks view encoding).