
feat: opt-in non-Candid wire format via encoder / decoder parenthetical #5996

Draft
ggreif wants to merge 50 commits into master from gabor/encoder


ggreif commented Apr 9, 2026

Summary

Per-public-method, opt-in escape hatch from Candid in both directions:

persistent actor {
  (with encoder = <T -> Blob> ;
        decoder = <Blob -> A>)
  public func go(_ : A) : async T = …
}

  • encoder : T -> Blob replaces Candid serialization on reply.
  • decoder : Blob -> A replaces Candid deserialization on ingress.

Both fields are independently optional; either, both, or neither can appear, and unrecognised fields warn (M0212). The strategic driver is direct OpenAPI/Web2 interfacing — see .claude/plans/non-candid.md for the design memo, including the actor-level inheritance idea and the motivation. (The OpenAPI integration is its own follow-on.)

Implementation

Frontend (src/mo_frontend/typing.ml)

  • check_vis_parenthetical does per-field bidirectional checking against a known-fields table [("encoder", T -> Blob); ("decoder", Blob -> A)]. Encoder type is driven by the method's return type, decoder by its ingress type. Direction (method → codec) is documented in a comment alongside the alternative inverse direction (codec types driving the method signature) for future exploration.
  • M0214 type-mismatch (per-field), M0215 effect-free check named per field, M0211 redundant-empty parenthetical, M0212 unrecognised attribute.
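
The per-field check against the known-fields table can be pictured with a toy model. This is a Python sketch, not the OCaml in typing.ml: types are plain strings, `check_parenthetical` and `expected_types` are hypothetical names, and only the diagnostic codes come from the description above. The real checker is bidirectional and may report a finer code (the mismatch tests below get M0095 rather than M0214).

```python
def expected_types(arg_typ: str, ret_typ: str) -> dict:
    """Build the known-fields table from the method's signature."""
    return {
        "encoder": f"{ret_typ} -> Blob",  # driven by the return type
        "decoder": f"Blob -> {arg_typ}",  # driven by the ingress type
    }


def check_parenthetical(fields: dict, arg_typ: str, ret_typ: str) -> list:
    """Return (code, label) diagnostics for one parenthetical.

    `fields` maps each label written by the user to its (string) type.
    """
    if not fields:
        return [("M0211", None)]  # redundant empty parenthetical
    table = expected_types(arg_typ, ret_typ)
    diags = []
    for label, typ in fields.items():
        if label not in table:
            diags.append(("M0212", label))  # unrecognised attribute
        elif typ != table[label]:
            diags.append(("M0214", label))  # per-field type mismatch
    return diags
```

Each field is checked independently, so a parenthetical carrying only a decoder, only an encoder, or both goes through the same loop.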

IR (src/ir_def/ir.ml)

  • FuncE's trailing exp option (* encoder *) replaced by a labeled record codecs = { encoder : exp option; decoder : exp option }. no_codecs helper for the common no-annotation case. Future codec-shaped fields (e.g. inbound-cycles caps, schema pins) land additively.
  • check_ir independently type-checks both fields.

IR passes

  • rename, subst_var, freevars, await, async, const, tailcall, erase_typ_field, eq, show — pass-through, retyped from exp option to codecs.
  • arrange_ir.ml prints both fields side by side.

Desugaring (src/lowering/desugar.ml)

  • find_codec_in_par factored out from the existing find_encoder_in_par.
  • New find_decoder_in_par mirror.
  • build_codecs (replacing build_encoders) returns (enc_opt, dec_opt) list.
  • build_actor stamps both fields onto each public method's FuncE.codecs, preserving any pre-existing per-field override.
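
A small Python model of the codecs record and of the stamping step may help when reading the desugaring. It assumes "preserving any pre-existing per-field override" means that a field already set on the FuncE wins over the value found in the parenthetical. `Codecs`, `NO_CODECS` and `stamp` are illustrative names that mirror, but are not, the OCaml definitions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Codecs:
    """Model of the labeled record carried by FuncE."""
    encoder: Optional[Callable] = None
    decoder: Optional[Callable] = None


# Counterpart of the no_codecs helper: the common no-annotation case.
NO_CODECS = Codecs()


def stamp(existing: Codecs, enc_opt: Optional[Callable],
          dec_opt: Optional[Callable]) -> Codecs:
    """Fill each empty field from the parenthetical; keep existing overrides."""
    return Codecs(
        encoder=existing.encoder if existing.encoder is not None else enc_opt,
        decoder=existing.decoder if existing.decoder is not None else dec_opt,
    )
```

Because the two fields are resolved separately, a method can keep an overridden encoder while still picking up a decoder from the parenthetical.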

Async lowering (src/ir_passes/async.ml)

  • The CPS transform consults codecs.encoder when synthesising the reply continuation, lifting it into ICReplyPrim(ts, Some enc). Decoder unaffected here — it lives only on FuncE.

Codegen (src/codegen/compile_classical.ml & compile_enhanced.ml)

  • Encoder (existing): ICReplyPrim branches on enc_opt: Some → call the closure, then IC.reply_with_data; None → Serialization.serialize.
  • Decoder (new): FuncDec.{lit, closed, compile_const_message} gain a ?(decoder=None) thunk parameter of type (E.t -> VarEnv.t -> G.t) option. The thunk is constructed at the FuncE call site (where compile_exp_vanilla is in scope), wrapping the decoder expression. Inside compile_const_message, branch at the argument-decoding step: None → Serialization.deserialize; Some compile_dec → compile_dec env ae0, then a closure-call on the raw IC.arg_data.
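
The two branches can be modelled at run time as follows. This is a Python sketch of the dispatch only; the `candid_*` functions are deliberately fake stand-ins (a `CANDID:` prefix plus decimal digits), not the Candid wire format, and `handle_message` is a hypothetical name.

```python
from typing import Callable, Optional


def candid_deserialize(raw: bytes) -> int:
    """Stand-in for Serialization.deserialize; NOT real Candid."""
    if not raw.startswith(b"CANDID:"):
        raise ValueError("malformed input")
    return int(raw[len(b"CANDID:"):])


def candid_serialize(value: int) -> bytes:
    """Stand-in for Serialization.serialize; NOT real Candid."""
    return b"CANDID:" + str(value).encode("ascii")


def handle_message(raw: bytes,
                   body: Callable[[int], int],
                   decoder: Optional[Callable[[bytes], int]] = None,
                   encoder: Optional[Callable[[int], bytes]] = None) -> bytes:
    """Run one public method call, choosing each codec leg independently."""
    # Ingress: no decoder keeps the default path, otherwise raw bytes go to the decoder.
    arg = candid_deserialize(raw) if decoder is None else decoder(raw)
    result = body(arg)
    # Egress: no encoder keeps the default path, otherwise the encoder's bytes are the reply.
    return candid_serialize(result) if encoder is None else encoder(result)
```

With neither codec set, the call behaves exactly as before; setting only one of them changes only that direction.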

AST interpreter (src/mo_interpreter/interpret.ml)

  • Codec parentheticals are intentionally not visited (the AST interpreter doesn't model Candid serialization). One-line comment surfaces this for reviewers.

Tests

Run-drun:

  • parenthetical-public.mo — encoder, returns (), all phases.
  • parenthetical-decoder.mo — full pipeline. Method's ingress is ?Nat; decoder is the flow Blob -> ?Text -> ?Nat composed via decodeUtf8 and Nat.fromText. The //CALL payload is the raw three ASCII bytes "123" (0x313233) — not a Candid envelope. With the decoder active that blob deserialises as ?123, and the reply is Candid Nat 123 (0x4449444c00017d7b). Without the decoder Candid would reject 0x313233 as malformed input — so the green test is end-to-end proof that ingress Candid is bypassed.
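
The byte-level claim in the decoder test can be reproduced outside the canister. The Python sketch below models the Blob -> ?Text -> ?Nat flow and builds the Candid reply for a single nat by hand. `decode_utf8` and `nat_from_text` only approximate the Motoko functions of similar name (their exact acceptance rules are an assumption here), and `candid_nat_reply` covers just this one-value case.

```python
from typing import Optional


def decode_utf8(blob: bytes) -> Optional[str]:
    """Blob -> ?Text: None on invalid UTF-8."""
    try:
        return blob.decode("utf-8")
    except UnicodeDecodeError:
        return None


def nat_from_text(text: str) -> Optional[int]:
    """Text -> ?Nat: accepts plain ASCII decimal digits only (assumption)."""
    return int(text) if text.isascii() and text.isdigit() else None


def decoder(blob: bytes) -> Optional[int]:
    """Composition of the two steps, like the `do ?` block in the test."""
    text = decode_utf8(blob)
    return None if text is None else nat_from_text(text)


def leb128(n: int) -> bytes:
    """Unsigned LEB128, as used by Candid for nat values."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def candid_nat_reply(n: int) -> bytes:
    """'DIDL' magic, empty type table (0x00), one value (0x01) of type nat (0x7d)."""
    return b"DIDL" + b"\x00\x01\x7d" + leb128(n)
```

Feeding `bytes.fromhex("313233")` to `decoder` yields 123, and `candid_nat_reply(123).hex()` is `4449444c00017d7b`, matching the payload and reply quoted above.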

Fail (matched pairs, encoder ↔ decoder):

  • parenthetical-{encoder,decoder}-effect.mo — M0215 effect-free check fires.
  • parenthetical-{encoder,decoder}-mismatch.mo — M0095 (finer than field-level M0214) on a wrong codec signature.

Test plan

  • make -C test/run-drun parenthetical-public.only passes
  • make -C test/run-drun parenthetical-decoder.only passes
  • make -C test/fail includes the four new fail tests with stable output

🤖 Generated with Claude Code

@ggreif ggreif changed the title feat: parenthetical encoder annotation for public actor methods feat: parenthetical encoder annotation for public actor methods Apr 9, 2026
@ggreif ggreif self-assigned this Apr 9, 2026
@ggreif ggreif added the feature New feature or request label Apr 9, 2026
ggreif and others added 6 commits April 17, 2026 12:52
Adds syntax `(with encoder = <func>) public func name() : async T = ...`
to annotate the serialization encoder for an actor method's return value.

- `vis'` type gains `exp option` (the annotation) alongside the deprecation string;
  moved into the `exp` mutual recursion group to resolve the forward-reference
- Parser: `vis → parenthetical PUBLIC` with `%prec VIS_NO_PAREN` to resolve
  the 3 shift/reduce conflicts from the nullable vis prefix; `--strict` preserved
- Type checker: `check_vis_parenthetical` validates the encoder in checking mode
  against the expected type `ret_typ → Blob` (async peeled via `Promises`/`ts2`);
  non-method placements warn with M0212; effects forbidden (M0215)
- Test: `test/run-drun/parenthetical-public.mo`

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…freedom

IR representation:
- `FuncE` gains an 8th `exp option` field for the encoder (None everywhere
  except public actor methods with a `(with encoder = …)` annotation)
- All IR passes, rename, subst_var, arrange, freevars, interpreter, and both
  codegens updated: pass-through or wildcard `_enc` as appropriate

Desugaring:
- `build_encoders` mirrors `build_stabs` to correctly pair each IR dec with
  its optional encoder across IncludeD/TypD expansion
- In `build_actor` the encoder expression is extracted from the vis
  parenthetical's `encoder` field and injected into the IR `FuncE` slot

Checking:
- IR type-checker (`check_ir.ml`): verifies encoder type is `ret_typ → Blob`
  and effect is `T.Triv`
- Source type-checker (`typing.ml`): `check_vis_parenthetical` additionally
  guards `note_eff = T.Triv`, emitting M0215 if the annotation has effects

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Refactors `ICReplyPrim` to carry an `exp option` encoder slot instead
of using a mutable `reply_encoder` field on the compilation environment.
The encoder expression (user-supplied `T -> Blob` function) is injected
by `desugar.ml`'s `build_actor` into the `FuncE` IR node, propagated
through all IR passes (`erase_typ_field`, `show`, `eq`, `await`, `async`),
and consumed by `async.ml`'s CPS transform to build the reply continuation
`k` as `ICReplyPrim (ts, Some enc')` instead of the default Candid path.

Both classical and enhanced backends call the encoder closure and send
the raw blob bytes via `IC.reply_with_data`, bypassing Candid serialization.
`arrange_ir` and `check_ir` are updated to print/validate the encoder.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ggreif ggreif mentioned this pull request Apr 23, 2026
29 tasks
ggreif and others added 8 commits April 25, 2026 12:55
- `check_vis_parenthetical` rewritten to do per-field bidirectional
  checking against a known-fields table. `encoder` keeps its type
  `ret_typ -> Blob`; `decode` is added with the symmetric type
  `Blob -> arg_typ` (the method's ingress type). No desugaring yet —
  the field is plumbed through typing only, future work will tighten
  semantics or invert the direction (codec types driving method sig).
- Effect-free check is now per-field and names the offending label
  in M0215, replacing the encoder-only message.
- Positive test (`parenthetical-decode.mo`): a `Blob -> ?Nat` flow
  pipeline composing `decodeUtf8` (Blob -> ?Text) with `Nat.fromText`
  (Text -> ?Nat) via `do ?` ; method returns Nat over standard Candid.
- Fail tests (`parenthetical-{encoder,decode}-effect.mo`): a parenthetical
  field with embedded async block triggers M0215.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rejects a `Blob -> ?Text` decoder when the method's ingress is `?Nat`
— bidirectional check pushes the expected return into the FuncE body
and catches the mismatch at M0095 (finer than the field-level M0214).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirror of parenthetical-decode-mismatch on the encoder side: a
`Nat -> Text` encoder for a method returning `Nat` is rejected at
M0095, pointed precisely at the `"oops"` literal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The AST interpreter never visits `encoder`/`decode` payloads on
`Public(_, Some par)`: it models the high-level `[run]` semantics
where Candid serialization isn't modelled, so any wire-byte transform
is moot. Comment surfaces this so reviewers don't wonder why the
parenthetical is invisible here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the ad-hoc `exp option (* encoder *)` trailing field on
`FuncE` with a labeled record `{ encoder; decoder }`. The encoder
behaviour is unchanged; `decoder` is always `None` until desugaring
lands. Pure rename/repackage: no behaviour change, all tests pass.

Threading sites updated:
- ir.ml: type def + record + doc note (incl. actor-level inheritance
  TODO)
- construct.ml: `no_codecs` helper, three FuncE constructors
- check_ir.ml: type-check the decoder field on the same footing as
  encoder (Blob -> seq ts1, effect-free)
- arrange_ir.ml: print decoder alongside encoder
- desugar.ml: build_actor still installs encoder; decoder stays None
- async.ml, await.ml, erase_typ_field.ml: thread codecs through
- rename/subst_var/tailcall/freevars/const/eq/show/interpret_ir/
  compile_classical/compile_enhanced: pass-through pattern bindings
  silently retyped from `exp option` to `codecs`, no source change

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the encoder/decoder parenthetical work — current state,
pending desugaring + codegen, actor-level inheritance idea, OpenAPI/
Web2 motivation as the strategic driver. Replaces an in-code TODO in
ir.ml with a pointer to the plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Desugaring (`desugar.ml`):
- Factor `find_codec_in_par lab` from the existing `find_encoder_in_par`.
- New `find_decode_in_par` mirror.
- `build_codecs` (replacing `build_encoders`) returns `(enc_opt,
  dec_opt) list`, one per dec-field.
- `build_actor` now stamps both fields onto the FuncE's `codecs`
  record, preserving any pre-existing per-field override.

Codegen (`compile_classical.ml` & `compile_enhanced.ml`):
- New `?(decoder=None)` optional argument threaded through
  `FuncDec.{lit,closed,compile_const_message}`. Type:
  `(E.t -> VarEnv.t -> G.t) option`, a thunk that compiles the
  decoder `exp` with the inner env/ae of the message handler.
  Caller (in `compile_exp` / `compile_const_exp`) wraps
  `compile_exp_vanilla` so the closure is generated where the
  function is actually in scope.
- Inside `compile_const_message`, branch on `decoder` at the
  argument-decoding step: `None` keeps `Serialization.deserialize`;
  `Some compile_dec` emits `compile_dec; closure-call` on the raw
  `IC.arg_data` instead.

Test (`parenthetical-decode.mo` & `.ok` regen):
- Method ingress is `?Nat`; decoder is `Blob -> ?Nat` built as
  `do ? { Nat.fromText((decodeUtf8 b)!)! }`.
- //CALL payload is the raw three ASCII bytes "123" (0x313233),
  *not* a Candid envelope. The decoder turns it into `?123`; the
  method echoes `123` back via standard Candid (`0x...017d7b`).
- Demonstrates that ingress Candid is genuinely bypassed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ggreif ggreif changed the title feat: parenthetical encoder annotation for public actor methods feat: opt-in non-Candid wire format via encoder / decode parenthetical Apr 25, 2026
ggreif and others added 3 commits April 25, 2026 14:51
The codec-annotation commit (f9b629c) widened `Public`'s payload
from `None` to `(None, None)`, taking `extract.ml:84` over the
line-length limit. Splitting the trailing `@@ d.at)` onto its own
line restores ocamlformat conformance and unblocks CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Methods that opt out of Candid via `encoder`/`decode` shouldn't
appear in the Candid interface surfaces:

- the `__get_candid_interface_tmpl_v1` canister metadata blob the
  IC serves to Candid-aware tooling, and
- the `.did` file produced by `moc --idl`.

Candid-only clients (other canisters, `dfx call`, `didc`) would
otherwise Candid-encode arguments and Candid-decode replies that
the canister never produces. Cleanest first cut is to suppress the
whole method when either codec is set; partial entries (e.g.
decoder-only methods keeping their Candid-shaped reply in the
dictionary) can come later.

The `Type.t` for an actor is currently codec-blind, so we'll
either need a side-table mapping method-name → codec-presence, or
to filter at the IR level before type extraction. Side-table keeps
`Type.t` clean.
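
The side-table option could look roughly like this. It is a Python sketch of the proposed filtering only: `candid_interface` and the `(has_encoder, has_decoder)` pair shape are hypothetical, and nothing here reflects code that exists in the compiler.

```python
def candid_interface(methods: dict, codec_presence: dict) -> dict:
    """Drop every method that has an encoder or a decoder set.

    `methods` maps method name -> Candid signature text.
    `codec_presence` is the side-table: name -> (has_encoder, has_decoder).
    Methods missing from the side-table are treated as plain Candid methods.
    """
    return {
        name: sig
        for name, sig in methods.items()
        if not any(codec_presence.get(name, (False, False)))
    }
```

Keeping the presence information outside the method types means the type representation itself does not need to learn about codecs, which is the trade-off the commit message argues for.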

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The two pending items "decoder desugaring" and "codegen hook" both
landed in 07b02b8. Section heading updated, content rewritten to
describe what shipped (incl. the thunk parameter through FuncDec
and the `0x313233 -> Nat 123` end-to-end test). Removed the two
items from "Pending work" and renumbered the rest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ggreif ggreif changed the title feat: opt-in non-Candid wire format via encoder / decode parenthetical experiment: opt-in non-Candid wire format via encoder / decode parenthetical Apr 25, 2026
Symmetry with `encoder`. Touch list:
- typing.ml: known-fields table flips the label string.
- desugar.ml: `find_decode_in_par` → `find_decoder_in_par`,
  `find_codec_in_par "decoder"` lookup.
- Three test files renamed (`parenthetical-decode*.mo` →
  `-decoder*`); .ok files regenerated via `accept` to pick up the
  new field name in the rendered diagnostics and the new test-name
  prefix in the location prefix.
- Plan doc rewritten throughout to use `decoder`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ggreif ggreif changed the title experiment: opt-in non-Candid wire format via encoder / decode parenthetical feat: opt-in non-Candid wire format via encoder / decoder parenthetical Apr 25, 2026
ggreif and others added 4 commits April 26, 2026 15:38
…d-trip

First step toward a benchmark that decodes a real Apple Event Object
Specifier (a non-Candid wire format) on ingress, identity-passes it
through the actor body, and re-encodes it on egress — counting cycles
and heap delta on each codec leg.

This commit lays the foundation:

- Adds the `query.md` design plan (AEOM-inspired heap querying) to
  the tracked plans, since the bench architecture follows it.
- Scaffolds `test/bench/object-spec.mo` with the full type surface
  from that plan — `ObjectSpec`, `KeyForm`, `BoolExpr`, `Comparison`,
  `CandidValue` — and a builder for the running example query
  ("every client's yearly income whose country is Germany and age
  between 45 and 55 years").
- Codec stubs (`encode`, `decode`) plus a harness that already wires
  up `payload_bytes` / `decode_{heap,cycles}` / `encode_{heap,cycles}`
  reporting, so the schema is stable as the codec fills in.

The roadmap (sketched in the file's header comment): AE binary samples
generated via macOS `osarun`/`osacompile` → Motoko AE decoder → matching
encoder → wired into a `(with decoder = …; encoder = …)` parenthetical
on a public actor method whose body is the identity. Each step is a
separate commit and the harness already reports the right keys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two changes from review:

1. Rename `encode` → `encoder`, `decode` → `decoder` so the function
   names match the parenthetical-field names they'll eventually be
   wired into (`(with encoder = …; decoder = …)`).
2. Move the cycle/heap measurement *inside* `encoder` and `decoder`
   themselves, parameterised by a `stage` label. Each call now
   self-reports — including the previously-untimed pre-encode that
   builds the wire fixture. When v4 lands a real encoder, all three
   legs (pre / ingress / egress) will yield separate cost lines under
   `encoder/pre`, `decoder/ingress`, `encoder/egress`.

`go` now does no measurement of its own; it just sequences the three
calls. Output schema gains a `stage` key so a parser can attribute
costs unambiguously.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an `appscript-src` flake input pointing at hhas/appscript on GitHub
and a `nix/ae-encoder.nix` derivation that builds the Python `appscript`
package from its `py-appscript/` subdirectory and exposes a small CLI
script via `pkgs.writeShellApplication`.

The intent is to provide a reproducible AE compact-binary fixture
generator for `test/bench/object-spec.mo` without checking any Python
code into the motoko source tree — the harness body lives as a string
literal in `nix/ae-encoder.nix`, not as a file in the repo.

The derivation is darwin-only: `appscript` links against
`AEvent.framework` and uses PyObjC, neither of which has working
Linux builds. `flake.nix` exposes `packages.<system>.ae-encoder` only
on `aarch64-darwin` / `x86_64-darwin` via `lib.optionalAttrs`; `nix
flake show` on Linux silently omits the attribute.

Build fix-ups for upstream's setuptools incompatibility:
- `lib/appscript/__init__.py` declares `__version__ = 'dev'` which
  modern setuptools (PEP 440) rejects; `postPatch` substitutes a
  conventional placeholder version `1.3.0`.
- `doCheck = false` since upstream ships no usable Python test suite.

v1 harness is a smoke-test only — `nix run .#ae-encoder` confirms the
appscript/aem stack imports cleanly. Subsequent commits grow the
harness into a real fixture generator that takes a query name and
prints AE-binary hex for the bench to embed as `Blob` literals.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the v1 smoke-test stub with a real fixture generator. The
harness now constructs the German-clients query

  every client's yearly income whose country is Germany and 45 <= age <= 55

via aem's reference builder (`app.elements('clnt').byfilter(...)`),
packs it through `AEM_packself`, and prints the flattened compact-
binary form as `<name>=<hex>` on stdout. The hex output matches the
spec verbatim — `obj `, `want`, `clnt`, `cmpd`, `logi`, `AND `,
`>=  `, `Germany` (UTF-16), all visible in the bytes.

Catalogue is a single entry today (`german_midlife_client_income`);
new queries register by adding a builder function and a `QUERIES`
entry — no other surface area to touch.

Subsequent commits will:
- pipe the hex output into the bench's `Blob` literals (manually for
  now; possibly via a generated `.mo` file later);
- write the Motoko AE decoder so the bench actually times the
  ingress codec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ggreif and others added 28 commits April 26, 2026 16:32
`check_vis_parenthetical` was being invoked with the *outer* env at
[infer_obj][src/mo_frontend/typing.ml#L3805], not with the env
enriched by the actor's `scope`. As a result, a parenthetical like

  (with encoder; decoder)
  public func go(spec : ObjectSpec) : async ObjectSpec { spec }

failed with `M0057 unbound variable encoder` whenever `encoder` and
`decoder` were sibling actor-field bindings rather than module-level
or imported names — even though the bindings are in scope everywhere
else inside the actor body.

Fix: build a single `par_env = adjoin_vals env scope.val_env` once
before the iteration and pass it to every `check_vis_parenthetical`
call. The full actor scope is now visible to parenthetical typing,
which matches the user-facing scoping intuition (other actor-fields
*are* in scope).
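
The scoping bug and its fix can be illustrated with dictionaries standing in for environments. This is a Python analogy, not the OCaml in typing.ml: `adjoin_vals` here only mimics the name used above, and `unbound_names` is a hypothetical helper.

```python
def adjoin_vals(env: dict, val_env: dict) -> dict:
    """Return a new environment with the actor's value bindings added."""
    return {**env, **val_env}


def unbound_names(names: list, env: dict) -> list:
    """Names used in a parenthetical that the environment cannot resolve."""
    return [name for name in names if name not in env]


# Outer environment: module-level and imported names only.
outer_env = {"Nat": "module"}
# Actor scope: sibling actor-field bindings.
actor_scope = {"encoder": "ObjectSpec -> Blob", "decoder": "Blob -> ObjectSpec"}

# Before the fix: the parenthetical was checked against the outer environment.
before = unbound_names(["encoder", "decoder"], outer_env)
# After the fix: one enriched environment is built once and reused for every check.
par_env = adjoin_vals(outer_env, actor_scope)
after = unbound_names(["encoder", "decoder"], par_env)
```

In this model `before` lists both names as unbound, which corresponds to the M0057 report, while `after` is empty.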

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`go(spec : ObjectSpec) : async ObjectSpec { spec }` is now decorated
with the punning parenthetical `(with encoder; decoder)`, pairing the
sibling actor-fields with the framework's ingress/egress hooks. The
body is the identity, so any cycles/heap reported come purely from the
codec round-trip.

The //CALL hex below is the AE compact-binary form of the German-
clients query, produced by `nix run .#ae-encoder` (see
`nix/ae-encoder.nix`). Until v3 lands real codec bodies the encoder
still returns "" and the decoder returns `#root` — the wiring is
end-to-end correct, the numbers are just trivial today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drops walkthrough-style commentary (parenthetical-pun explainer,
roadmap, decorative banners) — readers are Motoko experts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Decoder/encoder reports both fire end-to-end (cycles=279 each, stub
bodies); reply is empty because encoder returns "". Numbers become
meaningful once the AE codec bodies land.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Forward-only Iter<Nat8> reader for AE binary parsing — `take n` returns
?Blob (null on short read, no zero-fill), `readU32BE` reads big-endian u32.
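
For readers unfamiliar with the shape of such a reader, here is a Python analogue. It is a sketch, not the Motoko code: `Reader`, `take` and `read_u32_be` mirror the names in the message, and the behaviour after a short read (the bytes already pulled are lost) is an assumption about a forward-only iterator.

```python
from typing import Iterable, Optional


class Reader:
    """Forward-only reader over a stream of byte values (0..255)."""

    def __init__(self, source: Iterable[int]):
        self._it = iter(source)

    def take(self, n: int) -> Optional[bytes]:
        """Return exactly n bytes, or None on a short read (no zero-fill)."""
        out = bytearray()
        for _ in range(n):
            byte = next(self._it, None)
            if byte is None:
                return None  # stream ended early; consumed bytes are not restored
            out.append(byte)
        return bytes(out)

    def read_u32_be(self) -> Optional[int]:
        """Read a big-endian unsigned 32-bit integer, or None on a short read."""
        raw = self.take(4)
        return None if raw is None else int.from_bytes(raw, "big")
```

A typical use is reading a four-character code followed by a length, for example `take(4)` for `obj ` and then `read_u32_be()` for the body size.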

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parseDescBody dispatches 'obj '/'null'; parseObjBody walks the 4
record fields and recurses on 'from'. Errors trap via prim. parseValue
scaffolded for utxt/long/enum/null. 117k cycles / 2.8k heap-words on
the 638-byte fixture; want/form/seld bodies still consumed-and-discarded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Decoder: SELD now interprets the 4cc body as a property name when
form=='prop' (test form remains a TODO; logi predicate body is skipped).
Encoder: Writer class with pre-computed length, encDescLen, writeDesc,
writeObjBody — emits the full 'dle2' envelope + recursive obj/null tree.
35k cycles / 1.7k heap-words to encode a 152-byte wire on the partially-
decoded fixture; 'inco' property round-trips, predicate is lost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Decoder: parseBoolExpr / parseValue / parseLogiBody / parseCmpdBody +
'exmn' (typeObjectBeingExamined) collapsed to #root. SELD interprets
the 'logi' predicate when form=='test'; 'cmpd' obj1/relo/obj2 yield
#compare with prop/op/value extracted from the obj1 ObjectSpec, the
relo enum, and the obj2 literal (utxt/long/enum/null).

Encoder still stubs #test as 'prop' form + 4-zero seld (writeBoolExpr
TODO). go() now decodes the encoded wire and prints the round-trip,
making the encoder loss visible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
writeBoolExpr / writeValue / writeLogiHeader emit 'logi'/'cmpd'/'utxt'/
'long'/'enum'/'null' descriptors; valueDescLen and boolExprDescLen feed
the pre-allocated Writer. encDescLen is now seld-aware so test-form
objs size correctly. Round-trip print confirms decoded == roundtrip
(structurally; original 'exmn' iterand collapses to 'null' since #it
isn't modeled). 638-byte wire in / 638-byte wire out.

counters() now returns (Int, Nat64), matching iter.mo / alloc.mo /
heap-32.mo. textToUtf16 carries an ASCII-assumption note pointing at
surrogate-pair encoding for non-BMP if extended later.
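
For reference, a complete big-endian UTF-16 encoder with surrogate pairs is short. The Python sketch below shows what the note points at; it is not the Motoko `textToUtf16`, and `text_to_utf16_be` is a name chosen here.

```python
def text_to_utf16_be(text: str) -> bytes:
    """Encode text as big-endian UTF-16, using surrogate pairs above U+FFFF."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # Basic Multilingual Plane: one 16-bit code unit.
            out += cp.to_bytes(2, "big")
        else:
            # Supplementary planes: split the 20-bit offset into two code units.
            cp -= 0x10000
            out += (0xD800 + (cp >> 10)).to_bytes(2, "big")
            out += (0xDC00 + (cp & 0x3FF)).to_bytes(2, "big")
    return bytes(out)
```

For well-formed input this agrees with Python's built-in `str.encode("utf-16-be")`, which makes a convenient oracle when checking fixtures such as the `Germany` literal in the query.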

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
100-element flat client array (deterministic Array_tabulate) plus a
hand-coded countMatchers helper. With 60% Germany / 50% age-in-range,
joint hit rate is 31/100 — within the planned 30% band for the
running query (country=="Germany" AND 45<=age<=55).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Smurf = typed accessor for one Client property, indexed by the AE 4cc
name the decoder produces ("cntr"/"age "/"inco"). lookupSmurf maps the
4cc → Client field reader; cmp does typed comparison over CandidValue;
evalBoolExpr recursively evaluates the BoolExpr tree.

extractPredicate digs the #test out of the running query shape; go now
runs countMatchersDecoded against the 100-client DB and reports its
own cycles/heap. Decoded-predicate matchers (31) match the hand-coded
countMatchers (31), proving the predicate round-trips through the wire
into a faithful boolean.
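
The evaluation scheme can be sketched in Python. The tuple encoding of the expression tree, the operator set, and the field names on the client record are all assumptions made for illustration; only the three 4cc keys and the overall recursion come from the message above.

```python
import operator

# 4cc name -> typed reader for one client property (the lookup step).
CLIENT_READERS = {
    "cntr": lambda c: c["country"],
    "age ": lambda c: c["age"],
    "inco": lambda c: c["income"],
}

# Comparison operators as they might appear in a decoded predicate.
OPS = {"==": operator.eq, ">=": operator.ge, "<=": operator.le}


def eval_bool_expr(expr: tuple, client: dict) -> bool:
    """Recursively evaluate a predicate tree against one client."""
    kind = expr[0]
    if kind == "compare":
        _, prop, op, value = expr
        return OPS[op](CLIENT_READERS[prop](client), value)
    if kind == "and":
        return all(eval_bool_expr(e, client) for e in expr[1])
    if kind == "or":
        return any(eval_bool_expr(e, client) for e in expr[1])
    if kind == "not":
        return not eval_bool_expr(expr[1], client)
    raise ValueError(f"unknown expression kind: {kind}")


def count_matchers(expr: tuple, clients: list) -> int:
    """Count the clients for which the predicate holds."""
    return sum(1 for c in clients if eval_bool_expr(expr, c))


# The running query: country == "Germany" AND 45 <= age <= 55.
RUNNING_QUERY = ("and", [
    ("compare", "cntr", "==", "Germany"),
    ("compare", "age ", ">=", 45),
    ("compare", "age ", "<=", 55),
])
```

Comparing the count obtained from the decoded predicate with a hand-written filter is the same cross-check the commit uses.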

Limitation: Smurfs are monomorphic. Future work — when Client gets a
nested Address subobject (or any non-leaf field) — will need an
existential "this is your container, go fishing" Smurf shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
runQuery walks the running shape (#obj prop → #obj clnt #test → #root)
and emits one CandidValue per matching client (the requested property).
For the German-midlife-income query against the 100-client mock DB,
that's 31 yearly incomes from $51k to $146k.

Two-pass resolution since `mo:⛔` has no Buffer: count matchers, allocate
[var CandidValue], fill, freeze via Array_tabulate. 369k cycles / 13k
heap-words for the full query. Smurf type carries a TODO marking the
zipper-like evolution needed once nested entities (Address, etc.) appear.
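
The two-pass pattern described here is simple enough to show in a few lines. This Python sketch mirrors the count, allocate, fill and freeze steps; `run_query`, `matches` and `project` are hypothetical names, and Python would not normally need the first pass since lists grow dynamically.

```python
from typing import Callable, Sequence


def run_query(clients: Sequence[dict],
              matches: Callable[[dict], bool],
              project: Callable[[dict], object]) -> tuple:
    """Collect one projected value per matching client without a growable buffer."""
    # Pass 1: count the matchers so the result can be allocated at its final size.
    count = 0
    for client in clients:
        if matches(client):
            count += 1
    # Allocate the mutable result array.
    out = [None] * count
    # Pass 2: fill it in order.
    i = 0
    for client in clients:
        if matches(client):
            out[i] = project(client)
            i += 1
    # Freeze into an immutable sequence.
    return tuple(out)
```

The cost of this approach is that the predicate runs twice per element, which is worth keeping in mind when reading the cycle counts.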

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Frees the name 'Smurf' for the upcoming existential-via-Candid protocol
(generic, blob-keyed, polymorphic). The current monomorphic accessor
becomes PropReader — the typed-fast-path used by the running query
while the protocol lands. Behaviour unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Smurf, Accessor, LookupKey land as type definitions. Smurf carries the
existential boundary (blob: Blob; methods close over T via from_candid<T>);
Accessor is the navigation hook (form, fourcc, lookUp). Mutual recursion
typechecks. No implementations or wiring yet — those come in subsequent
commits as constructors and concrete accessors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three protocol-level cleanups, all type-only:

- blob is now a thunk `() -> Blob`. VarAccessor-class Smurfs can return
  "" when no consumer pulls; eager Candid encode is paid only at
  boundaries that need it.
- primaryKey field gone — it was an implementation detail of toDesc.
  Each Smurf constructor closes over the primary-key logic locally and
  bakes it into toDesc directly.
- Accessor.lookUp now takes the whole parent Smurf, not just a Blob.
  The child closes over parent.toDesc() (the zipper edge) and pulls
  parent.blob() lazily when from_candid<P> needs it.

No instances yet; constructors and concrete accessors land separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The AE-404 of the Smurf protocol: every accessor returns this when
lookup misses. blob is "", accessors empty, enumerate immediately
exhausted, readField returns null, isNotFound=true so the encoder can
special-case (eventual: emit errAENoSuchObject = -1728 envelope).

Underscore-prefixed since no accessor instances exist yet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Terminal leaf in the Smurf protocol. Reads `fieldName` from `parent` at
construction (via parent.readField), stores the CandidValue, and
discards the parent reference. The resulting Smurf has classFourcc=""
(no class), no accessors, no enumeration; readField returns the stored
value regardless of name; toDesc is a placeholder (#root) since AE
ObjSpecs are references — the encoder treats leaf values via classFourcc=""
and routes through the value rather than the spec.

Underscore-prefixed since no instances yet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Typed escape hatch for stable-var-backed accessors. Captures `stab : [T]`
at construction and ignores `parent.blob()` — no Candid round-trip on
input. `wrap : T -> Smurf` lifts each typed element back into the
existential protocol; for #indexed keys it picks `stab[n-1]` (1-based,
AppleScript convention) and applies wrap. Negative or out-of-bounds
positions return _notFoundSmurf. #named and #test still TODO.

Class instance is structurally an Accessor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
_clientSmurf : Client -> Smurf wraps a Client as an existential Smurf.
readField maps wire 4ccs ("cntr"/"age "/"inco") to typed fields; blob
lazily Candid-encodes via to_candid; classFourcc is "clnt"; accessors
empty for now (per-property accessors land separately). toDesc still a
placeholder (#root) until VarAccessor threads parent through.

_actorSmurf is the canister root: classFourcc="", accessors hosts a
single _VarAccessor<Client>(clients, "clnt", #indexed, _clientSmurf).
The class instance fits [Accessor] by structural subtyping. toDesc=#root.

Both underscore-prefixed; runQuery still uses the typed-fast-path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Public method that exercises the existential protocol: looks up
_actorSmurf.accessors[0] (the clnt VarAccessor), calls lookUp(parent,
#indexed 1), and surfaces a few fields of the resulting clientSmurf
via readField. Output proves the chain: VarAccessor takes clients[0],
clientSmurf wraps, readField bridges 4ccs to typed fields. Got the
expected (Germany, 35, 50000) for the first generated client.

Demonstrates that class instances satisfy the Accessor type via
structural subtyping and that readField is the existential boundary
the predicate evaluator will use.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Client gains a Text `name` (the natural primary key for stable
references via #name). Names dispatch by country: German clients draw
from {Hans/Anna/Otto/Maria/Karl/Helga} × {Müller/Schmidt/Weber/Fischer/
Bauer/Hoffmann}, French from {Jean/Marie/Pierre/Anne/Michel/Claire} ×
{Martin/Bernard/Dubois/Petit/Moreau/Leroy}. First client at index 0
(German) is "Hans Müller".

readField on _clientSmurf accepts "name" → #text c.name; tiny1 surfaces
it in its debug print.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three coupled changes:

- _VarAccessor's wrap signature is now (T, Smurf) -> Smurf. lookUp
  threads `parent` through to the wrap callback so the resulting child
  Smurf can close over `parent.toDesc()` (the zipper edge).
- _clientSmurf accepts (Client, Smurf). Its toDesc returns
  #obj { class_="clnt"; container=parent.toDesc(); key=#name (c.name) },
  the AppleScript-equivalent of `client "<name>" of <root>`.
- tiny1 now returns ObjectSpec via (with encoder) and emits
  `s.toDesc()`. Encoder side gains #name keyform support
  (NAME = 'name', seld='utxt' with the BE UTF-16 of c.name).

The 104-byte reply is `'dle2' + 'obj ' clnt … name "Hans Müller" … 'null'`.
ü leaks via the ASCII-assumption bug in textToUtf16 — deterministic but
malformed; surrogate-pair handling stays a TODO.

Also pun spec=spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two coupled tweaks to _VarAccessor.lookUp:

- Negative indices count from the end (AE/AppleScript convention):
  -1 is last, -size is first; out of range → _notFoundSmurf. Math is
  local since the accessor knows stab.size().
- Dispatch on (form_, key) tuple so the indexed branch fires only
  when the accessor was declared with form_ = #indexed. A #named-
  declared VarAccessor receiving a #indexed key returns notFound.
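
The index arithmetic can be written down as a small pure function. This Python sketch assumes that position 0 counts as out of range (the message does not say so explicitly); `resolve_index` is a name chosen here, and `None` plays the role of _notFoundSmurf.

```python
from typing import Optional


def resolve_index(n: int, size: int) -> Optional[int]:
    """Map a 1-based, possibly negative position to a 0-based array index."""
    if 1 <= n <= size:
        return n - 1  # 1 is the first element
    if -size <= n <= -1:
        return size + n  # -1 is the last element, -size is the first
    return None  # 0 or out of range: not found
```

For a 100-element array, positions 100 and -1 both resolve to index 99, and positions 1 and -100 both resolve to index 0.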

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tiny1(i : Int) drives the clnt accessor with #indexed i, returning the
resulting clientSmurf's stable reference (or the AE-404 spec when out
of range).

Two //CALL lines exercise the new convention: tiny1(1) returns
"Hans Müller", tiny1(-1) returns "Anne Moreau" — the 100th client
via negative-from-end addressing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 6×6 pools collided every i mod 36 — 0 and 36 both mapped to
"Hans Müller" (both inside the German subset). Two coupled fixes:

- Expand each name pool to 10 entries: with firstIdx=i%10 and
  lastIdx=(i/10)%10, all i in [0, 99] yield a unique (firstIdx, lastIdx)
  pair. Adds Klaus/Ingrid/Werner/Ursula + Schulz/Wagner/Becker/Koch
  (German) and Henri/Sophie/Paul/Camille + Roux/Vincent/Fournier/Girard
  (French).
- Add an init-time assertion (`do { for-for }`, O(n²)): for each client,
  count occurrences of its name across the whole array and trap if it
  isn't exactly 1. Caught at canister_init so a future tweak that
  reintroduces collisions screams loudly.

tiny1(-1) now resolves to "Camille Girard" (was "Anne Moreau" with the
old pools).
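
The collision and its fix are easy to check numerically. In the Python sketch below, the old indexing formula (`i % 6`, `(i // 6) % 6`) is an assumption; it is consistent with the reported collision at 0 and 36 and with the names reported for the last client. The order in which the four new names were appended to each pool is likewise assumed, and only the French pools are shown.

```python
FRENCH_FIRST_OLD = ["Jean", "Marie", "Pierre", "Anne", "Michel", "Claire"]
FRENCH_LAST_OLD = ["Martin", "Bernard", "Dubois", "Petit", "Moreau", "Leroy"]
FRENCH_FIRST = FRENCH_FIRST_OLD + ["Henri", "Sophie", "Paul", "Camille"]
FRENCH_LAST = FRENCH_LAST_OLD + ["Roux", "Vincent", "Fournier", "Girard"]


def old_pair(i: int) -> tuple:
    """Assumed 6x6 scheme: repeats with period 36."""
    return (i % 6, (i // 6) % 6)


def new_pair(i: int) -> tuple:
    """10x10 scheme from the fix: the two decimal digits of i."""
    return (i % 10, (i // 10) % 10)


def all_unique(items: list) -> bool:
    """O(n^2) check in the spirit of the init-time assertion."""
    return all(sum(1 for other in items if other == item) == 1 for item in items)


def french_name(i: int, pair, firsts: list, lasts: list) -> str:
    """Build the name of client i from the given pools and indexing scheme."""
    first_idx, last_idx = pair(i)
    return f"{firsts[first_idx]} {lasts[last_idx]}"
```

Under the old scheme, indices 0 and 36 share the pair (0, 0); under the new one, every index in [0, 99] gets its own pair because the pair is just the two decimal digits of the index.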

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Demonstrates tiny1(100) and tiny1(-1) resolve to the same client
("Camille Girard") via the two branches of the indexed math:
positive 100 → n=100, negative -1 → n=size-1+1=100. Same array
index 99, identical 108-byte reply, identical cycle count.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Snapshot of the AEOM-inspired bench against the original roadmap as of
2026-04-27. What's shipped (AE codec, Smurf protocol skeleton, _VarAccessor
typed-fast-path, _clientSmurf with parent-zipper toDesc, _actorSmurf,
tiny1 public method, mock DB with uniqueness assertion). What's left
(full resolver wiring through the existential boundary, #named/#test
forms, Accessors.mo codegen library, certified-data integration,
HTTP/JSON endpoint, RTS hooks). Plus the protocol-level gaps
(#named lookup, #test as filter on Smurf, leaf accessors on
_clientSmurf, 'exmn' ↔ #it, real UTF-16 BE, #ne desugar).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three coupled additions to drive a two-step navigation through the
existential protocol:

- _ValueSmurf now closes over parent.toDesc() (not the parent's data),
  so its toDesc emits `<#property fieldName> of <parentDesc>`. The leaf
  is now a real spec node in the chain, not a placeholder.
- _VarAccessor gains #named handling: scans `stab` for the entry whose
  `getName` matches the lookup key (relies on the init-time uniqueness
  assertion). Constructor takes a getName : T -> Text param (ignored
  for #indexed).
- _actorSmurf hosts two clnt accessors: one #indexed at accessors[0],
  one #named at accessors[1].

tiny2(input : Text) drives accessors[1].lookUp(parent, #named input),
materialises `_ValueSmurf(clientS, "name")`, returns its toDesc.
Demoed with tiny2("Hans Müller") → 172-byte AE wire reply.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>