feat(runtime-expressions): improve ABNF grammar clarity by frankkilcommins · Pull Request #454 · OAI/Arazzo-Specification

frankkilcommins · 2026-03-23T11:21:17Z

Define CHAR to exclude curly braces for unambiguous parsing (Ambiguity in runtime expressions embedded in strings #424)
Add expression-string and embedded-expression grammar (Introduction of ABNF for expressions embedded in strings #425)
Specialize reference types by context (Introduce complete grammar for parsing expressions #426)
Add Source Description expression resolution priority (Ambiguity in $sourceDescription.* runtime expressions #427)
Require explicit component types in $components (Ambiguity in $components runtime expressions #428)
Clarify type conversion in embedded expressions (List or object usage in runtime expressions embedded in strings #437)
Fix invalid examples in spec (conditions, stepId references)
Update examples table with nested inputs and component actions

$components now requires explicit component type (parameters/successActions/failureActions). Generic components.name pattern removed. Note: This was already semantically invalid per spec.
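As a rough illustration of this breaking change, the sketch below checks `$components` expressions against a regular expression approximating the new shape (component type, then component name). The function name and the case-sensitive matching are assumptions made for the example, not part of the PR.

```python
import re

# Approximation of: "$components." components-type "." components-name
# Assumption: matching is case-sensitive, although ABNF string literals are
# formally case-insensitive.
COMPONENTS_EXPRESSION = re.compile(
    r"\$components\.(parameters|successActions|failureActions)\.([A-Za-z0-9._-]+)"
)

def parse_components_expression(expression):
    """Return (component_type, component_name), or None if the shape is wrong."""
    match = COMPONENTS_EXPRESSION.fullmatch(expression)
    if match is None:
        return None
    return match.group(1), match.group(2)

typed = parse_components_expression("$components.parameters.page")
generic = parse_components_expression("$components.page")
```

Here `typed` carries both the component type and the name, while the generic `$components.page` form no longer matches.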

fixes: #424
fixes: #425
fixes: #426
fixes: #428
fixes: #437

resolves: #427

DmitryAnansky · 2026-03-26T10:10:46Z

examples/1.0.0/bnpl-arazzo.yaml

+          "firstName": "{$inputs.customer#/firstName}",
+          "lastName": "{$inputs.customer#/lastName}",
+          "dateOfBirth": "{$inputs.customer#/dateOfBirth}",
+          "postalCode": "{$inputs.customer#/postalCode}"


It would be great to include more complex and diverse examples demonstrating the application of ABNF syntax.

char0n · 2026-04-02T11:41:52Z

src/arazzo.md

+  component-name = identifier
+
+  ; Identifier rule
+  identifier = 1*( ALPHA / DIGIT / "." / "-" / "_" )


The PR's identifier = 1*( ALPHA / DIGIT / "." / "-" / "_" ) is used for all IDs (stepId, workflowId, sourceDescriptionName, component keys, input/output names). But the spec defines two different patterns:

stepId, workflowId, sourceDescriptionName: SHOULD conform to [A-Za-z0-9_\-]+ (no dot)

Components keys: MUST match ^[a-zA-Z0-9\.\-_]+$ (with dot)

A single shared identifier rule conflates these: it allows dots in step/workflow IDs where the spec says they shouldn't appear (a constraint that is only SHOULD-level anyway). Separate rules would be more faithful to the spec's intent.

char0n · 2026-04-02T11:55:54Z

src/arazzo.md

+  field-name = identifier
+
+  ; Source descriptions expressions
+  source-reference = source-name "." reference-id


source-reference is too restrictive with identifier

The proposed grammar uses:

source-reference = source-name "." reference-id
reference-id = identifier

The <reference> part can be an operationId from an OpenAPI description or a workflowId from
an Arazzo document. OpenAPI does not constrain operationId to any specific character set —
it's just a string. This means operationIds like get/pets, get pets, or create-user@v2 are
technically valid in OpenAPI but would be rejected by the identifier rule.

I'd suggest using a less restrictive rule for reference-id — something like 1*CHAR (any
character except { and }) — to avoid rejecting valid OpenAPI operationIds.

char0n · 2026-04-02T12:03:31Z

src/arazzo.md

+      "$statusCode" /
+      "$request." source /
+      "$response." source /
+      "$inputs." input-reference /


Plural for consistency with $steps or $workflows

Suggested change

-      "$inputs." input-reference /
+      "$inputs." inputs-reference /

char0n · 2026-04-02T12:03:38Z

src/arazzo.md

+      "$request." source /
+      "$response." source /
+      "$inputs." input-reference /
+      "$outputs." output-reference /


Suggested change

-      "$outputs." output-reference /
+      "$outputs." outputs-reference /

char0n · 2026-04-02T13:13:48Z

src/arazzo.md

+  ; JSON Pointer (RFC 6901)
+  json-pointer = *( "/" reference-token )
+  reference-token = *( unescaped / escaped )
+  unescaped = %x00-2E / %x30-7D / %x7F-10FFFF


unescaped in json-pointer still includes { and } — breaks embedded expression parsing

The PR correctly excludes { and } from the CHAR rule for unambiguous embedded
expression parsing, but the unescaped rule in json-pointer still uses %x30-7D, which
includes } (%x7D) and { (%x7B).

This means an embedded expression like {$request.body#/status} or
{$steps.someStepId.outputs.pets#/0/id} cannot be reliably parsed — the json-pointer's
unescaped will consume the closing }, making it impossible to determine where the
expression ends.

The fix is to change unescaped from:
unescaped = %x00-2E / %x30-7D / %x7F-10FFFF
to:
unescaped = %x00-2E / %x30-7A / %x7C / %x7F-10FFFF
; %x2F ('/'), %x7E ('~'), %x7B ('{'), %x7D ('}') are excluded

This is a minor deviation from RFC 6901, but { and } in JSON Pointer reference tokens
are extremely rare in practice, and without this fix the expression-string grammar
cannot work correctly for any expression containing a json-pointer.

We validated this in our ABNF parser implementation at
https://github.com/swaggerexpert/arazzo-runtime-expression — after making this change,
all expressions with json-pointers work correctly in both standalone and embedded
contexts.
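The boundary problem can be reproduced with a small regular-expression sketch. It is not the ABNF parser mentioned above; it only covers the `$request.body` form, and the helper names are made up for the example. The adapted character class follows the rule's stated intent, excluding `/`, `~`, `{` and `}`.

```python
import re

# RFC 6901 'unescaped': everything except "/" (%x2F) and "~" (%x7E).
UNESCAPED_RFC6901 = r"[\x00-\x2E\x30-\x7D\x7F-\U0010FFFF]"
# Adapted 'unescaped': additionally excludes "{" (%x7B) and "}" (%x7D).
UNESCAPED_ADAPTED = r"[\x00-\x2E\x30-\x7A\x7C\x7F-\U0010FFFF]"

def body_expression_pattern(unescaped):
    """Build a pattern for "{" "$request.body" ["#" json-pointer] "}"."""
    reference_token = "(?:" + unescaped + "|~[01])*"
    json_pointer = "(?:/" + reference_token + ")*"
    return re.compile(r"\{\$request\.body(?:#(" + json_pointer + r"))?\}")

def extract_pointers(template, unescaped):
    """Return the JSON Pointer of every embedded body expression found."""
    pattern = body_expression_pattern(unescaped)
    return [m.group(1) for m in pattern.finditer(template)]

template = "status={$request.body#/status}, id={$request.body#/id}"
with_rfc6901 = extract_pointers(template, UNESCAPED_RFC6901)
with_adapted = extract_pointers(template, UNESCAPED_ADAPTED)
```

With the RFC 6901 class, the first pointer runs past its closing brace and swallows the second expression, so `with_rfc6901` holds a single over-long pointer. With the adapted class, `with_adapted` holds the two expected pointers, `/status` and `/id`.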

Restructure the ABNF grammar to use explicit, typed reference rules in the primary grammar instead of relying on secondary grammars with two-pass parsing. This improves grammar clarity and aligns with the proposed spec changes in OAI/Arazzo-Specification#454.

Key changes:
- Add $self expression support
- Add $inputs/$outputs JSON Pointer support (e.g., $inputs.customer#/firstName)
- Inline all secondary grammars into the primary grammar
- Extract shared identifier and identifier-strict rules
- Adapt json-pointer to exclude { and } from unescaped for unambiguous embedded expression parsing, fixing the body expression extract limitation
- Require explicit component types (parameters/successActions/failureActions)
- Update README with current grammar and examples

Resolves: OAI/Arazzo-Specification#424, OAI/Arazzo-Specification#425, OAI/Arazzo-Specification#426, OAI/Arazzo-Specification#428

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

char0n · 2026-04-02T13:29:45Z

src/arazzo.md

+      ; Matches recommended pattern [A-Za-z0-9_\-]+ from spec
+
+  ; Legacy 'name' rule (retained for query/path references)
+  name = *( CHAR )


name isn't legacy. It's the correct rule for query and path references because query parameter names and path parameter names are user-defined and can contain any valid character.

char0n · 2026-04-02T14:18:07Z

Implementation Verification

I implemented the proposed grammar changes in my ABNF parser at swaggerexpert/arazzo-runtime-expression#116 to verify the grammar is correct and parseable. All 152 tests pass. Below are the findings from the implementation.

Issue: `unescaped` in json-pointer still includes `{` and `}`

The CHAR rule correctly excludes { (%x7B) and } (%x7D) for unambiguous embedded expression parsing, but the unescaped rule in json-pointer still uses %x30-7D, which includes both characters.

This means embedded expressions containing JSON pointers — like {$request.body#/status}, {$inputs.customer#/firstName}, or {$steps.foo.outputs.bar#/0/id} — cannot be reliably parsed. The json-pointer's unescaped will consume the closing }, making it impossible to determine where the expression ends.

Suggested fix — change unescaped from:

unescaped = %x00-2E / %x30-7D / %x7F-10FFFF

to:

unescaped = %x00-2E / %x30-7A / %x7C / %x7F-10FFFF
    ; %x2F ('/'), %x7E ('~'), %x7B ('{'), %x7D ('}') are excluded

This is a minor deviation from RFC 6901, but { and } in JSON Pointer reference tokens are extremely rare in practice, and without this fix the expression-string grammar cannot work correctly for any expression containing a json-pointer.

Issue: Single `identifier` rule conflates two different spec constraints

The proposed grammar uses a single identifier = 1*( ALPHA / DIGIT / "." / "-" / "_" ) rule for everything — step IDs, workflow IDs, source description names, component keys, input/output names, and field names. However, the spec defines two different patterns:

stepId, workflowId, sourceDescriptionName: SHOULD conform to [A-Za-z0-9_\-]+ (no dot)
Components keys: MUST match ^[a-zA-Z0-9\.\-_]+$ (with dot)

Using a single shared rule allows dots in step/workflow IDs where the spec says they shouldn't be. In my implementation, I split this into two rules:

identifier        = 1*(ALPHA / DIGIT / "." / "-" / "_")   ; for field names, component keys
identifier-strict = 1*(ALPHA / DIGIT / "_" / "-")          ; for step/workflow/source-description IDs
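The difference between the two rules can be checked with equivalent regular expressions. This is a minimal sketch; the sample IDs and the `matches` helper are invented for illustration.

```python
import re

# identifier        = 1*(ALPHA / DIGIT / "." / "-" / "_")
IDENTIFIER = re.compile(r"[A-Za-z0-9._-]+")
# identifier-strict = 1*(ALPHA / DIGIT / "_" / "-")
IDENTIFIER_STRICT = re.compile(r"[A-Za-z0-9_-]+")

def matches(rule, value):
    """True when the whole value is derivable from the rule."""
    return rule.fullmatch(value) is not None

candidates = ["loginStep", "place-order_2", "v1.login"]
# Maps each candidate to (accepted by identifier, accepted by identifier-strict).
report = {
    c: (matches(IDENTIFIER, c), matches(IDENTIFIER_STRICT, c)) for c in candidates
}
```

Only the dotted `v1.login` is treated differently: `identifier` accepts it, `identifier-strict` rejects it. Keeping dots out of step IDs also seems to help `$steps.<id>.outputs.<name>` split in only one way, since a dotted step ID could itself contain `.outputs.`.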

Issue: `source-descriptions-reference` (`reference-id`) is too restrictive

The proposed grammar constrains reference-id to identifier, but this value can be an operationId from an OpenAPI description. OpenAPI does not constrain operationId to any specific character set — it's just a string. OperationIds like get/pets, get pets, or create-user@v2 are technically valid in OpenAPI but would be rejected by the identifier rule.

In my implementation, I use 1*CHAR (any character except { and }) for this rule.
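To make the rejection concrete, the sketch below runs the operationIds mentioned above through both candidate rules. The second pattern is the loose reading "any character except { and }" rather than the full CHAR rule, and `listPets` is an extra sample added for contrast.

```python
import re

# identifier = 1*(ALPHA / DIGIT / "." / "-" / "_")
IDENTIFIER = re.compile(r"[A-Za-z0-9._-]+")
# Loose stand-in for 1*CHAR: one or more characters, none of them "{" or "}".
ANY_EXCEPT_BRACES = re.compile(r"[^{}]+")

operation_ids = ["listPets", "get/pets", "get pets", "create-user@v2"]

def accepted_by(rule):
    """Return the operationIds that the rule matches in full."""
    return [op for op in operation_ids if rule.fullmatch(op)]

by_identifier = accepted_by(IDENTIFIER)
by_char = accepted_by(ANY_EXCEPT_BRACES)
```

`by_identifier` keeps only `listPets`, while `by_char` keeps all four operationIds.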

Issue: Simplified `CHAR` rule diverges from OpenAPI

The PR replaces the JSON string-based CHAR definition (from RFC 7159, with escape sequences) with a simpler character range: CHAR = %x00-7A / %x7C / %x7E-10FFFF. This changes the semantics — a bare \ becomes a valid character, and JSON escape sequences like \n, \uXXXX are no longer recognized.

OpenAPI's runtime expression ABNF uses the RFC 7159-based CHAR definition. Since Arazzo builds on top of OpenAPI and shares the runtime expression concept, simplifying CHAR introduces a subtle divergence. An expression valid in one spec could behave differently in the other. I'd recommend keeping the RFC 7159-based definition for interoperability.
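The semantic shift can be seen by counting how many CHARs a string breaks into under each definition. The sketch below transcribes both definitions into regular expressions; `count_chars` is a hypothetical helper, and the sample strings are chosen only to expose the difference.

```python
import re

# Simplified rule from the PR: CHAR = %x00-7A / %x7C / %x7E-10FFFF
SIMPLE_CHAR = r"[\x00-\x7A\x7C\x7E-\U0010FFFF]"

# RFC 7159-based rule with "{" and "}" removed from 'unescape'.
UNESCAPE = r"[\x20-\x21\x23-\x5B\x5D-\x7A\x7C\x7E-\U0010FFFF]"
JSON_CHAR = "(?:" + UNESCAPE + r'|\\(?:["\\/bfnrt]|u[0-9A-Fa-f]{4}))'

def count_chars(char_pattern, text):
    """Return how many CHARs make up text, or None if text is not *CHAR."""
    if re.fullmatch("(?:" + char_pattern + ")*", text) is None:
        return None
    return len(re.findall(char_pattern, text))

bare_backslash = r"a\q"   # a backslash followed by a non-escape letter
unicode_escape = r"\u0041"  # the six characters of a JSON unicode escape

simple_results = (count_chars(SIMPLE_CHAR, bare_backslash),
                  count_chars(SIMPLE_CHAR, unicode_escape))
json_results = (count_chars(JSON_CHAR, bare_backslash),
                count_chars(JSON_CHAR, unicode_escape))
```

Under the simplified rule the bare backslash is accepted and the unicode escape is six separate characters. Under the RFC 7159-based rule the bare backslash is rejected and the escape counts as a single CHAR.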

Suggestion: `name` rule is not "legacy"

The PR labels the name rule as ; Legacy 'name' rule (retained for query/path references). This rule isn't legacy — it's the correct rule for query and path parameter names, which are user-defined and can contain any valid character. The comment could be misleading and suggest future removal. A more accurate comment would be something like ; Unconstrained name rule for query/path references.

Note: Example file version mismatch

The example fixes in examples/1.0.0/bnpl-arazzo.yaml (changing $inputs.customer.firstName to $inputs.customer#/firstName) apply 1.1.0 grammar semantics to a 1.0.0 example file. This could cause confusion about backward compatibility. Consider applying these fixes only to a 1.1.0 example, or noting that the 1.0.0 example has been updated to reflect the corrected grammar.
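One way to see why the two spellings are not interchangeable: under the reference grammar quoted further down in this comment, where inputs-name is an identifier that may contain dots, both forms parse, but into different parts. The sketch below is a simplified regular-expression reading of inputs-reference, not the actual parser, and the pointer part ignores `~` escapes.

```python
import re

# inputs-reference = inputs-name ["#" json-pointer], with inputs-name = identifier.
# Simplification: pointer tokens are any run of characters without "/", "{" or "}".
INPUTS_EXPRESSION = re.compile(
    r"\$inputs\.([A-Za-z0-9._-]+)(?:#((?:/[^/{}]*)*))?"
)

def parse_inputs_expression(expression):
    """Return (inputs_name, json_pointer_or_None) for an $inputs expression."""
    match = INPUTS_EXPRESSION.fullmatch(expression)
    if match is None:
        raise ValueError("not an $inputs expression: " + expression)
    return match.group(1), match.group(2)

dotted = parse_inputs_expression("$inputs.customer.firstName")
pointer = parse_inputs_expression("$inputs.customer#/firstName")
```

The dotted form yields a single input named `customer.firstName` with no pointer, whereas the pointer form yields the input `customer` plus the JSON Pointer `/firstName`.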

Note: Missing comma in example payload

In bnpl-arazzo.yaml, there's a pre-existing missing comma after the postalCode line in the JSON payload template, making it invalid JSON:

"postalCode": "{$inputs.customer#/postalCode}"
  "termsAndConditionsAccepted": true
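A quick way to confirm the problem is to feed the fragment to a JSON parser. In the sketch below the two lines are wrapped in braces to form a standalone object, which is an assumption about the surrounding payload; the rest of the example file is not reproduced.

```python
import json

# The two lines as they appear in the example, wrapped in an object.
broken = """{
  "postalCode": "{$inputs.customer#/postalCode}"
  "termsAndConditionsAccepted": true
}"""

# The same fragment with the missing comma added after the postalCode line.
fixed = """{
  "postalCode": "{$inputs.customer#/postalCode}",
  "termsAndConditionsAccepted": true
}"""

def is_valid_json(text):
    """True when text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

broken_ok = is_valid_json(broken)
fixed_ok = is_valid_json(fixed)
```

Only the version with the comma parses.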

Our ABNF grammar for reference

For reference, here is the complete ABNF grammar from my implementation that addresses the issues above:

; Arazzo runtime expression ABNF syntax
expression = (
    "$url" /
    "$method" /
    "$statusCode" /
    "$request." source /
    "$response." source /
    "$inputs." inputs-reference /
    "$outputs." outputs-reference /
    "$steps." steps-reference /
    "$workflows." workflows-reference /
    "$sourceDescriptions." source-reference /
    "$components." components-reference /
    "$self"
  )
; Request/Response sources
source                  = ( header-reference / query-reference / path-reference / body-reference )
header-reference        = "header." token
query-reference         = "query." name
path-reference          = "path." name
body-reference          = "body" ["#" json-pointer ]

; Input/Output references
inputs-reference        = inputs-name ["#" json-pointer]
inputs-name             = identifier
outputs-reference       = outputs-name ["#" json-pointer]
outputs-name            = identifier

; Steps expressions
steps-reference         = steps-id ".outputs." outputs-name ["#" json-pointer]
steps-id                = identifier-strict

; Workflows expressions
workflows-reference     = workflows-id "." workflows-field "." workflows-field-name ["#" json-pointer]
workflows-id            = identifier-strict
workflows-field         = "inputs" / "outputs"
workflows-field-name    = identifier

; Source descriptions expressions
source-reference                = source-descriptions-name "." source-descriptions-reference
source-descriptions-name        = identifier-strict
source-descriptions-reference   = 1*CHAR

; Components expressions
components-reference    = components-type "." components-name
components-type         = "parameters" / "successActions" / "failureActions"
components-name         = identifier

; Unconstrained name rule for query/path references and source description references
name                    = *( CHAR )

; Grammar for parsing template strings with embedded expressions
expression-string    = *( literal-char / embedded-expression )
embedded-expression  = "{" expression "}"
literal-char         = %x00-7A / %x7C / %x7E-10FFFF  ; anything except { (%x7B) and } (%x7D)

; JSON Pointer (RFC 6901, adapted)
; { (%x7B) and } (%x7D) are excluded from 'unescaped' for unambiguous embedded expression parsing
json-pointer     = *( "/" reference-token )
reference-token  = *( unescaped / escaped )
unescaped        = %x00-2E / %x30-7F-placeholder
                 ; %x2F ('/'), %x7E ('~'), %x7B ('{'), %x7D ('}') are excluded
escaped          = "~" ( "0" / "1" )
                 ; representing '~' and '/', respectively

; https://datatracker.ietf.org/doc/html/rfc7230#section-3.2.6
token          = 1*tchar
tchar          = "!" / "#" / "$" / "%" / "&" / "'" / "*"
               / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
               / DIGIT / ALPHA
               ; any VCHAR, except delimiters

; https://www.rfc-editor.org/rfc/rfc7159#section-7
CHAR = unescape /
    escape (
        %x22 /          ; "    quotation mark  U+0022
        %x5C /          ; \    reverse solidus U+005C
        %x2F /          ; /    solidus         U+002F
        %x62 /          ; b    backspace       U+0008
        %x66 /          ; f    form feed       U+000C
        %x6E /          ; n    line feed       U+000A
        %x72 /          ; r    carriage return U+000D
        %x74 /          ; t    tab             U+0009
        %x75 4HEXDIG )  ; uXXXX                U+XXXX
escape         = %x5C   ; \
unescape       = %x20-21 / %x23-5B / %x5D-7A / %x7C / %x7E-10FFFF
               ; %x7B ('{') and %x7D ('}') are excluded from 'unescape'

; Identifier rules
identifier        = 1*(ALPHA / DIGIT / "." / "-" / "_")
                  ; Alphanumeric with dots, hyphens, underscores
identifier-strict = 1*(ALPHA / DIGIT / "_" / "-")
                  ; Alphanumeric with hyphens, underscores (no dots)

; https://datatracker.ietf.org/doc/html/rfc5234#appendix-B.1
HEXDIG         =  DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
DIGIT          =  %x30-39   ; 0-9
ALPHA          =  %x41-5A / %x61-7A   ; A-Z / a-z
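Because neither literal-char nor any rule reachable from expression can produce `{` or `}`, a template string can be split at braces before the individual expressions are parsed. The following is a minimal sketch of that first pass, not the ABNF parser itself; `split_expression_string` is a hypothetical helper and it does not validate the text inside the braces.

```python
import re

# Either "{" <non-brace text> "}" (an embedded expression)
# or a run of literal characters containing no braces.
TOKEN = re.compile(r"\{([^{}]+)\}|([^{}]+)")

def split_expression_string(template):
    """Split a template into ("literal", text) and ("expression", text) parts."""
    parts = []
    pos = 0
    while pos < len(template):
        match = TOKEN.match(template, pos)
        if match is None:
            raise ValueError("unbalanced or empty braces at offset %d" % pos)
        if match.group(1) is not None:
            parts.append(("expression", match.group(1)))
        else:
            parts.append(("literal", match.group(2)))
        pos = match.end()
    return parts

parts = split_expression_string("Bearer {$steps.login.outputs.token}!")
```

For the sample template, `parts` contains the literal `Bearer `, the expression `$steps.login.outputs.token`, and the literal `!`. Each expression part would then be handed to a parser for the expression rule.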

feat(runtime-expressions): improve ABNF grammar clarity

64e553f

DmitryAnansky reviewed Mar 26, 2026

char0n reviewed Apr 2, 2026

char0n mentioned this pull request Apr 2, 2026

feat: improve ABNF grammar clarity and inline all secondary grammars swaggerexpert/arazzo-runtime-expression#116 (Open, 5 tasks)

frankkilcommins wants to merge 1 commit into OAI:v1.1-dev from frankkilcommins:abnf-grammer-improvements
