feat(runtime-expressions): improve ABNF grammar clarity #454

Open
frankkilcommins wants to merge 1 commit into OAI:v1.1-dev from frankkilcommins:abnf-grammer-improvements

@frankkilcommins
Collaborator

@frankkilcommins frankkilcommins commented Mar 23, 2026

$components now requires an explicit component type (parameters/successActions/failureActions). The generic components.name pattern has been removed. Note: the generic pattern was already semantically invalid per the spec.

fixes: #424
fixes: #425
fixes: #426
fixes: #428
fixes: #437

resolves: #427

"firstName": "{$inputs.customer#/firstName}",
"lastName": "{$inputs.customer#/lastName}",
"dateOfBirth": "{$inputs.customer#/dateOfBirth}",
"postalCode": "{$inputs.customer#/postalCode}"
Contributor

It would be great to include more complex and diverse examples demonstrating the application of ABNF syntax.

component-name = identifier

; Identifier rule
identifier = 1*( ALPHA / DIGIT / "." / "-" / "_" )
Contributor

The PR's identifier = 1*( ALPHA / DIGIT / "." / "-" / "_" ) is used for all IDs (stepId, workflowId, sourceDescriptionName, component keys, input/output names). But the spec defines two different patterns:

  • stepId, workflowId, sourceDescriptionName: SHOULD conform to [A-Za-z0-9_\-]+ (no dot)
  • Components keys: MUST match ^[a-zA-Z0-9\.\-_]+$ (with dot)

A single shared identifier rule conflates these: it allows dots in step/workflow IDs where the spec says they shouldn't be (and the spec's pattern for those IDs is only SHOULD-level anyway). Separate rules would be more faithful to the spec's intent.
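The difference between the two patterns can be illustrated with a minimal Python sketch. The regular expressions below are direct transliterations of the two character sets quoted above; the names IDENTIFIER, IDENTIFIER_STRICT, and matches are hypothetical and not part of the PR.

```python
import re

# Shared rule from the PR: ALPHA / DIGIT / "." / "-" / "_"
IDENTIFIER = re.compile(r"[A-Za-z0-9._-]+")
# Hypothetical stricter rule (no dot) for stepId / workflowId / sourceDescriptionName
IDENTIFIER_STRICT = re.compile(r"[A-Za-z0-9_-]+")


def matches(rule, value):
    """Return True when the whole value is accepted by the rule."""
    return rule.fullmatch(value) is not None


# Print each candidate with its result under the shared and the strict rule.
for candidate in ("loginStep", "login.step", "pet-coupons_v2"):
    print(candidate, matches(IDENTIFIER, candidate), matches(IDENTIFIER_STRICT, candidate))
```

Under these assumptions, a dotted step ID such as login.step passes the shared rule but is rejected by the stricter one, which is the behavior the spec's recommended pattern describes.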

field-name = identifier

; Source descriptions expressions
source-reference = source-name "." reference-id
Contributor

source-reference is too restrictive with identifier

The proposed grammar uses:

  source-reference = source-name "." reference-id                                             
  reference-id = identifier                      

The <reference> part can be an operationId from an OpenAPI description or a workflowId from
an Arazzo document. OpenAPI does not constrain operationId to any specific character set —
it's just a string. This means operationIds like get/pets, get pets, or create-user@v2 are
technically valid in OpenAPI but would be rejected by the identifier rule.

I'd suggest using a less restrictive rule for reference-id — something like 1*CHAR (any
character except { and }) — to avoid rejecting valid OpenAPI operationIds.
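As a rough illustration, the sketch below checks the operationIds mentioned above against both candidate rules. It assumes a simplified reading of 1*CHAR as "one or more characters other than { and }"; the names IDENTIFIER, ONE_OR_MORE_CHAR, and results are hypothetical.

```python
import re

# identifier rule from the PR
IDENTIFIER = re.compile(r"[A-Za-z0-9._-]+")
# Simplified stand-in for 1*CHAR: one or more characters other than "{" and "}"
ONE_OR_MORE_CHAR = re.compile(r"[^{}]+")

operation_ids = ["listPets", "get/pets", "get pets", "create-user@v2"]

# Map each operationId to (accepted by identifier, accepted by 1*CHAR).
results = {
    op: (bool(IDENTIFIER.fullmatch(op)), bool(ONE_OR_MORE_CHAR.fullmatch(op)))
    for op in operation_ids
}

for op, (as_identifier, as_char) in results.items():
    print(f"{op!r}: identifier={as_identifier}, 1*CHAR={as_char}")
```

Only listPets is accepted by the identifier rule here, while all four values are accepted by the less restrictive rule.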

"$statusCode" /
"$request." source /
"$response." source /
"$inputs." input-reference /
Contributor

Plural for consistency with $steps or $workflows

Suggested change
- "$inputs." input-reference /
+ "$inputs." inputs-reference /

"$request." source /
"$response." source /
"$inputs." input-reference /
"$outputs." output-reference /
Contributor

Suggested change
- "$outputs." output-reference /
+ "$outputs." outputs-reference /

; JSON Pointer (RFC 6901)
json-pointer = *( "/" reference-token )
reference-token = *( unescaped / escaped )
unescaped = %x00-2E / %x30-7D / %x7F-10FFFF
Contributor

unescaped in json-pointer still includes { and } — breaks embedded expression parsing

The PR correctly excludes { and } from the CHAR rule for unambiguous embedded
expression parsing, but the unescaped rule in json-pointer still uses %x30-7D, which
includes } (%x7D) and { (%x7B).

This means an embedded expression like {$request.body#/status} or
{$steps.someStepId.outputs.pets#/0/id} cannot be reliably parsed — the json-pointer's
unescaped will consume the closing }, making it impossible to determine where the
expression ends.

The fix is to change unescaped from:
unescaped = %x00-2E / %x30-7D / %x7F-10FFFF
to:
unescaped = %x00-2E / %x30-7A / %x7C / %x7F-10FFFF
; %x2F ('/'), %x7E ('~'), %x7B ('{'), %x7D ('}') are excluded

This is a minor deviation from RFC 6901, but { and } in JSON Pointer reference tokens
are extremely rare in practice, and without this fix the expression-string grammar
cannot work correctly for any expression containing a json-pointer.

We validated this in our ABNF parser implementation at
https://github.com/swaggerexpert/arazzo-runtime-expression — after making this change,
all expressions with json-pointers work correctly in both standalone and embedded
contexts.
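The ambiguity can be reproduced with a minimal Python sketch. It uses regular expressions as a stand-in for the ABNF rules, which is an assumption: an actual ABNF parser may fail in a different way, but the root cause is the same. The names RFC6901_POINTER, ADAPTED_POINTER, and find_body_expressions are hypothetical.

```python
import re

# reference-token characters per RFC 6901: anything except "/" and "~", plus ~0 / ~1
RFC6901_POINTER = r"(?:/(?:[^/~]|~[01])*)*"
# Adapted variant: additionally exclude "{" and "}"
ADAPTED_POINTER = r"(?:/(?:[^/~{}]|~[01])*)*"


def find_body_expressions(template, pointer):
    """Extract embedded $request.body / $response.body expressions from a template."""
    pattern = re.compile(
        r"\{(\$(?:request|response)\.body(?:#" + pointer + r")?)\}"
    )
    return pattern.findall(template)


template = "{$request.body#/status} then {$response.body#/0/id}"

# With the RFC 6901 character set, the first pointer swallows the closing brace
# and everything up to the last "}", so only one "expression" is found.
print(find_body_expressions(template, RFC6901_POINTER))
# ['$request.body#/status} then {$response.body#/0/id']

# With braces excluded, both embedded expressions are found.
print(find_body_expressions(template, ADAPTED_POINTER))
# ['$request.body#/status', '$response.body#/0/id']
```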

char0n added a commit to swaggerexpert/arazzo-runtime-expression that referenced this pull request Apr 2, 2026
Restructure the ABNF grammar to use explicit, typed reference rules in the
primary grammar instead of relying on secondary grammars with two-pass parsing.
This improves grammar clarity and aligns with the proposed spec changes in
OAI/Arazzo-Specification#454.

Key changes:
- Add $self expression support
- Add $inputs/$outputs JSON Pointer support (e.g., $inputs.customer#/firstName)
- Inline all secondary grammars into the primary grammar
- Extract shared identifier and identifier-strict rules
- Adapt json-pointer to exclude { and } from unescaped for unambiguous
  embedded expression parsing, fixing the body expression extract limitation
- Require explicit component types (parameters/successActions/failureActions)
- Update README with current grammar and examples

Resolves: OAI/Arazzo-Specification#424, OAI/Arazzo-Specification#425,
OAI/Arazzo-Specification#426, OAI/Arazzo-Specification#428

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

; Matches recommended pattern [A-Za-z0-9_\-]+ from spec

; Legacy 'name' rule (retained for query/path references)
name = *( CHAR )
Contributor

name isn't legacy. It's the correct rule for query and path references because query parameter names and path parameter names are user-defined and can contain any valid character.

@char0n
Contributor

char0n commented Apr 2, 2026

Implementation Verification

I implemented the proposed grammar changes in my ABNF parser at swaggerexpert/arazzo-runtime-expression#116 to verify the grammar is correct and parseable. All 152 tests pass. Below are the findings from the implementation.

Issue: unescaped in json-pointer still includes { and }

The CHAR rule correctly excludes { (%x7B) and } (%x7D) for unambiguous embedded expression parsing, but the unescaped rule in json-pointer still uses %x30-7D, which includes both characters.

This means embedded expressions containing JSON pointers — like {$request.body#/status}, {$inputs.customer#/firstName}, or {$steps.foo.outputs.bar#/0/id} — cannot be reliably parsed. The json-pointer's unescaped will consume the closing }, making it impossible to determine where the expression ends.

Suggested fix — change unescaped from:

unescaped = %x00-2E / %x30-7D / %x7F-10FFFF

to:

unescaped = %x00-2E / %x30-7A / %x7C / %x7F-10FFFF
    ; %x2F ('/'), %x7E ('~'), %x7B ('{'), %x7D ('}') are excluded

This is a minor deviation from RFC 6901, but { and } in JSON Pointer reference tokens are extremely rare in practice, and without this fix the expression-string grammar cannot work correctly for any expression containing a json-pointer.

Issue: Single identifier rule conflates two different spec constraints

The proposed grammar uses a single identifier = 1*( ALPHA / DIGIT / "." / "-" / "_" ) rule for everything — step IDs, workflow IDs, source description names, component keys, input/output names, and field names. However, the spec defines two different patterns:

  • stepId, workflowId, sourceDescriptionName: SHOULD conform to [A-Za-z0-9_\-]+ (no dot)
  • Components keys: MUST match ^[a-zA-Z0-9\.\-_]+$ (with dot)

Using a single shared rule allows dots in step/workflow IDs where the spec says they shouldn't be. In my implementation, I split this into two rules:

identifier        = 1*(ALPHA / DIGIT / "." / "-" / "_")   ; for field names, component keys
identifier-strict = 1*(ALPHA / DIGIT / "_" / "-")          ; for step/workflow/source-description IDs

Issue: source-descriptions-reference (reference-id) is too restrictive

The proposed grammar constrains reference-id to identifier, but this value can be an operationId from an OpenAPI description. OpenAPI does not constrain operationId to any specific character set — it's just a string. OperationIds like get/pets, get pets, or create-user@v2 are technically valid in OpenAPI but would be rejected by the identifier rule.

In my implementation, I use 1*CHAR (any character except { and }) for this rule.

Issue: Simplified CHAR rule diverges from OpenAPI

The PR replaces the JSON string-based CHAR definition (from RFC 7159, with escape sequences) with a simpler character range: CHAR = %x00-7A / %x7C / %x7E-10FFFF. This changes the semantics — a bare \ becomes a valid character, and JSON escape sequences like \n, \uXXXX are no longer recognized.

OpenAPI's runtime expression ABNF uses the RFC 7159-based CHAR definition. Since Arazzo builds on top of OpenAPI and shares the runtime expression concept, simplifying CHAR introduces a subtle divergence. An expression valid in one spec could behave differently in the other. I'd recommend keeping the RFC 7159-based definition for interoperability.
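The semantic difference between the two CHAR definitions can be sketched in Python. The regular expressions below are hand-written approximations of the two ABNF rules (the simplified range from the PR, and the RFC 7159-based rule with { and } excluded); the names SIMPLE_CHAR, JSON_CHAR, and is_name are hypothetical.

```python
import re

# Simplified CHAR from the PR: %x00-7A / %x7C / %x7E-10FFFF (anything but "{" and "}")
SIMPLE_CHAR = r"[^{}]"

# RFC 7159-style CHAR with "{" and "}" excluded:
# either an unescaped character or a backslash escape sequence.
UNESCAPE = r"[\x20-\x21\x23-\x5B\x5D-\x7A\x7C\x7E-\U0010FFFF]"
ESCAPE = r'\\(?:["\\/bfnrt]|u[0-9A-Fa-f]{4})'
JSON_CHAR = rf"(?:{UNESCAPE}|{ESCAPE})"


def is_name(char_rule, value):
    """Check value against name = *( CHAR ) for the given CHAR rule."""
    return re.fullmatch(rf"(?:{char_rule})*", value) is not None


samples = ["petId", "a\\nb", "a\\qb"]  # the last two contain a literal backslash
for sample in samples:
    print(repr(sample), is_name(SIMPLE_CHAR, sample), is_name(JSON_CHAR, sample))
```

In this sketch, a backslash followed by q is accepted by the simplified rule but rejected by the RFC 7159-based rule, because the backslash is only valid there as the start of a recognized escape sequence.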

Suggestion: name rule is not "legacy"

The PR labels the name rule as ; Legacy 'name' rule (retained for query/path references). This rule isn't legacy — it's the correct rule for query and path parameter names, which are user-defined and can contain any valid character. The comment could be misleading and suggest future removal. A more accurate comment would be something like ; Unconstrained name rule for query/path references.

Note: Example file version mismatch

The example fixes in examples/1.0.0/bnpl-arazzo.yaml (changing $inputs.customer.firstName to $inputs.customer#/firstName) apply 1.1.0 grammar semantics to a 1.0.0 example file. This could cause confusion about backward compatibility. Consider applying these fixes only to a 1.1.0 example, or noting that the 1.0.0 example has been updated to reflect the corrected grammar.

Note: Missing comma in example payload

In bnpl-arazzo.yaml, there's a pre-existing missing comma after the postalCode line in the JSON payload template, making it invalid JSON:

"postalCode": "{$inputs.customer#/postalCode}"
  "termsAndConditionsAccepted": true
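A quick way to confirm this is to run the two lines through a JSON parser. The sketch below wraps them in an object, which is an assumption since the full payload is not shown here; the names broken, fixed, and is_valid_json are hypothetical.

```python
import json

# The two lines as they appear in the example, wrapped in an object for parsing
broken = """{
  "postalCode": "{$inputs.customer#/postalCode}"
  "termsAndConditionsAccepted": true
}"""

# Same fragment with the comma added after the postalCode line
fixed = """{
  "postalCode": "{$inputs.customer#/postalCode}",
  "termsAndConditionsAccepted": true
}"""


def is_valid_json(text):
    """Return True when text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(broken))  # False
print(is_valid_json(fixed))   # True
```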

Our ABNF grammar for reference

For reference, here is the complete ABNF grammar from my implementation that addresses the issues above:

; Arazzo runtime expression ABNF syntax
expression = (
    "$url" /
    "$method" /
    "$statusCode" /
    "$request." source /
    "$response." source /
    "$inputs." inputs-reference /
    "$outputs." outputs-reference /
    "$steps." steps-reference /
    "$workflows." workflows-reference /
    "$sourceDescriptions." source-reference /
    "$components." components-reference /
    "$self"
  )
; Request/Response sources
source                  = ( header-reference / query-reference / path-reference / body-reference )
header-reference        = "header." token
query-reference         = "query." name
path-reference          = "path." name
body-reference          = "body" ["#" json-pointer ]

; Input/Output references
inputs-reference        = inputs-name ["#" json-pointer]
inputs-name             = identifier
outputs-reference       = outputs-name ["#" json-pointer]
outputs-name            = identifier

; Steps expressions
steps-reference         = steps-id ".outputs." outputs-name ["#" json-pointer]
steps-id                = identifier-strict

; Workflows expressions
workflows-reference     = workflows-id "." workflows-field "." workflows-field-name ["#" json-pointer]
workflows-id            = identifier-strict
workflows-field         = "inputs" / "outputs"
workflows-field-name    = identifier

; Source descriptions expressions
source-reference                = source-descriptions-name "." source-descriptions-reference
source-descriptions-name        = identifier-strict
source-descriptions-reference   = 1*CHAR

; Components expressions
components-reference    = components-type "." components-name
components-type         = "parameters" / "successActions" / "failureActions"
components-name         = identifier

; Unconstrained name rule for query/path references
name                    = *( CHAR )

; Grammar for parsing template strings with embedded expressions
expression-string    = *( literal-char / embedded-expression )
embedded-expression  = "{" expression "}"
literal-char         = %x00-7A / %x7C / %x7E-10FFFF  ; anything except { (%x7B) and } (%x7D)

; JSON Pointer (RFC 6901, adapted)
; { (%x7B) and } (%x7D) are excluded from 'unescaped' for unambiguous embedded expression parsing
json-pointer     = *( "/" reference-token )
reference-token  = *( unescaped / escaped )
unescaped        = %x00-2E / %x30-7A / %x7C / %x7F-10FFFF
                 ; %x2F ('/'), %x7E ('~'), %x7B ('{'), %x7D ('}') are excluded
escaped          = "~" ( "0" / "1" )
                 ; representing '~' and '/', respectively

; https://datatracker.ietf.org/doc/html/rfc7230#section-3.2.6
token          = 1*tchar
tchar          = "!" / "#" / "$" / "%" / "&" / "'" / "*"
               / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
               / DIGIT / ALPHA
               ; any VCHAR, except delimiters

; https://www.rfc-editor.org/rfc/rfc7159#section-7
CHAR = unescape /
    escape (
        %x22 /          ; "    quotation mark  U+0022
        %x5C /          ; \    reverse solidus U+005C
        %x2F /          ; /    solidus         U+002F
        %x62 /          ; b    backspace       U+0008
        %x66 /          ; f    form feed       U+000C
        %x6E /          ; n    line feed       U+000A
        %x72 /          ; r    carriage return U+000D
        %x74 /          ; t    tab             U+0009
        %x75 4HEXDIG )  ; uXXXX                U+XXXX
escape         = %x5C   ; \
unescape       = %x20-21 / %x23-5B / %x5D-7A / %x7C / %x7E-10FFFF
               ; %x7B ('{') and %x7D ('}') are excluded from 'unescape'

; Identifier rules
identifier        = 1*(ALPHA / DIGIT / "." / "-" / "_")
                  ; Alphanumeric with dots, hyphens, underscores
identifier-strict = 1*(ALPHA / DIGIT / "_" / "-")
                  ; Alphanumeric with hyphens, underscores (no dots)

; https://datatracker.ietf.org/doc/html/rfc5234#appendix-B.1
HEXDIG         =  DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
DIGIT          =  %x30-39   ; 0-9
ALPHA          =  %x41-5A / %x61-7A   ; A-Z / a-z
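To make the inputs-reference rule concrete, here is a minimal Python sketch that splits an $inputs expression into its name and optional JSON Pointer and then resolves the pointer. It approximates the identifier and adapted json-pointer rules with regular expressions, which is an assumption rather than how an ABNF parser works; the names parse_inputs and resolve_pointer and the sample data are hypothetical.

```python
import re

IDENTIFIER = r"[A-Za-z0-9._-]+"
# json-pointer with "/" and "~" excluded from tokens, plus "{" and "}" (adapted rule)
ADAPTED_POINTER = r"(?:/(?:[^/~{}]|~[01])*)*"
INPUTS_EXPRESSION = re.compile(rf"\$inputs\.({IDENTIFIER})(?:#({ADAPTED_POINTER}))?")


def parse_inputs(expression):
    """Return (inputs-name, json-pointer or None), or None if there is no match."""
    match = INPUTS_EXPRESSION.fullmatch(expression)
    return (match.group(1), match.group(2)) if match else None


def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON Pointer against nested dicts and lists."""
    current = document
    for token in pointer.split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")
        current = current[int(token)] if isinstance(current, list) else current[token]
    return current


inputs = {"customer": {"firstName": "Ada", "postalCode": "D02"}}

name, pointer = parse_inputs("$inputs.customer#/firstName")
print(name, pointer, resolve_pointer(inputs[name], pointer))  # customer /firstName Ada

# Because identifier allows dots, the dotted form parses as a single input name.
print(parse_inputs("$inputs.customer.firstName"))  # ('customer.firstName', None)
```

Under this reading of the grammar, $inputs.customer.firstName is still syntactically valid but refers to an input literally named customer.firstName, which is why the example file uses the JSON Pointer form instead.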

3 participants