feat: improve ABNF grammar clarity and inline all secondary grammars#116
feat: improve ABNF grammar clarity and inline all secondary grammars#116
Conversation
Restructure the ABNF grammar to use explicit, typed reference rules in the primary grammar instead of relying on secondary grammars with two-pass parsing. This improves grammar clarity and aligns with the proposed spec changes in OAI/Arazzo-Specification#454. Key changes: - Add $self expression support - Add $inputs/$outputs JSON Pointer support (e.g., $inputs.customer#/firstName) - Inline all secondary grammars into the primary grammar - Extract shared identifier and identifier-strict rules - Adapt json-pointer to exclude { and } from unescaped for unambiguous embedded expression parsing, fixing the body expression extract limitation - Require explicit component types (parameters/successActions/failureActions) - Update README with current grammar and examples Resolves: OAI/Arazzo-Specification#424, OAI/Arazzo-Specification#425, OAI/Arazzo-Specification#426, OAI/Arazzo-Specification#428 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR updates the Arazzo runtime expression grammar and translators to support newer spec features and simplify parsing by inlining formerly secondary grammars.
Changes:
- Add
$selfsupport and JSON Pointer support for$inputs/$outputs. - Inline
$steps/$workflows/$sourceDescriptions/$componentsparsing into the primary grammar (eliminating two-pass parsing) and introduce sharedidentifierrules. - Adjust JSON Pointer
unescapedhandling to address embedded-expression extraction ambiguity; update tests, fixtures, and README accordingly.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| test/test.js | Updates validation expectations for {/} in JSON Pointer paths. |
| test/parse/translators/snapshots/CSTTranslator.js.snap | Updates CST snapshots to reflect new reference node structure. |
| test/parse/translators/snapshots/ASTTranslator.js.snap | Updates AST snapshots for revised node shapes (e.g., steps/workflows/components). |
| test/parse/snapshots/cst-corpus.js.snap | Refreshes CST corpus snapshots for new grammar structure and new expressions. |
| test/parse/snapshots/ast-corpus.js.snap | Refreshes AST corpus snapshots (adds $self, pointers on inputs/outputs, renamed fields). |
| test/fixtures/expressions-valid.js | Adds new valid expression fixtures ($self, pointers, dotted/hyphenated identifiers). |
| test/extract.js | Updates extraction tests to confirm embedded body expressions with JSON pointers now extract correctly. |
| src/parse/translators/CSTTranslator.js | Aligns CST callbacks with the new inlined grammar rules. |
| src/parse/translators/ASTTranslator/transformers.js | Removes secondary parsing and maps new CST node types directly to updated AST shapes; adds $self. |
| src/grammar.js | Regenerated grammar output reflecting inlined rules and updated JSON Pointer ranges. |
| src/grammar.bnf | Updates the source ABNF (primary change set) including inlined references and identifier rules. |
| README.md | Updates documented ABNF and examples; removes previous JSON-pointer extraction limitation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| reference-token = *( unescaped / escaped ) | ||
| unescaped = %x00-2E / %x30-7D / %x7F-10FFFF | ||
| ; %x2F ('/') and %x7E ('~') are excluded from 'unescaped' | ||
| unescaped = %x00-2E / %x30-7A / %x7C / %x7E-10FFFF |
There was a problem hiding this comment.
unescaped currently includes %x7E (~) via the %x7E-10FFFF range, but the comment says ~ is excluded and RFC 6901 requires ~ to only appear as part of an escape sequence (~0 / ~1). This makes invalid JSON Pointers (e.g. /a~b) parse as valid and can also prevent ~0/~1 from being treated as an escape sequence. Adjust the ranges so %x7E is excluded (e.g. use %x7F-10FFFF for the final range) and regenerate src/grammar.js.
| unescaped = %x00-2E / %x30-7A / %x7C / %x7E-10FFFF | |
| unescaped = %x00-2E / %x30-7A / %x7C / %x7F-10FFFF |
| reference-token = *( unescaped / escaped ) | ||
| unescaped = %x00-2E / %x30-7D / %x7F-10FFFF | ||
| ; %x2F ('/') and %x7E ('~') are excluded from 'unescaped' | ||
| unescaped = %x00-2E / %x30-7A / %x7C / %x7E-10FFFF |
There was a problem hiding this comment.
The README ABNF shows unescaped = %x00-2E / %x30-7A / %x7C / %x7E-10FFFF, which actually includes %x7E (~) even though the following comment says ~ is excluded. This is inconsistent with RFC 6901 and with the intended validation behavior. Update the ABNF snippet to exclude %x7E (e.g. %x7F-10FFFF for the final range) to match the corrected grammar.
| unescaped = %x00-2E / %x30-7A / %x7C / %x7E-10FFFF | |
| unescaped = %x00-2E / %x30-7A / %x7C / %x7F-10FFFF |
| str += "reference-token = *( unescaped / escaped )\n"; | ||
| str += "unescaped = %x00-2E / %x30-7D / %x7F-10FFFF\n"; | ||
| str += " ; %x2F ('/') and %x7E ('~') are excluded from 'unescaped'\n"; | ||
| str += "unescaped = %x00-2E / %x30-7A / %x7C / %x7E-10FFFF\n"; |
There was a problem hiding this comment.
toString() prints unescaped = ... %x7E-10FFFF, which includes %x7E (~) despite the comment claiming it is excluded. Once the source grammar is corrected to exclude ~, recompile/regenerate this generated file so the opcodes and toString() output reflect the intended JSON Pointer validation.
| str += "unescaped = %x00-2E / %x30-7A / %x7C / %x7E-10FFFF\n"; | |
| str += "unescaped = %x00-2E / %x30-7A / %x7C / %x7F-10FFFF\n"; |
| // Literal expressions | ||
| if (text === '$url') return { type: 'UrlExpression' }; | ||
| if (text === '$method') return { type: 'MethodExpression' }; | ||
| if (text === '$statusCode') return { type: 'StatusCodeExpression' }; | ||
| if (text === '$self') return { type: 'SelfExpression' }; | ||
|
|
There was a problem hiding this comment.
$self is now translated to { type: 'SelfExpression' }, but the published TypeScript definitions (types/index.d.ts) don’t define SelfExpression or include it in the ASTNode union. Please update the typings (and any related documentation/examples) so TS consumers can use $self without type errors.
| ['inputs-reference'](node) { | ||
| const inputNameNode = node.children.find((c) => c.type === 'inputs-name'); | ||
| const jsonPointerNode = node.children.find((c) => c.type === 'json-pointer'); | ||
|
|
||
| const result = { | ||
| type: 'InputsExpression', | ||
| name: inputNameNode.text, | ||
| }; | ||
|
|
||
| if (jsonPointerNode) { | ||
| result.jsonPointer = transformCSTtoAST(jsonPointerNode, transformers); | ||
| } | ||
|
|
||
| return result; |
There was a problem hiding this comment.
InputsExpression can now include a jsonPointer (see the optional assignment when jsonPointerNode is present). types/index.d.ts currently defines InputsExpression with only { name: string }, so TypeScript consumers won’t be able to access jsonPointer without casting. Update the typings to include the optional jsonPointer field.
| ['outputs-reference'](node) { | ||
| const outputNameNode = node.children.find((c) => c.type === 'outputs-name'); | ||
| const jsonPointerNode = node.children.find((c) => c.type === 'json-pointer'); | ||
|
|
||
| const result = { | ||
| type: 'OutputsExpression', | ||
| name: outputNameNode.text, | ||
| }; | ||
|
|
||
| if (jsonPointerNode) { | ||
| result.jsonPointer = transformCSTtoAST(jsonPointerNode, transformers); | ||
| } | ||
|
|
||
| return result; | ||
| }, |
There was a problem hiding this comment.
OutputsExpression now optionally includes jsonPointer, but the current TypeScript definitions only include { name: string }. Please update types/index.d.ts so OutputsExpression reflects the new AST shape.
| ['steps-reference'](node) { | ||
| const stepIdNode = node.children.find((c) => c.type === 'steps-id'); | ||
| const fieldNode = node.children.find((c) => c.type === 'steps-field'); | ||
| const subFieldNode = node.children.find((c) => c.type === 'steps-sub-field'); | ||
| const outputNameNode = node.children.find((c) => c.type === 'outputs-name'); | ||
| const jsonPointerNode = node.children.find((c) => c.type === 'json-pointer'); | ||
|
|
||
| const result = { | ||
| type: 'StepsExpression', | ||
| stepId: stepIdNode.text, | ||
| field: fieldNode.text, | ||
| outputName: subFieldNode.text, | ||
| outputName: outputNameNode.text, | ||
| }; |
There was a problem hiding this comment.
StepsExpression no longer includes a field: 'outputs' property (it only returns stepId and outputName now). types/index.d.ts still requires field, so typings will be incorrect. Align the StepsExpression type definition with the new AST output.
| ['workflows-reference'](node) { | ||
| const workflowIdNode = node.children.find((c) => c.type === 'workflows-id'); | ||
| const fieldNode = node.children.find((c) => c.type === 'workflows-field'); | ||
| const subFieldNode = node.children.find((c) => c.type === 'workflows-sub-field'); | ||
| const fieldNameNode = node.children.find((c) => c.type === 'workflows-field-name'); | ||
| const jsonPointerNode = node.children.find((c) => c.type === 'json-pointer'); | ||
|
|
||
| const result = { | ||
| type: 'WorkflowsExpression', | ||
| workflowId: workflowIdNode.text, | ||
| field: fieldNode.text, | ||
| subField: subFieldNode.text, | ||
| fieldName: fieldNameNode.text, | ||
| }; |
There was a problem hiding this comment.
WorkflowsExpression now uses fieldName instead of subField. types/index.d.ts still exposes subField, so TypeScript users will see a mismatch. Update the typings (and any related docs) to use fieldName.
| field: fieldNode.text, | ||
| subField: subFieldNode.text, | ||
| componentType: typeNode.text, | ||
| componentName: nameNode.text, |
There was a problem hiding this comment.
ComponentsExpression now returns { componentType, componentName }, but types/index.d.ts still defines { field, subField }. Please update the exported typings to match the new property names so consumers don’t break at compile time.
| componentName: nameNode.text, | |
| componentName: nameNode.text, | |
| // Backwards-compatible aliases for older typings expecting { field, subField } | |
| field: typeNode.text, | |
| subField: nameNode.text, |
| assert.isTrue(test('$request.body#/foo{')); | ||
| assert.isTrue(test('$request.body#/{foo}')); | ||
| it('should reject { and } in JSON pointer paths', function () { | ||
| // { and } are excluded from json-pointer-safe to allow unambiguous |
There was a problem hiding this comment.
The comment refers to a json-pointer-safe rule, but that rule doesn’t exist in the grammar (src/grammar.bnf / src/grammar.js). To avoid confusion for future maintainers, rename this to the actual rule being constrained (e.g., unescaped within json-pointer, or just “the json-pointer grammar”).
| // { and } are excluded from json-pointer-safe to allow unambiguous | |
| // { and } are excluded from the json-pointer grammar to allow unambiguous |
Summary
$selfexpression support (new in Arazzo 1.1.0)$inputs/$outputsJSON Pointer support (e.g.,$inputs.customer#/firstName)$steps,$workflows,$sourceDescriptions,$components) into the primary grammar, eliminating two-pass parsingidentifierandidentifier-strictrules to reduce duplicationjson-pointerto exclude{and}fromunescaped, fixing the embedded expression extraction limitation for body expressions and all other expressions with JSON pointersparameters/successActions/failureActions)This PR was created as a verification of the proposed ABNF grammar changes in OAI/Arazzo-Specification#454. During implementation, several issues with the spec PR were identified:
unescapedin json-pointer still includes{and}— breaks embedded expression parsing; we fix this by excluding themidentifierrule is too broad — the spec PR uses one rule for both IDs (no dots) and field names (with dots); we useidentifierandidentifier-strictto respect the spec's different constraintssource-descriptions-referenceis too restrictive — the spec PR constrains it toidentifier, but operationIds in OpenAPI are unconstrained strings; we keep1*CHARResolves: OAI/Arazzo-Specification#424, OAI/Arazzo-Specification#425, OAI/Arazzo-Specification#426, OAI/Arazzo-Specification#428
Test plan
$self,$inputs/$outputswith JSON Pointer, dotted/hyphenated/underscored names{/}in JSON pointer paths now correctly rejected🤖 Generated with Claude Code