feat: improve ABNF grammar clarity and inline all secondary grammars#116

Open
char0n wants to merge 2 commits into main from feat/improve-abnf-grammar

char0n (Member) commented Apr 2, 2026

Summary

  • Add $self expression support (new in Arazzo 1.1.0)
  • Add $inputs/$outputs JSON Pointer support (e.g., $inputs.customer#/firstName)
  • Inline all secondary grammars ($steps, $workflows, $sourceDescriptions, $components) into the primary grammar, eliminating two-pass parsing
  • Extract shared identifier and identifier-strict rules to reduce duplication
  • Adapt json-pointer to exclude { and } from unescaped, fixing the embedded expression extraction limitation for body expressions and all other expressions with JSON pointers
  • Require explicit component types (parameters/successActions/failureActions)
  • Update README with current grammar, examples, and removal of the "known limitation" note

This PR was created as a verification of the proposed ABNF grammar changes in OAI/Arazzo-Specification#454. During implementation, several issues with the spec PR were identified:

  1. unescaped in json-pointer still includes { and } — breaks embedded expression parsing; we fix this by excluding them
  2. Single identifier rule is too broad — the spec PR uses one rule for both IDs (no dots) and field names (with dots); we use identifier and identifier-strict to respect the spec's different constraints
  3. source-descriptions-reference is too restrictive — the spec PR constrains it to identifier, but operationIds in OpenAPI are unconstrained strings; we keep 1*CHAR
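
To illustrate issue 1, here is a minimal sketch of why allowing { and } inside a JSON Pointer makes embedded expressions ambiguous. The two regular expressions are simplified stand-ins for the pointer character class, not the grammar or the parser from this PR, and the names `permissive`, `strict`, and `extractWith` are hypothetical.

```javascript
// Simplified stand-ins for the json-pointer character class, NOT the real grammar.
// permissive: pointer characters may include '{' and '}' (as plain RFC 6901 allows).
// strict: '{' and '}' are excluded, so '}' always closes the embedded expression.
const permissive = /\{(\$request\.body#[^\s]*)\}/g;
const strict = /\{(\$request\.body#[^\s{}]*)\}/g;

// Collects the captured expression text of every match in the input.
function extractWith(pattern, input) {
  return Array.from(input.matchAll(pattern), (m) => m[1]);
}

const input = 'id={$request.body#/id}&name={$request.body#/name}';

// One over-long match: the pointer swallows "}&name={...".
console.log(extractWith(permissive, input));
// Two separate expressions, as intended.
console.log(extractWith(strict, input));
```

With the permissive class the first pointer runs to the last closing brace in the string, which is the extraction limitation the PR removes by excluding both braces from unescaped.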

Resolves: OAI/Arazzo-Specification#424, OAI/Arazzo-Specification#425, OAI/Arazzo-Specification#426, OAI/Arazzo-Specification#428

Test plan

  • All 152 tests pass
  • New test fixtures for $self, $inputs/$outputs with JSON Pointer, dotted/hyphenated/underscored names
  • Extract tests updated: body expressions with JSON pointers now work in embedded context
  • Validation tests updated: {/} in JSON pointer paths now correctly rejected
  • Snapshot tests updated for new CST/AST node structure

🤖 Generated with Claude Code

Restructure the ABNF grammar to use explicit, typed reference rules in the
primary grammar instead of relying on secondary grammars with two-pass parsing.
This improves grammar clarity and aligns with the proposed spec changes in
OAI/Arazzo-Specification#454.

Key changes:
- Add $self expression support
- Add $inputs/$outputs JSON Pointer support (e.g., $inputs.customer#/firstName)
- Inline all secondary grammars into the primary grammar
- Extract shared identifier and identifier-strict rules
- Adapt json-pointer to exclude { and } from unescaped for unambiguous
  embedded expression parsing, fixing the body expression extract limitation
- Require explicit component types (parameters/successActions/failureActions)
- Update README with current grammar and examples

Resolves: OAI/Arazzo-Specification#424, OAI/Arazzo-Specification#425,
OAI/Arazzo-Specification#426, OAI/Arazzo-Specification#428

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@char0n char0n requested a review from Copilot April 2, 2026 13:25
@char0n char0n self-assigned this Apr 2, 2026
@char0n char0n added the enhancement New feature or request label Apr 2, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

This PR updates the Arazzo runtime expression grammar and translators to support newer spec features and simplify parsing by inlining formerly secondary grammars.

Changes:

  • Add $self support and JSON Pointer support for $inputs/$outputs.
  • Inline $steps/$workflows/$sourceDescriptions/$components parsing into the primary grammar (eliminating two-pass parsing) and introduce shared identifier rules.
  • Adjust JSON Pointer unescaped handling to address embedded-expression extraction ambiguity; update tests, fixtures, and README accordingly.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| test/test.js | Updates validation expectations for {/} in JSON Pointer paths. |
| test/parse/translators/snapshots/CSTTranslator.js.snap | Updates CST snapshots to reflect new reference node structure. |
| test/parse/translators/snapshots/ASTTranslator.js.snap | Updates AST snapshots for revised node shapes (e.g., steps/workflows/components). |
| test/parse/snapshots/cst-corpus.js.snap | Refreshes CST corpus snapshots for new grammar structure and new expressions. |
| test/parse/snapshots/ast-corpus.js.snap | Refreshes AST corpus snapshots (adds $self, pointers on inputs/outputs, renamed fields). |
| test/fixtures/expressions-valid.js | Adds new valid expression fixtures ($self, pointers, dotted/hyphenated identifiers). |
| test/extract.js | Updates extraction tests to confirm embedded body expressions with JSON pointers now extract correctly. |
| src/parse/translators/CSTTranslator.js | Aligns CST callbacks with the new inlined grammar rules. |
| src/parse/translators/ASTTranslator/transformers.js | Removes secondary parsing and maps new CST node types directly to updated AST shapes; adds $self. |
| src/grammar.js | Regenerated grammar output reflecting inlined rules and updated JSON Pointer ranges. |
| src/grammar.bnf | Updates the source ABNF (primary change set) including inlined references and identifier rules. |
| README.md | Updates documented ABNF and examples; removes previous JSON-pointer extraction limitation. |

reference-token = *( unescaped / escaped )
unescaped = %x00-2E / %x30-7D / %x7F-10FFFF
; %x2F ('/') and %x7E ('~') are excluded from 'unescaped'
unescaped = %x00-2E / %x30-7A / %x7C / %x7E-10FFFF

Copilot AI Apr 2, 2026

unescaped currently includes %x7E (~) via the %x7E-10FFFF range, but the comment says ~ is excluded and RFC 6901 requires ~ to only appear as part of an escape sequence (~0 / ~1). This makes invalid JSON Pointers (e.g. /a~b) parse as valid and can also prevent ~0/~1 from being treated as an escape sequence. Adjust the ranges so %x7E is excluded (e.g. use %x7F-10FFFF for the final range) and regenerate src/grammar.js.

Suggested change
- unescaped = %x00-2E / %x30-7A / %x7C / %x7E-10FFFF
+ unescaped = %x00-2E / %x30-7A / %x7C / %x7F-10FFFF
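
As a quick check of the ranges in this suggestion, the following sketch tests individual characters against both versions of the rule. It only models the four code point ranges copied from the ABNF lines in this comment; `asWritten`, `suggested`, and `isUnescaped` are hypothetical names and not part of the library.

```javascript
// Code point ranges copied from the two ABNF alternatives in this comment.
const asWritten = [[0x00, 0x2e], [0x30, 0x7a], [0x7c, 0x7c], [0x7e, 0x10ffff]];
const suggested = [[0x00, 0x2e], [0x30, 0x7a], [0x7c, 0x7c], [0x7f, 0x10ffff]];

// Returns true when the first code point of `char` falls inside any range.
function isUnescaped(ranges, char) {
  const cp = char.codePointAt(0);
  return ranges.some(([lo, hi]) => cp >= lo && cp <= hi);
}

// Prints: character, accepted by the rule as written, accepted by the suggestion.
for (const char of ['/', '{', '|', '}', '~']) {
  console.log(JSON.stringify(char), isUnescaped(asWritten, char), isUnescaped(suggested, char));
}
```

Only the row for ~ (%x7E) differs between the two versions: the rule as written accepts it, the suggested rule does not. Both versions reject /, { and } and accept |.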

reference-token = *( unescaped / escaped )
unescaped = %x00-2E / %x30-7D / %x7F-10FFFF
; %x2F ('/') and %x7E ('~') are excluded from 'unescaped'
unescaped = %x00-2E / %x30-7A / %x7C / %x7E-10FFFF

Copilot AI Apr 2, 2026

The README ABNF shows unescaped = %x00-2E / %x30-7A / %x7C / %x7E-10FFFF, which actually includes %x7E (~) even though the following comment says ~ is excluded. This is inconsistent with RFC 6901 and with the intended validation behavior. Update the ABNF snippet to exclude %x7E (e.g. %x7F-10FFFF for the final range) to match the corrected grammar.

Suggested change
- unescaped = %x00-2E / %x30-7A / %x7C / %x7E-10FFFF
+ unescaped = %x00-2E / %x30-7A / %x7C / %x7F-10FFFF

str += "reference-token = *( unescaped / escaped )\n";
str += "unescaped = %x00-2E / %x30-7D / %x7F-10FFFF\n";
str += " ; %x2F ('/') and %x7E ('~') are excluded from 'unescaped'\n";
str += "unescaped = %x00-2E / %x30-7A / %x7C / %x7E-10FFFF\n";

Copilot AI Apr 2, 2026

toString() prints unescaped = ... %x7E-10FFFF, which includes %x7E (~) despite the comment claiming it is excluded. Once the source grammar is corrected to exclude ~, recompile/regenerate this generated file so the opcodes and toString() output reflect the intended JSON Pointer validation.

Suggested change
- str += "unescaped = %x00-2E / %x30-7A / %x7C / %x7E-10FFFF\n";
+ str += "unescaped = %x00-2E / %x30-7A / %x7C / %x7F-10FFFF\n";

Comment on lines 13 to 18
// Literal expressions
if (text === '$url') return { type: 'UrlExpression' };
if (text === '$method') return { type: 'MethodExpression' };
if (text === '$statusCode') return { type: 'StatusCodeExpression' };
if (text === '$self') return { type: 'SelfExpression' };

Copilot AI Apr 2, 2026

$self is now translated to { type: 'SelfExpression' }, but the published TypeScript definitions (types/index.d.ts) don’t define SelfExpression or include it in the ASTNode union. Please update the typings (and any related documentation/examples) so TS consumers can use $self without type errors.

Comment on lines +111 to +124
['inputs-reference'](node) {
  const inputNameNode = node.children.find((c) => c.type === 'inputs-name');
  const jsonPointerNode = node.children.find((c) => c.type === 'json-pointer');

  const result = {
    type: 'InputsExpression',
    name: inputNameNode.text,
  };

  if (jsonPointerNode) {
    result.jsonPointer = transformCSTtoAST(jsonPointerNode, transformers);
  }

  return result;

Copilot AI Apr 2, 2026

InputsExpression can now include a jsonPointer (see the optional assignment when jsonPointerNode is present). types/index.d.ts currently defines InputsExpression with only { name: string }, so TypeScript consumers won’t be able to access jsonPointer without casting. Update the typings to include the optional jsonPointer field.

Comment on lines +127 to +141
['outputs-reference'](node) {
  const outputNameNode = node.children.find((c) => c.type === 'outputs-name');
  const jsonPointerNode = node.children.find((c) => c.type === 'json-pointer');

  const result = {
    type: 'OutputsExpression',
    name: outputNameNode.text,
  };

  if (jsonPointerNode) {
    result.jsonPointer = transformCSTtoAST(jsonPointerNode, transformers);
  }

  return result;
},

Copilot AI Apr 2, 2026

OutputsExpression now optionally includes jsonPointer, but the current TypeScript definitions only include { name: string }. Please update types/index.d.ts so OutputsExpression reflects the new AST shape.
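
For consumers, the practical effect of the optional field on InputsExpression and OutputsExpression could look like the sketch below. It is a JSDoc mirror of the shape described in these comments, not the contents of types/index.d.ts. The helper `describeReference` and the sample nodes are hypothetical, and the jsonPointer payload is treated as opaque because its exact shape is not shown here.

```javascript
/**
 * JSDoc mirror of the node shape described in the review comments.
 * The jsonPointer payload is treated as opaque (its shape is an assumption).
 * @typedef {object} InputsOrOutputsExpression
 * @property {'InputsExpression' | 'OutputsExpression'} type
 * @property {string} name
 * @property {object} [jsonPointer] Present only when the expression carries a pointer.
 */

/**
 * Hypothetical consumer helper that tolerates a missing jsonPointer.
 * @param {InputsOrOutputsExpression} node
 * @returns {string}
 */
function describeReference(node) {
  const kind = node.type === 'InputsExpression' ? '$inputs' : '$outputs';
  const suffix = node.jsonPointer === undefined ? '' : ' (with JSON Pointer)';
  return `${kind}.${node.name}${suffix}`;
}

// Hand-written nodes for illustration; not produced by the real parser.
const plain = { type: 'InputsExpression', name: 'customer' };
const withPointer = {
  type: 'OutputsExpression',
  name: 'order',
  jsonPointer: { placeholder: true }, // opaque stand-in for the pointer node
};

console.log(describeReference(plain)); // $inputs.customer
console.log(describeReference(withPointer)); // $outputs.order (with JSON Pointer)
```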

Comment on lines +143 to 152
['steps-reference'](node) {
  const stepIdNode = node.children.find((c) => c.type === 'steps-id');
  const fieldNode = node.children.find((c) => c.type === 'steps-field');
  const subFieldNode = node.children.find((c) => c.type === 'steps-sub-field');
  const outputNameNode = node.children.find((c) => c.type === 'outputs-name');
  const jsonPointerNode = node.children.find((c) => c.type === 'json-pointer');

  const result = {
    type: 'StepsExpression',
    stepId: stepIdNode.text,
    field: fieldNode.text,
    outputName: subFieldNode.text,
    outputName: outputNameNode.text,
  };

Copilot AI Apr 2, 2026

StepsExpression no longer includes a field: 'outputs' property (it only returns stepId and outputName now). types/index.d.ts still requires field, so typings will be incorrect. Align the StepsExpression type definition with the new AST output.

Comment on lines +161 to 172
['workflows-reference'](node) {
  const workflowIdNode = node.children.find((c) => c.type === 'workflows-id');
  const fieldNode = node.children.find((c) => c.type === 'workflows-field');
  const subFieldNode = node.children.find((c) => c.type === 'workflows-sub-field');
  const fieldNameNode = node.children.find((c) => c.type === 'workflows-field-name');
  const jsonPointerNode = node.children.find((c) => c.type === 'json-pointer');

  const result = {
    type: 'WorkflowsExpression',
    workflowId: workflowIdNode.text,
    field: fieldNode.text,
    subField: subFieldNode.text,
    fieldName: fieldNameNode.text,
  };

Copilot AI Apr 2, 2026

WorkflowsExpression now uses fieldName instead of subField. types/index.d.ts still exposes subField, so TypeScript users will see a mismatch. Update the typings (and any related docs) to use fieldName.

- field: fieldNode.text,
- subField: subFieldNode.text,
+ componentType: typeNode.text,
+ componentName: nameNode.text,

Copilot AI Apr 2, 2026

ComponentsExpression now returns { componentType, componentName }, but types/index.d.ts still defines { field, subField }. Please update the exported typings to match the new property names so consumers don’t break at compile time.

Suggested change
- componentName: nameNode.text,
+ componentName: nameNode.text,
+ // Backwards-compatible aliases for older typings expecting { field, subField }
+ field: typeNode.text,
+ subField: nameNode.text,
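
An alternative to emitting aliases from the transformer would be to normalize on the consumer side. The sketch below is one possible shape for that, assuming only the old and new property names mentioned in this comment; `normalizeComponents` is a hypothetical helper, not something this PR or the library provides.

```javascript
// Hypothetical consumer-side helper: accepts either the new
// { componentType, componentName } shape or the old { field, subField }
// shape and always returns the new property names.
function normalizeComponents(node) {
  if (node.type !== 'ComponentsExpression') {
    throw new TypeError(`Expected ComponentsExpression, got ${node.type}`);
  }
  return {
    type: node.type,
    componentType: node.componentType ?? node.field,
    componentName: node.componentName ?? node.subField,
  };
}

// Hand-written nodes for illustration; the values are made up.
const oldShape = { type: 'ComponentsExpression', field: 'parameters', subField: 'page' };
const newShape = { type: 'ComponentsExpression', componentType: 'parameters', componentName: 'page' };

console.log(normalizeComponents(oldShape));
console.log(normalizeComponents(newShape));
```

Both calls yield the same object, which keeps the AST itself free of duplicate fields at the cost of one extra step for callers that still handle the old shape.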

- assert.isTrue(test('$request.body#/foo{'));
- assert.isTrue(test('$request.body#/{foo}'));
+ it('should reject { and } in JSON pointer paths', function () {
+ // { and } are excluded from json-pointer-safe to allow unambiguous

Copilot AI Apr 2, 2026

The comment refers to a json-pointer-safe rule, but that rule doesn’t exist in the grammar (src/grammar.bnf / src/grammar.js). To avoid confusion for future maintainers, rename this to the actual rule being constrained (e.g., unescaped within json-pointer, or just “the json-pointer grammar”).

Suggested change
- // { and } are excluded from json-pointer-safe to allow unambiguous
+ // { and } are excluded from the json-pointer grammar to allow unambiguous

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Ambiguity in runtime expressions embedded in strings

2 participants