Convert OMG KeBNF grammar specifications to parser grammars. Parses the full KerML + SysML v2 KeBNF specs and emits target-specific output with semantic traceability.
| Format | Flag | Output | Status |
|---|---|---|---|
| ANTLR4 | `--format antlr4` | `.g4` | CI-validated -- compiles with antlr4, javac |
| tree-sitter | `--format tree-sitter` | `grammar.js` | CI-validated -- tested against tree-sitter-sysml corpus |
# Install from crates.io
cargo install nomograph-kebnf
# Convert SysML v2 KeBNF to ANTLR4 grammar
kebnf KerML.kebnf SysML.kebnf --format antlr4 -o Sysml.g4
# Fetch the latest specs from the OMG GitHub repo, then convert
kebnf --fetch-spec
kebnf ~/.cache/kebnf/*.kebnf --format antlr4 -o Sysml.g4

To build from source instead:

cargo build --release
./target/release/kebnf --help

The CI pipeline generates and validates Sysml.g4 on every commit.
Download it from the latest pipeline:
Pipeline > antlr4-validate job > Artifacts > Sysml.g4
Or browse: latest pipeline artifacts
Every push runs a five-stage validation:
- rust-build -- zero compiler warnings
- rust-test -- all tests pass
- rust-clippy -- zero lint warnings
- antlr4-validate -- generate .g4 from full KerML+SysML, compile with `antlr4 4.13.2` (zero errors), compile generated Java with `javac 21`
- tree-sitter-validate -- generate grammar.js from full KerML+SysML, run `tree-sitter generate` (valid parser.c produced)
The tree-sitter backend uses pattern-based emission: each definition and usage rule has its prefix keywords inlined for early disambiguation. This eliminates the shared-prefix ambiguity that causes GLR timeout in naive conversion approaches.
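
As a rough illustration of what "inlining prefix keywords" could look like, the sketch below builds the text of one tree-sitter rule with its keywords written out literally at the start of the `seq(...)`. The function name `emit_definition_rule` and the `$._definition_body` sub-rule are hypothetical and chosen for this example only; the real emitter works on the parsed AST and may produce different rule shapes.

```rust
/// Emit the text of one tree-sitter definition rule with its keyword
/// prefix inlined, so the parser can commit to the rule on the first token.
/// `$._definition_body` is a placeholder name, not taken from kebnf.
fn emit_definition_rule(keywords: &[&str]) -> String {
    // e.g. ["part"] -> "part_definition"
    let rule_name = format!("{}_definition", keywords.join("_"));
    // Quote each keyword as a JavaScript string literal.
    let prefix: Vec<String> = keywords.iter().map(|k| format!("'{k}'")).collect();
    format!(
        "{rule_name}: $ => seq({}, 'def', $._definition_body),",
        prefix.join(", ")
    )
}
```

Calling `emit_definition_rule(&["part"])` yields a rule that starts with the literal `'part'` token, instead of deferring to a shared prefix rule that every definition kind would also start with.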
Corpus coverage tested against tree-sitter-sysml test snippets:
| Category | Coverage |
|---|---|
| Attributes, Calculations, Constraints, Definitions, Expressions, Flows, Metadata, Requirements, States, Successions, Actions | 100% |
| Views | 96% |
| Usages | 94% |
| Packages | 89% |
| Connections | 80% |
The following constructs are not yet supported. They either require structural changes to the usage pattern that cause tree-sitter's LR table generation to time out, or involve keyword/name ambiguity that tree-sitter cannot resolve without external tokenization.
- Multiplicity + specialization after type: `part wheels : Wheel[4] :> parts;` -- the specialization `:> parts` after multiplicity `[4]` requires `repeat(feature_specialization)` in the usage pattern, which causes combinatorial conflict explosion during LR table generation.
- Specialization before name: `item :> shapes : Box[1] { }` -- the `:>` subsetting appears before the name, which the usage_declaration rule does not expect.
- Complex end features: `end theCauses [*] occurrence theCause :> causes :>> source { }` -- multiple keywords and specializations in an end feature declaration.
- N-ary connect syntax: `( cause1 ::> causer1, cause2 ::> causer2 )` -- parenthesized connection endpoints with `::>` bindings.
- Keyword/name ambiguity: `comment about Vehicle /* ... */` -- the `comment` keyword is also a valid identifier, and tree-sitter cannot disambiguate without context-sensitive tokenization.
- Nested redefinition in rendering: `view :>> columnView[1] { }` -- the `view` keyword with `:>>` redefinition inside a rendering body.
See docs/TREE-SITTER-FINDINGS.md for the full research journey from mechanical conversion to pattern-based emission.
KeBNF (Kernel Extended BNF) is the grammar notation used by the OMG to define the concrete syntax of SysML v2 and KerML. It extends standard EBNF with metamodel-binding annotations:
- Type annotations (`Rule : Type = ...`) -- bind rules to metamodel types
- Property assignments (`prop = Value`, `items += Element`) -- AST construction
- Boolean flags (`isAbstract ?= 'abstract'`) -- keyword-driven properties
- Cross-references (`[QualifiedName]`) -- name resolution
- Semantic actions (`{ isPortion = true }`) -- unconditional property setting
These annotations control metamodel binding but have no syntactic effect.
kebnf strips them during conversion and records them in a mapping file
(--mapping mapping.json) for downstream tools that need traceability.
See docs/KEBNF-SPEC.md for the full notation reference.
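
To make "strip and record" concrete, here is a minimal string-level sketch of removing a property assignment or boolean flag from a single grammar element while keeping the binding for a mapping record. It is an assumption-laden illustration: `Binding`, `Stripped`, and `strip_assignment` are hypothetical names, kebnf itself works on a parsed AST, not raw strings, and the actual layout of mapping.json is not shown here.

```rust
/// What was removed from the element (hypothetical record shape).
#[derive(Debug, PartialEq, Eq)]
struct Binding {
    property: String,
    operator: &'static str,
}

/// The purely syntactic remainder plus the stripped binding, if any.
#[derive(Debug, PartialEq, Eq)]
struct Stripped {
    syntax: String,
    binding: Option<Binding>,
}

fn is_identifier(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Strip `prop = X`, `items += X`, or `flag ?= X` down to `X`.
/// Longer operators are tried first so `?=` is not mistaken for `=`.
fn strip_assignment(element: &str) -> Stripped {
    for operator in ["?=", "+=", "="] {
        if let Some((lhs, rhs)) = element.split_once(operator) {
            let property = lhs.trim();
            // Only treat it as an assignment when the left side is a plain
            // identifier; this keeps a quoted terminal like '=' intact.
            if is_identifier(property) {
                return Stripped {
                    syntax: rhs.trim().to_string(),
                    binding: Some(Binding {
                        property: property.to_string(),
                        operator,
                    }),
                };
            }
        }
    }
    Stripped {
        syntax: element.trim().to_string(),
        binding: None,
    }
}
```

For `isAbstract ?= 'abstract'` this leaves `'abstract'` as the syntax and records the property `isAbstract` with operator `?=`, which is the kind of information a downstream tool would need to rebuild the metamodel binding.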
KeBNF source (.kebnf)
|
v
Parser (chumsky) --> AST
| |
| +--> ANTLR4 emitter ------> .g4
| |
| +--> tree-sitter emitter --> grammar.js
| |
| +--> mapping generator ----> mapping.json
v
Statistics (--stats)
The parser handles all KerML + SysML v2 rules. Each emitter walks the same AST. The ANTLR4 emitter handles:
- Lexer/parser rule split (ALL_CAPS -> lexer, CamelCase -> parser)
- ANTLR4 reserved word escaping (`import` -> `import_`)
- Duplicate rule deduplication (KerML and SysML overlap)
- Mutual left-recursion breaking (wrapper inlining + rule merging)
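
The first two steps in the list above can be sketched as follows. This is a simplified illustration, not the crate's actual code: `RuleKind`, `classify`, and `parser_rule_name` are hypothetical names, and lowering the first character of a parser rule name is an assumption based on ANTLR4's requirement that parser rules start with a lowercase letter. The reserved-word list follows ANTLR4's documented keywords and may not match the list kebnf uses.

```rust
/// Which section of the ANTLR4 grammar a rule belongs to.
#[derive(Debug, PartialEq, Eq)]
enum RuleKind {
    Lexer,
    Parser,
}

/// ALL_CAPS names become lexer rules; everything else is a parser rule.
fn classify(name: &str) -> RuleKind {
    let has_letter = name.chars().any(|c| c.is_ascii_alphabetic());
    let all_caps = name
        .chars()
        .all(|c| c.is_ascii_uppercase() || c.is_ascii_digit() || c == '_');
    if has_letter && all_caps {
        RuleKind::Lexer
    } else {
        RuleKind::Parser
    }
}

/// ANTLR4 keywords that cannot be used as rule names (assumed list).
const RESERVED: &[&str] = &[
    "import", "fragment", "lexer", "parser", "grammar", "returns", "locals",
    "throws", "catch", "finally", "mode", "options", "tokens",
];

/// Lowercase the first character (assumption) and append `_` on a clash.
fn parser_rule_name(name: &str) -> String {
    let mut chars = name.chars();
    let lowered = match chars.next() {
        Some(first) => first.to_ascii_lowercase().to_string() + chars.as_str(),
        None => String::new(),
    };
    if RESERVED.contains(&lowered.as_str()) {
        format!("{lowered}_")
    } else {
        lowered
    }
}
```

Under these assumptions a KeBNF rule named `Import` would be classified as a parser rule and emitted as `import_`, matching the escaping shown in the list.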
$ kebnf KerML.kebnf SysML.kebnf --format antlr4 --stats
{
"total_rules": 640,
"direct_conversion": 247,
"strip_and_convert": 353,
"best_effort": 37,
"manual_review": 3
}
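
One way to read these numbers: in the sample output the four categories add up to `total_rules` (247 + 353 + 37 + 3 = 640). The sketch below checks that property and computes the share of rules that needed neither best-effort handling nor manual review. The `Stats` struct mirrors the JSON field names, but the struct itself, its methods, and the assumption that the categories always partition the total are illustrative, not taken from the crate.

```rust
/// Mirror of the `--stats` JSON fields (hypothetical struct).
struct Stats {
    total_rules: u32,
    direct_conversion: u32,
    strip_and_convert: u32,
    best_effort: u32,
    manual_review: u32,
}

impl Stats {
    /// The figures from the sample output above.
    fn sample() -> Stats {
        Stats {
            total_rules: 640,
            direct_conversion: 247,
            strip_and_convert: 353,
            best_effort: 37,
            manual_review: 3,
        }
    }

    /// True when the four categories account for every rule exactly once.
    fn is_consistent(&self) -> bool {
        self.direct_conversion + self.strip_and_convert + self.best_effort + self.manual_review
            == self.total_rules
    }

    /// Fraction of rules converted directly or after stripping annotations.
    fn clean_share(&self) -> f64 {
        f64::from(self.direct_conversion + self.strip_and_convert) / f64::from(self.total_rules)
    }
}
```

For the sample figures, 600 of 640 rules fall into the first two categories, so `clean_share` evaluates to 0.9375.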
kebnf [OPTIONS] <INPUT>...
Arguments:
<INPUT>... Input .kebnf files
Options:
-o <PATH> Output file (default: grammar.{js,g4})
-f, --format <FMT> Output format: tree-sitter, antlr4 (default: tree-sitter)
-n, --name <NAME> Grammar name (default: sysml)
-m, --mapping <PATH> Output mapping.json
--include <PATTERNS> Include rules matching patterns (comma-separated)
--exclude <PATTERNS> Exclude rules matching patterns
--stats Print conversion statistics
--validate Validate output with tree-sitter generate
--fetch-spec Download latest KeBNF specs from OMG GitHub
-v, --verbose Verbose output
MIT
- Nomograph Labs
- tree-sitter-sysml -- hand-tuned SysML v2 grammar for tree-sitter
- SysML v2 Release -- OMG KeBNF source files
Part of Nomograph Labs.