IMPORTANT: This is a work in progress!
The Metaschema-codegen library and application parse a schema written in the Metaschema language and generate a source library that implements the schema through a clear and idiomatic API. The generated library is intended to be standalone, with minimal dependencies beyond those provided by the target language's standard library.
Metaschema-codegen takes a two-phase approach to generating code from a metaschema.
- First, it parses the metaschema into a generic internal data structure that is suitable for transformation into other programming languages.
- Second, it uses the Jinja2 templating library to render that data structure into the source files of the generated library.
Both of these steps have many sub-processes, but this design allows a developer to modify the second step to produce a library in another programming language without needing to develop a parser for metaschema.
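The two phases can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample input is simplified and un-namespaced, the names `FlagModel`, `AssemblyModel`, `parse` and `render` are hypothetical, and `string.Template` from the standard library stands in for Jinja2 so the sketch has no dependencies.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from string import Template

# Hypothetical, heavily simplified input. Real metaschema documents are
# namespaced and much richer than this fragment.
SAMPLE = """
<METASCHEMA>
  <define-assembly name="party">
    <define-flag name="uuid" as-type="uuid"/>
    <define-flag name="type" as-type="token"/>
  </define-assembly>
</METASCHEMA>
"""


@dataclass
class FlagModel:
    name: str
    datatype: str


@dataclass
class AssemblyModel:
    name: str
    flags: list = field(default_factory=list)


def parse(xml_text):
    """Phase 1: build a language-neutral model of the schema."""
    root = ET.fromstring(xml_text)
    models = []
    for assembly in root.findall("define-assembly"):
        flags = [
            FlagModel(f.attrib["name"], f.attrib.get("as-type", "string"))
            for f in assembly.findall("define-flag")
        ]
        models.append(AssemblyModel(assembly.attrib["name"], flags))
    return models


# Stand-in for a Jinja2 template; one template per target-language construct.
CLASS_TEMPLATE = Template("class $class_name:\n$fields\n")


def render(models):
    """Phase 2: turn the model into source text for one target language."""
    chunks = []
    for model in models:
        fields = "\n".join(
            f"    {flag.name}: str  # metaschema type: {flag.datatype}"
            for flag in model.flags
        )
        chunks.append(
            CLASS_TEMPLATE.substitute(
                class_name=model.name.capitalize(),
                fields=fields or "    pass",
            )
        )
    return "\n".join(chunks)


generated = render(parse(SAMPLE))
print(generated)
```

Only `render` and its template know anything about the target language, so under this arrangement a different output language would mean swapping that phase while reusing `parse` unchanged.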
By parsing the metaschema, the library will provide the following elements:
- Simple Datatypes, including their regular expressions.
- Complex Datatypes (Markup Types, fields, flags and assemblies), including the definition of the structures and references to datatypes for primitive elements.
- Information about cross-references between metaschema elements defined in different schema documents.
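As a rough picture of what such parsed information might look like, the sketch below models definitions, their datatype references, and cross-references between schema documents. Every class, field and document name here is a hypothetical illustration, not the library's real internal representation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Definition:
    name: str
    kind: str                       # "flag", "field" or "assembly"
    datatype: Optional[str] = None  # set for primitive elements
    refs: list = field(default_factory=list)  # names of referenced definitions


@dataclass
class SchemaDocument:
    name: str
    imports: list = field(default_factory=list)       # names of imported documents
    definitions: dict = field(default_factory=dict)   # definition name -> Definition


def resolve(ref, start, documents):
    """Return the name of the document that defines `ref`.

    Searches `start` first, then the documents it imports (transitively).
    Returns None when no reachable document defines the name.
    """
    seen = set()
    stack = [start]
    while stack:
        doc_name = stack.pop()
        if doc_name in seen:
            continue
        seen.add(doc_name)
        doc = documents[doc_name]
        if ref in doc.definitions:
            return doc_name
        stack.extend(doc.imports)
    return None


# Hypothetical two-document schema set: "catalog" imports "metadata".
documents = {
    "catalog": SchemaDocument(
        name="catalog",
        imports=["metadata"],
        definitions={
            "catalog": Definition("catalog", "assembly", refs=["metadata"]),
        },
    ),
    "metadata": SchemaDocument(
        name="metadata",
        definitions={
            "metadata": Definition("metadata", "assembly", refs=["title"]),
            "title": Definition("title", "field", datatype="markup-line"),
        },
    ),
}

print(resolve("metadata", "catalog", documents))
```

The `seen` set keeps the lookup from looping if two documents import each other.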
The library provides information about the structure of the defined metaschema, but a language implementation should provide the following elements:
- idiomatic representations of the structures
- functions and behaviours such as data validation
- import and export functions for documents that comply with a metaschema in one of the supported representations, such as an OSCAL catalog expressed in JSON
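For a Python target, those three responsibilities might look something like the sketch below. The `Party` class, its two fields and the UUID pattern are illustrative assumptions; a real generated library would take its structures and patterns from the parsed schema rather than hard-coding them.

```python
import json
import re
from dataclasses import asdict, dataclass

# Illustrative pattern only; a generated library would take the regular
# expression from the parsed simple datatype.
UUID_PATTERN = re.compile(
    r"[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[45][0-9A-Fa-f]{3}"
    r"-[89ABab][0-9A-Fa-f]{3}-[0-9A-Fa-f]{12}"
)


@dataclass
class Party:
    """Idiomatic representation of a hypothetical 'party' assembly."""

    uuid: str
    type: str

    def validate(self):
        """Return a list of human-readable validation errors (empty if valid)."""
        errors = []
        if not UUID_PATTERN.fullmatch(self.uuid):
            errors.append(f"uuid: {self.uuid!r} does not match the uuid pattern")
        return errors

    @classmethod
    def from_json(cls, text):
        """Import: build an instance from a JSON object string."""
        data = json.loads(text)
        return cls(uuid=data["uuid"], type=data["type"])

    def to_json(self):
        """Export: serialize the instance back to a JSON object string."""
        return json.dumps(asdict(self))


party = Party.from_json(
    '{"uuid": "3b2a5599-cc37-403f-ae36-5708fa804b27", "type": "person"}'
)
print(party.validate())
print(party.to_json())
```

Returning a list of errors from `validate` rather than raising on the first problem is one possible design choice; it lets a caller report every issue in a document at once.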
This project uses uv for development and packaging.
Installation instructions are here: https://docs.astral.sh/uv/getting-started/installation/
git clone https://github.com/Credentive-Sec/metaschema-codegen.git
Next, install all of the required submodules with the git submodule update --init command. Confirm that all of the submodules listed below are cloned.
$ cd metaschema-codegen
$ git submodule update --init
Submodule 'OSCAL' (https://github.com/usnistgov/OSCAL.git) registered for path 'OSCAL'
Submodule 'metaschema' (https://github.com/usnistgov/metaschema.git) registered for path 'metaschema'
Submodule 'oscal-content' (https://github.com/usnistgov/oscal-content.git) registered for path 'oscal-content'
Cloning into '/workspaces/metaschema-codegen/OSCAL'...
Cloning into '/workspaces/metaschema-codegen/metaschema'...
Cloning into '/workspaces/metaschema-codegen/oscal-content'...
remote: Enumerating objects: 28, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 28 (delta 22), reused 27 (delta 21), pack-reused 0 (from 0)
Unpacking objects: 100% (28/28), 50.69 KiB | 5.07 MiB/s, done.
From https://github.com/usnistgov/OSCAL
* branch e139397dab7773f7620d65571a04f178d951fc1d -> FETCH_HEAD
Submodule path 'OSCAL': checked out 'e139397dab7773f7620d65571a04f178d951fc1d'
Submodule path 'metaschema': checked out 'cf5966076cce4756081a05db46a784f5fb25af27'
Submodule path 'oscal-content': checked out '941c978d14c57379fbf6f7fb388f675067d5bff7'
uv installs a virtual environment inside the project directory, which works well with VS Code.
Run this command from inside the metaschema-codegen directory within the project. The directory should contain a pyproject.toml file.
uv sync
It may be necessary to reload the VS Code window ("Developer: Reload Window") for VS Code to pick up and activate the new virtual environment and installed dependencies.
If you open the repository root directory in VS Code, it may not discover the .venv directory in the metaschema-codegen subdirectory, and you may need to select it manually.
Tests should automatically be discovered.
To generate the Python package source code, run the "test_package_generator" test. The generated code will appear in the "test-output/oscal" directory under the main project directory.
Breakpoints can be added at various points to inspect the data structures used for code generation:
- metaschema-codegen/metaschema_codegen/core/schemaparse.py line 134: the "metaschema_schema" variable shows the contents of the internal representation of the metaschema XSD. See "metaschema_schema.complex_types" and "metaschema_schema.simple_types" for the items we use most.
- metaschema-codegen/tests/test_codegen.py line 13: the "ms" variable holds a complete MetaschemaSet object.
- metaschema-codegen/tests/conftest.py line 26: the "pg" variable contains a complete generated package.
The "design-docs" folder contains some useful notes, and is a general folder for putting things that we refer to frequently during development. See "parsing-samples.md" for a dump of the metaschema-schema object.