Development of a metadata schema for experimental data, specifically electrochemical and electrocatalytic data.
Install pixi and get a copy of the metadata-schema repository:

```bash
git clone https://github.com/echemdb/metadata-schema.git
cd metadata-schema
```

The mdstools package provides tools to convert nested YAML metadata into flat Excel/CSV formats, with optional schema-based enrichment (descriptions and examples taken from the JSON schemas).
Convert a YAML file to enriched Excel and CSV:

```bash
pixi run convert tests/example_metadata.yaml
```

This creates three files in `generated/`:

- `example_metadata.csv` - Flat CSV with all metadata
- `example_metadata.xlsx` - Single-sheet Excel file
- `example_metadata_sheets.xlsx` - Multi-sheet Excel (one sheet per top-level key)
By default, all exported files include Description and Example columns populated from the JSON schemas, which makes the metadata templates easier for users to understand and fill out.
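The flattening step can be illustrated with a short, self-contained sketch. It is not the mdstools implementation: the hierarchical numbering (`1`, `1.1`, ...), the section header rows with an empty value, and the names `flatten`/`unflatten` are assumptions chosen for illustration. The sketch also shows the inverse direction, rebuilding the nested dict from the rows.

```python
# Hypothetical sketch of flattening nested metadata into (Number, Key, Value)
# rows and back. The numbering scheme and header rows are assumptions; this is
# not the mdstools implementation. Lists are treated as plain leaf values.
from typing import Any


def flatten(data: dict[str, Any], prefix: str = "") -> list[tuple[str, str, Any]]:
    """Return one (number, key, value) row per key, depth first."""
    rows: list[tuple[str, str, Any]] = []
    for index, (key, value) in enumerate(data.items(), start=1):
        number = f"{prefix}.{index}" if prefix else str(index)
        if isinstance(value, dict):
            # A nested section gets a header row with no value ...
            rows.append((number, key, None))
            # ... followed by its children, numbered below it.
            rows.extend(flatten(value, number))
        else:
            rows.append((number, key, value))
    return rows


def unflatten(rows: list[tuple[str, str, Any]]) -> dict[str, Any]:
    """Rebuild the nested dict, using each row number to find its parent."""
    root: dict[str, Any] = {}
    sections: dict[str, dict[str, Any]] = {"": root}  # number -> dict it opened
    for number, key, value in rows:
        parent = sections[number.rpartition(".")[0]]
        if value is None:
            # Caveat: a genuine null leaf would also be read back as a section.
            parent[key] = {}
            sections[number] = parent[key]
        else:
            parent[key] = value
    return root


# Made-up example metadata, for illustration only.
metadata = {
    "experiment": {"date": "2024-01-01", "electrolyte": {"type": "aqueous"}},
    "curator": {"name": "Jane Doe"},
}

rows = flatten(metadata)
for row in rows:
    print(row)

assert unflatten(rows) == metadata  # the round trip restores the nesting
```

In this sketch the number column carries the hierarchy, so unflattening only needs each row's parent number. A real implementation also has to handle lists and distinguish empty sections from null values.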
```bash
pixi run convert <yaml_file> [--schema-dir DIR] [--output-dir DIR] [--no-enrichment]
```

- `--schema-dir` - Directory with JSON schemas (default: `schemas`)
- `--output-dir` - Output directory (default: `generated`)
- `--no-enrichment` - Disable enrichment (no Description/Example columns)
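Enrichment can be pictured as a lookup of each flattened key in the JSON schema. The sketch below is an assumption about how such a lookup might work, not the mdstools code. It follows nested `properties`, reads the standard JSON Schema `description` and `examples` keywords, and ignores `$ref`, arrays, and combinators such as `anyOf`. The functions `lookup`/`enrich` and the example schema are hypothetical.

```python
# Hypothetical sketch of schema enrichment: find the subschema for a key path
# and read its "description" and first "examples" entry. Ignores $ref, arrays
# and anyOf/oneOf; the actual mdstools lookup may differ.
from typing import Any, Optional


def lookup(schema: dict[str, Any], path: list[str]) -> Optional[dict[str, Any]]:
    """Follow `path` through nested "properties"; None if a key is unknown."""
    node: Any = schema
    for key in path:
        node = node.get("properties", {}).get(key)
        if node is None:
            return None
    return node


def enrich(schema: dict[str, Any], path: list[str]) -> dict[str, Any]:
    """Return the Description/Example cells for one flattened row."""
    node = lookup(schema, path) or {}
    examples = node.get("examples", [])
    return {
        "Description": node.get("description", ""),
        "Example": examples[0] if examples else "",
    }


# Made-up schema fragment, for illustration only.
schema = {
    "type": "object",
    "properties": {
        "curator": {
            "type": "object",
            "description": "Person who curated the data.",
            "properties": {
                "name": {
                    "type": "string",
                    "description": "Full name of the curator.",
                    "examples": ["Jane Doe"],
                }
            },
        }
    },
}

print(enrich(schema, ["curator", "name"]))
print(enrich(schema, ["curator", "orcid"]))  # unknown key: empty cells
```

With `--no-enrichment`, a step like this is skipped and the Description/Example columns are omitted.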
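To convert a flattened Excel file back into nested metadata, run the `unflatten` task:

```bash
pixi run unflatten generated/example_metadata.xlsx --schema-file schemas/schema_pieces/minimum_echemdb.json
```

The mdstools package can also be used programmatically: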
```python
from mdstools.metadata.metadata import Metadata
from mdstools.metadata.enriched_metadata import EnrichedFlattenedMetadata

# Load YAML metadata
metadata = Metadata.from_yaml('metadata.yaml')

# Flatten to tabular format
flattened = metadata.flatten()

# Add schema enrichment (descriptions and examples)
enriched = EnrichedFlattenedMetadata(flattened.rows, schema_dir='schemas')

# Get enriched DataFrame
df = enriched.to_pandas()

# Export to various formats
enriched.to_csv('output.csv')
enriched.to_excel('output.xlsx')
enriched.to_excel('output_multi.xlsx', separate_sheets=True)  # One sheet per top-level key
enriched.to_markdown('output.md')
```

You can also load a flat Excel/CSV file, reconstruct the nested dict, and optionally write it to YAML. This workflow expects columns named `Number`, `Key`, and `Value`. Enriched Excel files, which additionally contain the Description and Example columns, can be loaded as well:
```python
from mdstools.metadata.flattened_metadata import FlattenedMetadata

flattened = FlattenedMetadata.from_excel("generated/example_metadata.xlsx")
metadata = flattened.unflatten()
data = metadata.data  # Nested dict
metadata.to_yaml("generated/example_metadata.yaml")
```

Run the tests:

```bash
pixi run test                # Run all tests
pixi run doctest             # Run doctests only
pixi run test-comprehensive  # Run integration tests only
```

or run all of them at once in the `dev` environment:

```bash
pixi run -e dev test-all
```

Generate resolved (single-file) JSON schemas from the modular schema pieces:
```bash
pixi run resolve-schemas
```

This resolves all `$ref` references and writes the combined schemas to `schemas/`.
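What "resolving `$ref` references" means can be illustrated with a small in-memory sketch. It is not the repository's `resolve-schemas` implementation. The file names, the reference style (`piece.json#/json/pointer` or `#/json/pointer`), the `pieces` dict, and the functions `resolve`/`resolve_pointer` are all hypothetical. The sketch also assumes that no reference is recursive.

```python
# Hypothetical sketch of resolving $ref references into one self-contained
# schema. Assumptions: the pieces are already loaded into a dict keyed by the
# file names used in the refs, refs have the form "piece.json#/json/pointer" or
# "#/json/pointer", and no reference is recursive (that would loop forever).
# This is not the repository's resolve-schemas implementation.
from typing import Any


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Follow a JSON Pointer (RFC 6901) such as "/definitions/person"."""
    node = document
    for token in pointer.split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def resolve(node: Any, pieces: dict[str, Any], current: str) -> Any:
    """Return a copy of `node` with every {"$ref": ...} replaced by its target."""
    if isinstance(node, list):
        return [resolve(item, pieces, current) for item in node]
    if not isinstance(node, dict):
        return node
    if "$ref" in node:
        file_part, _, pointer = node["$ref"].partition("#")
        target_file = file_part or current  # "#/..." points into the same piece
        target = resolve_pointer(pieces[target_file], pointer)
        # Keywords next to "$ref" are dropped in this sketch.
        return resolve(target, pieces, target_file)
    return {key: resolve(value, pieces, current) for key, value in node.items()}


# Made-up schema pieces, for illustration only.
pieces = {
    "person.json": {
        "definitions": {
            "person": {"type": "object", "properties": {"name": {"type": "string"}}}
        }
    },
    "main.json": {
        "type": "object",
        "properties": {
            "curator": {"$ref": "person.json#/definitions/person"},
            "contact": {"$ref": "#/definitions/email"},
        },
        "definitions": {"email": {"type": "string"}},
    },
}

resolved = resolve(pieces["main.json"], pieces, "main.json")
print(resolved["properties"])
```

Inlining every reference makes the result a single file that tools can consume without access to the individual pieces. The trade-off is that shared definitions are duplicated wherever they are referenced.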
After intentional changes to the schema pieces, update the expected baseline files:

```bash
pixi run update-expected-schemas
```

To validate the example files against the JSON schemas:

```bash
pixi run validate
```