A metadata-driven engine that compiles CDISC define.xml into executable validation logic.

mnouira02/active-define-core

Active Define: The Clinical Semantic Layer

![Dashboard Screenshot](assets/dashboard_demo.png)

Active Define transforms define.xml from a passive documentation artifact into an executable semantic graph.

Instead of hardcoding SQL or Python filters for every clinical trial (e.g., WHERE VSTESTCD='DIABP'), this engine reads the XML metadata contract and compiles validation logic dynamically.
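
As a rough illustration of what "compiling" a metadata contract can mean, the sketch below reads `def:WhereClauseDef` elements from a tiny inline Define-XML 2.0 fragment and turns each one into a plain data structure. It is not the package's actual parser: the function name `compile_where_clauses`, the sample OIDs, and the shortcut of taking the variable name from the last segment of the ItemOID are all assumptions made for this example (a real define.xml resolves the ItemOID through its `ItemDef`).

```python
import xml.etree.ElementTree as ET

# Namespaces for Define-XML 2.0 (assumed version; 2.1 uses a different def URI).
NS = {
    "odm": "http://www.cdisc.org/ns/odm/v1.3",
    "def": "http://www.cdisc.org/ns/def/v2.0",
}

# Minimal hand-written fragment for illustration only, not taken from the pilot data.
SAMPLE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
     xmlns:def="http://www.cdisc.org/ns/def/v2.0">
  <def:WhereClauseDef OID="WC.VS.DIABP">
    <RangeCheck Comparator="EQ" SoftHard="Soft" def:ItemOID="IT.VS.VSTESTCD">
      <CheckValue>DIABP</CheckValue>
    </RangeCheck>
  </def:WhereClauseDef>
</ODM>"""


def compile_where_clauses(xml_text):
    """Map each WhereClauseDef OID to a list of (variable, comparator, values)."""
    root = ET.fromstring(xml_text)
    item_oid_attr = "{%s}ItemOID" % NS["def"]
    compiled = {}
    for clause in root.iter("{%s}WhereClauseDef" % NS["def"]):
        checks = []
        for check in clause.findall("odm:RangeCheck", NS):
            # Assumption: the variable name is the last segment of the ItemOID.
            variable = check.get(item_oid_attr).split(".")[-1]
            values = [cv.text for cv in check.findall("odm:CheckValue", NS)]
            checks.append((variable, check.get("Comparator"), values))
        compiled[clause.get("OID")] = checks
    return compiled


clauses = compile_where_clauses(SAMPLE)
print(clauses)
```

The resulting dictionary carries the same information as the hardcoded `WHERE VSTESTCD='DIABP'` filter, but it is derived from the XML rather than written by hand.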

🏗 Architecture

graph LR
    A[define.xml] -->|Compiler| B(Semantic Graph)
    C[Raw Data .xpt] -->|Engine| D{Auto-Filter}
    B --> D
    D --> E[Validated Subsets]

🚀 Project Structure

The project is organized as a modular Python package with supporting scripts:

active-define/
│
├── data/                   # Raw inputs (define.xml, vs.xpt)
├── src/                    # Core Engine Logic
│   └── active_define/      # The Python Package (Parser, Transpiler)
├── scripts/                # Utility Tools
│   ├── setup.py            # Auto-provisions CDISC Pilot 01 test data
│   ├── visualize.py        # Generates Mermaid.js logic diagrams
│   └── export.py           # Compiles XML to clean JSON
├── notebooks/              # Interactive Demos
├── run_demo.py             # CLI Entry Point
└── compiled_metadata.json  # Artifact: The compiled logic schema

🛠 Usage

1. Setup

Install dependencies and download the official CDISC Pilot 01 dataset.

pip install -r requirements.txt
python scripts/setup.py

2. Run the Engine

Execute the semantic graph against the raw Vital Signs data.

python run_demo.py

Output: The engine identifies 6 unique Vital Signs definitions in the XML and slices the 29,000+ row dataset into validated cohorts, with no hardcoded filters.
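
The slicing step can be pictured roughly as follows. This is a minimal sketch, not the engine's real code: it assumes the where clauses have already been compiled into a dictionary of `(variable, comparator, values)` checks (a hypothetical shape), supports only four of the Define-XML comparators, and runs on three made-up rows rather than the pilot dataset.

```python
from collections import defaultdict

# Hypothetical compiled form: clause OID -> list of (variable, comparator, values).
CLAUSES = {
    "WC.VS.DIABP": [("VSTESTCD", "EQ", ["DIABP"])],
    "WC.VS.SYSBP": [("VSTESTCD", "EQ", ["SYSBP"])],
    "WC.VS.TEMP_PULSE": [("VSTESTCD", "IN", ["TEMP", "PULSE"])],
}

# Subset of Define-XML comparators handled by this sketch.
COMPARATORS = {
    "EQ": lambda actual, values: actual == values[0],
    "NE": lambda actual, values: actual != values[0],
    "IN": lambda actual, values: actual in values,
    "NOTIN": lambda actual, values: actual not in values,
}


def slice_rows(rows, clauses):
    """Group rows by every clause whose checks they all satisfy."""
    cohorts = defaultdict(list)
    for row in rows:
        for oid, checks in clauses.items():
            if all(COMPARATORS[op](row.get(var), vals) for var, op, vals in checks):
                cohorts[oid].append(row)
    return dict(cohorts)


# Made-up rows for illustration; real input would come from vs.xpt.
rows = [
    {"USUBJID": "01-001", "VSTESTCD": "DIABP", "VSSTRESN": 80.0},
    {"USUBJID": "01-001", "VSTESTCD": "SYSBP", "VSSTRESN": 120.0},
    {"USUBJID": "01-002", "VSTESTCD": "PULSE", "VSSTRESN": 64.0},
]

cohorts = slice_rows(rows, CLAUSES)
for oid, members in sorted(cohorts.items()):
    print(oid, len(members))
```

Because the filters live in data rather than in code, adding a new test code to the metadata adds a new cohort without touching `slice_rows`.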

3. Generate Artifacts

Visualize the logic extracted from the XML.

python scripts/visualize.py
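
What `scripts/visualize.py` emits is not shown here, so the following is only a guess at the general idea: given compiled clauses in a hypothetical `(variable, comparator, values)` form, build Mermaid `graph LR` text in which each clause becomes a labelled edge from the dataset to a cohort node. The function name `to_mermaid` and the node naming scheme are assumptions for this example.

```python
def to_mermaid(clauses, dataset="VS"):
    """Render compiled where clauses as Mermaid flowchart text."""
    lines = ["graph LR"]
    for index, (oid, checks) in enumerate(sorted(clauses.items())):
        # Join multiple checks in one clause with AND for the edge label.
        label = " AND ".join(
            f"{var} {op} {' '.join(vals)}" for var, op, vals in checks
        )
        # Quote the node text because OIDs contain dots.
        lines.append(f'    {dataset} -->|{label}| N{index}["{oid}"]')
    return "\n".join(lines)


# Hypothetical compiled clauses used only to exercise the function.
example = {
    "WC.VS.DIABP": [("VSTESTCD", "EQ", ["DIABP"])],
    "WC.VS.SYSBP": [("VSTESTCD", "EQ", ["SYSBP"])],
}

diagram = to_mermaid(example)
print(diagram)
```

The printed text can be pasted into any Mermaid renderer, in the same `graph LR` style as the architecture diagram above.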

🧠 The "Why"

In traditional clinical programming, we manually write code that duplicates the logic already captured in define.xml. Active Define demonstrates that metadata can serve as Infrastructure as Code, driving the ingestion process automatically.
