Developed by the DIGIT-X Lab at LMU Munich
Quietly ambitious about the hard things
Medical data is inherently complex, unstructured, and heterogeneous. Before we can unlock meaningful patterns, predict outcomes, or enable clinical decision support, we must first impose order on chaos. MOSAICX embodies this fundamental principle: structured data is the prerequisite for knowledge discovery.
In healthcare, unstructured documents such as radiology reports, clinical notes, and pathology summaries contain critical information locked in narrative text. MOSAICX transforms this chaos into validated, machine-readable structures using AI-driven schema generation and extraction pipelines. Only when data is properly structured can we apply advanced analytics, machine learning, and knowledge graphs to generate actionable insights.
Core Capabilities:
- Schema Generation: Transform natural language descriptions into validated Pydantic models
- Document Extraction: Convert PDFs and clinical documents to structured JSON using generated schemas
- Clinical Summarization: Generate timeline-based summaries of radiology reports with standardized outputs
- CLI & API: Powerful command-line interface and Python API for production workflows
- Privacy-First: Process sensitive medical data locally using Ollama-compatible LLMs
- Production-Ready: Robust error handling, validation, and reproducible outputs
- Demo WebApp: Interactive web interface for demonstrations and testing
Powered by local LLMs via Ollama, PDF processing via Docling, and strict validation via Pydantic v2
Interactive web interface for demonstrations and testing only. Use CLI/API for production workflows.
cd webapp && ./start.sh

Demo Features:
- Smart Contract Generator: Create Pydantic schemas from natural language
- PDF Extractor: Drag-and-drop PDF processing
- Report Summarizer: Timeline-based clinical analysis

Access Demo: http://localhost:3000 | Full Setup Guide →
Requirements:
- Docker: Desktop or Engine 20.10+
- RAM: 16GB+ (32GB recommended for large models)
- Storage: 10GB+ for containers and models
- GPU: Optional but recommended for large models
Architecture Notes:
- Option 1: WebApp containers → host Ollama (via host.docker.internal:11434)
- Option 2: WebApp containers → Ollama container (via internal Docker network)
Features:
- Smart Contract Generator: Create Pydantic models from natural language
- PDF Extractor: Drag-and-drop PDF processing with real-time results
- Report Summarizer: Timeline-based clinical report analysis
- Sample Data: Pre-loaded medical PDFs and schema templates
- Glass Morphism UI: Electric cyan theme with professional medical interface
- Python: 3.11+ (3.12 recommended)
- Operating System: macOS, Linux, Windows (with WSL2)
- Memory: 16GB RAM minimum, 32GB recommended
- Storage: 10GB free space for models
# macOS/Linux (automatic installation)
curl -fsSL https://ollama.com/install.sh | sh
# Windows: Download from https://ollama.com/download/windows
# Start Ollama service
ollama serve

# Using pip (standard)
pip install mosaicx
# Using uv (faster dependency resolution)
uv add mosaicx
# Using pipx (isolated installation)
pipx install mosaicx
# Development installation
git clone https://github.com/LalithShiyam/MOSAICX.git
cd MOSAICX
pip install -e .

# Default model (recommended for most use cases)
ollama pull gpt-oss:120b
# Alternative models
ollama pull llama3.1:8b-instruct # Smaller, faster
ollama pull qwen2.5:7b-instruct # Good balance
ollama pull deepseek-r1:7b # Reasoning model
# Verify installation
mosaicx --version

# Test connection to Ollama
mosaicx generate --desc "Simple patient record with name and age" --class-name TestModel

MOSAICX provides three main commands with extensive options:
mosaicx --help # Show all commands
mosaicx generate --help # Schema generation options
mosaicx extract --help # Document extraction options
mosaicx summarize --help # Report summarization options
mosaicx schemas --help   # Schema management options

Default settings:
- Model: gpt-oss:120b (configurable via --model)
- Temperature:
  - Schema generation: 0.2 (balanced creativity)
  - Data extraction: 0.0 (deterministic)
  - Summarization: 0.2 (slight creativity for readability)
- Base URL: http://localhost:11434/v1 (Ollama default)
- API Key: ollama (Ollama default)
Transform clinical requirements into validated Pydantic models:
# Basic usage (uses defaults)
mosaicx generate \
--desc "Echocardiography report with patient demographics, LVEF, valve grades, impression"
# Generated Pydantic Model:
```python
from pydantic import BaseModel, Field
from datetime import datetime
from typing import Literal, Optional
class EchocardiographyReport(BaseModel):
    """Echocardiography report with patient demographics, LVEF, valve grades, impression"""

    patient_id: str = Field(..., description="Unique patient identifier")
    patient_name: str = Field(..., description="Patient full name")
    date_of_birth: datetime = Field(..., description="Patient date of birth")
    exam_date: datetime = Field(..., description="Date of echocardiogram examination")
    lvef_percent: float = Field(..., ge=0, le=100, description="Left ventricular ejection fraction (%)")
    mitral_valve_grade: Literal["Normal", "Mild", "Moderate", "Severe"] = Field(
        ..., description="Mitral valve regurgitation severity"
    )
    aortic_valve_grade: Literal["Normal", "Mild", "Moderate", "Severe"] = Field(
        ..., description="Aortic valve stenosis/regurgitation severity"
    )
    tricuspid_valve_grade: Literal["Normal", "Mild", "Moderate", "Severe"] = Field(
        ..., description="Tricuspid valve regurgitation severity"
    )
    clinical_impression: str = Field(..., min_length=10, description="Cardiologist's clinical impression")
```

Advanced usage with custom settings:
mosaicx generate \
--desc "Complete blood count with patient ID, test date, hemoglobin, hematocrit, WBC count, differential counts, and reference ranges" \
--class-name CBCReport \
--model llama3.1:8b-instruct \
--temperature 0.1 \
--schema-path schemas/cbc_report.py
# Generated Pydantic Model:
```python
from pydantic import BaseModel, Field
from datetime import datetime
from typing import Optional
class CBCReport(BaseModel):
    """Complete blood count with patient ID, test date, hemoglobin, hematocrit, WBC count, differential counts, and reference ranges"""

    patient_id: str = Field(..., description="Unique patient identifier")
    test_date: datetime = Field(..., description="Date when CBC test was performed")
    hemoglobin: float = Field(..., ge=0, le=25, description="Hemoglobin level in g/dL")
    hematocrit: float = Field(..., ge=0, le=70, description="Hematocrit percentage")
    wbc_count: float = Field(..., ge=0, description="White blood cell count (thousands/μL)")
    neutrophils_percent: float = Field(..., ge=0, le=100, description="Neutrophils percentage")
    lymphocytes_percent: float = Field(..., ge=0, le=100, description="Lymphocytes percentage")
    monocytes_percent: float = Field(..., ge=0, le=100, description="Monocytes percentage")
    eosinophils_percent: float = Field(..., ge=0, le=100, description="Eosinophils percentage")
    basophils_percent: float = Field(..., ge=0, le=100, description="Basophils percentage")
    hemoglobin_ref_range: str = Field(..., description="Reference range for hemoglobin")
    hematocrit_ref_range: str = Field(..., description="Reference range for hematocrit")
    wbc_ref_range: str = Field(..., description="Reference range for WBC count")
```

Available Options:
- --desc (required): Natural language description
- --class-name: Pydantic class name (default: "GeneratedModel")
- --model: LLM model to use (default: "gpt-oss:120b")
- --temperature: Sampling temperature 0.0-2.0 (default: 0.2)
- --schema-path: Write the generated schema to this file
- --base-url: Custom API endpoint
- --api-key: Custom API key
- --debug: Enable verbose logging
Extract structured information from clinical documents:
# Basic extraction
mosaicx extract \
--document patient_reports/echo_001.pdf \
--schema EchoReport
# Advanced extraction with custom model
mosaicx extract \
--document "case studies/complex_cardiology_report.pdf" \
--schema CBCReport_20250925_143022 \
--model qwen2.5:7b-instruct \
--save results/structured_data.json

Supported formats include PDF, DOC/DOCX, PPT/PPTX, TXT/MD, and RTF; mix them freely in a single run.
Behind the scenes, MOSAICX layers its extraction: native Docling text first, then forced OCR, and finally vision-language transcription with Gemma3:27b via Ollama when required; a conceptual sketch of this cascade follows.
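The cascade can be pictured as a simple fallback chain. The sketch below is illustrative only; the layer callables are hypothetical stand-ins, not MOSAICX internals.

```python
from pathlib import Path
from typing import Callable, Optional

# Each layer is "try this extractor, accept the result only if it looks usable".
ExtractionLayer = Callable[[Path], Optional[str]]

def extract_text_with_fallback(
    document: Path,
    layers: list[ExtractionLayer],
    min_chars: int = 50,
) -> str:
    """Run layers in order (e.g. native text -> forced OCR -> vision-language model)."""
    for layer in layers:
        text = layer(document)
        if text and len(text.strip()) >= min_chars:
            return text
    raise RuntimeError(f"No extraction layer produced usable text for {document}")
```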
Example CLI output (abridged; actual Rich formatting includes colors and panels):
Extraction results based on schema: EchoReport
Field Extracted Value
patient_id ECG-001-2025
exam_date 2025-09-15T00:00:00
lvef_percent 55.0
mitral_valve_grade Mild
aortic_valve_grade Normal
tricuspid_valve_grade Normal
clinical_impression Normal left ventricular systolic function...
Extraction saved
JSON: results/structured_data.json
Generate timeline-based summaries from radiology reports:
# Single patient, multiple reports
mosaicx summarize \
--report patient_001/ct_baseline.pdf \
--report patient_001/ct_3month.pdf \
--report patient_001/ct_6month.pdf \
--patient P001 \
--json-out summaries/P001_longitudinal.json
# Process entire directory
mosaicx summarize \
--dir ./radiology_reports/patient_P001/ \
--patient P001 \
--model llama3.1:8b-instruct \
--temperature 0.1 \
--json-out P001_summary.json

Supported formats include PDF, DOC/DOCX, PPT/PPTX, TXT/MD, and RTF; mix them freely in a single run.
Example CLI output (abridged; actual Rich formatting includes colors and panels):
Patient: P001
DOB: - Sex: - Updated: 2025-09-25T14:30:22Z
Timeline
Date Source Critical Note
2025-08-01 CT Chest/Abdomen/Pelvis Baseline study: Multiple pulmonary nodules...
2025-09-15 CT Chest Follow-up Interval growth: RUL nodule now 12mm...
Overall Summary
Progressive pulmonary nodular disease with interval growth of the RUL lesion and new LLL nodule. [Source: CT Chest Follow-up]
The CLI features are also exposed as pure Python helpers so you can script or integrate them into other services.
from pathlib import Path
from mosaicx import (
extract_pdf,
generate_schema,
summarize_reports,
)
# 1) Generate a Pydantic schema from a plain-language description
schema = generate_schema(
"Patient vitals with name, heart rate, systolic_bp, diastolic_bp",
class_name="PatientVitals",
model="gpt-oss:120b",
)
schema_path = schema.write(Path("schemas/patient_vitals.py"))
# 2) Extract structured data from a PDF using that schema
extraction = extract_pdf(
pdf_path="tests/datasets/sample_patient_vitals.pdf",
schema_path=schema_path,
)
payload = extraction.to_dict()
# 3) Summarize one or more clinical reports
summary = summarize_reports(
paths=["tests/datasets/sample_patient_vitals.pdf"],
patient_id="demo-patient",
)Example Python output (illustrative values):
payload
{
"patient_name": "John Doe",
"heart_rate": 72,
"systolic_bp": 118,
"diastolic_bp": 76,
}
summary.overall
'Stable vital signs with normal heart rate and blood pressure. [Source: sample_patient_vitals.pdf]'
summary.timeline[0].model_dump()
{
"date": None,
"source": "sample_patient_vitals.pdf",
"note": "Vitals within normal limits; no acute concerns.",
}

All helpers accept optional model, base_url, and api_key arguments; when omitted the defaults mirror the CLI (environment variables first, then local Ollama).
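For example, the same helpers can be pointed at a non-default endpoint explicitly; the model name below is a placeholder, and the schema file is the one written in the earlier example.

```python
from mosaicx import extract_pdf

# Explicit model/endpoint instead of relying on environment defaults.
extraction = extract_pdf(
    pdf_path="tests/datasets/sample_patient_vitals.pdf",
    schema_path="schemas/patient_vitals.py",   # generated earlier
    model="llama3.1:8b-instruct",              # placeholder model choice
    base_url="http://localhost:11434/v1",      # local Ollama endpoint
    api_key="ollama",
)
print(extraction.to_dict())
```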
At the DIGIT-X Lab, we believe that structure precedes insight. The proliferation of unstructured medical data (radiology reports, clinical notes, pathology summaries) represents both an opportunity and a challenge. While this data contains rich clinical knowledge, its unstructured nature makes it largely inaccessible to computational analysis.
Modern healthcare generates exabytes of unstructured text annually, yet most clinical decision support systems can only leverage structured fields from electronic health records. This fundamental disconnect limits our ability to develop robust clinical AI, conduct large-scale outcomes research, or enable personalized medicine approaches.
MOSAICX addresses this gap by:
- Democratizing Data Structuring: Transforming natural language descriptions into production-ready data schemas without requiring deep technical expertise
- Enabling Reproducible Extraction: Converting documents to validated JSON structures that can be reliably processed by downstream ML pipelines
- Preserving Clinical Context: Maintaining semantic meaning while imposing computational structure through intelligent schema design
- Supporting Privacy Requirements: Processing sensitive medical data locally without external API dependencies
The structured data produced by MOSAICX becomes the foundation for knowledge graphs, longitudinal analysis, cohort studies, and clinical prediction models. Structure first. Insight follows.
The schema registry tracks all generated Pydantic models for easy reuse:
# List all generated schemas with details
mosaicx schemas
# Filter by clinical domain or keywords
mosaicx schemas --description "cardiology"
mosaicx schemas --class-name "Echo"
# Clean up orphaned registry entries (files deleted outside MOSAICX)
mosaicx schemas --cleanup
# Scan and register existing schema files not tracked by registry
mosaicx schemas --scan

Available Schema Registry:
- EchoReport_20250925_143022: Echocardiography report with LVEF and valve assessments
- CBCReport_20250925_101530: Complete blood count with differential and references
- PathologyReport_20250924_152045: Surgical pathology with tumor staging and margins
Tip: Use the schema ID, filename, or file path in extract commands
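If you want to reuse a generated schema file directly in your own code (outside the extract command), one option is to load it as a module. A minimal sketch, assuming the schemas/patient_vitals.py file from the Python API example above:

```python
import importlib.util

# Load the generated schema module from its file path.
spec = importlib.util.spec_from_file_location("patient_vitals", "schemas/patient_vitals.py")
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

PatientVitals = module.PatientVitals  # class name chosen at generation time
print(PatientVitals.model_json_schema())  # Pydantic v2 JSON Schema
```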
Process multiple documents or directories efficiently:
# Batch summarization for multiple patients
for patient_dir in ./patients/*/; do
patient_id=$(basename "$patient_dir")
mosaicx summarize \
--dir "$patient_dir" \
--patient "$patient_id" \
--json-out "summaries/${patient_id}_summary.json"
done
# Batch extraction using same schema
find ./reports -name "*.pdf" -exec mosaicx extract \
--document {} \
--schema UniversalLabReport \
  --save "structured_data/{}.json" \;
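The same batch pattern can also be scripted with the Python helpers. A minimal sketch follows; the input directory and schema file are placeholders, and the worker count should match what your local model server can actually handle.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

from mosaicx import extract_pdf

SCHEMA_PATH = Path("schemas/universal_lab_report.py")  # hypothetical generated schema file
OUT_DIR = Path("structured_data")
OUT_DIR.mkdir(exist_ok=True)

def process(pdf: Path) -> Path:
    """Extract one report and persist the structured payload as JSON."""
    extraction = extract_pdf(pdf_path=str(pdf), schema_path=SCHEMA_PATH)
    out_path = OUT_DIR / f"{pdf.stem}.json"
    out_path.write_text(json.dumps(extraction.to_dict(), indent=2, default=str))
    return out_path

with ThreadPoolExecutor(max_workers=2) as pool:
    saved = list(pool.map(process, sorted(Path("reports").glob("*.pdf"))))
print(f"Wrote {len(saved)} structured files")
```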
Use alternative LLM providers or local deployments:

# OpenAI API
mosaicx generate \
--desc "Pathology report with tumor staging" \
--base-url https://api.openai.com/v1 \
--api-key sk-your-openai-key \
--model gpt-4-turbo
# Local LM Studio
mosaicx extract \
--document report.pdf \
--schema PathologyReport \
--base-url http://localhost:1234/v1 \
--api-key lm-studio \
--model local-medical-llm
# Custom medical LLM deployment
mosaicx summarize \
--dir ./radiology_reports/ \
--base-url https://your-medical-llm.hospital.com/v1 \
--api-key your-internal-key \
  --model hospital-radiology-model

MOSAICX uses Outlines for grammar-constrained generation, ensuring the LLM always produces valid JSON matching your Pydantic schema. Multiple backends are supported:

| Backend | Use Case | Flag |
|---|---|---|
| ollama | Local Ollama server (default) | --backend ollama |
| vllm | vLLM server with native structured generation | --backend vllm |
| openai | OpenAI API or any OpenAI-compatible endpoint | --backend openai |
| llamacpp | Local GGUF models via llama-cpp-python | --backend llamacpp |
| anthropic | Anthropic Claude API | --backend anthropic |
| sglang | SGLang runtime server | --backend sglang |
| tgi | HuggingFace Text Generation Inference | --backend tgi |
Examples:
# vLLM server (uses vLLM's native guided decoding)
mosaicx extract \
--document report.pdf \
--schema PatientRecord \
--backend vllm \
--base-url http://localhost:8000/v1 \
--model openai/gpt-oss-20b
# llama.cpp server (OpenAI-compatible API)
mosaicx extract \
--document report.pdf \
--schema PatientRecord \
--backend openai \
--base-url http://localhost:8080/v1 \
--model ggml-org/gpt-oss-120b-GGUF
# Local GGUF model (direct loading via llama-cpp-python)
mosaicx extract \
--document report.pdf \
--schema PatientRecord \
--backend llamacpp \
--model gpt-oss:20b
# Together AI / Groq / Fireworks (OpenAI-compatible providers)
mosaicx extract \
--document report.pdf \
--schema PatientRecord \
--backend openai \
--base-url https://api.together.xyz/v1 \
--api-key $TOGETHER_API_KEY \
  --model mistralai/Mixtral-8x7B-Instruct-v0.1

Backend Auto-Detection: If --backend is not specified, MOSAICX automatically detects the backend from URL patterns and model names (a rough heuristic sketch follows the list):
- Port :11434 or a model present in Ollama → ollama
- Port :8000 or a /v1 endpoint → vllm
- gpt-oss:20b or gpt-oss:120b → llamacpp
- api.openai.com → openai
- api.anthropic.com → anthropic
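The detection logic amounts to a handful of string heuristics. The sketch below mirrors the rules above for illustration; it is not MOSAICX's actual implementation.

```python
def detect_backend(base_url: str = "", model: str = "") -> str:
    """Rough heuristic mirroring the auto-detection rules listed above."""
    url = base_url.lower()
    if "api.openai.com" in url:
        return "openai"
    if "api.anthropic.com" in url:
        return "anthropic"
    if ":11434" in url:
        return "ollama"
    if model in {"gpt-oss:20b", "gpt-oss:120b"}:
        return "llamacpp"
    if ":8000" in url or url.endswith("/v1"):
        return "vllm"
    return "ollama"  # fall back to the local default

print(detect_backend(base_url="http://localhost:8000/v1"))  # -> vllm
```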
Set default values to avoid repetitive command-line options:
# Set default model and endpoint
export OPENAI_BASE_URL="http://localhost:11434/v1"
export OPENAI_API_KEY="ollama"
export MOSAICX_DEFAULT_MODEL="gpt-oss:120b"
# Now use simplified commands
mosaicx generate --desc "Simple patient record"
mosaicx extract --document report.pdf --schema PatientRecord
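The same defaults can also be set from Python before calling the helpers, which read the environment first (as noted in the Python API section). A minimal sketch using the variable names listed above:

```python
import os

# Mirror the shell exports above.
os.environ.setdefault("OPENAI_BASE_URL", "http://localhost:11434/v1")
os.environ.setdefault("OPENAI_API_KEY", "ollama")
os.environ.setdefault("MOSAICX_DEFAULT_MODEL", "gpt-oss:120b")

from mosaicx import generate_schema

# No explicit model/base_url/api_key needed; the defaults above are picked up.
schema = generate_schema("Simple patient record with name and age", class_name="PatientRecord")
```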
| Model | Size | Use Case | Memory | Speed | Accuracy |
|---|---|---|---|---|---|
| gpt-oss:120b | ~120B | Complex schemas, high accuracy | 64GB+ | Slow | ★★★★★ |
| llama3.1:8b-instruct | ~8B | Balanced performance | 16GB+ | Fast | ★★★★☆ |
| qwen2.5:7b-instruct | ~7B | Batch processing | 12GB+ | Fastest | ★★★☆☆ |
| deepseek-r1:7b | ~7B | Reasoning tasks | 16GB+ | Medium | ★★★★☆ |
Default Model: gpt-oss:120b provides the best accuracy for medical schema generation and extraction tasks.
Good Schema Design:
# Descriptive field names with medical terminology
class EchocardiographyReport(BaseModel):
    patient_id: str = Field(..., description="Unique patient identifier")
    exam_date: datetime = Field(..., description="Date of echocardiogram")
    lvef_percent: float = Field(..., ge=0, le=100, description="Left ventricular ejection fraction (%)")
    mitral_valve_grade: Literal["Normal", "Mild", "Moderate", "Severe"] = Field(
        ..., description="Mitral valve regurgitation severity"
    )
    clinical_impression: str = Field(..., min_length=10, description="Cardiologist's interpretation")

Poor Schema Design:
# Vague field names, no validation, poor descriptions
class Report(BaseModel):
    data: str
    values: list
    result: float

Document Preparation:
- Ensure PDFs have searchable text layers, not just scanned images (a quick check is sketched below)
- Use OCR preprocessing for scanned documents, for example: ocrmypdf input.pdf searchable.pdf
- Remove password protection from PDFs before processing
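A quick way to check whether a PDF already exposes a text layer, assuming pypdf is installed (it is not a MOSAICX dependency):

```python
from pypdf import PdfReader

def has_text_layer(path: str, min_chars: int = 20) -> bool:
    """Return True if the PDF exposes enough embedded text to skip OCR."""
    reader = PdfReader(path)
    extracted = "".join((page.extract_text() or "") for page in reader.pages)
    return len(extracted.strip()) >= min_chars

if not has_text_layer("report.pdf"):
    print("No usable text layer; run OCR (e.g. ocrmypdf) before extraction.")
```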
Parameter Tuning:
- Temperature 0.0: Deterministic extraction for consistent results
- Temperature 0.1-0.2: Slight variation for creative schema generation
- Larger models: Use for complex medical terminology and relationships
Validation Best Practices:
- Always review extracted data for clinical accuracy
- Implement post-processing validation against medical standards
- Use enum constraints for standardized medical values
- Set appropriate ranges for numeric clinical measurements (see the sketch below)
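As an illustration of the enum and range constraints recommended above (field names and bounds are examples only, not clinical guidance):

```python
from enum import Enum
from pydantic import BaseModel, Field

class Severity(str, Enum):
    NORMAL = "Normal"
    MILD = "Mild"
    MODERATE = "Moderate"
    SEVERE = "Severe"

class VitalsRecord(BaseModel):
    heart_rate_bpm: int = Field(..., ge=20, le=300, description="Heart rate in beats per minute")
    temperature_c: float = Field(..., ge=30, le=45, description="Body temperature in Celsius")
    regurgitation_grade: Severity = Field(..., description="Standardized severity grade")
```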
Performance Optimization:
# Use quantized models for faster inference
ollama pull llama3.1:8b-instruct-q4_0 # 4-bit quantization
# Process in batches to maximize GPU utilization
# Use parallel processing for independent documents

Error Handling:
# Enable debug mode for troubleshooting
mosaicx extract --document document.pdf --schema MySchema --debug
# Implement retry logic for production systems
# Validate outputs against clinical standards
# Log failed extractions for manual review
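One way to implement the retry advice above around the documented extract_pdf helper; a minimal sketch, assuming transient failures are worth a small number of blind retries:

```python
import logging
import time

from mosaicx import extract_pdf

def extract_with_retry(pdf_path: str, schema_path: str, attempts: int = 3, delay_s: float = 2.0):
    """Retry transient extraction failures, then re-raise for manual review."""
    for attempt in range(1, attempts + 1):
        try:
            return extract_pdf(pdf_path=pdf_path, schema_path=schema_path)
        except Exception as exc:  # narrow to the exceptions you actually observe
            logging.warning("Extraction attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            time.sleep(delay_s)
```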
- Cohort Studies: Structure clinical notes for population-level analysis
- Outcomes Research: Extract standardized endpoints from heterogeneous reports
- Quality Metrics: Automate clinical quality measure extraction
- Biomarker Discovery: Structure pathology and lab reports for analysis
- Risk Stratification: Extract risk factors into computable formats
- Care Pathway Optimization: Structure clinical workflows and outcomes
- Longitudinal Tracking: Generate patient timelines from multiple reports
- Adverse Event Detection: Structure safety data from clinical narratives
- Revenue Cycle: Extract billable procedures and diagnoses
- Compliance Reporting: Structure regulatory reporting requirements
- Care Coordination: Generate structured handoff summaries
- Quality Assurance: Standardize report review workflows
- Privacy Compliance: No PHI transmitted to external services
- Cost Efficiency: Eliminate per-token API costs for large-scale processing
- Latency Optimization: Sub-second processing for typical clinical documents
- Offline Capability: Process data in air-gapped environments
- Minimum: 16GB RAM, modern CPU (M1/M2 Mac, Intel i7/AMD Ryzen 7)
- Recommended: 32GB RAM, GPU acceleration (RTX 4080/4090, M2 Max/Ultra)
- High-throughput: 64GB+ RAM, multiple GPUs for batch processing
Python Version Compatibility:
# Check Python version (requires 3.11+)
python --version
# Install specific Python version if needed
pyenv install 3.12.0
pyenv global 3.12.0

Dependency Conflicts:
# Use virtual environment to isolate dependencies
python -m venv mosaicx-env
source mosaicx-env/bin/activate # macOS/Linux
# mosaicx-env\Scripts\activate # Windows
pip install mosaicx

| Issue | Cause | Solution |
|---|---|---|
| Connection refused | Ollama not running | Start the server: ollama serve |
| Model not found | Model not downloaded | ollama pull model-name |
| Empty extraction | Poor model/temperature | Try gpt-oss:120b with --temperature 0.0 |
| PDF processing error | Scanned PDF without text layer | Run OCR first, e.g. ocrmypdf input.pdf searchable.pdf |
| Memory error | Model too large | Use a quantized model: llama3.1:8b-instruct-q4_0 |
| JSON validation error | Malformed output | Enable --debug and check model output |
| Schema not found | Registry out of sync | Run mosaicx schemas --scan |
Enable verbose logging to diagnose issues:
# Enable debug for all commands
mosaicx --debug generate --desc "Test schema"
mosaicx extract --document document.pdf --schema MySchema --debug
mosaicx summarize --dir ./reports --debug
# Check Ollama status
ollama list # Show downloaded models
ollama ps # Show running models
curl http://localhost:11434/api/tags # API health check

"Schema class 'MySchema' not found"
# Check available schemas
mosaicx schemas
# Regenerate if missing
mosaicx generate --desc "Your schema description" --class-name MySchema

"No text extracted from PDF"
# Test PDF text extraction
python -c "
from docling.document_converter import DocumentConverter
converter = DocumentConverter()
result = converter.convert('your_file.pdf')
print(result.document.export_to_markdown())
"

"Temperature must be between 0.0 and 2.0"
# Fix temperature value
mosaicx generate --desc "Test" --temperature 0.2 # Valid range: 0.0-2.0

Slow Processing:
- Use smaller models: llama3.1:8b-instruct instead of gpt-oss:120b
- Increase available RAM or use quantized models (q4_0 suffix)
- Process documents in smaller batches
High Memory Usage:
- Close other applications
- Use quantized models
- Process one document at a time
Inaccurate Results:
- Use larger, more capable models
- Lower temperature for more deterministic output
- Improve schema descriptions with more specific field definitions
- Review and refine extracted data manually (a re-validation sketch follows below)
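A small re-validation sketch for the manual-review step above, using a stand-in Pydantic model rather than a generated one:

```python
from pydantic import BaseModel, Field, ValidationError

class PatientVitals(BaseModel):  # stand-in for a generated schema class
    heart_rate: int = Field(..., ge=20, le=300)

payload = {"heart_rate": 999}  # illustrative out-of-range extraction result

try:
    PatientVitals.model_validate(payload)
except ValidationError as err:
    print("Extraction failed validation; route to manual review:")
    print(err)
```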
Log Analysis:
# Enable maximum verbosity
export MOSAICX_LOG_LEVEL=DEBUG
mosaicx extract --document document.pdf --schema MySchema --debug > debug.log 2>&1

System Information:
# Gather system info for bug reports
mosaicx --version
python --version
ollama --version
pip show mosaicx

For additional support:
- GitHub Issues: Report bugs and feature requests
- Research Inquiries: lalith.shiyam@med.uni-muenchen.de
- Commercial Support: lalith@zenta.solutions
MOSAICX is developed by the DIGIT-X Lab at LMU Munich, a research group focused on digital transformation in radiology and medical imaging. Our mission is to bridge the gap between clinical practice and computational methods through practical, privacy-preserving tools.
Research Focus Areas:
- Medical Image Analysis & AI
- Clinical Natural Language Processing
- Healthcare Data Standardization
- Privacy-Preserving Medical AI
- Radiomics & Quantitative Imaging
Team: Led by researchers and clinicians who understand both the technical challenges and clinical requirements of modern healthcare data processing.
We are quietly ambitious about the hard things.
MOSAICX is released under the AGPL-3.0 license for academic and open-source use. For commercial applications in healthcare organizations, please contact us for licensing options.
@software{mosaicx2025,
title={MOSAICX: Medical cOmputational Suite for Advanced Intelligent eXtraction},
author={Sundar, Lalith Kumar Shiyam and DIGIT-X Lab},
year={2025},
institution={LMU Munich University},
url={https://github.com/LalithShiyam/MOSAICX},
note={Developed at DIGIT-X Lab, Department of Radiology}
}

We welcome contributions from the medical informatics and clinical AI communities:
- Bug Reports: Submit issues with minimal reproducible examples
- Feature Requests: Propose new clinical use cases and requirements
- Documentation: Improve clinical examples and best practices
- Code Contributions: Follow our development guidelines and testing requirements
Contact:
- Research Inquiries: lalith.shiyam@med.uni-muenchen.de
- Commercial Licensing: lalith@zenta.solutions
- DIGIT-X Lab: https://www.linkedin.com/company/digitx-lmu/
Built with ❤️ for the medical community by researchers who understand that great clinical AI starts with great data structure.
MOSAICX is infrastructure for clinical data: schema-driven, validated, local, and reproducible. Structure reports once, then reuse the same schemas and summarizers across departments and timeβenabling longitudinal analysis, cross-modal integration, and downstream intelligence without sending data to the cloud.
