A multi-agent AI system for chest X-ray interpretation that produces clinically trustworthy, evidence-backed diagnoses with built-in safety guardrails, adversarial debate, and human-in-the-loop review.
VERIFAI orchestrates nine specialized AI agents through a LangGraph state machine to analyze chest X-rays, cross-reference patient history and medical literature, debate diagnostic uncertainty, and produce auditable diagnoses — all runnable on a single GPU with 24 GB+ of VRAM (A100 recommended).
- Architecture Overview
- Agent Pipeline
- Monotonic Uncertainty Cascade (MUC)
- Key Features
- Project Structure
- Prerequisites
- Installation
- Environment Configuration
- Running the System
- Building the FHIR + FAISS Retrieval Index
- Frontend Dashboard
- Observability & Monitoring
- Testing
- API Reference
- Configuration Reference
- Technical Design Decisions
```mermaid
flowchart TD
    A[🩻 Chest X-ray Input] --> B[Radiologist Agent\nMedGemma 4B-IT + MedSigLIP]
    B --> C[CheXbert Agent\nStructured 14-label extraction]
    C --> D{Parallel Evidence Gathering}
    D --> E[Historian Agent\nFHIR + DuckDB + FAISS]
    D --> F[Literature Agent\nPubMed + Europe PMC]
    E --> G[Critic Agent\nAdversarial verification]
    F --> G
    G --> H[Debate\nDempster-Shafer fusion\nUp to 3 rounds]
    H --> I[Validator\nCXR-RePaiR + RadGraph + Rules Engine]
    I --> J{Verdict}
    J -- FINALIZE --> K[✅ Final Diagnosis\nSHA-256 Hash + Evidence Packet]
    J -- FLAG_FOR_HUMAN --> L[🔵 Human-in-the-Loop Review\nApprove / Reject + Re-enter at Critic]
    L -- Approved --> K
    L -- Rejected --> G
```
| # | Agent | Role | Model / Tool | Output |
|---|---|---|---|---|
| 1 | Radiologist | Analyze CXR images, generate free-text findings & impression | MedGemma 4B-IT (base) + MedSigLIP fine-tuned classifier + LRP heatmaps | Findings, impression, disease probabilities, attention heatmaps |
| 2 | CheXbert | Extract 14 structured pathology labels from the report text | f1chexbert BERT model | Present / absent / uncertain per label |
| 3 | Historian | Retrieve supporting & contradicting patient history from EHR | DuckDB + FAISS vector search + shared MedGemma | Clinical facts, FHIR resource IDs, clinical summary |
| 4 | Literature | Search PubMed / Europe PMC for biomedical evidence | BioPython E-Utilities + Europe PMC REST API | Ranked citations, evidence synthesis |
| 5 | Critic | Adversarial validation — detect overconfidence, surface past mistakes | Rule-based analysis + Past-Mistakes FAISS | Safety score, concern flags, uncertainty delta |
| 6 | Debate | Multi-round structured debate with Dempster-Shafer fusion | LangGraph orchestration (up to 3 rounds) | Consensus direction, confidence adjustment |
| 7 | Validator | Three-layer quality gate before finalizing | MedSigLIP FAISS (visual) + RadGraph NLP + Rules Engine | FINALIZE / FINALIZE_LOW_CONFIDENCE / FLAG_FOR_HUMAN |
| 8 | Finalize | Build the final diagnosis with a reproducibility hash | SHA-256 + Pydantic | FinalDiagnosis (diagnosis, confidence, hash, evidence packet) |
| 9 | Human Review | Doctor approves or rejects with feedback | LangGraph interrupt() | Approve → finalize; Reject → re-enter pipeline at Critic |
```mermaid
sequenceDiagram
    participant U as Clinician
    participant R as Radiologist
    participant C as CheXbert
    participant H as Historian
    participant L as Literature
    participant Cr as Critic
    participant D as Debate
    participant V as Validator
    participant F as Finalize
    U->>R: Upload CXR (AP/PA/Lateral)
    R-->>C: Free-text findings + impression
    C-->>H: 14 structured pathology labels
    C-->>L: 14 structured pathology labels
    par Evidence Gathering (parallel)
        H-->>Cr: FHIR clinical facts
        L-->>Cr: PubMed citations
    end
    Cr-->>D: Safety score + concerns
    D-->>V: Consensus + adjusted confidence
    V-->>F: Recommendation
    F-->>U: Final diagnosis + SHA-256 hash
    alt Flagged for Human Review
        F-->>U: Review panel — Approve or Reject
        U-->>Cr: Feedback injected, pipeline re-runs
    end
```
Every agent updates a single shared value `system_uncertainty` ∈ [0.05, 0.95] through Bayesian log-odds updates. This gives the system a live, cascading uncertainty measure that rises when agents disagree and falls when they confirm each other.
Core update rule:
```
IG(k)    = α · confidence(k) + β · alignment(k) + γ · direction(k)
U_sys(k) = clamp( U_sys(k-1) − IG(k), 0.05, 0.95 )
```
U_sys starts at 1.0 (maximum uncertainty, immediately clamped into [0.05, 0.95]) and decreases as confirming evidence arrives. A contradiction (direction = −1) or low confidence makes IG small or negative, keeping uncertainty high.
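As a sketch, one cascade step could look like the following (the weights `alpha`, `beta`, `gamma` are illustrative placeholders, not the project's actual constants):

```python
def muc_update(u_prev: float, confidence: float, alignment: float,
               direction: float, alpha: float = 0.4, beta: float = 0.3,
               gamma: float = 0.3) -> float:
    """One MUC step: IG(k) = α·confidence + β·alignment + γ·direction,
    then U_sys(k) = clamp(U_sys(k-1) - IG(k), 0.05, 0.95).
    direction is +1 for confirming evidence, -1 for a contradiction."""
    ig = alpha * confidence + beta * alignment + gamma * direction
    return min(0.95, max(0.05, u_prev - ig))

# Confirming evidence lowers uncertainty; a contradiction raises it again.
u = muc_update(0.95, confidence=0.9, alignment=0.8, direction=1.0)  # drops
u = muc_update(u, confidence=0.2, alignment=0.1, direction=-1.0)    # rises
```

Note how the clamp keeps U_sys inside [0.05, 0.95] no matter how strong the evidence is, so no single agent can drive the system to absolute certainty.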
```mermaid
graph LR
    A["U_sys = 1.0\nat start"] --> B[Radiologist\naligns with image → ↓ U]
    B --> C[CheXbert\nlabel match → ↓ U]
    C --> D[Historian/Literature\ncontradiction → ↑ U]
    D --> E[Critic\npast mistake found → ↑ U]
    E --> F[Debate consensus → ↓ U]
    F --> G["U_sys final\n(shown in UI chart)"]
```
| Agent | Confirming Signal | Contradicting Signal |
|---|---|---|
| Radiologist | High MedSigLIP disease probability | Low classification score |
| CheXbert | Label matches impression | Label absent in free text |
| Historian | FHIR conditions support the diagnosis | Contradicting clinical facts |
| Literature | Evidence corroborates findings | Literature contradicts impression |
| Critic | No overconfidence / no past mistakes | Overconfidence flag raised |
| Debate | All agents reach consensus | Agents strongly conflict (K ≥ 0.99) |
| Validator | High visual retrieval similarity + entity match | Entity F1 weak / retrieval consensus mismatch |
The Debate stage uses Dempster-Shafer evidence fusion. When agents strongly conflict (K ≥ 0.99), the system remains uncertain rather than picking a winner arbitrarily.
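For illustration, Dempster's rule of combination for two belief-mass assignments can be sketched as below (hypothesis sets are `frozenset` keys; the K ≥ 0.99 refusal mirrors the behavior described above, and the function names are illustrative rather than the project's API):

```python
def ds_combine(m1, m2, conflict_threshold=0.99):
    """Dempster's rule for two belief-mass dicts keyed by frozenset
    hypotheses. Returns (fused_masses, K). If the conflict mass K is
    at or above conflict_threshold, fusion is refused (None) and the
    system stays uncertain rather than picking a winner arbitrarily."""
    fused, conflict = {}, 0.0
    for hyp_a, mass_a in m1.items():
        for hyp_b, mass_b in m2.items():
            joint = hyp_a & hyp_b
            if joint:
                fused[joint] = fused.get(joint, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # incompatible hypotheses
    if conflict >= conflict_threshold:
        return None, conflict
    return {h: m / (1.0 - conflict) for h, m in fused.items()}, conflict
```

With fully contradictory masses (all belief on disjoint diagnoses) K reaches 1.0 and the fused result is refused, which is exactly the "remain uncertain" behavior.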
The uncertainty history is streamed live to the frontend and rendered as an SVG line chart in the results page.
- Multi-Agent Orchestration — 9 specialized agents coordinated via LangGraph state machine with typed state
- Multi-View CXR Support — Accepts AP, PA, and Lateral views simultaneously via `<PA>`, `<AP>`, `<LATERAL>` tokens
- MUC Uncertainty Framework — Real-time, bidirectional cascading system entropy across the full pipeline
- Multi-Agent Debate — Up to 3 Dempster-Shafer-fused debate rounds before consensus
- Human-in-the-Loop — LangGraph interrupt-based doctor review; rejected diagnoses re-enter at the Critic with full context
- Medical Safety Guardrails — Rule-based + embedding-based checks for critical finding hallucinations
- Reproducibility Hash — SHA-256 fingerprint (image + FHIR + config) for FDA 21 CFR Part 11 audit trail
- Past Mistakes Memory — DuckDB HNSW vector index stores rejected diagnoses; Critic retrieves similar past errors via neural re-ranking (temporal recency + clinical relevance scoring)
- CheXbert Cross-Validation — Structured labels act as a second-opinion independent of the Radiologist's free text
- Validator Quality Gate — Three layers: visual FAISS retrieval (MedSigLIP), RadGraph NLP entity matching, clinical rules engine
- SSE Real-Time Streaming — Live agent progress streamed from backend to frontend via Server-Sent Events
- LRP Heatmaps — Transformer explainability via Chefer et al. (CVPR 2021) Layer-wise Relevance Propagation for MedSigLIP
- Observability Dashboard — Prometheus-style metrics (latency, confidence, safety scores, per-agent duration)
- Evidence Report Generator — HTML reports with citations, heatmaps, and full audit trail
- Shared Model Loader — Historian, Literature, and Critic share a single thread-safe MedGemma 4B instance (~9 GB VRAM total vs ~27 GB otherwise)
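As a hypothetical sketch of the Past Mistakes re-ranking idea (the real scoring lives in `db/rerank_mistakes.py`; the 60/20/20 weights and 90-day half-life here are invented for illustration):

```python
import math

def rerank_score(similarity: float, age_days: float,
                 clinical_overlap: float,
                 half_life_days: float = 90.0) -> float:
    """Blend vector similarity with temporal recency (exponential decay,
    illustrative 90-day half-life) and clinical relevance overlap."""
    recency = math.exp(-math.log(2.0) * age_days / half_life_days)
    return 0.6 * similarity + 0.2 * recency + 0.2 * clinical_overlap

# A recent, clinically relevant mistake outranks an equally similar old one.
```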
```
VERIFAI/
├── agents/                          # AI Agent implementations
│   ├── radiologist/                 # MedGemma 4B-IT VLM + MedSigLIP classifier + LRP
│   │   ├── model.py                 # Model loading (FP16) + VLM inference
│   │   ├── agent.py                 # Radiologist agent logic
│   │   ├── classifier.py            # MedGemmaVisionHead: frozen SigLIP + trainable head
│   │   ├── lrp.py                   # Chefer et al. (CVPR 2021) LRP for SigLIP
│   │   └── prompts.py               # Structured JSON generation prompts
│   ├── chexbert/                    # CheXbert structured labeling
│   │   ├── agent.py                 # Extract 14 CXR condition labels
│   │   └── model.py                 # f1chexbert BERT model wrapper + transformers 5.x patch
│   ├── historian/                   # FHIR patient history retrieval + clinical reasoning
│   │   ├── agent.py                 # DuckDB + FAISS vector search orchestrator
│   │   ├── fhir_client.py           # FHIR R4 resource parser + FAISS retriever
│   │   └── reasoner.py              # Clinical reasoning synthesizer (via shared MedGemma)
│   ├── literature/                  # PubMed / Europe PMC search
│   │   ├── agent.py                 # Literature search orchestrator
│   │   ├── pubmed_entrez.py         # BioPython E-Utilities wrapper
│   │   ├── europe_pmc.py            # Europe PMC REST API
│   │   └── rate_limiter.py          # Adaptive rate limiter (3 req/s NCBI)
│   ├── critic/                      # Adversarial verification
│   │   ├── agent.py                 # Overconfidence detection + past mistakes retrieval
│   │   └── model.py                 # Rule-based linguistic certainty analysis
│   ├── debate/                      # Multi-agent debate protocol
│   │   └── agent.py                 # 3-round Dempster-Shafer debate
│   ├── validator/                   # Final quality gate
│   │   ├── agent.py                 # Validator orchestrator
│   │   ├── retrieval_tool.py        # CXR-RePaiR: MedSigLIP FAISS visual retrieval
│   │   ├── radgraph_tool.py         # RadGraph NLP entity matching + transformers 5.x patch
│   │   └── rules_engine.py          # Clinical rules engine
│   └── feedback/                    # Doctor feedback handler
│       └── agent.py                 # Rejection → re-enter pipeline at Critic
│
├── graph/                           # LangGraph workflow
│   ├── state.py                     # VerifaiState TypedDict + Pydantic models
│   ├── workflow.py                  # Full graph + node wrappers + interrupt()
│   └── router.py                    # Uncertainty-based routing logic
│
├── app/                             # FastAPI backend
│   ├── main.py                      # App entry point + middleware
│   ├── api.py                       # REST endpoints (start, status, resume, SSE)
│   ├── config.py                    # Settings (models, thresholds, feature flags)
│   ├── streaming.py                 # SSE event bus
│   └── shared_model_loader.py       # Thread-safe MedGemma singleton
│
├── frontend/                        # Next.js 14 dashboard (TypeScript + Tailwind CSS)
│   └── src/app/
│       ├── page.tsx                 # Landing page — "The Council"
│       ├── diagnose/page.tsx        # Upload X-ray + start workflow
│       ├── results/[id]/page.tsx    # Live results + SSE agent feed + HITL review
│       └── observability/page.tsx   # Metrics dashboard
│
├── db/                              # Database layer
│   ├── logger.py                    # Session-scoped structured logging (SQLite & Supabase)
│   ├── connection.py                # SQLite connection pool + schema
│   ├── adapter.py                   # Database adapter (sqlite / supabase)
│   ├── past_mistakes.py             # Past Mistakes DB (DuckDB + HNSW vector)
│   └── rerank_mistakes.py           # Neural re-ranking (temporal decay + clinical relevance)
│
├── uncertainty/                     # MUC framework
│   ├── muc.py                       # Monotonic Uncertainty Cascade (Information Gain)
│   └── kle.py                       # KL-divergence Epistemic uncertainty
│
├── safety/                          # Medical safety guardrails
│   └── guardrails.py
│
├── monitoring/                      # Observability
│   └── metrics.py                   # Prometheus-style counters + histograms
│
├── utils/
│   ├── evidence_report.py           # HTML evidence report generator
│   └── inference.py                 # Robust JSON extraction from LLM output
│
├── tests/
│   ├── test_workflow.py             # Primary end-to-end integration test
│   └── ...                          # Additional unit tests
│
├── scripts/
│   ├── build_retrieval_index.py     # Build FAISS index from MIMIC-CXR
│   ├── install_radgraph_model.py    # RadGraph model first-time setup
│   └── seed_pb.py                   # Seed past mistakes database
│
├── train_classifier.py              # MedSigLIP classifier training
├── qlora_medgemma.py                # QLoRA fine-tuning pipeline (optional)
└── docs/
    └── PRINCIPLED_UNCERTAINTY.md    # Full MUC derivation
```
| Requirement | Minimum | Recommended |
|---|---|---|
| Python | 3.10 | 3.10 |
| CUDA | 12.1 | 12.6 |
| GPU VRAM | 24 GB | 80 GB (A100) |
| RAM | 32 GB | 64 GB |
| Disk | 30 GB | 60 GB |
| Node.js | 18 | 20+ |
| OS | Ubuntu 22.04 | Ubuntu 22.04+ |
Required accounts:
- Hugging Face — Token for gated models (`google/medgemma-1.5-4b-it`, `google/medsiglip-448`)
- NCBI — API key for PubMed access (recommended)
- (Optional) Supabase — For cloud database logging
```bash
git clone https://github.com/your-org/VERIFAI.git
cd VERIFAI
```

```bash
conda create -n verifai python=3.10 -y
conda activate verifai
```

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Verify CUDA
python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}, GPU: {torch.cuda.get_device_name(0)}')"
```

```bash
pip install -r requirements.txt
python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab')"
```

```bash
python scripts/install_radgraph_model.py
```

RadGraph will auto-download from Hugging Face on first use if the local path is not found.

```bash
cd frontend
npm install
cd ..
```

```bash
cp .env.example .env
```

Edit `.env` with your values:
```bash
# ── Required ──────────────────────────────────────────────────────
HUGGINGFACE_TOKEN=hf_your_token_here           # MedGemma/MedSigLIP access

# ── PubMed (Recommended) ──────────────────────────────────────────
NCBI_EMAIL=your.email@example.com              # Required by NCBI policy
NCBI_API_KEY=your_ncbi_key                     # Enables 10 req/s (vs 3 req/s)

# ── Database ──────────────────────────────────────────────────────
DATABASE_MODE=sqlite                           # Use sqlite for local / supabase for cloud

# ── Models ────────────────────────────────────────────────────────
MEDGEMMA_4B_MODEL=google/medgemma-1.5-4b-it
MEDSIGLIP_BASE_MODEL=google/medsiglip-448
MEDSIGLIP_WEIGHTS_PATH=medsiglip_full_model.pt # Local fine-tuned classifier weights

# ── Optional: Supabase Cloud DB ───────────────────────────────────
SUPABASE_URL=https://xxx.supabase.co
SUPABASE_KEY=your_anon_key
SUPABASE_SERVICE_KEY=your_service_key

# ── Optional: Semantic Scholar ────────────────────────────────────
SEMANTIC_SCHOLAR_API_KEY=your_key

# ── Optional: Workflow Flags ──────────────────────────────────────
MOCK_MODELS=False                              # True = skip model download, use mocks (no GPU needed)
ENABLE_LLM_CRITIC=False                        # Extra MedGemma semantic critic pass (slower)
```

Set `MOCK_MODELS=True` for a first run to verify the pipeline wiring before downloading model weights.
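For a rough illustration of how such variables become typed settings (the real implementation is `app/config.py`; the loading code below is a stdlib sketch with illustrative field names, not the project's):

```python
import os
from dataclasses import dataclass, field

def _env(name: str, default: str) -> str:
    """Read an environment variable with a fallback default."""
    return os.environ.get(name, default)

@dataclass(frozen=True)
class SettingsSketch:
    """Typed view over a few of the .env variables documented above."""
    medgemma_model: str = field(
        default_factory=lambda: _env("MEDGEMMA_4B_MODEL",
                                     "google/medgemma-1.5-4b-it"))
    mock_models: bool = field(
        default_factory=lambda: _env("MOCK_MODELS", "False").lower() == "true")
    debate_max_rounds: int = field(
        default_factory=lambda: int(_env("DEBATE_MAX_ROUNDS", "3")))
```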
```bash
conda activate verifai
export CUDA_VISIBLE_DEVICES=0
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

python tests/test_workflow.py
```

This runs all 9 agents on a sample chest X-ray and prints:
- Radiologist findings & impression
- CheXbert structured labels
- Critic safety assessment
- Debate consensus
- Final diagnosis with confidence & reproducibility hash
- Full audit trace
```bash
# Terminal 1 — Backend
conda activate verifai
export CUDA_VISIBLE_DEVICES=0
python -m uvicorn app.main:app --host 0.0.0.0 --port 8000

# Terminal 2 — Frontend
cd frontend
npm run dev
# → http://localhost:3000
```

```bash
# Start workflow
curl -X POST http://localhost:8000/api/v1/workflows/start \
  -F "images=@chest_xray_AP.jpg" \
  -F "views=AP" \
  -F "patient_id=patient-123"

# Response:
# {"session_id": "abc-123"}

# Poll status
curl http://localhost:8000/api/v1/workflows/abc-123/status

# Subscribe to live SSE progress
curl -N http://localhost:8000/api/v1/workflows/abc-123/stream
```

| Model | Source | Purpose |
|---|---|---|
| MedGemma 4B-IT | `google/medgemma-1.5-4b-it` (HuggingFace, gated) | Primary VLM for Radiologist, Historian, Literature synthesis |
| MedSigLIP | `google/medsiglip-448` (HuggingFace, gated) + local `medsiglip_full_model.pt` | Fine-tuned disease classifier + visual FAISS retrieval |
| CheXbert (f1chexbert) | `f1chexbert` pip package | 14-label structured pathology labeling |
| RadGraph | Auto-downloaded from HuggingFace | Clinical NLP entity extraction for Validator |
| all-MiniLM-L6-v2 | `sentence-transformers/all-MiniLM-L6-v2` | Semantic embeddings for FHIR retrieval + Critic |
MedGemma is used as the BASE model (no LoRA adapters). The fine-tuned MedSigLIP classifier (`medsiglip_full_model.pt`) must be placed at the path specified by `MEDSIGLIP_WEIGHTS_PATH` in `.env`.
The Historian agent uses a FAISS vector index over patient FHIR records for semantic retrieval.
```bash
python extract_fhir_to_duckdb.py \
  --fhir_dir path/to/fhir/bundles \
  --output verifai_fhir.duckdb
```

```bash
python scripts/build_retrieval_index.py \
  --duckdb_path verifai_fhir.duckdb \
  --output_faiss verifai_fhir.faiss \
  --output_mapping verifai_fhir_mapping.json
```

This produces:
- `verifai_fhir.faiss` — Vector index for fast similarity search
- `verifai_fhir_mapping.json` — Maps FAISS IDs to FHIR resource IDs
- `verifai_fhir.duckdb` — Structured patient data for SQL queries
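To illustrate how the index and mapping file fit together at query time, here is a brute-force NumPy stand-in for the FAISS search (the project uses FAISS itself; function and variable names here are illustrative):

```python
import numpy as np

def retrieve_fhir_ids(query_vec, index_vecs, id_mapping, k=3):
    """Cosine-similarity search over the index vectors, then translate
    each FAISS row number into a FHIR resource ID via the mapping
    (mirroring the contents of verifai_fhir_mapping.json)."""
    q = query_vec / np.linalg.norm(query_vec)
    m = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
    top = np.argsort(-(m @ q))[:k]  # indices of the k best scores
    return [id_mapping[str(i)] for i in top]
```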
```bash
python scripts/seed_pb.py
```

The frontend is a Next.js 14 app (TypeScript + Tailwind CSS) with a premium dark-themed clinical interface.
| Route | Description |
|---|---|
| `/` | Landing page — "The Council" of 6 agents |
| `/diagnose` | Upload CXR + optional FHIR report + patient ID |
| `/results/[session_id]` | Live SSE agent feed → final diagnosis with evidence tabs |
| `/observability` | System metrics dashboard (auto-refreshes every 15s) |
| Tab | Content |
|---|---|
| Visual Proof | Original DICOM + MedSigLIP LRP heatmap side-by-side |
| Clinical | FHIR supporting & contradicting facts from Historian |
| Literature | PubMed citations with relevance summaries |
| Safety | Safety guardrails report (score, flags, critical findings) |
| Audit Trail | SHA-256 hash + full execution trace |
When the Validator flags a case for human review, a Human-in-the-Loop panel appears on the results page. The clinician can:
- Approve the diagnosis — marks the workflow as completed
- Provide Feedback + Rerun — injects clinical notes, re-runs the pipeline from the Critic with a live SSE feed showing the rerun progress
```bash
# Metrics snapshot (JSON)
curl http://localhost:8000/api/v1/metrics/summary
```

Returns:

- `system` — total workflows, active, deferrals, critical findings
- `agents` — per-agent latency (mean, p95, p99), invocation counts
- `diagnostics` — confidence, uncertainty, safety score distributions, debate rounds
- `safety` — safety flags, errors by component
Visual dashboard available at http://localhost:3000/observability.
When running via `test_workflow.py`, metrics are saved to `metrics_snapshot.json` and automatically served by the API.
| Mode | When to use | Tables |
|---|---|---|
| `DATABASE_MODE=sqlite` | Local testing | SQLite at `verifai_logs.db` |
| `DATABASE_MODE=supabase` | Production / cloud | Supabase PostgreSQL (see `db/supabase_schema.sql`) |
Key tables: `workflow_sessions`, `agent_invocations`, `radiologist_logs`, `critic_logs`, `historian_logs`, `debate_logs`, `validator_logs`, `doctor_feedback`, `past_mistakes`.
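In SQLite mode the log can be inspected with a few lines of Python; the column names below are a guess from the table list (check `db/connection.py` for the real schema):

```python
import sqlite3

def recent_sessions(con: sqlite3.Connection, limit: int = 5):
    """Return the most recently inserted workflow sessions (id, status).
    Column names are illustrative, not the project's actual schema."""
    return con.execute(
        "SELECT session_id, status FROM workflow_sessions "
        "ORDER BY rowid DESC LIMIT ?", (limit,)
    ).fetchall()

# e.g. con = sqlite3.connect("verifai_logs.db")
```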
```bash
# Full end-to-end integration test (recommended first test)
conda activate verifai
export CUDA_VISIBLE_DEVICES=0
python tests/test_workflow.py

# Unit tests
pytest tests/ -v

# Frontend build verification
cd frontend && npx next build
```

| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/v1/workflows/start` | Upload image(s) + FHIR data, start async workflow |
| GET | `/api/v1/workflows/{id}/status` | Poll workflow status + results |
| GET | `/api/v1/workflows/{id}/stream` | SSE stream for live agent progress |
| POST | `/api/v1/workflows/{id}/resume` | Submit doctor feedback (approve / reject) |
| POST | `/api/v1/safety/validate` | Run safety guardrails on a diagnosis string |
| GET | `/api/v1/metrics/summary` | Observability metrics |
| GET | `/api/v1/health` | Server health check |
| POST | `/api/past-mistakes/insert` | Insert a validated diagnostic mistake |
| POST | `/api/past-mistakes/search` | Search for similar past mistakes |
| GET | `/api/past-mistakes/statistics` | Aggregate past-mistakes statistics |
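A client consuming the stream endpoint groups lines into events per the SSE wire format (blank line terminates an event). This parser is a sketch; the event names the backend actually emits are not specified here:

```python
def parse_sse(lines):
    """Group raw SSE lines into (event, data) pairs. Per the SSE format,
    a blank line terminates an event, 'event:' names it (default
    'message'), and consecutive 'data:' lines are joined with newlines."""
    events, name, data = [], "message", []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            if data:
                events.append((name, "\n".join(data)))
            name, data = "message", []
        elif line.startswith("event:"):
            name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
    return events

# Usage sketch: iterate over the lines of an HTTP response from
# /api/v1/workflows/{id}/stream and feed them to parse_sse.
```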
```bash
curl -X POST http://localhost:8000/api/v1/workflows/start \
  -F "images=@chest_xray_AP.jpg" \
  -F "images=@chest_xray_LAT.jpg" \
  -F "views=AP" \
  -F "views=LATERAL" \
  -F "patient_id=patient-123" \
  -F "fhir_report=@patient_fhir_bundle.json"
```

```json
{
  "session_id": "abc-123",
  "status": "completed",
  "final_result": {
    "diagnosis": "Right lower lobe pneumonia with associated pleural effusion",
    "confidence": 0.87,
    "reproducibility_hash": "a3f9c2e1b7d4082f...",
    "evidence_packet": { "...": "..." },
    "trace": ["[RAD] Findings generated", "[CHEXBERT] 3 labels found", "..."]
  }
}
```

All settings live in `app/config.py` and can be overridden via `.env`:
| Variable | Default | Description |
|---|---|---|
| `MEDGEMMA_4B_MODEL` | `google/medgemma-1.5-4b-it` | Base MedGemma model (no LoRA) |
| `MEDSIGLIP_BASE_MODEL` | `google/medsiglip-448` | Vision encoder backbone |
| `MEDSIGLIP_WEIGHTS_PATH` | `medsiglip_full_model.pt` | Local fine-tuned classifier checkpoint |
| `TEXT_EMBEDDING_MODEL` | `sentence-transformers/all-MiniLM-L6-v2` | Semantic embeddings |
| `MOCK_MODELS` | `False` | Skip real models, use mocks |
| Variable | Default | Description |
|---|---|---|
| `DATABASE_MODE` | `sqlite` | `sqlite` (local) or `supabase` (cloud) |
| `DEBATE_MAX_ROUNDS` | `3` | Maximum debate rounds |
| `DEBATE_CONSENSUS_THRESHOLD` | `0.15` | Max disagreement delta for consensus |
| `MAX_ROUTING_STEPS` | `5` | Prevent infinite routing loops |
| `ENABLE_LLM_CRITIC` | `False` | Enable MedGemma semantic critic pass |
| `ENABLE_PAST_MISTAKES_MEMORY` | `True` | Historical error retrieval at Critic |
| `ENABLE_DOCTOR_FEEDBACK` | `True` | Enable feedback reprocessing loop |
| `USE_DEBATE_WORKFLOW` | `True` | Enable multi-agent debate |
| `USE_PARALLEL_AGENTS` | `True` | Run Historian + Literature in parallel |
| Variable | Required | Description |
|---|---|---|
| `HUGGINGFACE_TOKEN` | ✅ | Access gated HuggingFace models |
| `NCBI_EMAIL` | ✅ | PubMed API policy requirement |
| `NCBI_API_KEY` | Recommended | Higher rate limits (10 req/s) |
| `SEMANTIC_SCHOLAR_API_KEY` | Optional | Semantic Scholar access |
| `SUPABASE_URL` / `SUPABASE_KEY` | Optional | Cloud database logging |
A single LLM producing diagnoses has no internal checks — it can hallucinate confidently. VERIFAI uses structured adversarial verification:
```mermaid
graph LR
    A["Radiologist\n(may hallucinate)"] -->|Free-text report| B["CheXbert\n(independent label check)"]
    B -->|14 structured labels| C["Critic\n(challenges confidence)"]
    A -->|Impression| D["Literature\n(evidence check)"]
    A -->|Impression| E["Historian\n(FHIR cross-check)"]
    C & D & E -->|Multi-agent debate| F["Validator\n(3-layer quality gate)"]
    F --> G["Final Diagnosis\n(with SHA-256 audit trail)"]
```
Each agent provides an independent, adversarial check on the previous. No single model can silently propagate a hallucination through all layers.
- Typed state (`TypedDict`) shared across all agents — no message-passing overhead
- Checkpointing — workflow can be interrupted and resumed mid-run (critical for human review)
- Deterministic routing — graph edges, not LLM-decided next steps
- Thread safety — concurrent workflows with fully isolated state
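A minimal sketch of what a typed shared state and a node update look like (the field names are illustrative; the real definition is `VerifaiState` in `graph/state.py`):

```python
from typing import List, TypedDict

class StateSketch(TypedDict, total=False):
    """Illustrative subset of the shared workflow state."""
    session_id: str
    findings: str
    system_uncertainty: float
    trace: List[str]

def radiologist_node(state: StateSketch) -> StateSketch:
    # Each graph node reads the shared state and returns a partial update
    # that the orchestrator merges back in, so there is no message passing.
    return {
        "findings": "No acute cardiopulmonary process.",
        "trace": state.get("trace", []) + ["[RAD] Findings generated"],
    }
```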
FDA 21 CFR Part 11 requires electronic records to be auditable. The SHA-256 hash encodes which image was analyzed, what patient context was used, what model versions were active, and what configuration was set at the time of diagnosis. This provides provenance — not exact reproduction (LLMs are stochastic, but the context that produced the output is traceable).
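As a sketch of such a fingerprint (the exact fields VERIFAI hashes are not spelled out here, so these are illustrative):

```python
import hashlib
import json

def reproducibility_hash(image_bytes: bytes, fhir_bundle: dict,
                         config: dict) -> str:
    """SHA-256 over the image bytes plus canonical (key-sorted) JSON of
    the patient context and active configuration: same inputs, same hash."""
    digest = hashlib.sha256()
    digest.update(image_bytes)
    digest.update(json.dumps(fhir_bundle, sort_keys=True).encode("utf-8"))
    digest.update(json.dumps(config, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()
```

Key-sorting the JSON makes the fingerprint insensitive to dict ordering, so two runs over the same context always produce the same hash.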
This project is for research and educational purposes. Clinical deployment requires regulatory review and validation.