# Enhancing Translational Abilities of Longitudinal Mental Health Applications: An Adaptive Approach to Idiographic Modelling by Leveraging Ontology-based Agentic AI
PHOENIX (Personalized Hierarchical Optimization Engine for Navigating Insightful eXplorations) conceptualises mental health support as a closed-loop workflow that iteratively optimizes the digital intervention proposal based on multi-modal data from previous collection cycles.
- 🏛️ Academic Context
- 🧭 PHOENIX Scope
- 🔁 End-to-End Stage Map
- 🐦🔥 PHOENIX Ontology
- 🏗️ Technical Architecture
- 🚀 Quick Setup
- 🗂️ Repository Structure
- 💻 Run from CLI
- 🖥️ Run from Frontend
- 📦 Outputs and Validation
- 📊 Survey Evaluation Framework
- ✅ Quality Assurance and CI/CD
- 🐳 Docker
- 📜 License
## 🏛️ Academic Context
This research-grade software is being developed for a Ghent University master's thesis that aims to enhance the clinical translation abilities of longitudinal mental health applications through an adaptive approach to idiographic modelling using ontology-based multi-agentic workflows.
| Field | Value |
|---|---|
| Institution | Ghent University |
| Author | Stijn Van Severen |
| Supervisors | Geert Crombez, Annick De Paepe |
## 🧭 PHOENIX Scope
PHOENIX separates two concerns:
- Core engine flow: clinical/analytic decision flow from intake to iterative model carry-over.
- Research support flow: visualization, QA, and research reporting for validation and communication.
This separation keeps scientific validation transparent without mixing support tasks into core decision logic.
PHOENIX is a modular, multi-agent system that starts from free-text complaints, builds an initial observation model, analyzes time-series dynamics, proposes targets/interventions, and packages iterative updates for the next cycle.
## 🐦🔥 PHOENIX Ontology
The following ontologies were developed to support the PHOENIX engine's reasoning and decision-making processes. All stages are constrained by five stable ontologies that enforce structural guarantees across the full pipeline:
| Ontology | Role | Source |
|---|---|---|
| CRITERION | Operationalized mental health variables (DSM-5-TR, RDoC) | src/backend/SystemComponents/PHOENIX_ontology/separate/CRITERION/ |
| PREDICTOR | Hierarchically structured treatment-solution entities used to model actionable intervention pathways and candidate refinement space | src/backend/SystemComponents/PHOENIX_ontology/separate/PREDICTOR/ |
| PERSON | Individual characteristics (demographics, comorbidity, history) | src/backend/SystemComponents/PHOENIX_ontology/separate/PERSON/ |
| CONTEXT | Situational and environmental factors | src/backend/SystemComponents/PHOENIX_ontology/separate/CONTEXT/ |
| HAPA | Health Action Process Approach (barriers, coping, phases) | src/backend/SystemComponents/PHOENIX_ontology/separate/HAPA/ |
## 🔁 End-to-End Stage Map
In the real integrated pipeline, all decision-making stages use live LLM reasoning in normal operation, but they do so with different scaffolds. Step 01 combines always-on complaint decomposition, a local LLM critic loop, hybrid ontology retrieval, and optional final leaf adjudication, while later stages use explicit actor-critic loops with bounded iterations and heuristic fallback only for deterministic or degraded runs.
| Integrated Step | Runtime Component | Live LLM Use | Critic Dimensions | Core Runtime Method |
|---|---|---|---|---|
| 01 | Complaint Operationalization Agent | Yes, always-on for free-text decomposition and local critic review; optional again for final leaf adjudication | schema_validity, coverage_grounding, atomicity_nonoverlap, granularity_fit, current_actionability | LLM decomposition + local LLM critic refinement + HTSSF retrieval (dense + BM25 + token overlap + fuzzy) |
| 02 | Initial Observation Model Constructor | Yes, real-time HyDE generation and structured model construction | predictor_grounding, criterion_continuity, ontology_strictness, evidence_quality | HyDE-based predictor RAG + actor-critic refinement |
| 03 | Treatment Target Identifier | Yes, real-time structured actor output when enabled | safety, domain_boundary, lineage_consistency | BFS candidate selector + idiographic-nomothetic fusion |
| 04 | Updated Observation Model Constructor | Yes, real-time structured actor output when enabled | safety, domain_boundary, lineage_consistency | BFS-guided hierarchical model update with ontology-constrained refinement |
| 05 | HAPA Intervention Mapper | Yes, real-time structured intervention generation when enabled | reasoning_quality, evidence_grounding, hapa_consistency, medical_safety | Barrier scoring: 0.60·predictor + 0.20·profile + 0.15·context + 0.05·complaint |
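The Step 05 barrier-scoring composite in the table above can be sketched directly from its weights; the function name and example inputs are illustrative, only the weights come from the stage map:

```python
def barrier_score(predictor: float, profile: float,
                  context: float, complaint: float) -> float:
    """Weighted barrier composite for the HAPA Intervention Mapper (Step 05).

    Weights taken from the stage map; inputs are assumed to be
    normalized signal strengths in [0, 1].
    """
    return 0.60 * predictor + 0.20 * profile + 0.15 * context + 0.05 * complaint

# Example: a barrier dominated by the predictor signal
score = barrier_score(predictor=0.9, profile=0.5, context=0.4, complaint=0.2)
```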
Runtime interpretation:
- Step 01 is not just retrieval: it always begins with LLM-based complaint decomposition, with no non-LLM fallback for that decomposition phase, and then runs a local structured LLM critic loop to review complaint coverage, granularity, overlap, and present-state actionability before mapping to ontology leaves.
- Step 01 still uses hybrid retrieval for ontology grounding, and can optionally run an additional LLM adjudication pass to choose the final leaf or return `UNMAPPED`.
- Steps 02, 03, 04, and 05 use live LLM calls in normal operation, with component-specific actor-critic prompts, bounded iterations, and heuristic fallback paths for deterministic or degraded runs.
- The intervention module is Step 05 in the actual integrated pipeline, even if some earlier summaries compressed it into a four-stage abstraction.
## 🏗️ Technical Architecture
Optional DAG orchestrator (`src/backend/orchestrator.py`): for complex tasks, a flexible orchestrator creates DAG-based parallel/sequential execution plans; otherwise the pipeline runs sequentially (the primary evaluation path).
Quantitative backbone bridging EMA data to adaptive model weighting:
- Readiness classifier — stationarity (ADF/KPSS), collinearity, effective sample size → tier selection (tv-gVAR / gVAR / GGM / correlation / descriptives)
- Network time-series analyst — kernel-smoothed VAR(1), L1-penalized stationary gVAR, partial correlations (Ledoit-Wolf shrinkage), time-varying GIF animations
- Momentary impact quantifier — leave-one-predictor-out MSE delta + coefficient magnitude composite
- BFS candidate selector —
score = 0.45·mapping + 0.25·HyDE + 0.20·idiographic_anchor + 0.10·domain_bonus
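The BFS selector's composite above can be expressed as a small ranking helper; candidate names and input values are hypothetical, only the weights come from the formula:

```python
def bfs_candidate_score(mapping: float, hyde: float,
                        idiographic_anchor: float, domain_bonus: float) -> float:
    """Composite ranking score for BFS candidate selection."""
    return (0.45 * mapping + 0.25 * hyde
            + 0.20 * idiographic_anchor + 0.10 * domain_bonus)

# Hypothetical candidates ranked by composite score
candidates = {
    "sleep_hygiene": bfs_candidate_score(0.8, 0.6, 0.9, 0.0),
    "social_support": bfs_candidate_score(0.5, 0.7, 0.4, 0.1),
}
best = max(candidates, key=candidates.get)
```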
Adaptive idiographic-nomothetic weighting per cycle:
idiographic_weight = clamp(0.30 + 0.50 × readiness_score / 100)
nomothetic_weight = 1.0 - idiographic_weight
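A minimal sketch of the per-cycle weighting, assuming readiness_score ranges over 0-100 and that the clamp bounds are [0.0, 1.0] (the exact bounds are not stated here):

```python
def clamp(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    return max(lo, min(hi, x))

def cycle_weights(readiness_score: float) -> tuple:
    """Map a 0-100 readiness score to (idiographic, nomothetic) weights."""
    idiographic = clamp(0.30 + 0.50 * readiness_score / 100)
    return idiographic, 1.0 - idiographic

idio, nomo = cycle_weights(60.0)  # a moderately ready cycle
```

Since the raw value already lies in [0.30, 0.80] for in-range scores, the clamp only guards against out-of-range readiness inputs.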
PHOENIX implements a breadth-first iterative algorithm across cycles:
- Cycle N produces: criterion leaf, initial model, pseudodata, HUA results, treatment targets, HAPA intervention
- Cycle N+1 seeds from Cycle N via a history ledger: impact scores → `idiographic_anchor` in BFS; prior cycle scores modulate `domain_bonus`; `composite_score = 0.35·similarity + 0.25·impact[N] + 0.15·target_scores + 0.10·priority_scores + 0.15·quality_scores`
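The history-ledger composite can likewise be sketched as a weighted sum; the argument names are illustrative, only the weights come from the formula above:

```python
def carryover_composite(similarity: float, impact_prev: float,
                        target: float, priority: float, quality: float) -> float:
    """Composite used to seed Cycle N+1 from Cycle N's history ledger.

    `impact_prev` stands for impact[N], the previous cycle's impact score.
    """
    return (0.35 * similarity + 0.25 * impact_prev
            + 0.15 * target + 0.10 * priority + 0.15 * quality)

score = carryover_composite(0.8, 0.6, 0.5, 0.4, 0.7)
```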
## 🚀 Quick Setup
```
git clone https://github.com/stvsever/ThesisMaster.git
cd MASTERPROEF
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
```
Create or update `.env` in the repository root:
```
OPENROUTER_API_KEY=<your_openrouter_key>
OPENAI_BASE_URL=https://openrouter.ai/api/v1
```
Runtime behavior:
- `OPENROUTER_API_KEY` is primary.
- Runtime mirrors it to `OPENAI_API_KEY` for backward-compatible scripts.
- Default model is `gpt-5-nano` (resolved as `openai/gpt-5-nano` when routed via OpenRouter).
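The key-mirroring behavior can be illustrated with a short sketch; the function name is hypothetical and the runtime's actual resolution logic may differ:

```python
import os

def resolve_llm_credentials() -> None:
    """Mirror OPENROUTER_API_KEY to OPENAI_API_KEY for backward-compatible scripts."""
    key = os.environ.get("OPENROUTER_API_KEY")
    if key and not os.environ.get("OPENAI_API_KEY"):
        os.environ["OPENAI_API_KEY"] = key
    # Default to the OpenRouter endpoint unless a base URL is already set.
    os.environ.setdefault("OPENAI_BASE_URL", "https://openrouter.ai/api/v1")

os.environ["OPENROUTER_API_KEY"] = "sk-demo"  # placeholder key for the sketch
os.environ.pop("OPENAI_API_KEY", None)
os.environ.pop("OPENAI_BASE_URL", None)
resolve_llm_credentials()
```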
If you want to quickly validate the integrated pipeline on a single profile with minimal iterations, you can run the smoke test:
```
make pipeline-smoke
```
A client-side graph creator (GitNexus) was used to generate a comprehensive knowledge graph of the entire codebase; its component interactions are provided below:
## 🗂️ Repository Structure
The main codebase is organized around `src/` and `evaluation/`. Inside `src/`, the canonical runtime split is `src/frontend/` for the Flask application and `src/backend/` for the engine, ontologies, shared runtime utilities, and architecture assets.
```
MASTERPROEF/
├── src/                        # Canonical application source tree
│   ├── backend/                # Engine runtime, SystemComponents, utils, orchestrator, overview assets
│   ├── frontend/               # Flask app, UI routes, runtime workspace integration
│   ├── __init__.py             # Package root for `src.frontend` and `src.backend`
│   └── README.md               # Architecture overview for the `src/` tree
├── evaluation/                 # Sequential scripts + integrated pipeline + QA/research
│   ├── sequential/             # Stage-wise run_step.py scripts (00..08)
│   ├── integrated_pipeline/    # run_pipeline.py and run_engine_pipeline.py
│   ├── survey_analysis/        # 7-study evaluation framework with analysis scripts
│   └── quality_and_research/   # pytest suites, schema contracts, research reporting
├── docker/                     # Dockerfile + docker-compose for reproducible deployment
├── .github/                    # CI/CD workflows
├── pyproject.toml              # Python package metadata and constraints
├── requirements.txt            # Dependency baseline
└── README.md                   # Root documentation
```
## 💻 Run from CLI
The following command executes the full PHOENIX pipeline with default settings, processing the synthetic_v1 dataset through all stages and generating comprehensive outputs:
```
python evaluation/integrated_pipeline/run_pipeline.py --mode synthetic_v1
```
The following command runs the pipeline on the synthetic_v1 dataset but limits execution to a single profile matching the pattern `pseudoprofile_FTC_ID001`, which allows focused testing and debugging of a specific case:
```
python evaluation/integrated_pipeline/run_pipeline.py --mode synthetic_v1 \
    --pattern pseudoprofile_FTC_ID001 \
    --max-profiles 1
```
The following command executes the PHOENIX pipeline for 2 complete cycles, allowing you to observe how the system iteratively refines its outputs based on previous cycle data. The `--profile-memory-window 3` flag lets the system retain information from the last 3 profiles for informed decision-making in subsequent cycles:
```
python evaluation/integrated_pipeline/run_pipeline.py --mode synthetic_v1 \
    --cycles 2 \
    --profile-memory-window 3
```
To run deterministically without live LLM calls, add the `--disable-llm` flag:
```
python evaluation/integrated_pipeline/run_pipeline.py --mode synthetic_v1 --disable-llm
```
Runtime note:
- If a cycle is `readiness_aligned` and only contemporaneous correlation analysis is feasible, PHOENIX now applies a correlation-baseline impact fallback so downstream Step-03/04/05 and communication stages still execute and persist outputs.
- If Step-02 model generation fails (for example, provider/network failure), PHOENIX now builds complaint-grounded fallback Step-02 artifacts directly from Step-01 operationalization output, instead of copying unrelated historical profile artifacts.
- For iterative cycles started via `--start-from-pseudodata`, PHOENIX now resolves `initial_model_runs_root` from the active run lineage (same run id) so Step-03/04 stay anchored to the current cycle history.
## 🖥️ Run from Frontend
Use the following command to start the Flask frontend:
```
python src/frontend/app.py
# or
python evaluation/integrated_pipeline/run_pipeline.py --ui
```
Then open http://127.0.0.1:5050.
Frontend provides:
- Intake for complaint/person/environment context
- Live component status and streaming logs
- One-click full end-to-end run from free-text complaint (with iterative cycle controls)
- Step-level run controls and advanced configuration toggles
- Wizard-style iterative execution: INTAKE → MODEL → DATA → ANALYSIS → INTERVENTION → MODEL (cycle N+1)
- Interactive Chart.js dashboard — all visualizations are dynamic (no static PNGs in UI)
- Canvas-based animated network visualization with per-frame scrubbing
- Session persistence and cohort batch execution
## 📦 Outputs and Validation
Integrated outputs are saved under:
```
evaluation/integrated_pipeline/runs/<run_id>/
```
Key artifacts to inspect:
- `00_operationalization/` through `10_research_reports/`
- `pipeline_summary.json`
- `llm_startup_health_check.json`
- Stage logs (`stage.log`, `stage_events.jsonl`, `stage_trace.json`)
- Profile-specific JSON/CSV outputs per step
- Profile-specific human-readable summaries: `07_hapa_digital_intervention/<profile_id>/step05_hapa_intervention.md` and `08_treatment_translation_communication/<profile_id>/treatment_translation_communication.md`
- Time-varying network animation: `04_time_series_analysis/<profile_id>/tv_network_animation.gif`
- Publication-ready PNGs: `09_impact_visualizations/<profile_id>/` (for human healthcare expert comparison)
## 📊 Survey Evaluation Framework
The `evaluation/survey_analysis/` directory contains a complete 7-study statistical evaluation framework:
| Study | Name | Participants | Method |
|---|---|---|---|
| 00 | Momentary Impact Quantification | 30 non-experts | Repeated-measures mixed model on Spearman footrule |
| 01 | Operationalization | 10 HCPs | Dimension-wise crossed mixed models + Bonferroni |
| 02 | Initial Observational Model | 10 HCPs | Dimension-wise crossed mixed models + Bonferroni |
| 03 | Treatment Target Identification | 30 non-experts | Repeated-measures mixed model on Spearman footrule |
| 04 | Updated Observational Model | 10 HCPs | Dimension-wise crossed mixed models + Bonferroni |
| 05 | Tailored Intervention | 5 HCP generators + 30 lay raters | Dimension-wise crossed mixed models + Bonferroni |
| 06 | Holistic Pipeline Quality | Aggregate | Study-adjusted crossed mixed model with participant, task, and answer-block clustering |
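Studies 00 and 03 score agreement between rankings with the Spearman footrule; a minimal sketch of that distance (not the framework's own implementation):

```python
def spearman_footrule(ranking_a: list, ranking_b: list) -> int:
    """Sum of absolute rank displacements between two rankings of the same items."""
    position_b = {item: i for i, item in enumerate(ranking_b)}
    return sum(abs(i - position_b[item]) for i, item in enumerate(ranking_a))

# Fully reversed three-item rankings differ by |0-2| + |1-1| + |2-0| = 4
d = spearman_footrule(["a", "b", "c"], ["c", "b", "a"])
```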
In short, the survey framework now treats the evaluation as a repeated-measures problem rather than a collection of independent ratings. The holistic study pools studies 01, 02, 04, and 05, adjusts for study and dimension, and includes participant-level, task-level, and answer-block dependence so the PHOENIX-versus-healthcare-expert comparison is statistically aligned with how the data are actually generated.
For feasibility, PHOENIX assumes participant-pool reuse where methodologically acceptable rather than fully new recruitment for every study. In practice, the same healthcare-expert pool is intended to be reused across the expert-facing studies (01, 02, 04, and the expert-generation part of 05), and the same non-expert pool can be reused across the layperson-facing studies (00, 03, and the lay-rating part of 05) when burden and scheduling allow.
Run all studies:
```
bash evaluation/survey_analysis/run_all_studies.sh
```
Results are saved under `evaluation/survey_analysis/results/study_XX_*/` with publication-ready PNGs and statistical reports.
## ✅ Quality Assurance and CI/CD
Run locally:
```
make qa-unit
make qa-integration
make qa-smoke
make qa-all
```
Automated workflows:
- `.github/workflows/ci.yml`
- `.github/workflows/smoke_pipeline.yml`
Schema/contract validation entrypoint:
```
evaluation/quality_and_research/quality_assurance/validate_contract_schemas.py
```
Contract validation: 7 JSON schemas enforce structural guarantees on every stage output: `readiness_report`, `network_comparison_summary`, `momentary_impact`, `step03_target_selection`, `step04_updated_model`, `step05_hapa_intervention`, `pipeline_summary`.
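Contract checking of stage outputs can be sketched with a stdlib-only required-keys validator; the field names below are illustrative assumptions, not the project's actual schemas, which the entrypoint script enforces as full JSON Schemas:

```python
import json

# Hypothetical minimal contract for pipeline_summary (illustrative field names).
PIPELINE_SUMMARY_CONTRACT = {"run_id": str, "profiles": list, "stages_completed": int}

def check_contract(artifact: dict, contract: dict) -> list:
    """Return violations: missing keys or wrong value types."""
    errors = []
    for key, expected_type in contract.items():
        if key not in artifact:
            errors.append(f"missing key: {key}")
        elif not isinstance(artifact[key], expected_type):
            errors.append(f"wrong type for {key}: {type(artifact[key]).__name__}")
    return errors

violations = check_contract(json.loads('{"run_id": "r1", "profiles": []}'),
                            PIPELINE_SUMMARY_CONTRACT)
```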
## 🐳 Docker
PHOENIX ships with a ready-to-use Docker configuration for reproducible execution:
```
git clone https://github.com/stvsever/ThesisMaster.git
cd MASTERPROEF
# Optional for LLM-enabled runs; deterministic mode can skip this.
cat > .env <<'EOF'
OPENROUTER_API_KEY=<your_openrouter_key>
OPENAI_BASE_URL=https://openrouter.ai/api/v1
EOF
cd docker
docker compose up --build
```
This starts the Flask frontend on http://127.0.0.1:5050. The Docker setup bundles the project dependencies, mounts integrated-pipeline outputs back to the host, and also supports CLI runs through the `phoenix-cli` service. See `docker/README.md` for the full workflow.
## 📜 License
This project is licensed under the GNU General Public License v3.0. See `LICENSE`.
What this means in practice:
- You may use, study, modify, and redistribute this code.
- If you distribute modified versions (or software that includes GPL-covered parts), you must:
- keep it under GPL-compatible terms,
- provide corresponding source code,
- preserve copyright and license notices,
- document meaningful changes.
- The software is provided without warranty.
For academic reuse, cite the thesis context appropriately and keep provenance of methodological changes explicit.
> [!CAUTION]
> **EU MDR / PRE-CLINICAL DISCLAIMER**: PHOENIX is a Clinical Decision Support System (CDSS) prototype designed for research purposes. It is NOT a certified medical device under the EU Medical Device Regulation (MDR 2017/745) or FDA guidelines. Do not use it for primary diagnostic decisions. All outputs must be verified by a qualified clinician.


