# Enhancing Translational Abilities of Longitudinal Mental Health Applications: An Adaptive Approach to Idiographic Modelling by Leveraging Ontology-based Agentic AI
PHOENIX (Personalized Hierarchical Optimization Engine for Navigating Insightful eXplorations) conceptualises mental health support as a closed-loop workflow that iteratively optimizes the digital intervention proposal based on multi-modal data from previous collection cycles.
- 🏛️ Academic Context
- 🧭 PHOENIX Scope
- 🔁 End-to-End Stage Map
- 🐦🔥 PHOENIX Ontology
- 🏗️ Technical Architecture
- 🚀 Quick Setup
- 🗂️ Repository Structure
- 💻 Run from CLI
- 🖥️ Run from Frontend
- 📦 Outputs and Validation
- 📊 Survey Evaluation Framework
- ✅ Quality Assurance and CI/CD
- 🐳 Docker
- 📜 License
## 🏛️ Academic Context
This research-grade software is being developed for a Ghent University master's thesis that aims to enhance the clinical translation abilities of longitudinal mental health applications through an adaptive approach to idiographic modelling using ontology-based multi-agentic workflows.
| Field | Value |
|---|---|
| Institution | Ghent University |
| Author | Stijn Van Severen |
| Supervisors | Geert Crombez, Annick De Paepe |
## 🧭 PHOENIX Scope
PHOENIX separates two concerns:
- Core engine flow: clinical/analytic decision flow from intake to iterative model carry-over.
- Research support flow: visualization, QA, and research reporting for validation and communication.
This separation keeps scientific validation transparent without mixing support tasks into core decision logic.
PHOENIX is a modular, multi-agent system that starts from free-text complaints, builds an initial observation model, analyzes time-series dynamics, proposes targets/interventions, and packages iterative updates for the next cycle.
## 🐦🔥 PHOENIX Ontology
The following ontologies were developed to support the PHOENIX engine's reasoning and decision-making processes. All stages are constrained by five stable ontologies that enforce structural guarantees across the full pipeline:
| Ontology | Role | Source |
|---|---|---|
| CRITERION | Operationalized mental health variables (DSM-5-TR, RDoC) | src/backend/SystemComponents/PHOENIX_ontology/separate/CRITERION/ |
| PREDICTOR | Hierarchically structured treatment-solution entities used to model actionable intervention pathways and candidate refinement space | src/backend/SystemComponents/PHOENIX_ontology/separate/PREDICTOR/ |
| PERSON | Individual characteristics (demographics, comorbidity, history) | src/backend/SystemComponents/PHOENIX_ontology/separate/PERSON/ |
| CONTEXT | Situational and environmental factors | src/backend/SystemComponents/PHOENIX_ontology/separate/CONTEXT/ |
| HAPA | Health Action Process Approach (barriers, coping, phases) | src/backend/SystemComponents/PHOENIX_ontology/separate/HAPA/ |
## 🔁 End-to-End Stage Map
In the real integrated pipeline, all decision-making stages use live LLM reasoning in normal operation, but they do so with different scaffolds. Step 01 combines always-on complaint decomposition, a local LLM critic loop, hybrid ontology retrieval, and optional final leaf adjudication, while later stages use explicit actor-critic loops with bounded iterations and heuristic fallback only for deterministic or degraded runs.
| Integrated Step | Runtime Component | Live LLM Use | Critic Dimensions | Core Runtime Method |
|---|---|---|---|---|
| 01 | Complaint Operationalization Agent | Yes, always-on for free-text decomposition and local critic review; optional again for final leaf adjudication | schema_validity, coverage_grounding, atomicity_nonoverlap, granularity_fit, current_actionability | LLM decomposition + local LLM critic refinement + HTSSF retrieval (dense + BM25 + token overlap + fuzzy) |
| 02 | Initial Observation Model Constructor | Yes, real-time HyDE generation and structured model construction | predictor_grounding, criterion_continuity, ontology_strictness, evidence_quality | HyDE-based predictor RAG + actor-critic refinement |
| 03 | Treatment Target Identifier | Yes, real-time structured actor output when enabled | safety, domain_boundary, lineage_consistency | BFS candidate selector + idiographic-nomothetic fusion |
| 04 | Updated Observation Model Constructor | Yes, real-time structured actor output when enabled | safety, domain_boundary, lineage_consistency | BFS-guided hierarchical model update with ontology-constrained refinement |
| 05 | HAPA Intervention Mapper | Yes, real-time structured intervention generation when enabled | reasoning_quality, evidence_grounding, hapa_consistency, medical_safety | Barrier scoring: 0.60·predictor + 0.20·profile + 0.15·context + 0.05·complaint |
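The Step 05 barrier-scoring composite in the table above can be sketched directly from its weights; the function name and example inputs are illustrative, only the weights come from the stage map:

```python
def barrier_score(predictor: float, profile: float,
                  context: float, complaint: float) -> float:
    """Weighted barrier composite for the HAPA Intervention Mapper (Step 05).

    Weights taken from the stage map; inputs are assumed to be
    normalized signal strengths in [0, 1].
    """
    return 0.60 * predictor + 0.20 * profile + 0.15 * context + 0.05 * complaint

# Example: a barrier dominated by the predictor signal
score = barrier_score(predictor=0.9, profile=0.5, context=0.4, complaint=0.2)
```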
Runtime interpretation:
- Step 01 is not just retrieval: it always begins with LLM-based complaint decomposition, with no non-LLM fallback for that decomposition phase, and then runs a local structured LLM critic loop to review complaint coverage, granularity, overlap, and present-state actionability before mapping to ontology leaves.
- Step 01 still uses hybrid retrieval for ontology grounding, and can optionally run an additional LLM adjudication pass to choose the final leaf or return `UNMAPPED`.
- Steps 02, 03, 04, and 05 use live LLM calls in normal operation, with component-specific actor-critic prompts, bounded iterations, and heuristic fallback paths for deterministic or degraded runs.
- The intervention module is Step 05 in the actual integrated pipeline, even if some earlier summaries compressed it into a four-stage abstraction.
## 🏗️ Technical Architecture
Optional DAG orchestrator (`src/backend/orchestrator.py`): for complex tasks, a flexible orchestrator creates DAG-based parallel/sequential execution plans; otherwise the pipeline runs sequentially (the primary evaluation path).
Quantitative backbone bridging EMA data to adaptive model weighting:
- Readiness classifier — stationarity (ADF/KPSS), collinearity, effective sample size → tier selection (tv-gVAR / gVAR / GGM / correlation / descriptives)
- Network time-series analyst — kernel-smoothed VAR(1), L1-penalized stationary gVAR, partial correlations (Ledoit-Wolf shrinkage), time-varying GIF animations
- Momentary impact quantifier — leave-one-predictor-out MSE delta + coefficient magnitude composite
- BFS candidate selector —
score = 0.45·mapping + 0.25·HyDE + 0.20·idiographic_anchor + 0.10·domain_bonus
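The BFS selector's composite above can be expressed as a small ranking helper; candidate names and input values are hypothetical, only the weights come from the formula:

```python
def bfs_candidate_score(mapping: float, hyde: float,
                        idiographic_anchor: float, domain_bonus: float) -> float:
    """Composite ranking score for BFS candidate selection."""
    return (0.45 * mapping + 0.25 * hyde
            + 0.20 * idiographic_anchor + 0.10 * domain_bonus)

# Hypothetical candidates ranked by composite score
candidates = {
    "sleep_hygiene": bfs_candidate_score(0.8, 0.6, 0.9, 0.0),
    "social_support": bfs_candidate_score(0.5, 0.7, 0.4, 0.1),
}
best = max(candidates, key=candidates.get)
```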
Adaptive idiographic-nomothetic weighting per cycle:
idiographic_weight = clamp(0.30 + 0.50 × readiness_score / 100)
nomothetic_weight = 1.0 - idiographic_weight
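A minimal sketch of the per-cycle weighting, assuming readiness_score ranges over 0-100 and that the clamp bounds are [0.0, 1.0] (the exact bounds are not stated here):

```python
def clamp(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    return max(lo, min(hi, x))

def cycle_weights(readiness_score: float) -> tuple:
    """Map a 0-100 readiness score to (idiographic, nomothetic) weights."""
    idiographic = clamp(0.30 + 0.50 * readiness_score / 100)
    return idiographic, 1.0 - idiographic

idio, nomo = cycle_weights(60.0)  # a moderately ready cycle
```

Since the raw value already lies in [0.30, 0.80] for in-range scores, the clamp only guards against out-of-range readiness inputs.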
PHOENIX implements a breadth-first iterative algorithm across cycles:
- Cycle N produces: criterion leaf, initial model, pseudodata, HUA results, treatment targets, HAPA intervention
- Cycle N+1 seeds from Cycle N via a history ledger: impact scores → `idiographic_anchor` in BFS; prior cycle scores modulate `domain_bonus`; `composite_score = 0.35·similarity + 0.25·impact[N] + 0.15·target_scores + 0.10·priority_scores + 0.15·quality_scores`
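The history-ledger composite can likewise be sketched as a weighted sum; the argument names are illustrative, only the weights come from the formula above:

```python
def carryover_composite(similarity: float, impact_prev: float,
                        target: float, priority: float, quality: float) -> float:
    """Composite used to seed Cycle N+1 from Cycle N's history ledger.

    `impact_prev` stands for impact[N], the previous cycle's impact score.
    """
    return (0.35 * similarity + 0.25 * impact_prev
            + 0.15 * target + 0.10 * priority + 0.15 * quality)

score = carryover_composite(0.8, 0.6, 0.5, 0.4, 0.7)
```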
## 🚀 Quick Setup
```
git clone https://github.com/stvsever/ThesisMaster.git
cd MASTERPROEF
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
```
Create or update `.env` in the repository root:
```
OPENROUTER_API_KEY=<your_openrouter_key>
OPENAI_BASE_URL=https://openrouter.ai/api/v1
```
Runtime behavior:
- `OPENROUTER_API_KEY` is primary.
- Runtime mirrors it to `OPENAI_API_KEY` for backward-compatible scripts.
- Default model is `gpt-5-nano` (resolved as `openai/gpt-5-nano` when routed via OpenRouter).
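The key-mirroring behavior can be illustrated with a short sketch; the function name is hypothetical and the runtime's actual resolution logic may differ:

```python
import os

def resolve_llm_credentials() -> None:
    """Mirror OPENROUTER_API_KEY to OPENAI_API_KEY for backward-compatible scripts."""
    key = os.environ.get("OPENROUTER_API_KEY")
    if key and not os.environ.get("OPENAI_API_KEY"):
        os.environ["OPENAI_API_KEY"] = key
    # Default to the OpenRouter endpoint unless a base URL is already set.
    os.environ.setdefault("OPENAI_BASE_URL", "https://openrouter.ai/api/v1")

os.environ["OPENROUTER_API_KEY"] = "sk-demo"  # placeholder key for the sketch
os.environ.pop("OPENAI_API_KEY", None)
os.environ.pop("OPENAI_BASE_URL", None)
resolve_llm_credentials()
```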
If you want to quickly validate the integrated pipeline on a single profile with minimal iterations, you can run the smoke test:
```
make pipeline-smoke
```
A client-side graph creator (GitNexus) was used to generate a comprehensive knowledge graph of the entire codebase; its component interactions are provided below:
## 🗂️ Repository Structure
The main codebase is organized around `src/` and `evaluation/`. Inside `src/`, the canonical runtime split is `src/frontend/` for the Flask application and `src/backend/` for the engine, ontologies, shared runtime utilities, and architecture assets.
```
MASTERPROEF/
├── src/                        # Canonical application source tree
│   ├── backend/                # Engine runtime, SystemComponents, utils, orchestrator, overview assets
│   ├── frontend/               # Flask app, UI routes, runtime workspace integration
│   ├── __init__.py             # Package root for `src.frontend` and `src.backend`
│   └── README.md               # Architecture overview for the `src/` tree
├── evaluation/                 # Sequential scripts + integrated pipeline + QA/research
│   ├── sequential/             # Stage-wise run_step.py scripts (00..08)
│   ├── integrated_pipeline/    # run_pipeline.py and run_engine_pipeline.py
│   ├── survey_analysis/        # 7-study evaluation framework with analysis scripts
│   └── quality_and_research/   # pytest suites, schema contracts, research reporting
├── docker/                     # Dockerfile + docker-compose for reproducible deployment
├── .github/                    # CI/CD workflows
├── pyproject.toml              # Python package metadata and constraints
├── requirements.txt            # Dependency baseline
└── README.md                   # Root documentation
```
## 💻 Run from CLI
The following command executes the full PHOENIX pipeline with default settings, processing the synthetic_v1 dataset through all stages and generating comprehensive outputs:
```
python evaluation/integrated_pipeline/run_pipeline.py --mode synthetic_v1
```
The following command runs the pipeline on the synthetic_v1 dataset but limits execution to a single profile matching the pattern `pseudoprofile_FTC_ID001`, which allows focused testing and debugging of a specific case:
```
python evaluation/integrated_pipeline/run_pipeline.py --mode synthetic_v1 \
    --pattern pseudoprofile_FTC_ID001 \
    --max-profiles 1
```
The following command executes the PHOENIX pipeline for 2 complete cycles, allowing you to observe how the system iteratively refines its outputs based on previous cycle data. The `--profile-memory-window 3` flag lets the system retain information from the last 3 profiles for informed decision-making in subsequent cycles:
```
python evaluation/integrated_pipeline/run_pipeline.py --mode synthetic_v1 \
    --cycles 2 \
    --profile-memory-window 3
```
To run deterministically without live LLM calls, add the `--disable-llm` flag:
```
python evaluation/integrated_pipeline/run_pipeline.py --mode synthetic_v1 --disable-llm
```
Runtime note:
- If a cycle is `readiness_aligned` and only contemporaneous correlation analysis is feasible, PHOENIX now applies a correlation-baseline impact fallback so downstream Step-03/04/05 and communication stages still execute and persist outputs.
- If Step-02 model generation fails (for example, provider/network failure), PHOENIX now builds complaint-grounded fallback Step-02 artifacts directly from Step-01 operationalization output, instead of copying unrelated historical profile artifacts.
- For iterative cycles started via `--start-from-pseudodata`, PHOENIX now resolves `initial_model_runs_root` from the active run lineage (same run id) so Step-03/04 stay anchored to the current cycle history.
## 🖥️ Run from Frontend
Use the following command to start the Flask frontend:
```
python src/frontend/app.py
# or
python evaluation/integrated_pipeline/run_pipeline.py --ui
```
Then open http://127.0.0.1:5050.
Frontend provides:
- Intake for complaint/person/environment context
- Live component status and streaming logs
- One-click full end-to-end run from free-text complaint (with iterative cycle controls)
- Step-level run controls and advanced configuration toggles
- Wizard-style iterative execution: INTAKE → MODEL → DATA → ANALYSIS → INTERVENTION → MODEL (cycle N+1)
- Interactive Chart.js dashboard — all visualizations are dynamic (no static PNGs in UI)
- Canvas-based animated network visualization with per-frame scrubbing
- Session persistence and cohort batch execution
## 📦 Outputs and Validation
Integrated outputs are saved under:
```
evaluation/integrated_pipeline/runs/<run_id>/
```
Key artifacts to inspect:
- `00_operationalization/` through `10_research_reports/`
- `pipeline_summary.json`
- `llm_startup_health_check.json`
- Stage logs (`stage.log`, `stage_events.jsonl`, `stage_trace.json`)
- Profile-specific JSON/CSV outputs per step
- Profile-specific human-readable summaries: `07_hapa_digital_intervention/<profile_id>/step05_hapa_intervention.md` and `08_treatment_translation_communication/<profile_id>/treatment_translation_communication.md`
- Time-varying network animation: `04_time_series_analysis/<profile_id>/tv_network_animation.gif`
- Publication-ready PNGs: `09_impact_visualizations/<profile_id>/` (for human healthcare expert comparison)
## 📊 Survey Evaluation Framework
The `evaluation/survey_analysis/` directory contains a complete 7-study statistical evaluation framework:
| Study | Name | Participants | Method |
|---|---|---|---|
| 00 | Momentary Impact Quantification | 30 non-experts | Repeated-measures mixed model on Spearman footrule |
| 01 | Operationalization | 10 HCPs | Dimension-wise crossed mixed models + Bonferroni |
| 02 | Initial Observational Model | 10 HCPs | Dimension-wise crossed mixed models + Bonferroni |
| 03 | Treatment Target Identification | 30 non-experts | Repeated-measures mixed model on Spearman footrule |
| 04 | Updated Observational Model | 10 HCPs | Dimension-wise crossed mixed models + Bonferroni |
| 05 | Tailored Intervention | 5 HCP generators + 30 lay raters | Dimension-wise crossed mixed models + Bonferroni |
| 06 | Holistic Pipeline Quality | Aggregate | Study-adjusted crossed mixed model with participant, task, and answer-block clustering |
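Studies 00 and 03 score agreement between rankings with the Spearman footrule; a minimal sketch of that distance (not the framework's own implementation):

```python
def spearman_footrule(ranking_a: list, ranking_b: list) -> int:
    """Sum of absolute rank displacements between two rankings of the same items."""
    position_b = {item: i for i, item in enumerate(ranking_b)}
    return sum(abs(i - position_b[item]) for i, item in enumerate(ranking_a))

# Fully reversed three-item rankings differ by |0-2| + |1-1| + |2-0| = 4
d = spearman_footrule(["a", "b", "c"], ["c", "b", "a"])
```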
In short, the survey framework now treats the evaluation as a repeated-measures problem rather than a collection of independent ratings. The holistic study pools studies 01, 02, 04, and 05, adjusts for study and dimension, and includes participant-level, task-level, and answer-block dependence so the PHOENIX-versus-healthcare-expert comparison is statistically aligned with how the data are actually generated.
For feasibility, PHOENIX assumes participant-pool reuse where methodologically acceptable rather than fully new recruitment for every study. In practice, the same healthcare-expert pool is intended to be reused across the expert-facing studies (01, 02, 04, and the expert-generation part of 05), and the same non-expert pool can be reused across the layperson-facing studies (00, 03, and the lay-rating part of 05) when burden and scheduling allow.
Run all studies:
```
bash evaluation/survey_analysis/run_all_studies.sh
```
Results are saved under `evaluation/survey_analysis/results/study_XX_*/` with publication-ready PNGs and statistical reports.
## ✅ Quality Assurance and CI/CD
Run locally:
```
make qa-unit
make qa-integration
make qa-smoke
make qa-all
```
Automated workflows:
- `.github/workflows/ci.yml`
- `.github/workflows/smoke_pipeline.yml`
Schema/contract validation entrypoint:
```
evaluation/quality_and_research/quality_assurance/validate_contract_schemas.py
```
Contract validation: 7 JSON schemas enforce structural guarantees on every stage output: `readiness_report`, `network_comparison_summary`, `momentary_impact`, `step03_target_selection`, `step04_updated_model`, `step05_hapa_intervention`, `pipeline_summary`.
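Contract checking of stage outputs can be sketched with a stdlib-only required-keys validator; the field names below are illustrative assumptions, not the project's actual schemas, which the entrypoint script enforces as full JSON Schemas:

```python
import json

# Hypothetical minimal contract for pipeline_summary (illustrative field names).
PIPELINE_SUMMARY_CONTRACT = {"run_id": str, "profiles": list, "stages_completed": int}

def check_contract(artifact: dict, contract: dict) -> list:
    """Return violations: missing keys or wrong value types."""
    errors = []
    for key, expected_type in contract.items():
        if key not in artifact:
            errors.append(f"missing key: {key}")
        elif not isinstance(artifact[key], expected_type):
            errors.append(f"wrong type for {key}: {type(artifact[key]).__name__}")
    return errors

violations = check_contract(json.loads('{"run_id": "r1", "profiles": []}'),
                            PIPELINE_SUMMARY_CONTRACT)
```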
## 🐳 Docker
PHOENIX ships with a ready-to-use Docker configuration for reproducible execution:
```
git clone https://github.com/stvsever/ThesisMaster.git
cd MASTERPROEF
# Optional for LLM-enabled runs; deterministic mode can skip this.
cat > .env <<'EOF'
OPENROUTER_API_KEY=<your_openrouter_key>
OPENAI_BASE_URL=https://openrouter.ai/api/v1
EOF
cd docker
docker compose up --build
```
This starts the Flask frontend on http://127.0.0.1:5050. The Docker setup bundles the project dependencies, mounts integrated-pipeline outputs back to the host, and also supports CLI runs through the `phoenix-cli` service. See `docker/README.md` for the full workflow.
## 📜 License
This project is licensed under the GNU General Public License v3.0. See `LICENSE`.
What this means in practice:
- You may use, study, modify, and redistribute this code.
- If you distribute modified versions (or software that includes GPL-covered parts), you must:
- keep it under GPL-compatible terms,
- provide corresponding source code,
- preserve copyright and license notices,
- document meaningful changes.
- The software is provided without warranty.
For academic reuse, cite the thesis context appropriately and keep provenance of methodological changes explicit.
> [!CAUTION]
> **EU MDR / PRE-CLINICAL DISCLAIMER**: PHOENIX is a Clinical Decision Support System (CDSS) prototype designed for research purposes. It is NOT a certified medical device under the EU Medical Device Regulation (MDR 2017/745) or FDA guidelines. Do not use it for primary diagnostic decisions. All outputs must be verified by a qualified clinician.


