Conversation

@theoschiff (Contributor) commented Dec 11, 2025

This PR updates the cookbook to reflect the new cross-attention fusion pipeline and routing configuration for the MoE experts.

  • Adds cross-attention (fusion_method: cross_attn) as the fusion strategy in the multimodal cookbook.
  • Updates base model configuration and training output paths.
  • Introduces generalist_idx for expert routing in the MoE vision encoder.
  • Fixes a small typo in image_modality_moe_pep.py.

Details

  • Base LLM & model (see the first config sketch after this list)

    • base_llm: meta-llama/Llama-3.1-8B-Instruct
    • base_model for end2end: /capstor/store/cscs/swissai/a127/homes/meditron/models/multimeditron/freeze/attn_pep/MultiMeditron-8B-attn-pep-alignment/checkpoint-...
    • resume_from_checkpoint: true
  • MoE vision stack

    • fusion_method: cross_attn for moe_meditron_clip_pep

    • Added generalist_idx: -1 so that the generalist expert can be selected; it is placed at the end of the experts list by default, so index -1 resolves to it (see the second config sketch after this list)

    • Kept existing experts:

      • ClosedMeditron/MedExpert-CT
      • ClosedMeditron/MedExpert-MRI
      • ClosedMeditron/MedExpert-Ultrasound
      • ClosedMeditron/MedExpert-Xray
      • ClosedMeditron/clip-vit-base-patch32
  • Training / cookbook updates

    • Updated output_dir and run_name accordingly (e.g., MultiMeditron-8B-attn-pep-end2end); see the third config sketch after this list
    • Ensured tokenizer and sequence settings remain consistent (tokenizer_type: llama, max_sequence_length: 4096, truncation: true).
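
Example configuration (illustrative)

The three sketches below show how these settings could look in the cookbook YAML. They are illustrative only: the keys are the ones named in this PR, while the nesting, section names, and placeholder values are assumptions rather than the repository's actual schema. First, the base model settings for the end2end stage:

```yaml
# Hypothetical excerpt of the end2end cookbook; key placement is assumed,
# but the keys themselves are the ones named in this PR.
base_llm: meta-llama/Llama-3.1-8B-Instruct
# Alignment checkpoint used as the starting point for the end2end stage
# (path truncated here exactly as in the description above).
base_model: /capstor/store/cscs/swissai/a127/homes/meditron/models/multimeditron/freeze/attn_pep/MultiMeditron-8B-attn-pep-alignment/checkpoint-...
resume_from_checkpoint: true
```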
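
A sketch of the MoE vision stack, assuming the settings live under a moe_meditron_clip_pep block with an experts list; the expert identifiers are the ones listed above:

```yaml
# Hypothetical sketch of the MoE vision stack; the nesting and the "experts"
# list key are assumptions for illustration.
moe_meditron_clip_pep:
  fusion_method: cross_attn   # cross-attention fusion introduced by this PR
  generalist_idx: -1          # -1 resolves to the last expert below, i.e. the generalist
  experts:
    - ClosedMeditron/MedExpert-CT
    - ClosedMeditron/MedExpert-MRI
    - ClosedMeditron/MedExpert-Ultrasound
    - ClosedMeditron/MedExpert-Xray
    - ClosedMeditron/clip-vit-base-patch32   # generalist expert, kept at the end of the list
```

Keeping the generalist at the end and addressing it with a negative index presumably lets new specialist experts be inserted ahead of it without touching generalist_idx.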
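
And a sketch of the training-side keys; only the example run name is quoted in the description above, so the output_dir value is a placeholder:

```yaml
# Hypothetical sketch of the training and tokenizer keys; the flat layout is assumed.
run_name: MultiMeditron-8B-attn-pep-end2end
output_dir: <end2end output directory>   # placeholder; the actual path is not given in this PR
tokenizer_type: llama
max_sequence_length: 4096
truncation: true
```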

@MichelDucartier MichelDucartier merged commit bc80a44 into master Dec 11, 2025
1 check failed
@MichelDucartier MichelDucartier deleted the cookbook branch December 11, 2025 16:59