Conversation

@theoschiff (Contributor) commented Dec 11, 2025

This PR updates the cookbook to reflect the new cross-attention fusion pipeline and routing configuration for the MoE experts.

  • Adds cross-attention (fusion_method: cross_attn) as the fusion strategy in the multimodal cookbook.
  • Updates base model configuration and training output paths.
  • Introduces generalist_idx for expert routing in the MoE vision encoder.
  • Fixes a small typo in image_modality_moe_pep.py.

Details

  • Base LLM & model (see the first config sketch after this list)

    • base_llm: meta-llama/Llama-3.1-8B-Instruct
    • base_model for end2end: /capstor/store/cscs/swissai/a127/homes/meditron/models/multimeditron/freeze/attn_pep/MultiMeditron-8B-attn-pep-alignment/checkpoint-...
    • resume_from_checkpoint: true
  • MoE vision stack

    • fusion_method: cross_attn for moe_meditron_clip_pep

    • Added generalist_idx: -1 so that the generalist expert can be selected; it is placed at the end of the experts list by default, so index -1 resolves to it (see the second config sketch after this list)

    • Kept existing experts:

      • ClosedMeditron/MedExpert-CT
      • ClosedMeditron/MedExpert-MRI
      • ClosedMeditron/MedExpert-Ultrasound
      • ClosedMeditron/MedExpert-Xray
      • ClosedMeditron/clip-vit-base-patch32
  • Training / cookbook updates

    • Updated output_dir and run_name accordingly (e.g., MultiMeditron-8B-attn-pep-end2end); see the third config sketch after this list
    • Ensured tokenizer and sequence settings remain consistent (tokenizer_type: llama, max_sequence_length: 4096, truncation: true).
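
Example configuration (illustrative)

The three sketches below show how these settings could look in the cookbook YAML. They are illustrative only: the keys are the ones named in this PR, while the nesting, section names, and placeholder values are assumptions rather than the repository's actual schema. First, the base model settings for the end2end stage:

```yaml
# Hypothetical excerpt of the end2end cookbook; key placement is assumed,
# but the keys themselves are the ones named in this PR.
base_llm: meta-llama/Llama-3.1-8B-Instruct
# Alignment checkpoint used as the starting point for the end2end stage
# (path truncated here exactly as in the description above).
base_model: /capstor/store/cscs/swissai/a127/homes/meditron/models/multimeditron/freeze/attn_pep/MultiMeditron-8B-attn-pep-alignment/checkpoint-...
resume_from_checkpoint: true
```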
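
A sketch of the MoE vision stack, assuming the settings live under a moe_meditron_clip_pep block with an experts list; the expert identifiers are the ones listed above:

```yaml
# Hypothetical sketch of the MoE vision stack; the nesting and the "experts"
# list key are assumptions for illustration.
moe_meditron_clip_pep:
  fusion_method: cross_attn   # cross-attention fusion introduced by this PR
  generalist_idx: -1          # -1 resolves to the last expert below, i.e. the generalist
  experts:
    - ClosedMeditron/MedExpert-CT
    - ClosedMeditron/MedExpert-MRI
    - ClosedMeditron/MedExpert-Ultrasound
    - ClosedMeditron/MedExpert-Xray
    - ClosedMeditron/clip-vit-base-patch32   # generalist expert, kept at the end of the list
```

Keeping the generalist at the end and addressing it with a negative index presumably lets new specialist experts be inserted ahead of it without touching generalist_idx.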
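
And a sketch of the training-side keys; only the example run name is quoted in the description above, so the output_dir value is a placeholder:

```yaml
# Hypothetical sketch of the training and tokenizer keys; the flat layout is assumed.
run_name: MultiMeditron-8B-attn-pep-end2end
output_dir: <end2end output directory>   # placeholder; the actual path is not given in this PR
tokenizer_type: llama
max_sequence_length: 4096
truncation: true
```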

@MichelDucartier MichelDucartier merged commit bc80a44 into master Dec 11, 2025
1 check failed
@MichelDucartier MichelDucartier deleted the cookbook branch December 11, 2025 16:59