MultiMeditron is a modular multimodal large language model (LLM) built by students and researchers from the LiGHT Lab. It is designed to seamlessly integrate multiple modalities, such as text, images, and other data types, into a single unified model architecture.
- 🔗 Modular Design: Easily plug in new modalities by following our well-documented interface. Each modality embedder (e.g., CLIP or Whisper) can be developed independently and added to the model (a hypothetical sketch follows this list).
- 🧩 Modality Interleaving: Supports interleaved multimodal inputs (e.g., text-image-text sequences), enabling complex reasoning across different data types (an interleaved example follows the quick-start code below).
- ⚡ Scalable Architecture: Designed for distributed and multi-node environments, ideal for large-scale training or inference.
- 🧠 Flexible Model Backbone: Combine any modality embedder (such as CLIP or SigLIP) with any LLM (such as Llama, Qwen, or a custom fine-tuned model).
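To give a flavor of what a modality embedder involves, here is a minimal, hypothetical sketch; the class name, constructor arguments, and method signature are illustrative assumptions, not the actual MultiMeditron interface (see the developer documentation for the real contract):

```python
# Hypothetical sketch of a modality embedder: NOT the actual MultiMeditron
# interface. It maps pre-extracted modality features into the LLM's embedding
# space so they can be spliced in at attachment-token positions.
import torch
import torch.nn as nn

class MyImageEmbedder(nn.Module):
    def __init__(self, feature_dim: int, llm_hidden_dim: int):
        super().__init__()
        # A single linear projection from encoder features (e.g., CLIP
        # outputs) to the LLM hidden size.
        self.projection = nn.Linear(feature_dim, llm_hidden_dim)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (num_attachments, feature_dim)
        # returns:  (num_attachments, llm_hidden_dim)
        return self.projection(features)
```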
On AMD64 architecture:

```bash
docker pull michelducartier24/multimeditron-git:latest-amd64
```

On ARM64 architecture:

```bash
docker pull michelducartier24/multimeditron-git:latest-arm64
```
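For example, to start a container with GPU access (a typical invocation rather than a project-documented one; it assumes the NVIDIA Container Toolkit is installed, and you may need to append a command such as `/bin/bash` depending on the image's entrypoint):

```bash
docker run --gpus all -it michelducartier24/multimeditron-git:latest-amd64
```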
Prerequisite: to install the version of torch that matches your CUDA driver, please refer to the PyTorch installation documentation.

Install uv:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Clone the repository:

```bash
git clone https://github.com/EPFLiGHT/MultiMeditron.git
cd MultiMeditron
```

Install dependencies:

```bash
uv pip install -e ".[flash-attn]"
```
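As a quick sanity check (generic PyTorch, not MultiMeditron-specific), you can confirm that torch sees your GPU before running the example:

```bash
uv run python -c "import torch; print(torch.cuda.is_available())"
```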
Here's an example showing how to use MultiMeditron with Llama 3.1 (8B) and a single image input.

```python
import os

import torch
from transformers import AutoTokenizer

from multimeditron.dataset.loader import FileSystemImageLoader
from multimeditron.model.data_loader import DataCollatorForMultimodal
from multimeditron.model.model import MultiModalModelForCausalLM
ATTACHMENT_TOKEN = "<|reserved_special_token_0|>"

# Load the tokenizer and register the attachment token that marks where
# modality embeddings are spliced into the prompt.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_special_tokens({"additional_special_tokens": [ATTACHMENT_TOKEN]})
attachment_token_idx = tokenizer.convert_tokens_to_ids(ATTACHMENT_TOKEN)
# Load the trained MultiMeditron checkpoint and move it to the GPU in bfloat16
model = MultiModalModelForCausalLM.from_pretrained("path/to/trained/model")
model.to("cuda", dtype=torch.bfloat16)
# Define the input: one image attachment plus a prompt that references it
# via the attachment token
modalities = [{"type": "image", "value": "path/to/image"}]
conversations = [{
    "role": "user",
    "content": f"{ATTACHMENT_TOKEN} Describe the image.",
}]
sample = {"conversations": conversations, "modalities": modalities}

# Image paths in `modalities` are resolved relative to the loader's base path
loader = FileSystemImageLoader(base_path=os.getcwd())
# The collator tokenizes the conversation, loads each attachment with the
# matching loader, and prepares a generation-ready batch
collator = DataCollatorForMultimodal(
    tokenizer=tokenizer,
    tokenizer_type="llama",
    modality_processors=model.processors(),
    modality_loaders={"image": loader},
    attachment_token_idx=attachment_token_idx,
    add_generation_prompt=True,
)
batch = collator([sample])

# Generate a response; a low temperature keeps the output near-deterministic
with torch.no_grad():
    outputs = model.generate(batch=batch, temperature=0.1)

print(tokenizer.batch_decode(outputs, skip_special_tokens=True, clean_up_tokenization_spaces=True)[0])
```
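Because inputs can interleave text and attachments, a single turn may reference several modalities. The snippet below extends the example above (reusing `collator` and `ATTACHMENT_TOKEN`) and assumes, as a hedge, that attachment tokens bind positionally to the entries of `modalities`; the paths are placeholders:

```python
# Sketch of an interleaved text-image-text input, reusing the collator from
# the quick-start example. Assumes the n-th attachment token corresponds to
# the n-th entry of `modalities`.
modalities = [
    {"type": "image", "value": "path/to/first/image"},
    {"type": "image", "value": "path/to/second/image"},
]
conversations = [{
    "role": "user",
    "content": f"{ATTACHMENT_TOKEN} First view. {ATTACHMENT_TOKEN} Second view. "
               "Compare the two images.",
}]
batch = collator([{"conversations": conversations, "modalities": modalities}])
```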
MultiMeditron's architecture is fully extensible. To add a new modality, see the developer documentation for a step-by-step guide.

This project is licensed under the Apache 2.0 License; see the LICENSE file for details.
TODO

