Conversation

@MichelDucartier
Contributor

This pull request introduces several important improvements to the project documentation and repository structure, focusing on enhancing usability, clarity, and compliance. The most significant updates include a major overhaul of the README.md for better onboarding, the addition of a license file, and new or improved documentation for configuration and extensibility.

Documentation and usability improvements:

  • Revamped README.md with a clearer project introduction, feature highlights, setup instructions (including Docker and uv), an updated inference example, and simplified guidance for adding new modalities. The new format also includes project badges and improved visuals.
  • Added a new section in the documentation (docs/source/guides/configuration.rst) providing a detailed YAML configuration reference for model training and usage.
  • Added an anchor for the "add modality" guide to improve navigation in the developer documentation.

Repository and compliance updates:

  • Added an Apache 2.0 license file to the repository, ensuring clear open-source licensing and compliance.

@MichelDucartier MichelDucartier marked this pull request as ready for review November 3, 2025 10:22
Contributor

Copilot AI left a comment

Pull Request Overview

This PR updates documentation and adds key supporting files for the MultiMeditron project. The changes enhance user-facing documentation with comprehensive guides, improve branding with new logos, and add the Apache 2.0 license file.

  • Restructured documentation with enhanced branding (dual-theme logos, centered banner) and improved navigation
  • Added comprehensive training guide, dataset format documentation, and configuration reference
  • Updated README with cleaner structure, feature highlights, and complete setup/inference examples

Reviewed Changes

Copilot reviewed 11 out of 15 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| docs/source/index.rst | Enhanced documentation landing page with dual-theme logos, centered banner, and improved table of contents structure |
| docs/source/guides/training.rst | Added comprehensive training guide with YAML configuration examples, DeepSpeed setup, and multi-node deployment instructions |
| docs/source/guides/quickstart.rst | Corrected inline code formatting for placeholders using the :code: role |
| docs/source/guides/known_issues.rst | Simplified section title by removing redundant "when mounting volumes" text |
| docs/source/guides/guide.rst | Added includehidden directive and new configuration page to table of contents |
| docs/source/guides/dataset_format.rst | Added detailed dataset format documentation covering Arrow and JSONL formats for both pretraining and instruction-tuning |
| docs/source/guides/configuration.rst | Created new configuration reference with comprehensive YAML parameter documentation |
| docs/source/guides/add_modality.rst | Fixed plural agreement ("steps" → "step") in modality processing pipeline description |
| docs/source/conf.py | Added blank line for formatting consistency |
| assets/architecture.png | Added architecture diagram PNG for documentation |
| README.md | Complete rewrite with improved structure, feature highlights, installation instructions, and corrected code examples |
| LICENSE | Added Apache 2.0 license file |

MichelDucartier and others added 2 commits November 3, 2025 11:29
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
[{"type": "modality_type", "value" : some_modality}]
For instance, for image type, :code:`some_modality` must contains the bytes of the image
Contributor

some_modality -> value, no?
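
To make the naming question concrete, here is a minimal sketch of one entry of the modalities list as the quoted line describes it: a "type" key plus a "value" key that, for an image, holds the raw bytes. The helper `make_modality` and the stand-in payload are hypothetical and only illustrate the shape; they are not part of the project's API.

```python
from typing import Any


def make_modality(modality_type: str, value: Any) -> dict[str, Any]:
    """Build one entry of the modalities list: {"type": ..., "value": ...}."""
    return {"type": modality_type, "value": value}


# Stand-in payload (assumption): only the 8-byte PNG signature, not a full image.
image_bytes = b"\x89PNG\r\n\x1a\n"

# For the image type, "value" carries the image bytes themselves.
modalities = [make_modality("image", image_bytes)]
```

Whatever name the prose uses for the placeholder, the key that ends up in the record is "value", which is what the comment above is pointing at.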


.. warning::

Please note that JSONL format is not recommended! We provide scripts to convert JSONL-formatted dataset into Arrow dataset. If your dataset is in a JSONL format, you need to convert it first to Arrow before training.
Contributor

"to convert a JSONL-formatted dataset into an Arrow dataset"
extra space in "if your dataset is in"
Would be nice to specify the path to the scripts

{
"text": "Let's compare the first image: <|reserved_special_token_0|>, and the second 3D image: <|reserved_special_token_0|>",
"modalities": [{"type" : "image", "value" : "path/to/png"}, {"type" : "image_3d", "value" : "path/to/npy"}]
Contributor

path -> absolute or relative to what?
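
One way the docs could settle this is by stating a resolution rule. The sketch below assumes, purely for illustration, that relative paths are resolved against a base directory (for example, the folder holding the JSONL file) and that the number of `<|reserved_special_token_0|>` placeholders should match the number of modalities. Neither rule is confirmed in this thread, and `check_sample` and `/data/my_dataset` are hypothetical names.

```python
import json
from pathlib import Path

# Placeholder token taken from the quoted JSONL sample.
PLACEHOLDER = "<|reserved_special_token_0|>"


def check_sample(line: str, base_dir: Path) -> list[Path]:
    """Parse one JSONL line, compare counts, and resolve modality paths."""
    sample = json.loads(line)
    n_placeholders = sample["text"].count(PLACEHOLDER)
    n_modalities = len(sample["modalities"])
    # Assumption: one placeholder in the text per modality entry.
    if n_placeholders != n_modalities:
        raise ValueError(
            f"{n_placeholders} placeholders but {n_modalities} modalities"
        )
    resolved = []
    for modality in sample["modalities"]:
        path = Path(modality["value"])
        # Assumption: relative paths are taken relative to base_dir.
        resolved.append(path if path.is_absolute() else base_dir / path)
    return resolved


line = json.dumps({
    "text": f"First image: {PLACEHOLDER}, second 3D image: {PLACEHOLDER}",
    "modalities": [
        {"type": "image", "value": "path/to/png"},
        {"type": "image_3d", "value": "path/to/npy"},
    ],
})
paths = check_sample(line, Path("/data/my_dataset"))
```

If the loader actually resolves paths differently (for instance, relative to the working directory), only the `base_dir` argument in this sketch would change; the point is that the documentation should name the rule explicitly.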

Launch the training
-------------------

Once the training configuration are done, we are ready to launch a training. We support both single node and multi node training.
Contributor

configurations

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@MichelDucartier MichelDucartier merged commit 4b43371 into master Nov 17, 2025
1 check failed
@MichelDucartier MichelDucartier deleted the more-docs branch November 17, 2025 16:22
3 participants