Add more documentation #22
Pull Request Overview
This PR updates documentation and adds key supporting files for the MultiMeditron project. The changes enhance user-facing documentation with comprehensive guides, improve branding with new logos, and add the Apache 2.0 license file.
- Restructured documentation with enhanced branding (dual-theme logos, centered banner) and improved navigation
- Added comprehensive training guide, dataset format documentation, and configuration reference
- Updated README with cleaner structure, feature highlights, and complete setup/inference examples
Reviewed Changes
Copilot reviewed 11 out of 15 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| docs/source/index.rst | Enhanced documentation landing page with dual-theme logos, centered banner, and improved table of contents structure |
| docs/source/guides/training.rst | Added comprehensive training guide with YAML configuration examples, DeepSpeed setup, and multi-node deployment instructions |
| docs/source/guides/quickstart.rst | Corrected inline code formatting for placeholders using the :code: role |
| docs/source/guides/known_issues.rst | Simplified section title by removing redundant "when mounting volumes" text |
| docs/source/guides/guide.rst | Added includehidden directive and new configuration page to table of contents |
| docs/source/guides/dataset_format.rst | Added detailed dataset format documentation covering Arrow and JSONL formats for both pretraining and instruction-tuning |
| docs/source/guides/configuration.rst | Created new configuration reference with comprehensive YAML parameter documentation |
| docs/source/guides/add_modality.rst | Fixed plural agreement ("steps" → "step") in modality processing pipeline description |
| docs/source/conf.py | Added blank line for formatting consistency |
| assets/architecture.png | Added architecture diagram PNG for documentation |
| README.md | Complete rewrite with improved structure, feature highlights, installation instructions, and corrected code examples |
| LICENSE | Added Apache 2.0 license file |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
[{"type": "modality_type", "value" : some_modality}]
For instance, for image type, :code:`some_modality` must contains the bytes of the image
some_modality -> value, no?
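To make the quoted format concrete, here is a minimal sketch of one entry of the modalities list for an image, with the raw file bytes stored under "value" as the quoted line describes. The helper name `make_modality_entry` and the temporary file are hypothetical and only serve the illustration; the project's actual loading code may differ.

```python
import tempfile
from pathlib import Path


def make_modality_entry(modality_type: str, file_path: Path) -> dict:
    """Build one modality entry whose "value" holds the raw bytes of the file.

    Hypothetical helper: it only mirrors the {"type": ..., "value": ...}
    shape quoted in the review, it is not part of the project.
    """
    return {"type": modality_type, "value": file_path.read_bytes()}


with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in file: only the 8-byte PNG signature, not a real image.
    image_path = Path(tmp_dir) / "example.png"
    image_path.write_bytes(b"\x89PNG\r\n\x1a\n")

    entry = make_modality_entry("image", image_path)

# A one-element modalities list in the quoted shape.
modalities = [entry]
```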
.. warning::

    Please note that JSONL format is not recommended! We provide scripts to convert JSONL-formatted dataset into Arrow dataset. If your dataset is in a JSONL format, you need to convert it first to Arrow before training.
"to convert a JSONL-formatted dataset into an Arrow dataset"
extra space in "if your dataset is in"
Would be nice to specify the path to the scripts
{
"text": "Let's compare the first image: <|reserved_special_token_0|>, and the second 3D image: <|reserved_special_token_0|>",
"modalities": [{"type" : "image", "value" : "path/to/png"}, {"type" : "image_3d", "value" : "path/to/npy"}]
path -> absolute or relative to what?
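In the quoted sample, the text contains two `<|reserved_special_token_0|>` placeholders and the modalities list has two entries. The sketch below checks that correspondence for a JSONL-style sample. It assumes each placeholder maps to exactly one modality entry, which the quoted lines suggest but do not state; the function name `placeholders_match` is hypothetical.

```python
PLACEHOLDER = "<|reserved_special_token_0|>"


def placeholders_match(sample: dict, placeholder: str = PLACEHOLDER) -> bool:
    """Return True when the text has as many placeholders as modality entries.

    Assumption: one placeholder per entry of sample["modalities"].
    """
    return sample["text"].count(placeholder) == len(sample["modalities"])


# The sample quoted in the review, completed only with its closing brace.
sample = {
    "text": (
        "Let's compare the first image: <|reserved_special_token_0|>, "
        "and the second 3D image: <|reserved_special_token_0|>"
    ),
    "modalities": [
        {"type": "image", "value": "path/to/png"},
        {"type": "image_3d", "value": "path/to/npy"},
    ],
}

is_consistent = placeholders_match(sample)
```

A check like this could run before conversion so that a mismatched sample is caught early, whatever convention is chosen for resolving the paths.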
Launch the training
-------------------

Once the training configuration are done, we are ready to launch a training. We support both single node and multi node training.
configurations
This pull request introduces several important improvements to the project documentation and repository structure, focusing on enhancing usability, clarity, and compliance. The most significant updates include a major overhaul of README.md for better onboarding, the addition of a license file, and new or improved documentation for configuration and extensibility.

Documentation and usability improvements:
- Rewrote README.md with a clearer project introduction, feature highlights, setup instructions (including Docker and uv), an updated inference example, and simplified guidance for adding new modalities. The new format also includes project badges and improved visuals.
- Added docs/source/guides/configuration.rst, providing a detailed YAML configuration reference for model training and usage.

Repository and compliance updates:
- Added an Apache 2.0 license file to the repository, ensuring clear open-source licensing and compliance.