docs(qnn_backend): update AOT execution flow documentation (#628)
chenghuaWang merged 2 commits into UbiquitousLearning:main
Conversation
📝 Walkthrough
Documentation update to the AOT execution guide reflecting changes in the model export and compilation workflow. The three-stage flow description now documents use of the ``.safetensor`` and ``.mllm`` file formats instead of ``.mir``, with mllm-convertor converting between formats during the Model Quantization and Export step.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~2 minutes
🚥 Pre-merge checks: 2 passed, 1 failed (warning)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@docs/qnn_backend/aot_execute.rst`:
- Line 21: Replace the incorrect singular extension ".safetensor" with the
correct plural ".safetensors" in the sentence that describes exporting the
quantized model (the line mentioning "export it to ``.safetensor`` file");
ensure this matches the other occurrences of ".safetensors" in the document for
consistency.
-1. **Model Quantization and Export (Python)**: On the host machine, a Python script is used to quantize the pre-trained floating-point model and export it to the MLLM IR (``.mir``) format.
 2. **Offline Compilation (C++)**: On the host machine, a C++ compiler program loads the ``.mir`` file, invokes the QNN toolchain for model compilation, graph optimization, and quantization parameter adjustment, and finally generates a QNN Context Binary.
+1. **Model Quantization and Export (Python)**: On the host machine, a Python script is used to quantize the pre-trained floating-point model and export it to ``.safetensor`` file. The ``.safetensor`` is then converted to ``.mllm`` file using mllm-convertor.
Fix file extension inconsistency.
The file extension should be .safetensors (plural) to match the standard format name and maintain consistency with the rest of the document (lines 33, 42, 44, 49 all use .safetensors).
📝 Proposed fix
-1. **Model Quantization and Export (Python)**: On the host machine, a Python script is used to quantize the pre-trained floating-point model and export it to ``.safetensor`` file. The ``.safetensor`` is then converted to ``.mllm`` file using mllm-convertor.
+1. **Model Quantization and Export (Python)**: On the host machine, a Python script is used to quantize the pre-trained floating-point model and export it to ``.safetensors`` file. The ``.safetensors`` is then converted to ``.mllm`` file using mllm-convertor.
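The inconsistency this review flags (singular ``.safetensor`` where the standard plural ``.safetensors`` is used elsewhere) can also be caught mechanically. Below is a minimal sketch of such a check; the helper name and the sample text are illustrative, not part of the repository:

```python
import re

# Match ".safetensor" only when NOT followed by "s",
# i.e. the singular spelling the review asks to fix.
SINGULAR = re.compile(r"\.safetensor(?!s)")

def find_singular_safetensor(text: str) -> list[int]:
    """Return 1-based line numbers that contain the singular '.safetensor'."""
    return [
        i
        for i, line in enumerate(text.splitlines(), start=1)
        if SINGULAR.search(line)
    ]

doc = """1. Export it to ``.safetensor`` file.
2. The ``.safetensors`` file is then converted to ``.mllm``.
"""
print(find_singular_safetensor(doc))  # → [1]
```

Run over ``docs/qnn_backend/aot_execute.rst``, a check like this would report only the line the review points at, since the negative lookahead skips the correctly spelled ``.safetensors`` occurrences.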
Please check the Guidelines for Contributing.