diff --git a/README-ZH.md b/README-ZH.md
index 71c16e74..a8df6543 100644
--- a/README-ZH.md
+++ b/README-ZH.md
@@ -17,8 +17,9 @@ mllm
 ## Latest News

+- [2026 Feb 03] 🔥🔥🔥 MLLM Qnn AOT now supports full-graph execution on NPU! [Quick Start](https://ubiquitouslearning.github.io/mllm/qnn_backend/aot_execute.html), [Technical Report](https://chenghuawang.github.io/News/2026-01-29-mllm-qnn-aot-support/)
 - [2025 Nov 27] Android Demo update: enabled stable Qwen3 and DeepSeek-OCR streaming inference on Android via a new In-App Go server architecture.
-- [2025 Nov 23] 🔥🔥🔥 MLLM v2 released!
+- [2025 Nov 23] MLLM v2 released!
 - [2025 Aug 28] Support for MLLM V1 is ending soon. Before deprecation, V1 will integrate the following feature: GPT-OSS. MLLM will then migrate to V2 (available on the V2 branch), which will bring brand-new capabilities:
   - A more Pythonic model authoring approach with eager execution
diff --git a/README.md b/README.md
index c68c2020..e34b56bd 100644
--- a/README.md
+++ b/README.md
@@ -17,8 +17,9 @@ mllm
 ## Latest News

+- [2026 Feb 03] 🔥🔥🔥 MLLM Qnn AOT now supports full-graph execution on NPU! [Quick Start](https://ubiquitouslearning.github.io/mllm/qnn_backend/aot_execute.html), [Technical Report](https://chenghuawang.github.io/News/2026-01-29-mllm-qnn-aot-support-en/)
 - [2025 Nov 27] Android Demo Update: Enabled stable Qwen3 and DeepSeek-OCR streaming on Android via a novel In-App Go Server Architecture.
-- [2025 Nov 23] 🔥🔥🔥 MLLM v2 released!
+- [2025 Nov 23] MLLM v2 released!
 - [2025 Aug 28] Support for MLLM V1 is ending soon. Before its retirement, V1 will integrate the following feature: GPT-OSS. MLLM will then transition to V2, which can be viewed on the V2 branch. V2 will include brand-new capabilities:
   - A more Pythonic model authoring approach with eager execution
   - Compilation support for easier NPU integration
diff --git a/docs/qnn_backend/aot_execute.rst b/docs/qnn_backend/aot_execute.rst
index 92945f34..6b03834c 100644
--- a/docs/qnn_backend/aot_execute.rst
+++ b/docs/qnn_backend/aot_execute.rst
@@ -89,6 +89,8 @@ Taking ``qwen3_qnn_aot`` as an example, the detailed steps are as follows.
       -m /path/to/output/qwen3_1.7b.mllm \
       -c ./examples/qwen3_qnn_aot/config_1.7B.json \
       --aot_config ./examples/qwen3_qnn_aot/qnn_aot_cfg_1.7B.json
+      # Optional: --qnn_env_path /path/to/qnn_sdk
+      # Defaults to /opt/qcom/aistack/qairt/2.41.0.251128/lib/x86_64-linux-clang/

 This program reads the ``.mllm`` model file and the quantization recipe, and finally generates a QNN context binary file named ``qwen3-1.7B-lpbq-sha.bin``. This file contains all the information needed to execute inference on the target device.
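
The rst hunk above documents an optional ``--qnn_env_path`` flag with a default QAIRT SDK location. A minimal sketch of resolving that path before invoking the converter is shown below; only the default path and the flag name come from the diff, while the variable names and the warning message are illustrative, not part of mllm:

```shell
# Default SDK path as stated in the new doc comment.
QNN_DEFAULT="/opt/qcom/aistack/qairt/2.41.0.251128/lib/x86_64-linux-clang/"

# Allow an override via the environment; otherwise fall back to the default.
QNN_ENV_PATH="${QNN_ENV_PATH:-$QNN_DEFAULT}"

# Warn (but do not abort) if the directory is missing on this machine.
[ -d "$QNN_ENV_PATH" ] || echo "warning: QNN SDK path not found: $QNN_ENV_PATH" >&2

# Print the flag as it would be appended to the conversion command.
echo "--qnn_env_path $QNN_ENV_PATH"
```

When the flag is omitted, the converter is documented to use the default path above, so the override is only needed for non-standard QAIRT installs.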