From 2eb23d66160e7800ff8b78aabc570eedda654351 Mon Sep 17 00:00:00 2001
From: chenghuaWang <2923277184@qq.com>
Date: Tue, 3 Feb 2026 07:37:57 +0000
Subject: [PATCH 1/2] docs: update latest news section in README files for MLLM v2 release and new NPU support

---
 README-ZH.md                     | 3 ++-
 README.md                        | 3 ++-
 docs/qnn_backend/aot_execute.rst | 2 ++
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/README-ZH.md b/README-ZH.md
index 71c16e741..f55e6e678 100644
--- a/README-ZH.md
+++ b/README-ZH.md
@@ -17,8 +17,9 @@ mllm
 
 ## 最新动态
 
+- [2026 年 2 月 3 日] 🔥🔥🔥 MLLM Qnn AOT 已支持在 NPU 上全图执行![技术报告](https://chenghuawang.github.io/News/2026-01-29-mllm-qnn-aot-support/)
 - [2025 年 11 月 27 日] Android Demo 更新:通过一种全新的 In-App Go 服务架构,在 Android 上实现了 Qwen3 和 DeepSeek-OCR 的稳定流式推理。
-- [2025 年 11 月 23 日] 🔥🔥🔥 MLLM v2 发布!
+- [2025 年 11 月 23 日] MLLM v2 发布!
 - [2025 年 8 月 28 日] 即将停止对 MLLM V1 的支持。在弃用前,V1 将集成以下功能:GPT-OSS。随后 MLLM 将迁移至 V2(可在 V2 分支查看)。V2 将带来全新的能力:
 
   - 更加 Pythonic 的模型编写方式,支持即时执行(eager execution)
diff --git a/README.md b/README.md
index c68c20209..abf7aa5ec 100644
--- a/README.md
+++ b/README.md
@@ -17,8 +17,9 @@ mllm
 
 ## Latest News
 
+- [2026 Feb 03] 🔥🔥🔥 MLLM Qnn AOT Support for Full Graph Execution on NPU! [Technical Report](https://chenghuawang.github.io/News/2026-01-29-mllm-qnn-aot-support-en/)
 - [2025 Nov 27] Android Demo Update: Enabled stable Qwen3 and DeepSeek-OCR streaming on Android via a novel In-App Go Server Architecture.
-- [2025 Nov 23] 🔥🔥🔥 MLLM v2 released!
+- [2025 Nov 23] MLLM v2 released!
 - [2025 Aug 28] Support for MLLM V1 is ending soon. Before its retirement, V1 will integrate the following features: GPT-OSS. MLLM will then transition to V2, which can be viewed on the V2 branch. V2 will include brand-new capabilities:
   - A more Pythonic model authoring approach with eager execution
   - Compilation support for easier NPU integration
diff --git a/docs/qnn_backend/aot_execute.rst b/docs/qnn_backend/aot_execute.rst
index 92945f34b..6b03834c0 100644
--- a/docs/qnn_backend/aot_execute.rst
+++ b/docs/qnn_backend/aot_execute.rst
@@ -89,6 +89,8 @@ Taking ``qwen3_qnn_aot`` as an example, the detailed steps are as follows.
     -m /path/to/output/qwen3_1.7b.mllm \
     -c ./examples/qwen3_qnn_aot/config_1.7B.json \
     --aot_config ./examples/qwen3_qnn_aot/qnn_aot_cfg_1.7B.json
+    # Optional, default value is /opt/qcom/aistack/qairt/2.41.0.251128/lib/x86_64-linux-clang/
+    # --qnn_env_path path/to/qnn_sdk.
 
 This program reads the ``.mllm`` model file and the quantization recipe, and finally generates a QNN context binary file named ``qwen3-1.7B-lpbq-sha.bin``. This file contains all the information needed to execute inference on the target device.
 

From bbca7ed01ae5bb9f41b640fd5dfdae21c17e2918 Mon Sep 17 00:00:00 2001
From: chenghuaWang <2923277184@qq.com>
Date: Tue, 3 Feb 2026 07:39:42 +0000
Subject: [PATCH 2/2] docs: update latest news section in README files to include Quick Start link for MLLM Qnn AOT support

---
 README-ZH.md | 2 +-
 README.md    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/README-ZH.md b/README-ZH.md
index f55e6e678..a8df65437 100644
--- a/README-ZH.md
+++ b/README-ZH.md
@@ -17,7 +17,7 @@ mllm
 
 ## 最新动态
 
-- [2026 年 2 月 3 日] 🔥🔥🔥 MLLM Qnn AOT 已支持在 NPU 上全图执行![技术报告](https://chenghuawang.github.io/News/2026-01-29-mllm-qnn-aot-support/)
+- [2026 年 2 月 3 日] 🔥🔥🔥 MLLM Qnn AOT 已支持在 NPU 上全图执行![快速开始](https://ubiquitouslearning.github.io/mllm/qnn_backend/aot_execute.html), [技术报告](https://chenghuawang.github.io/News/2026-01-29-mllm-qnn-aot-support/)
 - [2025 年 11 月 27 日] Android Demo 更新:通过一种全新的 In-App Go 服务架构,在 Android 上实现了 Qwen3 和 DeepSeek-OCR 的稳定流式推理。
 - [2025 年 11 月 23 日] MLLM v2 发布!
 - [2025 年 8 月 28 日] 即将停止对 MLLM V1 的支持。在弃用前,V1 将集成以下功能:GPT-OSS。随后 MLLM 将迁移至 V2(可在 V2 分支查看)。V2 将带来全新的能力:
diff --git a/README.md b/README.md
index abf7aa5ec..e34b56bd1 100644
--- a/README.md
+++ b/README.md
@@ -17,7 +17,7 @@ mllm
 
 ## Latest News
 
-- [2026 Feb 03] 🔥🔥🔥 MLLM Qnn AOT Support for Full Graph Execution on NPU! [Technical Report](https://chenghuawang.github.io/News/2026-01-29-mllm-qnn-aot-support-en/)
+- [2026 Feb 03] 🔥🔥🔥 MLLM Qnn AOT Support for Full Graph Execution on NPU! [Quick Start](https://ubiquitouslearning.github.io/mllm/qnn_backend/aot_execute.html), [Technical Report](https://chenghuawang.github.io/News/2026-01-29-mllm-qnn-aot-support-en/)
 - [2025 Nov 27] Android Demo Update: Enabled stable Qwen3 and DeepSeek-OCR streaming on Android via a novel In-App Go Server Architecture.
 - [2025 Nov 23] MLLM v2 released!
 - [2025 Aug 28] Support for MLLM V1 is ending soon. Before its retirement, V1 will integrate the following features: GPT-OSS. MLLM will then transition to V2, which can be viewed on the V2 branch. V2 will include brand-new capabilities: