Merged
3 changes: 2 additions & 1 deletion README-ZH.md
@@ -17,8 +17,9 @@ mllm

## Latest News

- [2026 Feb 03] 🔥🔥🔥 MLLM Qnn AOT now supports full-graph execution on the NPU! [Quick Start](https://ubiquitouslearning.github.io/mllm/qnn_backend/aot_execute.html), [Technical Report](https://chenghuawang.github.io/News/2026-01-29-mllm-qnn-aot-support/)
- [2025 Nov 27] Android Demo update: stable streaming inference for Qwen3 and DeepSeek-OCR on Android via a new In-App Go server architecture.
- [2025 Nov 23] 🔥🔥🔥 MLLM v2 released!
- [2025 Nov 23] MLLM v2 released!
- [2025 Aug 28] Support for MLLM V1 is ending soon. Before deprecation, V1 will integrate the following feature: GPT-OSS. MLLM will then move to V2 (available on the V2 branch). V2 will bring brand-new capabilities:

- A more Pythonic model authoring approach with eager execution
3 changes: 2 additions & 1 deletion README.md
@@ -17,8 +17,9 @@ mllm

## Latest News

- [2026 Feb 03] 🔥🔥🔥 MLLM Qnn AOT Support for Full Graph Execution on NPU! [Quick Start](https://ubiquitouslearning.github.io/mllm/qnn_backend/aot_execute.html), [Technical Report](https://chenghuawang.github.io/News/2026-01-29-mllm-qnn-aot-support-en/)
- [2025 Nov 27] Android Demo Update: Enabled stable Qwen3 and DeepSeek-OCR streaming on Android via a novel In-App Go Server Architecture.
- [2025 Nov 23] 🔥🔥🔥 MLLM v2 released!
- [2025 Nov 23] MLLM v2 released!
- [2025 Aug 28] Support for MLLM V1 is ending soon. Before its retirement, V1 will integrate the following feature: GPT-OSS. MLLM will then transition to V2, which can be viewed on the V2 branch. V2 will include brand-new capabilities:
- A more Pythonic model authoring approach with eager execution
- Compilation support for easier NPU integration
2 changes: 2 additions & 0 deletions docs/qnn_backend/aot_execute.rst
@@ -89,6 +89,8 @@ Taking ``qwen3_qnn_aot`` as an example, the detailed steps are as follows.
-m /path/to/output/qwen3_1.7b.mllm \
-c ./examples/qwen3_qnn_aot/config_1.7B.json \
--aot_config ./examples/qwen3_qnn_aot/qnn_aot_cfg_1.7B.json
# Optional; defaults to /opt/qcom/aistack/qairt/2.41.0.251128/lib/x86_64-linux-clang/
# --qnn_env_path /path/to/qnn_sdk


This program reads the ``.mllm`` model file and the quantization recipe, and finally generates a QNN context binary file named ``qwen3-1.7B-lpbq-sha.bin``. This file contains all the information needed to execute inference on the target device.
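The diff hunk above omits the command's first line, so the compiler executable's name is not visible here. As a minimal sketch of how the flags fit together, the snippet below composes the full invocation from variables and prints it; the binary name ``mllm-qnn-aot-compile`` is a hypothetical placeholder, not the tool's confirmed name.

```shell
# Hypothetical sketch: the executable name is assumed (the real name is
# outside this diff hunk); the flags and paths come from the docs above.
MLLM_MODEL=/path/to/output/qwen3_1.7b.mllm
CFG=./examples/qwen3_qnn_aot/config_1.7B.json
AOT_CFG=./examples/qwen3_qnn_aot/qnn_aot_cfg_1.7B.json
# Optional; only needed when the QNN SDK is not at the default location.
QNN_SDK=/opt/qcom/aistack/qairt/2.41.0.251128/lib/x86_64-linux-clang/

# Build the command string rather than executing it, since the tool
# itself is not available in this environment.
CMD="mllm-qnn-aot-compile -m $MLLM_MODEL -c $CFG --aot_config $AOT_CFG --qnn_env_path $QNN_SDK"
echo "$CMD"
```

On success the compiler is expected to emit the ``qwen3-1.7B-lpbq-sha.bin`` QNN context binary described above, which is then pushed to the target device for inference.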