From 436ec2be8021564e52275a7691de50ec938418cc Mon Sep 17 00:00:00 2001
From: oreomaker <zh002919@outlook.com>
Date: Tue, 3 Feb 2026 10:07:54 +0800
Subject: [PATCH] docs(qnn_backend): update AOT execution flow documentation

---
 docs/qnn_backend/aot_execute.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/qnn_backend/aot_execute.rst b/docs/qnn_backend/aot_execute.rst
index 55addfef..5b6ac5e1 100644
--- a/docs/qnn_backend/aot_execute.rst
+++ b/docs/qnn_backend/aot_execute.rst
@@ -18,8 +18,8 @@ Overall Flow
 
 The QNN AOT execution flow is mainly divided into three stages:
 
-1.  **Model Quantization and Export (Python)**: On the host machine, a Python script is used to quantize the pre-trained floating-point model and export it to the MLLM IR (``.mir``) format.
-2.  **Offline Compilation (C++)**: On the host machine, a C++ compiler program loads the ``.mir`` file, invokes the QNN toolchain for model compilation, graph optimization, and quantization parameter adjustment, and finally generates a QNN Context Binary.
+1.  **Model Quantization and Export (Python)**: On the host machine, a Python script is used to quantize the pre-trained floating-point model and export it to a ``.safetensors`` file. The ``.safetensors`` file is then converted to a ``.mllm`` file using ``mllm-convertor``.
+2.  **Offline Compilation (C++)**: On the host machine, a C++ compiler program loads the ``.mllm`` file, invokes the QNN toolchain for model compilation, graph optimization, and quantization parameter adjustment, and finally generates a QNN Context Binary.
 3.  **On-Device Execution (C++)**: On the target device (e.g., a mobile phone), the AOT runner program loads the pre-compiled context binary and executes inference.