65 changes: 65 additions & 0 deletions docs/en/benchmark/evaluate_with_vlmevalkit.md
@@ -0,0 +1,65 @@
# Multi-Modal Model Evaluation Guide

This document describes how to evaluate the capabilities of multi-modal models using VLMEvalKit and LMDeploy.

## Environment Setup

```shell
pip install lmdeploy

git clone https://github.com/open-compass/VLMEvalKit.git
cd VLMEvalKit && pip install -e .
```

It is recommended to install LMDeploy and VLMEvalKit in separate Python virtual environments to avoid potential dependency conflicts.

## Evaluations

1. **Deploy Large Multimodal Models (LMMs)**

```shell
lmdeploy serve api_server <model_path> --server-port 23333 <--other-options>
```
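Before configuring VLMEvalKit, it can help to confirm the server is reachable. A minimal sketch of a request against the OpenAI-compatible chat completions endpoint the server exposes (the `build_chat_payload` helper is illustrative, not part of LMDeploy; the host, port, and model name are the ones from the command above):

```python
import json
from urllib import request

def build_chat_payload(model: str, prompt: str, max_tokens: int = 64) -> dict:
    # OpenAI-compatible chat payload accepted by `lmdeploy serve api_server`
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def query_server(payload: dict, base: str = "http://0.0.0.0:23333") -> dict:
    # POST the payload to the running server and decode the JSON response
    req = request.Request(
        f"{base}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_chat_payload("<model_path>", "Reply with a single word.")
# With the server running: print(query_server(payload))
```

If the request succeeds, the endpoint and model name are ready to be wired into the VLMEvalKit config below.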

2. **Config the Evaluation Settings**

Modify `VLMEvalKit/vlmeval/config.py` and add the following LMDeploy API configuration to the `api_models` dictionary.

The `<task_name>` is a custom name for your evaluation task (e.g., `lmdeploy_qwen3vl-4b`). The `model` parameter should match the `<model_path>` used in the `lmdeploy serve` command.

```python
# filepath: VLMEvalKit/vlmeval/config.py
# ...existing code...
api_models = {
    # ...existing entries...
    # lmdeploy api
    "<task_name>": partial(
        LMDeployAPI,
        api_base="http://0.0.0.0:23333/v1/chat/completions",
        model="<model_path>",
        retry=4,
        timeout=1200,
        temperature=0.7,  # modify if needed
        max_new_tokens=16384,  # modify if needed
    ),
    # ...existing entries...
}
# ...existing code...
```
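The `partial(...)` entry pre-binds the constructor arguments; VLMEvalKit later calls it to instantiate the API wrapper. A minimal sketch of the mechanism, using a stand-in function rather than the real `LMDeployAPI` class:

```python
from functools import partial

def make_api(api_base, model, retry, timeout, temperature, max_new_tokens):
    # Stand-in for the LMDeployAPI constructor; returns its settings as a dict
    return dict(api_base=api_base, model=model, retry=retry, timeout=timeout,
                temperature=temperature, max_new_tokens=max_new_tokens)

entry = partial(
    make_api,
    api_base="http://0.0.0.0:23333/v1/chat/completions",
    model="<model_path>",
    retry=4,
    timeout=1200,
    temperature=0.7,
    max_new_tokens=16384,
)

cfg = entry()  # calling the partial yields a fully configured instance
```

Because the arguments are bound as keywords, a caller can still override any of them at call time (e.g., `entry(temperature=0.0)`), which is why this pattern is convenient for per-task tweaks.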

3. **Start Evaluations**

```shell
cd VLMEvalKit
python run.py --data OCRBench --model <task_name> --api-nproc 16 --reuse --verbose
```

The `<task_name>` should match the one used in the above config file.

Parameter explanations:

- `--data`: Specify the dataset for evaluation (e.g., `OCRBench`).
- `--model`: Specify the model name, which must match the `<task_name>` in your `config.py`.
- `--api-nproc`: Specify the number of parallel API calls.
- `--reuse`: Reuse previous inference results to avoid re-running completed evaluations.
- `--verbose`: Enable verbose logging.
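To evaluate the same task on several benchmarks, the invocation can be assembled programmatically. A small sketch, assuming `--data` accepts multiple dataset names as in recent VLMEvalKit versions (the `build_run_cmd` helper and the dataset names other than `OCRBench` are illustrative; check your `vlmeval` version for supported datasets):

```python
import shlex

def build_run_cmd(task_name: str, datasets: list[str], nproc: int = 16) -> list[str]:
    # Assemble the VLMEvalKit invocation for a list of benchmark datasets
    return ["python", "run.py",
            "--data", *datasets,
            "--model", task_name,
            "--api-nproc", str(nproc),
            "--reuse", "--verbose"]

cmd = build_run_cmd("lmdeploy_qwen3vl-4b", ["OCRBench", "MMBench_DEV_EN"])
print(shlex.join(cmd))
```

Thanks to `--reuse`, re-running the same command after an interruption skips samples that already have inference results.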
1 change: 1 addition & 0 deletions docs/en/index.rst
@@ -88,6 +88,7 @@ Documentation

benchmark/benchmark.md
benchmark/evaluate_with_opencompass.md
benchmark/evaluate_with_vlmevalkit.md

.. toctree::
:maxdepth: 1
65 changes: 65 additions & 0 deletions docs/zh_cn/benchmark/evaluate_with_vlmevalkit.md
@@ -0,0 +1,65 @@
# Multi-Modal Model Evaluation Guide

This document describes how to evaluate the capabilities of multi-modal models using VLMEvalKit and LMDeploy.

## Environment Setup

```shell
pip install lmdeploy

git clone https://github.com/open-compass/VLMEvalKit.git
cd VLMEvalKit && pip install -e .
```

It is recommended to install LMDeploy and VLMEvalKit in separate Python virtual environments to avoid potential dependency conflicts.

## Evaluations

1. **Deploy Large Multimodal Models (LMMs)**

```shell
lmdeploy serve api_server <model_path> --server-port 23333 <--other-options>
```

2. **Configure the Evaluation Settings**

Modify `VLMEvalKit/vlmeval/config.py` and add the following LMDeploy API configuration to the `api_models` dictionary.

`<task_name>` is a custom name for your evaluation task (e.g., `lmdeploy_qwen3vl-4b`). The `model` parameter should match the `<model_path>` used in the `lmdeploy serve` command.

```python
# filepath: VLMEvalKit/vlmeval/config.py
# ...existing code...
api_models = {
    # ...existing entries...
    # lmdeploy api
    "<task_name>": partial(
        LMDeployAPI,
        api_base="http://0.0.0.0:23333/v1/chat/completions",
        model="<model_path>",
        retry=4,
        timeout=1200,
        temperature=0.7,  # modify if needed
        max_new_tokens=16384,  # modify if needed
    ),
    # ...existing entries...
}
# ...existing code...
```

3. **Start Evaluations**

```shell
cd VLMEvalKit
python run.py --data OCRBench --model <task_name> --api-nproc 16 --reuse --verbose
```

`<task_name>` should match the one used in the config file above.

Parameter explanations:

- `--data`: Specify the dataset for evaluation (e.g., `OCRBench`).
- `--model`: Specify the model name, which must match the `<task_name>` in your `config.py`.
- `--api-nproc`: Specify the number of parallel API calls.
- `--reuse`: Reuse previous inference results to avoid re-running completed evaluations.
- `--verbose`: Enable verbose logging.
1 change: 1 addition & 0 deletions docs/zh_cn/index.rst
@@ -89,6 +89,7 @@ The LMDeploy toolbox provides the following core features:

benchmark/benchmark.md
benchmark/evaluate_with_opencompass.md
benchmark/evaluate_with_vlmevalkit.md

.. toctree::
:maxdepth: 1