65 changes: 65 additions & 0 deletions docs/en/benchmark/evaluate_with_vlmevalkit.md
@@ -0,0 +1,65 @@
# Multi-Modal Model Evaluation Guide

This document describes how to evaluate the capabilities of multi-modal models using VLMEvalKit and LMDeploy.

## Environment Setup

```shell
pip install lmdeploy

git clone https://github.com/open-compass/VLMEvalKit.git
cd VLMEvalKit && pip install -e .
```

It is recommended to install LMDeploy and VLMEvalKit in separate Python virtual environments to avoid potential dependency conflicts.

## Evaluations

1. **Deploy Large Multimodal Models (LMMs)**

```shell
lmdeploy serve api_server <model_path> --server-port 23333 <--other-options>
```
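Before configuring VLMEvalKit, it can help to confirm the server is reachable. A minimal sketch of a request against the OpenAI-compatible chat completions endpoint the server exposes (the `build_chat_payload` helper is illustrative, not part of LMDeploy; the host, port, and model name are the ones from the command above):

```python
import json
from urllib import request

def build_chat_payload(model: str, prompt: str, max_tokens: int = 64) -> dict:
    # OpenAI-compatible chat payload accepted by `lmdeploy serve api_server`
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def query_server(payload: dict, base: str = "http://0.0.0.0:23333") -> dict:
    # POST the payload to the running server and decode the JSON response
    req = request.Request(
        f"{base}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_chat_payload("<model_path>", "Reply with a single word.")
# With the server running: print(query_server(payload))
```

If the request succeeds, the endpoint and model name are ready to be wired into the VLMEvalKit config below.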

2. **Config the Evaluation Settings**

Modify `VLMEvalKit/vlmeval/config.py` and add the following LMDeploy API configuration to the `api_models` dictionary.

The `<task_name>` is a custom name for your evaluation task (e.g., `lmdeploy_qwen3vl-4b`). The `model` parameter should match the `<model_path>` used in the `lmdeploy serve` command.

```python
# filepath: VLMEvalKit/vlmeval/config.py
# ...existing code...
api_models = {
    # ...existing entries...
    # lmdeploy api
    "<task_name>": partial(
        LMDeployAPI,
        api_base="http://0.0.0.0:23333/v1/chat/completions",
        model="<model_path>",
        retry=4,
        timeout=1200,
        temperature=0.7,  # modify if needed
        max_new_tokens=16384,  # modify if needed
    ),
    # ...existing entries...
}
# ...existing code...
```
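The `partial(...)` entry pre-binds the constructor arguments; VLMEvalKit later calls it to instantiate the API wrapper. A minimal sketch of the mechanism, using a stand-in function rather than the real `LMDeployAPI` class:

```python
from functools import partial

def make_api(api_base, model, retry, timeout, temperature, max_new_tokens):
    # Stand-in for the LMDeployAPI constructor; returns its settings as a dict
    return dict(api_base=api_base, model=model, retry=retry, timeout=timeout,
                temperature=temperature, max_new_tokens=max_new_tokens)

entry = partial(
    make_api,
    api_base="http://0.0.0.0:23333/v1/chat/completions",
    model="<model_path>",
    retry=4,
    timeout=1200,
    temperature=0.7,
    max_new_tokens=16384,
)

cfg = entry()  # calling the partial yields a fully configured instance
```

Because the arguments are bound as keywords, a caller can still override any of them at call time (e.g., `entry(temperature=0.0)`), which is why this pattern is convenient for per-task tweaks.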

3. **Start Evaluations**

```shell
cd VLMEvalKit
python run.py --data OCRBench --model <task_name> --api-nproc 16 --reuse --verbose
```

The `<task_name>` should match the one used in the above config file.

Parameter explanations:

- `--data`: Specify the dataset for evaluation (e.g., `OCRBench`).
- `--model`: Specify the model name, which must match the `<task_name>` in your `config.py`.
- `--api-nproc`: Specify the number of parallel API calls.
- `--reuse`: Reuse previous inference results to avoid re-running completed evaluations.
- `--verbose`: Enable verbose logging.
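To evaluate the same task on several benchmarks, the invocation can be assembled programmatically. A small sketch, assuming `--data` accepts multiple dataset names as in recent VLMEvalKit versions (the `build_run_cmd` helper and the dataset names other than `OCRBench` are illustrative; check your `vlmeval` version for supported datasets):

```python
import shlex

def build_run_cmd(task_name: str, datasets: list[str], nproc: int = 16) -> list[str]:
    # Assemble the VLMEvalKit invocation for a list of benchmark datasets
    return ["python", "run.py",
            "--data", *datasets,
            "--model", task_name,
            "--api-nproc", str(nproc),
            "--reuse", "--verbose"]

cmd = build_run_cmd("lmdeploy_qwen3vl-4b", ["OCRBench", "MMBench_DEV_EN"])
print(shlex.join(cmd))
```

Thanks to `--reuse`, re-running the same command after an interruption skips samples that already have inference results.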
1 change: 1 addition & 0 deletions docs/en/index.rst
@@ -88,6 +88,7 @@ Documentation

benchmark/benchmark.md
benchmark/evaluate_with_opencompass.md
benchmark/evaluate_with_vlmevalkit.md

.. toctree::
:maxdepth: 1
65 changes: 65 additions & 0 deletions docs/zh_cn/benchmark/evaluate_with_vlmevalkit.md
@@ -0,0 +1,65 @@
# Multi-Modal Model Evaluation Guide

This document describes how to evaluate the capabilities of multi-modal models using VLMEvalKit and LMDeploy.

## Environment Setup

```shell
pip install lmdeploy

git clone https://github.com/open-compass/VLMEvalKit.git
cd VLMEvalKit && pip install -e .
```

It is recommended to install LMDeploy and VLMEvalKit in separate Python virtual environments to avoid potential dependency conflicts.

## Evaluations

1. **Deploy Large Multimodal Models (LMMs)**

```shell
lmdeploy serve api_server <model_path> --server-port 23333 <--other-options>
```

2. **Configure the Evaluation Settings**

Modify `VLMEvalKit/vlmeval/config.py` and add the following LMDeploy API configuration to the `api_models` dictionary.

`<task_name>` is a custom name for your evaluation task (e.g., `lmdeploy_qwen3vl-4b`). The `model` parameter should match the `<model_path>` used in the `lmdeploy serve` command.

```python
# filepath: VLMEvalKit/vlmeval/config.py
# ...existing code...
api_models = {
    # ...existing entries...
    # lmdeploy api
    "<task_name>": partial(
        LMDeployAPI,
        api_base="http://0.0.0.0:23333/v1/chat/completions",
        model="<model_path>",
        retry=4,
        timeout=1200,
        temperature=0.7,  # modify if needed
        max_new_tokens=16384,  # modify if needed
    ),
    # ...existing entries...
}
# ...existing code...
```

3. **Start Evaluations**

```shell
cd VLMEvalKit
python run.py --data OCRBench --model <task_name> --api-nproc 16 --reuse --verbose
```

`<task_name>` should match the one used in the config file above.

Parameter explanations:

- `--data`: Specify the dataset for evaluation (e.g., `OCRBench`).
- `--model`: Specify the model name, which must match the `<task_name>` in your `config.py`.
- `--api-nproc`: Specify the number of parallel API calls.
- `--reuse`: Reuse previous inference results to avoid re-running completed evaluations.
- `--verbose`: Enable verbose logging.
1 change: 1 addition & 0 deletions docs/zh_cn/index.rst
@@ -89,6 +89,7 @@ The LMDeploy toolbox provides the following core features:

benchmark/benchmark.md
benchmark/evaluate_with_opencompass.md
benchmark/evaluate_with_vlmevalkit.md

.. toctree::
:maxdepth: 1