diff --git a/docs/en/benchmark/evaluate_with_vlmevalkit.md b/docs/en/benchmark/evaluate_with_vlmevalkit.md
new file mode 100644
index 0000000000..da71b5f684
--- /dev/null
+++ b/docs/en/benchmark/evaluate_with_vlmevalkit.md
@@ -0,0 +1,65 @@
+# Multi-Modal Model Evaluation Guide
+
+This document describes how to evaluate the capabilities of multi-modal models using VLMEvalKit and LMDeploy.
+
+## Environment Setup
+
+```shell
+pip install lmdeploy
+
+git clone https://github.com/open-compass/VLMEvalKit.git
+cd VLMEvalKit && pip install -e .
+```
+
+It is recommended to install LMDeploy and VLMEvalKit in separate Python virtual environments to avoid potential dependency conflicts.
+
+## Evaluations
+
+1. **Deploy Large Multimodal Models (LMMs)**
+
+```shell
+lmdeploy serve api_server <model_path> --server-port 23333 <--other-options>
+```
+
+2. **Configure the Evaluation Settings**
+
+Modify `VLMEvalKit/vlmeval/config.py` and add the following LMDeploy API configuration to the `api_models` dictionary.
+
+The `<eval_model_name>` is a custom name for your evaluation task (e.g., `lmdeploy_qwen3vl-4b`). The `model` parameter should match the model name used in the `lmdeploy serve` command.
+
+```python
+# filepath: VLMEvalKit/vlmeval/config.py
+# ...existing code...
+api_models = {
+    # lmdeploy api
+    ...,
+    "<eval_model_name>": partial(
+        LMDeployAPI,
+        api_base="http://0.0.0.0:23333/v1/chat/completions",
+        model="<model_name>",
+        retry=4,
+        timeout=1200,
+        temperature=0.7,  # modify if needed
+        max_new_tokens=16384,  # modify if needed
+    ),
+    ...
+}
+# ...existing code...
+```
+
+3. **Start Evaluations**
+
+```shell
+cd VLMEvalKit
+python run.py --data OCRBench --model <eval_model_name> --api-nproc 16 --reuse --verbose
+```
+
+The `<eval_model_name>` should match the one used in the above config file.
+
+Parameter explanations:
+
+- `--data`: Specify the dataset for evaluation (e.g., `OCRBench`).
+- `--model`: Specify the model name, which must match the `<eval_model_name>` in your `config.py`.
+- `--api-nproc`: Specify the number of parallel API calls.
+- `--reuse`: Reuse previous inference results to avoid re-running completed evaluations.
+- `--verbose`: Enable verbose logging.
diff --git a/docs/en/index.rst b/docs/en/index.rst
index b28042a977..cc326e4dc7 100644
--- a/docs/en/index.rst
+++ b/docs/en/index.rst
@@ -88,6 +88,7 @@ Documentation
 
    benchmark/benchmark.md
    benchmark/evaluate_with_opencompass.md
+   benchmark/evaluate_with_vlmevalkit.md
 
 .. toctree::
    :maxdepth: 1
diff --git a/docs/zh_cn/benchmark/evaluate_with_vlmevalkit.md b/docs/zh_cn/benchmark/evaluate_with_vlmevalkit.md
new file mode 100644
index 0000000000..da55a38dd4
--- /dev/null
+++ b/docs/zh_cn/benchmark/evaluate_with_vlmevalkit.md
@@ -0,0 +1,65 @@
+# 多模态模型评测指南
+
+本文档介绍如何使用 VLMEvalKit 和 LMDeploy 评测多模态模型能力。
+
+## 环境准备
+
+```shell
+pip install lmdeploy
+
+git clone https://github.com/open-compass/VLMEvalKit.git
+cd VLMEvalKit && pip install -e .
+```
+
+建议在不同的 Python 虚拟环境中分别安装 LMDeploy 和 VLMEvalKit,以避免潜在的依赖冲突。
+
+## 评测
+
+1. **部署多模态大模型 (LMMs)**
+
+```shell
+lmdeploy serve api_server <model_path> --server-port 23333 <--other-options>
+```
+
+2. **配置评测设置**
+
+修改 `VLMEvalKit/vlmeval/config.py`,在 `api_models` 字典中添加以下 LMDeploy API 配置。
+
+`<eval_model_name>` 是您评测任务的自定义名称(例如 `lmdeploy_qwen3vl-4b`)。`model` 参数应与 `lmdeploy serve` 命令中使用的模型名称保持一致。
+
+```python
+# filepath: VLMEvalKit/vlmeval/config.py
+# ...existing code...
+api_models = {
+    # lmdeploy api
+    ...,
+    "<eval_model_name>": partial(
+        LMDeployAPI,
+        api_base="http://0.0.0.0:23333/v1/chat/completions",
+        model="<model_name>",
+        retry=4,
+        timeout=1200,
+        temperature=0.7,  # modify if needed
+        max_new_tokens=16384,  # modify if needed
+    ),
+    ...
+}
+# ...existing code...
+```
+
+3. **开始评测**
+
+```shell
+cd VLMEvalKit
+python run.py --data OCRBench --model <eval_model_name> --api-nproc 16 --reuse --verbose
+```
+
+`<eval_model_name>` 应与上述配置文件中使用的名称保持一致。
+
+参数说明:
+
+- `--data`: 指定用于评测的数据集(例如 `OCRBench`)。
+- `--model`: 指定模型名称,必须与您在 `config.py` 中设置的 `<eval_model_name>` 匹配。
+- `--api-nproc`: 指定并行的 API 调用数量。
+- `--reuse`: 复用先前的推理结果,以避免重新运行已完成的评测。
+- `--verbose`: 启用详细日志记录。
diff --git a/docs/zh_cn/index.rst b/docs/zh_cn/index.rst
index 733bfc585e..3387c70c80 100644
--- a/docs/zh_cn/index.rst
+++ b/docs/zh_cn/index.rst
@@ -89,6 +89,7 @@ LMDeploy 工具箱提供以下核心功能:
 
    benchmark/benchmark.md
    benchmark/evaluate_with_opencompass.md
+   benchmark/evaluate_with_vlmevalkit.md
 
 .. toctree::
    :maxdepth: 1
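As background for reviewers of the config step above: the `partial(...)` entry in `config.py` stores a factory rather than a constructed client, and VLMEvalKit calls it later to instantiate the API wrapper with the bound arguments. A minimal, self-contained sketch of that mechanism, using a hypothetical `StubAPI` class in place of VLMEvalKit's `LMDeployAPI` (the field names here are illustrative assumptions, not the real class signature):

```python
from functools import partial

# Hypothetical stand-in for VLMEvalKit's LMDeployAPI; the real class lives
# in vlmeval and has its own constructor signature.
class StubAPI:
    def __init__(self, api_base, model, retry=4, timeout=1200,
                 temperature=0.7, max_new_tokens=16384):
        self.api_base = api_base
        self.model = model
        self.retry = retry
        self.timeout = timeout
        self.temperature = temperature
        self.max_new_tokens = max_new_tokens

# Mirrors the api_models entry: the dict maps an evaluation task name
# to a factory, not to a constructed client.
api_models = {
    "lmdeploy_qwen3vl-4b": partial(
        StubAPI,
        api_base="http://0.0.0.0:23333/v1/chat/completions",
        model="qwen3vl-4b",
    ),
}

# VLMEvalKit later invokes the factory to build the client on demand.
client = api_models["lmdeploy_qwen3vl-4b"]()
print(client.model)  # qwen3vl-4b
```

Because `partial` only binds arguments, editing values such as `temperature` or `max_new_tokens` in the config entry takes effect the next time `run.py` constructs the client; no instance is created at import time.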