From a513bffebf49c48085506665af292239aba3735a Mon Sep 17 00:00:00 2001
From: brown <1041206149@qq.com>
Date: Fri, 3 Apr 2026 14:26:10 +0800
Subject: [PATCH 1/4] docs: add ROUGE-N F1 reproduction steps to README

---
 README.md    | 24 ++++++++++++++++++++++++
 README_zh.md | 24 ++++++++++++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/README.md b/README.md
index 56e0bf7..9aad153 100644
--- a/README.md
+++ b/README.md
@@ -92,6 +92,30 @@ All scores are in **[0, 1]**; higher is better.
 
 ### ROUGE-N F1 on Full Dataset (7,809 samples)
 
+**How to reproduce:** Use the evaluation scripts in the [MinerU-HTML](https://github.com/opendatalab/MinerU-HTML) repository:
+
+```bash
+# Clone MinerU-HTML and prepare the full dataset (WebMainBench_7809.jsonl)
+git clone https://github.com/opendatalab/MinerU-HTML.git
+cd MinerU-HTML
+
+# Run evaluation (example for MinerU-HTML extractor)
+python eval_baselines.py \
+    --bench benchmark/WebMainBench_7809.jsonl \
+    --task_dir benchmark_results/mineru_html-html-md \
+    --extractor_name mineru_html-html-md \
+    --model_path YOUR_MODEL_PATH \
+    --default_config gpu
+
+# For CPU-based extractors (e.g. trafilatura, resiliparse, magic-html)
+python eval_baselines.py \
+    --bench benchmark/WebMainBench_7809.jsonl \
+    --task_dir benchmark_results/trafilatura-html-md \
+    --extractor_name trafilatura-html-md
+```
+
+Results are written to `benchmark_results/<extractor>/mean_eval_result.json`. See `run_eval.sh` for a complete multi-extractor example.
+
 Results from the [Dripper paper](https://arxiv.org/abs/2511.23119) (Table 2):
 
 | Extractor | Mode | All | Simple | Mid | Hard |
diff --git a/README_zh.md b/README_zh.md
index 19e225c..f83582e 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -92,6 +92,30 @@ WebMainBench 支持两套互补的评测协议：
 
 ### ROUGE-N F1 — 全量数据集（7,809 条）
 
+**复现方法：** 使用 [MinerU-HTML](https://github.com/opendatalab/MinerU-HTML) 仓库中的评测脚本：
+
+```bash
+# 克隆 MinerU-HTML 并准备全量数据集（WebMainBench_7809.jsonl）
+git clone https://github.com/opendatalab/MinerU-HTML.git
+cd MinerU-HTML
+
+# 运行评测（以 MinerU-HTML 抽取器为例）
+python eval_baselines.py \
+    --bench benchmark/WebMainBench_7809.jsonl \
+    --task_dir benchmark_results/mineru_html-html-md \
+    --extractor_name mineru_html-html-md \
+    --model_path YOUR_MODEL_PATH \
+    --default_config gpu
+
+# 对于基于 CPU 的抽取器（如 trafilatura、resiliparse、magic-html）
+python eval_baselines.py \
+    --bench benchmark/WebMainBench_7809.jsonl \
+    --task_dir benchmark_results/trafilatura-html-md \
+    --extractor_name trafilatura-html-md
+```
+
+结果写入 `benchmark_results/<extractor>/mean_eval_result.json`。完整的多抽取器示例见 `run_eval.sh`。
+
 来自 [Dripper 论文](https://arxiv.org/abs/2511.23119)（表 2）：
 
 | 抽取器 | 模式 | All | Simple | Mid | Hard |

From 07cf3dfffe125750555af11b11fba166263bd902 Mon Sep 17 00:00:00 2001
From: brown <1041206149@qq.com>
Date: Fri, 3 Apr 2026 17:44:22 +0800
Subject: [PATCH 2/4] docs: rename "How to reproduce" label to "Execution Method"

---
 README.md    | 2 +-
 README_zh.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 9aad153..1a8fe18 100644
--- a/README.md
+++ b/README.md
@@ -92,7 +92,7 @@ All scores are in **[0, 1]**; higher is better.
 
 ### ROUGE-N F1 on Full Dataset (7,809 samples)
 
-**How to reproduce:** Use the evaluation scripts in the [MinerU-HTML](https://github.com/opendatalab/MinerU-HTML) repository:
+**Execution Method:** Use the evaluation scripts in the [MinerU-HTML](https://github.com/opendatalab/MinerU-HTML) repository:
 
 ```bash
 # Clone MinerU-HTML and prepare the full dataset (WebMainBench_7809.jsonl)
diff --git a/README_zh.md b/README_zh.md
index f83582e..911c283 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -92,7 +92,7 @@ WebMainBench 支持两套互补的评测协议：
 
 ### ROUGE-N F1 — 全量数据集（7,809 条）
 
-**复现方法：** 使用 [MinerU-HTML](https://github.com/opendatalab/MinerU-HTML) 仓库中的评测脚本：
+**执行方法：** 使用 [MinerU-HTML](https://github.com/opendatalab/MinerU-HTML) 仓库中的评测脚本：
 
 ```bash
 # 克隆 MinerU-HTML 并准备全量数据集（WebMainBench_7809.jsonl）

From 64a81ee91eac1cd068699435c4388a3be726999c Mon Sep 17 00:00:00 2001
From: brown <1041206149@qq.com>
Date: Fri, 3 Apr 2026 19:14:48 +0800
Subject: [PATCH 3/4] docs: move ROUGE-N F1 evaluation steps after dataset download

---
 README.md    | 67 +++++++++++++++++++++++++++++++---------------------
 README_zh.md | 67 +++++++++++++++++++++++++++++++---------------------
 2 files changed, 80 insertions(+), 54 deletions(-)

diff --git a/README.md b/README.md
index 1a8fe18..31a4540 100644
--- a/README.md
+++ b/README.md
@@ -92,30 +92,6 @@ All scores are in **[0, 1]**; higher is better.
 
 ### ROUGE-N F1 on Full Dataset (7,809 samples)
 
-**Execution Method:** Use the evaluation scripts in the [MinerU-HTML](https://github.com/opendatalab/MinerU-HTML) repository:
-
-```bash
-# Clone MinerU-HTML and prepare the full dataset (WebMainBench_7809.jsonl)
-git clone https://github.com/opendatalab/MinerU-HTML.git
-cd MinerU-HTML
-
-# Run evaluation (example for MinerU-HTML extractor)
-python eval_baselines.py \
-    --bench benchmark/WebMainBench_7809.jsonl \
-    --task_dir benchmark_results/mineru_html-html-md \
-    --extractor_name mineru_html-html-md \
-    --model_path YOUR_MODEL_PATH \
-    --default_config gpu
-
-# For CPU-based extractors (e.g. trafilatura, resiliparse, magic-html)
-python eval_baselines.py \
-    --bench benchmark/WebMainBench_7809.jsonl \
-    --task_dir benchmark_results/trafilatura-html-md \
-    --extractor_name trafilatura-html-md
-```
-
-Results are written to `benchmark_results/<extractor>/mean_eval_result.json`. See `run_eval.sh` for a complete multi-extractor example.
-
 Results from the [Dripper paper](https://arxiv.org/abs/2511.23119) (Table 2):
 
 | Extractor | Mode | All | Simple | Mid | Hard |
@@ -164,6 +140,15 @@ The dataset is hosted on Hugging Face: [opendatalab/WebMainBench](https://huggin
 ```python
 from huggingface_hub import hf_hub_download
 
+# Full dataset (7,809 samples) — used for ROUGE-N F1 evaluation
+hf_hub_download(
+    repo_id="opendatalab/WebMainBench",
+    repo_type="dataset",
+    filename="WebMainBench_7809.jsonl",
+    local_dir="data/",
+)
+
+# 545-sample subset — used for Fine-Grained Edit-Distance Metrics evaluation
 hf_hub_download(
     repo_id="opendatalab/WebMainBench",
     repo_type="dataset",
@@ -172,7 +157,35 @@ hf_hub_download(
 )
 ```
 
-### Configure LLM (Optional)
+### ROUGE-N F1 Evaluation (WebMainBench_7809.jsonl)
+
+Use the evaluation scripts in the [MinerU-HTML](https://github.com/opendatalab/MinerU-HTML) repository:
+
+```bash
+# Clone MinerU-HTML and prepare the full dataset (WebMainBench_7809.jsonl)
+git clone https://github.com/opendatalab/MinerU-HTML.git
+cd MinerU-HTML
+
+# Run evaluation (example for MinerU-HTML extractor)
+python eval_baselines.py \
+    --bench benchmark/WebMainBench_7809.jsonl \
+    --task_dir benchmark_results/mineru_html-html-md \
+    --extractor_name mineru_html-html-md \
+    --model_path YOUR_MODEL_PATH \
+    --default_config gpu
+
+# For CPU-based extractors (e.g. trafilatura, resiliparse, magic-html)
+python eval_baselines.py \
+    --bench benchmark/WebMainBench_7809.jsonl \
+    --task_dir benchmark_results/trafilatura-html-md \
+    --extractor_name trafilatura-html-md
+```
+
+Results are written to `benchmark_results/<extractor>/mean_eval_result.json`. See `run_eval.sh` for a complete multi-extractor example.
+
+### Fine-Grained Edit-Distance Metrics Evaluation (WebMainBench_545.jsonl)
+
+#### Configure LLM (Optional)
 
 LLM-enhanced content splitting improves formula/table/code extraction accuracy. To enable it, copy `.env.example` to `.env` and fill in your API credentials:
 
@@ -181,7 +194,7 @@ cp .env.example .env
 # Edit .env and set LLM_BASE_URL, LLM_API_KEY, LLM_MODEL
 ```
 
-### Run an Evaluation
+#### Run an Evaluation
 
 ```python
 from webmainbench import DataLoader, Evaluator, ExtractorFactory
@@ -194,7 +207,7 @@ m = result.overall_metrics
 print(f"Overall Score: {result.overall_metrics['overall']:.4f}")
 ```
 
-### Compare Multiple Extractors
+#### Compare Multiple Extractors
 
 ```python
 extractors = ["trafilatura", "resiliparse", "magic-html"]
diff --git a/README_zh.md b/README_zh.md
index 911c283..8092d18 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -92,30 +92,6 @@ WebMainBench 支持两套互补的评测协议：
 
 ### ROUGE-N F1 — 全量数据集（7,809 条）
 
-**执行方法：** 使用 [MinerU-HTML](https://github.com/opendatalab/MinerU-HTML) 仓库中的评测脚本：
-
-```bash
-# 克隆 MinerU-HTML 并准备全量数据集（WebMainBench_7809.jsonl）
-git clone https://github.com/opendatalab/MinerU-HTML.git
-cd MinerU-HTML
-
-# 运行评测（以 MinerU-HTML 抽取器为例）
-python eval_baselines.py \
-    --bench benchmark/WebMainBench_7809.jsonl \
-    --task_dir benchmark_results/mineru_html-html-md \
-    --extractor_name mineru_html-html-md \
-    --model_path YOUR_MODEL_PATH \
-    --default_config gpu
-
-# 对于基于 CPU 的抽取器（如 trafilatura、resiliparse、magic-html）
-python eval_baselines.py \
-    --bench benchmark/WebMainBench_7809.jsonl \
-    --task_dir benchmark_results/trafilatura-html-md \
-    --extractor_name trafilatura-html-md
-```
-
-结果写入 `benchmark_results/<extractor>/mean_eval_result.json`。完整的多抽取器示例见 `run_eval.sh`。
-
 来自 [Dripper 论文](https://arxiv.org/abs/2511.23119)（表 2）：
 
 | 抽取器 | 模式 | All | Simple | Mid | Hard |
@@ -164,6 +140,15 @@ pip install -e .
 ```python
 from huggingface_hub import hf_hub_download
 
+# 全量数据集（7,809 条）— 用于 ROUGE-N F1 评测
+hf_hub_download(
+    repo_id="opendatalab/WebMainBench",
+    repo_type="dataset",
+    filename="WebMainBench_7809.jsonl",
+    local_dir="data/",
+)
+
+# 545 条样本子集 — 用于细粒度编辑距离指标评测
 hf_hub_download(
     repo_id="opendatalab/WebMainBench",
     repo_type="dataset",
@@ -172,7 +157,35 @@ hf_hub_download(
 )
 ```
 
-### 配置 LLM（可选）
+### ROUGE-N F1 评测（WebMainBench_7809.jsonl）
+
+使用 [MinerU-HTML](https://github.com/opendatalab/MinerU-HTML) 仓库中的评测脚本：
+
+```bash
+# 克隆 MinerU-HTML 并准备全量数据集（WebMainBench_7809.jsonl）
+git clone https://github.com/opendatalab/MinerU-HTML.git
+cd MinerU-HTML
+
+# 运行评测（以 MinerU-HTML 抽取器为例）
+python eval_baselines.py \
+    --bench benchmark/WebMainBench_7809.jsonl \
+    --task_dir benchmark_results/mineru_html-html-md \
+    --extractor_name mineru_html-html-md \
+    --model_path YOUR_MODEL_PATH \
+    --default_config gpu
+
+# 对于基于 CPU 的抽取器（如 trafilatura、resiliparse、magic-html）
+python eval_baselines.py \
+    --bench benchmark/WebMainBench_7809.jsonl \
+    --task_dir benchmark_results/trafilatura-html-md \
+    --extractor_name trafilatura-html-md
+```
+
+结果写入 `benchmark_results/<extractor>/mean_eval_result.json`。完整的多抽取器示例见 `run_eval.sh`。
+
+### 细粒度编辑距离指标评测（WebMainBench_545.jsonl）
+
+#### 配置 LLM（可选）
 
 LLM 增强内容拆分可提升公式/表格/代码的抽取精度。如需启用，将 `.env.example` 复制为 `.env` 并填写 API 信息：
 
@@ -181,7 +194,7 @@ cp .env.example .env
 # 编辑 .env，设置 LLM_BASE_URL、LLM_API_KEY、LLM_MODEL
 ```
 
-### 运行评测
+#### 运行评测
 
 ```python
 from webmainbench import DataLoader, Evaluator, ExtractorFactory
@@ -194,7 +207,7 @@ m = result.overall_metrics
 print(f"Overall Score: {result.overall_metrics['overall']:.4f}")
 ```
 
-### 多抽取器对比
+#### 多抽取器对比
 
 ```python
 extractors = ["trafilatura", "resiliparse", "magic-html"]

From 3b617f3fce84fa907c774ddbe9fb64478edce343 Mon Sep 17 00:00:00 2001
From: brown <1041206149@qq.com>
Date: Fri, 3 Apr 2026 19:20:17 +0800
Subject: [PATCH 4/4] docs: rename full dataset file to webmainbench.jsonl in README

---
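Reviewer note (not part of the commit): this patch renames the full-dataset file to `webmainbench.jsonl`, so it may be worth confirming that the downloaded file still holds the 7,809 records the README cites. The sketch below is a minimal, hypothetical helper; `count_jsonl_records` is not part of WebMainBench or MinerU-HTML. It assumes the file is plain JSON Lines (one JSON value per non-empty line) and that it was saved under `data/`, as in the README download snippet.

```python
import json
from pathlib import Path


def count_jsonl_records(path):
    """Count JSON records in a JSON Lines file, skipping blank lines."""
    count = 0
    with Path(path).open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc
            count += 1
    return count


if __name__ == "__main__":
    # Assumed location: local_dir="data/" plus the renamed filename.
    dataset = Path("data/webmainbench.jsonl")
    if dataset.exists():
        print(f"{dataset}: {count_jsonl_records(dataset)} records")
    else:
        print(f"{dataset} not found; download it first")
```

If the printed count differs from 7,809, the `--bench` path in the evaluation commands is probably pointing at a different or partial file.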
 README.md    | 10 +++++-----
 README_zh.md | 10 +++++-----
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index 31a4540..71d6779 100644
--- a/README.md
+++ b/README.md
@@ -144,7 +144,7 @@ from huggingface_hub import hf_hub_download
 hf_hub_download(
     repo_id="opendatalab/WebMainBench",
     repo_type="dataset",
-    filename="WebMainBench_7809.jsonl",
+    filename="webmainbench.jsonl",
     local_dir="data/",
 )
 
@@ -157,18 +157,18 @@ hf_hub_download(
 )
 ```
 
-### ROUGE-N F1 Evaluation (WebMainBench_7809.jsonl)
+### ROUGE-N F1 Evaluation (webmainbench.jsonl)
 
 Use the evaluation scripts in the [MinerU-HTML](https://github.com/opendatalab/MinerU-HTML) repository:
 
 ```bash
-# Clone MinerU-HTML and prepare the full dataset (WebMainBench_7809.jsonl)
+# Clone MinerU-HTML and prepare the full dataset (webmainbench.jsonl)
 git clone https://github.com/opendatalab/MinerU-HTML.git
 cd MinerU-HTML
 
 # Run evaluation (example for MinerU-HTML extractor)
 python eval_baselines.py \
-    --bench benchmark/WebMainBench_7809.jsonl \
+    --bench benchmark/webmainbench.jsonl \
     --task_dir benchmark_results/mineru_html-html-md \
     --extractor_name mineru_html-html-md \
     --model_path YOUR_MODEL_PATH \
@@ -176,7 +176,7 @@ python eval_baselines.py \
 
 # For CPU-based extractors (e.g. trafilatura, resiliparse, magic-html)
 python eval_baselines.py \
-    --bench benchmark/WebMainBench_7809.jsonl \
+    --bench benchmark/webmainbench.jsonl \
     --task_dir benchmark_results/trafilatura-html-md \
     --extractor_name trafilatura-html-md
 ```
diff --git a/README_zh.md b/README_zh.md
index 8092d18..0601bdc 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -144,7 +144,7 @@ from huggingface_hub import hf_hub_download
 hf_hub_download(
     repo_id="opendatalab/WebMainBench",
     repo_type="dataset",
-    filename="WebMainBench_7809.jsonl",
+    filename="webmainbench.jsonl",
     local_dir="data/",
 )
 
@@ -157,18 +157,18 @@ hf_hub_download(
 )
 ```
 
-### ROUGE-N F1 评测（WebMainBench_7809.jsonl）
+### ROUGE-N F1 评测（webmainbench.jsonl）
 
 使用 [MinerU-HTML](https://github.com/opendatalab/MinerU-HTML) 仓库中的评测脚本：
 
 ```bash
-# 克隆 MinerU-HTML 并准备全量数据集（WebMainBench_7809.jsonl）
+# 克隆 MinerU-HTML 并准备全量数据集（webmainbench.jsonl）
 git clone https://github.com/opendatalab/MinerU-HTML.git
 cd MinerU-HTML
 
 # 运行评测（以 MinerU-HTML 抽取器为例）
 python eval_baselines.py \
-    --bench benchmark/WebMainBench_7809.jsonl \
+    --bench benchmark/webmainbench.jsonl \
     --task_dir benchmark_results/mineru_html-html-md \
     --extractor_name mineru_html-html-md \
     --model_path YOUR_MODEL_PATH \
@@ -176,7 +176,7 @@ python eval_baselines.py \
 
 # 对于基于 CPU 的抽取器（如 trafilatura、resiliparse、magic-html）
 python eval_baselines.py \
-    --bench benchmark/WebMainBench_7809.jsonl \
+    --bench benchmark/webmainbench.jsonl \
     --task_dir benchmark_results/trafilatura-html-md \
     --extractor_name trafilatura-html-md
 ```