
OCRTurk Benchmark 🇹🇷

A comprehensive evaluation framework for comparing OCR model outputs against real-world Turkish data. The tool provides detailed metrics for text, equations, tables, and images extracted from documents.

Features

  • Text Metrics: Normalized Edit Distance (NED) and Turkish character similarity
  • Equation Metrics: BLEU-4, Character Dice Metric (CDM), and NED for LaTeX equations
  • Table Metrics: NED and TEDS-like similarity for extracted tables
  • Image Metrics: MSE and DreamSim

Installation

Requirements

pip install -r requirements.txt

Quick Start

Basic Usage

python eval.py <ground_truth_path> <model_output_path> [results_path]

With Image Metrics

python eval.py <ground_truth_path> <model_output_path> [results_path] --images

Example

python eval.py ./data/ground_truth ./data/model_outputs ./results --images

Directory Structure

Expected Input Structure

ground_truth/
├── data_1/
│   ├── document.md
│   └── figures/
│       ├── figure_1.png
│       └── figure_2.png
├── data_2/
│   ├── document.md
│   └── figures/
│       └── figure_1.png
└── ...

model_outputs/
├── data_1/
│   ├── result.md (or document.md)
│   └── images/ (or fig/, imgs/)
│       ├── figure_1.png
│       └── figure_2.png
└── ...

Output Structure

results/
├── per_doc_metrics.csv      # Metrics for each document
├── per_image_metrics.csv    # Metrics for each image pair
└── summary_metrics.csv      # Aggregated summary statistics

Metrics Explained

Text Metrics

  • NED (Normalized Edit Distance): Levenshtein distance normalized by length (lower is better, 0 = perfect match)
  • Turkish Character Similarity: Specialized metric for Turkish diacritics (higher is better, 1 = perfect)
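A minimal sketch of NED as commonly defined: Levenshtein distance divided by the length of the longer string. The exact normalization used by `eval.py` is an assumption here.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def ned(reference: str, hypothesis: str) -> float:
    # 0.0 means a perfect match; both strings empty counts as perfect.
    if not reference and not hypothesis:
        return 0.0
    return levenshtein(reference, hypothesis) / max(len(reference), len(hypothesis))
```

For example, `ned("kitten", "sitting")` is 3/7, since three edits separate the strings and the longer one has seven characters.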

Equation Metrics

  • BLEU-4: Standard BLEU score for LaTeX equations (higher is better, 1 = perfect)
  • CDM (Character Dice Metric): F1-like metric for character overlap (higher is better, 1 = perfect)
  • Equation NED: Edit distance for LaTeX strings (lower is better)
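The CDM can be sketched as an F1-style Dice coefficient over character multisets. Whether the repository tokenizes LaTeX beyond raw characters is an assumption not confirmed by this README.

```python
from collections import Counter

def cdm(reference: str, hypothesis: str) -> float:
    # Dice coefficient: 2 * |overlap| / (|ref| + |hyp|), over character counts.
    ref, hyp = Counter(reference), Counter(hypothesis)
    overlap = sum((ref & hyp).values())   # multiset intersection (min counts)
    total = sum(ref.values()) + sum(hyp.values())
    return 2 * overlap / total if total else 1.0
```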

Table Metrics

  • Table NED: Edit distance on CSV-serialized tables (lower is better)
  • TEDS-like: Tree Edit Distance-based similarity for table structure (higher is better, 1 = perfect)
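Table NED can be sketched by serializing each table to CSV text and computing a normalized edit distance over the serializations. The cell ordering and delimiter choices below are assumptions, not necessarily what `eval.py` uses.

```python
import csv
import io

def to_csv(rows: list[list[str]]) -> str:
    # Serialize a table (list of rows) to CSV text.
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

def table_ned(ref_rows: list[list[str]], hyp_rows: list[list[str]]) -> float:
    a, b = to_csv(ref_rows), to_csv(hyp_rows)
    if not a and not b:
        return 0.0
    # Compact Levenshtein distance, normalized by the longer serialization.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1] / max(len(a), len(b))
```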

Image Metrics

  • MSE: Mean Squared Error (lower is better)
  • DreamSim: Perceptual similarity metric (lower is better)
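MSE is the mean of squared per-pixel differences. A pure-Python sketch over equal-size 2D pixel grids follows; actual image loading and any resizing (likely NumPy/PIL in practice) are outside this snippet.

```python
def mse(img_a: list[list[float]], img_b: list[list[float]]) -> float:
    # img_a, img_b: equal-size 2D grids of pixel values.
    flat_a = [p for row in img_a for p in row]
    flat_b = [p for row in img_b for p in row]
    return sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)
```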

Output Files

per_doc_metrics.csv

Per-document metrics including:

  • Text NED and Turkish character similarity
  • Equation metrics (NED, BLEU, CDM)
  • Table metrics (NED, TEDS)
  • Image metrics
  • Counts of extracted elements

per_image_metrics.csv

Per-image-pair metrics:

  • MSE and DreamSim for each image pair
  • Source file paths

summary_metrics.csv

Aggregated statistics across all documents and images.

Citation

If you use this tool in your research, please cite:

@misc{yılmaz2026ocrturkcomprehensiveocrbenchmark,
      title={OCRTurk: A Comprehensive OCR Benchmark for Turkish}, 
      author={Deniz Yılmaz and Evren Ayberk Munis and Çağrı Toraman and Süha Kağan Köse and Burak Aktaş and Mehmet Can Baytekin and Bilge Kaan Görür},
      year={2026},
      eprint={2602.03693},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2602.03693}, 
}
