Faster DAN: Multi-target Queries with Document Positional Encoding for End-to-end Handwritten Document Recognition

This project is under CeCILL-C license (full details in LICENSE_CECILL-C.md).

This repository is a public implementation of the paper: "Faster DAN: Multi-target Queries with Document Positional Encoding for End-to-end Handwritten Document Recognition", International Conference on Document Analysis and Recognition, 2023.

The paper is available on arXiv.

Click to see the demo:

Pretrained model weights are available here and here

Table of contents:

Getting Started
Datasets
Training And Evaluation

Getting Started

We used Python 3.10.4, PyTorch 1.12.0 and CUDA 10.2.

Clone the repository:

git clone https://github.com/FactoDeepLearning/FasterDAN.git

Install the dependencies in a conda environment:

conda create --name fdan python=3.10.4
conda activate fdan
cd FasterDAN
pip install -e .
cd faster_dan
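
Once the environment is active, a quick sanity check can confirm that the interpreter resembles the setup listed above. The following is only a minimal sketch: the helper name environment_report is hypothetical and not part of this repository, and it only checks the Python version and whether a torch package can be found, not the CUDA setup.

```python
import importlib.util
import sys


def environment_report(expected_python=(3, 10)):
    """Summarise whether the active interpreter resembles the setup listed above.

    Hypothetical helper for illustration; it is not shipped with the repository.
    """
    # find_spec returns None when a top-level package is not installed.
    torch_installed = importlib.util.find_spec("torch") is not None
    return {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "python_matches": tuple(sys.version_info[:2]) == tuple(expected_python),
        "torch_installed": torch_installed,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If torch_installed is reported as False, the pip install step above most likely ran in a different environment than the one currently active.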

Datasets

We used three datasets in the paper: RIMES 2009, READ 2016 and MAURDOR.

The page-level RIMES dataset was distributed during the 2009 evaluation campaign.

The MAURDOR dataset was distributed during the 2013 evaluation campaign. It is now available here.

The READ 2016 dataset corresponds to the one used in the ICFHR 2016 competition on handwritten text recognition. It can be found here.

Raw dataset files must be placed in Datasets/raw/{dataset_name},
where {dataset_name} is "READ 2016", "RIMES" or "Maurdor".
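
Before running the formatters, it can help to verify that the raw folders are where the scripts expect them. The sketch below assumes the folder names are exactly those quoted above and that the check is run from the directory containing Datasets; the function missing_raw_datasets is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Folder names as quoted in this README; adjust if your copies are named differently.
DATASET_NAMES = ("READ 2016", "RIMES", "Maurdor")


def missing_raw_datasets(raw_root="Datasets/raw", names=DATASET_NAMES):
    """Return the dataset folders that are absent or empty under raw_root."""
    root = Path(raw_root)
    missing = []
    for name in names:
        folder = root / name
        # A folder counts as present only if it exists and holds at least one entry.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing or empty:", missing_raw_datasets())
```

Only the datasets you intend to train on need to be present, so a non-empty result is not necessarily an error.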

Training And Evaluation

Step 1: Download the datasets and place the raw files in the following folder: Datasets/raw/{dataset_name}

Step 2: Format the dataset

python3 Datasets/dataset_formatters/read2016_formatter.py
python3 Datasets/dataset_formatters/rimes_formatter.py
python3 Datasets/dataset_formatters/maurdor_formatter.py

Step 3: Add any fonts you want as .ttf files to the Fonts folder
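
Since the next step generates synthetic lines, it is worth confirming that at least one font is in place first. This is a minimal sketch under the assumption that fonts sit directly inside the Fonts folder; list_fonts is a hypothetical helper, and whether the generation script also scans subfolders is not something this README states.

```python
from pathlib import Path


def list_fonts(font_dir="Fonts"):
    """Return the sorted .ttf files found directly inside font_dir."""
    folder = Path(font_dir)
    if not folder.is_dir():
        return []
    # Compare suffixes case-insensitively so "Font.TTF" is also picked up.
    return sorted(
        p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".ttf"
    )


if __name__ == "__main__":
    fonts = list_fonts()
    print(f"{len(fonts)} font(s) found")
    for font in fonts:
        print(" ", font.name)
```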

Step 4: Generate a synthetic line dataset and pretrain on it

cd OCR/line_OCR/ctc/
python3 main_syn_line.py # generation
python3 main_line_ctc_syn.py # training

Two lines in these scripts must be adapted to the dataset in use:

model.generate_syn_line_dataset("READ_2016_syn_line")
dataset_name = "READ_2016"

Weights and evaluation results are stored in OCR/line_OCR/ctc/outputs

Step 5: Train the Faster DAN / DAN

cd OCR/document_OCR/faster_dan/
python3 main_faster_dan.py  # faster dan
python3 main_std_dan.py  # original dan

Weights and evaluation results are stored in OCR/document_OCR/dan/outputs

Remarks (for pre-training and training)

Scripts are given for the READ 2016 dataset and must be adapted for RIMES 2009 and MAURDOR (mostly the dataset_name parameter and the pretraining paths).
All hyperparameters are specified and editable in the training scripts (their meanings are given in comments).
Evaluation is performed immediately after training ends (training stops when the maximum elapsed time is reached or after a maximum number of epochs, as specified in the training script).
The output files are split into two subfolders: "checkpoints" and "results".
"checkpoints" contains the model weights for the last trained epoch and for the epoch giving the best CER on the validation set.
"results" contains TensorBoard logs for the loss and metrics, as well as text files with the hyperparameters used and the evaluation results.
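
The output layout described above can be inspected with a few lines of Python. This is a minimal sketch: summarize_outputs is a hypothetical helper, and the assumption that "checkpoints" and "results" sit directly under the folder you pass in should be verified against your own run, since the scripts may add an intermediate folder per experiment.

```python
from pathlib import Path


def summarize_outputs(output_dir):
    """Map each expected subfolder to the sorted names of the entries it contains."""
    root = Path(output_dir)
    summary = {}
    for sub in ("checkpoints", "results"):
        folder = root / sub
        # A missing subfolder is reported as an empty list rather than an error.
        summary[sub] = sorted(p.name for p in folder.iterdir()) if folder.is_dir() else []
    return summary


if __name__ == "__main__":
    # Path taken from the pre-training step; point it at the folder of your own run.
    for sub, names in summarize_outputs("OCR/line_OCR/ctc/outputs").items():
        print(f"{sub}: {len(names)} entries")
```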

Citation

@inproceedings{Coquenet2023fasterdan,
  author = {Coquenet, Denis and Chatelain, Clément and Paquet, Thierry},
  title = {Faster DAN: Multi-target Queries with Document Positional Encoding for End-to-end Handwritten Document Recognition},
  booktitle={International Conference on Document Analysis and Recognition (ICDAR)},
  year={2023},
  pages={182--199},
  series={Lecture Notes in Computer Science},
  volume={14190},
  doi={10.1007/978-3-031-41685-9_12},
  url={https://arxiv.org/abs/2301.10593},
}

License

This project is under CeCILL-C license.
