Summary

Generates training data with JSON annotations for training Tesseract OCR on custom text characters/codes. The output is fully compatible with Supervisely.

COMING SOON:

GUI app to generate/edit box and tiff files for Tesseract 4.0.0 training
One-click training is also planned.

Output

Generates the outputdir directory with the following tree structure:

├── outputdir
│   ├── train
│   │   ├── img
│   │   ├── ann
│   ├── test
│   │   ├── img
│   │   ├── ann
│   ├── meta.json
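
As a rough illustration of the layout above, the folders could be created as follows. This is a minimal sketch and not the code in run.py; the helper name make_output_tree is hypothetical, and meta.json is left out because its contents are specific to Supervisely.

```python
from pathlib import Path


def make_output_tree(outputdir):
    """Create train/ and test/ folders, each with img/ and ann/ subfolders.

    Hypothetical helper for illustration only; not part of the repository.
    """
    root = Path(outputdir)
    for split in ("train", "test"):
        for sub in ("img", "ann"):
            # parents=True also creates outputdir itself if it is missing
            (root / split / sub).mkdir(parents=True, exist_ok=True)
    return root
```

Calling make_output_tree("data") would produce data/train/img, data/train/ann, data/test/img and data/test/ann.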

Images are stored in the img folder of their respective split (train or test), and annotations in the matching ann folder. The meta.json file contains data for Supervisely. Each annotation JSON file contains the text shown in the image and the top-left and bottom-right coordinates of its bounding box. The text can be accessed with json['objects'][0]['description'] and the points with json['objects'][0]['points']['exterior'].
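
The two access paths above can be tried on a small example. The sample annotation below is made up for illustration and contains only the keys mentioned in this section; it also assumes that 'exterior' holds two [x, y] pairs, top-left first and bottom-right second. Real annotation files may carry additional fields.

```python
import json

# Hypothetical annotation with only the keys described in this README.
sample = """
{"objects": [{"description": "AB12",
              "points": {"exterior": [[10, 20], [110, 60]]}}]}
"""


def read_annotation(text):
    """Return (label, top_left, bottom_right) from an annotation JSON string."""
    ann = json.loads(text)
    obj = ann["objects"][0]
    # Assumption: exterior = [[x1, y1], [x2, y2]] (top-left, bottom-right)
    (x1, y1), (x2, y2) = obj["points"]["exterior"]
    return obj["description"], (x1, y1), (x2, y2)


label, top_left, bottom_right = read_annotation(sample)
print(label, top_left, bottom_right)
```

For a file on disk, the same function can be applied to the contents of any file in the train/ann or test/ann folder.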

How to use

Clone the repo with git clone https://github.com/rafayk7/tesseractDataGenerator.git
Install the requirements with pip install -r requirements.txt
Run run.py with python3 run.py

Parameters to change

In run.py

trainingAmt - Number of total images to be generated (default 1000)
trainTestSplit - Split between number of training and testing images in range [0,1] (default 0.7)
codesfile - path to the file with codes/chars to generate training images from (default codesKOR.txt)
outputdir - directory to store all generated data (default /data)
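
To make the interaction between trainingAmt and trainTestSplit concrete, here is a minimal sketch of how the counts could be derived. It assumes that trainTestSplit is the fraction of images that go to the train folder and that the result is rounded to the nearest integer; run.py may compute this differently. The function name split_counts is hypothetical.

```python
def split_counts(trainingAmt=1000, trainTestSplit=0.7):
    """Return (number of train images, number of test images).

    Assumption: trainTestSplit is the training fraction in [0, 1].
    """
    if not 0 <= trainTestSplit <= 1:
        raise ValueError("trainTestSplit must be in the range [0, 1]")
    n_train = int(round(trainingAmt * trainTestSplit))
    # Whatever is not used for training goes to the test set
    return n_train, trainingAmt - n_train
```

Under these assumptions, the defaults give 700 training images and 300 testing images.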

In Utils.py

getTextImgForTraining
1. minAngle - lower bound angle for skew generation
2. maxAngle - upper bound angle for skew generation
3. (W, H) - size of text image (not final training image - see getTrainingImage)
4. fontSize - size of font of text - change with getTrainingImage
getImgJson
1. format of the JSON annotations file - currently compatible with Supervisely
2. width, height - width and height of training image, change with (Wimg, Himg) in getTrainingImage
getTrainingImg
1. fontSize - size of font of text - change with getTextImgForTraining
2. (Wimg, Himg) - size of final training image
3. backgroundImgPath - path to background image for training images
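
The relationship between the skew bounds and the text image size can be sketched as follows. This is an illustration under assumptions, not the code in Utils.py: it assumes the skew angle is drawn uniformly between minAngle and maxAngle (in degrees), and it computes how large the axis-aligned box around a W x H text image becomes after rotation, which is useful when choosing (Wimg, Himg). Both function names are hypothetical.

```python
import math
import random


def sample_skew(minAngle, maxAngle):
    """Draw a skew angle in degrees, assuming a uniform distribution."""
    return random.uniform(minAngle, maxAngle)


def rotated_bbox_size(W, H, angle_deg):
    """Width and height of the axis-aligned box around a rotated W x H image."""
    a = math.radians(angle_deg)
    c, s = abs(math.cos(a)), abs(math.sin(a))
    return W * c + H * s, W * s + H * c
```

For example, rotated_bbox_size(W, H, sample_skew(minAngle, maxAngle)) gives the space a skewed text image would need inside the final training image.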

