LP-ICR - License Plate Intelligent Character Recognition

LP-ICR is a Seq2Seq deep learning model for license plate intelligent character recognition. It is based on a CRNN (Convolutional Recurrent Neural Network) architecture with an Attention and Spatial Transformer Network (STN) head. It was trained with the Connectionist Temporal Classification (CTC) loss function and achieves a Word Error Rate (WER) of 0.16 and a Character Error Rate (CER) of 0.03.
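
For background, CTC lets the model be trained without per-character alignment annotations: the loss sums over all alignments between the per-timestep outputs and the target string. A minimal, generic PyTorch sketch (not the repository's training loop; the shapes and blank index are assumptions):

import torch
import torch.nn as nn

# Generic CTC setup: log-probs shaped (T, B, C), blank index 0 (assumed).
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
log_probs = torch.randn(32, 4, 40).log_softmax(2)   # T=32 timesteps, batch 4, 40 classes
targets = torch.randint(1, 40, (4, 10))             # label indices; 0 is reserved for blank
input_lengths = torch.full((4,), 32, dtype=torch.long)
target_lengths = torch.full((4,), 10, dtype=torch.long)

loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
print(loss.item())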

You can download the pretrained model weights & datasets from the following link: LP-ICR Model Weights & datasets

Model Details

The LP-ICR model is designed to recognize the characters and numbers on single- and multi-row license plates in images. It combines Convolutional Neural Networks (CNNs) for feature extraction, Recurrent Neural Networks (RNNs) for sequence modeling, an Attention mechanism for selective focus, and a Spatial Transformer Network (STN) for spatial normalization of the input. LP-ICR is a Seq2Seq, end-to-end model.
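
For intuition, here is a minimal sketch of how such a CRNN pipeline can be wired up in PyTorch. The class name, layer sizes, and the omission of the Attention and STN heads are all illustrative assumptions, not the repository's actual code:

import torch
import torch.nn as nn

class CRNNSketch(nn.Module):
    """Illustrative CRNN: CNN features -> BiLSTM sequence model -> per-step logits.
    An LP-ICR-style model would also prepend an STN and add an attention head."""
    def __init__(self, num_classes: int, embed_size: int = 512):
        super().__init__()
        # CNN backbone turns a 32x128 plate crop into a feature map.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # A BiLSTM models the horizontal feature columns as a sequence.
        self.rnn = nn.LSTM(128 * 8, embed_size, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * embed_size, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.cnn(x)                                   # (B, 128, 8, 32) for a 32x128 input
        b, c, h, w = f.shape
        seq = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # width becomes the time axis
        out, _ = self.rnn(seq)
        return self.head(out)                             # (B, W, num_classes), ready for CTC

logits = CRNNSketch(num_classes=40)(torch.randn(1, 3, 32, 128))
print(logits.shape)  # torch.Size([1, 32, 40])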

Model Performance

  • Word Error Rate (WER): 0.16
  • Character Error Rate (CER): 0.03

The model is compact, with a weight size of approximately 250MB (before optimizations such as ONNX export), making it suitable for a range of deployment scenarios. Inference is also fast, taking just 0.027 seconds per image on a CPU.
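
For reference, CER is the character-level edit distance between the prediction and the ground truth, divided by the ground-truth length; WER is the same idea at the word level. A minimal sketch (not the repository's evaluation code):

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: Levenshtein distance / reference length."""
    m, n = len(reference), len(hypothesis)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,      # deletion
                        dp[j - 1] + 1,  # insertion
                        prev + (reference[i - 1] != hypothesis[j - 1]))  # substitution
            prev = cur
    return dp[n] / max(m, 1)

print(cer("KA05MT4918", "KA05MT4919"))  # 0.1: one wrong character out of ten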

Here are the performance metrics and loss curves for the LP-ICR model:

[Figures: WER curve, CER curve, and training loss curve]

Results

Here is a table showcasing the results of LP-ICR on various license plate images:

Image      Recognized Text
Image 1    KA05MT4918
Image 2    KA01AJ9528
Image 3    KA04A9383
Image 4    KA51AE0104
Image 5    KA05AC2170

The table above gives a snapshot of the model's recognition accuracy on different license plate images. The model is robust to different plate backgrounds, dark images, and two-row license plates.

Usage

To use the LP-ICR model for license plate recognition, follow these steps:

  1. Clone this GitHub repository to your local machine.

    git clone https://github.com/IshantPundir/LP-ICR.git
  2. Install the required dependencies (see Dependencies).

  3. Load the pretrained model weights.

  4. Use the model to recognize characters on license plates in your own images.

Example code snippet for license plate recognition:

import cv2
import numpy as np

from lpicr import LPICREngine

def load_image(path: str) -> np.ndarray:
    """Load an image from disk as a BGR numpy array."""
    image = cv2.imread(path)
    return image

# Initialize the engine with the pretrained weights.
lpicr = LPICREngine(model_path="model.pth")

# Load an image...
image = load_image("image_path.jpg")

# Run inference: the engine returns the recognized plate text.
text = lpicr(image)

print(f"Results: {text}")

You can also test the model from the command line:

python lpicr.py model.pth image.png

Dependencies

You can install the dependencies using pip:

pip install -r requirements.txt

Training

You can start the training by running train.py like this:

python train.py --run_name TestRun --config path/to/config.yaml

The script will create a folder output/TestRun, where it will save the model and checkpoints.

Config files can be found in the config directory. A config file looks like this:

train:
  epochs: 1000
  batch_size: 64
  learning_rate: 0.0001
  train_denoiser: false

image_transform_config:
  image_width: 128
  image_height: 32

label_encoder:
  max_sequence_length: 32

model_config:
  embed_size: 512
  
datasets:
  downsample: 1
  make_2_row: false
  train_datasets:
    - data/INLP-RAW-augmented/train
    - data/voc_plate-augmented/train
    - data/2-row-lp-augmented/train
    - data/2-row-lp-v2-augmented/train
    - data/INLP-RAW/train
    - data/voc_plate/train
    - data/2-row-lp/train
    - data/2-row-lp-v2/train
  test_datasets:
    - data/INLP-RAW/test
    - data/2-row-lp/test
    - data/2-row-lp-v2/test
    - data/voc_plate/test
    - data/INLP-RAW-augmented/test

Most of the configuration options are self-explanatory. You can list multiple datasets, and the E2EDataset class will automatically combine them into a single dataset object.
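
Conceptually this is similar to concatenating PyTorch datasets. A toy sketch, under the assumption that each listed path maps to one LMDB-backed dataset (ToyDataset below is a stand-in, not the repository's class):

from torch.utils.data import ConcatDataset, Dataset

class ToyDataset(Dataset):
    """Stand-in for a single LMDB-backed dataset (illustrative only)."""
    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]

# Several datasets combined behave like one larger dataset, which is
# what E2EDataset does with the paths listed in the config.
combined = ConcatDataset([ToyDataset(["KA05MT4918"]),
                          ToyDataset(["KA01AJ9528", "KA04A9383"])])
print(len(combined))  # 3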

make_2_row is an optional data-augmentation flag; if set to true, images are randomly split in half and the halves are rejoined one above the other, artificially creating two-row text images. This is only used during pre-training on the OCR dataset.
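
A minimal numpy sketch of that idea (illustrative, not the repository's implementation):

import numpy as np

def make_two_row(image: np.ndarray) -> np.ndarray:
    """Split a single-row plate image into left/right halves and stack them
    vertically, producing an artificial two-row text image."""
    h, w = image.shape[:2]
    half = w // 2
    left, right = image[:, :half], image[:, half:half * 2]
    return np.concatenate([left, right], axis=0)

two_row = make_two_row(np.zeros((32, 128, 3), dtype=np.uint8))
print(two_row.shape)  # (64, 64, 3): twice the height, half the width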

downsample is an artifact from a previous architecture that included a denoiser layer; keep its value at 1.

NOTE: The image transformer config, label encoder config, and model config are all stored in the final .pth weight file, allowing you to initialize the image transformer, label encoder, and model from a single weights file.
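
In practice a single torch.load call gives you everything. A quick way to see what is bundled (the exact key names inside the checkpoint are assumptions, so inspect them first):

import torch

# Load the checkpoint on CPU; the weights and the bundled configs
# travel together in the one .pth file.
checkpoint = torch.load("output/TestRun/model.pth", map_location="cpu")
print(checkpoint.keys())  # inspect the stored config and weight entries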

Fine-tuning

You can resume training using the --resume argument like this:

python train.py --run_name TestRun2 --config config/lpicr.yaml --resume output/TestRun1/model.pth

Logging

wandb is supported for logging and monitoring the model's performance; enable it by passing the --wandb flag.

python train.py --run_name TestRun --config config/ocr.yaml --wandb

Datasets

LP-ICR supports LMDB datasets, which live inside the data directory. As mentioned above, you can link multiple LMDB datasets together in the configuration file.

Custom datasets:

You can label your own images by running the following command:

python data/lp-annotation-tool.py --images path/to/image/directory  --output path/to/output/directory --split --split_size 0.2

The labeled dataset is stored in JSON format; you can run the script below to convert it to LMDB format:

python data/dataset_to_lmdb.py --data path/to/json/directory

This will save the dataset in LMDB format, ready to be used for training.
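
For reference, an LMDB dataset is usually written with the lmdb package roughly as follows. The key schema below is a common OCR convention and an assumption, not necessarily the one dataset_to_lmdb.py uses:

import lmdb

# Write one image/label pair into an LMDB environment.
env = lmdb.open("data/plates-lmdb", map_size=1 << 30)  # 1 GiB map size
with env.begin(write=True) as txn:
    with open("plate.jpg", "rb") as f:
        txn.put(b"image-000000001", f.read())   # raw encoded image bytes
    txn.put(b"label-000000001", b"KA05MT4918")  # ground-truth plate text
    txn.put(b"num-samples", b"1")               # total record count
env.close()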
