Roman digits classification

Introduction

Config
Data structure
Data description and cleaning process
Data augmentation and splitting
Model structure
Instructions to run the model
1. Data preparation
2. Running model
Results
Authors

Config

Config files must be located at configs folder.

Parameters specific for data generation:

seed - seed for random shuffling.
image_size - size of the side for resulting image.
number_to_have - number of images to have in data_clean/_class_ directory.

Parameters specific for model:

exp_name - defines the model name and paths to write summaries and weights in.
learning rate - learning rate to train model with.
use_dropout_block - if to use dropout between convolutions.
use_dropout_dense - if to use dropout after dense layer.
dropout_rate_block - dropout rate to use between convolutions.
dropout_rate_dense - dropout rate to use after dense layer.
write_histograms - if to write histograms for distribution of activations for each layer.

Data structure

data folder contains original 'dirty' data of Roman images, split by classes. The number of images in a single class folder can be any from 0 to Infinity.
For example, folder data/1 contains all images marked as class 1.

data_clean folder contains resized, grayscaled and augmented images, split by classes. The number of images in a single class folder must be set in the config file.
For example, the folder data_clean/1 contains resized, grayscaled and augmented images, marked as class 1.

data_splitted folder contains data, split into train and test datasets. The number of single class in train dataset must be set in the config file.
For example, data_clean/train/1 contains train images, marked as class 1.

Data description and cleaning process

It was decided to create about 120 images for each class (roman numbers from 1 to 8) and augment that data to get a more versatile dataset so that neural network could be more robust to different variations of data. To achieve a goal of 120 images per class, every team member wrote his variations of roman digits with his/her unique handwriting multiple times.

The process of data cleaning includes resizing images, grayscaling them and saving to folder data_clean.

Data augmentation and splitting

Data augmentation contains applying multiple transformations to data, such as:

Cropping;
Padding;
Blurring;
Different affine transformations

Data will be generated into the folder data_clean/_class_ near existing images.

Splitting data contains the process of reading filenames of images and resaving them into folder data_splitted/(train or test)/_class_.

Model structure

We had a task of image classification so that it was decided to use a convolutional network with some tricks and whistles.
The main graph of the model is shown below.

Graph

Our model consists of 5 blocks, where block_1 - block_4 are convolution blocks and dense block containing only dense layers. Architectures of each block are separately described below.

Blocks 1-3

Notice that convolution blocks 1-3 have the same next architecture and channel growth in them is caused by stacking pooling layers:

Conv2D -> BatchNormalization -> ReLU -> Dropout (if necessary) ->
-> Conv2D -> BatchNormalization -> ReLU -> concatenated MaxPooling and Average Pooling

Convolution kernels have sizes of 3x3 and stride of 1x1.
Max- and Average-pooling layers have sizes and strides of 2x2 each.

Block structure

Block 4

In 4th convolution block we do not use stacking of pooling layers, so that number of channels here grows naturally:

Conv2D -> BatchNormalization -> ReLU -> Dropout (if necessary) ->
-> Conv2D -> BatchNormalization -> ReLU -> MaxPooling

Convolution kernels have sizes of 3x3. The strides of the first one is 1x1 and the second one is 2x2.
Max-pooling layer has size and stride of 2x2.

Block structure

Dense block

Dense block consists of 2 hidden dense layers with batch norm and leaky relu between them:

Dense -> BatchNormalization -> LeakyReLU -> Dropout (if necessary) ->
-> Dense -> Softmax

Block structure

Calculated input and output sizes of each layer are under the spoiler.

Input and output sizes:

Block 1:
- Input size: (?, 128, 128, 3)
- Output size: (?, 62, 62, 32)
Block 2:
- Input size: (?, 62, 62, 32)
- Output size: (?, 29, 29, 64)
Block 3:
- Input size: (?, 29, 29, 64)
- Output size: (?, 12, 12, 128)
Block 4:
- Input size: (?, 12, 12, 128)
- Output size: (?, 4, 4, 256)
Reshape:
- Input size: (?, 4, 4, 256)
- Output size: (?, 1024)
Dense:
- Input size: (?, 1024)
- Output size: (?, 8)

Instructions to run the model

Our project is based on some libraries you might not have. To ensure everything is installed and install missed packages run next commands.

cd path/to/project/roman_clf
pip install -r requirements.txt

Data preparation

This script will do all the stuff related to data preparation fast and clean. To run it follow instructions below.

cd path/to/project/roman_clf
python prepare_data.py -c ../configs/roman.json

It is easy, isn't it? :)

Running model

To run the model follow instructions below.

Open and run jupyter notebook roman_clf/src/Best_Model.ipynb

All necessary libraries will be imported and configs will be set up automatically. All you need to do is just click run for all cells.

Model class is formatted into the notebook, as an example, but can be moved to a different file if it would be necessary.

To run the model you will need a lot of computation power. If you don't have it, try using a batch size of 1, it should help.

Results

After being trained for 1050 epochs, our CNN showed next results:

Train loss: 1.280041337

Train accuracy: 0.9940649271

Test loss: 1.2813329697

Test accuracy: 0.9926757812

Name		Name	Last commit message	Last commit date
Latest commit History 76 Commits
configs		configs
data		data
figures		figures
logs/4bl_batch		logs/4bl_batch
src		src
utils		utils
weights/4bl_batch		weights/4bl_batch
.gitignore		.gitignore
LICENSE.md		LICENSE.md
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Roman digits classification

Introduction

Config

Data structure

Data description and cleaning process

Data augmentation and splitting

Model structure

Blocks 1-3

Block 4

Dense block

Instructions to run the model

Data preparation

Running model

Results

Authors

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Roman digits classification

Introduction

Config

Data structure

Data description and cleaning process

Data augmentation and splitting

Model structure

Blocks 1-3

Block 4

Dense block

Instructions to run the model

Data preparation

Running model

Results

Authors

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages