Manuel Cuevas edited this page Mar 28, 2017 · 12 revisions

Using deep learning to read street signs

This is a convolutional neural network trained with heavy data augmentation. It reaches 95.3% accuracy on a holdout validation set of 4,410 images and 81.2% on the test set of 12,630 images.

Architecture

INPUT => CONV => ELU => POOL => CONV => ELU => CONV => ELU => POOL => FC => ELU => FC

I am using 3 convolution layers with filter sizes 7x7, 3x3, and 4x4, each followed by an ELU activation, together with max pooling layers. After the convolution layers I am using 2 hidden fully connected layers with dropout and an output layer of 43 units.
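For readers unfamiliar with the ELU activation, here is a minimal sketch of the function in plain Python. The `alpha=1.0` default is an assumption; the page does not say which value was used.

```python
import math

def elu(x, alpha=1.0):
    """Exponential linear unit: x for x > 0, alpha * (exp(x) - 1) otherwise.

    alpha=1.0 is an assumed default, not a value taken from this page.
    """
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def elu_layer(values, alpha=1.0):
    """Apply ELU element-wise to a flat list of activations."""
    return [elu(v, alpha) for v in values]

print(elu_layer([-2.0, -0.5, 0.0, 1.5]))
```

Unlike ReLU, negative inputs are not clamped to zero but saturate smoothly towards `-alpha`.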

Layer type         Channels        Size
Input              1               32x32
Convolution        16              7x7
Max pool           32              2x2
Convolution        32              3x3
Convolution        32              1x1
Convolution        64              4x4
Max pool           128             2x2
Fully connected    1024 to 688     -
Dropout            -               0.4
Fully connected    688 to 172      -
Dropout            -               0.4
Fully connected    172 to 43       -
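As a sanity check on the table above, the sketch below traces the feature-map size through the convolution and pooling layers. The page does not state padding or strides, so this assumes unpadded ("valid") convolutions with stride 1 and 2x2 pooling with stride 2. It also assumes pooling leaves the channel count unchanged, so the channel values listed on the max pool rows are not modeled. Under those assumptions the flattened size comes out to the 1024 inputs of the first fully connected layer.

```python
def window_out(size, kernel, stride=1):
    """Spatial size after an unpadded ('valid') convolution or pooling window."""
    return (size - kernel) // stride + 1

# (layer kind, output channels or None to keep them, kernel size, stride)
# Strides and the absence of padding are assumptions, not taken from the page.
LAYERS = [
    ("conv", 16, 7, 1),
    ("pool", None, 2, 2),
    ("conv", 32, 3, 1),
    ("conv", 32, 1, 1),
    ("conv", 64, 4, 1),
    ("pool", None, 2, 2),
]

def trace_shapes(size=32, channels=1, layers=LAYERS):
    """Return the (channels, height, width) shape after each layer."""
    shapes = [(channels, size, size)]
    for kind, out_channels, kernel, stride in layers:
        size = window_out(size, kernel, stride)
        if kind == "conv":
            channels = out_channels
        shapes.append((channels, size, size))
    return shapes

shapes = trace_shapes()
final_channels, height, width = shapes[-1]
flattened = final_channels * height * width
print(shapes)
print(flattened)  # 1024 under the stated assumptions
```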

Data augmentation

Data augmentation is fundamentally important for improving the performance of the network: it lets the network learn the important features that are invariant for the object classes, rather than the artifacts of the training images. We explored several ways of augmenting the data to artificially increase the size of the dataset:

  • Random rotations between -20 and 20 degrees.
  • Zoom factor increase of 1.3.
  • Random brightness factor.
  • Shear.
  • Translation.
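The geometric augmentations in this list can be combined into a single random affine transform. The sketch below samples one in plain Python. The rotation range comes from the list above; treating the zoom as a factor sampled between 1.0 and 1.3 is an interpretation, and the shear and translation ranges (`max_shear_deg`, `max_shift`) are hypothetical values chosen only for illustration.

```python
import math
import random

def random_affine(size=32, max_rot_deg=20.0, max_zoom=1.3,
                  max_shear_deg=10.0, max_shift=2.0, rng=random):
    """Sample a 2x3 affine matrix (rotate, zoom, shear, shift) about the image center.

    max_shear_deg and max_shift are hypothetical; the page gives no ranges for them.
    """
    theta = math.radians(rng.uniform(-max_rot_deg, max_rot_deg))
    zoom = rng.uniform(1.0, max_zoom)
    shear = math.tan(math.radians(rng.uniform(-max_shear_deg, max_shear_deg)))
    tx = rng.uniform(-max_shift, max_shift)
    ty = rng.uniform(-max_shift, max_shift)

    # Linear part: zoom * rotation * horizontal shear.
    a = zoom * math.cos(theta)
    b = zoom * (math.cos(theta) * shear - math.sin(theta))
    c = zoom * math.sin(theta)
    d = zoom * (math.sin(theta) * shear + math.cos(theta))

    # Offsets chosen so the image center maps to itself plus the random shift.
    center = (size - 1) / 2.0
    off_x = center - a * center - b * center + tx
    off_y = center - c * center - d * center + ty
    return [[a, b, off_x], [c, d, off_y]]

def apply_affine(matrix, x, y):
    """Map a single (x, y) pixel coordinate through the 2x3 matrix."""
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])

matrix = random_affine(rng=random.Random(0))
print(apply_affine(matrix, 15.5, 15.5))
```

In practice a matrix like this would be handed to an image-warping routine (OpenCV's `warpAffine` is one option) to produce each augmented training image.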

Because traffic signs can change brightness dramatically with the time of day or the weather, brightness jitter was used to randomize brightness levels.
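A minimal sketch of brightness jitter on a single-channel image, stored here as nested lists of 0-255 pixel values. The scaling range (`low`, `high`) is hypothetical, since the page does not give the range that was actually used.

```python
import random

def jitter_brightness(image, low=0.5, high=1.5, rng=random):
    """Scale every pixel by one random factor and clip the result to 0..255.

    The default range 0.5..1.5 is a hypothetical choice for illustration.
    """
    factor = rng.uniform(low, high)
    return [[min(255, max(0, round(pixel * factor))) for pixel in row]
            for row in image]

sample = [[0, 100, 200],
          [255, 50, 10]]
print(jitter_brightness(sample, rng=random.Random(1)))
```

Using one factor per image changes the overall exposure while preserving the relative contrast between pixels, up to clipping.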


Written on Dec 20, 2016
