- Take the error calculated at the end of the network and back-propagate it to tune the weights of each connection.
- Gradient descent with reverse-mode autodiff is used to perform backpropagation.
- How it works, for each training step (a minimal sketch follows this list):
- Compute the output error
- Compute how much each neuron in the previous hidden layer contributed to it
- Back-propagate that error in a reverse pass
- Tweak the weights to reduce the error using gradient descent
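A minimal sketch of one such training step for a tiny one-hidden-layer regression network, written in NumPy; the names (W1, W2, lr) and all sizes are illustrative assumptions, not taken from any library.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(4, 3))         # 4 samples, 3 input features
y = rng.normal(size=(4, 1))         # 4 target values

W1 = rng.normal(size=(3, 5)) * 0.1  # input -> hidden weights
W2 = rng.normal(size=(5, 1)) * 0.1  # hidden -> output weights
lr = 0.01                           # learning rate

# Forward pass
h = np.tanh(X @ W1)                 # hidden activations
y_hat = h @ W2                      # network output

# 1. Compute the output error (for a squared-error loss)
error = y_hat - y

# 2./3. Reverse pass: how much did each hidden neuron contribute?
grad_W2 = h.T @ error                        # gradient for output weights
hidden_error = (error @ W2.T) * (1 - h**2)   # chain rule through tanh
grad_W1 = X.T @ hidden_error                 # gradient for hidden weights

# 4. Tweak the weights to reduce the error (gradient descent step)
W2 -= lr * grad_W2
W1 -= lr * grad_W1
```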
- The activation function determines the output of a neuron given its inputs.
- Step functions don't work with gradient descent because there is no gradient! There is no slope.
- Alternatives:
- Logistic function
- Hyperbolic tangent function
- Exponential linear unit (ELU)
- ReLU function (Rectified Linear Unit):
ReLU is common: it is fast to compute and works well. ReLU variants include the
Leaky ReLU and Noisy ReLU activation functions. Sketches of these functions follow.
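Minimal NumPy sketches of the activation functions above; the alpha defaults are common choices, not requirements.

```python
import numpy as np

def logistic(z):                  # squashes to (0, 1); smooth gradient everywhere
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):                      # squashes to (-1, 1)
    return np.tanh(z)

def relu(z):                      # 0 for negatives, identity for positives
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):    # small slope for negatives instead of 0
    return np.where(z > 0, z, alpha * z)

def elu(z, alpha=1.0):            # smooth exponential curve for negatives
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))
```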
Gradient descent is the preferred way to optimize neural networks and many other machine learning algorithms, and implementations are often built into libraries/tools. The main variants differ in how much data each update uses (sketched below):
- Batch gradient descent
- Stochastic gradient descent
- Mini-batch gradient descent
- There are other popular gradient-based optimization algorithms, such as Momentum, Adagrad, and Adam (covered below).
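A sketch of how the three variants differ, using linear regression with squared error in NumPy; the data, the grad helper, and the batch size of 16 are illustrative assumptions. Only the amount of data per update changes.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # 100 instances, 3 features
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
w = np.zeros(3)                               # model weights
lr = 0.01                                     # learning rate

def grad(w, Xb, yb):                          # MSE gradient on a batch
    return 2.0 / len(Xb) * Xb.T @ (Xb @ w - yb)

# Batch GD: the whole training set per step
w -= lr * grad(w, X, y)

# Stochastic GD: one random instance per step
i = rng.integers(len(X))
w -= lr * grad(w, X[i:i+1], y[i:i+1])

# Mini-batch GD: a small random batch per step
idx = rng.choice(len(X), size=16, replace=False)
w -= lr * grad(w, X[idx], y[idx])
```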
- The standard method used to train models is gradient descent
- However, there are optimisers that learn faster than plain gradient descent (update-rule sketches follow this list):
- Momentum Optimisation:
- Introduces a momentum term into the descent, so updates build up speed while the slope stays steep and slow down as the curve starts to flatten
- Nesterov Accelerated Gradient:
- A tweak on momentum optimisation: it measures the gradient slightly ahead in the direction of the momentum, not at the current position
- RMSProp:
- Adaptive learning rate that helps point updates toward the minimum: it scales each parameter's learning rate by a decaying average of its past squared gradients
- Adam:
- Adaptive moment estimation - momentum + RMSProp combined
- Popular choice today, easy to use
- AdaMax
- Nadam
- AMSGrad
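Rough sketches of the Momentum, RMSProp, and Adam update rules for a parameter vector w with gradient g; the function names and hyperparameter defaults are my own illustrative choices (common values, not mandates).

```python
import numpy as np

# Momentum: accumulate a velocity that builds up along consistent slopes
def momentum_step(w, g, v, lr=0.01, beta=0.9):
    v = beta * v + lr * g
    return w - v, v

# RMSProp: per-parameter learning rate, scaled by a decaying average
# of past squared gradients
def rmsprop_step(w, g, s, lr=0.001, rho=0.9, eps=1e-8):
    s = rho * s + (1 - rho) * g**2
    return w - lr * g / np.sqrt(s + eps), s

# Adam: momentum (first moment) + RMSProp (second moment), with bias
# correction for the early steps; t is the step count starting at 1
def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    return w - lr * m_hat / np.sqrt(v_hat + eps), m, v
```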
When you start adding layers and neurons to a neural network, the number of weights adds up quickly. With thousands of weights to tune, overfitting becomes a problem. There are ways to deal with it:
- Early stopping: stop training when performance on a validation set starts dropping
- Regularisation terms added to the cost function during training
- Dropout, which works very well: it ignores, say, 50% of neurons randomly at each training step, forcing your model to spread out its learning (a Keras sketch follows this list)
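A sketch of early stopping and dropout with the Keras API, assuming TensorFlow is installed; the layer sizes, loss, and the commented-out X_train/y_train placeholders are illustrative.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),   # randomly ignore 50% of neurons per step
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Stop when validation loss stops improving and keep the best weights
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

# model.fit(X_train, y_train, validation_split=0.2,
#           epochs=100, callbacks=[early_stop])
```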
- Trial and error
- Evaluate a smaller network with fewer neurons in the hidden layers
- Evaluate a larger network with more layers
- Try reducing the size of each layer as you progress, making a funnel-like structure
- More layers can yield faster learning
- Use more layers and neurons than you need and don't worry about it, but then rely on early stopping
- Use model zoos - pretrained neural network models (see the sketch below)
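For example, Keras Applications acts as a small model zoo; this sketch assumes TensorFlow is installed and that the ImageNet weights can be downloaded.

```python
import tensorflow as tf

# ImageNet-pretrained ResNet50, ready to use as-is or to fine-tune
model = tf.keras.applications.ResNet50(weights="imagenet")
```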
