Basics
On this page, we explain the basic concepts of neuron and how it works.
Neural networks (NNs) have received renewed attention in recent years in machine learning, computer vision, and signal processing. Their revival is largely attributable to the flexibility they offer in designing heterogeneous architectures, customizing objectives, and choosing training tricks. Many libraries already offer partial support for deep learning, for example theano, torch7, etc.
Neuron, as yet another library for deep learning, was (originally) created for my personal research, but it is also aimed at general deep learning practitioners. It does not focus on one or two special types of neural network structures; instead, it provides a more generic mechanism for designing complicated structures and handling heterogeneous data. This means it may not be computationally optimized for certain tasks, such as large-scale CNNs for image recognition. In exchange, it is easier to implement a neural network with recursive topologies, or with hierarchically shared parameters, using neuron, as will be explained in the next section.
There are two major aspects to mastering neuron. One is the separation (orthogonality) between neural networks and data flows. The other is the "design-by-run" scheme underpinning the use of neural networks, which differs substantially from the popular "design-and-run" scheme (policy-based design).
The separation (orthogonality) between neural networks and data flows ensures that a neural network is an object in its own right and maintains its internal state (parameters and aggregated gradients) without interfering with external data flows (data input, caches, output costs, etc.). This makes it convenient for neuron to handle data parallelization, because a neural network can process two or more independent data flows simultaneously.
The "Design-by-Run" scheme is also an important design decision to neuron that has both pros and cons. Most existing neural network library today follows "Design-and-Run" scheme or a declarative approach that first compile a pre-determined network structure and then train against a dataset. In such way, one can pre-allocate any necessary cache space to ensure that there are minimal memory reallocations in training, thus minimize the memory footprint whenever possible and have substantially optimized performance. Meanwhile, such a declarative approach also makes the underlying neural network static during the lifetime of training. The "Design-by-Run" scheme instead treats the neural networks morphologically, i.e. they are composed and (can) deform on-the-fly. At a cost, it will ask for onsite memory allocation whenever needed. Fortunately, Scala will take care of the memory management part and make it mostly invisible to the user.
Neuron currently uses breeze for numerical computations on vectors and matrices. It has two classes, NeuronVector and NeuronMatrix (from neuron.math._), which interface with DenseVector[Double] and DenseMatrix[Double] of breeze and are fed to neuron.core._. See LinearAlgebra.scala for more information.
Note: I decided to use an extra interface layer because I want neuron.core to be essentially independent of its numerical implementation, so that it can potentially integrate with other numerical solutions by re-implementing neuron.math.
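For readers unfamiliar with breeze, here is a minimal sketch of the underlying types that NeuronVector and NeuronMatrix wrap. It uses breeze directly and is not part of neuron's API; the numbers are arbitrary.

```scala
import breeze.linalg._

// Plain breeze, shown only to illustrate the wrapped types.
val v = DenseVector(1.0, 2.0, 3.0)    // dense vector of length 3
val w = DenseMatrix.rand(2, 3)        // 2x3 matrix with uniformly random entries
val y: DenseVector[Double] = w * v    // matrix-vector product, length 2
val s: Double = sum(y)                // reduce to a scalar
```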
A user of neuron can compose complex neural networks on their own. The key to understanding how this works is the distinction between a template and a module, represented by the NeuralNetwork class and the InstanceOfNeuralNetwork class, respectively. Both inherit from the Operationable class. (See NeuralNetwork.scala.) Typically, the composition of neural networks happens under the umbrella of a Workspace:
```scala
import neuron.core._

object example extends Workspace {
  // building complex neural networks ...
}
```

The NeuralNetwork class is a template class that defines the type and hyper-parameters of a neural building block. For example, SingleLayerNeuralNetwork specifies the activation function (e.g. sigmoid, tanh, ...) of an element-wise nonlinear transform. It has a member function create() that returns a module of the prescribed template, i.e. an instance of the given template of type InstanceOfNeuralNetwork with its internal parameters instantiated. For example, LinearNeuralNetwork::create() instantiates the internal weight and bias of a linear NN layer, which are data members of the InstanceOfLinearNeuralNetwork class.
The distinction between templates and modules, and mixtures of them (as we will see), makes it possible to share parameters and reuse topologies. Let's look at the following example:
```scala
// a is a template; b and c are modules
val a = new LinearNeuralNetwork(10, 5) // inputDimension: 10; outputDimension: 5
val b = new SingleLayerNeuralNetwork(5).create()
val c = new LinearNeuralNetwork(5, 10).create()
val d = (c ** b ** a) // type Operationable
val e = (d ++ d).create()
println(e) // print the network structure by the IDs of its neural building blocks
```

Here a serves as a template building block in d, while b and c are instantiated modules in d. Therefore, by duplicating d when building e, the internal parameters of b and c are shared, while the topological structure of d is reused without instantiating a. In other words, the modules b and c are copied by reference, and the template a is copied in a way that yields two distinct modules via a.create() when (d ++ d).create() is called. It is equivalent to the following:
```scala
val e = ((c ** b ** a.create()) ++ (c ** b ** a.create())).create()
```

We will explain the operators ** and ++ later.
The distinction between template and module marks the uniqueness of neuron in the family of deep learning libraries: most others, such as Torch, are founded on modules and provide imperative routines for sharing parameters. Neuron offers inherent support for sharing parameters and reusing the topologies of neural networks.
Note: One way to understand why it is so important to distinguish them when building an AI system is the following metaphor. Suppose we are building AI systems for decision making. The outcome is a set of knowledge at different abstraction levels. Some knowledge is global (or methodological), and we have to share it no matter what particular situation we tackle. Some knowledge is domain specific; we should use it when solving a domain-restricted problem and properly integrate it with the global knowledge. Nevertheless, the procedures for acquiring the domain-specific knowledge can be so similar that we don't need to build a brand-new AI system, but can reuse an existing one. (This philosophy is pervasive in many machine learning models, for example conditional random fields, context-aware recursive auto-encoders, etc.)
InstanceOfNeuralNetwork provides two major functions that serve the training procedure of a neural network. One is apply() and the other is backpropagate().
- apply() takes in data (either a sample x: NeuronVector or a batch xs: NeuronMatrix) and a cache memory (mem: SetOfMemories) that stores the temporary caches needed for backpropagation. It returns the output of the neural network function. If no backpropagation is needed, mem can be set to null.
- backpropagate() takes in the gradient data and the cache memory previously used by apply(). Principally, it performs backpropagation and updates the gradients of the internal parameters, which are stored inside the neural building blocks.
The apply() and backpropagate() facilities can be parallelized over data. For example, the following code snippet from Utilities.scala computes the total regression cost for a dataset of size data points (size being the number of samples) using data parallelization:
```scala
var sampleArray = (0 until size).toList.par // yeah! data parallelization is as simple as appending .par
// ...
totalCost = sampleArray.map(i => {
  val mem = initMemory(nn)
  val x = nn(xData(i), mem); val y = yData(i)
  val z = distance.grad(x, yData(i))
  nn.backpropagate(z, mem) // update dw!
  distance(x, y)
}).reduce(_+_)
```

Here, suppose nn is an object of type InstanceOfNeuralNetwork (i.e., a trainable neural network) that has been composed as described in the previous section.
Besides these, InstanceOfNeuralNetwork has the member functions setWeights(), getWeights(), and getDerativeOfWeights() for updating and accessing the internal parameters of the network, and init() and allocate() for customized initialization.
Neuron provides functional operators for composing networks from neural building blocks (templates or modules). Typically, the returned type is the generic Operationable, so create() always needs to be called for instantiation before the neural network is used. Here are some of the operators:
| Name | Operations | Math |
|---|---|---|
| PLUS | {f ++ g}(x,y) | [f(x), g(y)] |
| TIMES | {f ** g}(x) | f(g(x)) |
| SHARE | {f & g}(x) | [f(x), g(x)] |
| ADD | {f + g}(x,y) | f(x) + g(y) |
| MULT | {f * g}(x) | f(x) ⊙ g(x) (element-wise multiplication) |
| TENSOR | {f \* g}(x,y) | f(x) ⊗ g(y) (tensor product) |
| mTENSOR | {f \\* g}(x,y) | [f(x) ⊗ g(y), f(x), g(y)] |
| REPEAT | {f :+ n}(x) | [f(x), f(x), ..., f(x)] (repeat n times) |
The operators accept NeuralNetwork, InstanceOfNeuralNetwork, and mixtures of them as Operationable. After instantiation, the composed networks are fully compatible with the member functions of InstanceOfNeuralNetwork. This provides the convenience that we only need to implement the most elementary building blocks; a complex network (together with its apply(), backpropagate(), getDerativeOfWeights(), etc.) can then be assembled (and derived accordingly) from these operators.
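As an illustrative sketch (the dimensions are arbitrary, but the constructors and operators are the ones introduced above), SHARE and TIMES can be combined like this; note the final create() before the network can be used:

```scala
import neuron.core._

object OperatorDemo extends Workspace {
  // Two branches that read the same 10-dimensional input (SHARE),
  // each branch being a linear map followed by a nonlinearity (TIMES).
  val branch1 = new SingleLayerNeuralNetwork(5) ** new LinearNeuralNetwork(10, 5)
  val branch2 = new SingleLayerNeuralNetwork(5) ** new LinearNeuralNetwork(10, 5)
  val shared  = branch1 & branch2 // {f & g}(x) = [f(x), g(x)], still Operationable
  val net     = shared.create()   // instantiate before calling apply()/backpropagate()
}
```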
In this section, we show how to build a multilayer perceptron (MLP) network from the following two neural building blocks:

- SingleLayerNeuralNetwork: nonlinear activation layer
- LinearNeuralNetwork: linear mapping layer
Unlike a traditional MLP, we design a special parameter-sharing structure:

| P0 | P1 | P0 | P2 | ... | P0 | PN |
|---|---|---|---|---|---|---|
```scala
val P = new SingleLayerNeuralNetwork(5).create() ** new LinearNeuralNetwork(5, 5)
val P0 = P.create()
val PL: List[Operationable] = (0 until 10).toList.map(i => P ** P0)
println(PL.reduce[Operationable](_ ** _).create())
```

For more about the types of neural networks, see Types of Neural Networks.
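A hypothetical usage sketch of the network just built, following the apply()/initMemory pattern from the data-parallelization snippet above; someInput stands for a 5-dimensional NeuronVector prepared elsewhere:

```scala
// Instantiate the shared-parameter MLP and feed one sample through it.
val mlp = PL.reduce[Operationable](_ ** _).create()
val mem = initMemory(mlp)     // allocate caches, as in the Utilities.scala snippet
val out = mlp(someInput, mem) // forward pass through the composed network
```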
SingleLayerNeuralNetwork is by default a sigmoid element-wise transformation, but one can customize it by passing a different activation function of class NeuronFunction:
```scala
val P = new SingleLayerNeuralNetwork(5, TanhFunction)
```

The NeuronFunction class has two major member calls, apply() and grad(), which return the value and the gradient at a given input vector (or batch), respectively. See Functions.scala.
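A small sketch of calling a NeuronFunction directly, assuming v is a NeuronVector obtained elsewhere and that TanhFunction is usable as shown (apply() and grad() as described above):

```scala
// Element-wise evaluation of the activation and its gradient at v (v is assumed to exist).
val activated = TanhFunction(v)      // tanh applied element-wise
val slope     = TanhFunction.grad(v) // element-wise gradient at v
```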
When training a neural network, one usually needs to select a loss function and back-propagate the corresponding gradients through the network. In particular, most losses take the form of a distance between a predictor x and a known outcome y. Neuron has a DistanceFunction class with the member calls apply(x,y), grad(x,y), and applyWithGrad(x,y), which return the loss value and/or the partial gradient with respect to x. For example, L2Distance is the Euclidean loss, and SoftMaxDistance is the softmax loss (sometimes called cross-entropy loss). See DistanceFunctions.scala.
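A minimal sketch of how a DistanceFunction plugs into the apply()/backpropagate() cycle, assuming x is the network output for one sample, y is the corresponding target, and mem is the cache passed to apply() (all obtained as in the earlier snippet); it also assumes L2Distance can be called as shown:

```scala
val loss = L2Distance(x, y)      // Euclidean loss value for this sample
val dx   = L2Distance.grad(x, y) // partial gradient with respect to x
nn.backpropagate(dx, mem)        // push the gradient back through the network
```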
Parameter regularization is crucial for neural network training, and there are many tricks that focus on training neural networks with regularization. We describe the basic framework for working with regularization here.
Basically, each neural building block takes care of its own parameter-regularization method, which stays invisible to the end user. For example, LinearNeuralNetwork has a regularized variant (sub-class) RegularizedLinearNN, whose instances return a regularized loss (+ weight decay) and update the gradient via getDerativeOfWeights(): Double. Similarly, SingleLayerNeuralNetwork has a regularized variant (sub-class) SparseSingleLayerNN, which enforces sparsity on activations, and AutoEncoder can compute regularization from its reconstruction loss. In the conventional setting, a regularized neural building block should have a hyper-parameter member, say regCost: Double, to control the strength of the regularization.
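As a purely hypothetical sketch (the actual constructor of RegularizedLinearNN may take different arguments; the regCost value is illustrative), a regularized linear layer could be swapped in for a plain one along these lines:

```scala
// Hypothetical: assumes a constructor like RegularizedLinearNN(inputDim, outputDim, regCost);
// check the class definition for the real signature.
val reg = new RegularizedLinearNN(10, 5, 0.001)              // weight-decay strength 0.001
val net = (new SingleLayerNeuralNetwork(5) ** reg).create()  // compose and instantiate as usual
```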
Besides energy-based regularization, neuron also provides the training tricks of dropout (the DropoutSingleLayerNN class) and maxout (the MaxoutSingleLayerNN class).
We also provide some utilities for working with neuron.core._, but keep in mind that they are very experimental.