The goal of this exercise is to implement a multilayer dense neural network using torch.
Type `pip install -r requirements.txt` into the terminal to install the required software.
Torch takes care of our autograd needs; the documentation is available at https://pytorch.org/docs/stable/index.html. torch.nn provides all the necessary modules for neural networks; https://pytorch.org/docs/stable/nn.html hosts the documentation.
To get a feeling for how a dense network learns a function from given data, we will first have a look at the example from the lecture. In the following task you will implement gradient-descent learning of a dense neural network using torch and use it to learn a function, e.g. a cosine.
- Open `src/denoise_cosine.py` and go to the `__main__` function. Look at the code that is already there. You can see that a cosine function with a signal length of $n = 200$ samples has already been created in torch. In the for loop, which will be our train loop, some noise is added to the cosine function with `torch.randn`. This will be the noisy signal from which the model is supposed to learn the underlying cosine.
- Recall the definition of the sigmoid function
  $$\sigma(x) = \frac{1}{1 + e^{-x}}.$$
- Implement the `sigmoid` function in `src/denoise_cosine.py`.
- Implement a dense layer in the `net` function of `src/denoise_cosine.py`. The function should return
  $$\mathbf{o} = \mathbf{W}_2 \sigma(\mathbf{W}_1 \mathbf{x} + \mathbf{b}),$$
  where $\mathbf{W}_1$ and $\mathbf{W}_2$ denote weight matrices and $\mathbf{b}$ a bias vector. Use Python's `@` operator for the matrix product. A sketch of `sigmoid` and `net` follows this list.
- Use `torch.normal` to initialize your weights. This function samples the values from a normal distribution. To ensure that the weights are not initialized too high, choose a mean of 0 and a standard deviation of 0.5. For a signal length of $200$, the $W_2$ matrix should, for example, have the shape [200, hidden_neurons] and $W_1$ the shape [hidden_neurons, 200] (see the initialization sketch after this list).
- Implement and test a squared-error cost
  $$C = \frac{1}{2} \sum_{k=1}^{n} (y_k - o_k)^2.$$
  `**` denotes squaring in Python; `torch.sum` allows you to sum up all terms.
- Define the forward pass in the `net_cost` function. The forward pass evaluates the network and the cost function (see the `net_cost` sketch after this list).
- Train your network to denoise a cosine. To do so, implement gradient descent on the noisy input signal and use e.g. `torch.func.grad_and_value` to compute the gradient and the cost at the same time. Remember the gradient descent update rule
  $$\mathbf{W}_{\tau+1} = \mathbf{W}_{\tau} - \epsilon \cdot \delta_{\mathbf{W}} C.$$
- In the equation above, $\mathbf{W}$ stands for the weight matrices and biases, $\epsilon$ denotes the step size, and $\delta_{\mathbf{W}}$ the gradient operation with respect to the weight that follows it. Use the loop to repeat the weight updates for multiple iterations; try to train for one hundred updates (a training-loop sketch follows this list).
- At last, compute the network output `y_hat` with the final weights to see if the network has learned the underlying cosine function. Use `matplotlib.pyplot.plot` to plot the noisy signal and the network output $\mathbf{o}$.
- Test your code with `nox -r -s test` and run the script with `python ./src/denoise_cosine.py` or by pressing `Ctrl + F5` in VS Code.
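A minimal sketch of the `sigmoid` and `net` functions. Storing the weights in a dict is an assumption for illustration; the actual signatures in `src/denoise_cosine.py` may differ:

```python
import torch

def sigmoid(x: torch.Tensor) -> torch.Tensor:
    # Elementwise logistic function.
    return 1.0 / (1.0 + torch.exp(-x))

def net(params: dict, x: torch.Tensor) -> torch.Tensor:
    # o = W_2 @ sigmoid(W_1 @ x + b); @ is Python's matrix-product operator.
    h = sigmoid(params["W_1"] @ x + params["b"])
    return params["W_2"] @ h
```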
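Weight initialization with `torch.normal` could then look like this; the number of hidden neurons is an illustrative choice:

```python
import torch

hidden_neurons = 50  # illustrative choice
params = {
    "W_1": torch.normal(mean=0.0, std=0.5, size=(hidden_neurons, 200)),
    "W_2": torch.normal(mean=0.0, std=0.5, size=(200, hidden_neurons)),
    "b": torch.normal(mean=0.0, std=0.5, size=(hidden_neurons,)),
}
```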
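The squared-error cost and the forward pass in `net_cost` might read as follows, reusing `net` and the dict-of-parameters assumption from the sketches above:

```python
def cost(y: torch.Tensor, o: torch.Tensor) -> torch.Tensor:
    # C = 0.5 * sum_k (y_k - o_k)**2
    return 0.5 * torch.sum((y - o) ** 2)

def net_cost(params: dict, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Forward pass: evaluate the network, then the cost.
    return cost(y, net(params, x))
```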
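A gradient-descent loop with `torch.func.grad_and_value` could look as sketched below. The names `x_in` and `y_target` stand in for whatever input and target tensors the script provides, and the step size is an illustrative value:

```python
step_size = 0.1                                # epsilon, an illustrative value
grad_fn = torch.func.grad_and_value(net_cost)  # gradient w.r.t. the first argument

for _ in range(100):  # one hundred updates
    grads, c = grad_fn(params, x_in, y_target)  # c is the current cost value
    # Update rule: W <- W - epsilon * dC/dW for every weight matrix and bias.
    params = {name: w - step_size * grads[name] for name, w in params.items()}
```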
In this task we will go one step further. Instead of a cosine function, our neural network will learn to identify handwritten digits from the MNIST dataset. For that, we will be using the `torch.nn` module. To get started, familiarize yourself with `torch.nn`; you will use it to train a fully connected network in `src/mnist.py`. In this script, some functions are already implemented and can be reused. Broadcasting is an elegant way to deal with data batches (torch takes care of this for us); this task aims to compute gradients and update steps for all batches in the list. If you are coding on bender, `matplotlib.pyplot.show` does not work unless you are connected to bender's X server. Use e.g. `plt.savefig` to save the figure and view it in VS Code.
- Implement the `normalize_batch` function to ensure approximately standard-normal inputs; make use of handy torch built-in methods. Normalization requires subtracting the mean and dividing by the standard deviation,
  $$\hat{x}_{k,i,j} = \frac{x_{k,i,j} - \mu}{\sigma},$$
  with $i = 1, \dots, w$ and $j = 1, \dots, h$, where $w$ is the image width, $h$ the image height, and $k$ runs through the batch dimension (see the sketch after this list).
- The forward step requires the `Net` object. It is your fully connected neural network model. Implement a dense network of your choosing in `Net`, using a combination of `torch.nn.Linear` and `torch.nn.ReLU` or `torch.nn.Sigmoid`.
- In the `Net` class, additionally implement the `forward` function to compute the network forward pass (a sketch of a possible `Net` follows this list).
- Write a `cross_entropy` cost function. With $n_o$ the number of labels and $n_b$ the batch size, the batched cross entropy reads
  $$C_{\text{ce}}(\mathbf{y}, \mathbf{o}) = -\frac{1}{n_b} \sum_{k=1}^{n_b} \sum_{m=1}^{n_o} y_{k,m} \ln(o_{k,m}).$$
  A sketch follows this list.
- If you have chosen to work with ten output neurons, use `torch.nn.functional.one_hot` to encode the labels.
- Next we want to be able to do an optimization step with stochastic gradient descent (SGD). Implement `sgd_step`. One way to do this is to iterate over `model.parameters()` and update each parameter individually with its gradient. One can access the gradient of each parameter via `<param>.grad` (a sketch follows this list).
- To evaluate the network, we calculate the accuracy of the network output. Implement `get_acc` to calculate the accuracy given a dataloader containing batches of images and corresponding labels. More about dataloaders is available at https://pytorch.org/docs/stable/data.html.
- Now it is time to move back to the main procedure. First, the train data is fetched via `torchvision.datasets.MNIST`. To be able to evaluate the network while it is being trained, we use a validation set. Here the train set is split into two disjoint sets, training and validation, using `torch.utils.data.random_split` (see the data-loading sketch after this list).
- Initialize the network with the `Net` object (see the `torch` documentation for help).
- Train your network for a fixed number of `EPOCHS` over the entire dataset. The major steps in the training loop are: normalize the inputs, compute the model prediction, calculate the loss, call `.backward()` on the loss to compute the gradients, apply `sgd_step`, and call `zero_grad`. Validate the model once per epoch (a loop sketch follows this list).
- When the model is trained, load the test data with `test_loader` and calculate the test accuracy.
- Optional: Plot the training and validation accuracies, and add the test accuracy at the end.
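A minimal `normalize_batch` sketch. The batch layout and the choice of a single global mean and standard deviation are assumptions:

```python
import torch

def normalize_batch(imgs: torch.Tensor) -> torch.Tensor:
    # One mean and one standard deviation over the whole batch tensor
    # yields approximately standard-normal inputs.
    return (imgs - imgs.mean()) / imgs.std()
```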
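One possible `Net`; the layer sizes are an illustrative choice for 28x28 MNIST images with ten output neurons:

```python
import torch

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(28 * 28, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, 10),  # ten output neurons, one per digit
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Flatten the image batch to [batch, 784] before the dense layers.
        return self.layers(x.view(x.shape[0], -1))
```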
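A `cross_entropy` sketch, assuming the network output has already been turned into probabilities (e.g. via a softmax); the labels are one-hot encoded with `torch.nn.functional.one_hot`:

```python
import torch

def cross_entropy(label: torch.Tensor, out: torch.Tensor) -> torch.Tensor:
    # label: integer class indices of shape [n_b];
    # out: probabilities of shape [n_b, n_o] (assumed).
    y = torch.nn.functional.one_hot(label, num_classes=out.shape[-1])
    # C = -(1/n_b) * sum_k sum_m y_km * ln(o_km)
    return -torch.mean(torch.sum(y * torch.log(out), dim=-1))
```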
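An `sgd_step` sketch that iterates over the parameters; `torch.no_grad` keeps autograd from tracking the update itself:

```python
import torch

def sgd_step(model: torch.nn.Module, learning_rate: float) -> None:
    with torch.no_grad():
        for param in model.parameters():
            if param.grad is not None:
                # W <- W - lr * dC/dW, applied in place.
                param -= learning_rate * param.grad
```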
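A `get_acc` sketch; that the images are normalized here rather than inside the dataloader is an assumption:

```python
import torch

def get_acc(model: torch.nn.Module, loader) -> float:
    hits, total = 0, 0
    with torch.no_grad():
        for imgs, labels in loader:
            preds = model(normalize_batch(imgs)).argmax(dim=-1)
            hits += (preds == labels).sum().item()
            total += labels.shape[0]
    return hits / total
```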
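Fetching MNIST and splitting off a validation set could look like this; the 50000/10000 split, the data directory, and the batch size are illustrative values:

```python
import torch
import torchvision

full_train = torchvision.datasets.MNIST(
    "./data", train=True, download=True,
    transform=torchvision.transforms.ToTensor(),
)
train_set, val_set = torch.utils.data.random_split(full_train, [50000, 10000])
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=64)
```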
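Finally, a sketch of the epoch loop, reusing the names from the sketches above; `EPOCHS` and the learning rate are illustrative values:

```python
import torch

EPOCHS, lr = 10, 0.01  # illustrative values
model = Net()

for epoch in range(EPOCHS):
    for imgs, labels in train_loader:
        logits = model(normalize_batch(imgs))              # model prediction
        out = torch.nn.functional.softmax(logits, dim=-1)  # probabilities
        loss = cross_entropy(labels, out)                  # loss calculation
        loss.backward()                                    # gradients into <param>.grad
        sgd_step(model, lr)                                # parameter update
        model.zero_grad()                                  # clear old gradients
    print(f"epoch {epoch}: validation accuracy {get_acc(model, val_loader):.3f}")
```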