diff --git a/examples/vision/forwardforward.py b/examples/vision/forwardforward.py new file mode 100644 index 0000000000..10041b36b0 --- /dev/null +++ b/examples/vision/forwardforward.py @@ -0,0 +1,429 @@ +""" +Title: Using Forward-Forward Algorithm for Image Classification +Author: [Suvaditya Mukherjee](https://twitter.com/halcyonrayes) +Date created: 2023/01/08 +Last modified: 2023/01/08 +Description: Training a Dense-layer model using the Forward-Forward algorithm. +Accelerator: GPU +""" + +""" +## Introduction + +The following example explores how to use the Forward-Forward algorithm to perform +training instead of the traditionally-used method of backpropagation, as proposed by +Hinton in +[The Forward-Forward Algorithm: Some Preliminary Investigations](https://www.cs.toronto.edu/~hinton/FFA13.pdf) +(2022). + +The concept was inspired by the understanding behind +[Boltzmann Machines](http://www.cs.toronto.edu/~fritz/absps/dbm.pdf). Backpropagation +involves calculating the difference between actual and predicted output via a cost +function to adjust network weights. On the other hand, the FF Algorithm suggests the +analogy of neurons which get "excited" based on looking at a certain recognized +combination of an image and its correct corresponding label. + +This method takes certain inspiration from the biological learning process that occurs in +the cortex. A significant advantage that this method brings is the fact that +backpropagation through the network does not need to be performed anymore, and that +weight updates are local to the layer itself. + +As this is yet still an experimental method, it does not yield state-of-the-art results. +But with proper tuning, it is supposed to come close to the same. +Through this example, we will examine a process that allows us to implement the +Forward-Forward algorithm within the layers themselves, instead of the traditional method +of relying on the global loss functions and optimizers. + +The tutorial is structured as follows: + +- Perform necessary imports +- Load the [MNIST dataset](http://yann.lecun.com/exdb/mnist/) +- Visualize Random samples from the MNIST dataset +- Define a `FFDense` Layer to override `call` and implement a custom `forwardforward` +method which performs weight updates. +- Define a `FFNetwork` Layer to override `train_step`, `predict` and implement 2 custom +functions for per-sample prediction and overlaying labels +- Convert MNIST from `NumPy` arrays to `tf.data.Dataset` +- Fit the network +- Visualize results +- Perform inference on test samples + +As this example requires the customization of certain core functions with +`keras.layers.Layer` and `keras.models.Model`, refer to the following resources for +a primer on how to do so: + +- [Customizing what happens in `model.fit()`](https://www.tensorflow.org/guide/keras/customizing_what_happens_in_fit) +- [Making new Layers and Models via subclassing](https://www.tensorflow.org/guide/keras/custom_layers_and_models) +""" + +""" +## Setup imports +""" + +import tensorflow as tf +from tensorflow import keras +import numpy as np +import matplotlib.pyplot as plt +from sklearn.metrics import accuracy_score +import random +from tensorflow.compiler.tf2xla.python import xla + +""" +## Load the dataset and visualize the data + +We use the `keras.datasets.mnist.load_data()` utility to directly pull the MNIST dataset +in the form of `NumPy` arrays. We then arrange it in the form of the train and test +splits. 
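+
+For reference, the standard MNIST split returned by this utility consists of
+`uint8` arrays of shape `(60000, 28, 28)` / `(60000,)` for training and
+`(10000, 28, 28)` / `(10000,)` for testing. A quick, optional check you can run
+right after loading:
+
+```python
+# Optional: confirm the shapes and dtypes of the freshly loaded arrays.
+print(x_train.shape, x_train.dtype, y_train.shape, y_train.dtype)
+print(x_test.shape, x_test.dtype, y_test.shape, y_test.dtype)
+```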
+ +Following loading the dataset, we select 4 random samples from within the training set +and visualize them using `matplotlib.pyplot`. +""" + +(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data() + +print("4 Random Training samples and labels") +idx1, idx2, idx3, idx4 = random.sample(range(0, x_train.shape[0]), 4) + +img1 = (x_train[idx1], y_train[idx1]) +img2 = (x_train[idx2], y_train[idx2]) +img3 = (x_train[idx3], y_train[idx3]) +img4 = (x_train[idx4], y_train[idx4]) + +imgs = [img1, img2, img3, img4] + +plt.figure(figsize=(10, 10)) + +for idx, item in enumerate(imgs): + image, label = item[0], item[1] + plt.subplot(2, 2, idx + 1) + plt.imshow(image, cmap="gray") + plt.title(f"Label : {label}") +plt.show() + +""" +## Define `FFDense` custom layer + +In this custom layer, we have a base `keras.layers.Dense` object which acts as the +base `Dense` layer within. Since weight updates will happen within the layer itself, we +add an `keras.optimizers.Optimizer` object that is accepted from the user. Here, we +use `Adam` as our optimizer with a rather higher learning rate of `0.03`. + +Following the algorithm's specifics, we must set a `threshold` parameter that will be +used to make the positive-negative decision in each prediction. This is set to a default +of 2.0. +As the epochs are localized to the layer itself, we also set a `num_epochs` parameter +(defaults to 50). + +We override the `call` method in order to perform a normalization over the complete +input space followed by running it through the base `Dense` layer as would happen in a +normal `Dense` layer call. + +We implement the Forward-Forward algorithm which accepts 2 kinds of input tensors, each +representing the positive and negative samples respectively. We write a custom training +loop here with the use of `tf.GradientTape()`, within which we calculate a loss per +sample by taking the distance of the prediction from the threshold to understand the +error and taking its mean to get a `mean_loss` metric. + +With the help of `tf.GradientTape()` we calculate the gradient updates for the trainable +base `Dense` layer and apply them using the layer's local optimizer. + +Finally, we return the `call` result as the `Dense` results of the positive and negative +samples while also returning the last `mean_loss` metric and all the loss values over a +certain all-epoch run. +""" + + +class FFDense(keras.layers.Layer): + """ + A custom ForwardForward-enabled Dense layer. It has an implementation of the + Forward-Forward network internally for use. + This layer must be used in conjunction with the `FFNetwork` model. + """ + + def __init__( + self, + units, + optimizer, + loss_metric, + num_epochs=50, + use_bias=True, + kernel_initializer="glorot_uniform", + bias_initializer="zeros", + kernel_regularizer=None, + bias_regularizer=None, + **kwargs, + ): + super().__init__(**kwargs) + self.dense = keras.layers.Dense( + units=units, + use_bias=use_bias, + kernel_initializer=kernel_initializer, + bias_initializer=bias_initializer, + kernel_regularizer=kernel_regularizer, + bias_regularizer=bias_regularizer, + ) + self.relu = keras.layers.ReLU() + self.optimizer = optimizer + self.loss_metric = loss_metric + self.threshold = 1.5 + self.num_epochs = num_epochs + + # We perform a normalization step before we run the input through the Dense + # layer. 
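+    # As motivated in the FF paper, dividing the incoming activity vector by
+    # its L2 norm (plus a small epsilon for numerical safety) keeps only its
+    # orientation: the length, which encodes the previous layer's "goodness",
+    # is discarded, so this layer cannot trivially reuse it and must learn its
+    # own features.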
+ + def call(self, x): + x_norm = tf.norm(x, ord=2, axis=1, keepdims=True) + x_norm = x_norm + 1e-4 + x_dir = x / x_norm + res = self.dense(x_dir) + return self.relu(res) + + # The Forward-Forward algorithm is below. We first perform the Dense-layer + # operation and then get a Mean Square value for all positive and negative + # samples respectively. + # The custom loss function finds the distance between the Mean-squared + # result and the threshold value we set (a hyperparameter) that will define + # whether the prediction is positive or negative in nature. Once the loss is + # calculated, we get a mean across the entire batch combined and perform a + # gradient calculation and optimization step. This does not technically + # qualify as backpropagation since there is no gradient being + # sent to any previous layer and is completely local in nature. + + def forward_forward(self, x_pos, x_neg): + for i in range(self.num_epochs): + with tf.GradientTape() as tape: + g_pos = tf.math.reduce_mean(tf.math.pow(self.call(x_pos), 2), 1) + g_neg = tf.math.reduce_mean(tf.math.pow(self.call(x_neg), 2), 1) + + loss = tf.math.log( + 1 + + tf.math.exp( + tf.concat([-g_pos + self.threshold, g_neg - self.threshold], 0) + ) + ) + mean_loss = tf.cast(tf.math.reduce_mean(loss), tf.float32) + self.loss_metric.update_state([mean_loss]) + gradients = tape.gradient(mean_loss, self.dense.trainable_weights) + self.optimizer.apply_gradients(zip(gradients, self.dense.trainable_weights)) + return ( + tf.stop_gradient(self.call(x_pos)), + tf.stop_gradient(self.call(x_neg)), + self.loss_metric.result(), + ) + + +""" +## Define the `FFNetwork` Custom Model + +With our custom layer defined, we also need to override the `train_step` method and +define a custom `keras.models.Model` that works with our `FFDense` layer. + +For this algorithm, we must 'embed' the labels onto the original image. To do so, we +exploit the structure of MNIST images where the top-left 10 pixels are always zeros. We +use that as a label space in order to visually one-hot-encode the labels within the image +itself. This action is performed by the `overlay_y_on_x` function. + +We break down the prediction function with a per-sample prediction function which is then +called over the entire test set by the overriden `predict()` function. The prediction is +performed here with the help of measuring the `excitation` of the neurons per layer for +each image. This is then summed over all layers to calculate a network-wide 'goodness +score'. The label with the highest 'goodness score' is then chosen as the sample +prediction. + +The `train_step` function is overriden to act as the main controlling loop for running +training on each layer as per the number of epochs per layer. +""" + + +class FFNetwork(keras.Model): + """ + A `keras.Model` that supports a `FFDense` network creation. This model + can work for any kind of classification task. It has an internal + implementation with some details specific to the MNIST dataset which can be + changed as per the use-case. + """ + + # Since each layer runs gradient-calculation and optimization locally, each + # layer has its own optimizer that we pass. As a standard choice, we pass + # the `Adam` optimizer with a default learning rate of 0.03 as that was + # found to be the best rate after experimentation. + # Loss is tracked using `loss_var` and `loss_count` variables. 
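+    # `dims` lists the layer widths, starting with the flattened input size:
+    # for example, dims=[784, 500, 500] (used later in this example) builds
+    # two 500-unit FFDense layers on top of 784-dimensional flattened images.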
+ + def __init__( + self, dims, layer_optimizer=keras.optimizers.Adam(learning_rate=0.03), **kwargs + ): + super().__init__(**kwargs) + self.layer_optimizer = layer_optimizer + self.loss_var = tf.Variable(0.0, trainable=False, dtype=tf.float32) + self.loss_count = tf.Variable(0.0, trainable=False, dtype=tf.float32) + self.layer_list = [keras.Input(shape=(dims[0],))] + for d in range(len(dims) - 1): + self.layer_list += [ + FFDense( + dims[d + 1], + optimizer=self.layer_optimizer, + loss_metric=keras.metrics.Mean(), + ) + ] + + # This function makes a dynamic change to the image wherein the labels are + # put on top of the original image (for this example, as MNIST has 10 + # unique labels, we take the top-left corner's first 10 pixels). This + # function returns the original data tensor with the first 10 pixels being + # a pixel-based one-hot representation of the labels. + + @tf.function(reduce_retracing=True) + def overlay_y_on_x(self, data): + X_sample, y_sample = data + max_sample = tf.reduce_max(X_sample, axis=0, keepdims=True) + max_sample = tf.cast(max_sample, dtype=tf.float64) + X_zeros = tf.zeros([10], dtype=tf.float64) + X_update = xla.dynamic_update_slice(X_zeros, max_sample, [y_sample]) + X_sample = xla.dynamic_update_slice(X_sample, X_update, [0]) + return X_sample, y_sample + + # A custom `predict_one_sample` performs predictions by passing the images + # through the network, measures the results produced by each layer (i.e. + # how high/low the output values are with respect to the set threshold for + # each label) and then simply finding the label with the highest values. + # In such a case, the images are tested for their 'goodness' with all + # labels. + + @tf.function(reduce_retracing=True) + def predict_one_sample(self, x): + goodness_per_label = [] + x = tf.reshape(x, [tf.shape(x)[0] * tf.shape(x)[1]]) + for label in range(10): + h, label = self.overlay_y_on_x(data=(x, label)) + h = tf.reshape(h, [-1, tf.shape(h)[0]]) + goodness = [] + for layer_idx in range(1, len(self.layer_list)): + layer = self.layer_list[layer_idx] + h = layer(h) + goodness += [tf.math.reduce_mean(tf.math.pow(h, 2), 1)] + goodness_per_label += [ + tf.expand_dims(tf.reduce_sum(goodness, keepdims=True), 1) + ] + goodness_per_label = tf.concat(goodness_per_label, 1) + return tf.cast(tf.argmax(goodness_per_label, 1), tf.float64) + + def predict(self, data): + x = data + preds = list() + preds = tf.map_fn(fn=self.predict_one_sample, elems=x) + return np.asarray(preds, dtype=int) + + # This custom `train_step` function overrides the internal `train_step` + # implementation. We take all the input image tensors, flatten them and + # subsequently produce positive and negative samples on the images. + # A positive sample is an image that has the right label encoded on it with + # the `overlay_y_on_x` function. A negative sample is an image that has an + # erroneous label present on it. + # With the samples ready, we pass them through each `FFLayer` and perform + # the Forward-Forward computation on it. The returned loss is the final + # loss value over all the layers. 
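+    # Note that `loss_var` and `loss_count` are never reset, so the reported
+    # "FinalLoss" is a running mean of the per-layer losses accumulated over
+    # the whole training run rather than a per-epoch value.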
+ + @tf.function(jit_compile=True) + def train_step(self, data): + x, y = data + + # Flatten op + x = tf.reshape(x, [-1, tf.shape(x)[1] * tf.shape(x)[2]]) + + x_pos, y = tf.map_fn(fn=self.overlay_y_on_x, elems=(x, y)) + + random_y = tf.random.shuffle(y) + x_neg, y = tf.map_fn(fn=self.overlay_y_on_x, elems=(x, random_y)) + + h_pos, h_neg = x_pos, x_neg + + for idx, layer in enumerate(self.layers): + if isinstance(layer, FFDense): + print(f"Training layer {idx+1} now : ") + h_pos, h_neg, loss = layer.forward_forward(h_pos, h_neg) + self.loss_var.assign_add(loss) + self.loss_count.assign_add(1.0) + else: + print(f"Passing layer {idx+1} now : ") + x = layer(x) + mean_res = tf.math.divide(self.loss_var, self.loss_count) + return {"FinalLoss": mean_res} + + +""" +## Convert MNIST `NumPy` arrays to `tf.data.Dataset` + +We now perform some preliminary processing on the `NumPy` arrays and then convert them +into the `tf.data.Dataset` format which allows for optimized loading. +""" + +x_train = x_train.astype(float) / 255 +x_test = x_test.astype(float) / 255 +y_train = y_train.astype(int) +y_test = y_test.astype(int) + +train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)) +test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test)) + +train_dataset = train_dataset.batch(60000) +test_dataset = test_dataset.batch(10000) + +""" +## Fit the network and visualize results + +Having performed all previous set-up, we are now going to run `model.fit()` and run 250 +model epochs, which will perform 50*250 epochs on each layer. We get to see the plotted loss +curve as each layer is trained. +""" + +model = FFNetwork(dims=[784, 500, 500]) + +model.compile( + optimizer=keras.optimizers.Adam(learning_rate=0.03), + loss="mse", + jit_compile=True, + metrics=[keras.metrics.Mean()], +) + +epochs = 250 +history = model.fit(train_dataset, epochs=epochs) + +""" +## Perform inference and testing + +Having trained the model to a large extent, we now see how it performs on the +test set. We calculate the Accuracy Score to understand the results closely. +""" + +preds = model.predict(tf.convert_to_tensor(x_test)) + +preds = preds.reshape((preds.shape[0], preds.shape[1])) + +results = accuracy_score(preds, y_test) + +print(f"Test Accuracy score : {results*100}%") + +plt.plot(range(len(history.history["FinalLoss"])), history.history["FinalLoss"]) +plt.title("Loss over training") +plt.show() + +""" +## Conclusion + +This example has hereby demonstrated how the Forward-Forward algorithm works using +the TensorFlow and Keras packages. While the investigation results presented by Prof. Hinton +in their paper are currently still limited to smaller models and datasets like MNIST and +Fashion-MNIST, subsequent results on larger models like LLMs are expected in future +papers. + +Through the paper, Prof. Hinton has reported results of 1.36% test accuracy error with a +2000-units, 4 hidden-layer, fully-connected network run over 60 epochs (while mentioning +that backpropagation takes only 20 epochs to achieve similar performance). Another run of +doubling the learning rate and training for 40 epochs yields a slightly worse error rate +of 1.46% + +The current example does not yield state-of-the-art results. But with proper tuning of +the Learning Rate, model architecture (number of units in `Dense` layers, kernel +activations, initializations, regularization etc.), the results can be improved +to match the claims of the paper. 
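+
+As a starting point for such experiments, a configuration closer to the paper's
+4-hidden-layer, 2000-unit network can be expressed with the same `FFNetwork`
+class. The snippet below is only a sketch: the learning rate, epoch count and
+the `bigger_model` name are illustrative choices rather than settings from the
+paper, and a network of this size trains considerably slower.
+
+```python
+# Sketch only: roughly approximate the paper's 2000-unit, 4-hidden-layer setup.
+bigger_model = FFNetwork(dims=[784, 2000, 2000, 2000, 2000])
+bigger_model.compile(
+    optimizer=keras.optimizers.Adam(learning_rate=0.03),
+    loss="mse",
+    jit_compile=True,
+    metrics=[keras.metrics.Mean()],
+)
+bigger_history = bigger_model.fit(train_dataset, epochs=epochs)
+```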
+""" diff --git a/examples/vision/img/forwardforward/forwardforward_15_1.png b/examples/vision/img/forwardforward/forwardforward_15_1.png new file mode 100644 index 0000000000..0e95f8e1b8 Binary files /dev/null and b/examples/vision/img/forwardforward/forwardforward_15_1.png differ diff --git a/examples/vision/img/forwardforward/forwardforward_5_1.png b/examples/vision/img/forwardforward/forwardforward_5_1.png new file mode 100644 index 0000000000..8d7b00b6c0 Binary files /dev/null and b/examples/vision/img/forwardforward/forwardforward_5_1.png differ diff --git a/examples/vision/ipynb/forwardforward.ipynb b/examples/vision/ipynb/forwardforward.ipynb new file mode 100644 index 0000000000..9cf7183980 --- /dev/null +++ b/examples/vision/ipynb/forwardforward.ipynb @@ -0,0 +1,576 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "# Using Forward-Forward Algorithm for Image Classification\n", + "\n", + "**Author:** [Suvaditya Mukherjee](https://twitter.com/halcyonrayes)
\n", + "**Date created:** 2023/01/08
\n", + "**Last modified:** 2023/01/08
\n", + "**Description:** Training a Dense-layer model using the Forward-Forward algorithm." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "## Introduction\n", + "\n", + "The following example explores how to use the Forward-Forward algorithm to perform\n", + "training instead of the traditionally-used method of backpropagation, as proposed by\n", + "Hinton in\n", + "[The Forward-Forward Algorithm: Some Preliminary Investigations](https://www.cs.toronto.edu/~hinton/FFA13.pdf)\n", + "(2022).\n", + "\n", + "The concept was inspired by the understanding behind\n", + "[Boltzmann Machines](http://www.cs.toronto.edu/~fritz/absps/dbm.pdf). Backpropagation\n", + "involves calculating the difference between actual and predicted output via a cost\n", + "function to adjust network weights. On the other hand, the FF Algorithm suggests the\n", + "analogy of neurons which get \"excited\" based on looking at a certain recognized\n", + "combination of an image and its correct corresponding label.\n", + "\n", + "This method takes certain inspiration from the biological learning process that occurs in\n", + "the cortex. A significant advantage that this method brings is the fact that\n", + "backpropagation through the network does not need to be performed anymore, and that\n", + "weight updates are local to the layer itself.\n", + "\n", + "As this is yet still an experimental method, it does not yield state-of-the-art results.\n", + "But with proper tuning, it is supposed to come close to the same.\n", + "Through this example, we will examine a process that allows us to implement the\n", + "Forward-Forward algorithm within the layers themselves, instead of the traditional method\n", + "of relying on the global loss functions and optimizers.\n", + "\n", + "The tutorial is structured as follows:\n", + "\n", + "- Perform necessary imports\n", + "- Load the [MNIST dataset](http://yann.lecun.com/exdb/mnist/)\n", + "- Visualize Random samples from the MNIST dataset\n", + "- Define a `FFDense` Layer to override `call` and implement a custom `forwardforward`\n", + "method which performs weight updates.\n", + "- Define a `FFNetwork` Layer to override `train_step`, `predict` and implement 2 custom\n", + "functions for per-sample prediction and overlaying labels\n", + "- Convert MNIST from `NumPy` arrays to `tf.data.Dataset`\n", + "- Fit the network\n", + "- Visualize results\n", + "- Perform inference on test samples\n", + "\n", + "As this example requires the customization of certain core functions with\n", + "`keras.layers.Layer` and `keras.models.Model`, refer to the following resources for\n", + "a primer on how to do so:\n", + "\n", + "- [Customizing what happens in `model.fit()`](https://www.tensorflow.org/guide/keras/customizing_what_happens_in_fit)\n", + "- [Making new Layers and Models via subclassing](https://www.tensorflow.org/guide/keras/custom_layers_and_models)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "## Setup imports" + ] + }, + { + "cell_type": "code", + "execution_count": 0, + "metadata": { + "colab_type": "code" + }, + "outputs": [], + "source": [ + "import tensorflow as tf\n", + "from tensorflow import keras\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.metrics import accuracy_score\n", + "import random\n", + "from tensorflow.compiler.tf2xla.python import xla" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + 
"source": [ + "## Load the dataset and visualize the data\n", + "\n", + "We use the `keras.datasets.mnist.load_data()` utility to directly pull the MNIST dataset\n", + "in the form of `NumPy` arrays. We then arrange it in the form of the train and test\n", + "splits.\n", + "\n", + "Following loading the dataset, we select 4 random samples from within the training set\n", + "and visualize them using `matplotlib.pyplot`." + ] + }, + { + "cell_type": "code", + "execution_count": 0, + "metadata": { + "colab_type": "code" + }, + "outputs": [], + "source": [ + "(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()\n", + "\n", + "print(\"4 Random Training samples and labels\")\n", + "idx1, idx2, idx3, idx4 = random.sample(range(0, x_train.shape[0]), 4)\n", + "\n", + "img1 = (x_train[idx1], y_train[idx1])\n", + "img2 = (x_train[idx2], y_train[idx2])\n", + "img3 = (x_train[idx3], y_train[idx3])\n", + "img4 = (x_train[idx4], y_train[idx4])\n", + "\n", + "imgs = [img1, img2, img3, img4]\n", + "\n", + "plt.figure(figsize=(10, 10))\n", + "\n", + "for idx, item in enumerate(imgs):\n", + " image, label = item[0], item[1]\n", + " plt.subplot(2, 2, idx + 1)\n", + " plt.imshow(image, cmap=\"gray\")\n", + " plt.title(f\"Label : {label}\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "## Define `FFDense` custom layer\n", + "\n", + "In this custom layer, we have a base `keras.layers.Dense` object which acts as the\n", + "base `Dense` layer within. Since weight updates will happen within the layer itself, we\n", + "add an `keras.optimizers.Optimizer` object that is accepted from the user. Here, we\n", + "use `Adam` as our optimizer with a rather higher learning rate of `0.03`.\n", + "\n", + "Following the algorithm's specifics, we must set a `threshold` parameter that will be\n", + "used to make the positive-negative decision in each prediction. This is set to a default\n", + "of 2.0.\n", + "As the epochs are localized to the layer itself, we also set a `num_epochs` parameter\n", + "(defaults to 50).\n", + "\n", + "We override the `call` method in order to perform a normalization over the complete\n", + "input space followed by running it through the base `Dense` layer as would happen in a\n", + "normal `Dense` layer call.\n", + "\n", + "We implement the Forward-Forward algorithm which accepts 2 kinds of input tensors, each\n", + "representing the positive and negative samples respectively. We write a custom training\n", + "loop here with the use of `tf.GradientTape()`, within which we calculate a loss per\n", + "sample by taking the distance of the prediction from the threshold to understand the\n", + "error and taking its mean to get a `mean_loss` metric.\n", + "\n", + "With the help of `tf.GradientTape()` we calculate the gradient updates for the trainable\n", + "base `Dense` layer and apply them using the layer's local optimizer.\n", + "\n", + "Finally, we return the `call` result as the `Dense` results of the positive and negative\n", + "samples while also returning the last `mean_loss` metric and all the loss values over a\n", + "certain all-epoch run." + ] + }, + { + "cell_type": "code", + "execution_count": 0, + "metadata": { + "colab_type": "code" + }, + "outputs": [], + "source": [ + "\n", + "class FFDense(keras.layers.Layer):\n", + " \"\"\"\n", + " A custom ForwardForward-enabled Dense layer. 
It has an implementation of the\n", + " Forward-Forward network internally for use.\n", + " This layer must be used in conjunction with the `FFNetwork` model.\n", + " \"\"\"\n", + "\n", + " def __init__(\n", + " self,\n", + " units,\n", + " optimizer,\n", + " loss_metric,\n", + " num_epochs=50,\n", + " use_bias=True,\n", + " kernel_initializer=\"glorot_uniform\",\n", + " bias_initializer=\"zeros\",\n", + " kernel_regularizer=None,\n", + " bias_regularizer=None,\n", + " **kwargs,\n", + " ):\n", + " super().__init__(**kwargs)\n", + " self.dense = keras.layers.Dense(\n", + " units=units,\n", + " use_bias=use_bias,\n", + " kernel_initializer=kernel_initializer,\n", + " bias_initializer=bias_initializer,\n", + " kernel_regularizer=kernel_regularizer,\n", + " bias_regularizer=bias_regularizer,\n", + " )\n", + " self.relu = keras.layers.ReLU()\n", + " self.optimizer = optimizer\n", + " self.loss_metric = loss_metric\n", + " self.threshold = 1.5\n", + " self.num_epochs = num_epochs\n", + "\n", + " # We perform a normalization step before we run the input through the Dense\n", + " # layer.\n", + "\n", + " def call(self, x):\n", + " x_norm = tf.norm(x, ord=2, axis=1, keepdims=True)\n", + " x_norm = x_norm + 1e-4\n", + " x_dir = x / x_norm\n", + " res = self.dense(x_dir)\n", + " return self.relu(res)\n", + "\n", + " # The Forward-Forward algorithm is below. We first perform the Dense-layer\n", + " # operation and then get a Mean Square value for all positive and negative\n", + " # samples respectively.\n", + " # The custom loss function finds the distance between the Mean-squared\n", + " # result and the threshold value we set (a hyperparameter) that will define\n", + " # whether the prediction is positive or negative in nature. Once the loss is\n", + " # calculated, we get a mean across the entire batch combined and perform a\n", + " # gradient calculation and optimization step. This does not technically\n", + " # qualify as backpropagation since there is no gradient being\n", + " # sent to any previous layer and is completely local in nature.\n", + "\n", + " def forward_forward(self, x_pos, x_neg):\n", + " for i in range(self.num_epochs):\n", + " with tf.GradientTape() as tape:\n", + " g_pos = tf.math.reduce_mean(tf.math.pow(self.call(x_pos), 2), 1)\n", + " g_neg = tf.math.reduce_mean(tf.math.pow(self.call(x_neg), 2), 1)\n", + "\n", + " loss = tf.math.log(\n", + " 1\n", + " + tf.math.exp(\n", + " tf.concat([-g_pos + self.threshold, g_neg - self.threshold], 0)\n", + " )\n", + " )\n", + " mean_loss = tf.cast(tf.math.reduce_mean(loss), tf.float32)\n", + " self.loss_metric.update_state([mean_loss])\n", + " gradients = tape.gradient(mean_loss, self.dense.trainable_weights)\n", + " self.optimizer.apply_gradients(zip(gradients, self.dense.trainable_weights))\n", + " return (\n", + " tf.stop_gradient(self.call(x_pos)),\n", + " tf.stop_gradient(self.call(x_neg)),\n", + " self.loss_metric.result(),\n", + " )\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "## Define the `FFNetwork` Custom Model\n", + "\n", + "With our custom layer defined, we also need to override the `train_step` method and\n", + "define a custom `keras.models.Model` that works with our `FFDense` layer.\n", + "\n", + "For this algorithm, we must 'embed' the labels onto the original image. To do so, we\n", + "exploit the structure of MNIST images where the top-left 10 pixels are always zeros. 
We\n", + "use that as a label space in order to visually one-hot-encode the labels within the image\n", + "itself. This action is performed by the `overlay_y_on_x` function.\n", + "\n", + "We break down the prediction function with a per-sample prediction function which is then\n", + "called over the entire test set by the overriden `predict()` function. The prediction is\n", + "performed here with the help of measuring the `excitation` of the neurons per layer for\n", + "each image. This is then summed over all layers to calculate a network-wide 'goodness\n", + "score'. The label with the highest 'goodness score' is then chosen as the sample\n", + "prediction.\n", + "\n", + "The `train_step` function is overriden to act as the main controlling loop for running\n", + "training on each layer as per the number of epochs per layer." + ] + }, + { + "cell_type": "code", + "execution_count": 0, + "metadata": { + "colab_type": "code" + }, + "outputs": [], + "source": [ + "\n", + "class FFNetwork(keras.Model):\n", + " \"\"\"\n", + " A `keras.Model` that supports a `FFDense` network creation. This model\n", + " can work for any kind of classification task. It has an internal\n", + " implementation with some details specific to the MNIST dataset which can be\n", + " changed as per the use-case.\n", + " \"\"\"\n", + "\n", + " # Since each layer runs gradient-calculation and optimization locally, each\n", + " # layer has its own optimizer that we pass. As a standard choice, we pass\n", + " # the `Adam` optimizer with a default learning rate of 0.03 as that was\n", + " # found to be the best rate after experimentation.\n", + " # Loss is tracked using `loss_var` and `loss_count` variables.\n", + "\n", + " def __init__(\n", + " self, dims, layer_optimizer=keras.optimizers.Adam(learning_rate=0.03), **kwargs\n", + " ):\n", + " super().__init__(**kwargs)\n", + " self.layer_optimizer = layer_optimizer\n", + " self.loss_var = tf.Variable(0.0, trainable=False, dtype=tf.float32)\n", + " self.loss_count = tf.Variable(0.0, trainable=False, dtype=tf.float32)\n", + " self.layer_list = [keras.Input(shape=(dims[0],))]\n", + " for d in range(len(dims) - 1):\n", + " self.layer_list += [\n", + " FFDense(\n", + " dims[d + 1],\n", + " optimizer=self.layer_optimizer,\n", + " loss_metric=keras.metrics.Mean(),\n", + " )\n", + " ]\n", + "\n", + " # This function makes a dynamic change to the image wherein the labels are\n", + " # put on top of the original image (for this example, as MNIST has 10\n", + " # unique labels, we take the top-left corner's first 10 pixels). 
This\n", + " # function returns the original data tensor with the first 10 pixels being\n", + " # a pixel-based one-hot representation of the labels.\n", + "\n", + " @tf.function(reduce_retracing=True)\n", + " def overlay_y_on_x(self, data):\n", + " X_sample, y_sample = data\n", + " max_sample = tf.reduce_max(X_sample, axis=0, keepdims=True)\n", + " max_sample = tf.cast(max_sample, dtype=tf.float64)\n", + " X_zeros = tf.zeros([10], dtype=tf.float64)\n", + " X_update = xla.dynamic_update_slice(X_zeros, max_sample, [y_sample])\n", + " X_sample = xla.dynamic_update_slice(X_sample, X_update, [0])\n", + " return X_sample, y_sample\n", + "\n", + " # A custom `predict_one_sample` performs predictions by passing the images\n", + " # through the network, measures the results produced by each layer (i.e.\n", + " # how high/low the output values are with respect to the set threshold for\n", + " # each label) and then simply finding the label with the highest values.\n", + " # In such a case, the images are tested for their 'goodness' with all\n", + " # labels.\n", + "\n", + " @tf.function(reduce_retracing=True)\n", + " def predict_one_sample(self, x):\n", + " goodness_per_label = []\n", + " x = tf.reshape(x, [tf.shape(x)[0] * tf.shape(x)[1]])\n", + " for label in range(10):\n", + " h, label = self.overlay_y_on_x(data=(x, label))\n", + " h = tf.reshape(h, [-1, tf.shape(h)[0]])\n", + " goodness = []\n", + " for layer_idx in range(1, len(self.layer_list)):\n", + " layer = self.layer_list[layer_idx]\n", + " h = layer(h)\n", + " goodness += [tf.math.reduce_mean(tf.math.pow(h, 2), 1)]\n", + " goodness_per_label += [\n", + " tf.expand_dims(tf.reduce_sum(goodness, keepdims=True), 1)\n", + " ]\n", + " goodness_per_label = tf.concat(goodness_per_label, 1)\n", + " return tf.cast(tf.argmax(goodness_per_label, 1), tf.float64)\n", + "\n", + " def predict(self, data):\n", + " x = data\n", + " preds = list()\n", + " preds = tf.map_fn(fn=self.predict_one_sample, elems=x)\n", + " return np.asarray(preds, dtype=int)\n", + "\n", + " # This custom `train_step` function overrides the internal `train_step`\n", + " # implementation. We take all the input image tensors, flatten them and\n", + " # subsequently produce positive and negative samples on the images.\n", + " # A positive sample is an image that has the right label encoded on it with\n", + " # the `overlay_y_on_x` function. A negative sample is an image that has an\n", + " # erroneous label present on it.\n", + " # With the samples ready, we pass them through each `FFLayer` and perform\n", + " # the Forward-Forward computation on it. 
The returned loss is the final\n", + " # loss value over all the layers.\n", + "\n", + " @tf.function(jit_compile=True)\n", + " def train_step(self, data):\n", + " x, y = data\n", + "\n", + " # Flatten op\n", + " x = tf.reshape(x, [-1, tf.shape(x)[1] * tf.shape(x)[2]])\n", + "\n", + " x_pos, y = tf.map_fn(fn=self.overlay_y_on_x, elems=(x, y))\n", + "\n", + " random_y = tf.random.shuffle(y)\n", + " x_neg, y = tf.map_fn(fn=self.overlay_y_on_x, elems=(x, random_y))\n", + "\n", + " h_pos, h_neg = x_pos, x_neg\n", + "\n", + " for idx, layer in enumerate(self.layers):\n", + " if isinstance(layer, FFDense):\n", + " print(f\"Training layer {idx+1} now : \")\n", + " h_pos, h_neg, loss = layer.forward_forward(h_pos, h_neg)\n", + " self.loss_var.assign_add(loss)\n", + " self.loss_count.assign_add(1.0)\n", + " else:\n", + " print(f\"Passing layer {idx+1} now : \")\n", + " x = layer(x)\n", + " mean_res = tf.math.divide(self.loss_var, self.loss_count)\n", + " return {\"FinalLoss\": mean_res}\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "## Convert MNIST `NumPy` arrays to `tf.data.Dataset`\n", + "\n", + "We now perform some preliminary processing on the `NumPy` arrays and then convert them\n", + "into the `tf.data.Dataset` format which allows for optimized loading." + ] + }, + { + "cell_type": "code", + "execution_count": 0, + "metadata": { + "colab_type": "code" + }, + "outputs": [], + "source": [ + "x_train = x_train.astype(float) / 255\n", + "x_test = x_test.astype(float) / 255\n", + "y_train = y_train.astype(int)\n", + "y_test = y_test.astype(int)\n", + "\n", + "train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))\n", + "test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))\n", + "\n", + "train_dataset = train_dataset.batch(60000)\n", + "test_dataset = test_dataset.batch(10000)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "## Fit the network and visualize results\n", + "\n", + "Having performed all previous set-up, we are now going to run `model.fit()` and run 250\n", + "model epochs, which will perform 50*250 epochs on each layer. We get to see the plotted loss\n", + "curve as each layer is trained." + ] + }, + { + "cell_type": "code", + "execution_count": 0, + "metadata": { + "colab_type": "code" + }, + "outputs": [], + "source": [ + "model = FFNetwork(dims=[784, 500, 500])\n", + "\n", + "model.compile(\n", + " optimizer=keras.optimizers.Adam(learning_rate=0.03),\n", + " loss=\"mse\",\n", + " jit_compile=True,\n", + " metrics=[keras.metrics.Mean()],\n", + ")\n", + "\n", + "epochs = 250\n", + "history = model.fit(train_dataset, epochs=epochs)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "## Perform inference and testing\n", + "\n", + "Having trained the model to a large extent, we now see how it performs on the\n", + "test set. We calculate the Accuracy Score to understand the results closely." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 0, + "metadata": { + "colab_type": "code" + }, + "outputs": [], + "source": [ + "preds = model.predict(tf.convert_to_tensor(x_test))\n", + "\n", + "preds = preds.reshape((preds.shape[0], preds.shape[1]))\n", + "\n", + "results = accuracy_score(preds, y_test)\n", + "\n", + "print(f\"Test Accuracy score : {results*100}%\")\n", + "\n", + "plt.plot(range(len(history.history[\"FinalLoss\"])), history.history[\"FinalLoss\"])\n", + "plt.title(\"Loss over training\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text" + }, + "source": [ + "## Conclusion\n", + "\n", + "This example has hereby demonstrated how the Forward-Forward algorithm works using\n", + "the TensorFlow and Keras packages. While the investigation results presented by Prof. Hinton\n", + "in their paper are currently still limited to smaller models and datasets like MNIST and\n", + "Fashion-MNIST, subsequent results on larger models like LLMs are expected in future\n", + "papers.\n", + "\n", + "Through the paper, Prof. Hinton has reported results of 1.36% test accuracy error with a\n", + "2000-units, 4 hidden-layer, fully-connected network run over 60 epochs (while mentioning\n", + "that backpropagation takes only 20 epochs to achieve similar performance). Another run of\n", + "doubling the learning rate and training for 40 epochs yields a slightly worse error rate\n", + "of 1.46%\n", + "\n", + "The current example does not yield state-of-the-art results. But with proper tuning of\n", + "the Learning Rate, model architecture (number of units in `Dense` layers, kernel\n", + "activations, initializations, regularization etc.), the results can be improved\n", + "to match the claims of the paper." + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "collapsed_sections": [], + "name": "forwardforward", + "private_outputs": false, + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.0" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/examples/vision/md/forwardforward.md b/examples/vision/md/forwardforward.md new file mode 100644 index 0000000000..75eb37f093 --- /dev/null +++ b/examples/vision/md/forwardforward.md @@ -0,0 +1,973 @@ +# Using Forward-Forward Algorithm for Image Classification + +**Author:** [Suvaditya Mukherjee](https://twitter.com/halcyonrayes)
+**Date created:** 2023/01/08<br>
+**Last modified:** 2023/01/08<br>
+**Description:** Training a Dense-layer model using the Forward-Forward algorithm. + + + [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/vision/ipynb/forwardforward.ipynb) [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/vision/forwardforward.py) + + + +--- +## Introduction + +The following example explores how to use the Forward-Forward algorithm to perform +training instead of the traditionally-used method of backpropagation, as proposed by +Hinton in +[The Forward-Forward Algorithm: Some Preliminary Investigations](https://www.cs.toronto.edu/~hinton/FFA13.pdf) +(2022). + +The concept was inspired by the understanding behind +[Boltzmann Machines](http://www.cs.toronto.edu/~fritz/absps/dbm.pdf). Backpropagation +involves calculating the difference between actual and predicted output via a cost +function to adjust network weights. On the other hand, the FF Algorithm suggests the +analogy of neurons which get "excited" based on looking at a certain recognized +combination of an image and its correct corresponding label. + +This method takes certain inspiration from the biological learning process that occurs in +the cortex. A significant advantage that this method brings is the fact that +backpropagation through the network does not need to be performed anymore, and that +weight updates are local to the layer itself. + +As this is yet still an experimental method, it does not yield state-of-the-art results. +But with proper tuning, it is supposed to come close to the same. +Through this example, we will examine a process that allows us to implement the +Forward-Forward algorithm within the layers themselves, instead of the traditional method +of relying on the global loss functions and optimizers. + +The tutorial is structured as follows: + +- Perform necessary imports +- Load the [MNIST dataset](http://yann.lecun.com/exdb/mnist/) +- Visualize Random samples from the MNIST dataset +- Define a `FFDense` Layer to override `call` and implement a custom `forwardforward` +method which performs weight updates. +- Define a `FFNetwork` Layer to override `train_step`, `predict` and implement 2 custom +functions for per-sample prediction and overlaying labels +- Convert MNIST from `NumPy` arrays to `tf.data.Dataset` +- Fit the network +- Visualize results +- Perform inference on test samples + +As this example requires the customization of certain core functions with +`keras.layers.Layer` and `keras.models.Model`, refer to the following resources for +a primer on how to do so: + +- [Customizing what happens in `model.fit()`](https://www.tensorflow.org/guide/keras/customizing_what_happens_in_fit) +- [Making new Layers and Models via subclassing](https://www.tensorflow.org/guide/keras/custom_layers_and_models) + +--- +## Setup imports + + +```python +import tensorflow as tf +from tensorflow import keras +import numpy as np +import matplotlib.pyplot as plt +from sklearn.metrics import accuracy_score +import random +from tensorflow.compiler.tf2xla.python import xla +``` + +--- +## Load the dataset and visualize the data + +We use the `keras.datasets.mnist.load_data()` utility to directly pull the MNIST dataset +in the form of `NumPy` arrays. We then arrange it in the form of the train and test +splits. + +Following loading the dataset, we select 4 random samples from within the training set +and visualize them using `matplotlib.pyplot`. 
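+
+The four samples are chosen at random, so the figure below will differ between
+runs. If you want the selection (and the label shuffling used later during
+training) to be reproducible, you can optionally seed the relevant generators
+first; the seed value here is arbitrary and this step is not required:
+
+```python
+# Optional: fix seeds for reproducible sample selection and label shuffling.
+random.seed(42)
+np.random.seed(42)
+tf.random.set_seed(42)
+```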
+ + +```python +(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data() + +print("4 Random Training samples and labels") +idx1, idx2, idx3, idx4 = random.sample(range(0, x_train.shape[0]), 4) + +img1 = (x_train[idx1], y_train[idx1]) +img2 = (x_train[idx2], y_train[idx2]) +img3 = (x_train[idx3], y_train[idx3]) +img4 = (x_train[idx4], y_train[idx4]) + +imgs = [img1, img2, img3, img4] + +plt.figure(figsize=(10, 10)) + +for idx, item in enumerate(imgs): + image, label = item[0], item[1] + plt.subplot(2, 2, idx + 1) + plt.imshow(image, cmap="gray") + plt.title(f"Label : {label}") +plt.show() +``` + +
+``` +Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz +11490434/11490434 [==============================] - 0s 0us/step +4 Random Training samples and labels + +``` +
+![png](/img/examples/vision/forwardforward/forwardforward_5_1.png) + + +--- +## Define `FFDense` custom layer + +In this custom layer, we have a base `keras.layers.Dense` object which acts as the +base `Dense` layer within. Since weight updates will happen within the layer itself, we +add an `keras.optimizers.Optimizer` object that is accepted from the user. Here, we +use `Adam` as our optimizer with a rather higher learning rate of `0.03`. + +Following the algorithm's specifics, we must set a `threshold` parameter that will be +used to make the positive-negative decision in each prediction. This is set to a default +of 2.0. +As the epochs are localized to the layer itself, we also set a `num_epochs` parameter +(defaults to 50). + +We override the `call` method in order to perform a normalization over the complete +input space followed by running it through the base `Dense` layer as would happen in a +normal `Dense` layer call. + +We implement the Forward-Forward algorithm which accepts 2 kinds of input tensors, each +representing the positive and negative samples respectively. We write a custom training +loop here with the use of `tf.GradientTape()`, within which we calculate a loss per +sample by taking the distance of the prediction from the threshold to understand the +error and taking its mean to get a `mean_loss` metric. + +With the help of `tf.GradientTape()` we calculate the gradient updates for the trainable +base `Dense` layer and apply them using the layer's local optimizer. + +Finally, we return the `call` result as the `Dense` results of the positive and negative +samples while also returning the last `mean_loss` metric and all the loss values over a +certain all-epoch run. + + +```python + +class FFDense(keras.layers.Layer): + """ + A custom ForwardForward-enabled Dense layer. It has an implementation of the + Forward-Forward network internally for use. + This layer must be used in conjunction with the `FFNetwork` model. + """ + + def __init__( + self, + units, + optimizer, + loss_metric, + num_epochs=50, + use_bias=True, + kernel_initializer="glorot_uniform", + bias_initializer="zeros", + kernel_regularizer=None, + bias_regularizer=None, + **kwargs, + ): + super().__init__(**kwargs) + self.dense = keras.layers.Dense( + units=units, + use_bias=use_bias, + kernel_initializer=kernel_initializer, + bias_initializer=bias_initializer, + kernel_regularizer=kernel_regularizer, + bias_regularizer=bias_regularizer, + ) + self.relu = keras.layers.ReLU() + self.optimizer = optimizer + self.loss_metric = loss_metric + self.threshold = 1.5 + self.num_epochs = num_epochs + + # We perform a normalization step before we run the input through the Dense + # layer. + + def call(self, x): + x_norm = tf.norm(x, ord=2, axis=1, keepdims=True) + x_norm = x_norm + 1e-4 + x_dir = x / x_norm + res = self.dense(x_dir) + return self.relu(res) + + # The Forward-Forward algorithm is below. We first perform the Dense-layer + # operation and then get a Mean Square value for all positive and negative + # samples respectively. + # The custom loss function finds the distance between the Mean-squared + # result and the threshold value we set (a hyperparameter) that will define + # whether the prediction is positive or negative in nature. Once the loss is + # calculated, we get a mean across the entire batch combined and perform a + # gradient calculation and optimization step. 
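+    # Concretely, the per-sample "goodness" is the mean of the squared
+    # activations: a positive sample contributes log(1 + exp(threshold - g_pos)),
+    # which is small when g_pos is well above the threshold, while a negative
+    # sample contributes log(1 + exp(g_neg - threshold)), which is small when
+    # g_neg is well below it.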
This does not technically + # qualify as backpropagation since there is no gradient being + # sent to any previous layer and is completely local in nature. + + def forward_forward(self, x_pos, x_neg): + for i in range(self.num_epochs): + with tf.GradientTape() as tape: + g_pos = tf.math.reduce_mean(tf.math.pow(self.call(x_pos), 2), 1) + g_neg = tf.math.reduce_mean(tf.math.pow(self.call(x_neg), 2), 1) + + loss = tf.math.log( + 1 + + tf.math.exp( + tf.concat([-g_pos + self.threshold, g_neg - self.threshold], 0) + ) + ) + mean_loss = tf.cast(tf.math.reduce_mean(loss), tf.float32) + self.loss_metric.update_state([mean_loss]) + gradients = tape.gradient(mean_loss, self.dense.trainable_weights) + self.optimizer.apply_gradients(zip(gradients, self.dense.trainable_weights)) + return ( + tf.stop_gradient(self.call(x_pos)), + tf.stop_gradient(self.call(x_neg)), + self.loss_metric.result(), + ) + +``` + +--- +## Define the `FFNetwork` Custom Model + +With our custom layer defined, we also need to override the `train_step` method and +define a custom `keras.models.Model` that works with our `FFDense` layer. + +For this algorithm, we must 'embed' the labels onto the original image. To do so, we +exploit the structure of MNIST images where the top-left 10 pixels are always zeros. We +use that as a label space in order to visually one-hot-encode the labels within the image +itself. This action is performed by the `overlay_y_on_x` function. + +We break down the prediction function with a per-sample prediction function which is then +called over the entire test set by the overriden `predict()` function. The prediction is +performed here with the help of measuring the `excitation` of the neurons per layer for +each image. This is then summed over all layers to calculate a network-wide 'goodness +score'. The label with the highest 'goodness score' is then chosen as the sample +prediction. + +The `train_step` function is overriden to act as the main controlling loop for running +training on each layer as per the number of epochs per layer. + + +```python + +class FFNetwork(keras.Model): + """ + A `keras.Model` that supports a `FFDense` network creation. This model + can work for any kind of classification task. It has an internal + implementation with some details specific to the MNIST dataset which can be + changed as per the use-case. + """ + + # Since each layer runs gradient-calculation and optimization locally, each + # layer has its own optimizer that we pass. As a standard choice, we pass + # the `Adam` optimizer with a default learning rate of 0.03 as that was + # found to be the best rate after experimentation. + # Loss is tracked using `loss_var` and `loss_count` variables. + + def __init__( + self, dims, layer_optimizer=keras.optimizers.Adam(learning_rate=0.03), **kwargs + ): + super().__init__(**kwargs) + self.layer_optimizer = layer_optimizer + self.loss_var = tf.Variable(0.0, trainable=False, dtype=tf.float32) + self.loss_count = tf.Variable(0.0, trainable=False, dtype=tf.float32) + self.layer_list = [keras.Input(shape=(dims[0],))] + for d in range(len(dims) - 1): + self.layer_list += [ + FFDense( + dims[d + 1], + optimizer=self.layer_optimizer, + loss_metric=keras.metrics.Mean(), + ) + ] + + # This function makes a dynamic change to the image wherein the labels are + # put on top of the original image (for this example, as MNIST has 10 + # unique labels, we take the top-left corner's first 10 pixels). 
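+    # For example, a flattened image carrying label 3 has its first 10 pixels
+    # replaced by [0, 0, 0, m, 0, 0, 0, 0, 0, 0], where m is the maximum pixel
+    # value of that image.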
This + # function returns the original data tensor with the first 10 pixels being + # a pixel-based one-hot representation of the labels. + + @tf.function(reduce_retracing=True) + def overlay_y_on_x(self, data): + X_sample, y_sample = data + max_sample = tf.reduce_max(X_sample, axis=0, keepdims=True) + max_sample = tf.cast(max_sample, dtype=tf.float64) + X_zeros = tf.zeros([10], dtype=tf.float64) + X_update = xla.dynamic_update_slice(X_zeros, max_sample, [y_sample]) + X_sample = xla.dynamic_update_slice(X_sample, X_update, [0]) + return X_sample, y_sample + + # A custom `predict_one_sample` performs predictions by passing the images + # through the network, measures the results produced by each layer (i.e. + # how high/low the output values are with respect to the set threshold for + # each label) and then simply finding the label with the highest values. + # In such a case, the images are tested for their 'goodness' with all + # labels. + + @tf.function(reduce_retracing=True) + def predict_one_sample(self, x): + goodness_per_label = [] + x = tf.reshape(x, [tf.shape(x)[0] * tf.shape(x)[1]]) + for label in range(10): + h, label = self.overlay_y_on_x(data=(x, label)) + h = tf.reshape(h, [-1, tf.shape(h)[0]]) + goodness = [] + for layer_idx in range(1, len(self.layer_list)): + layer = self.layer_list[layer_idx] + h = layer(h) + goodness += [tf.math.reduce_mean(tf.math.pow(h, 2), 1)] + goodness_per_label += [ + tf.expand_dims(tf.reduce_sum(goodness, keepdims=True), 1) + ] + goodness_per_label = tf.concat(goodness_per_label, 1) + return tf.cast(tf.argmax(goodness_per_label, 1), tf.float64) + + def predict(self, data): + x = data + preds = list() + preds = tf.map_fn(fn=self.predict_one_sample, elems=x) + return np.asarray(preds, dtype=int) + + # This custom `train_step` function overrides the internal `train_step` + # implementation. We take all the input image tensors, flatten them and + # subsequently produce positive and negative samples on the images. + # A positive sample is an image that has the right label encoded on it with + # the `overlay_y_on_x` function. A negative sample is an image that has an + # erroneous label present on it. + # With the samples ready, we pass them through each `FFLayer` and perform + # the Forward-Forward computation on it. The returned loss is the final + # loss value over all the layers. + + @tf.function(jit_compile=True) + def train_step(self, data): + x, y = data + + # Flatten op + x = tf.reshape(x, [-1, tf.shape(x)[1] * tf.shape(x)[2]]) + + x_pos, y = tf.map_fn(fn=self.overlay_y_on_x, elems=(x, y)) + + random_y = tf.random.shuffle(y) + x_neg, y = tf.map_fn(fn=self.overlay_y_on_x, elems=(x, random_y)) + + h_pos, h_neg = x_pos, x_neg + + for idx, layer in enumerate(self.layers): + if isinstance(layer, FFDense): + print(f"Training layer {idx+1} now : ") + h_pos, h_neg, loss = layer.forward_forward(h_pos, h_neg) + self.loss_var.assign_add(loss) + self.loss_count.assign_add(1.0) + else: + print(f"Passing layer {idx+1} now : ") + x = layer(x) + mean_res = tf.math.divide(self.loss_var, self.loss_count) + return {"FinalLoss": mean_res} + +``` + +--- +## Convert MNIST `NumPy` arrays to `tf.data.Dataset` + +We now perform some preliminary processing on the `NumPy` arrays and then convert them +into the `tf.data.Dataset` format which allows for optimized loading. 
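+
+One detail worth noting: `astype(float)` produces `float64` arrays, which is
+the dtype that the `overlay_y_on_x` label-embedding logic above builds its
+tensors with. A minimal, optional check you can run after executing the next
+cell:
+
+```python
+# Optional sanity check: dtypes match the float64 tensors used in
+# `overlay_y_on_x`, and pixel values have been scaled to [0, 1].
+assert x_train.dtype == np.float64 and x_test.dtype == np.float64
+assert 0.0 <= x_train.min() and x_train.max() <= 1.0
+```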
+ + +```python +x_train = x_train.astype(float) / 255 +x_test = x_test.astype(float) / 255 +y_train = y_train.astype(int) +y_test = y_test.astype(int) + +train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)) +test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test)) + +train_dataset = train_dataset.batch(60000) +test_dataset = test_dataset.batch(10000) +``` + +--- +## Fit the network and visualize results + +Having performed all previous set-up, we are now going to run `model.fit()` and run 250 +model epochs, which will perform 50*250 epochs on each layer. We get to see the plotted loss +curve as each layer is trained. + + +```python +model = FFNetwork(dims=[784, 500, 500]) + +model.compile( + optimizer=keras.optimizers.Adam(learning_rate=0.03), + loss="mse", + jit_compile=True, + metrics=[keras.metrics.Mean()], +) + +epochs = 250 +history = model.fit(train_dataset, epochs=epochs) +``` + +
+```
+Epoch 1/250
+Training layer 1 now :
+Training layer 2 now :
+Training layer 1 now :
+Training layer 2 now :
+1/1 [==============================] - 72s 72s/step - FinalLoss: 0.7279
+Epoch 2/250
+1/1 [==============================] - 5s 5s/step - FinalLoss: 0.7082
+Epoch 3/250
+1/1 [==============================] - 6s 6s/step - FinalLoss: 0.7031
+Epoch 4/250
+1/1 [==============================] - 5s 5s/step - FinalLoss: 0.6806
+Epoch 5/250
+1/1 [==============================] - 5s 5s/step - FinalLoss: 0.6564
+...
+Epoch 50/250
+1/1 [==============================] - 5s 5s/step - FinalLoss: 0.3967
+...
+Epoch 100/250
+1/1 [==============================] - 5s 5s/step - FinalLoss: 0.3526
+...
+Epoch 150/250
+1/1 [==============================] - 5s 5s/step - FinalLoss: 0.3317
+...
+Epoch 200/250
+1/1 [==============================] - 5s 5s/step - FinalLoss: 0.3191
+...
+Epoch 248/250
+1/1 [==============================] - 6s 6s/step - FinalLoss: 0.3110
+Epoch 249/250
+1/1 [==============================] - 6s 6s/step - FinalLoss: 0.3108
+Epoch 250/250
+1/1 [==============================] - 6s 6s/step - FinalLoss: 0.3107
+
+```
+
+---
+## Perform inference and testing
+
+Having trained the model, we now evaluate it on the test set. We compute the accuracy
+score on the predicted labels to quantify the results, and plot the `FinalLoss` values
+recorded in `history` to visualize how the loss evolved over training.
+
+
+```python
+preds = model.predict(tf.convert_to_tensor(x_test))
+
+preds = preds.reshape((preds.shape[0], preds.shape[1]))
+
+# `accuracy_score` expects (y_true, y_pred).
+results = accuracy_score(y_test, preds)
+
+print(f"Test Accuracy score : {results*100}%")
+
+plt.plot(range(len(history.history["FinalLoss"])), history.history["FinalLoss"])
+plt.title("Loss over training")
+plt.show()
+```
+
+
+```
+Test Accuracy score : 97.64%
+
+```
+
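+As an additional, purely qualitative check, we can look at a handful of individual test
+images together with the label the network assigned to them. The short sketch below is
+illustrative only: it assumes `preds` holds one predicted digit per test sample (which is
+what the accuracy computation above relies on) and reuses the plotting conventions from
+the data-visualization step.
+
+```python
+# Illustrative sketch: compare a few random test samples against the labels
+# predicted by the Forward-Forward network. Assumes `preds` contains one
+# predicted digit per test sample.
+preds_flat = preds.flatten()
+sample_idxs = random.sample(range(x_test.shape[0]), 4)
+
+plt.figure(figsize=(10, 10))
+for plot_idx, sample_idx in enumerate(sample_idxs):
+    plt.subplot(2, 2, plot_idx + 1)
+    plt.imshow(x_test[sample_idx], cmap="gray")
+    plt.title(f"True : {y_test[sample_idx]} | Predicted : {int(preds_flat[sample_idx])}")
+plt.show()
+```
+
+The loss curve recorded during training (plotted by the code above) is shown below.
+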
+![png](/img/examples/vision/forwardforward/forwardforward_15_1.png)
+
+
+---
+## Conclusion
+
+This example has demonstrated how the Forward-Forward algorithm works using the
+TensorFlow and Keras packages. While the investigations presented by Hinton in his paper
+are currently limited to smaller models and datasets like MNIST and Fashion-MNIST,
+results on larger models such as LLMs are expected in future work.
+
+In the paper, Hinton reports a test error rate of 1.36% with a fully-connected network of
+4 hidden layers of 2000 units each, trained for 60 epochs (while noting that
+backpropagation needs only 20 epochs to reach similar performance). Another run with the
+learning rate doubled and training shortened to 40 epochs yields a slightly worse error
+rate of 1.46%.
+
+The current example does not yield state-of-the-art results. But with proper tuning of
+the learning rate and the model architecture (number of units in the `Dense` layers,
+kernel activations, initializations, regularization, etc.), the results can be improved
+to approach the claims of the paper.
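+
+As a concrete starting point for such tuning, the sketch below shows how one might move
+toward the configuration quoted above, reusing the `FFNetwork` class from this example
+(which, as used earlier, accepts a list of layer widths via `dims`). The 4 x 2000-unit
+layout is an illustrative choice taken from the numbers above, not a verified
+reproduction of the paper's setup, and training it is considerably slower.
+
+```python
+# Illustrative sketch only: a wider and deeper Forward-Forward network, closer
+# to the 4 hidden layers of 2000 units each described in the paper. Not tuned
+# or verified here.
+larger_model = FFNetwork(dims=[784, 2000, 2000, 2000, 2000])
+
+larger_model.compile(
+    optimizer=keras.optimizers.Adam(learning_rate=0.03),
+    loss="mse",
+    jit_compile=True,
+    metrics=[keras.metrics.Mean()],
+)
+
+# Uncomment to train; expect a much longer runtime than the 500-unit model above.
+# larger_history = larger_model.fit(train_dataset, epochs=epochs)
+```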