diff --git a/docs/neural-network/activation-functions/elu.md b/docs/neural-network/activation-functions/elu.md
index 15e771965..fc7b1f326 100644
--- a/docs/neural-network/activation-functions/elu.md
+++ b/docs/neural-network/activation-functions/elu.md
@@ -16,9 +16,6 @@ $$
|---|---|---|---|---|
| 1 | alpha | 1.0 | float | The value at which leakage will begin to saturate. Ex. alpha = 1.0 means that the output will never be less than -1.0 when inactivated. |
-## Size and Performance
-ELU is a simple function and is well-suited for deployment on resource-constrained devices or when working with large neural networks.
-
## Plots
diff --git a/docs/neural-network/activation-functions/gelu.md b/docs/neural-network/activation-functions/gelu.md
index d4de7cd16..ab88f6bf2 100644
--- a/docs/neural-network/activation-functions/gelu.md
+++ b/docs/neural-network/activation-functions/gelu.md
@@ -10,9 +10,6 @@ $$
## Parameters
This activation function does not have any parameters.
-## Size and Performance
-GELU is computationally more expensive than simpler activation functions like ReLU due to its use of hyperbolic tangent and exponential calculations. The implementation uses an approximation formula to improve performance, but it still requires more computational resources. Despite this cost, GELU has gained popularity in transformer architectures and other deep learning models due to its favorable properties for training deep networks.
-
## Plots
diff --git a/docs/neural-network/activation-functions/hard-sigmoid.md b/docs/neural-network/activation-functions/hard-sigmoid.md
index d1266f650..3d27c7d6e 100644
--- a/docs/neural-network/activation-functions/hard-sigmoid.md
+++ b/docs/neural-network/activation-functions/hard-sigmoid.md
@@ -10,9 +10,6 @@ $$
## Parameters
This activation function does not have any parameters.
-## Size and Performance
-Hard Sigmoid has a minimal memory footprint compared to the standard Sigmoid function, as it uses simple arithmetic operations (multiplication, addition) and comparisons instead of expensive exponential calculations. This makes it particularly well-suited for mobile and embedded applications or when computational resources are limited.
-
## Plots
diff --git a/docs/neural-network/activation-functions/hard-silu.md b/docs/neural-network/activation-functions/hard-silu.md
index b666c695a..7f6161669 100644
--- a/docs/neural-network/activation-functions/hard-silu.md
+++ b/docs/neural-network/activation-functions/hard-silu.md
@@ -12,9 +12,6 @@ $$
## Parameters
This activation function does not have any parameters.
-## Size and Performance
-Hard SiLU is designed to be computationally efficient compared to the standard SiLU (Swish) activation function. By using the piecewise linear Hard Sigmoid approximation instead of the standard Sigmoid function, it reduces the computational complexity while maintaining similar functional properties. This makes it particularly suitable for deployment on resource-constrained devices or when working with large neural networks.
-
## Plots
diff --git a/docs/neural-network/activation-functions/hyperbolic-tangent.md b/docs/neural-network/activation-functions/hyperbolic-tangent.md
index 29ab1bb79..6a06cd0de 100644
--- a/docs/neural-network/activation-functions/hyperbolic-tangent.md
+++ b/docs/neural-network/activation-functions/hyperbolic-tangent.md
@@ -10,9 +10,6 @@ $$
## Parameters
This activation function does not have any parameters.
-## Size and Performance
-Hyperbolic Tangent requires more computational resources compared to simpler activation functions like ReLU due to its exponential calculations. While not as computationally efficient as piecewise linear functions, it provides important zero-centered outputs that can be critical for certain network architectures, particularly in recurrent neural networks where gradient flow is important.
-
## Plots
diff --git a/docs/neural-network/activation-functions/leaky-relu.md b/docs/neural-network/activation-functions/leaky-relu.md
index 020655489..dcea6dee4 100644
--- a/docs/neural-network/activation-functions/leaky-relu.md
+++ b/docs/neural-network/activation-functions/leaky-relu.md
@@ -16,9 +16,6 @@ $$
|---|---|---|---|---|
| 1 | leakage | 0.1 | float | The amount of leakage as a proportion of the input value to allow to pass through when not inactivated. |
-## Size and Performance
-Leaky ReLU is computationally efficient, requiring only simple comparison operations and multiplication. It has a minimal memory footprint and executes quickly compared to more complex activation functions that use exponential or hyperbolic calculations. The leakage parameter allows for a small gradient when the unit is not active, which helps prevent the "dying ReLU" problem while maintaining the computational efficiency of the standard ReLU function.
-
## Plots
diff --git a/docs/neural-network/activation-functions/relu.md b/docs/neural-network/activation-functions/relu.md
index 5c68a67cd..6044dfb5e 100644
--- a/docs/neural-network/activation-functions/relu.md
+++ b/docs/neural-network/activation-functions/relu.md
@@ -14,9 +14,6 @@ $$
## Parameters
This activation function does not have any parameters.
-## Size and Performance
-ReLU is one of the most computationally efficient activation functions, requiring only a simple comparison operation and conditional assignment. It has minimal memory requirements and executes very quickly compared to activation functions that use exponential or hyperbolic calculations. This efficiency makes ReLU particularly well-suited for deep networks with many layers, where the computational savings compound significantly.
-
## Plots
diff --git a/docs/neural-network/activation-functions/relu6.md b/docs/neural-network/activation-functions/relu6.md
index e854e9dcc..f9c616e8f 100644
--- a/docs/neural-network/activation-functions/relu6.md
+++ b/docs/neural-network/activation-functions/relu6.md
@@ -10,9 +10,6 @@ $$
## Parameters
This activation function does not have any parameters.
-## Size and Performance
-ReLU6 maintains the computational efficiency of standard ReLU while adding an upper bound check. It requires only simple comparison operations and conditional assignments. The additional upper bound check adds minimal computational overhead compared to standard ReLU, while providing benefits for quantization and numerical stability. This makes ReLU6 particularly well-suited for mobile and embedded applications where model size and computational efficiency are critical.
-
## Plots
diff --git a/docs/neural-network/activation-functions/selu.md b/docs/neural-network/activation-functions/selu.md
index 7dc68283c..adbeef3ff 100644
--- a/docs/neural-network/activation-functions/selu.md
+++ b/docs/neural-network/activation-functions/selu.md
@@ -18,9 +18,6 @@ Where the constants are typically:
## Parameters
This activation function does not have any parameters.
-## Size and Performance
-SELU is computationally more expensive than simpler activation functions like ReLU due to its use of exponential calculations for negative inputs. However, it offers significant benefits by enabling self-normalization, which can eliminate the need for additional normalization layers like Batch Normalization. This trade-off often results in better overall network performance and potentially simpler network architectures. The self-normalizing property of SELU can lead to faster convergence during training and more stable gradients, which may reduce the total computational cost of training deep networks despite the higher per-activation computational cost.
-
## Plots
diff --git a/docs/neural-network/activation-functions/sigmoid.md b/docs/neural-network/activation-functions/sigmoid.md
index 9e1d2a9f4..7625f67bd 100644
--- a/docs/neural-network/activation-functions/sigmoid.md
+++ b/docs/neural-network/activation-functions/sigmoid.md
@@ -10,9 +10,6 @@ $$
## Parameters
This activation function does not have any parameters.
-## Size and Performance
-Sigmoid is computationally more expensive than simpler activation functions like ReLU due to its use of exponential calculations. It requires computing an exponential term and a division operation for each neuron activation. For deep networks, this computational cost can become significant. Additionally, sigmoid activations can cause the vanishing gradient problem during backpropagation when inputs are large in magnitude, potentially slowing down training. Despite these limitations, sigmoid remains valuable in output layers of networks performing binary classification or when probability interpretations are needed.
-
## Plots
diff --git a/docs/neural-network/activation-functions/silu.md b/docs/neural-network/activation-functions/silu.md
index e553561ff..02f898745 100644
--- a/docs/neural-network/activation-functions/silu.md
+++ b/docs/neural-network/activation-functions/silu.md
@@ -13,9 +13,6 @@ Where
## Parameters
This activation function does not have any parameters.
-## Size and Performance
-SiLU is computationally more expensive than simpler activation functions like ReLU due to its use of exponential calculations from the sigmoid component. Each activation requires computing an exponential term and a division operation. However, SiLU offers improved performance in deep learning models, particularly in computer vision and natural language processing tasks, which can justify the additional computational cost. The smooth, non-monotonic nature of SiLU helps with gradient flow during training, potentially leading to faster convergence and better overall model performance despite the higher per-activation computational cost.
-
## Plots
diff --git a/docs/neural-network/activation-functions/softmax.md b/docs/neural-network/activation-functions/softmax.md
index afe50d394..368ae7ba7 100644
--- a/docs/neural-network/activation-functions/softmax.md
+++ b/docs/neural-network/activation-functions/softmax.md
@@ -16,9 +16,6 @@ Where:
## Parameters
This activation function does not have any parameters.
-## Size and Performance
-Softmax is computationally more expensive than many other activation functions due to its need to process all neurons in a layer collectively rather than independently. It requires exponential calculations for each neuron, followed by a normalization step that involves summing all exponential values and dividing each by this sum. This creates a computational dependency between all neurons in the layer. Despite this cost, Softmax is essential for multi-class classification output layers where probability distributions are required. The implementation uses optimized matrix operations to improve performance, but the computational complexity still scales with the number of neurons in the layer.
-
## Plots
diff --git a/docs/neural-network/activation-functions/softplus.md b/docs/neural-network/activation-functions/softplus.md
index 4a69c6985..99fb9dd48 100644
--- a/docs/neural-network/activation-functions/softplus.md
+++ b/docs/neural-network/activation-functions/softplus.md
@@ -10,9 +10,6 @@ $$
## Parameters
This activation function does not have any parameters.
-## Size and Performance
-Softplus is computationally more expensive than ReLU due to its use of both exponential and logarithmic calculations. Each activation requires computing an exponential term, an addition, and a logarithm. This makes Softplus significantly more resource-intensive than simpler activation functions, especially in large networks. However, Softplus provides a smooth, differentiable alternative to ReLU with no zero-gradient regions, which can improve gradient flow during training for certain types of networks. The trade-off between computational cost and the benefits of smoothness should be considered when choosing between Softplus and ReLU.
-
## Plots
diff --git a/docs/neural-network/activation-functions/softsign.md b/docs/neural-network/activation-functions/softsign.md
index 8a05f9944..b9e40dc68 100644
--- a/docs/neural-network/activation-functions/softsign.md
+++ b/docs/neural-network/activation-functions/softsign.md
@@ -10,9 +10,6 @@ $$
## Parameters
This activation function does not have any parameters.
-## Size and Performance
-Softsign is computationally more efficient than functions that use exponential calculations (like Sigmoid or Tanh), as it only requires an absolute value operation and basic arithmetic. However, it is slightly more expensive than ReLU due to the division operation. Softsign has the advantage of not saturating as quickly as Tanh, which can help with the vanishing gradient problem in deep networks. The smoother gradients of Softsign can lead to more stable training dynamics, though this comes at a small computational cost compared to simpler activation functions.
-
## Plots
diff --git a/docs/neural-network/activation-functions/thresholded-relu.md b/docs/neural-network/activation-functions/thresholded-relu.md
index 0fecb02cc..5a4cf2553 100644
--- a/docs/neural-network/activation-functions/thresholded-relu.md
+++ b/docs/neural-network/activation-functions/thresholded-relu.md
@@ -16,9 +16,6 @@ $$
|---|---|---|---|---|
| 1 | threshold | 1.0 | float | The threshold at which the neuron is activated. |
-## Size and Performance
-Thresholded ReLU maintains the computational efficiency of standard ReLU while adding a threshold comparison. It requires only a simple comparison operation against the threshold value and conditional assignment. This makes it nearly as efficient as standard ReLU with minimal additional computational overhead. The threshold parameter allows for controlling neuron sparsity, which can be beneficial for reducing overfitting and improving generalization in certain network architectures. By adjusting the threshold, you can fine-tune the balance between network capacity and regularization without significantly impacting computational performance.
-
## Plots
diff --git a/src/NeuralNet/ActivationFunctions/ELU/ELU.php b/src/NeuralNet/ActivationFunctions/ELU/ELU.php
index 3d51feac3..354aa61e7 100644
--- a/src/NeuralNet/ActivationFunctions/ELU/ELU.php
+++ b/src/NeuralNet/ActivationFunctions/ELU/ELU.php
@@ -56,17 +56,14 @@ public function __construct(protected float $alpha = 1.0)
*/
public function activate(NDArray $input) : NDArray
{
- // Calculate positive part: x for x > 0
$positiveActivation = NumPower::maximum($input, 0);
- // Calculate negative part: alpha * (e^x - 1) for x <= 0
$negativeMask = NumPower::minimum($input, 0);
$negativeActivation = NumPower::multiply(
NumPower::expm1($negativeMask),
$this->alpha
);
- // Combine both parts
return NumPower::add($positiveActivation, $negativeActivation);
}
@@ -82,17 +79,14 @@ public function activate(NDArray $input) : NDArray
*/
public function differentiate(NDArray $input, NDArray $output) : NDArray
{
- // For x > 0: 1
$positiveMask = NumPower::greater($input, 0);
- // For x <= 0: output + α
$negativeMask = NumPower::lessEqual($input, 0);
$negativePart = NumPower::multiply(
NumPower::add($output, $this->alpha),
$negativeMask
);
- // Combine both parts
return NumPower::add($positiveMask, $negativePart);
}
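
For reference, the negative branch of `differentiate` above leans on a small identity rather than recomputing the exponential: for $x \le 0$, $\mathrm{ELU}(x) = \alpha\,(e^{x} - 1)$, so its derivative $\alpha e^{x}$ equals the buffered activation plus $\alpha$.

$$
\frac{d}{dx}\,\mathrm{ELU}(x) =
\begin{cases}
1 & x > 0 \\
\alpha e^{x} = \mathrm{ELU}(x) + \alpha & x \leq 0
\end{cases}
$$
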
diff --git a/src/NeuralNet/ActivationFunctions/GELU/GELU.php b/src/NeuralNet/ActivationFunctions/GELU/GELU.php
index e6870c263..710e7e919 100644
--- a/src/NeuralNet/ActivationFunctions/GELU/GELU.php
+++ b/src/NeuralNet/ActivationFunctions/GELU/GELU.php
@@ -55,24 +55,11 @@ class GELU implements ActivationFunction, IBufferDerivative
*/
public function activate(NDArray $input) : NDArray
{
- // Calculate x^3
$cubed = NumPower::pow($input, 3);
-
- // Calculate inner term: x + BETA * x^3
- $innerTerm = NumPower::add(
- $input,
- NumPower::multiply($cubed, self::BETA)
- );
-
- // Apply tanh(ALPHA * innerTerm)
- $tanhTerm = NumPower::tanh(
- NumPower::multiply($innerTerm, self::ALPHA)
- );
-
- // Calculate 1 + tanhTerm
+ $innerTerm = NumPower::add($input, NumPower::multiply($cubed, self::BETA));
+ $tanhTerm = NumPower::tanh(NumPower::multiply($innerTerm, self::ALPHA));
$onePlusTanh = NumPower::add(1.0, $tanhTerm);
-    // Calculate 0.5 * x * (1 + tanhTerm)
return NumPower::multiply(
NumPower::multiply($input, $onePlusTanh),
0.5
@@ -96,10 +83,8 @@ public function activate(NDArray $input) : NDArray
*/
public function differentiate(NDArray $input) : NDArray
{
- // Calculate x^3
$cubed = NumPower::pow($input, 3);
- // Calculate inner term: ALPHA * (x + BETA * x^3)
$innerTerm = NumPower::multiply(
NumPower::add(
$input,
@@ -108,20 +93,17 @@ public function differentiate(NDArray $input) : NDArray
self::ALPHA
);
- // Calculate cosh and sech^2
$cosh = NumPower::cosh($innerTerm);
$sech2 = NumPower::pow(
NumPower::divide(1.0, $cosh),
2
);
- // Calculate 0.5 * (1 + tanh(innerTerm))
$firstTerm = NumPower::multiply(
NumPower::add(1.0, NumPower::tanh($innerTerm)),
0.5
);
- // Calculate 0.5 * x * sech^2 * ALPHA * (1 + 3 * BETA * x^2)
$secondTerm = NumPower::multiply(
NumPower::multiply(
NumPower::multiply(
@@ -139,7 +121,6 @@ public function differentiate(NDArray $input) : NDArray
)
);
- // Combine terms
return NumPower::add($firstTerm, $secondTerm);
}
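
The two methods above implement the tanh approximation of GELU and its derivative. Assuming `ALPHA` $= \sqrt{2/\pi}$ and `BETA` $= 0.044715$ (the constants are defined outside this hunk), the terms being combined are:

$$
\mathrm{GELU}(x) \approx \tfrac{1}{2}\,x\,\bigl(1 + \tanh u\bigr), \qquad u = \mathrm{ALPHA}\,\bigl(x + \mathrm{BETA}\,x^{3}\bigr)
$$

$$
\mathrm{GELU}'(x) \approx \underbrace{\tfrac{1}{2}\bigl(1 + \tanh u\bigr)}_{\text{firstTerm}} + \underbrace{\tfrac{1}{2}\,x\,\operatorname{sech}^{2}(u)\,\mathrm{ALPHA}\,\bigl(1 + 3\,\mathrm{BETA}\,x^{2}\bigr)}_{\text{secondTerm}}
$$
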
diff --git a/src/NeuralNet/ActivationFunctions/HardSiLU/HardSiLU.php b/src/NeuralNet/ActivationFunctions/HardSiLU/HardSiLU.php
index 6ed54631a..6ab81eac1 100644
--- a/src/NeuralNet/ActivationFunctions/HardSiLU/HardSiLU.php
+++ b/src/NeuralNet/ActivationFunctions/HardSiLU/HardSiLU.php
@@ -51,10 +51,8 @@ public function __construct()
*/
public function activate(NDArray $input) : NDArray
{
- // Calculate HardSigmoid(x)
$hardSigmoid = $this->hardSigmoid->activate($input);
- // Calculate x * HardSigmoid(x)
return NumPower::multiply($input, $hardSigmoid);
}
@@ -68,16 +66,10 @@ public function activate(NDArray $input) : NDArray
*/
public function differentiate(NDArray $input) : NDArray
{
- // Calculate HardSigmoid(x)
$hardSigmoid = $this->hardSigmoid->activate($input);
-
- // Calculate HardSigmoid'(x)
$hardSigmoidDerivative = $this->hardSigmoid->differentiate($input);
-
- // Calculate x * HardSigmoid'(x)
$xTimesDerivative = NumPower::multiply($input, $hardSigmoidDerivative);
- // Calculate HardSigmoid(x) + x * HardSigmoid'(x)
return NumPower::add($hardSigmoid, $xTimesDerivative);
}
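
Both methods above follow from the product rule on $\mathrm{HardSiLU}(x) = x \cdot \mathrm{HardSigmoid}(x)$, which is exactly the sum returned by `differentiate`:

$$
\mathrm{HardSiLU}'(x) = \mathrm{HardSigmoid}(x) + x \cdot \mathrm{HardSigmoid}'(x)
$$
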
diff --git a/src/NeuralNet/ActivationFunctions/HardSigmoid/HardSigmoid.php b/src/NeuralNet/ActivationFunctions/HardSigmoid/HardSigmoid.php
index 5bfd6b62c..659e2c07d 100644
--- a/src/NeuralNet/ActivationFunctions/HardSigmoid/HardSigmoid.php
+++ b/src/NeuralNet/ActivationFunctions/HardSigmoid/HardSigmoid.php
@@ -63,13 +63,11 @@ class HardSigmoid implements ActivationFunction, IBufferDerivative
*/
public function activate(NDArray $input) : NDArray
{
- // Calculate 0.2 * x + 0.5
$linear = NumPower::add(
NumPower::multiply($input, self::SLOPE),
self::INTERCEPT
);
- // Clip values between 0 and 1
return NumPower::clip($linear, 0.0, 1.0);
}
@@ -89,11 +87,6 @@ public function differentiate(NDArray $input) : NDArray
$inLinearRegion = NumPower::multiply($inLinearRegion, NumPower::lessEqual($input, self::UPPER_BOUND));
$linearPart = NumPower::multiply($inLinearRegion, self::SLOPE);
- // For values outside the linear region: 0
- // Since we're multiplying by 0 for these regions, we don't need to explicitly handle them
- // The mask $inLinearRegion already contains 0s for x <= -2.5 and x >= 2.5,
- // so when we multiply by SLOPE, those values remain 0 in the result
-
return $linearPart;
}
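
Below is a scalar sketch of the piecewise-linear function above, assuming `SLOPE` and `INTERCEPT` hold the conventional 0.2 and 0.5. Clipping $0.2x + 0.5$ to $[0, 1]$ puts the linear region at $-2.5 \le x \le 2.5$, which is what the derivative mask reproduces (the lower-bound comparison sits outside this hunk).

```php
<?php

// Scalar sketch only, not the library implementation.
function hardSigmoid(float $x): float
{
    return max(0.0, min(1.0, 0.2 * $x + 0.5));     // clip(0.2x + 0.5, 0, 1)
}

function hardSigmoidDerivative(float $x): float
{
    return ($x >= -2.5 && $x <= 2.5) ? 0.2 : 0.0;  // slope inside the linear region, 0 outside
}
```
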
diff --git a/src/NeuralNet/ActivationFunctions/HyperbolicTangent/HyperbolicTangent.php b/src/NeuralNet/ActivationFunctions/HyperbolicTangent/HyperbolicTangent.php
index b9ef77071..1723db90d 100644
--- a/src/NeuralNet/ActivationFunctions/HyperbolicTangent/HyperbolicTangent.php
+++ b/src/NeuralNet/ActivationFunctions/HyperbolicTangent/HyperbolicTangent.php
@@ -45,10 +45,8 @@ public function activate(NDArray $input) : NDArray
*/
public function differentiate(NDArray $x) : NDArray
{
- // Calculate tanh^2(x)
$squared = NumPower::pow($x, 2);
- // Calculate 1 - tanh^2(x)
return NumPower::subtract(1.0, $squared);
}
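
Note that `differentiate` squares `$x` directly, which yields the tanh derivative when `$x` holds the already-activated (tanh) values rather than the raw input:

$$
\frac{d}{dz}\tanh(z) = 1 - \tanh^{2}(z)
$$
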
diff --git a/src/NeuralNet/ActivationFunctions/LeakyReLU/LeakyReLU.php b/src/NeuralNet/ActivationFunctions/LeakyReLU/LeakyReLU.php
index bace16275..07f45e48c 100644
--- a/src/NeuralNet/ActivationFunctions/LeakyReLU/LeakyReLU.php
+++ b/src/NeuralNet/ActivationFunctions/LeakyReLU/LeakyReLU.php
@@ -63,16 +63,13 @@ public function __construct(float $leakage = 0.1)
*/
public function activate(NDArray $input) : NDArray
{
- // Calculate positive part: x for x > 0
$positiveActivation = NumPower::maximum($input, 0);
- // Calculate negative part: leakage * x for x <= 0
$negativeActivation = NumPower::multiply(
NumPower::minimum($input, 0),
$this->leakage
);
- // Combine both parts
return NumPower::add($positiveActivation, $negativeActivation);
}
@@ -87,16 +84,13 @@ public function activate(NDArray $input) : NDArray
*/
public function differentiate(NDArray $input) : NDArray
{
- // For x > 0: 1
$positivePart = NumPower::greater($input, 0);
- // For x <= 0: leakage
$negativePart = NumPower::multiply(
NumPower::lessEqual($input, 0),
$this->leakage
);
- // Combine both parts
return NumPower::add($positivePart, $negativePart);
}
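
Element-wise, the masked sums above reduce to the usual piecewise definition, since one of $\max(x, 0)$ and $\min(x, 0)$ is always zero:

$$
\mathrm{LeakyReLU}(x) =
\begin{cases}
x & x > 0 \\
\text{leakage} \cdot x & x \leq 0
\end{cases}
\qquad
\mathrm{LeakyReLU}'(x) =
\begin{cases}
1 & x > 0 \\
\text{leakage} & x \leq 0
\end{cases}
$$
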
diff --git a/src/NeuralNet/ActivationFunctions/ReLU6/ReLU6.php b/src/NeuralNet/ActivationFunctions/ReLU6/ReLU6.php
index 26ab6a850..89daf2faa 100644
--- a/src/NeuralNet/ActivationFunctions/ReLU6/ReLU6.php
+++ b/src/NeuralNet/ActivationFunctions/ReLU6/ReLU6.php
@@ -37,10 +37,8 @@ class ReLU6 implements ActivationFunction, IBufferDerivative
*/
public function activate(NDArray $input) : NDArray
{
- // First apply ReLU: max(0, x)
$reluActivation = NumPower::maximum($input, 0.0);
- // Then cap at 6: min(relu(x), 6)
return NumPower::minimum($reluActivation, 6.0);
}
@@ -54,11 +52,9 @@ public function activate(NDArray $input) : NDArray
*/
public function differentiate(NDArray $input) : NDArray
{
- // 1 where 0 < x < 6, 0 elsewhere
$greaterThanZero = NumPower::greater($input, 0.0);
$lessThanSix = NumPower::less($input, 6.0);
- // Combine conditions with logical AND
return NumPower::multiply($greaterThanZero, $lessThanSix);
}
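
A scalar sketch of the clamp performed above (the NDArray version composes `maximum` and `minimum` instead):

```php
<?php

// Scalar sketch only; mirrors min(max(x, 0), 6) and its gradient.
function relu6(float $x): float
{
    return min(max($x, 0.0), 6.0);
}

function relu6Derivative(float $x): float
{
    return ($x > 0.0 && $x < 6.0) ? 1.0 : 0.0;  // 1 strictly inside (0, 6), 0 elsewhere
}
```
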
diff --git a/src/NeuralNet/ActivationFunctions/SELU/SELU.php b/src/NeuralNet/ActivationFunctions/SELU/SELU.php
index 63ccc332d..d96216d59 100644
--- a/src/NeuralNet/ActivationFunctions/SELU/SELU.php
+++ b/src/NeuralNet/ActivationFunctions/SELU/SELU.php
@@ -59,20 +59,17 @@ class SELU implements ActivationFunction, IBufferDerivative
*/
public function activate(NDArray $input) : NDArray
{
- // Calculate positive part: λ * x for x > 0
$positive = NumPower::multiply(
NumPower::maximum($input, 0),
self::LAMBDA
);
- // Calculate negative part: λ * α * (e^x - 1) for x <= 0
$negativeMask = NumPower::minimum($input, 0);
$negative = NumPower::multiply(
NumPower::expm1($negativeMask),
self::BETA
);
- // Combine both parts
return NumPower::add($positive, $negative);
}
@@ -87,11 +84,9 @@ public function activate(NDArray $input) : NDArray
*/
public function differentiate(NDArray $input) : NDArray
{
- // For x > 0: λ
$positiveMask = NumPower::greater($input, 0);
$positivePart = NumPower::multiply($positiveMask, self::LAMBDA);
- // For x <= 0: λ * α * e^x
$negativeMask = NumPower::lessEqual($input, 0);
$negativePart = NumPower::multiply(
NumPower::multiply(
@@ -103,7 +98,6 @@ public function differentiate(NDArray $input) : NDArray
$negativeMask
);
- // Combine both parts
return NumPower::add($positivePart, $negativePart);
}
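
With the usual constants $\lambda \approx 1.0507$ and $\alpha \approx 1.6733$ (defined outside this hunk; `BETA` appears to hold the product $\lambda\alpha$), the branches computed above are:

$$
\mathrm{SELU}(x) =
\begin{cases}
\lambda x & x > 0 \\
\lambda\alpha\,(e^{x} - 1) & x \leq 0
\end{cases}
\qquad
\mathrm{SELU}'(x) =
\begin{cases}
\lambda & x > 0 \\
\lambda\alpha\,e^{x} & x \leq 0
\end{cases}
$$
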
diff --git a/src/NeuralNet/ActivationFunctions/SiLU/SiLU.php b/src/NeuralNet/ActivationFunctions/SiLU/SiLU.php
index 9efc0cec8..89985e831 100644
--- a/src/NeuralNet/ActivationFunctions/SiLU/SiLU.php
+++ b/src/NeuralNet/ActivationFunctions/SiLU/SiLU.php
@@ -52,10 +52,8 @@ public function __construct()
*/
public function activate(NDArray $input) : NDArray
{
- // Calculate sigmoid(x) using the Sigmoid activation function
$sigmoid = $this->sigmoid->activate($input);
- // Calculate x * sigmoid(x)
return NumPower::multiply($input, $sigmoid);
}
@@ -70,16 +68,10 @@ public function activate(NDArray $input) : NDArray
*/
public function differentiate(NDArray $input) : NDArray
{
- // Calculate sigmoid(x) using the Sigmoid activation function
$sigmoid = $this->sigmoid->activate($input);
-
- // Calculate sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
$sigmoidDerivative = $this->sigmoid->differentiate($sigmoid);
-
- // Calculate x * sigmoid'(x)
$xTimesSigmoidDerivative = NumPower::multiply($input, $sigmoidDerivative);
- // Calculate sigmoid(x) + x * sigmoid'(x)
return NumPower::add($sigmoid, $xTimesSigmoidDerivative);
}
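
Both methods above follow from the product rule on $\mathrm{SiLU}(x) = x\,\sigma(x)$. The computed `$sigmoid` is passed to `Sigmoid::differentiate()` because that method expects the activation output (it computes $\sigma\,(1 - \sigma)$, as in the Sigmoid hunk below):

$$
\mathrm{SiLU}'(x) = \sigma(x) + x\,\sigma(x)\bigl(1 - \sigma(x)\bigr)
$$
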
diff --git a/src/NeuralNet/ActivationFunctions/Sigmoid/Sigmoid.php b/src/NeuralNet/ActivationFunctions/Sigmoid/Sigmoid.php
index 7dce11669..282170bb3 100644
--- a/src/NeuralNet/ActivationFunctions/Sigmoid/Sigmoid.php
+++ b/src/NeuralNet/ActivationFunctions/Sigmoid/Sigmoid.php
@@ -33,13 +33,9 @@ class Sigmoid implements ActivationFunction, OBufferDerivative
*/
public function activate(NDArray $input) : NDArray
{
- // Calculate e^(-x)
$negExp = NumPower::exp(NumPower::multiply($input, -1.0));
-
- // Calculate 1 + e^(-x)
$denominator = NumPower::add(1.0, $negExp);
- // Calculate 1 / (1 + e^(-x))
return NumPower::divide(1.0, $denominator);
}
@@ -55,10 +51,8 @@ public function activate(NDArray $input) : NDArray
*/
public function differentiate(NDArray $output) : NDArray
{
- // Calculate (1 - output)
$oneMinusOutput = NumPower::subtract(1.0, $output);
- // Calculate output * (1 - output)
return NumPower::multiply($output, $oneMinusOutput);
}
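
The derivative here is expressed in terms of the buffered output rather than the input, using the standard identity:

$$
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr)
$$
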
diff --git a/src/NeuralNet/ActivationFunctions/Softmax/Softmax.php b/src/NeuralNet/ActivationFunctions/Softmax/Softmax.php
index ed557e4f9..0b7064819 100644
--- a/src/NeuralNet/ActivationFunctions/Softmax/Softmax.php
+++ b/src/NeuralNet/ActivationFunctions/Softmax/Softmax.php
@@ -77,13 +77,9 @@ public function activate(NDArray $input) : NDArray
public function differentiate(NDArray $output) : NDArray
{
// Get the softmax output as a 1D PHP array
- $s = NumPower::flatten($output)->toArray();
-
- // Create a diagonal matrix from the softmax values
- $diag = NumPower::diag(NumPower::array($s));
-
- // Create outer product of softmax vector with itself
- $outer = NumPower::outer(NumPower::array($s), NumPower::array($s));
+ $softmax = NumPower::flatten($output)->toArray();
+ $diag = NumPower::diag(NumPower::array($softmax));
+ $outer = NumPower::outer(NumPower::array($softmax), NumPower::array($softmax));
// Jacobian: diag(s) - outer(s, s)
return NumPower::subtract($diag, $outer);
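
The Jacobian returned above has the familiar closed form, where $s$ is the flattened softmax vector and $\delta_{ij}$ is the Kronecker delta:

$$
\frac{\partial s_i}{\partial z_j} = s_i\,\bigl(\delta_{ij} - s_j\bigr)
\qquad\Longleftrightarrow\qquad
J = \operatorname{diag}(s) - s\,s^{\mathsf{T}}
$$
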
diff --git a/src/NeuralNet/ActivationFunctions/Softplus/Softplus.php b/src/NeuralNet/ActivationFunctions/Softplus/Softplus.php
index 4abf47641..b9a1ff625 100644
--- a/src/NeuralNet/ActivationFunctions/Softplus/Softplus.php
+++ b/src/NeuralNet/ActivationFunctions/Softplus/Softplus.php
@@ -35,13 +35,9 @@ class Softplus implements ActivationFunction, IBufferDerivative
*/
public function activate(NDArray $input) : NDArray
{
- // Calculate e^x
$exp = NumPower::exp($input);
-
- // Calculate 1 + e^x
$onePlusExp = NumPower::add(1.0, $exp);
- // Calculate log(1 + e^x)
return NumPower::log($onePlusExp);
}
@@ -55,13 +51,9 @@ public function activate(NDArray $input) : NDArray
*/
public function differentiate(NDArray $input) : NDArray
{
- // Calculate e^(-x)
$negExp = NumPower::exp(NumPower::multiply($input, -1.0));
-
- // Calculate 1 + e^(-x)
$denominator = NumPower::add(1.0, $negExp);
- // Calculate 1 / (1 + e^(-x))
return NumPower::divide(1.0, $denominator);
}
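
The derivative of Softplus is the logistic sigmoid, which is why `differentiate` above mirrors the Sigmoid activation:

$$
\frac{d}{dx}\,\log\bigl(1 + e^{x}\bigr) = \frac{e^{x}}{1 + e^{x}} = \frac{1}{1 + e^{-x}}
$$
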
diff --git a/src/NeuralNet/ActivationFunctions/Softsign/Softsign.php b/src/NeuralNet/ActivationFunctions/Softsign/Softsign.php
index 77a28c5c2..6921d7d57 100644
--- a/src/NeuralNet/ActivationFunctions/Softsign/Softsign.php
+++ b/src/NeuralNet/ActivationFunctions/Softsign/Softsign.php
@@ -36,13 +36,9 @@ class Softsign implements ActivationFunction, IBufferDerivative
*/
public function activate(NDArray $input) : NDArray
{
- // Calculate |x|
$absInput = NumPower::abs($input);
-
- // Calculate 1 + |x|
$denominator = NumPower::add(1.0, $absInput);
- // Calculate x / (1 + |x|)
return NumPower::divide($input, $denominator);
}
@@ -56,16 +52,10 @@ public function activate(NDArray $input) : NDArray
*/
public function differentiate(NDArray $input) : NDArray
{
- // Calculate |x|
$absInput = NumPower::abs($input);
-
- // Calculate 1 + |x|
$onePlusAbs = NumPower::add(1.0, $absInput);
-
- // Calculate (1 + |x|)²
$denominator = NumPower::multiply($onePlusAbs, $onePlusAbs);
- // Calculate 1 / (1 + |x|)²
return NumPower::divide(1.0, $denominator);
}
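
Both methods above follow from the quotient rule on $x / (1 + |x|)$:

$$
\mathrm{Softsign}(x) = \frac{x}{1 + |x|}, \qquad
\mathrm{Softsign}'(x) = \frac{1}{\bigl(1 + |x|\bigr)^{2}}
$$
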
diff --git a/src/NeuralNet/ActivationFunctions/ThresholdedReLU/ThresholdedReLU.php b/src/NeuralNet/ActivationFunctions/ThresholdedReLU/ThresholdedReLU.php
index 483744025..8924d9c57 100644
--- a/src/NeuralNet/ActivationFunctions/ThresholdedReLU/ThresholdedReLU.php
+++ b/src/NeuralNet/ActivationFunctions/ThresholdedReLU/ThresholdedReLU.php
@@ -61,10 +61,8 @@ public function __construct(float $threshold = 1.0)
*/
public function activate(NDArray $input) : NDArray
{
- // Create a mask where input > threshold
$mask = NumPower::greater($input, $this->threshold);
- // Apply the mask to the input
return NumPower::multiply($input, $mask);
}
@@ -78,7 +76,6 @@ public function activate(NDArray $input) : NDArray
*/
public function differentiate(NDArray $input) : NDArray
{
- // The derivative is 1 where input > threshold, 0 otherwise
return NumPower::greater($input, $this->threshold);
}
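
Element-wise, the mask-and-multiply above amounts to the following, where $\theta$ is the `threshold` parameter:

$$
f(x) =
\begin{cases}
x & x > \theta \\
0 & x \leq \theta
\end{cases}
\qquad
f'(x) =
\begin{cases}
1 & x > \theta \\
0 & x \leq \theta
\end{cases}
$$
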