
Removed unneeded comment #382


Merged
3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/elu.md
@@ -16,9 +16,6 @@ $$
|---|---|---|---|---|
| 1 | alpha | 1.0 | float | The value at which leakage will begin to saturate. Ex. alpha = 1.0 means that the output will never be less than -1.0 when inactivated. |

## Size and Performance
ELU is inexpensive to compute (only the negative branch requires an exponential), making it well-suited for deployment on resource-constrained devices and for use in large neural networks.

## Plots
<img src="../../images/activation-functions/elu.png" alt="ELU Function" width="500" height="auto">
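
The alpha parameter above sets where the negative branch saturates. As an illustration (the function name is hypothetical and this is a plain scalar sketch, not the NDArray-based implementation in this repository), the formula behaves as follows:

```php
// ELU: x for x > 0, alpha * (e^x - 1) for x <= 0.
// With alpha = 1.0 the negative branch saturates at -1.0 as x decreases.
function elu(float $x, float $alpha = 1.0): float
{
    return $x > 0.0 ? $x : $alpha * (exp($x) - 1.0);
}

elu(-10.0); // ≈ -0.99995, approaching but never passing -1.0
```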

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/gelu.md
@@ -10,9 +10,6 @@ $$
## Parameters
This activation function does not have any parameters.

## Size and Performance
GELU is computationally more expensive than simpler activation functions like ReLU due to its use of hyperbolic tangent and exponential calculations. The implementation uses an approximation formula to improve performance, but it still requires more computational resources. Despite this cost, GELU has gained popularity in transformer architectures and other deep learning models due to its favorable properties for training deep networks.

## Plots
<img src="../../images/activation-functions/gelu.png" alt="GELU Function" width="500" height="auto">
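
The approximation mentioned above is the widely used tanh form of GELU; a scalar sketch follows. The constants 0.044715 and sqrt(2/pi) are the standard published values, assumed here rather than read from this diff:

```php
// GELU, tanh approximation: 0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x^3)))
function gelu(float $x): float
{
    $inner = sqrt(2.0 / M_PI) * ($x + 0.044715 * $x ** 3);

    return 0.5 * $x * (1.0 + tanh($inner));
}
```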

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/hard-sigmoid.md
@@ -10,9 +10,6 @@ $$
## Parameters
This activation function does not have any parameters.

## Size and Performance
Hard Sigmoid has a minimal memory footprint compared to the standard Sigmoid function, as it uses simple arithmetic operations (multiplication, addition) and comparisons instead of expensive exponential calculations. This makes it particularly well-suited for mobile and embedded applications or when computational resources are limited.

## Plots
<img src="../../images/activation-functions/hard-sigmoid.png" alt="Hard Sigmoid Function" width="500" height="auto">
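
As a rough scalar illustration of the note above, Hard Sigmoid is just a clipped line; the 0.2 slope and 0.5 intercept match the constants referenced later in this diff:

```php
// Hard Sigmoid: clip(0.2 * x + 0.5, 0, 1), computed with basic arithmetic only.
function hardSigmoid(float $x): float
{
    return max(0.0, min(1.0, 0.2 * $x + 0.5));
}
```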

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/hard-silu.md
@@ -12,9 +12,6 @@ $$
## Parameters
This activation function does not have any parameters.

## Size and Performance
Hard SiLU is designed to be computationally efficient compared to the standard SiLU (Swish) activation function. By using the piecewise linear Hard Sigmoid approximation instead of the standard Sigmoid function, it reduces the computational complexity while maintaining similar functional properties. This makes it particularly suitable for deployment on resource-constrained devices or when working with large neural networks.

## Plots
<img src="../../images/activation-functions/hard-silu.png" alt="Hard SiLU Function" width="500" height="auto">
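
An illustrative scalar sketch of the substitution described above: the input scaled by the Hard Sigmoid approximation instead of the true sigmoid.

```php
// Hard SiLU: x * HardSigmoid(x), avoiding the exponential in standard SiLU.
function hardSilu(float $x): float
{
    $hardSigmoid = max(0.0, min(1.0, 0.2 * $x + 0.5));

    return $x * $hardSigmoid;
}
```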

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/hyperbolic-tangent.md
@@ -10,9 +10,6 @@ $$
## Parameters
This activation function does not have any parameters.

## Size and Performance
Hyperbolic Tangent requires more computational resources compared to simpler activation functions like ReLU due to its exponential calculations. While not as computationally efficient as piecewise linear functions, it provides important zero-centered outputs that can be critical for certain network architectures, particularly in recurrent neural networks where gradient flow is important.

## Plots
<img src="../../images/activation-functions/hyperbolic-tangent.png" alt="Hyperbolic Tangent Function" width="500" height="auto">
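
A scalar sketch of the function and of the derivative used during backpropagation, which is where the zero-centered, exponential-based behaviour noted above comes from:

```php
// tanh(x) = (e^x - e^-x) / (e^x + e^-x), bounded to (-1, 1) and centered at zero.
function hyperbolicTangent(float $x): float
{
    return tanh($x);
}

// Derivative in terms of the activated output: 1 - tanh^2(x).
function hyperbolicTangentDerivative(float $activated): float
{
    return 1.0 - $activated ** 2;
}
```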

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/leaky-relu.md
@@ -16,9 +16,6 @@ $$
|---|---|---|---|---|
| 1 | leakage | 0.1 | float | The amount of leakage, as a proportion of the input value, to allow to pass through when the neuron is inactivated. |

## Size and Performance
Leaky ReLU is computationally efficient, requiring only simple comparison operations and multiplication. It has a minimal memory footprint and executes quickly compared to more complex activation functions that use exponential or hyperbolic calculations. The leakage parameter allows for a small gradient when the unit is not active, which helps prevent the "dying ReLU" problem while maintaining the computational efficiency of the standard ReLU function.

## Plots
<img src="../../images/activation-functions/leaky-relu.png" alt="Leaky ReLU Function" width="500" height="auto">
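
The comparison-and-multiply formula the removed note refers to, as an illustrative scalar sketch (default leakage of 0.1 taken from the parameter table above):

```php
// Leaky ReLU: x for x > 0, leakage * x otherwise, so the gradient never fully vanishes.
function leakyRelu(float $x, float $leakage = 0.1): float
{
    return $x > 0.0 ? $x : $leakage * $x;
}
```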

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/relu.md
@@ -14,9 +14,6 @@ $$
## Parameters
This activation function does not have any parameters.

## Size and Performance
ReLU is one of the most computationally efficient activation functions, requiring only a simple comparison operation and conditional assignment. It has minimal memory requirements and executes very quickly compared to activation functions that use exponential or hyperbolic calculations. This efficiency makes ReLU particularly well-suited for deep networks with many layers, where the computational savings compound significantly.

## Plots
<img src="../../images/activation-functions/relu.png" alt="ReLU Function" width="500" height="auto">
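
The single comparison per activation described above, as a scalar sketch:

```php
// ReLU: max(0, x).
function relu(float $x): float
{
    return $x > 0.0 ? $x : 0.0;
}
```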

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/relu6.md
@@ -10,9 +10,6 @@ $$
## Parameters
This activation function does not have any parameters.

## Size and Performance
ReLU6 maintains the computational efficiency of standard ReLU while adding an upper bound check. It requires only simple comparison operations and conditional assignments. The additional upper bound check adds minimal computational overhead compared to standard ReLU, while providing benefits for quantization and numerical stability. This makes ReLU6 particularly well-suited for mobile and embedded applications where model size and computational efficiency are critical.

## Plots
<img src="../../images/activation-functions/relu6.png" alt="ReLU6 Function" width="500" height="auto">
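
A scalar sketch of ReLU with the upper bound mentioned above:

```php
// ReLU6: min(max(0, x), 6), one extra comparison on top of standard ReLU.
function relu6(float $x): float
{
    return min(max(0.0, $x), 6.0);
}
```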

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/selu.md
@@ -18,9 +18,6 @@ Where the constants are typically:
## Parameters
This activation function does not have any parameters.

## Size and Performance
SELU is computationally more expensive than simpler activation functions like ReLU due to its use of exponential calculations for negative inputs. However, it offers significant benefits by enabling self-normalization, which can eliminate the need for additional normalization layers like Batch Normalization. This trade-off often results in better overall network performance and potentially simpler network architectures. The self-normalizing property of SELU can lead to faster convergence during training and more stable gradients, which may reduce the total computational cost of training deep networks despite the higher per-activation computational cost.

## Plots
<img src="../../images/activation-functions/selu.png" alt="SELU Function" width="500" height="auto">
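
A scalar sketch of the scaled ELU formula behind the note above. The lambda and alpha defaults below are the commonly cited constants (approximately 1.0507 and 1.6733), assumed here; the exact values the documentation uses are listed just above this hunk.

```php
// SELU: lambda * x for x > 0, lambda * alpha * (e^x - 1) for x <= 0.
function selu(float $x, float $lambda = 1.0507, float $alpha = 1.6733): float
{
    return $x > 0.0 ? $lambda * $x : $lambda * $alpha * (exp($x) - 1.0);
}
```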

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/sigmoid.md
@@ -10,9 +10,6 @@ $$
## Parameters
This activation function does not have any parameters.

## Size and Performance
Sigmoid is computationally more expensive than simpler activation functions like ReLU due to its use of exponential calculations. It requires computing an exponential term and a division operation for each neuron activation. For deep networks, this computational cost can become significant. Additionally, sigmoid activations can cause the vanishing gradient problem during backpropagation when inputs are large in magnitude, potentially slowing down training. Despite these limitations, sigmoid remains valuable in output layers of networks performing binary classification or when probability interpretations are needed.

## Plots
<img src="../../images/activation-functions/sigmoid.png" alt="Sigmoid Function" width="500" height="auto">
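
The exponential and division per activation mentioned above, as a scalar sketch:

```php
// Sigmoid: 1 / (1 + e^-x), mapping any input to (0, 1).
function sigmoid(float $x): float
{
    return 1.0 / (1.0 + exp(-$x));
}
```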

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/silu.md
@@ -13,9 +13,6 @@ Where
## Parameters
This activation function does not have any parameters.

## Size and Performance
SiLU is computationally more expensive than simpler activation functions like ReLU due to its use of exponential calculations from the sigmoid component. Each activation requires computing an exponential term and a division operation. However, SiLU offers improved performance in deep learning models, particularly in computer vision and natural language processing tasks, which can justify the additional computational cost. The smooth, non-monotonic nature of SiLU helps with gradient flow during training, potentially leading to faster convergence and better overall model performance despite the higher per-activation computational cost.

## Plots
<img src="../../images/activation-functions/silu.png" alt="SiLU Function" width="500" height="auto">
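
A scalar sketch showing where the sigmoid component, and its exponential, enters SiLU:

```php
// SiLU (Swish): x * sigmoid(x) = x / (1 + e^-x).
function silu(float $x): float
{
    return $x / (1.0 + exp(-$x));
}
```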

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/softmax.md
@@ -16,9 +16,6 @@ Where:
## Parameters
This activation function does not have any parameters.

## Size and Performance
Softmax is computationally more expensive than many other activation functions due to its need to process all neurons in a layer collectively rather than independently. It requires exponential calculations for each neuron, followed by a normalization step that involves summing all exponential values and dividing each by this sum. This creates a computational dependency between all neurons in the layer. Despite this cost, Softmax is essential for multi-class classification output layers where probability distributions are required. The implementation uses optimized matrix operations to improve performance, but the computational complexity still scales with the number of neurons in the layer.

## Plots
<img src="../../images/activation-functions/softmax.png" alt="Softmax Function" width="500" height="auto">
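
The exponentiate, sum, and normalize sequence described above, sketched over a plain PHP array. Subtracting the row maximum is a common numerical-stability step assumed here, not necessarily what this library does internally.

```php
/**
 * Softmax over one row of activations: exponentiate each value, then divide by the sum.
 *
 * @param float[] $logits
 * @return float[] probabilities that sum to 1
 */
function softmax(array $logits): array
{
    $max = max($logits); // shift by the max so exp() cannot overflow

    $exps = array_map(fn (float $z): float => exp($z - $max), $logits);

    $sum = array_sum($exps);

    return array_map(fn (float $e): float => $e / $sum, $exps);
}
```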

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/softplus.md
@@ -10,9 +10,6 @@ $$
## Parameters
This activation function does not have any parameters.

## Size and Performance
Softplus is computationally more expensive than ReLU due to its use of both exponential and logarithmic calculations. Each activation requires computing an exponential term, an addition, and a logarithm. This makes Softplus significantly more resource-intensive than simpler activation functions, especially in large networks. However, Softplus provides a smooth, differentiable alternative to ReLU with no zero-gradient regions, which can improve gradient flow during training for certain types of networks. The trade-off between computational cost and the benefits of smoothness should be considered when choosing between Softplus and ReLU.

## Plots
<img src="../../images/activation-functions/softplus.png" alt="Softplus Function" width="500" height="auto">
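
The exponential, addition, and logarithm per activation noted above, as a scalar sketch:

```php
// Softplus: log(1 + e^x), a smooth alternative to ReLU with no zero-gradient region.
function softplus(float $x): float
{
    return log1p(exp($x)); // log1p(y) computes log(1 + y)
}
```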

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/softsign.md
@@ -10,9 +10,6 @@ $$
## Parameters
This activation function does not have any parameters.

## Size and Performance
Softsign is computationally more efficient than functions that use exponential calculations (like Sigmoid or Tanh), as it only requires an absolute value operation and basic arithmetic. However, it is slightly more expensive than ReLU due to the division operation. Softsign has the advantage of not saturating as quickly as Tanh, which can help with the vanishing gradient problem in deep networks. The smoother gradients of Softsign can lead to more stable training dynamics, though this comes at a small computational cost compared to simpler activation functions.

## Plots
<img src="../../images/activation-functions/softsign.png" alt="Softsign Function" width="500" height="auto">
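
A scalar sketch of the absolute value and division described above:

```php
// Softsign: x / (1 + |x|), bounded to (-1, 1) but saturating more slowly than tanh.
function softsign(float $x): float
{
    return $x / (1.0 + abs($x));
}
```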

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/thresholded-relu.md
@@ -16,9 +16,6 @@ $$
|---|---|---|---|---|
| 1 | threshold | 1.0 | float | The threshold at which the neuron is activated. |

## Size and Performance
Thresholded ReLU maintains the computational efficiency of standard ReLU while adding a threshold comparison. It requires only a simple comparison operation against the threshold value and conditional assignment. This makes it nearly as efficient as standard ReLU with minimal additional computational overhead. The threshold parameter allows for controlling neuron sparsity, which can be beneficial for reducing overfitting and improving generalization in certain network architectures. By adjusting the threshold, you can fine-tune the balance between network capacity and regularization without significantly impacting computational performance.

## Plots
<img src="../../images/activation-functions/thresholded-relu.png" alt="Thresholded ReLU Function" width="500" height="auto">
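
The threshold comparison described above, as a scalar sketch (default threshold of 1.0 taken from the parameter table):

```php
// Thresholded ReLU: x when x > threshold, 0 otherwise.
function thresholdedRelu(float $x, float $threshold = 1.0): float
{
    return $x > $threshold ? $x : 0.0;
}
```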

6 changes: 0 additions & 6 deletions src/NeuralNet/ActivationFunctions/ELU/ELU.php
@@ -56,17 +56,14 @@ public function __construct(protected float $alpha = 1.0)
*/
public function activate(NDArray $input) : NDArray
{
// Calculate positive part: x for x > 0
$positiveActivation = NumPower::maximum($input, 0);

// Calculate negative part: alpha * (e^x - 1) for x <= 0
$negativeMask = NumPower::minimum($input, 0);
$negativeActivation = NumPower::multiply(
NumPower::expm1($negativeMask),
$this->alpha
);

// Combine both parts
return NumPower::add($positiveActivation, $negativeActivation);
}

@@ -82,17 +79,14 @@ public function activate(NDArray $input) : NDArray
*/
public function differentiate(NDArray $input, NDArray $output) : NDArray
{
// For x > 0: 1
$positiveMask = NumPower::greater($input, 0);

// For x <= 0: output + α
$negativeMask = NumPower::lessEqual($input, 0);
$negativePart = NumPower::multiply(
NumPower::add($output, $this->alpha),
$negativeMask
);

// Combine both parts
return NumPower::add($positiveMask, $negativePart);
}

23 changes: 2 additions & 21 deletions src/NeuralNet/ActivationFunctions/GELU/GELU.php
@@ -55,24 +55,11 @@ class GELU implements ActivationFunction, IBufferDerivative
*/
public function activate(NDArray $input) : NDArray
{
// Calculate x^3
$cubed = NumPower::pow($input, 3);

// Calculate inner term: x + BETA * x^3
$innerTerm = NumPower::add(
$input,
NumPower::multiply($cubed, self::BETA)
);

// Apply tanh(ALPHA * innerTerm)
$tanhTerm = NumPower::tanh(
NumPower::multiply($innerTerm, self::ALPHA)
);

// Calculate 1 + tanhTerm
$innerTerm = NumPower::add($input, NumPower::multiply($cubed, self::BETA));
$tanhTerm = NumPower::tanh(NumPower::multiply($innerTerm, self::ALPHA));
$onePlusTanh = NumPower::add(1.0, $tanhTerm);

// Calculate 0.5 * x * (1 + tanhTerm)
return NumPower::multiply(
NumPower::multiply($input, $onePlusTanh),
0.5
@@ -96,10 +83,8 @@ public function activate(NDArray $input) : NDArray
*/
public function differentiate(NDArray $input) : NDArray
{
// Calculate x^3
$cubed = NumPower::pow($input, 3);

// Calculate inner term: ALPHA * (x + BETA * x^3)
$innerTerm = NumPower::multiply(
NumPower::add(
$input,
@@ -108,20 +93,17 @@ public function differentiate(NDArray $input) : NDArray
self::ALPHA
);

// Calculate cosh and sech^2
$cosh = NumPower::cosh($innerTerm);
$sech2 = NumPower::pow(
NumPower::divide(1.0, $cosh),
2
);

// Calculate 0.5 * (1 + tanh(innerTerm))
$firstTerm = NumPower::multiply(
NumPower::add(1.0, NumPower::tanh($innerTerm)),
0.5
);

// Calculate 0.5 * x * sech^2 * ALPHA * (1 + 3 * BETA * x^2)
$secondTerm = NumPower::multiply(
NumPower::multiply(
NumPower::multiply(
@@ -139,7 +121,6 @@ public function differentiate(NDArray $input) : NDArray
)
);

// Combine terms
return NumPower::add($firstTerm, $secondTerm);
}

8 changes: 0 additions & 8 deletions src/NeuralNet/ActivationFunctions/HardSiLU/HardSiLU.php
@@ -51,10 +51,8 @@ public function __construct()
*/
public function activate(NDArray $input) : NDArray
{
// Calculate HardSigmoid(x)
$hardSigmoid = $this->hardSigmoid->activate($input);

// Calculate x * HardSigmoid(x)
return NumPower::multiply($input, $hardSigmoid);
}

@@ -68,16 +66,10 @@ public function activate(NDArray $input) : NDArray
*/
public function differentiate(NDArray $input) : NDArray
{
// Calculate HardSigmoid(x)
$hardSigmoid = $this->hardSigmoid->activate($input);

// Calculate HardSigmoid'(x)
$hardSigmoidDerivative = $this->hardSigmoid->differentiate($input);

// Calculate x * HardSigmoid'(x)
$xTimesDerivative = NumPower::multiply($input, $hardSigmoidDerivative);

// Calculate HardSigmoid(x) + x * HardSigmoid'(x)
return NumPower::add($hardSigmoid, $xTimesDerivative);
}

7 changes: 0 additions & 7 deletions src/NeuralNet/ActivationFunctions/HardSigmoid/HardSigmoid.php
@@ -63,13 +63,11 @@ class HardSigmoid implements ActivationFunction, IBufferDerivative
*/
public function activate(NDArray $input) : NDArray
{
// Calculate 0.2 * x + 0.5
$linear = NumPower::add(
NumPower::multiply($input, self::SLOPE),
self::INTERCEPT
);

// Clip values between 0 and 1
return NumPower::clip($linear, 0.0, 1.0);
}

@@ -89,11 +87,6 @@ public function differentiate(NDArray $input) : NDArray
$inLinearRegion = NumPower::multiply($inLinearRegion, NumPower::lessEqual($input, self::UPPER_BOUND));
$linearPart = NumPower::multiply($inLinearRegion, self::SLOPE);

// For values outside the linear region: 0
// Since we're multiplying by 0 for these regions, we don't need to explicitly handle them
// The mask $inLinearRegion already contains 0s for x <= -2.5 and x >= 2.5,
// so when we multiply by SLOPE, those values remain 0 in the result

return $linearPart;
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -45,10 +45,8 @@ public function activate(NDArray $input) : NDArray
*/
public function differentiate(NDArray $x) : NDArray
{
// Calculate tanh^2(x)
$squared = NumPower::pow($x, 2);

// Calculate 1 - tanh^2(x)
return NumPower::subtract(1.0, $squared);
}

6 changes: 0 additions & 6 deletions src/NeuralNet/ActivationFunctions/LeakyReLU/LeakyReLU.php
@@ -63,16 +63,13 @@ public function __construct(float $leakage = 0.1)
*/
public function activate(NDArray $input) : NDArray
{
// Calculate positive part: x for x > 0
$positiveActivation = NumPower::maximum($input, 0);

// Calculate negative part: leakage * x for x <= 0
$negativeActivation = NumPower::multiply(
NumPower::minimum($input, 0),
$this->leakage
);

// Combine both parts
return NumPower::add($positiveActivation, $negativeActivation);
}

@@ -87,16 +84,13 @@ public function activate(NDArray $input) : NDArray
*/
public function differentiate(NDArray $input) : NDArray
{
// For x > 0: 1
$positivePart = NumPower::greater($input, 0);

// For x <= 0: leakage
$negativePart = NumPower::multiply(
NumPower::lessEqual($input, 0),
$this->leakage
);

// Combine both parts
return NumPower::add($positivePart, $negativePart);
}

4 changes: 0 additions & 4 deletions src/NeuralNet/ActivationFunctions/ReLU6/ReLU6.php
@@ -37,10 +37,8 @@ class ReLU6 implements ActivationFunction, IBufferDerivative
*/
public function activate(NDArray $input) : NDArray
{
// First apply ReLU: max(0, x)
$reluActivation = NumPower::maximum($input, 0.0);

// Then cap at 6: min(relu(x), 6)
return NumPower::minimum($reluActivation, 6.0);
}

@@ -54,11 +52,9 @@ public function activate(NDArray $input) : NDArray
*/
public function differentiate(NDArray $input) : NDArray
{
// 1 where 0 < x < 6, 0 elsewhere
$greaterThanZero = NumPower::greater($input, 0.0);
$lessThanSix = NumPower::less($input, 6.0);

// Combine conditions with logical AND
return NumPower::multiply($greaterThanZero, $lessThanSix);
}

Expand Down
Loading
Loading