
Removed unneeded comment #382


Merged
3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/elu.md
@@ -16,9 +16,6 @@ $$
|---|---|---|---|---|
| 1 | alpha | 1.0 | float | The value at which leakage will begin to saturate. Ex. alpha = 1.0 means that the output will never be less than -1.0 when inactivated. |

## Size and Performance
ELU is inexpensive to compute (only the negative branch requires an exponential), making it well-suited for deployment on resource-constrained devices and for use in large neural networks.

## Plots
<img src="../../images/activation-functions/elu.png" alt="ELU Function" width="500" height="auto">
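
The alpha parameter above sets where the negative branch saturates. As an illustration (the function name is hypothetical and this is a plain scalar sketch, not the NDArray-based implementation in this repository), the formula behaves as follows:

```php
// ELU: x for x > 0, alpha * (e^x - 1) for x <= 0.
// With alpha = 1.0 the negative branch saturates at -1.0 as x decreases.
function elu(float $x, float $alpha = 1.0): float
{
    return $x > 0.0 ? $x : $alpha * (exp($x) - 1.0);
}

elu(-10.0); // ≈ -0.99995, approaching but never passing -1.0
```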

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/gelu.md
@@ -10,9 +10,6 @@ $$
## Parameters
This activation function does not have any parameters.

## Size and Performance
GELU is computationally more expensive than simpler activation functions like ReLU due to its use of hyperbolic tangent and exponential calculations. The implementation uses an approximation formula to improve performance, but it still requires more computational resources. Despite this cost, GELU has gained popularity in transformer architectures and other deep learning models due to its favorable properties for training deep networks.

## Plots
<img src="../../images/activation-functions/gelu.png" alt="GELU Function" width="500" height="auto">
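
The approximation mentioned above is the widely used tanh form of GELU; a scalar sketch follows. The constants 0.044715 and sqrt(2/pi) are the standard published values, assumed here rather than read from this diff:

```php
// GELU, tanh approximation: 0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x^3)))
function gelu(float $x): float
{
    $inner = sqrt(2.0 / M_PI) * ($x + 0.044715 * $x ** 3);

    return 0.5 * $x * (1.0 + tanh($inner));
}
```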

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/hard-sigmoid.md
@@ -10,9 +10,6 @@ $$
## Parameters
This activation function does not have any parameters.

## Size and Performance
Hard Sigmoid has a minimal memory footprint compared to the standard Sigmoid function, as it uses simple arithmetic operations (multiplication, addition) and comparisons instead of expensive exponential calculations. This makes it particularly well-suited for mobile and embedded applications or when computational resources are limited.

## Plots
<img src="../../images/activation-functions/hard-sigmoid.png" alt="Hard Sigmoid Function" width="500" height="auto">
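
As a rough scalar illustration of the note above, Hard Sigmoid is just a clipped line; the 0.2 slope and 0.5 intercept match the constants referenced later in this diff:

```php
// Hard Sigmoid: clip(0.2 * x + 0.5, 0, 1), computed with basic arithmetic only.
function hardSigmoid(float $x): float
{
    return max(0.0, min(1.0, 0.2 * $x + 0.5));
}
```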

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/hard-silu.md
@@ -12,9 +12,6 @@ $$
## Parameters
This activation function does not have any parameters.

## Size and Performance
Hard SiLU is designed to be computationally efficient compared to the standard SiLU (Swish) activation function. By using the piecewise linear Hard Sigmoid approximation instead of the standard Sigmoid function, it reduces the computational complexity while maintaining similar functional properties. This makes it particularly suitable for deployment on resource-constrained devices or when working with large neural networks.

## Plots
<img src="../../images/activation-functions/hard-silu.png" alt="Hard SiLU Function" width="500" height="auto">
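
An illustrative scalar sketch of the substitution described above: the input scaled by the Hard Sigmoid approximation instead of the true sigmoid.

```php
// Hard SiLU: x * HardSigmoid(x), avoiding the exponential in standard SiLU.
function hardSilu(float $x): float
{
    $hardSigmoid = max(0.0, min(1.0, 0.2 * $x + 0.5));

    return $x * $hardSigmoid;
}
```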

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/hyperbolic-tangent.md
@@ -10,9 +10,6 @@ $$
## Parameters
This activation function does not have any parameters.

## Size and Performance
Hyperbolic Tangent requires more computational resources compared to simpler activation functions like ReLU due to its exponential calculations. While not as computationally efficient as piecewise linear functions, it provides important zero-centered outputs that can be critical for certain network architectures, particularly in recurrent neural networks where gradient flow is important.

## Plots
<img src="../../images/activation-functions/hyperbolic-tangent.png" alt="Hyperbolic Tangent Function" width="500" height="auto">
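
A scalar sketch of the function and of the derivative used during backpropagation, which is where the zero-centered, exponential-based behaviour noted above comes from:

```php
// tanh(x) = (e^x - e^-x) / (e^x + e^-x), bounded to (-1, 1) and centered at zero.
function hyperbolicTangent(float $x): float
{
    return tanh($x);
}

// Derivative in terms of the activated output: 1 - tanh^2(x).
function hyperbolicTangentDerivative(float $activated): float
{
    return 1.0 - $activated ** 2;
}
```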

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/leaky-relu.md
@@ -16,9 +16,6 @@ $$
|---|---|---|---|---|
| 1 | leakage | 0.1 | float | The amount of leakage, as a proportion of the input value, to allow to pass through when the neuron is inactivated. |

## Size and Performance
Leaky ReLU is computationally efficient, requiring only simple comparison operations and multiplication. It has a minimal memory footprint and executes quickly compared to more complex activation functions that use exponential or hyperbolic calculations. The leakage parameter allows for a small gradient when the unit is not active, which helps prevent the "dying ReLU" problem while maintaining the computational efficiency of the standard ReLU function.

## Plots
<img src="../../images/activation-functions/leaky-relu.png" alt="Leaky ReLU Function" width="500" height="auto">
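
The comparison-and-multiply formula the removed note refers to, as an illustrative scalar sketch (default leakage of 0.1 taken from the parameter table above):

```php
// Leaky ReLU: x for x > 0, leakage * x otherwise, so the gradient never fully vanishes.
function leakyRelu(float $x, float $leakage = 0.1): float
{
    return $x > 0.0 ? $x : $leakage * $x;
}
```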

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/relu.md
@@ -14,9 +14,6 @@ $$
## Parameters
This activation function does not have any parameters.

## Size and Performance
ReLU is one of the most computationally efficient activation functions, requiring only a simple comparison operation and conditional assignment. It has minimal memory requirements and executes very quickly compared to activation functions that use exponential or hyperbolic calculations. This efficiency makes ReLU particularly well-suited for deep networks with many layers, where the computational savings compound significantly.

## Plots
<img src="../../images/activation-functions/relu.png" alt="ReLU Function" width="500" height="auto">
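
The single comparison per activation described above, as a scalar sketch:

```php
// ReLU: max(0, x).
function relu(float $x): float
{
    return $x > 0.0 ? $x : 0.0;
}
```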

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/relu6.md
@@ -10,9 +10,6 @@ $$
## Parameters
This activation function does not have any parameters.

## Size and Performance
ReLU6 maintains the computational efficiency of standard ReLU while adding an upper bound check. It requires only simple comparison operations and conditional assignments. The additional upper bound check adds minimal computational overhead compared to standard ReLU, while providing benefits for quantization and numerical stability. This makes ReLU6 particularly well-suited for mobile and embedded applications where model size and computational efficiency are critical.

## Plots
<img src="../../images/activation-functions/relu6.png" alt="ReLU6 Function" width="500" height="auto">
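
A scalar sketch of ReLU with the upper bound mentioned above:

```php
// ReLU6: min(max(0, x), 6), one extra comparison on top of standard ReLU.
function relu6(float $x): float
{
    return min(max(0.0, $x), 6.0);
}
```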

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/selu.md
@@ -18,9 +18,6 @@ Where the constants are typically:
## Parameters
This activation function does not have any parameters.

## Size and Performance
SELU is computationally more expensive than simpler activation functions like ReLU due to its use of exponential calculations for negative inputs. However, it offers significant benefits by enabling self-normalization, which can eliminate the need for additional normalization layers like Batch Normalization. This trade-off often results in better overall network performance and potentially simpler network architectures. The self-normalizing property of SELU can lead to faster convergence during training and more stable gradients, which may reduce the total computational cost of training deep networks despite the higher per-activation computational cost.

## Plots
<img src="../../images/activation-functions/selu.png" alt="SELU Function" width="500" height="auto">
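
A scalar sketch of the scaled ELU formula behind the note above. The lambda and alpha defaults below are the commonly cited constants (approximately 1.0507 and 1.6733), assumed here; the exact values the documentation uses are listed just above this hunk.

```php
// SELU: lambda * x for x > 0, lambda * alpha * (e^x - 1) for x <= 0.
function selu(float $x, float $lambda = 1.0507, float $alpha = 1.6733): float
{
    return $x > 0.0 ? $lambda * $x : $lambda * $alpha * (exp($x) - 1.0);
}
```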

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/sigmoid.md
@@ -10,9 +10,6 @@ $$
## Parameters
This activation function does not have any parameters.

## Size and Performance
Sigmoid is computationally more expensive than simpler activation functions like ReLU due to its use of exponential calculations. It requires computing an exponential term and a division operation for each neuron activation. For deep networks, this computational cost can become significant. Additionally, sigmoid activations can cause the vanishing gradient problem during backpropagation when inputs are large in magnitude, potentially slowing down training. Despite these limitations, sigmoid remains valuable in output layers of networks performing binary classification or when probability interpretations are needed.

## Plots
<img src="../../images/activation-functions/sigmoid.png" alt="Sigmoid Function" width="500" height="auto">
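
The exponential and division per activation mentioned above, as a scalar sketch:

```php
// Sigmoid: 1 / (1 + e^-x), mapping any input to (0, 1).
function sigmoid(float $x): float
{
    return 1.0 / (1.0 + exp(-$x));
}
```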

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/silu.md
@@ -13,9 +13,6 @@ Where
## Parameters
This activation function does not have any parameters.

## Size and Performance
SiLU is computationally more expensive than simpler activation functions like ReLU due to its use of exponential calculations from the sigmoid component. Each activation requires computing an exponential term and a division operation. However, SiLU offers improved performance in deep learning models, particularly in computer vision and natural language processing tasks, which can justify the additional computational cost. The smooth, non-monotonic nature of SiLU helps with gradient flow during training, potentially leading to faster convergence and better overall model performance despite the higher per-activation computational cost.

## Plots
<img src="../../images/activation-functions/silu.png" alt="SiLU Function" width="500" height="auto">
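
A scalar sketch showing where the sigmoid component, and its exponential, enters SiLU:

```php
// SiLU (Swish): x * sigmoid(x) = x / (1 + e^-x).
function silu(float $x): float
{
    return $x / (1.0 + exp(-$x));
}
```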

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/softmax.md
@@ -16,9 +16,6 @@ Where:
## Parameters
This activation function does not have any parameters.

## Size and Performance
Softmax is computationally more expensive than many other activation functions due to its need to process all neurons in a layer collectively rather than independently. It requires exponential calculations for each neuron, followed by a normalization step that involves summing all exponential values and dividing each by this sum. This creates a computational dependency between all neurons in the layer. Despite this cost, Softmax is essential for multi-class classification output layers where probability distributions are required. The implementation uses optimized matrix operations to improve performance, but the computational complexity still scales with the number of neurons in the layer.

## Plots
<img src="../../images/activation-functions/softmax.png" alt="Softmax Function" width="500" height="auto">
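
The exponentiate, sum, and normalize sequence described above, sketched over a plain PHP array. Subtracting the row maximum is a common numerical-stability step assumed here, not necessarily what this library does internally.

```php
/**
 * Softmax over one row of activations: exponentiate each value, then divide by the sum.
 *
 * @param float[] $logits
 * @return float[] probabilities that sum to 1
 */
function softmax(array $logits): array
{
    $max = max($logits); // shift by the max so exp() cannot overflow

    $exps = array_map(fn (float $z): float => exp($z - $max), $logits);

    $sum = array_sum($exps);

    return array_map(fn (float $e): float => $e / $sum, $exps);
}
```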

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/softplus.md
@@ -10,9 +10,6 @@ $$
## Parameters
This activation function does not have any parameters.

## Size and Performance
Softplus is computationally more expensive than ReLU due to its use of both exponential and logarithmic calculations. Each activation requires computing an exponential term, an addition, and a logarithm. This makes Softplus significantly more resource-intensive than simpler activation functions, especially in large networks. However, Softplus provides a smooth, differentiable alternative to ReLU with no zero-gradient regions, which can improve gradient flow during training for certain types of networks. The trade-off between computational cost and the benefits of smoothness should be considered when choosing between Softplus and ReLU.

## Plots
<img src="../../images/activation-functions/softplus.png" alt="Softplus Function" width="500" height="auto">
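
The exponential, addition, and logarithm per activation noted above, as a scalar sketch:

```php
// Softplus: log(1 + e^x), a smooth alternative to ReLU with no zero-gradient region.
function softplus(float $x): float
{
    return log1p(exp($x)); // log1p(y) computes log(1 + y)
}
```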

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/softsign.md
@@ -10,9 +10,6 @@ $$
## Parameters
This activation function does not have any parameters.

## Size and Performance
Softsign is computationally more efficient than functions that use exponential calculations (like Sigmoid or Tanh), as it only requires an absolute value operation and basic arithmetic. However, it is slightly more expensive than ReLU due to the division operation. Softsign has the advantage of not saturating as quickly as Tanh, which can help with the vanishing gradient problem in deep networks. The smoother gradients of Softsign can lead to more stable training dynamics, though this comes at a small computational cost compared to simpler activation functions.

## Plots
<img src="../../images/activation-functions/softsign.png" alt="Softsign Function" width="500" height="auto">
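
A scalar sketch of the absolute value and division described above:

```php
// Softsign: x / (1 + |x|), bounded to (-1, 1) but saturating more slowly than tanh.
function softsign(float $x): float
{
    return $x / (1.0 + abs($x));
}
```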

3 changes: 0 additions & 3 deletions docs/neural-network/activation-functions/thresholded-relu.md
@@ -16,9 +16,6 @@ $$
|---|---|---|---|---|
| 1 | threshold | 1.0 | float | The threshold at which the neuron is activated. |

## Size and Performance
Thresholded ReLU maintains the computational efficiency of standard ReLU while adding a threshold comparison. It requires only a simple comparison operation against the threshold value and conditional assignment. This makes it nearly as efficient as standard ReLU with minimal additional computational overhead. The threshold parameter allows for controlling neuron sparsity, which can be beneficial for reducing overfitting and improving generalization in certain network architectures. By adjusting the threshold, you can fine-tune the balance between network capacity and regularization without significantly impacting computational performance.

## Plots
<img src="../../images/activation-functions/thresholded-relu.png" alt="Thresholded ReLU Function" width="500" height="auto">
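
The threshold comparison described above, as a scalar sketch (default threshold of 1.0 taken from the parameter table):

```php
// Thresholded ReLU: x when x > threshold, 0 otherwise.
function thresholdedRelu(float $x, float $threshold = 1.0): float
{
    return $x > $threshold ? $x : 0.0;
}
```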

6 changes: 0 additions & 6 deletions src/NeuralNet/ActivationFunctions/ELU/ELU.php
@@ -56,17 +56,14 @@ public function __construct(protected float $alpha = 1.0)
*/
public function activate(NDArray $input) : NDArray
{
// Calculate positive part: x for x > 0
$positiveActivation = NumPower::maximum($input, 0);

// Calculate negative part: alpha * (e^x - 1) for x <= 0
$negativeMask = NumPower::minimum($input, 0);
$negativeActivation = NumPower::multiply(
NumPower::expm1($negativeMask),
$this->alpha
);

// Combine both parts
return NumPower::add($positiveActivation, $negativeActivation);
}

@@ -82,17 +79,14 @@ public function activate(NDArray $input) : NDArray
*/
public function differentiate(NDArray $input, NDArray $output) : NDArray
{
// For x > 0: 1
$positiveMask = NumPower::greater($input, 0);

// For x <= 0: output + α
$negativeMask = NumPower::lessEqual($input, 0);
$negativePart = NumPower::multiply(
NumPower::add($output, $this->alpha),
$negativeMask
);

// Combine both parts
return NumPower::add($positiveMask, $negativePart);
}

23 changes: 2 additions & 21 deletions src/NeuralNet/ActivationFunctions/GELU/GELU.php
@@ -55,24 +55,11 @@ class GELU implements ActivationFunction, IBufferDerivative
*/
public function activate(NDArray $input) : NDArray
{
// Calculate x^3
$cubed = NumPower::pow($input, 3);

// Calculate inner term: x + BETA * x^3
$innerTerm = NumPower::add(
$input,
NumPower::multiply($cubed, self::BETA)
);

// Apply tanh(ALPHA * innerTerm)
$tanhTerm = NumPower::tanh(
NumPower::multiply($innerTerm, self::ALPHA)
);

// Calculate 1 + tanhTerm
$innerTerm = NumPower::add($input, NumPower::multiply($cubed, self::BETA));
$tanhTerm = NumPower::tanh(NumPower::multiply($innerTerm, self::ALPHA));
$onePlusTanh = NumPower::add(1.0, $tanhTerm);

// Calculate 0.5 * x * (1 + tanhTerm)
return NumPower::multiply(
NumPower::multiply($input, $onePlusTanh),
0.5
@@ -96,10 +83,8 @@ public function activate(NDArray $input) : NDArray
*/
public function differentiate(NDArray $input) : NDArray
{
// Calculate x^3
$cubed = NumPower::pow($input, 3);

// Calculate inner term: ALPHA * (x + BETA * x^3)
$innerTerm = NumPower::multiply(
NumPower::add(
$input,
@@ -108,20 +93,17 @@ public function differentiate(NDArray $input) : NDArray
self::ALPHA
);

// Calculate cosh and sech^2
$cosh = NumPower::cosh($innerTerm);
$sech2 = NumPower::pow(
NumPower::divide(1.0, $cosh),
2
);

// Calculate 0.5 * (1 + tanh(innerTerm))
$firstTerm = NumPower::multiply(
NumPower::add(1.0, NumPower::tanh($innerTerm)),
0.5
);

// Calculate 0.5 * x * sech^2 * ALPHA * (1 + 3 * BETA * x^2)
$secondTerm = NumPower::multiply(
NumPower::multiply(
NumPower::multiply(
@@ -139,7 +121,6 @@ public function differentiate(NDArray $input) : NDArray
)
);

// Combine terms
return NumPower::add($firstTerm, $secondTerm);
}

8 changes: 0 additions & 8 deletions src/NeuralNet/ActivationFunctions/HardSiLU/HardSiLU.php
@@ -51,10 +51,8 @@ public function __construct()
*/
public function activate(NDArray $input) : NDArray
{
// Calculate HardSigmoid(x)
$hardSigmoid = $this->hardSigmoid->activate($input);

// Calculate x * HardSigmoid(x)
return NumPower::multiply($input, $hardSigmoid);
}

@@ -68,16 +66,10 @@ public function activate(NDArray $input) : NDArray
*/
public function differentiate(NDArray $input) : NDArray
{
// Calculate HardSigmoid(x)
$hardSigmoid = $this->hardSigmoid->activate($input);

// Calculate HardSigmoid'(x)
$hardSigmoidDerivative = $this->hardSigmoid->differentiate($input);

// Calculate x * HardSigmoid'(x)
$xTimesDerivative = NumPower::multiply($input, $hardSigmoidDerivative);

// Calculate HardSigmoid(x) + x * HardSigmoid'(x)
return NumPower::add($hardSigmoid, $xTimesDerivative);
}

7 changes: 0 additions & 7 deletions src/NeuralNet/ActivationFunctions/HardSigmoid/HardSigmoid.php
@@ -63,13 +63,11 @@ class HardSigmoid implements ActivationFunction, IBufferDerivative
*/
public function activate(NDArray $input) : NDArray
{
// Calculate 0.2 * x + 0.5
$linear = NumPower::add(
NumPower::multiply($input, self::SLOPE),
self::INTERCEPT
);

// Clip values between 0 and 1
return NumPower::clip($linear, 0.0, 1.0);
}

@@ -89,11 +87,6 @@ public function differentiate(NDArray $input) : NDArray
$inLinearRegion = NumPower::multiply($inLinearRegion, NumPower::lessEqual($input, self::UPPER_BOUND));
$linearPart = NumPower::multiply($inLinearRegion, self::SLOPE);

// For values outside the linear region: 0
// Since we're multiplying by 0 for these regions, we don't need to explicitly handle them
// The mask $inLinearRegion already contains 0s for x <= -2.5 and x >= 2.5,
// so when we multiply by SLOPE, those values remain 0 in the result

return $linearPart;
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -45,10 +45,8 @@ public function activate(NDArray $input) : NDArray
*/
public function differentiate(NDArray $x) : NDArray
{
// Calculate tanh^2(x)
$squared = NumPower::pow($x, 2);

// Calculate 1 - tanh^2(x)
return NumPower::subtract(1.0, $squared);
}

6 changes: 0 additions & 6 deletions src/NeuralNet/ActivationFunctions/LeakyReLU/LeakyReLU.php
@@ -63,16 +63,13 @@ public function __construct(float $leakage = 0.1)
*/
public function activate(NDArray $input) : NDArray
{
// Calculate positive part: x for x > 0
$positiveActivation = NumPower::maximum($input, 0);

// Calculate negative part: leakage * x for x <= 0
$negativeActivation = NumPower::multiply(
NumPower::minimum($input, 0),
$this->leakage
);

// Combine both parts
return NumPower::add($positiveActivation, $negativeActivation);
}

@@ -87,16 +84,13 @@ public function activate(NDArray $input) : NDArray
*/
public function differentiate(NDArray $input) : NDArray
{
// For x > 0: 1
$positivePart = NumPower::greater($input, 0);

// For x <= 0: leakage
$negativePart = NumPower::multiply(
NumPower::lessEqual($input, 0),
$this->leakage
);

// Combine both parts
return NumPower::add($positivePart, $negativePart);
}

4 changes: 0 additions & 4 deletions src/NeuralNet/ActivationFunctions/ReLU6/ReLU6.php
@@ -37,10 +37,8 @@ class ReLU6 implements ActivationFunction, IBufferDerivative
*/
public function activate(NDArray $input) : NDArray
{
// First apply ReLU: max(0, x)
$reluActivation = NumPower::maximum($input, 0.0);

// Then cap at 6: min(relu(x), 6)
return NumPower::minimum($reluActivation, 6.0);
}

@@ -54,11 +52,9 @@ public function activate(NDArray $input) : NDArray
*/
public function differentiate(NDArray $input) : NDArray
{
// 1 where 0 < x < 6, 0 elsewhere
$greaterThanZero = NumPower::greater($input, 0.0);
$lessThanSix = NumPower::less($input, 6.0);

// Combine conditions with logical AND
return NumPower::multiply($greaterThanZero, $lessThanSix);
}

Expand Down
Loading
Loading