Commit d15a6d7

[feat] add new problem: 'elastic net regularization with gradient descent'
1 parent 4bb2901 commit d15a6d7

2 files changed: +167 −0 lines changed
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
# Elastic Net Regression Using Gradient Descent

Elastic Net Regression combines both L1 (Lasso) and L2 (Ridge) regularization techniques to overcome the limitations of using either regularization method alone. It's particularly useful when dealing with datasets that have many correlated features.

## What is Elastic Net?

Elastic Net addresses two main issues:

- **Lasso's limitation**: When features are highly correlated, Lasso tends to arbitrarily select only one feature from a group of correlated features
- **Ridge's limitation**: Ridge regression doesn't perform feature selection (coefficients approach zero but never become exactly zero)

The goal of Elastic Net is to minimize the objective function:

$$J(w, b) = \underbrace{\frac{1}{2n} \sum_{i=1}^n\left( y_i - \left(\sum_{j=1}^pX_{ij}w_j+b\right)\right)^2}_{\text{MSE Loss}} + \underbrace{\alpha_1 \sum_{j=1}^p |w_j|}_{\text{L1 Regularization}} + \underbrace{\alpha_2 \sum_{j=1}^p w_j^2}_{\text{L2 Regularization}}$$

Where:

* The first term is the **Mean Squared Error (MSE) Loss**: $\frac{1}{2n} \sum_{i=1}^n\left( y_i - \left(\sum_{j=1}^pX_{ij}w_j+b\right)\right)^2$
* The second term is the **L1 Regularization** (Lasso penalty): $\alpha_1 \sum_{j=1}^p |w_j|$
* The third term is the **L2 Regularization** (Ridge penalty): $\alpha_2 \sum_{j=1}^p w_j^2$
* $\alpha_1$ controls the strength of L1 regularization
* $\alpha_2$ controls the strength of L2 regularization
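
To make the objective concrete, here is a minimal NumPy sketch (not part of the committed files) that evaluates $J(w, b)$ for a given weight vector and bias; the helper name `elastic_net_objective` is illustrative only.

```python
import numpy as np

def elastic_net_objective(X, y, w, b, alpha1, alpha2):
    """Evaluate J(w, b): MSE term plus L1 and L2 penalties (illustrative helper)."""
    n = X.shape[0]
    residuals = y - (X @ w + b)
    mse_term = np.sum(residuals ** 2) / (2 * n)   # (1/2n) * sum of squared errors
    l1_term = alpha1 * np.sum(np.abs(w))          # Lasso penalty
    l2_term = alpha2 * np.sum(w ** 2)             # Ridge penalty
    return mse_term + l1_term + l2_term
```
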
## Step-by-Step Implementation Guide

### 1. Initialize weights $w_j$ and bias $b$ to 0

### 2. Make Predictions

At each iteration, calculate predictions using:

$$\hat{y}_i = \sum_{j=1}^pX_{ij}w_j + b$$

Where:

- $\hat{y}_i$ is the predicted value for the $i$-th sample
- $X_{ij}$ is the value of the $i$-th sample's $j$-th feature
- $w_j$ is the weight associated with the $j$-th feature

### 3. Calculate Residuals

Compute the difference between the predicted and actual values: $\text{error}_i = \hat{y}_i - y_i$
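
As a small worked illustration of steps 1-3 (the data values are arbitrary and separate from the committed implementation):

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]])  # 3 samples, 2 features (arbitrary)
y = np.array([1.0, 2.0, 3.0])
weights = np.zeros(X.shape[1])  # step 1: weights start at 0
bias = 0.0                      # step 1: bias starts at 0

y_pred = X @ weights + bias     # step 2: all zeros on the first iteration
error = y_pred - y              # step 3: residuals, here [-1, -2, -3]
```
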

### 4. Update Weights and Bias Using Gradients

**Gradient with respect to weights:**

$$\frac{\partial J}{\partial w_j} = \frac{1}{n} \sum_{i=1}^nX_{ij}(\hat{y}_i - y_i) + \alpha_1 \cdot \text{sign}(w_j) + 2\alpha_2 \cdot w_j$$

**Gradient with respect to bias:**

$$\frac{\partial J}{\partial b} = \frac{1}{n} \sum_{i=1}^n(\hat{y}_i - y_i)$$

**Update rules:**

$$w_j = w_j - \eta \cdot \frac{\partial J}{\partial w_j}$$

$$b = b - \eta \cdot \frac{\partial J}{\partial b}$$

Where $\eta$ is the learning rate.

### 5. Check for Convergence

Repeat steps 2-4 until convergence. Convergence is determined by evaluating the L1 norm of the weight gradients:

$$||\nabla w||_1 = \sum_{j=1}^p \left|\frac{\partial J}{\partial w_j}\right|$$

If $||\nabla w||_1 < \text{tolerance}$, stop the algorithm.

### 6. Return the Final Weights and Bias
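
The stopping rule in code form (a sketch; the gradient values here are made up to show the check passing):

```python
import numpy as np

tol = 1e-4
grad_w = np.array([2e-5, -3e-5])          # hypothetical near-zero gradient
if np.linalg.norm(grad_w, ord=1) < tol:   # ||grad w||_1 = 5e-5 < 1e-4
    print("Converged: stop and return weights and bias")
```
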

## Key Parameters

- **alpha1**: L1 regularization strength (promotes sparsity)
- **alpha2**: L2 regularization strength (handles correlated features)
- **learning_rate**: Step size for gradient descent
- **max_iter**: Maximum number of iterations
- **tol**: Convergence tolerance
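
A minimal usage sketch of the `elastic_net_gradient_descent` function added in this commit; the module name `elastic_net` and the data values are assumptions for illustration:

```python
import numpy as np
from elastic_net import elastic_net_gradient_descent  # module name is assumed

X = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

weights, bias = elastic_net_gradient_descent(
    X, y,
    alpha1=0.1,          # L1 strength: larger values push more weights to exactly zero
    alpha2=0.1,          # L2 strength: larger values shrink correlated weights together
    learning_rate=0.01,  # step size for gradient descent
    max_iter=1000,       # upper bound on iterations
    tol=1e-4,            # stop once the gradient's L1 norm falls below this
)
print(weights, bias)
```
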

## Key Differences from Lasso and Ridge

1. **Lasso (L1 only)**: Tends to select one feature from correlated groups, can be unstable with small sample sizes
2. **Ridge (L2 only)**: Keeps all features but shrinks coefficients, doesn't perform feature selection
3. **Elastic Net (L1 + L2)**: Combines benefits of both - performs feature selection while handling correlated features better than Lasso alone

The balance between L1 and L2 regularization is controlled by the `alpha1` and `alpha2` parameters, allowing you to tune the model for your specific dataset characteristics.
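
One way to see that balance, reusing the import and data from the usage sketch above (these calls are hypothetical, not part of the commit): setting either penalty to zero recovers single-penalty behaviour from the same routine.

```python
ridge_like_w, ridge_like_b = elastic_net_gradient_descent(X, y, alpha1=0.0, alpha2=0.1)  # L2 only, Ridge-like
lasso_like_w, lasso_like_b = elastic_net_gradient_descent(X, y, alpha1=0.1, alpha2=0.0)  # L1 only, Lasso-like
```
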
Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
import numpy as np


def elastic_net_gradient_descent(
    X: np.ndarray,
    y: np.ndarray,
    alpha1: float = 0.1,
    alpha2: float = 0.1,
    learning_rate: float = 0.01,
    max_iter: int = 1000,
    tol: float = 1e-4,
) -> tuple:
    """
    Implement Elastic Net regression using gradient descent.

    Parameters:
        X: Feature matrix (n_samples, n_features)
        y: Target values (n_samples,)
        alpha1: L1 regularization strength (Lasso)
        alpha2: L2 regularization strength (Ridge)
        learning_rate: Step size for gradient descent
        max_iter: Maximum number of iterations
        tol: Convergence tolerance

    Returns:
        tuple: (weights, bias)
    """
    n_samples, n_features = X.shape

    # Initialize weights and bias
    weights = np.zeros(n_features)
    bias = 0

    for _ in range(max_iter):
        # Make predictions
        y_pred = np.dot(X, weights) + bias

        # Calculate residuals
        error = y_pred - y

        # Calculate gradients
        grad_w = (1 / n_samples) * np.dot(X.T, error) + alpha1 * np.sign(weights) + 2 * alpha2 * weights
        grad_b = (1 / n_samples) * np.sum(error)

        # Update weights and bias
        weights -= learning_rate * grad_w
        bias -= learning_rate * grad_b

        # Check for convergence
        if np.linalg.norm(grad_w, ord=1) < tol:
            break

    return weights, bias


def test_elastic_net_gradient_descent():
    """Test cases for Elastic Net implementation"""

    # Test case 1: Simple linear relationship
    X = np.array([[0, 0], [1, 1], [2, 2]])
    y = np.array([0, 1, 2])
    alpha1, alpha2 = 0.1, 0.1

    weights, bias = elastic_net_gradient_descent(
        X, y, alpha1=alpha1, alpha2=alpha2, learning_rate=0.01, max_iter=1000
    )

    expected_weights = np.array([0.37, 0.37])
    expected_bias = 0.24

    assert np.allclose(weights, expected_weights, atol=0.05), "Test case 1 failed"
    assert np.isclose(bias, expected_bias, atol=0.05), "Test case 1 failed"

    # Test case 2: More complex relationship
    X = np.array([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]])
    y = np.array([1, 2, 3, 4, 5])
    alpha1, alpha2 = 0.1, 0.1

    weights, bias = elastic_net_gradient_descent(
        X, y, alpha1=alpha1, alpha2=alpha2, learning_rate=0.01, max_iter=2000
    )

    expected_weights = np.array([0.42, 0.48])
    expected_bias = 0.68

    assert np.allclose(weights, expected_weights, atol=0.05), "Test case 2 failed"
    assert np.isclose(bias, expected_bias, atol=0.05), "Test case 2 failed"


if __name__ == "__main__":
    test_elastic_net_gradient_descent()
    print("All Elastic Net tests passed!")
