Commit 04708b4

Merge pull request #569 from ana-baltaretu/main
Improve problem 15, change learn text, starter code and add more tests
2 parents 6c7efc2 + ff5229f commit 04708b4

6 files changed, +74 -17 lines changed

build/15.json

Lines changed: 15 additions & 3 deletions
@@ -18,14 +18,18 @@
 {
 "profile_link": "https://www.youtube.com/@StoatScript/videos",
 "name": "StoatScript"
+},
+{
+"profile_link": "https://github.com/ana-baltaretu",
+"name": "anisca22"
 }
 ],
 "tinygrad_difficulty": "medium",
 "pytorch_difficulty": "medium",
 "description": "Write a Python function that performs linear regression using gradient descent. The function should take NumPy arrays X (features with a column of ones for the intercept) and y (target) as input, along with learning rate alpha and the number of iterations, and return the coefficients of the linear regression model as a NumPy array. Round your answer to four decimal places. -0.0 is a valid result for rounding a very small number.",
-"learn_section": "\n## Linear Regression Using Gradient Descent\n\nLinear regression can also be performed using a technique called gradient descent, where the coefficients (or weights) of the model are iteratively adjusted to minimize a cost function (usually mean squared error). This method is particularly useful when the number of features is too large for analytical solutions like the normal equation or when the feature matrix is not invertible.\n\nThe gradient descent algorithm updates the weights by moving in the direction of the negative gradient of the cost function with respect to the weights. The updates occur iteratively until the algorithm converges to a minimum of the cost function.\n\nThe update rule for each weight is given by:\n$$\n\\theta_j := \\theta_j - \\alpha \\frac{1}{m} \\sum_{i=1}^{m} \\left( h_{\\theta}(x^{(i)}) - y^{(i)} \\right)x_j^{(i)}\n$$\n\n### Explanation of Terms\n1. \\( \\alpha \\) is the learning rate.\n2. \\( m \\) is the number of training examples.\n3. \\( h_{\\theta}(x^{(i)}) \\) is the hypothesis function at iteration \\( i \\).\n4. \\( x^{(i)} \\) is the feature vector of the \\( i^{\\text{th}} \\) training example.\n5. \\( y^{(i)} \\) is the actual target value for the \\( i^{\\text{th}} \\) training example.\n6. \\( x_j^{(i)} \\) is the value of feature \\( j \\) for the \\( i^{\\text{th}} \\) training example.\n\n### Key Points\n- **Learning Rate**: The choice of learning rate is crucial for the convergence and performance of gradient descent. \n - A small learning rate may lead to slow convergence.\n - A large learning rate may cause overshooting and divergence.\n- **Number of Iterations**: The number of iterations determines how long the algorithm runs before it converges or stops.\n\n### Practical Implementation\nImplementing gradient descent involves initializing the weights, computing the gradient of the cost function, and iteratively updating the weights according to the update rule.",
-"starter_code": "import numpy as np\ndef linear_regression_gradient_descent(X: np.ndarray, y: np.ndarray, alpha: float, iterations: int) -> np.ndarray:\n\t# Your code here, make sure to round\n\tm, n = X.shape\n\ttheta = np.zeros((n, 1))\n\treturn theta",
-"solution": "\nimport numpy as np\ndef linear_regression_gradient_descent(X: np.ndarray, y: np.ndarray, alpha: float, iterations: int) -> np.ndarray:\n m, n = X.shape\n theta = np.zeros((n, 1))\n for _ in range(iterations):\n predictions = X @ theta\n errors = predictions - y.reshape(-1, 1)\n updates = X.T @ errors / m\n theta -= alpha * updates\n return np.round(theta.flatten(), 4)",
+"learn_section": "\n## Linear Regression Using Gradient Descent\n\nLinear regression can also be performed using a technique called gradient descent, where the coefficients (or weights) of the model are iteratively adjusted to minimize a cost function (usually mean squared error). This method is particularly useful when the number of features is too large for analytical solutions like the normal equation or when the feature matrix is not invertible.\n\nThe gradient descent algorithm updates the weights by moving in the direction of the negative gradient of the cost function with respect to the weights. The updates occur iteratively until the algorithm converges to a minimum of the cost function.\n\nThe update rule for each weight is given by:\n$$\n\\theta_j := \\theta_j - \\alpha \\frac{1}{m} \\sum_{i=1}^{m} \\left( h_{\\theta}(x^{(i)}) - y^{(i)} \\right)x_j^{(i)}\n$$\n\n### Explanation of Terms\n1. $\\alpha$ is the learning rate.\n2. $m$ is the number of training examples.\n3. $h_{\\theta}(x^{(i)})$ is the hypothesis function at iteration $i$.\n4. $x^{(i)}$ is the feature vector of the $i^{\\text{th}}$ training example.\n5. $y^{(i)}$ is the actual target value for the $i^{\\text{th}}$ training example.\n6. $x_j^{(i)}$ is the value of feature $j$ for the $i^{\\text{th}}$ training example.\n\n### Key Points\n- **Learning Rate**: The choice of learning rate is crucial for the convergence and performance of gradient descent. \n - A small learning rate may lead to slow convergence.\n - A large learning rate may cause overshooting and divergence.\n- **Number of Iterations**: The number of iterations determines how long the algorithm runs before it converges or stops.\n\n### Practical Implementation\nImplementing gradient descent involves:\n1. Initializing the weights theta ($\\theta$), with 0s in our case, \n2. Compute the predicted values for all training examples using the current weights\n3. Computing the gradient for each iteration step by aggregating the errors over all training examples and scaling it by the learning rate,\n4. and iteratively updating the weights according to the update rule.",
+"starter_code": "import numpy as np\n\ndef linear_regression_gradient_descent(X: np.ndarray, y: np.ndarray, alpha: float, iterations: int) -> np.ndarray:\n\t\"\"\"\n\tPerform linear regression using gradient descent.\n\n\tm = number of training examples\n\tn = number of parameters (features), technically n-1 features, 1st column is for intercept\n\n\tX: shape (m, n), `m` training examples with `n` input values for each feature\n\ty: shape (m, 1) array with the target values (ground truth)\n\talpha: learning rate\n\titerations: number of gradient descent steps\n\t\"\"\"\n\n\tm, n = X.shape\n\ty = y.reshape(-1, 1) \t# Make sure y is a column vector\n\ttheta = np.zeros((n, 1))\n\n\t# TODO: Your code here\n\n\treturn np.round(theta.flatten(), 4) \t# Rounded to 4 decimals",
+"solution": "\nimport numpy as np\ndef linear_regression_gradient_descent(X: np.ndarray, y: np.ndarray, alpha: float, iterations: int) -> np.ndarray:\n \"\"\"\n Perform linear regression using gradient descent.\n\n m = number of training examples\n n = number of parameters (features), technically n-1 features, 1st column is for intercept\n\n X: shape (m, n), `m` training examples with `n` input values for each feature\n y: shape (m, 1) array with the target values (ground truth)\n alpha: learning rate\n iterations: number of gradient descent steps\n \"\"\"\n\n m, n = X.shape\n y = y.reshape(-1, 1) # Make sure y is a column vector\n theta = np.zeros((n, 1)) # 1. Initializing the weights with 0s\n for _ in range(iterations):\n predictions = X @ theta # 2. Compute predictions h(x)\n errors = predictions - y # 3. Compute error = h - y\n gradient = alpha * (X.T @ errors) / m # 4. Compute gradient\n theta -= gradient # 5. Update weights for each iteration\n return np.round(theta.flatten(), 4)",
 "example": {
 "input": "X = np.array([[1, 1], [1, 2], [1, 3]]), y = np.array([1, 2, 3]), alpha = 0.01, iterations = 1000",
 "output": "np.array([0.1107, 0.9513])",
@@ -35,6 +39,14 @@
 {
 "test": "print(linear_regression_gradient_descent(np.array([[1, 1], [1, 2], [1, 3]]), np.array([1, 2, 3]), 0.01, 1000))",
 "expected_output": "[0.1107, 0.9513]"
+},
+{
+"test": "print(linear_regression_gradient_descent(np.array([[1, 5], [1, 7], [1, 9]]), np.array([10, 14, 18]), 0.01, 5000))",
+"expected_output": "[0.0211, 1.9971]"
+},
+{
+"test": "print(linear_regression_gradient_descent(np.array([[1, 1, 0], [1, 2, 1], [1, 3, 2], [1, 4, 3]]), np.array([5, 6, 7, 8]), 0.05, 2000))",
+"expected_output": "[3.0, 2.0, -1.0]"
 }
 ],
 "tinygrad_starter_code": "from tinygrad.tensor import Tensor\n\ndef linear_regression_gradient_descent_tg(X, y, alpha, iterations) -> Tensor:\n \"\"\"\n Solve linear regression via gradient descent using tinygrad autograd.\n X: Tensor or convertible shape (m,n); y: shape (m,) or (m,1).\n alpha: learning rate; iterations: number of steps.\n Returns a 1-D Tensor of length n, rounded to 4 decimals.\n \"\"\"\n X_t = Tensor(X).float()\n y_t = Tensor(y).float().reshape(-1,1)\n m, n = X_t.shape\n theta = Tensor([[0.0] for _ in range(n)])\n # Your implementation here\n pass",

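The learn text's Key Points state that a small learning rate converges slowly while a large one may overshoot and diverge. Below is a minimal sketch of that behavior on the problem's example data, assuming the same batch update the solution uses. The helper names `half_mse` and `run_gd`, the cost (1/2m)·Σ(error²), and the value `alpha = 0.5` are illustrative choices, not part of this commit.

```python
import numpy as np


def half_mse(X, y, theta):
    # Cost J(theta) = (1 / 2m) * sum((X @ theta - y)^2), with y as a column vector
    m = X.shape[0]
    errors = X @ theta - y
    return float(np.sum(errors ** 2) / (2 * m))


def run_gd(X, y, alpha, iterations):
    # Batch gradient descent from theta = 0, same update as the solution:
    # theta <- theta - alpha * X^T (X theta - y) / m
    m, n = X.shape
    theta = np.zeros((n, 1))
    for _ in range(iterations):
        theta -= alpha * (X.T @ (X @ theta - y)) / m
    return theta


X = np.array([[1, 1], [1, 2], [1, 3]], dtype=float)
y = np.array([1, 2, 3], dtype=float).reshape(-1, 1)  # column vector, shape (3, 1)

initial = half_mse(X, y, np.zeros((2, 1)))
slow = half_mse(X, y, run_gd(X, y, alpha=0.01, iterations=50))
diverged = half_mse(X, y, run_gd(X, y, alpha=0.5, iterations=50))

print(f"start: {initial:.4f}  alpha=0.01: {slow:.4f}  alpha=0.5: {diverged:.3e}")
```

With alpha = 0.01 the cost shrinks steadily; with alpha = 0.5 each step overshoots along the steepest direction of the cost and the cost grows by many orders of magnitude within 50 steps. For a quadratic cost like this one, plain gradient descent is stable only when alpha is below 2 divided by the largest eigenvalue of XᵀX/m, which is roughly 0.36 for this X.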
questions/15_linear-regression-using-gradient-descent/learn.md

Lines changed: 11 additions & 7 deletions
@@ -11,12 +11,12 @@ $$
 $$
 
 ### Explanation of Terms
-1. \( \alpha \) is the learning rate.
-2. \( m \) is the number of training examples.
-3. \( h_{\theta}(x^{(i)}) \) is the hypothesis function at iteration \( i \).
-4. \( x^{(i)} \) is the feature vector of the \( i^{\text{th}} \) training example.
-5. \( y^{(i)} \) is the actual target value for the \( i^{\text{th}} \) training example.
-6. \( x_j^{(i)} \) is the value of feature \( j \) for the \( i^{\text{th}} \) training example.
+1. $\alpha$ is the learning rate.
+2. $m$ is the number of training examples.
+3. $h_{\theta}(x^{(i)})$ is the hypothesis function at iteration $i$.
+4. $x^{(i)}$ is the feature vector of the $i^{\text{th}}$ training example.
+5. $y^{(i)}$ is the actual target value for the $i^{\text{th}}$ training example.
+6. $x_j^{(i)}$ is the value of feature $j$ for the $i^{\text{th}}$ training example.
 
 ### Key Points
 - **Learning Rate**: The choice of learning rate is crucial for the convergence and performance of gradient descent.
@@ -25,4 +25,8 @@ $$
 - **Number of Iterations**: The number of iterations determines how long the algorithm runs before it converges or stops.
 
 ### Practical Implementation
-Implementing gradient descent involves initializing the weights, computing the gradient of the cost function, and iteratively updating the weights according to the update rule.
+Implementing gradient descent involves:
+1. Initializing the weights theta ($\theta$), with 0s in our case,
+2. Compute the predicted values for all training examples using the current weights
+3. Computing the gradient for each iteration step by aggregating the errors over all training examples and scaling it by the learning rate,
+4. and iteratively updating the weights according to the update rule.
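The update rule in learn.md is written for a single weight θ_j, with the sum running over the training examples i = 1..m, while the solution updates every weight at once with `X.T @ errors / m`. The sketch below transcribes the summation literally and checks that both forms produce the same step; the helpers `per_weight_step` and `vectorized_step`, the use of the first test's data, and the arbitrary starting θ are illustrative assumptions. Every θ_j is computed from the old θ, so the update is simultaneous across weights.

```python
import numpy as np


def per_weight_step(X, y, theta, alpha):
    # Literal form of: theta_j := theta_j - alpha * (1/m) * sum_i (h(x_i) - y_i) * x_ij
    m, n = X.shape
    new_theta = theta.copy()
    for j in range(n):
        total = 0.0
        for i in range(m):
            # Prediction for example i, always computed from the OLD theta
            h_i = sum(theta[k] * X[i, k] for k in range(n))
            total += (h_i - y[i]) * X[i, j]
        new_theta[j] = theta[j] - alpha * total / m
    return new_theta


def vectorized_step(X, y, theta, alpha):
    # All weights at once: theta - alpha * X^T (X theta - y) / m
    m = X.shape[0]
    return theta - alpha * (X.T @ (X @ theta - y)) / m


X = np.array([[1, 1], [1, 2], [1, 3]], dtype=float)
y = np.array([1, 2, 3], dtype=float)
theta = np.array([0.5, -0.25])  # arbitrary starting weights for the comparison

loop_result = per_weight_step(X, y, theta, alpha=0.01)
vector_result = vectorized_step(X, y, theta, alpha=0.01)
print(loop_result, vector_result)
```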

questions/15_linear-regression-using-gradient-descent/meta.json

Lines changed: 4 additions & 0 deletions
@@ -18,6 +18,10 @@
 {
 "profile_link": "https://www.youtube.com/@StoatScript/videos",
 "name": "StoatScript"
+},
+{
+"profile_link": "https://github.com/ana-baltaretu",
+"name": "anisca22"
 }
 ],
 "tinygrad_difficulty": "medium",
Lines changed: 18 additions & 5 deletions
@@ -1,11 +1,24 @@
 
 import numpy as np
 def linear_regression_gradient_descent(X: np.ndarray, y: np.ndarray, alpha: float, iterations: int) -> np.ndarray:
+    """
+    Perform linear regression using gradient descent.
+
+    m = number of training examples
+    n = number of parameters (features), technically n-1 features, 1st column is for intercept
+
+    X: shape (m, n), `m` training examples with `n` input values for each feature
+    y: shape (m, 1) array with the target values (ground truth)
+    alpha: learning rate
+    iterations: number of gradient descent steps
+    """
+
     m, n = X.shape
-    theta = np.zeros((n, 1))
+    y = y.reshape(-1, 1) # Make sure y is a column vector
+    theta = np.zeros((n, 1)) # 1. Initializing the weights with 0s
     for _ in range(iterations):
-        predictions = X @ theta
-        errors = predictions - y.reshape(-1, 1)
-        updates = X.T @ errors / m
-        theta -= alpha * updates
+        predictions = X @ theta # 2. Compute predictions h(x)
+        errors = predictions - y # 3. Compute error = h - y
+        gradient = alpha * (X.T @ errors) / m # 4. Compute gradient
+        theta -= gradient # 5. Update weights for each iteration
     return np.round(theta.flatten(), 4)
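A hand trace of the first loop iteration of the solution on the example input (X = [[1, 1], [1, 2], [1, 3]], y = [1, 2, 3], alpha = 0.01), offered as an illustration rather than as part of the commit. With θ = 0 every prediction is 0, so the errors are -y, Xᵀ·errors = [-6, -14], and dividing by m = 3 gives the cost gradient [-2, -14/3]. The solution's variable named `gradient` already includes the factor alpha; this sketch keeps the two apart (the name `grad` is illustrative) to mirror the update rule, where α multiplies the averaged sum.

```python
import numpy as np

X = np.array([[1, 1], [1, 2], [1, 3]], dtype=float)
y = np.array([1, 2, 3], dtype=float).reshape(-1, 1)
alpha = 0.01
m, n = X.shape

theta = np.zeros((n, 1))      # start from all zeros, as in the solution
predictions = X @ theta       # [[0], [0], [0]]
errors = predictions - y      # [[-1], [-2], [-3]]
grad = (X.T @ errors) / m     # [[-2], [-14/3]]: gradient of the cost
theta = theta - alpha * grad  # step of size alpha against the gradient

print(np.round(theta.flatten(), 4).tolist())  # [0.02, 0.0467]
```

Repeating this step 1000 times is what produces the example's expected output [0.1107, 0.9513].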
Lines changed: 18 additions & 2 deletions
@@ -1,6 +1,22 @@
 import numpy as np
+
 def linear_regression_gradient_descent(X: np.ndarray, y: np.ndarray, alpha: float, iterations: int) -> np.ndarray:
-    # Your code here, make sure to round
+    """
+    Perform linear regression using gradient descent.
+
+    m = number of training examples
+    n = number of parameters (features), technically n-1 features, 1st column is for intercept
+
+    X: shape (m, n), `m` training examples with `n` input values for each feature
+    y: shape (m, 1) array with the target values (ground truth)
+    alpha: learning rate
+    iterations: number of gradient descent steps
+    """
+
     m, n = X.shape
+    y = y.reshape(-1, 1) # Make sure y is a column vector
     theta = np.zeros((n, 1))
-    return theta
+
+    # TODO: Your code here
+
+    return np.round(theta.flatten(), 4) # Rounded to 4 decimals
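Both the new starter code and the solution reshape `y` with `y.reshape(-1, 1)` before the loop. The tests pass `y` as a 1-D array of shape (m,), while `X @ theta` has shape (m, 1), and NumPy broadcasting silently turns their difference into an (m, m) matrix instead of raising an error. A short demonstration with the first test's data (variable names are illustrative):

```python
import numpy as np

X = np.array([[1, 1], [1, 2], [1, 3]], dtype=float)
y_1d = np.array([1, 2, 3], dtype=float)  # shape (3,), as passed by the tests
theta = np.zeros((2, 1))                 # shape (2, 1), as in the starter code

predictions = X @ theta                      # shape (3, 1)
wrong = predictions - y_1d                   # broadcasts (3, 1) with (3,) -> (3, 3)
right = predictions - y_1d.reshape(-1, 1)    # (3, 1) with (3, 1) -> (3, 1)

print(wrong.shape, right.shape)              # (3, 3) (3, 1)
```

With the (3, 3) version, `X.T @ errors` would have shape (2, 3), and the in-place `theta -= ...` would then fail because that result cannot be broadcast into θ's (2, 1) shape.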

questions/15_linear-regression-using-gradient-descent/tests.json

Lines changed: 8 additions & 0 deletions
@@ -2,5 +2,13 @@
 {
 "test": "print(linear_regression_gradient_descent(np.array([[1, 1], [1, 2], [1, 3]]), np.array([1, 2, 3]), 0.01, 1000))",
 "expected_output": "[0.1107, 0.9513]"
+},
+{
+"test": "print(linear_regression_gradient_descent(np.array([[1, 5], [1, 7], [1, 9]]), np.array([10, 14, 18]), 0.01, 5000))",
+"expected_output": "[0.0211, 1.9971]"
+},
+{
+"test": "print(linear_regression_gradient_descent(np.array([[1, 1, 0], [1, 2, 1], [1, 3, 2], [1, 4, 3]]), np.array([5, 6, 7, 8]), 0.05, 2000))",
+"expected_output": "[3.0, 2.0, -1.0]"
 }
 ]
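As a cross-check on the three tests (not part of the commit), the closed-form least-squares solution can be computed with NumPy's `np.linalg.lstsq`. Tests 1 and 2 have exact fits at θ = [0, 1] and θ = [0, 2], so their expected outputs record gradient descent stopped after a fixed number of iterations rather than the fully converged optimum. In test 3 the third column equals the second minus the first, so XᵀX is singular, which is the non-invertible case the learn text mentions. Infinitely many θ fit those data exactly, and the expected [3.0, 2.0, -1.0] is the minimum-norm one. Gradient descent started from θ = 0 (with a small enough alpha) converges to that particular solution because its updates never leave the row space of X.

```python
import numpy as np

# (X, y) pairs copied from tests.json
tests = [
    (np.array([[1, 1], [1, 2], [1, 3]], dtype=float), np.array([1, 2, 3], dtype=float)),
    (np.array([[1, 5], [1, 7], [1, 9]], dtype=float), np.array([10, 14, 18], dtype=float)),
    (np.array([[1, 1, 0], [1, 2, 1], [1, 3, 2], [1, 4, 3]], dtype=float),
     np.array([5, 6, 7, 8], dtype=float)),
]

for X, y in tests:
    # lstsq minimizes ||X @ theta - y||^2; when X is rank-deficient it returns
    # the minimum-norm minimizer (the same result as np.linalg.pinv(X) @ y)
    theta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    # Adding 0.0 turns a rounded -0.0 into 0.0 for display
    print(f"rank={rank}, theta={np.round(theta, 4) + 0.0}")
```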
