Merge pull request #569 from ana-baltaretu/main

moe18 · web-flow · commit 04708b485a05 · 2025-11-10T10:07:52.000-05:00
Improve problem 15, change learn text, starter code and add more tests
diff --git a/build/15.json b/build/15.json
@@ -18,14 +18,18 @@
     {
       "profile_link": "https://www.youtube.com/@StoatScript/videos",
       "name": "StoatScript"
+    },
+    {
+      "profile_link": "https://github.com/ana-baltaretu",
+      "name": "anisca22"
     }
   ],
   "tinygrad_difficulty": "medium",
   "pytorch_difficulty": "medium",
   "description": "Write a Python function that performs linear regression using gradient descent. The function should take NumPy arrays X (features with a column of ones for the intercept) and y (target) as input, along with learning rate alpha and the number of iterations, and return the coefficients of the linear regression model as a NumPy array. Round your answer to four decimal places. -0.0 is a valid result for rounding a very small number.",
-  "learn_section": "\n## Linear Regression Using Gradient Descent\n\nLinear regression can also be performed using a technique called gradient descent, where the coefficients (or weights) of the model are iteratively adjusted to minimize a cost function (usually mean squared error). This method is particularly useful when the number of features is too large for analytical solutions like the normal equation or when the feature matrix is not invertible.\n\nThe gradient descent algorithm updates the weights by moving in the direction of the negative gradient of the cost function with respect to the weights. The updates occur iteratively until the algorithm converges to a minimum of the cost function.\n\nThe update rule for each weight is given by:\n$$\n\\theta_j := \\theta_j - \\alpha \\frac{1}{m} \\sum_{i=1}^{m} \\left( h_{\\theta}(x^{(i)}) - y^{(i)} \\right)x_j^{(i)}\n$$\n\n### Explanation of Terms\n1. \\( \\alpha \\) is the learning rate.\n2. \\( m \\) is the number of training examples.\n3. \\( h_{\\theta}(x^{(i)}) \\) is the hypothesis function at iteration \\( i \\).\n4. \\( x^{(i)} \\) is the feature vector of the \\( i^{\\text{th}} \\) training example.\n5. \\( y^{(i)} \\) is the actual target value for the \\( i^{\\text{th}} \\) training example.\n6. \\( x_j^{(i)} \\) is the value of feature \\( j \\) for the \\( i^{\\text{th}} \\) training example.\n\n### Key Points\n- **Learning Rate**: The choice of learning rate is crucial for the convergence and performance of gradient descent. \n  - A small learning rate may lead to slow convergence.\n  - A large learning rate may cause overshooting and divergence.\n- **Number of Iterations**: The number of iterations determines how long the algorithm runs before it converges or stops.\n\n### Practical Implementation\nImplementing gradient descent involves initializing the weights, computing the gradient of the cost function, and iteratively updating the weights according to the update rule.",
-  "starter_code": "import numpy as np\ndef linear_regression_gradient_descent(X: np.ndarray, y: np.ndarray, alpha: float, iterations: int) -> np.ndarray:\n\t# Your code here, make sure to round\n\tm, n = X.shape\n\ttheta = np.zeros((n, 1))\n\treturn theta",
-  "solution": "\nimport numpy as np\ndef linear_regression_gradient_descent(X: np.ndarray, y: np.ndarray, alpha: float, iterations: int) -> np.ndarray:\n    m, n = X.shape\n    theta = np.zeros((n, 1))\n    for _ in range(iterations):\n        predictions = X @ theta\n        errors = predictions - y.reshape(-1, 1)\n        updates = X.T @ errors / m\n        theta -= alpha * updates\n    return np.round(theta.flatten(), 4)",
+  "learn_section": "\n## Linear Regression Using Gradient Descent\n\nLinear regression can also be performed using a technique called gradient descent, where the coefficients (or weights) of the model are iteratively adjusted to minimize a cost function (usually mean squared error). This method is particularly useful when the number of features is too large for analytical solutions like the normal equation or when the feature matrix is not invertible.\n\nThe gradient descent algorithm updates the weights by moving in the direction of the negative gradient of the cost function with respect to the weights. The updates occur iteratively until the algorithm converges to a minimum of the cost function.\n\nThe update rule for each weight is given by:\n$$\n\\theta_j := \\theta_j - \\alpha \\frac{1}{m} \\sum_{i=1}^{m} \\left( h_{\\theta}(x^{(i)}) - y^{(i)} \\right)x_j^{(i)}\n$$\n\n### Explanation of Terms\n1. $\\alpha$ is the learning rate.\n2. $m$ is the number of training examples.\n3. $h_{\\theta}(x^{(i)})$ is the hypothesis, i.e. the model's prediction $\\theta^{T} x^{(i)}$ for the $i^{\\text{th}}$ training example.\n4. $x^{(i)}$ is the feature vector of the $i^{\\text{th}}$ training example.\n5. $y^{(i)}$ is the actual target value for the $i^{\\text{th}}$ training example.\n6. $x_j^{(i)}$ is the value of feature $j$ for the $i^{\\text{th}}$ training example.\n\n### Key Points\n- **Learning Rate**: The choice of learning rate is crucial for the convergence and performance of gradient descent. \n  - A small learning rate may lead to slow convergence.\n  - A large learning rate may cause overshooting and divergence.\n- **Number of Iterations**: The number of iterations determines how long the algorithm runs before it converges or stops.\n\n### Practical Implementation\nImplementing gradient descent involves:\n1. Initializing the weights $\\theta$ (with 0s in our case).\n2. Computing the predicted values for all training examples using the current weights.\n3. Computing the gradient of the cost by averaging the errors, weighted by each feature, over all training examples.\n4. Updating the weights by subtracting the gradient scaled by the learning rate, and repeating steps 2 to 4 for the given number of iterations.",
+  "starter_code": "import numpy as np\n\ndef linear_regression_gradient_descent(X: np.ndarray, y: np.ndarray, alpha: float, iterations: int) -> np.ndarray:\n\t\"\"\"\n\tPerform linear regression using gradient descent.\n\n\tm = number of training examples\n\tn = number of parameters: the intercept plus n - 1 features (the first column of X is all ones)\n\n\tX: shape (m, n), `m` training examples, each a row of `n` values starting with 1 for the intercept\n\ty: shape (m,) or (m, 1) array with the target values (ground truth)\n\talpha: learning rate\n\titerations: number of gradient descent steps\n\t\"\"\"\n\n\tm, n = X.shape\n\ty = y.reshape(-1, 1) \t# Make sure y is a column vector\n\ttheta = np.zeros((n, 1))\n\n\t# TODO: Your code here\n\n\treturn np.round(theta.flatten(), 4) \t# Rounded to 4 decimals",
+  "solution": "\nimport numpy as np\ndef linear_regression_gradient_descent(X: np.ndarray, y: np.ndarray, alpha: float, iterations: int) -> np.ndarray:\n    \"\"\"\n    Perform linear regression using gradient descent.\n\n    m = number of training examples\n    n = number of parameters: the intercept plus n - 1 features (the first column of X is all ones)\n\n    X: shape (m, n), `m` training examples, each a row of `n` values starting with 1 for the intercept\n    y: shape (m,) or (m, 1) array with the target values (ground truth)\n    alpha: learning rate\n    iterations: number of gradient descent steps\n    \"\"\"\n\n    m, n = X.shape\n    y = y.reshape(-1, 1)    # Make sure y is a column vector\n    theta = np.zeros((n, 1))                    # 1. Initializing the weights with 0s\n    for _ in range(iterations):\n        predictions = X @ theta                 # 2. Compute predictions h(x)\n        errors = predictions - y                # 3. Compute error = h - y\n        gradient = alpha * (X.T @ errors) / m   # 4. Compute gradient, scaled by alpha\n        theta -= gradient                       # 5. Update weights for each iteration\n    return np.round(theta.flatten(), 4)",
   "example": {
     "input": "X = np.array([[1, 1], [1, 2], [1, 3]]), y = np.array([1, 2, 3]), alpha = 0.01, iterations = 1000",
     "output": "np.array([0.1107, 0.9513])",
@@ -35,6 +39,14 @@
     {
       "test": "print(linear_regression_gradient_descent(np.array([[1, 1], [1, 2], [1, 3]]), np.array([1, 2, 3]), 0.01, 1000))",
       "expected_output": "[0.1107, 0.9513]"
+    },
+    {
+      "test": "print(linear_regression_gradient_descent(np.array([[1, 5], [1, 7], [1, 9]]), np.array([10, 14, 18]), 0.01, 5000))",
+      "expected_output": "[0.0211, 1.9971]"
+    },
+    {
+      "test": "print(linear_regression_gradient_descent(np.array([[1, 1, 0], [1, 2, 1], [1, 3, 2], [1, 4, 3]]), np.array([5, 6, 7, 8]), 0.05, 2000))",
+      "expected_output": "[3.0, 2.0, -1.0]"
     }
   ],
   "tinygrad_starter_code": "from tinygrad.tensor import Tensor\n\ndef linear_regression_gradient_descent_tg(X, y, alpha, iterations) -> Tensor:\n    \"\"\"\n    Solve linear regression via gradient descent using tinygrad autograd.\n    X: Tensor or convertible shape (m,n); y: shape (m,) or (m,1).\n    alpha: learning rate; iterations: number of steps.\n    Returns a 1-D Tensor of length n, rounded to 4 decimals.\n    \"\"\"\n    X_t = Tensor(X).float()\n    y_t = Tensor(y).float().reshape(-1,1)\n    m, n = X_t.shape\n    theta = Tensor([[0.0] for _ in range(n)])\n    # Your implementation here\n    pass",
diff --git a/questions/15_linear-regression-using-gradient-descent/learn.md b/questions/15_linear-regression-using-gradient-descent/learn.md
@@ -11,12 +11,12 @@ $$
 $$
 
 ### Explanation of Terms
-1. \( \alpha \) is the learning rate.
-2. \( m \) is the number of training examples.
-3. \( h_{\theta}(x^{(i)}) \) is the hypothesis function at iteration \( i \).
-4. \( x^{(i)} \) is the feature vector of the \( i^{\text{th}} \) training example.
-5. \( y^{(i)} \) is the actual target value for the \( i^{\text{th}} \) training example.
-6. \( x_j^{(i)} \) is the value of feature \( j \) for the \( i^{\text{th}} \) training example.
+1. $\alpha$ is the learning rate.
+2. $m$ is the number of training examples.
+3. $h_{\theta}(x^{(i)})$ is the hypothesis, i.e. the model's prediction $\theta^{T} x^{(i)}$ for the $i^{\text{th}}$ training example.
+4. $x^{(i)}$ is the feature vector of the $i^{\text{th}}$ training example.
+5. $y^{(i)}$ is the actual target value for the $i^{\text{th}}$ training example.
+6. $x_j^{(i)}$ is the value of feature $j$ for the $i^{\text{th}}$ training example.
 
 ### Key Points
 - **Learning Rate**: The choice of learning rate is crucial for the convergence and performance of gradient descent. 
@@ -25,4 +25,8 @@ $$
 - **Number of Iterations**: The number of iterations determines how long the algorithm runs before it converges or stops.
 
 ### Practical Implementation
-Implementing gradient descent involves initializing the weights, computing the gradient of the cost function, and iteratively updating the weights according to the update rule.
+Implementing gradient descent involves:
+1. Initializing the weights $\theta$ (with 0s in our case).
+2. Computing the predicted values for all training examples using the current weights.
+3. Computing the gradient of the cost by averaging the errors, weighted by each feature, over all training examples.
+4. Updating the weights by subtracting the gradient scaled by the learning rate, and repeating steps 2 to 4 for the given number of iterations.
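The per-weight update rule in learn.md updates every $\theta_j$ simultaneously, using predictions from the old weights; stacked over all $j$ it becomes $\theta := \theta - \frac{\alpha}{m} X^{T}(X\theta - y)$, which is the expression solution.py evaluates. A minimal sketch, using the hypothetical helper names `update_per_weight` and `update_vectorized`, checks that the explicit loop over $j$ and the vectorized form agree for one step:

```python
import numpy as np


def update_per_weight(theta, X, y, alpha):
    """One step of the learn.md rule, applied to each theta_j with an explicit loop."""
    m, n = X.shape
    h = X @ theta  # predictions from the *old* weights, shared by every j
    new_theta = theta.copy()
    for j in range(n):
        grad_j = sum((h[i] - y[i]) * X[i, j] for i in range(m)) / m
        new_theta[j] = theta[j] - alpha * grad_j
    return new_theta


def update_vectorized(theta, X, y, alpha):
    """The same step for all weights at once: theta - (alpha / m) * X^T (X theta - y)."""
    m = X.shape[0]
    return theta - (alpha / m) * (X.T @ (X @ theta - y))


X = np.array([[1, 1], [1, 2], [1, 3]], dtype=float)
y = np.array([1, 2, 3], dtype=float)
theta = np.array([0.5, -0.25])  # arbitrary non-zero starting weights

print(np.allclose(update_per_weight(theta, X, y, 0.01),
                  update_vectorized(theta, X, y, 0.01)))  # True
```

Computing all predictions `h` before touching any weight is what makes the update simultaneous; updating `theta[j]` in place and recomputing `h` inside the loop would be a different (coordinate-wise) algorithm.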
diff --git a/questions/15_linear-regression-using-gradient-descent/meta.json b/questions/15_linear-regression-using-gradient-descent/meta.json
@@ -18,6 +18,10 @@
     {
       "profile_link": "https://www.youtube.com/@StoatScript/videos",
       "name": "StoatScript"
+    },
+    {
+      "profile_link": "https://github.com/ana-baltaretu",
+      "name": "anisca22"
     }
   ],
   "tinygrad_difficulty": "medium",
diff --git a/questions/15_linear-regression-using-gradient-descent/solution.py b/questions/15_linear-regression-using-gradient-descent/solution.py
@@ -1,11 +1,24 @@
 
 import numpy as np
 def linear_regression_gradient_descent(X: np.ndarray, y: np.ndarray, alpha: float, iterations: int) -> np.ndarray:
+    """
+    Perform linear regression using gradient descent.
+
+    m = number of training examples
+    n = number of parameters: the intercept plus n - 1 features (the first column of X is all ones)
+
+    X: shape (m, n), `m` training examples, each a row of `n` values starting with 1 for the intercept
+    y: shape (m,) or (m, 1) array with the target values (ground truth)
+    alpha: learning rate
+    iterations: number of gradient descent steps
+    """
+
     m, n = X.shape
-    theta = np.zeros((n, 1))
+    y = y.reshape(-1, 1)    # Make sure y is a column vector
+    theta = np.zeros((n, 1))                    # 1. Initializing the weights with 0s
     for _ in range(iterations):
-        predictions = X @ theta
-        errors = predictions - y.reshape(-1, 1)
-        updates = X.T @ errors / m
-        theta -= alpha * updates
+        predictions = X @ theta                 # 2. Compute predictions h(x)
+        errors = predictions - y                # 3. Compute error = h - y
+        gradient = alpha * (X.T @ errors) / m   # 4. Compute gradient, scaled by alpha
+        theta -= gradient                       # 5. Update weights for each iteration
     return np.round(theta.flatten(), 4)
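The Key Points in learn.md say a small learning rate converges slowly and a large one can diverge. A sketch, assuming the first test case and 100 iterations (the helper names `run_gd` and `cost` are hypothetical), reruns the solution's update without rounding and reports the half-MSE cost $J(\theta) = \frac{1}{2m}\sum_i (h_\theta(x^{(i)}) - y^{(i)})^2$, whose gradient is what the loop subtracts. For this `X`, the largest eigenvalue of $X^{T}X/m$ is about 5.55, so the iteration is stable only for `alpha` below roughly $2/5.55 \approx 0.36$, and `0.4` should blow up:

```python
import numpy as np


def run_gd(X, y, alpha, iterations):
    """Unrounded copy of the solution's batch update (hypothetical helper name)."""
    m, n = X.shape
    y = y.reshape(-1, 1)
    theta = np.zeros((n, 1))
    for _ in range(iterations):
        theta -= alpha * (X.T @ (X @ theta - y)) / m
    return theta


def cost(X, y, theta):
    """Half mean squared error J(theta); its gradient is X^T (X theta - y) / m."""
    residual = X @ theta - y.reshape(-1, 1)
    return 0.5 * float(np.mean(residual ** 2))


X = np.array([[1, 1], [1, 2], [1, 3]], dtype=float)
y = np.array([1, 2, 3], dtype=float)

print(f"start: cost = {cost(X, y, np.zeros((2, 1))):.3g}")
for alpha in (0.01, 0.3, 0.4):
    theta = run_gd(X, y, alpha, iterations=100)
    print(f"alpha={alpha}: cost after 100 steps = {cost(X, y, theta):.3g}")
```

With `alpha=0.01` the cost is still falling after 100 steps, `alpha=0.3` is already close to the minimum, and `alpha=0.4` grows by many orders of magnitude instead of shrinking.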
diff --git a/questions/15_linear-regression-using-gradient-descent/starter_code.py b/questions/15_linear-regression-using-gradient-descent/starter_code.py
@@ -1,6 +1,22 @@
 import numpy as np
+
 def linear_regression_gradient_descent(X: np.ndarray, y: np.ndarray, alpha: float, iterations: int) -> np.ndarray:
-	# Your code here, make sure to round
+	"""
+	Perform linear regression using gradient descent.
+
+	m = number of training examples
+	n = number of parameters: the intercept plus n - 1 features (the first column of X is all ones)
+
+	X: shape (m, n), `m` training examples, each a row of `n` values starting with 1 for the intercept
+	y: shape (m,) or (m, 1) array with the target values (ground truth)
+	alpha: learning rate
+	iterations: number of gradient descent steps
+	"""
+
 	m, n = X.shape
+	y = y.reshape(-1, 1) 	# Make sure y is a column vector
 	theta = np.zeros((n, 1))
-	return theta
+
+	# TODO: Your code here
+
+	return np.round(theta.flatten(), 4) 	# Rounded to 4 decimals
diff --git a/questions/15_linear-regression-using-gradient-descent/tests.json b/questions/15_linear-regression-using-gradient-descent/tests.json
@@ -2,5 +2,13 @@
   {
     "test": "print(linear_regression_gradient_descent(np.array([[1, 1], [1, 2], [1, 3]]), np.array([1, 2, 3]), 0.01, 1000))",
     "expected_output": "[0.1107, 0.9513]"
+  },
+  {
+    "test": "print(linear_regression_gradient_descent(np.array([[1, 5], [1, 7], [1, 9]]), np.array([10, 14, 18]), 0.01, 5000))",
+    "expected_output": "[0.0211, 1.9971]"
+  },
+  {
+    "test": "print(linear_regression_gradient_descent(np.array([[1, 1, 0], [1, 2, 1], [1, 3, 2], [1, 4, 3]]), np.array([5, 6, 7, 8]), 0.05, 2000))",
+    "expected_output": "[3.0, 2.0, -1.0]"
   }
 ]
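The two new expected outputs can be cross-checked against closed-form least squares. In the third test, the last column of `X` equals the second column minus the first, so `X` has rank 2 and infinitely many weight vectors fit the data exactly. Because gradient descent starts from zeros and every update $X^{T}(X\theta - y)$ lies in the row space of `X`, the iterates converge to the minimum-norm solution, which is also what `np.linalg.pinv` returns; that is why `[3.0, 2.0, -1.0]` is the expected answer rather than some other exact fit. In the second test the exact fit is $y = 2x$, so `[0.0211, 1.9971]` reflects that 5000 steps at `alpha = 0.01` have not fully converged. A sketch of the check, where `gd` is a hypothetical unrounded copy of the solution loop:

```python
import numpy as np


def gd(X, y, alpha, iterations):
    """Unrounded copy of the solution's loop (hypothetical helper name)."""
    m, n = X.shape
    y = y.reshape(-1, 1)
    theta = np.zeros((n, 1))
    for _ in range(iterations):
        theta -= alpha * (X.T @ (X @ theta - y)) / m
    return theta.flatten()


# Third test: column 3 = column 2 - column 1, so X3 is rank-deficient.
X3 = np.array([[1, 1, 0], [1, 2, 1], [1, 3, 2], [1, 4, 3]], dtype=float)
y3 = np.array([5, 6, 7, 8], dtype=float)
print(np.linalg.matrix_rank(X3))             # 2
print(np.round(np.linalg.pinv(X3) @ y3, 4))  # minimum-norm least-squares weights
print(np.round(gd(X3, y3, 0.05, 2000), 4))   # gradient descent from zeros

# Second test: the data lie exactly on y = 2x.
X2 = np.array([[1, 5], [1, 7], [1, 9]], dtype=float)
y2 = np.array([10, 14, 18], dtype=float)
print(np.round(np.linalg.lstsq(X2, y2, rcond=None)[0], 4))  # exact fit
print(np.round(gd(X2, y2, 0.01, 5000), 4))                  # not fully converged
```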
