2 changes: 1 addition & 1 deletion build/100.json
@@ -14,7 +14,7 @@
],
"marimo_link": "https://open-deep-ml.github.io/DML-OpenProblem/problem-softsign",
"description": "Implement the Softsign activation function, a smooth activation function used in neural networks. Your task is to compute the Softsign value for a given input, ensuring the output is bounded between -1 and 1.",
"learn_section": "## Understanding the Softsign Activation Function\n\nThe Softsign activation function is a smooth, non-linear activation function used in neural networks. It’s similar to the hyperbolic tangent (tanh) function but with different properties, particularly in its tails which approach their limits more slowly.\n\n### Mathematical Definition\n\nThe Softsign function is mathematically defined as:\n\n$$\nSoftsign(x) = \\frac{x}{1 + |x|}\n$$\n\nWhere:\n- $x$ is the input to the function\n- $|x|$ represents the absolute value of $x$\n\n### Characteristics\n\n- **Output Range:** The output is bounded between -1 and 1, approaching these values asymptotically as $x$ approaches $\\pm \\infty$.\n- **Shape:** The function has an S-shaped curve, similar to tanh but with a smoother approach to its asymptotes.\n- **Gradient:** The gradient is smoother and more gradual compared to tanh, which can help prevent vanishing gradient problems in deep networks.\n- **Symmetry:** The function is symmetric around the origin $(0,0)$.\n\n### Key Properties\n\n- **Bounded Output:** Unlike ReLU, Softsign naturally bounds its output between -1 and 1.\n- **Smoothness:** The function is continuous and differentiable everywhere.\n- **No Saturation:** The gradients approach zero more slowly than in tanh or sigmoid functions.\n- **Zero-Centered:** The function crosses through the origin, making it naturally zero-centered.\n\nThis activation function can be particularly useful in scenarios where you need bounded outputs with more gradual saturation compared to tanh or sigmoid functions.",
"learn_section": "## Understanding the Softsign Activation Function\n\nThe Softsign activation function is a smooth, non-linear activation function used in neural networks. It�s similar to the hyperbolic tangent (tanh) function but with different properties, particularly in its tails which approach their limits more slowly.\n\n### Mathematical Definition\n\nThe Softsign function is mathematically defined as:\n\n$$\nSoftsign(x) = \\frac{x}{1 + |x|}\n$$\n\nWhere:\n- $x$ is the input to the function\n- $|x|$ represents the absolute value of $x$\n\n### Characteristics\n\n- **Output Range:** The output is bounded between -1 and 1, approaching these values asymptotically as $x$ approaches $\\pm \\infty$.\n- **Shape:** The function has an S-shaped curve, similar to tanh but with a smoother approach to its asymptotes.\n- **Gradient:** The gradient is smoother and more gradual compared to tanh, which can help prevent vanishing gradient problems in deep networks.\n- **Symmetry:** The function is symmetric around the origin $(0,0)$.\n\n### Key Properties\n\n- **Bounded Output:** Unlike ReLU, Softsign naturally bounds its output between -1 and 1.\n- **Smoothness:** The function is continuous and differentiable everywhere.\n- **No Saturation:** The gradients approach zero more slowly than in tanh or sigmoid functions.\n- **Zero-Centered:** The function crosses through the origin, making it naturally zero-centered.\n\nThis activation function can be particularly useful in scenarios where you need bounded outputs with more gradual saturation compared to tanh or sigmoid functions.",
"starter_code": "def softsign(x: float) -> float:\n\t\"\"\"\n\tImplements the Softsign activation function.\n\n\tArgs:\n\t\tx (float): Input value\n\n\tReturns:\n\t\tfloat: The Softsign of the input\t\"\"\"\n\t# Your code here\n\tpass\n\treturn round(val,4)",
"solution": "def softsign(x: float) -> float:\n \"\"\"\n Implements the Softsign activation function.\n\n Args:\n x (float): Input value\n\n Returns:\n float: The Softsign of the input, calculated as x/(1 + |x|)\n \"\"\"\n return round(x / (1 + abs(x)), 4)",
"example": {
42 changes: 0 additions & 42 deletions build/101.json
@@ -1,42 +0,0 @@
{
"id": "101",
"title": "Implement the GRPO Objective Function",
"difficulty": "hard",
"category": "Reinforcement Learning",
"video": "",
"likes": "0",
"dislikes": "0",
"contributor": [
{
"profile_link": "https://github.com/moe18",
"name": "Moe Chabot"
}
],
"description": "Implement the GRPO (Generalized Relative Policy Optimization) objective function used to optimize policy parameters in reinforcement learning. Your task is to compute the GRPO objective given the likelihood ratios, advantage estimates, old policy probabilities, reference policy probabilities, and apply the clipping mechanism and KL divergence penalty correctly to maintain training stability.",
"learn_section": "### Understanding GRPO (Generalized Relative Policy Optimization)\n\nGRPO is an advanced policy optimization algorithm in reinforcement learning that updates policy parameters while ensuring training stability. It builds upon Proximal Policy Optimization (PPO) by incorporating a KL divergence penalty to keep the new policy close to a reference policy.\n\n### Mathematical Definition\n\nThe GRPO objective function is defined as:\n\n$$\nJ_{GRPO}(\\theta) = \\mathbb{E}_{q \\sim P(Q), \\{o_i\\}_{i=1}^G \\sim \\pi_{\\theta_{old}}(O|q)} \\left[ \\frac{1}{G} \\sum_{i=1}^G \\min\\left( \\rho_i A_i, \\text{clip}(\\rho_i, 1-\\epsilon, 1+\\epsilon) A_i \\right) - \\beta D_{KL}(\\pi_{\\theta} \\| \\pi_{ref}) \\right]\n$$\n\nWhere:\n\n- $\\rho_i = \\frac{\\pi_{\\theta}(o_i | q)}{\\pi_{\\theta_{old}}(o_i | q)}$ is the likelihood ratio.\n- $A_i$ is the advantage estimate for the $i$-th action.\n- $\\epsilon$ is the clipping parameter.\n- $\\beta$ controls the influence of the KL divergence penalty.\n- $D_{KL}$ is the Kullback-Leibler divergence between the new policy $\\pi_{\\theta}$ and the reference policy $\\pi_{ref}$.\n\n### Key Components\n\n#### Likelihood Ratio $\\rho_i$\n- Measures how much more likely the new policy $\\pi_{\\theta}$ is to produce an output $o_i$ compared to the old policy $\\pi_{\\theta_{old}}$.\n- $$\\rho_i = \\frac{\\pi_{\\theta}(o_i | q)}{\\pi_{\\theta_{old}}(o_i | q)}$$\n\n#### Advantage Function $A_i$\n- Evaluates the benefit of taking action $o_i$ compared to the average action.\n- $$A_i = \\frac{r_i - \\text{mean}(r_1, \\ldots, r_G)}{\\text{std}(r_1, \\ldots, r_G)}$$\n- Where $r_i$ is the reward for the $i$-th action.\n\n#### Clipping Mechanism\n- Restricts the likelihood ratio to the range $[1 - \\epsilon, 1 + \\epsilon]$ to prevent large updates.\n- $$\\text{clip}(\\rho_i, 1 - \\epsilon, 1 + \\epsilon)$$\n\n#### KL Divergence Penalty\n- Ensures the new policy $\\pi_{\\theta}$ does not deviate significantly from the reference policy $\\pi_{ref}$.\n- $$-\\beta D_{KL}(\\pi_{\\theta} \\| \\pi_{ref})$$\n\n### Benefits of GRPO\n\n#### Stability\n- The clipping mechanism prevents drastic policy updates, ensuring stable training.\n\n#### Controlled Exploration\n- The KL divergence penalty maintains a balance between exploring new policies and sticking close to a reliable reference policy.\n\n#### Improved Performance\n- By carefully managing policy updates, GRPO can lead to more effective learning and better policy performance.\n\n### Use Cases\n\n#### Reinforcement Learning Tasks\n- Suitable for environments requiring stable and efficient policy updates.\n- also a key component used for the DeepSeek-R1 model\n\n#### Complex Decision-Making Problems\n- Effective in scenarios with high-dimensional action spaces where maintaining policy stability is crucial.\n\n### Conclusion\n\nGRPO enhances policy optimization in reinforcement learning by combining the benefits of PPO with an additional KL divergence penalty. This ensures that policy updates are both effective and stable, leading to more reliable and performant learning agents.",
"starter_code": "import numpy as np\n\ndef grpo_objective(rhos, A, pi_theta_old, pi_theta_ref, epsilon=0.2, beta=0.01) -> float:\n\t\"\"\"\n\tCompute the GRPO objective function.\n\n\tArgs:\n\t\trhos: List of likelihood ratios (p_i) = pi_theta(o_i | q) / pi_theta_old(o_i | q).\n\t\tA: List of advantage estimates (A_i).\n\t\tpi_theta_old: List representing the old policy probabilities pi_theta_old(o_i | q).\n\t\tpi_theta_ref: List representing the reference policy probabilities pi_ref(o_i | q).\n\t\tepsilon: Clipping parameter (eps).\n\t\tbeta: KL divergence penalty coefficient (beta).\n\n\tReturns:\n\t\tThe computed GRPO objective value.\n\t\"\"\"\n\t# Your code here\n\tpass",
"solution": "import numpy as np\n\ndef grpo_objective(rhos, A, pi_theta_old, pi_theta_ref, epsilon=0.2, beta=0.01) -> float:\n \"\"\"\n Compute the GRPO objective function.\n\n Args:\n rhos: List of likelihood ratios (ρ_i) = π_theta(o_i | q) / π_theta_old(o_i | q).\n A: List of advantage estimates (A_i).\n pi_theta_old: List representing the old policy probabilities π_theta_old(o_i | q).\n pi_theta_ref: List representing the reference policy probabilities π_ref(o_i | q).\n epsilon: Clipping parameter (ϵ).\n beta: KL divergence penalty coefficient (β).\n\n Returns:\n The computed GRPO objective value.\n \"\"\"\n G = len(rhos)\n if not (len(A) == len(pi_theta_old) == len(pi_theta_ref) == G):\n raise ValueError(\"All input lists must have the same length.\")\n \n # Compute clipped likelihood ratios\n clipped_rhos = np.clip(rhos, 1 - epsilon, 1 + epsilon)\n \n # Compute the minimum terms for the objective\n unclipped = np.array(rhos) * np.array(A)\n clipped = clipped_rhos * np.array(A)\n min_terms = np.minimum(unclipped, clipped)\n average_min = np.mean(min_terms)\n \n # Compute pi_theta from rhos and pi_theta_old\n pi_theta = np.array(rhos) * np.array(pi_theta_old)\n \n # Normalize pi_theta and pi_theta_ref to ensure they are valid probability distributions\n pi_theta /= np.sum(pi_theta)\n pi_theta_ref /= np.sum(pi_theta_ref)\n \n # Compute KL divergence D_KL(pi_theta || pi_theta_ref)\n kl_divergence = np.sum(pi_theta * np.log(pi_theta / pi_theta_ref + 1e-10)) # Added epsilon to avoid log(0)\n \n # Compute the final objective\n objective = average_min - beta * kl_divergence\n \n return objective",
"example": {
"input": "grpo_objective([1.2, 0.8, 1.1], [1.0, 1.0, 1.0], [0.9, 1.1, 1.0], [1.0, 0.5, 1.5], epsilon=0.2, beta=0.01)",
"output": "1.032749",
"reasoning": "The function calculates the GRPO objective by first clipping the likelihood ratios, computing the minimum terms, averaging them, and then subtracting the KL divergence penalty scaled by beta."
},
"test_cases": [
{
"test": "print(round(grpo_objective([1.2, 0.8, 1.1], [1.0, 1.0, 1.0], [0.9, 1.1, 1.0], [1.0, 0.5, 1.5], epsilon=0.2, beta=0.01),6))",
"expected_output": "1.032749"
},
{
"test": "print(round(grpo_objective([0.9, 1.1], [1.0, 1.0], [1.0, 1.0], [0.8, 1.2], epsilon=0.1, beta=0.05),6))",
"expected_output": "0.999743"
},
{
"test": "print(round(grpo_objective([1.5, 0.5, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.2, 0.7, 1.3], epsilon=0.15, beta=0.02),6))",
"expected_output": "0.882682"
},
{
"test": "print(round(grpo_objective([1.0], [1.0], [1.0], [1.0], epsilon=0.1, beta=0.01),6))",
"expected_output": "1.0"
}
]
}
5 changes: 5 additions & 0 deletions questions/189_im2col/description.md
@@ -0,0 +1,5 @@
## Problem

Write a function that implements the Im2Col (Image-to-Column) optimization technique given the 2D input image, convolutional kernel, and stride.

Return the local image patches as rows of a 2D array, with each patch flattened into a single row. Assume the image has already been padded.
5 changes: 5 additions & 0 deletions questions/189_im2col/example.json
@@ -0,0 +1,5 @@
{
"input": "import numpy as np\n\ninput_matrix = np.array([\n [2,4,6,7],\n [9,2,6,5],\n [1,2,3,4],\n [5,6,7,8]\n])\n\nkernel = np.array([\n [1,0],\n [-1,1]\n])\n\nstride = 1\n\noutput = im2col(input_matrix, kernel, stride)\nprint(output)",
"output": "[[2. 4. 9. 2.]\n [4. 6. 2. 6.]\n [6. 7. 6. 5.]\n [9. 2. 1. 2.]\n [2. 6. 2. 3.]\n [6. 5. 3. 4.]\n [1. 2. 5. 6.]\n [2. 3. 6. 7.]\n [3. 4. 7. 8.]]",
"reasoning": "The function extracts each patch from the input image based on the kernel size and stride, then flattens each patch into a 1D vector. Each row of the output corresponds to one local patch. For example, the kernel is 2x2, so the first window covers [2,4,9,2], then shifts across the image horizontally and vertically until all patches are collected."
}
39 changes: 39 additions & 0 deletions questions/189_im2col/learn.md
@@ -0,0 +1,39 @@

# Learn Section

# What is Im2col?
- Also known as Image to Column, *im2col* is a patch extraction operation designed to convert a convolution into a general matrix multiplication (GEMM) problem. It reorganizes overlapping image patches into rows of a 2D matrix so that the convolution kernel can be applied using efficient linear algebra routines.

## Why are we adding a step to convert into matrix multiplication instead of directly doing element-wise convolution?
- Decades of optimization have gone into GPU and CPU matrix multiplication libraries (e.g., cuBLAS, MKL). By reformulating convolution as GEMM, we can leverage these highly optimized routines.
- Direct element-wise convolution, on the other hand, performs worse because the sliding-window access pattern leads to irregular, non-contiguous memory reads.
- In the im2col layout, pixels are arranged contiguously in memory, enabling faster and more predictable access patterns.
- Although im2col introduces some data redundancy (overlapping patches share values), the performance gains from contiguous memory access and GEMM vastly outweigh this cost.

# Implementation of Im2col
A simple way to visualize im2col is to imagine **sliding a kernel window** across an image and recording each window as one row of a new matrix.

For example, a 3x3 kernel sliding over a 6x6 image with a stride of 1 produces a matrix of shape `(16, 9)`: one row per patch and one column per kernel element. This is because the convolution output has:

$$
H_{out} = \left\lfloor \frac{H_{in} - k_{h}}{s} \right\rfloor + 1
$$
$$
W_{out} = \left\lfloor \frac{W_{in} - k_{w}}{s} \right\rfloor + 1
$$

and the im2col output, with one flattened patch per row as used in this problem, has shape:
$$
I_{out} = (H_{out} \cdot W_{out},\ C_{in} \cdot k_h \cdot k_w)
$$
(The classical im2col layout stores patches as columns, hence the name; this problem uses the transposed, row-per-patch layout.)

Where:
- $H_{in}, W_{in}$: Input image dimensions
- $k_h, k_w$: Kernel (filter) height and width
- $s$: Stride
- $C_{in}$: Number of input channels

> **Note:** Padding is assumed to be 0 in these equations
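
Plugging the 6x6 image, 3x3 kernel, stride-1 example above into these formulas (single channel, so $C_{in} = 1$):

$$
H_{out} = \left\lfloor \frac{6 - 3}{1} \right\rfloor + 1 = 4, \qquad W_{out} = \left\lfloor \frac{6 - 3}{1} \right\rfloor + 1 = 4
$$
$$
I_{out} = (4 \cdot 4,\ 1 \cdot 3 \cdot 3) = (16, 9)
$$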

By flattening each local patch, the convolution becomes equivalent to a matrix multiplication between the flattened image matrix (im2col output) and the flattened kernel weights, which allows for GEMM optimizations.
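
To make the GEMM equivalence concrete, here is a minimal sketch (not part of the graded solution) that checks an im2col-based convolution against a direct nested-loop version. It assumes the `im2col` function from this problem is in scope and uses a single-channel image with the row-per-patch layout described above:

```python
import numpy as np

def naive_conv2d(img: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Direct sliding-window convolution (cross-correlation), used only for comparison."""
    kh, kw = kernel.shape
    out_h = (img.shape[0] - kh) // stride + 1
    out_w = (img.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            window = img[y * stride:y * stride + kh, x * stride:x * stride + kw]
            out[y, x] = np.sum(window * kernel)
    return out

img = np.arange(36, dtype=float).reshape(6, 6)    # 6x6 single-channel image
kernel = np.arange(9, dtype=float).reshape(3, 3)  # 3x3 kernel

cols = im2col(img, kernel, stride=1)                 # shape (16, 9): one flattened patch per row
gemm_out = (cols @ kernel.flatten()).reshape(4, 4)   # convolution as a single matrix product

assert np.allclose(gemm_out, naive_conv2d(img, kernel, stride=1))
```

For multi-channel inputs or multiple kernels the same idea applies: each kernel is flattened into a column of a weight matrix, and the product becomes a full general matrix multiplication.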

15 changes: 15 additions & 0 deletions questions/189_im2col/meta.json
@@ -0,0 +1,15 @@
{
"id": "189",
"title": "Im2Col",
"difficulty": "medium",
"category": "Computer Vision",
"video": "",
"likes": "0",
"dislikes": "0",
"contributor": [
{
"profile_link": "https://github.com/reeeeemo",
"name": "Robert Oxley"
}
]
}
2 changes: 2 additions & 0 deletions questions/189_im2col/pytorch/solution.py
@@ -0,0 +1,2 @@
def your_function(...):
...
2 changes: 2 additions & 0 deletions questions/189_im2col/pytorch/starter_code.py
@@ -0,0 +1,2 @@
def your_function(...):
pass
6 changes: 6 additions & 0 deletions questions/189_im2col/pytorch/tests.json
@@ -0,0 +1,6 @@
[
{
"test": "print(your_function(...))",
"expected_output": "..."
}
]
36 changes: 36 additions & 0 deletions questions/189_im2col/solution.py
@@ -0,0 +1,36 @@
import numpy as np

def im2col(img: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
"""
Converts a 2D image into a collection of flattened patches.

Each patch corresponds to a region covered by the convolution kernel determined by the specified stride.

Padding is assumed to be handled externally.
Args:
img (np.ndarray): image to flatten
kernel (np.ndarray): convolution kernel
stride (int): step size between patches
Returns:
np.ndarray: 2D array where each row is a flattened image patch
"""

# unpack shapes
img_h, img_w = img.shape
kern_h, kern_w = kernel.shape

# desired output shape ( (input - filter) / stride ) + 1
out_h = (img_h - kern_h) // stride + 1
out_w = (img_w - kern_w) // stride + 1

cols = np.zeros((out_h * out_w, kern_h * kern_w))

# iterate over every patch (stepping by the stride), flatten it, and store it as one row of the 2D output array
col_idx = 0
for y in range(0, out_h * stride, stride):
for x in range(0, out_w * stride, stride):
patch = img[y:y + kern_h, x:x + kern_w]
cols[col_idx, :] = patch.flatten()
col_idx += 1

return cols
25 changes: 25 additions & 0 deletions questions/189_im2col/starter_code.py
@@ -0,0 +1,25 @@
import numpy as np

def im2col(img: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
"""
Converts a 2D image into a collection of flattened patches.

Each patch corresponds to a region covered by the convolution kernel determined by the specified stride.

Padding is assumed to be handled externally.
Args:
img (np.ndarray): image to flatten
kernel (np.ndarray): convolution kernel
stride (int): step size between patches
Returns:
np.ndarray: 2D array where each row is a flattened image patch
"""

# unpack shapes
img_h, img_w = img.shape
kern_h, kern_w = kernel.shape

# TODO: IM2COL algorithm
cols = np.array([])

return cols