2 changes: 1 addition & 1 deletion build/100.json
@@ -14,7 +14,7 @@
],
"marimo_link": "https://open-deep-ml.github.io/DML-OpenProblem/problem-softsign",
"description": "Implement the Softsign activation function, a smooth activation function used in neural networks. Your task is to compute the Softsign value for a given input, ensuring the output is bounded between -1 and 1.",
"learn_section": "## Understanding the Softsign Activation Function\n\nThe Softsign activation function is a smooth, non-linear activation function used in neural networks. It’s similar to the hyperbolic tangent (tanh) function but with different properties, particularly in its tails which approach their limits more slowly.\n\n### Mathematical Definition\n\nThe Softsign function is mathematically defined as:\n\n$$\nSoftsign(x) = \\frac{x}{1 + |x|}\n$$\n\nWhere:\n- $x$ is the input to the function\n- $|x|$ represents the absolute value of $x$\n\n### Characteristics\n\n- **Output Range:** The output is bounded between -1 and 1, approaching these values asymptotically as $x$ approaches $\\pm \\infty$.\n- **Shape:** The function has an S-shaped curve, similar to tanh but with a smoother approach to its asymptotes.\n- **Gradient:** The gradient is smoother and more gradual compared to tanh, which can help prevent vanishing gradient problems in deep networks.\n- **Symmetry:** The function is symmetric around the origin $(0,0)$.\n\n### Key Properties\n\n- **Bounded Output:** Unlike ReLU, Softsign naturally bounds its output between -1 and 1.\n- **Smoothness:** The function is continuous and differentiable everywhere.\n- **No Saturation:** The gradients approach zero more slowly than in tanh or sigmoid functions.\n- **Zero-Centered:** The function crosses through the origin, making it naturally zero-centered.\n\nThis activation function can be particularly useful in scenarios where you need bounded outputs with more gradual saturation compared to tanh or sigmoid functions.",
"learn_section": "## Understanding the Softsign Activation Function\n\nThe Softsign activation function is a smooth, non-linear activation function used in neural networks. It�s similar to the hyperbolic tangent (tanh) function but with different properties, particularly in its tails which approach their limits more slowly.\n\n### Mathematical Definition\n\nThe Softsign function is mathematically defined as:\n\n$$\nSoftsign(x) = \\frac{x}{1 + |x|}\n$$\n\nWhere:\n- $x$ is the input to the function\n- $|x|$ represents the absolute value of $x$\n\n### Characteristics\n\n- **Output Range:** The output is bounded between -1 and 1, approaching these values asymptotically as $x$ approaches $\\pm \\infty$.\n- **Shape:** The function has an S-shaped curve, similar to tanh but with a smoother approach to its asymptotes.\n- **Gradient:** The gradient is smoother and more gradual compared to tanh, which can help prevent vanishing gradient problems in deep networks.\n- **Symmetry:** The function is symmetric around the origin $(0,0)$.\n\n### Key Properties\n\n- **Bounded Output:** Unlike ReLU, Softsign naturally bounds its output between -1 and 1.\n- **Smoothness:** The function is continuous and differentiable everywhere.\n- **No Saturation:** The gradients approach zero more slowly than in tanh or sigmoid functions.\n- **Zero-Centered:** The function crosses through the origin, making it naturally zero-centered.\n\nThis activation function can be particularly useful in scenarios where you need bounded outputs with more gradual saturation compared to tanh or sigmoid functions.",
"starter_code": "def softsign(x: float) -> float:\n\t\"\"\"\n\tImplements the Softsign activation function.\n\n\tArgs:\n\t\tx (float): Input value\n\n\tReturns:\n\t\tfloat: The Softsign of the input\t\"\"\"\n\t# Your code here\n\tpass\n\treturn round(val,4)",
"solution": "def softsign(x: float) -> float:\n \"\"\"\n Implements the Softsign activation function.\n\n Args:\n x (float): Input value\n\n Returns:\n float: The Softsign of the input, calculated as x/(1 + |x|)\n \"\"\"\n return round(x / (1 + abs(x)), 4)",
"example": {
42 changes: 0 additions & 42 deletions build/101.json
@@ -1,42 +0,0 @@
{
"id": "101",
"title": "Implement the GRPO Objective Function",
"difficulty": "hard",
"category": "Reinforcement Learning",
"video": "",
"likes": "0",
"dislikes": "0",
"contributor": [
{
"profile_link": "https://github.com/moe18",
"name": "Moe Chabot"
}
],
"description": "Implement the GRPO (Generalized Relative Policy Optimization) objective function used to optimize policy parameters in reinforcement learning. Your task is to compute the GRPO objective given the likelihood ratios, advantage estimates, old policy probabilities, reference policy probabilities, and apply the clipping mechanism and KL divergence penalty correctly to maintain training stability.",
"learn_section": "### Understanding GRPO (Generalized Relative Policy Optimization)\n\nGRPO is an advanced policy optimization algorithm in reinforcement learning that updates policy parameters while ensuring training stability. It builds upon Proximal Policy Optimization (PPO) by incorporating a KL divergence penalty to keep the new policy close to a reference policy.\n\n### Mathematical Definition\n\nThe GRPO objective function is defined as:\n\n$$\nJ_{GRPO}(\\theta) = \\mathbb{E}_{q \\sim P(Q), \\{o_i\\}_{i=1}^G \\sim \\pi_{\\theta_{old}}(O|q)} \\left[ \\frac{1}{G} \\sum_{i=1}^G \\min\\left( \\rho_i A_i, \\text{clip}(\\rho_i, 1-\\epsilon, 1+\\epsilon) A_i \\right) - \\beta D_{KL}(\\pi_{\\theta} \\| \\pi_{ref}) \\right]\n$$\n\nWhere:\n\n- $\\rho_i = \\frac{\\pi_{\\theta}(o_i | q)}{\\pi_{\\theta_{old}}(o_i | q)}$ is the likelihood ratio.\n- $A_i$ is the advantage estimate for the $i$-th action.\n- $\\epsilon$ is the clipping parameter.\n- $\\beta$ controls the influence of the KL divergence penalty.\n- $D_{KL}$ is the Kullback-Leibler divergence between the new policy $\\pi_{\\theta}$ and the reference policy $\\pi_{ref}$.\n\n### Key Components\n\n#### Likelihood Ratio $\\rho_i$\n- Measures how much more likely the new policy $\\pi_{\\theta}$ is to produce an output $o_i$ compared to the old policy $\\pi_{\\theta_{old}}$.\n- $$\\rho_i = \\frac{\\pi_{\\theta}(o_i | q)}{\\pi_{\\theta_{old}}(o_i | q)}$$\n\n#### Advantage Function $A_i$\n- Evaluates the benefit of taking action $o_i$ compared to the average action.\n- $$A_i = \\frac{r_i - \\text{mean}(r_1, \\ldots, r_G)}{\\text{std}(r_1, \\ldots, r_G)}$$\n- Where $r_i$ is the reward for the $i$-th action.\n\n#### Clipping Mechanism\n- Restricts the likelihood ratio to the range $[1 - \\epsilon, 1 + \\epsilon]$ to prevent large updates.\n- $$\\text{clip}(\\rho_i, 1 - \\epsilon, 1 + \\epsilon)$$\n\n#### KL Divergence Penalty\n- Ensures the new policy $\\pi_{\\theta}$ does not deviate significantly from the reference policy $\\pi_{ref}$.\n- $$-\\beta D_{KL}(\\pi_{\\theta} \\| \\pi_{ref})$$\n\n### Benefits of GRPO\n\n#### Stability\n- The clipping mechanism prevents drastic policy updates, ensuring stable training.\n\n#### Controlled Exploration\n- The KL divergence penalty maintains a balance between exploring new policies and sticking close to a reliable reference policy.\n\n#### Improved Performance\n- By carefully managing policy updates, GRPO can lead to more effective learning and better policy performance.\n\n### Use Cases\n\n#### Reinforcement Learning Tasks\n- Suitable for environments requiring stable and efficient policy updates.\n- also a key component used for the DeepSeek-R1 model\n\n#### Complex Decision-Making Problems\n- Effective in scenarios with high-dimensional action spaces where maintaining policy stability is crucial.\n\n### Conclusion\n\nGRPO enhances policy optimization in reinforcement learning by combining the benefits of PPO with an additional KL divergence penalty. This ensures that policy updates are both effective and stable, leading to more reliable and performant learning agents.",
"starter_code": "import numpy as np\n\ndef grpo_objective(rhos, A, pi_theta_old, pi_theta_ref, epsilon=0.2, beta=0.01) -> float:\n\t\"\"\"\n\tCompute the GRPO objective function.\n\n\tArgs:\n\t\trhos: List of likelihood ratios (p_i) = pi_theta(o_i | q) / pi_theta_old(o_i | q).\n\t\tA: List of advantage estimates (A_i).\n\t\tpi_theta_old: List representing the old policy probabilities pi_theta_old(o_i | q).\n\t\tpi_theta_ref: List representing the reference policy probabilities pi_ref(o_i | q).\n\t\tepsilon: Clipping parameter (eps).\n\t\tbeta: KL divergence penalty coefficient (beta).\n\n\tReturns:\n\t\tThe computed GRPO objective value.\n\t\"\"\"\n\t# Your code here\n\tpass",
"solution": "import numpy as np\n\ndef grpo_objective(rhos, A, pi_theta_old, pi_theta_ref, epsilon=0.2, beta=0.01) -> float:\n \"\"\"\n Compute the GRPO objective function.\n\n Args:\n rhos: List of likelihood ratios (ρ_i) = π_theta(o_i | q) / π_theta_old(o_i | q).\n A: List of advantage estimates (A_i).\n pi_theta_old: List representing the old policy probabilities π_theta_old(o_i | q).\n pi_theta_ref: List representing the reference policy probabilities π_ref(o_i | q).\n epsilon: Clipping parameter (ϵ).\n beta: KL divergence penalty coefficient (β).\n\n Returns:\n The computed GRPO objective value.\n \"\"\"\n G = len(rhos)\n if not (len(A) == len(pi_theta_old) == len(pi_theta_ref) == G):\n raise ValueError(\"All input lists must have the same length.\")\n \n # Compute clipped likelihood ratios\n clipped_rhos = np.clip(rhos, 1 - epsilon, 1 + epsilon)\n \n # Compute the minimum terms for the objective\n unclipped = np.array(rhos) * np.array(A)\n clipped = clipped_rhos * np.array(A)\n min_terms = np.minimum(unclipped, clipped)\n average_min = np.mean(min_terms)\n \n # Compute pi_theta from rhos and pi_theta_old\n pi_theta = np.array(rhos) * np.array(pi_theta_old)\n \n # Normalize pi_theta and pi_theta_ref to ensure they are valid probability distributions\n pi_theta /= np.sum(pi_theta)\n pi_theta_ref /= np.sum(pi_theta_ref)\n \n # Compute KL divergence D_KL(pi_theta || pi_theta_ref)\n kl_divergence = np.sum(pi_theta * np.log(pi_theta / pi_theta_ref + 1e-10)) # Added epsilon to avoid log(0)\n \n # Compute the final objective\n objective = average_min - beta * kl_divergence\n \n return objective",
"example": {
"input": "grpo_objective([1.2, 0.8, 1.1], [1.0, 1.0, 1.0], [0.9, 1.1, 1.0], [1.0, 0.5, 1.5], epsilon=0.2, beta=0.01)",
"output": "1.032749",
"reasoning": "The function calculates the GRPO objective by first clipping the likelihood ratios, computing the minimum terms, averaging them, and then subtracting the KL divergence penalty scaled by beta."
},
"test_cases": [
{
"test": "print(round(grpo_objective([1.2, 0.8, 1.1], [1.0, 1.0, 1.0], [0.9, 1.1, 1.0], [1.0, 0.5, 1.5], epsilon=0.2, beta=0.01),6))",
"expected_output": "1.032749"
},
{
"test": "print(round(grpo_objective([0.9, 1.1], [1.0, 1.0], [1.0, 1.0], [0.8, 1.2], epsilon=0.1, beta=0.05),6))",
"expected_output": "0.999743"
},
{
"test": "print(round(grpo_objective([1.5, 0.5, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.2, 0.7, 1.3], epsilon=0.15, beta=0.02),6))",
"expected_output": "0.882682"
},
{
"test": "print(round(grpo_objective([1.0], [1.0], [1.0], [1.0], epsilon=0.1, beta=0.01),6))",
"expected_output": "1.0"
}
]
}
5 changes: 5 additions & 0 deletions questions/189_im2col/description.md
@@ -0,0 +1,5 @@
## Problem

Write a function that implements the Im2Col (Image-to-Column) optimization technique given the 2D input image, convolutional kernel, and stride.

Return the local image patches as rows of a 2D array, with each patch flattened into a single row. Assume the image has already been padded.
5 changes: 5 additions & 0 deletions questions/189_im2col/example.json
@@ -0,0 +1,5 @@
{
"input": "import numpy as np\n\ninput_matrix = np.array([\n [2,4,6,7],\n [9,2,6,5],\n [1,2,3,4],\n [5,6,7,8]\n])\n\nkernel = np.array([\n [1,0],\n [-1,1]\n])\n\nstride = 1\n\noutput = im2col(input_matrix, kernel, stride)\nprint(output)",
"output": "[[2. 4. 9. 2.]\n [4. 6. 2. 6.]\n [6. 7. 6. 5.]\n [9. 2. 1. 2.]\n [2. 6. 2. 3.]\n [6. 5. 3. 4.]\n [1. 2. 5. 6.]\n [2. 3. 6. 7.]\n [3. 4. 7. 8.]]",
"reasoning": "The function extracts each patch from the input image based on the kernel size and stride, then flattens each patch into a 1D vector. Each row of the output corresponds to one local patch. For example, the kernel is 2x2, so the first window covers [2,4,9,2], then shifts across the image horizontally and vertically until all patches are collected."
}
39 changes: 39 additions & 0 deletions questions/189_im2col/learn.md
@@ -0,0 +1,39 @@

# Learn Section

# What is Im2col?
- Also known as Image to Column, *im2col* is a patch extraction operation designed to convert a convolution into a general matrix multiplication (GEMM) problem. It reorganizes overlapping image patches into rows of a 2D matrix so that the convolution kernel can be applied using efficient linear algebra routines.

## Why are we adding a step to convert into matrix multiplication instead of directly doing element-wise convolution?
- Decades of optimization have gone into GPU and CPU matrix multiplication libraries (e.g., cuBLAS, MKL). By reformulating convolution as GEMM, we can leverage these highly optimized routines.
- Direct element-wise convolution, on the other hand, performs worse because the sliding-window access pattern leads to irregular, non-contiguous memory reads.
- In the im2col layout, pixels are arranged contiguously in memory, enabling faster and more predictable access patterns.
- Although im2col introduces some data redundancy (overlapping patches share values), the performance gains from contiguous memory access and GEMM vastly outweigh this cost.

# Implementation of Im2col
A simple way to visualize im2col is to imagine **sliding a kernel window** across an image and recording each window as one row of a new matrix.

For example, a 3x3 kernel sliding over a 6x6 image with a stride of 1 produces a matrix of shape `(16, 9)`: one row per patch and one column per kernel element. This is because the convolution output has:

$$
H_{out} = \left\lfloor \frac{H_{in} - k_{h}}{s} \right\rfloor + 1
$$
$$
W_{out} = \left\lfloor \frac{W_{in} - k_{w}}{s} \right\rfloor + 1
$$

and the im2col output, with one flattened patch per row as used in this problem, has shape:
$$
I_{out} = (H_{out} \cdot W_{out},\ C_{in} \cdot k_h \cdot k_w)
$$
(The classical im2col layout stores patches as columns, hence the name; this problem uses the transposed, row-per-patch layout.)

Where:
- $H_{in}, W_{in}$: Input image dimensions
- $k_h, k_w$: Kernel (filter) height and width
- $s$: Stride
- $C_{in}$: Number of input channels

> **Note:** Padding is assumed to be 0 in these equations
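
Plugging the 6x6 image, 3x3 kernel, stride-1 example above into these formulas (single channel, so $C_{in} = 1$):

$$
H_{out} = \left\lfloor \frac{6 - 3}{1} \right\rfloor + 1 = 4, \qquad W_{out} = \left\lfloor \frac{6 - 3}{1} \right\rfloor + 1 = 4
$$
$$
I_{out} = (4 \cdot 4,\ 1 \cdot 3 \cdot 3) = (16, 9)
$$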

By flattening each local patch, the convolution becomes equivalent to a matrix multiplication between the flattened image matrix (im2col output) and the flattened kernel weights, which allows for GEMM optimizations.
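
To make the GEMM equivalence concrete, here is a minimal sketch (not part of the graded solution) that checks an im2col-based convolution against a direct nested-loop version. It assumes the `im2col` function from this problem is in scope and uses a single-channel image with the row-per-patch layout described above:

```python
import numpy as np

def naive_conv2d(img: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Direct sliding-window convolution (cross-correlation), used only for comparison."""
    kh, kw = kernel.shape
    out_h = (img.shape[0] - kh) // stride + 1
    out_w = (img.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            window = img[y * stride:y * stride + kh, x * stride:x * stride + kw]
            out[y, x] = np.sum(window * kernel)
    return out

img = np.arange(36, dtype=float).reshape(6, 6)    # 6x6 single-channel image
kernel = np.arange(9, dtype=float).reshape(3, 3)  # 3x3 kernel

cols = im2col(img, kernel, stride=1)                 # shape (16, 9): one flattened patch per row
gemm_out = (cols @ kernel.flatten()).reshape(4, 4)   # convolution as a single matrix product

assert np.allclose(gemm_out, naive_conv2d(img, kernel, stride=1))
```

For multi-channel inputs or multiple kernels the same idea applies: each kernel is flattened into a column of a weight matrix, and the product becomes a full general matrix multiplication.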

15 changes: 15 additions & 0 deletions questions/189_im2col/meta.json
@@ -0,0 +1,15 @@
{
"id": "189",
"title": "Im2Col",
"difficulty": "medium",
"category": "Computer Vision",
"video": "",
"likes": "0",
"dislikes": "0",
"contributor": [
{
"profile_link": "https://github.com/reeeeemo",
"name": "Robert Oxley"
}
]
}
2 changes: 2 additions & 0 deletions questions/189_im2col/pytorch/solution.py
@@ -0,0 +1,2 @@
def your_function(...):
...
2 changes: 2 additions & 0 deletions questions/189_im2col/pytorch/starter_code.py
@@ -0,0 +1,2 @@
def your_function(...):
pass
6 changes: 6 additions & 0 deletions questions/189_im2col/pytorch/tests.json
@@ -0,0 +1,6 @@
[
{
"test": "print(your_function(...))",
"expected_output": "..."
}
]
36 changes: 36 additions & 0 deletions questions/189_im2col/solution.py
@@ -0,0 +1,36 @@
import numpy as np

def im2col(img: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
"""
Converts a 2D image into a collection of flattened patches.

Each patch corresponds to a region covered by the convolution kernel determined by the specified stride.

Padding is assumed to be handled externally.
Args:
img (np.ndarray): image to flatten
kernel (np.ndarray): convolution kernel
stride (int): step size between patches
Returns:
np.ndarray: 2D array where each row is a flattened image patch
"""

# unpack shapes
img_h, img_w = img.shape
kern_h, kern_w = kernel.shape

# desired output shape ( (input - filter) / stride ) + 1
out_h = (img_h - kern_h) // stride + 1
out_w = (img_w - kern_w) // stride + 1

cols = np.zeros((out_h * out_w, kern_h * kern_w))

# iterate over every patch (stepping by the stride), flatten it, and store it as one row of the 2D output array
col_idx = 0
for y in range(0, out_h * stride, stride):
for x in range(0, out_w * stride, stride):
patch = img[y:y + kern_h, x:x + kern_w]
cols[col_idx, :] = patch.flatten()
col_idx += 1

return cols
25 changes: 25 additions & 0 deletions questions/189_im2col/starter_code.py
@@ -0,0 +1,25 @@
import numpy as np

def im2col(img: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
"""
Converts a 2D image into a collection of flattened patches.

Each patch corresponds to a region covered by the convolution kernel determined by the specified stride.

Padding is assumed to be handled externally.
Args:
img (np.ndarray): image to flatten
kernel (np.ndarray): convolution kernel
stride (int): step size between patches
Returns:
np.ndarray: 2D array where each row is a flattened image patch
"""

# unpack shapes
img_h, img_w = img.shape
kern_h, kern_w = kernel.shape

# TODO: IM2COL algorithm
cols = np.array([])

return cols