
Commit 58a0e36

Merge https://github.com/Open-Deep-ML/DML-OpenProblem into add_new_q_mixed_precision_training
2 parents f13d2a3 + 306fe25 commit 58a0e36

File tree: 46 files changed, +653 −38 lines changed


old_repo/Problems/138_gini_impurity/learn.md

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ A pure node (all one class) has \( G = 0 \), and higher values indicate more cla
 ---

-## Gini Gain for a Split
+## Weighted Gini Impurity

 Given a feature and a threshold to split the dataset into left and right subsets:
questions/138_find-the-best-gini-based-split-for-a-binary-decisi/learn.md

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ A pure node (all one class) has \( G = 0 \), and higher values indicate more cla
 ---

-## Gini Gain for a Split
+## Weighted Gini Impurity

 Given a feature and a threshold to split the dataset into left and right subsets:

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
Implement a dropout layer that applies random neuron deactivation during training to prevent overfitting in neural networks. The layer should randomly zero out a proportion of input elements based on a dropout rate p, scale the remaining values by 1/(1-p) to maintain expected values, and pass inputs unchanged during inference. During backpropagation, gradients must be masked with the same dropout pattern and scaled by the same factor to ensure proper gradient flow.
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
{
  "input": "x = np.array([1.0, 2.0, 3.0, 4.0]), grad = np.array([0.1, 0.2, 0.3, 0.4]), p = 0.5",
  "output": "output = array([[2., 0. , 6. , 0. ]]), grad = array([[0.2, 0. , 0.6, 0. ]])",
  "reasoning": "The Dropout layer randomly zeroes out elements of the input tensor with probability p during training. To maintain the expected value of the activations, the remaining elements are scaled by a factor of 1 / (1 - p). During inference, Dropout is disabled and the input is passed through unchanged. During backpropagation, the same dropout mask and scaling are applied to the gradients, ensuring the expected gradient magnitude is preserved."
}
Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
# Implementing Dropout Layer

## Introduction
Dropout is a regularization technique that randomly deactivates neurons during training to prevent overfitting. It forces the network to learn with different neurons and prevents it from becoming too dependent on specific neurons.

## Learning Objectives
- Understand the concept and purpose of dropout
- Learn how dropout works during training and inference
- Implement a dropout layer with proper scaling

## Theory
During training, dropout randomly sets a proportion of inputs to zero and scales up the remaining values to maintain the expected value. The mathematical formulation is:

During training:

$y = \dfrac{x \odot m}{1-p}$

During inference:

$y = x$

During backpropagation:

$grad = \dfrac{grad \odot m}{1-p}$

Where:
- $x$ is the input vector
- $m$ is a binary mask vector sampled from Bernoulli$(1-p)$, so each entry is 1 (keep) with probability $1-p$ and 0 (drop) with probability $p$
- $\odot$ represents element-wise multiplication
- $p$ is the dropout rate (probability of dropping a neuron)

The mask $m$ is randomly generated for each forward pass during training and is stored in memory to be used in the corresponding backward pass. This ensures that the same neurons are dropped during both forward and backward propagation for a given input.

The scaling factor $\frac{1}{1-p}$ during training ensures that the expected value of the output matches the input, making the network's behavior consistent between training and inference.
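Concretely, since each mask entry has expectation $\mathbb{E}[m_i] = 1-p$, the scaled output has the same expected value as the input:

$\mathbb{E}[y_i] = \mathbb{E}\!\left[\dfrac{x_i m_i}{1-p}\right] = \dfrac{x_i (1-p)}{1-p} = x_i$
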
During backpropagation, the gradients must also be scaled by the same factor $\frac{1}{1-p}$ to maintain the correct gradient flow.
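As a minimal NumPy sketch of these three formulas (the dropout rate, array values, and seed below are arbitrary illustrative choices, not part of the problem):

```python
import numpy as np

rng = np.random.default_rng(0)        # illustrative seed, only to make the run repeatable

p = 0.5                               # dropout rate: probability of dropping an element
x = np.array([1.0, 2.0, 3.0, 4.0])
grad = np.array([0.1, 0.2, 0.3, 0.4])

m = rng.binomial(1, 1 - p, x.shape)   # m_i ~ Bernoulli(1 - p): 1 = keep, 0 = drop

y_train = x * m / (1 - p)             # training: mask, then rescale by 1/(1 - p)
y_infer = x                           # inference: identity
grad_in = grad * m / (1 - p)          # backward: same mask and scaling as the forward pass

print(y_train, grad_in)
```
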
Dropout acts as a form of regularization by:
1. Preventing co-adaptation of neurons, forcing them to learn more robust features that are useful in combination with many different random subsets of other neurons
2. Creating an implicit ensemble of networks, as each forward pass uses a different subset of neurons, effectively training multiple networks that share parameters
3. Reducing the effective capacity of the network during training, which helps prevent overfitting by making the model less likely to memorize the training data

Read more at:

1. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research, 15(1), 1929-1958. [PDF](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf)

## Problem Statement
Implement a dropout layer class that can be used during both training and inference phases of a neural network. The implementation should:

1. Apply dropout during training by randomly zeroing out elements
2. Scale the remaining values appropriately to maintain expected values
3. Pass through inputs unchanged during inference
4. Support backpropagation by storing and using the dropout mask

### Requirements
The `DropoutLayer` class should implement:

1. `__init__(p: float)`: Initialize with dropout probability p
2. `forward(x: np.ndarray, training: bool = True) -> np.ndarray`: Apply dropout during the forward pass
3. `backward(grad: np.ndarray) -> np.ndarray`: Handle gradient flow during backpropagation

### Input Parameters
- `p`: Dropout rate (probability of dropping a neuron); must satisfy 0 <= p < 1
- `x`: Input tensor of any shape
- `training`: Boolean flag indicating if in training mode
- `grad`: Gradient tensor during backpropagation

### Output
- Forward pass: Tensor of same shape as input with dropout applied
- Backward pass: Gradient tensor with dropout mask applied

## Example
```python
import numpy as np

# Example usage:
x = np.array([1.0, 2.0, 3.0, 4.0])
grad = np.array([0.1, 0.2, 0.3, 0.4])
p = 0.5  # 50% dropout rate

dropout = DropoutLayer(p)

# During training
output_train = dropout.forward(x, training=True)

# During inference
output_inference = dropout.forward(x, training=False)

# Backward
grad_back = dropout.backward(grad)
```
## Tips
- Use numpy's random binomial generator for creating the mask
- Remember to scale up the output and gradients during training by 1/(1-p)
- Test with different dropout rates (typically between 0.2 and 0.5)
- Verify that the expected value of the output matches the input

## Common Pitfalls
- Using the same mask for all examples in a batch
- Setting the dropout rate too high (can lead to underfitting)

---
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
{
  "id": "151",
  "title": "Dropout Layer",
  "difficulty": "medium",
  "category": "Deep Learning",
  "video": "",
  "likes": "0",
  "dislikes": "0",
  "contributor": [
    {
      "profile_link": "https://github.com/mavleo96",
      "name": "Vijayabharathi Murugan"
    }
  ],
  "tinygrad_difficulty": null,
  "pytorch_difficulty": "easy"
}
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
import torch
import torch.nn as nn

class DropoutLayer(nn.Module):
    def __init__(self, p: float):
        """Initialize the dropout layer."""
        super(DropoutLayer, self).__init__()
        if p < 0 or p >= 1:
            raise ValueError("Dropout rate must be between 0 and 1 (1-exclusive)")

        self.p = p
        self.mask = None

    def forward(self, x: torch.Tensor, training: bool = True) -> torch.Tensor:
        """Forward pass of the dropout layer."""
        if not training:
            return x

        # Mask entries are 1 (keep) with probability 1 - p and 0 (drop) with probability p
        self.mask = torch.bernoulli(torch.ones_like(x) * (1 - self.p))

        return x * self.mask / (1 - self.p)

    def backward(self, grad: torch.Tensor) -> torch.Tensor:
        """Backward pass of the dropout layer."""
        if self.mask is None:
            raise ValueError("Forward pass must be called before backward pass")

        return grad * self.mask / (1 - self.p)
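For orientation, a small usage sketch of the class above; the seed and tensor values are illustrative, and which entries get zeroed depends on the RNG state:

```python
import torch

torch.manual_seed(0)  # illustrative seed; the masked positions depend on the RNG state

layer = DropoutLayer(p=0.5)
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
grad = torch.tensor([0.1, 0.2, 0.3, 0.4])

y_train = layer.forward(x, training=True)   # surviving entries are scaled by 1/(1 - 0.5) = 2
y_eval = layer.forward(x, training=False)   # returns x unchanged; the stored mask is untouched
g = layer.backward(grad)                    # reuses the mask from the last training forward pass
```
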
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
import torch

class DropoutLayer:
    def __init__(self, p: float):
        """Initialize the dropout layer."""
        # Your code here

    def forward(self, x: torch.Tensor, training: bool = True) -> torch.Tensor:
        """Forward pass of the dropout layer."""
        # Your code here

    def backward(self, grad: torch.Tensor) -> torch.Tensor:
        """Backward pass of the dropout layer."""
        # Your code here
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
[
  {
    "test": "import torch\ntorch.manual_seed(42)\nx = torch.tensor([[1.0, 2.0], [3.0, 4.0]])\ngrad = torch.tensor([[0.5, 0.2], [1.0, 2.0]])\n\ndropout = DropoutLayer(0.2)\n\nprint(dropout.forward(x, training=True), dropout.forward(x, training=False), dropout.backward(grad))",
    "expected_output": "(tensor([[0., 0.], [3.75, 0.]]), tensor([[1.0, 2.0], [3.0, 4.0]]), tensor([[0., 0.], [1.25, 0.]]))"
  },
  {
    "test": "import torch\ntorch.manual_seed(42)\nx = torch.ones((1000, 1000))\ndropout = DropoutLayer(0.2)\n\n_ = dropout.forward(x, training=True)\nmask1 = dropout.mask.clone()\n_ = dropout.forward(x, training=True)\nmask2 = dropout.mask.clone()\nprint(mask1.equal(mask2))",
    "expected_output": "False"
  },
  {
    "test": "import torch\ntorch.manual_seed(42)\nx = torch.ones((1000, 1000))\ndropout = DropoutLayer(0.3)\noutput_train = dropout.forward(x, training=True)\nmean_output = torch.mean(output_train)\nprint(abs(mean_output - 1.0).item() < 0.1)",
    "expected_output": "True"
  },
  {
    "test": "p = 1.5\ntry:\n dropout = DropoutLayer(p)\n raise AssertionError('Expected ValueError for p = 1.5')\nexcept ValueError:\n pass\np = -0.5\ntry:\n dropout = DropoutLayer(p)\n raise AssertionError('Expected ValueError for p = -0.5')\nexcept ValueError:\n pass\nprint('All tests passed')",
    "expected_output": "All tests passed"
  }
]
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
import numpy as np

class DropoutLayer:
    def __init__(self, p: float):
        """Initialize the dropout layer."""
        if p < 0 or p >= 1:
            raise ValueError("Dropout rate must be between 0 and 1 (1-exclusive)")

        self.p = p
        self.mask = None

    def forward(self, x: np.ndarray, training: bool = True) -> np.ndarray:
        """Forward pass of the dropout layer."""
        if not training:
            return x

        # Mask entries are 1 (keep) with probability 1 - p and 0 (drop) with probability p
        self.mask = np.random.binomial(1, 1 - self.p, x.shape)

        return x * self.mask / (1 - self.p)

    def backward(self, grad: np.ndarray) -> np.ndarray:
        """Backward pass of the dropout layer."""
        if self.mask is None:
            raise ValueError("Forward pass must be called before backward pass")

        return grad * self.mask / (1 - self.p)
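A quick sanity check on the implementation above, echoing the "expected value" tip from learn.md (the array size, dropout rate, seed, and tolerance are illustrative choices):

```python
import numpy as np

np.random.seed(0)  # illustrative seed so the check is repeatable

layer = DropoutLayer(0.3)
x = np.ones((1000, 1000))

out = layer.forward(x, training=True)

# Roughly 30% of entries are zeroed and the survivors are scaled by 1/(1 - 0.3),
# so the mean of the output should stay close to the mean of the input (1.0).
assert abs(out.mean() - 1.0) < 0.01

# Inference mode passes the input through unchanged.
assert np.array_equal(layer.forward(x, training=False), x)

print("expected value preserved")
```
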
