Commit b1ed706

Merge pull request #456 from mavleo96/adagrad
Adagrad Optimizer
2 parents d2f20b5 + 0d3c6d0 commit b1ed706

File tree

7 files changed: +148 -0 lines changed

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
Implement the Adagrad optimizer update step function. Your function should take the current parameter value, gradient, and accumulated squared gradients as inputs, and return the updated parameter value and new accumulated squared gradients. The function should also handle scalar and array inputs, and include proper input validation.
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
{
    "input": "parameter = 1.0, grad = 0.1, G = 1.0",
    "output": "(0.999, 1.01)",
    "reasoning": "The Adagrad optimizer first accumulates the squared gradient, G = 1.0 + 0.1^2 = 1.01, then applies the adaptive update with the default learning rate of 0.01: parameter = 1.0 - 0.01 * 0.1 / (sqrt(1.01) + 1e-8), which rounds to 0.999. The function therefore returns (0.999, 1.01)."
}
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
# Implementing the Adagrad Optimizer

## Introduction
Adagrad (Adaptive Gradient Algorithm) is an optimization algorithm that adapts the learning rate to each parameter, performing larger updates for infrequently updated parameters and smaller updates for frequently updated ones. This makes it particularly well suited to sparse data.

## Learning Objectives
- Understand how the Adagrad optimizer works
- Learn to implement adaptive learning rates
- Gain practical experience with gradient-based optimization
## Theory
Adagrad adapts the learning rate for each parameter based on the historical gradients. The key equations are:

$G_t = G_{t-1} + g_t^2$ (Accumulated squared gradients)

$\theta_t = \theta_{t-1} - \dfrac{\alpha}{\sqrt{G_t} + \epsilon} \cdot g_t$ (Parameter update)

Where:
- $G_t$ is the sum of squared gradients up to time step $t$
- $\alpha$ is the initial learning rate
- $\epsilon$ is a small constant for numerical stability
- $g_t$ is the gradient at time step $t$
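To make these equations concrete, here is a minimal illustrative sketch of a single Adagrad step using the example values from this problem (parameter = 1.0, grad = 0.1, G = 1.0) and the default hyperparameters; it is a sketch only, not the reference solution.

```python
import numpy as np

# One Adagrad step with the example values from this problem.
parameter, grad, G = 1.0, 0.1, 1.0
learning_rate, epsilon = 0.01, 1e-8   # defaults used throughout this exercise

G = G + grad**2                                              # 1.0 + 0.01 = 1.01
parameter -= learning_rate * grad / (np.sqrt(G) + epsilon)   # 1.0 - 0.000995...

print(round(parameter, 5), round(G, 5))  # 0.999 1.01
```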
Read more at:

1. Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12, 2121–2159. [PDF](https://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)
2. Ruder, S. (2017). An overview of gradient descent optimization algorithms. [arXiv:1609.04747](https://arxiv.org/pdf/1609.04747)

## Problem Statement
Implement the Adagrad optimizer update step function. Your function should take the current parameter value, gradient, and accumulated squared gradients as inputs, and return the updated parameter value and new accumulated squared gradients.
### Input Format
The function should accept:
- parameter: Current parameter value
- grad: Current gradient
- G: Accumulated squared gradients
- learning_rate: Learning rate (default=0.01)
- epsilon: Small constant for numerical stability (default=1e-8)

### Output Format
Return tuple: (updated_parameter, updated_G)
## Example
```python
# Example usage:
parameter = 1.0
grad = 0.1
G = 1.0

new_param, new_G = adagrad_optimizer(parameter, grad, G)
```
## Tips
- Initialize G as zeros
- Use numpy for numerical operations
- Test with both scalar and array inputs (see the sketch below)
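For example, a quick way to exercise both the scalar and the array case might look like the snippet below; it assumes an `adagrad_optimizer` implementation matching the signature described above is already defined, and the printed values are approximate.

```python
import numpy as np

# Scalar inputs: one parameter, one gradient, G accumulated so far
p, G = adagrad_optimizer(1.0, 0.1, 1.0)
print(p, G)  # roughly 0.999 and 1.01

# Array inputs: G initialized as zeros, as suggested in the tips
params = np.array([1.0, 2.0])
grads = np.array([0.1, 0.2])
G = np.zeros_like(params)
params, G = adagrad_optimizer(params, grads, G)
print(params, G)  # roughly [0.99 1.99] and [0.01 0.04]
```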
---
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
{
    "id": "145",
    "title": "Adagrad Optimizer",
    "difficulty": "easy",
    "category": "Deep Learning",
    "video": "",
    "likes": "0",
    "dislikes": "0",
    "contributor": [
        {
            "profile_link": "https://github.com/mavleo96",
            "name": "Vijayabharathi Murugan"
        }
    ],
    "tinygrad_difficulty": null,
    "pytorch_difficulty": null
}
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
import numpy as np

def adagrad_optimizer(parameter, grad, G, learning_rate=0.01, epsilon=1e-8):
    """
    Update parameters using the Adagrad optimizer.
    Adapts the learning rate for each parameter based on the historical gradients.

    Args:
        parameter: Current parameter value
        grad: Current gradient
        G: Accumulated squared gradients
        learning_rate: Learning rate (default=0.01)
        epsilon: Small constant for numerical stability (default=1e-8)

    Returns:
        tuple: (updated_parameter, updated_G)
    """
    # Input validation; np.all handles both scalar and array G
    assert learning_rate > 0, "Learning rate must be positive"
    assert epsilon > 0, "Epsilon must be positive"
    assert np.all(G >= 0), "G must be non-negative"

    # Update accumulated squared gradients
    G = G + grad**2

    # Update parameters using adaptive learning rate
    update = learning_rate * grad / (np.sqrt(G) + epsilon)
    parameter = parameter - update

    return np.round(parameter, 5), np.round(G, 5)
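As a quick sanity check (an illustrative snippet, assuming the solution function above is in scope), the first test case of this PR reproduces its expected values:

```python
# parameter = 1.0, grad = 0.5, G = 1.0 with explicit defaults
new_param, new_G = adagrad_optimizer(1., 0.5, 1., 0.01, 1e-8)
print(new_param, new_G)  # 0.99553 1.25
```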
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
import numpy as np

def adagrad_optimizer(parameter, grad, G, learning_rate=0.01, epsilon=1e-8):
    """
    Update parameters using the Adagrad optimizer.
    Adapts the learning rate for each parameter based on the historical gradients.

    Args:
        parameter: Current parameter value
        grad: Current gradient
        G: Accumulated squared gradients
        learning_rate: Learning rate (default=0.01)
        epsilon: Small constant for numerical stability (default=1e-8)

    Returns:
        tuple: (updated_parameter, updated_G)
    """
    # Your code here
    return np.round(parameter, 5), np.round(G, 5)
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
[
    {
        "test": "print(adagrad_optimizer(1., 0.5, 1., 0.01, 1e-8))",
        "expected_output": "(0.99553, 1.25)"
    },
    {
        "test": "print(adagrad_optimizer(np.array([1., 2.]), np.array([0.1, 0.2]), np.array([1., 1.]), 0.01, 1e-8))",
        "expected_output": "(array([0.999, 1.99804]), array([1.01, 1.04]))"
    },
    {
        "test": "print(adagrad_optimizer(np.array([1., 2.]), np.array([0., 0.2]), np.array([0., 1.]), 0.01, 1e-8))",
        "expected_output": "(array([1., 1.99804]), array([0., 1.04]))"
    },
    {
        "test": "print(adagrad_optimizer(np.array([1., 1.]), np.array([1., 1.]), np.array([10000., 1.]), 0.01, 1e-8))",
        "expected_output": "(array([0.9999, 0.99293]), array([10001., 2.]))"
    }
]
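A hypothetical way to exercise these cases locally is to exec each "test" string and compare the captured output against "expected_output". The sketch below assumes `adagrad_optimizer` is defined in the same module and that the cases are loaded from a JSON file (filename illustrative); exact string matching can be sensitive to NumPy's print formatting, and the real grading harness may work differently.

```python
import io
import json
from contextlib import redirect_stdout

import numpy as np  # the test strings reference np.array

# Hypothetical path to the JSON test file shown above.
with open("tests.json") as f:
    cases = json.load(f)

for case in cases:
    buf = io.StringIO()
    with redirect_stdout(buf):
        exec(case["test"])  # e.g. print(adagrad_optimizer(1., 0.5, 1., 0.01, 1e-8))
    print("test:    ", case["test"])
    print("expected:", case["expected_output"])
    print("actual:  ", buf.getvalue().strip())
```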
