Commit 8317003

Merge pull request #457 from mavleo96/momentum
Momentum Optimizer
2 parents b1ed706 + 83d2752 commit 8317003

7 files changed: +145 -0 lines changed
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
Implement the momentum optimizer update step function. Your function should take the current parameter value, gradient, and velocity as inputs, and return the updated parameter value and new velocity. The function should also handle scalar and array inputs.
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
{
    "input": "parameter = 1.0, grad = 0.1, velocity = 0.1",
    "output": "(0.909, 0.091)",
    "reasoning": "With the default learning_rate = 0.01 and momentum = 0.9, the velocity update gives 0.9 * 0.1 + 0.01 * 0.1 = 0.091, and the parameter update gives 1.0 - 0.091 = 0.909."
}
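The arithmetic can be checked by hand; a minimal sketch, assuming the default hyperparameters learning_rate = 0.01 and momentum = 0.9 used throughout this commit:

```python
# Verify the example values, assuming the defaults
# learning_rate = 0.01 and momentum = 0.9.
velocity = round(0.9 * 0.1 + 0.01 * 0.1, 5)   # 0.091
parameter = round(1.0 - velocity, 5)          # 0.909
print((parameter, velocity))                  # (0.909, 0.091)
```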
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
# Implementing Momentum Optimizer

## Introduction
Momentum is a popular optimization technique that helps accelerate gradient descent in the relevant direction and dampen oscillations. It works by adding a fraction of the previous update vector to the current update.

## Learning Objectives
- Understand how momentum optimization works
- Learn to implement momentum-based gradient updates
- Understand the effect of momentum on optimization

## Theory
Momentum optimization uses a moving average of gradients to determine the direction of the update. The key equations are:

$v_t = \gamma v_{t-1} + \eta \nabla_\theta J(\theta)$ (Velocity update)

$\theta_t = \theta_{t-1} - v_t$ (Parameter update)

Where:
- $v_t$ is the velocity at time $t$
- $\gamma$ is the momentum coefficient (typically 0.9)
- $\eta$ is the learning rate
- $\nabla_\theta J(\theta)$ is the gradient of the loss function
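As a quick concreteness check, one update step can be written directly in NumPy (a minimal sketch of the two equations above; the values are arbitrary and the velocity starts at zero):

```python
import numpy as np

gamma, eta = 0.9, 0.01           # momentum coefficient and learning rate
theta = np.array([1.0, 2.0])     # parameters
v = np.zeros_like(theta)         # velocity, initialized to zero
grad = np.array([0.1, 0.2])      # gradient of the loss at theta

v = gamma * v + eta * grad       # v_t = gamma * v_{t-1} + eta * grad
theta = theta - v                # theta_t = theta_{t-1} - v_t
print(theta, v)                  # [0.999 1.998] [0.001 0.002]
```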
Read more at:

1. Ruder, S. (2017). An overview of gradient descent optimization algorithms. [arXiv:1609.04747](https://arxiv.org/pdf/1609.04747)
## Problem Statement
Implement the momentum optimizer update step function. Your function should take the current parameter value, gradient, and velocity as inputs, and return the updated parameter value and new velocity.

### Input Format
The function should accept:
- parameter: Current parameter value
- grad: Current gradient
- velocity: Current velocity
- learning_rate: Learning rate (default=0.01)
- momentum: Momentum coefficient (default=0.9)

### Output Format
Return a tuple: (updated_parameter, updated_velocity)
## Example
```python
# Example usage:
parameter = 1.0
grad = 0.1
velocity = 0.1

new_param, new_velocity = momentum_optimizer(parameter, grad, velocity)
```
## Tips
- Initialize velocity as zero
- Use numpy for numerical operations
- Test with both scalar and array inputs (see the sketch below)
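For array inputs, a call might look like this (a sketch assuming the momentum_optimizer described above, with its default learning_rate and momentum; the values are arbitrary):

```python
import numpy as np

params = np.array([1.0, 2.0])
grads = np.array([0.1, 0.2])
vel = np.array([0.5, 1.0])

new_params, new_vel = momentum_optimizer(params, grads, vel)
print(new_params, new_vel)   # expected: [0.549 1.098] [0.451 0.902]
```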
---
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
{
    "id": "146",
    "title": "Momentum Optimizer",
    "difficulty": "easy",
    "category": "Deep Learning",
    "video": "",
    "likes": "0",
    "dislikes": "0",
    "contributor": [
        {
            "profile_link": "https://github.com/mavleo96",
            "name": "Vijayabharathi Murugan"
        }
    ],
    "tinygrad_difficulty": null,
    "pytorch_difficulty": null
}
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
import numpy as np

def momentum_optimizer(parameter, grad, velocity, learning_rate=0.01, momentum=0.9):
    """
    Update parameters using the momentum optimizer.

    Uses momentum to accelerate learning in relevant directions and dampen oscillations.

    Args:
        parameter: Current parameter value
        grad: Current gradient
        velocity: Current velocity/momentum term
        learning_rate: Learning rate (default=0.01)
        momentum: Momentum coefficient (default=0.9)

    Returns:
        tuple: (updated_parameter, updated_velocity)
    """
    assert learning_rate > 0, "Learning rate must be positive"
    assert 0 <= momentum < 1, "Momentum must be between 0 and 1"

    # Update velocity: v_t = momentum * v_{t-1} + learning_rate * grad
    velocity = momentum * velocity + learning_rate * grad

    # Update parameter: theta_t = theta_{t-1} - v_t
    parameter = parameter - velocity

    return np.round(parameter, 5), np.round(velocity, 5)
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
import numpy as np

def momentum_optimizer(parameter, grad, velocity, learning_rate=0.01, momentum=0.9):
    """
    Update parameters using the momentum optimizer.

    Uses momentum to accelerate learning in relevant directions and dampen oscillations.

    Args:
        parameter: Current parameter value
        grad: Current gradient
        velocity: Current velocity/momentum term
        learning_rate: Learning rate (default=0.01)
        momentum: Momentum coefficient (default=0.9)

    Returns:
        tuple: (updated_parameter, updated_velocity)
    """
    # Your code here
    return np.round(parameter, 5), np.round(velocity, 5)
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
[
    {
        "test": "print(momentum_optimizer(1., 0.1, 0.5, 0.01, 0.9))",
        "expected_output": "(0.549, 0.451)"
    },
    {
        "test": "print(momentum_optimizer(np.array([1., 2.]), np.array([0.1, 0.2]), np.array([0.5, 1.0]), 0.01, 0.9))",
        "expected_output": "(array([0.549, 1.098]), array([0.451, 0.902]))"
    },
    {
        "test": "print(momentum_optimizer(np.array([1., 2.]), np.array([0.1, 0.2]), np.array([0.5, 1.0]), 0.01, 0.))",
        "expected_output": "(array([0.999, 1.998]), array([0.001, 0.002]))"
    },
    {
        "test": "print(momentum_optimizer(np.array([1., 2.]), np.array([0., 0.]), np.array([0.5, 0.5]), 0.01, 0.9))",
        "expected_output": "(array([0.55, 1.55]), array([0.45, 0.45]))"
    }
]
