
Commit ca5ad32

Merge pull request #461 from mavleo96/nesterov
Nesterov Accelerated Gradient Optimizer
2 parents 66930fc + c8cb59f commit ca5ad32

File tree: 7 files changed, +161 -0 lines changed

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
Implement the Nesterov Accelerated Gradient (NAG) optimizer update step function. Your function should take the current parameter value, gradient function, and velocity as inputs, and return the updated parameter value and new velocity. The function should use the "look-ahead" approach where momentum is applied before computing the gradient, and should handle both scalar and array inputs.
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
{
    "input": "parameter = 1.0, grad_fn = lambda x: x, velocity = 0.1",
    "output": "(0.9009, 0.0991)",
    "reasoning": "The Nesterov Accelerated Gradient optimizer computes updated values for the parameter and velocity using a look-ahead approach. With input values parameter=1.0, grad_fn=lambda x: x, and velocity=0.1, the updated parameter becomes 0.9009 and the updated velocity becomes 0.0991."
}
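
The reasoning above states the result without the intermediate arithmetic. Here is a minimal worked sketch of that example, assuming the default hyperparameters learning_rate = 0.01 and momentum = 0.9 used elsewhere in this PR (the example itself does not state them):

```python
# Worked check of the example above; learning_rate and momentum are assumed defaults.
parameter, velocity = 1.0, 0.1
learning_rate, momentum = 0.01, 0.9

look_ahead = parameter - momentum * velocity            # 1.0 - 0.09 = 0.91
grad = look_ahead                                       # grad_fn = lambda x: x
velocity = momentum * velocity + learning_rate * grad   # 0.09 + 0.0091 = 0.0991
parameter = parameter - velocity                        # 1.0 - 0.0991 = 0.9009

print(round(parameter, 4), round(velocity, 4))          # 0.9009 0.0991
```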
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
# Implementing Nesterov Accelerated Gradient (NAG) Optimizer

## Introduction
Nesterov Accelerated Gradient (NAG) is an improvement over classical momentum optimization. While momentum helps accelerate gradient descent in the relevant direction, NAG takes this a step further by looking ahead in the direction of the momentum before computing the gradient. This "look-ahead" property helps NAG make more informed updates and often leads to better convergence.

## Learning Objectives
- Understand how Nesterov Accelerated Gradient optimization works
- Learn to implement NAG-based gradient updates
- Understand the advantages of NAG over classical momentum
- Gain practical experience with advanced gradient-based optimization

## Theory
Nesterov Accelerated Gradient uses a "look-ahead" approach: it first takes a momentum-based step and then computes the gradient at that position. The key equations are:

$\theta_{lookahead, t-1} = \theta_{t-1} - \gamma v_{t-1}$ (Look-ahead position)

$v_t = \gamma v_{t-1} + \eta \nabla_\theta J(\theta_{lookahead, t-1})$ (Velocity update)

$\theta_t = \theta_{t-1} - v_t$ (Parameter update)

Where:
- $v_t$ is the velocity at time $t$
- $\gamma$ is the momentum coefficient (typically 0.9)
- $\eta$ is the learning rate
- $\nabla_\theta J(\theta)$ is the gradient of the loss function

The key difference from classical momentum is that the gradient is evaluated at $\theta_{lookahead, t-1}$ instead of $\theta_{t-1}$.

Read more at:

1. Nesterov, Y. (1983). A method for solving the convex programming problem with convergence rate O(1/k²). Doklady Akademii Nauk SSSR, 269(3), 543-547.
2. Ruder, S. (2017). An overview of gradient descent optimization algorithms. [arXiv:1609.04747](https://arxiv.org/pdf/1609.04747)

## Problem Statement
Implement the Nesterov Accelerated Gradient optimizer update step function. Your function should take the current parameter value, gradient function, and velocity as inputs, and return the updated parameter value and new velocity.

### Input Format
The function should accept:
- parameter: Current parameter value
- grad_fn: A function that takes a parameter value and returns the gradient computed at that point
- velocity: Current velocity
- learning_rate: Learning rate (default=0.01)
- momentum: Momentum coefficient (default=0.9)

### Output Format
Return tuple: (updated_parameter, updated_velocity)

## Example
```python
# Example usage:
def grad_func(parameter):
    # Returns the gradient at the given point,
    # e.g. for J(x) = 0.5 * x**2 the gradient is x
    return parameter

parameter = 1.0
velocity = 0.1

new_param, new_velocity = nag_optimizer(parameter, grad_func, velocity)
```

## Tips
- Initialize velocity as zero
- Use numpy for numerical operations
- Test with both scalar and array inputs
- Remember that the gradient should be computed at the look-ahead position

---
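
To make the contrast with classical momentum in the theory section concrete, here is a minimal illustrative sketch; the helper names momentum_step and nag_step are made up for this comparison and are not part of the committed files:

```python
def momentum_step(parameter, grad_fn, velocity, learning_rate=0.01, momentum=0.9):
    # Classical momentum: the gradient is taken at the current parameter.
    velocity = momentum * velocity + learning_rate * grad_fn(parameter)
    return parameter - velocity, velocity

def nag_step(parameter, grad_fn, velocity, learning_rate=0.01, momentum=0.9):
    # NAG: the gradient is taken at the look-ahead position.
    look_ahead = parameter - momentum * velocity
    velocity = momentum * velocity + learning_rate * grad_fn(look_ahead)
    return parameter - velocity, velocity

# For J(x) = 0.5 * x**2 the gradient is x, so the two updates differ only
# in where that gradient is evaluated.
grad_fn = lambda x: x
print(momentum_step(1.0, grad_fn, 0.1))  # ~ (0.9000, 0.1000)
print(nag_step(1.0, grad_fn, 0.1))       # ~ (0.9009, 0.0991)
```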
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
{
    "id": "150",
    "title": "Nesterov Accelerated Gradient Optimizer",
    "difficulty": "easy",
    "category": "Deep Learning",
    "video": "",
    "likes": "0",
    "dislikes": "0",
    "contributor": [
        {
            "profile_link": "https://github.com/mavleo96",
            "name": "Vijayabharathi Murugan"
        }
    ],
    "tinygrad_difficulty": null,
    "pytorch_difficulty": null
}
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
import numpy as np

def nag_optimizer(parameter, grad_fn, velocity, learning_rate=0.01, momentum=0.9):
    """
    Update parameters using the Nesterov Accelerated Gradient optimizer.
    Uses a "look-ahead" approach to improve convergence by applying momentum before computing the gradient.

    Args:
        parameter: Current parameter value
        grad_fn: Function that computes the gradient at a given position
        velocity: Current velocity (momentum term)
        learning_rate: Learning rate (default=0.01)
        momentum: Momentum coefficient (default=0.9)

    Returns:
        tuple: (updated_parameter, updated_velocity)
    """
    assert 0 <= momentum < 1, "Momentum must be between 0 and 1"
    assert learning_rate > 0, "Learning rate must be positive"

    # Compute look-ahead position
    look_ahead = parameter - momentum * velocity

    # Compute gradient at look-ahead position
    grad = grad_fn(look_ahead)

    # Update velocity using momentum and gradient
    velocity = momentum * velocity + learning_rate * grad

    # Update parameters using the new velocity
    parameter = parameter - velocity

    return np.round(parameter, 5), np.round(velocity, 5)
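
As a quick sanity check, calling the nag_optimizer defined above with the same toy gradient used in the test file reproduces the first two expected outputs; this snippet is illustrative and not part of the diff:

```python
import numpy as np

def gradient_function(x):
    # Same toy gradient as in the tests: x - arange(n) for arrays, x for scalars
    if isinstance(x, np.ndarray):
        return x - np.arange(len(x))
    return x - 0

# Scalar parameter and velocity
print(nag_optimizer(1.0, gradient_function, 0.5, 0.01, 0.9))
# expected: (0.5445, 0.4555)

# Array parameter and velocity
print(nag_optimizer(np.array([1.0, 2.0]), gradient_function, np.array([0.5, 1.0]), 0.01, 0.9))
# expected: (array([0.5445, 1.099]), array([0.4555, 0.901]))
```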
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
import numpy as np

def nag_optimizer(parameter, grad_fn, velocity, learning_rate=0.01, momentum=0.9):
    """
    Update parameters using the Nesterov Accelerated Gradient optimizer.
    Uses a "look-ahead" approach to improve convergence by applying momentum before computing the gradient.

    Args:
        parameter: Current parameter value
        grad_fn: Function that computes the gradient at a given position
        velocity: Current velocity (momentum term)
        learning_rate: Learning rate (default=0.01)
        momentum: Momentum coefficient (default=0.9)

    Returns:
        tuple: (updated_parameter, updated_velocity)
    """
    # Your code here
    return np.round(parameter, 5), np.round(velocity, 5)
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
[
    {
        "test": "import numpy as np\ndef gradient_function(x):\n    if isinstance(x, np.ndarray):\n        n = len(x)\n        return x - np.arange(n)\n    else:\n        return x - 0\nprint(nag_optimizer(1., gradient_function, 0.5, 0.01, 0.9))",
        "expected_output": "(0.5445, 0.4555)"
    },
    {
        "test": "import numpy as np\ndef gradient_function(x):\n    if isinstance(x, np.ndarray):\n        n = len(x)\n        return x - np.arange(n)\n    else:\n        return x - 0\nprint(nag_optimizer(np.array([1.0, 2.0]), gradient_function, np.array([0.5, 1.0]), 0.01, 0.9))",
        "expected_output": "(array([0.5445, 1.099]), array([0.4555, 0.901]))"
    },
    {
        "test": "import numpy as np\ndef gradient_function(x):\n    if isinstance(x, np.ndarray):\n        n = len(x)\n        return x - np.arange(n)\n    else:\n        return x - 0\nprint(nag_optimizer(np.array([1.0, 2.0]), gradient_function, np.array([0.5, 1.0]), 0.01, 0.0))",
        "expected_output": "(array([0.99, 1.99]), array([0.01, 0.01]))"
    },
    {
        "test": "import numpy as np\ndef gradient_function(x):\n    if isinstance(x, np.ndarray):\n        n = len(x)\n        return x - np.arange(n)\n    else:\n        return x - 0\nprint(nag_optimizer(0.9, gradient_function, 1, 0.01, 0.9))",
        "expected_output": "(0.0, 0.9)"
    }
]
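
For orientation, the third test sets momentum = 0.0, which reduces NAG to plain gradient descent because the look-ahead position coincides with the current parameter. A hand check of that case, assuming the same gradient as the tests:

```python
import numpy as np

parameter = np.array([1.0, 2.0])
velocity = np.array([0.5, 1.0])
learning_rate, momentum = 0.01, 0.0

look_ahead = parameter - momentum * velocity            # equals parameter when momentum = 0
grad = look_ahead - np.arange(2)                        # gradient_function from the tests: [1.0, 1.0]
velocity = momentum * velocity + learning_rate * grad   # [0.01, 0.01]
parameter = parameter - velocity                        # [0.99, 1.99]
print(parameter, velocity)                              # [0.99 1.99] [0.01 0.01]
```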
