
Commit 7368fab

added @ 161
1 parent 11ea975 commit 7368fab

8 files changed: +88 −2 lines changed
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
Given an initial value $Q_1$, a list of $k$ observed rewards $R_1, R_2, \ldots, R_k$, and a step size $\alpha$, implement a function to compute the exponentially weighted average as:

$$(1-\alpha)^k Q_1 + \sum_{i=1}^k \alpha (1-\alpha)^{k-i} R_i$$

This weighting gives more importance to recent rewards, while the influence of the initial estimate $Q_1$ decays over time. Do **not** use running/incremental updates; instead, compute directly from the formula. (This is called the *exponential recency-weighted average*.)
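A side note, not part of the committed description: the coefficients in this formula sum to one, which is why the expression is a genuine weighted average of $Q_1$ and the observed rewards:

$$(1-\alpha)^k + \sum_{i=1}^{k} \alpha(1-\alpha)^{k-i} = (1-\alpha)^k + \alpha \cdot \frac{1-(1-\alpha)^k}{1-(1-\alpha)} = 1$$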
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
{
    "input": "Q1 = 2.0\nrewards = [5.0, 9.0]\nalpha = 0.3\nresult = exp_weighted_average(Q1, rewards, alpha)\nprint(round(result, 4))",
    "output": "4.73",
    "reasoning": "Here, k=2, so the result is: (1-0.3)^2*2.0 + 0.3*(1-0.3)^1*5.0 + 0.3*(1-0.3)^0*9.0 = 0.49*2.0 + 0.21*5.0 + 0.3*9.0 = 0.98 + 1.05 + 2.7 = 4.73"
}
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
### Exponential Recency-Weighted Average

When the environment is nonstationary, it is better to give more weight to recent rewards. The formula $$(1-\alpha)^k Q_1 + \sum_{i=1}^k \alpha (1-\alpha)^{k-i} R_i$$ computes the value estimate by exponentially decaying the influence of old rewards and the initial estimate. The parameter $\alpha$ controls how quickly old information is forgotten: higher $\alpha$ gives more weight to new rewards.
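To make the recency-weighting concrete, here is a minimal sketch (my own illustration, not part of the commit; the helper names closed_form and incremental are made up) showing that the closed-form formula gives the same result as repeatedly applying the constant step-size update $Q \leftarrow Q + \alpha (R - Q)$:

def closed_form(Q1, rewards, alpha):
    # Direct evaluation of (1-alpha)^k * Q1 + sum_i alpha * (1-alpha)^(k-i) * R_i
    k = len(rewards)
    value = (1 - alpha) ** k * Q1
    for i, R in enumerate(rewards, start=1):
        value += alpha * (1 - alpha) ** (k - i) * R
    return value

def incremental(Q1, rewards, alpha):
    # Constant step-size update applied one reward at a time
    Q = Q1
    for R in rewards:
        Q += alpha * (R - Q)
    return Q

print(round(closed_form(2.0, [5.0, 9.0], 0.3), 4))   # 4.73
print(round(incremental(2.0, [5.0, 9.0], 0.3), 4))   # 4.73

Unrolling the incremental update term by term is exactly how the closed-form expression is derived, so the two always agree.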
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
{
    "id": "161",
    "title": "Exponential Weighted Average of Rewards",
    "difficulty": "medium",
    "category": "Reinforcement Learning",
    "video": "",
    "likes": "0",
    "dislikes": "0",
    "contributor": [
        {
            "profile_link": "https://github.com/moe18",
            "name": "Moe Chabot"
        }
    ]
}
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
def exp_weighted_average(Q1, rewards, alpha):
    k = len(rewards)
    value = (1 - alpha) ** k * Q1
    for i, Ri in enumerate(rewards):
        value += alpha * (1 - alpha) ** (k - i - 1) * Ri
    return value
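As a quick sanity check (my own snippet, not part of the commit), the reference solution above reproduces the example output and the expected outputs of the test cases further down:

print(round(exp_weighted_average(2.0, [5.0, 9.0], 0.3), 4))            # 4.73
print(round(exp_weighted_average(10.0, [4.0, 7.0, 13.0], 0.5), 4))     # 10.0
print(round(exp_weighted_average(0.0, [1.0, 1.0, 1.0, 1.0], 0.1), 4))  # 0.3439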
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
def exp_weighted_average(Q1, rewards, alpha):
    """
    Q1: float, initial estimate
    rewards: list or array of rewards, R_1 to R_k
    alpha: float, step size (0 < alpha <= 1)
    Returns: float, exponentially weighted average after k rewards
    """
    # Your code here
    pass
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
[
    {
        "test": "Q1 = 10.0\nrewards = [4.0, 7.0, 13.0]\nalpha = 0.5\nprint(round(exp_weighted_average(Q1, rewards, alpha), 4))",
        "expected_output": "10.0"
    },
    {
        "test": "Q1 = 0.0\nrewards = [1.0, 1.0, 1.0, 1.0]\nalpha = 0.1\nprint(round(exp_weighted_average(Q1, rewards, alpha), 4))",
        "expected_output": "0.3439"
    }
]
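For reference (my own arithmetic, not part of the commit), both expected outputs follow directly from the formula:

$$0.5^3 \cdot 10 + 0.5\,(0.25 \cdot 4 + 0.5 \cdot 7 + 1 \cdot 13) = 1.25 + 8.75 = 10.0$$

$$0.9^4 \cdot 0 + 0.1\,(0.9^3 + 0.9^2 + 0.9 + 1) = 0.1 \cdot 3.439 = 0.3439$$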

utils/convert_single_question.py

Lines changed: 35 additions & 2 deletions
@@ -27,8 +27,41 @@
 from typing import Any, Dict
 
 # ── 1️⃣ EDIT YOUR QUESTION HERE ────────────────────────────────────────────
-QUESTION_DICT: Dict[str, Any] = {
-    "id":'158',
+QUESTION_DICT: Dict[str, Any] ={
+    "id": "161",
+    "video": "",
+    "likes": "0",
+    "dislikes": "0",
+    "contributor": [
+        {
+            "profile_link": "https://github.com/moe18",
+            "name": "Moe Chabot"
+        }
+    ],
+
+    "title": "Exponential Weighted Average of Rewards",
+    "description": "Given an initial value $Q_1$, a list of $k$ observed rewards $R_1, R_2, \\ldots, R_k$, and a step size $\\alpha$, implement a function to compute the exponentially weighted average as:\n\n$$(1-\\alpha)^k Q_1 + \\sum_{i=1}^k \\alpha (1-\\alpha)^{k-i} R_i$$\n\nThis weighting gives more importance to recent rewards, while the influence of the initial estimate $Q_1$ decays over time. Do **not** use running/incremental updates; instead, compute directly from the formula. (This is called the *exponential recency-weighted average*.)",
+    "category": "Reinforcement Learning",
+    "difficulty": "medium",
+    "starter_code": "def exp_weighted_average(Q1, rewards, alpha):\n    \"\"\"\n    Q1: float, initial estimate\n    rewards: list or array of rewards, R_1 to R_k\n    alpha: float, step size (0 < alpha <= 1)\n    Returns: float, exponentially weighted average after k rewards\n    \"\"\"\n    # Your code here\n    pass\n",
+    "solution": "def exp_weighted_average(Q1, rewards, alpha):\n    k = len(rewards)\n    value = (1 - alpha) ** k * Q1\n    for i, Ri in enumerate(rewards):\n        value += alpha * (1 - alpha) ** (k - i - 1) * Ri\n    return value",
+    "test_cases": [
+        {
+            "test": "Q1 = 10.0\nrewards = [4.0, 7.0, 13.0]\nalpha = 0.5\nprint(round(exp_weighted_average(Q1, rewards, alpha), 4))",
+            "expected_output": "10.0"
+        },
+        {
+            "test": "Q1 = 0.0\nrewards = [1.0, 1.0, 1.0, 1.0]\nalpha = 0.1\nprint(round(exp_weighted_average(Q1, rewards, alpha), 4))",
+            "expected_output": "0.3439"
+        }
+    ],
+    "example": {
+        "input": "Q1 = 2.0\nrewards = [5.0, 9.0]\nalpha = 0.3\nresult = exp_weighted_average(Q1, rewards, alpha)\nprint(round(result, 4))",
+        "output": "4.73",
+        "reasoning": "Here, k=2, so the result is: (1-0.3)^2*2.0 + 0.3*(1-0.3)^1*5.0 + 0.3*(1-0.3)^0*9.0 = 0.49*2.0 + 0.21*5.0 + 0.3*9.0 = 0.98 + 1.05 + 2.7 = 4.73"
+    },
+    "learn_section": "### Exponential Recency-Weighted Average\n\nWhen the environment is nonstationary, it is better to give more weight to recent rewards. The formula $$(1-\\alpha)^k Q_1 + \\sum_{i=1}^k \\alpha (1-\\alpha)^{k-i} R_i$$ computes the value estimate by exponentially decaying the influence of old rewards and the initial estimate. The parameter $\\alpha$ controls how quickly old information is forgotten: higher $\\alpha$ gives more weight to new rewards."
+}