From 0e1a6e72a9e7346a5c32926ca44943af3422c61b Mon Sep 17 00:00:00 2001
From: Amir Mohammad Fakhimi
Date: Fri, 28 Feb 2025 21:14:53 +0330
Subject: [PATCH 1/4] Fixing some writing mistakes and a problem in the "An Example
 of Gridworld" part of intro-to-rl.md
---
docs/course_notes/intro-to-rl.md | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/docs/course_notes/intro-to-rl.md b/docs/course_notes/intro-to-rl.md
index 48de3094..3b4836cd 100644
--- a/docs/course_notes/intro-to-rl.md
+++ b/docs/course_notes/intro-to-rl.md
@@ -204,7 +204,7 @@ Two fundamental problems in sequential decision making:
2. **Planning**:
- A model of the environment is **known**
- - The agent performs computations with its model (w**ithout any external
+ - The agent performs computations with its model (**without any external
interaction**)
- The agent **improves** its policy, a.k.a. deliberation, reasoning, introspection, pondering, thought, search
@@ -289,18 +289,19 @@ In the provided Gridworld example, the agent starts from the yellow square and h
Fig4. Grid World Example
The agent's choice depends on:
-- The **discount factor ($\gamma$)**, which determines whether it prioritizes short-term or long-term rewards.
-- The **noise level**, which introduces randomness into actions.
+ - The **discount factor ($\gamma$)**, which determines whether it prioritizes short-term or long-term rewards.
+ - The **noise level**, which introduces randomness into actions.
Depending on the values of $\gamma$ and noise, the agent's behavior varies:
+
1. **$\gamma$ = 0.1, noise = 0.5:**
- - The agent **prefers the close exit (+1) but takes the risk of stepping into the cliff (-10).**
+ - The agent **prefers the close exit (+1) but doesn't take the risk of stepping into the cliff (-10).**
2. **$\gamma$ = 0.99, noise = 0:**
- - The agent **prefers the distant exit (+10) while avoiding the cliff (-10).**
+ - The agent **prefers the distant exit (+10) and takes the risk of the cliff (-10).**
3. **$\gamma$ = 0.99, noise = 0.5:**
- - The agent **still prefers the distant exit (+10), but due to noise, it risks the cliff (-10).**
+ - The agent **still prefers the distant exit (+10), but due to noise, it doesn't risk the cliff (-10).**
4. **$\gamma$ = 0.1, noise = 0:**
- - The agent **chooses the close exit (+1) while avoiding the cliff.**
+ - The agent **chooses the close exit (+1) and takes the risk of the cliff.**
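
To make this trade-off concrete, the sketch below runs value iteration for the four $(\gamma, \text{noise})$ settings above on a small Gridworld. The layout (a close +1 exit, a distant +10 exit, and a row of -10 cliff cells) and the transition model (the intended move succeeds with probability $1 - \text{noise}$, otherwise the agent slips to a perpendicular direction) are assumptions in the spirit of Fig4, not an exact copy of it.

```python
# Value iteration on a small Gridworld (assumed layout, in the spirit of Fig4).
# '#' = wall, '.' = empty cell, numbers = terminal exit/cliff rewards.
import numpy as np

GRID = [
    ['.',   '.',   '.',   '.',   '.'],
    ['.',   '#',   '.',   '.',   '.'],
    ['.',   '#',   1.0,   '#',   10.0],
    ['.',   '.',   '.',   '.',   '.'],
    [-10.0, -10.0, -10.0, -10.0, -10.0],   # the cliff
]
MOVES = {'N': (-1, 0), 'S': (1, 0), 'E': (0, 1), 'W': (0, -1)}
PERP = {'N': 'EW', 'S': 'EW', 'E': 'NS', 'W': 'NS'}   # slip directions under noise

def step(r, c, a):
    """Deterministic effect of moving in direction a from (r, c); walls and edges block movement."""
    dr, dc = MOVES[a]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < len(GRID) and 0 <= nc < len(GRID[0])) or GRID[nr][nc] == '#':
        return r, c
    return nr, nc

def value_iteration(gamma, noise, sweeps=200):
    V = np.zeros((len(GRID), len(GRID[0])))
    for _ in range(sweeps):
        new_V = np.zeros_like(V)
        for r in range(len(GRID)):
            for c in range(len(GRID[0])):
                cell = GRID[r][c]
                if cell == '#' or isinstance(cell, float):
                    continue                      # walls and terminal cells keep value 0
                q_values = []
                for a in MOVES:
                    # intended move with prob 1 - noise, each perpendicular slip with prob noise / 2
                    outcomes = [(a, 1 - noise)] + [(p, noise / 2) for p in PERP[a]]
                    q = 0.0
                    for actual, prob in outcomes:
                        nr, nc = step(r, c, actual)
                        nxt = GRID[nr][nc]
                        reward = nxt if isinstance(nxt, float) else 0.0   # reward on entering an exit/cliff cell
                        q += prob * (reward + gamma * V[nr, nc])
                    q_values.append(q)
                new_V[r, c] = max(q_values)
        V = new_V
    return V

for gamma, noise in [(0.1, 0.5), (0.99, 0.0), (0.99, 0.5), (0.1, 0.0)]:
    print(f"gamma={gamma}, noise={noise}")
    print(np.round(value_iteration(gamma, noise), 2))
```

Inspecting the greedy policy implied by each value table shows how a small $\gamma$ makes the nearby +1 dominate, while a nonzero noise penalizes the cells next to the cliff.
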
### Stochastic Policy
@@ -445,4 +446,4 @@ Consider the Grid World example where the agent navigates to a goal while avoidi
[:fontawesome-brands-linkedin-in:](https://www.linkedin.com/in/masoud-tahmasbi-fard/){:target="_blank"}
-
\ No newline at end of file
+
From a6a86f58e44e20a444310d50ea1d3b3c989762f4 Mon Sep 17 00:00:00 2001
From: Amir Mohammad Fakhimi
Date: Fri, 28 Feb 2025 21:32:21 +0330
Subject: [PATCH 2/4] Added Bellman Optimality Equation for state value
function in value-based.md
---
docs/course_notes/value-based.md | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/docs/course_notes/value-based.md b/docs/course_notes/value-based.md
index 3c593d6b..59a151e5 100644
--- a/docs/course_notes/value-based.md
+++ b/docs/course_notes/value-based.md
@@ -44,6 +44,15 @@ Where:
This equation allows for the iterative computation of state values in a model-based setting.
+#### Bellman Optimality Equation for $V^*(s)$:
+The **Bellman Optimality Equation** for $V^*(s)$ expresses the optimal state value function. It is given by:
+
+$$
+V^*(s) = \max_a \mathbb{E} \left[ R_{t+1} + \gamma V^\*(s_{t+1}) \mid s_t = s, a_t = a \right]
+$$
+
+This shows that the optimal value of each state is obtained by choosing the action that maximizes the expected immediate reward plus the discounted optimal value of the next state.
+
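
As a sanity check, this fixed point can be computed directly on a tiny tabular MDP. The sketch below uses a made-up two-state, two-action MDP (the transition tensor `P[a, s, s']` and reward matrix `R[s, a]` are arbitrary illustrative numbers, not something from the notes) and repeats the backup $V(s) \leftarrow \max_a \big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a) V(s') \big]$ until it converges.

```python
# Fixed-point iteration of the Bellman optimality backup for V*(s)
# on a made-up two-state, two-action MDP (illustrative numbers only).
import numpy as np

P = np.array([[[0.9, 0.1],      # P[a, s, s']: transition probabilities for action 0
               [0.2, 0.8]],
              [[0.5, 0.5],      # ... and for action 1
               [0.0, 1.0]]])
R = np.array([[1.0, 0.0],       # R[s, a]: expected immediate reward
              [0.0, 2.0]])
gamma = 0.9

V = np.zeros(2)
for _ in range(1000):
    Q_sa = R + gamma * (P @ V).T    # R[s, a] + gamma * sum_s' P[a, s, s'] * V[s']
    V = Q_sa.max(axis=1)            # V(s) <- max_a (...)
print("V* =", V.round(3))
```

At convergence, $V$ no longer changes under this backup, i.e. it satisfies the Bellman optimality equation above.
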
---
### 1.2. Action Value Function $Q(s, a)$
@@ -80,7 +89,7 @@ Where:
The **Bellman Optimality Equation** for $Q^*(s, a)$ expresses the optimal action value function. It is given by:
$$
-Q^*(s, a) = \mathbb{E} \left[ R_{t+1} + \gamma \max_{a'} Q^*(s_{t+1}, a') \mid s_t = s, a_t = a \right]
+Q^*(s, a) = \mathbb{E} \left[ R_{t+1} + \gamma \max_{a'} Q^\*(s_{t+1}, a') \mid s_t = s, a_t = a \right]
$$
This shows that the optimal action value at each state-action pair is the immediate reward plus the discounted maximum expected value from the next state, where the next action is chosen optimally.
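
The same fixed-point view works for the optimal action value function. The sketch below reuses the illustrative two-state MDP from the previous sketch (again, assumed numbers) and iterates the backup $Q(s,a) \leftarrow R(s,a) + \gamma \sum_{s'} P(s' \mid s,a) \max_{a'} Q(s',a')$; taking the maximum over actions of the result should recover the state values computed earlier.

```python
# Fixed-point iteration of the Bellman optimality backup for Q*(s, a),
# reusing the same illustrative two-state MDP as the V* sketch above.
import numpy as np

P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.0, 1.0]]])   # P[a, s, s']
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])                 # R[s, a]
gamma = 0.9

Q = np.zeros((2, 2))                       # Q[s, a]
for _ in range(1000):
    # Q[s, a] <- R[s, a] + gamma * sum_s' P[a, s, s'] * max_a' Q[s', a']
    Q = R + gamma * (P @ Q.max(axis=1)).T
print("Q* =", Q.round(3))
print("max_a Q*(s, .) =", Q.max(axis=1).round(3))   # matches V* from the previous sketch
```
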
@@ -639,4 +648,4 @@ The choice of method depends on the environment, the availability of a model, an
[:fontawesome-brands-linkedin-in:](https://www.linkedin.com/in/ghazal-hosseini-mighan-8b911823a){:target="_blank"}
- -->
\ No newline at end of file
+ -->
From 34312f840191f3f691b1e8d3a80a83bc10d83e48 Mon Sep 17 00:00:00 2001
From: Amir Mohammad Fakhimi
Date: Sat, 1 Mar 2025 04:58:22 +0330
Subject: [PATCH 3/4] Fixing writing mistakes in value-based.md
---
docs/course_notes/value-based.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/course_notes/value-based.md b/docs/course_notes/value-based.md
index 59a151e5..c6a4c26f 100644
--- a/docs/course_notes/value-based.md
+++ b/docs/course_notes/value-based.md
@@ -283,7 +283,7 @@ $$
\hat{I}_N = \frac{1}{N} \sum_{i=1}^{N} f(x_i),
$$
-where $ x_i $ are **independent** samples drawn from $ p(x) $. The **Law of Large Numbers (LLN)** ensures that as $N \to \infty$:
+where $x_i$ are **independent** samples drawn from $p(x)$. The **Law of Large Numbers (LLN)** ensures that as $N \to \infty$:
$$
\hat{I}_N \to I.
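
A quick numerical sketch of this estimator, with $p$ and $f$ chosen arbitrarily for illustration (a standard normal $p$ and $f(x) = x^2$, so the true value is $I = 1$):

```python
# Monte Carlo estimate of I = E_{x ~ p}[f(x)] with p = N(0, 1) and f(x) = x**2,
# chosen only so that the true value I = 1 is known in closed form.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x ** 2

for N in [10, 1_000, 100_000]:
    x = rng.standard_normal(N)   # independent samples from p(x)
    I_hat = f(x).mean()          # (1/N) * sum_i f(x_i)
    print(f"N = {N:>6}: I_hat = {I_hat:.4f}")
```

As $N$ grows, $\hat{I}_N$ concentrates around $I = 1$, exactly as the LLN guarantees.
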
From f421eb60e7c9e0b99d7b1b9fa617732ab2c016e3 Mon Sep 17 00:00:00 2001
From: Amir Mohammad Fakhimi
Date: Sat, 1 Mar 2025 04:59:20 +0330
Subject: [PATCH 4/4] Fixed a typo in week2.md
---
docs/workshops/week2.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/docs/workshops/week2.md b/docs/workshops/week2.md
index c4de72ed..f2abc513 100644
--- a/docs/workshops/week2.md
+++ b/docs/workshops/week2.md
@@ -46,5 +46,5 @@ comments: True
### Notebook(s)
\ No newline at end of file
+ Workshop 2 Notebook(s)
+