From c4e0c14b355f9f448c5ff32551d4ad2f9b53e95e Mon Sep 17 00:00:00 2001
From: Bryson Jones <bkjones97@gmail.com>
Date: Sun, 29 Mar 2026 08:27:24 -0700
Subject: [PATCH] Fix math rendering of advantage formulas in README

---
 README.md | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)
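
Reviewer note (not part of the commit): the two formulas touched by this patch can be sanity-checked numerically. The following is a minimal sketch, not the repository's implementation. The function names, the NumPy array layout, and the choice to clip the window at the episode end with a bootstrap value of 0 are all assumptions made for illustration. `rewards` stands in for the $r'_{t'}$ terms and `values` for $V^\pi(\mathbf{o}_t)$.

```python
import numpy as np


def n_step_advantage(rewards, values, n):
    """Hypothetical helper: N-step lookahead advantage for one episode.

    rewards[t] plays the role of r'_t and values[t] the role of V(o_t).
    Assumption: windows that run past the episode end are clipped, and
    the value after the final step is taken to be 0.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    num_steps = len(rewards)

    # Prefix sums, so each window sum is a difference of two entries.
    csum = np.concatenate(([0.0], np.cumsum(rewards)))
    # Extra trailing 0 is the assumed value after the last step.
    padded_values = np.concatenate((values, [0.0]))

    t = np.arange(num_steps)
    end = np.minimum(t + n, num_steps)  # exclusive window end, clipped
    window_return = csum[end] - csum[t]
    return window_return + padded_values[end] - values


def full_episode_advantage(rewards, values):
    """Hypothetical helper: empirical return-to-go minus the baseline."""
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    # Reverse cumulative sum gives the return from step t to the end.
    returns_to_go = np.cumsum(rewards[::-1])[::-1]
    return returns_to_go - values


# Toy episode with made-up numbers.
rewards = [1.0, 2.0, 3.0, 4.0]
values = [0.5, 1.0, 1.5, 2.0]
adv_n2 = n_step_advantage(rewards, values, n=2)
adv_full = full_episode_advantage(rewards, values)
```

Under these assumptions, calling `n_step_advantage` with `n` equal to the episode length gives the same result as `full_episode_advantage`, which matches the README's description of pre-training as the $N = T$ case. The post-training configuration in the README would correspond to `n=50`.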

diff --git a/README.md b/README.md
index f68a95a..c01d745 100644
--- a/README.md
+++ b/README.md
@@ -273,13 +273,23 @@ Calculates advantage values $A^\pi(\mathbf{o}_t, \mathbf{a}_t)$ from offline tra
 
 #### 1. Post-Training (N-Step Lookahead)
 
-* **Formula:** $$A^\pi(\mathbf{o}_t, \mathbf{a}_t) = \sum_{t'=t}^{t+N-1} r'_{t'} + V^\pi(\mathbf{o}_{t+N}) - V^\pi(\mathbf{o}_t)$$
+**Formula:**
+
+```math
+A^\pi(\mathbf{o}_t, \mathbf{a}_t) = \sum_{t'=t}^{t+N-1} r'_{t'} + V^\pi(\mathbf{o}_{t+N}) - V^\pi(\mathbf{o}_t)
+```
+
 * **Configuration:** $N = 50$
 * **Execution:** Sum rewards over the $N$-step window, add the future value $V^\pi(\mathbf{o}_{t+N})$, and subtract the current value $V^\pi(\mathbf{o}_t)$.
 
 #### 2. Pre-Training (Full Episode)
 
-* **Formula:** $$A^\pi(\mathbf{o}_t, \mathbf{a}_t) = \sum_{t'=t}^{T} r'_{t'} - V^\pi(\mathbf{o}_t)$$
+**Formula:**
+
+```math
+A^\pi(\mathbf{o}_t, \mathbf{a}_t) = \sum_{t'=t}^{T} r'_{t'} - V^\pi(\mathbf{o}_t)
+```
+
 * **Configuration:** $N = T$ (where $T$ is the terminal episode step)
 * **Execution:** Calculate the empirical return from step $t$ to the episode's end, then subtract the baseline $V^\pi(\mathbf{o}_t)$.