Commit 8d2e243

Merge branch 'Open-Deep-ML:main' into adamax
2 parents 1033d3a + 52e6105 commit 8d2e243

File tree

36 files changed: +1132 -0 lines changed

Lines changed: 55 additions & 0 deletions
# Learn Section

## Understanding Bhattacharyya Distance

**Bhattacharyya Distance (BD)** is a statistical measure of the **similarity** or **overlap** between two probability distributions **P(x)** and **Q(x)** defined on the same domain **x**.

This differs from **KL Divergence**, which measures the **loss of information** when one probability distribution is used to approximate another (reference) distribution.

### **Bhattacharyya Distance Formula**
The Bhattacharyya distance is defined as:

$$
BC(P, Q) = \sum_{x} \sqrt{P(x) \cdot Q(x)}
$$

$$
BD(P, Q) = -\ln(BC(P, Q))
$$

where **BC(P, Q)** is the **Bhattacharyya coefficient**.

### **Key Properties**
1. **BD is always non-negative**:
   $$ BD \geq 0 $$
2. **Symmetric in nature**:
   $$ BD(P, Q) = BD(Q, P) $$
3. **Applications**:
   - Risk assessment
   - Stock predictions
   - Feature scaling
   - Classification problems

### **Example Calculation**
Consider two probability distributions **P(x)** and **Q(x)**:

$$
P(x) = [0.1, 0.2, 0.3, 0.4], \quad Q(x) = [0.4, 0.3, 0.2, 0.1]
$$

1. **Bhattacharyya Coefficient**:

$$
BC(P, Q) = \sum_{x} \sqrt{P(x) \cdot Q(x)} \approx 0.8899
$$

2. **Bhattacharyya Distance**:

$$
BD(P, Q) = -\ln(BC(P, Q)) = -\ln(0.8899) \approx 0.1166
$$

This illustrates how BD quantifies the **overlap** between two probability distributions.
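As a quick sanity check of the worked example, the coefficient and distance can be reproduced with a few lines of NumPy. This is only an illustrative sketch; the solution file below uses the required function interface.

```python
import numpy as np

# Reproduce the worked Bhattacharyya example (illustrative check only)
P = np.array([0.1, 0.2, 0.3, 0.4])
Q = np.array([0.4, 0.3, 0.2, 0.1])

BC = np.sum(np.sqrt(P * Q))  # Bhattacharyya coefficient
BD = -np.log(BC)             # Bhattacharyya distance

print(round(BC, 4), round(BD, 4))  # 0.8899 0.1166
```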
Lines changed: 49 additions & 0 deletions
import numpy as np


def bhattacharyya_distance(p: list[float], q: list[float]) -> float:
    # The two distributions must be defined on the same domain
    if len(p) != len(q):
        return 0.0

    p, q = np.array(p), np.array(q)

    BC = np.sum(np.sqrt(p * q))  # Bhattacharyya coefficient
    BD = -np.log(BC)             # Bhattacharyya distance

    return round(BD, 4)


def test_bhattacharyya_distance() -> None:
    # Test Case 1
    p = [0.1, 0.2, 0.3, 0.4]
    q = [0.4, 0.3, 0.2, 0.1]
    assert bhattacharyya_distance(p, q) == 0.1166

    # Test Case 2
    p = [0.7, 0.2, 0.1]
    q = [0.4, 0.3, 0.3]
    assert bhattacharyya_distance(p, q) == 0.0541

    # Test Case 3: mismatched lengths return 0.0
    p = []
    q = [0.5, 0.4, 0.1]
    assert bhattacharyya_distance(p, q) == 0.0

    # Test Case 4: mismatched lengths return 0.0
    p = [0.6, 0.4]
    q = [0.1, 0.7, 0.2]
    assert bhattacharyya_distance(p, q) == 0.0

    # Test Case 5
    p = [0.6, 0.2, 0.1, 0.1]
    q = [0.1, 0.2, 0.3, 0.4]
    assert bhattacharyya_distance(p, q) == 0.2007


if __name__ == '__main__':
    test_bhattacharyya_distance()
    print('All Bhattacharyya Distance test cases passed')

Problems/128_dyt/learn.md

Lines changed: 22 additions & 0 deletions
A new study (https://arxiv.org/pdf/2503.10622) demonstrates that layer normalization, which is ubiquitous in Transformers, produces Tanh-like, S-shaped input-output mappings. By replacing normalization with a new layer called "Dynamic Tanh" (DyT for short), Transformers without normalization can match or exceed the performance of their normalized counterparts, mostly without hyperparameter tuning.

### Normalization layer
Consider a standard NLP task, where an input $x$ has shape $(B,T,C)$: $B$ is the batch size, $T$ the number of tokens (sequence length), and $C$ the embedding dimension. The output of a normalization layer is generally computed as $norm(x)=\gamma\left(\frac{x-\mu}{\sqrt{\sigma^2+\varepsilon}}\right)+\beta$, where $\gamma$ and $\beta$ are learnable parameters of shape $(C,)$. The distribution's statistics are calculated as follows: $\mu_k=\frac{1}{BT}\sum_{i=1}^B\sum_{j=1}^T x_{ijk}$; $\sigma_k^2=\frac{1}{BT}\sum_{i,j}\left(x_{ijk}-\mu_k\right)^2$.

### Hyperbolic tangent (Tanh)
The Tanh function is defined as a ratio: $\tanh(x)=\frac{\sinh(x)}{\cosh(x)}=\frac{\exp(x)-\exp(-x)}{\exp(x)+\exp(-x)}$. Essentially, the function squashes an arbitrary real-valued input into $(-1,1)$.

### Dynamic Tanh (DyT)
It turns out that LN (layer normalization) produces different parts of a $\tanh(kx)$ curve, where $k$ controls the curvature of the tanh curve in the center. The smaller $k$ is, the smoother the transition from $-1$ to $1$. Hence the study proposes a drop-in replacement for LN given an input tensor $x$:

$$
DyT(x)=\gamma \cdot \tanh(\alpha x)+\beta,
$$

where:
* $\alpha$ is a learnable parameter that allows scaling the input differently based on its range (tokens with **smaller variance** yield **sharper, less smooth curves**). The authors suggest a **default value** of $0.5$.
* $\gamma, \beta$ are learnable parameters that scale and shift the output. The authors suggest initializing these vectors with the following **default values**:
  * $\gamma$ as an all-ones vector
  * $\beta$ as an all-zeros vector

Despite not calculating any statistics, DyT preserves the "squashing" effect of LN on extreme values in a non-linear fashion, while transforming the central part of the input almost linearly.
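To make that last point concrete, here is a tiny NumPy illustration (separate from the solution file below) using the suggested defaults of $\alpha=0.5$, $\gamma$ all ones, and $\beta$ all zeros: extreme inputs are squashed toward $\pm 1$, while near-zero inputs pass through almost linearly.

```python
import numpy as np

# Illustrative DyT with the suggested default initializations
alpha = 0.5
C = 4
gamma, beta = np.ones(C), np.zeros(C)

x = np.array([-10.0, -1.0, 0.1, 10.0])  # extreme and near-zero inputs, shape (C,)
y = gamma * np.tanh(alpha * x) + beta

print(y.round(4))  # approximately [-0.9999, -0.4621, 0.05, 0.9999]
```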

Problems/128_dyt/solution.py

Lines changed: 85 additions & 0 deletions
import numpy as np


def dynamic_tanh(x: np.ndarray, alpha: float, gamma: np.ndarray, beta: np.ndarray) -> list:
    """
    Applies DyT to an array. Could serve as a replacement
    for layer normalization in Transformers.

    Parameters
    ----------
    x : np.ndarray
        Input tensor of shape (B, T, C)
    alpha : float
        Learnable scalar parameter of the DyT layer
    gamma : np.ndarray
        Learnable scaling parameter vector of shape (C,) of the DyT layer
    beta : np.ndarray
        Learnable shifting parameter vector of shape (C,) of the DyT layer

    Returns
    -------
    list
        Input x with DyT applied to it, rounded to 4 decimal places
        and converted to nested Python lists
    """

    def tanh(x: np.ndarray) -> np.ndarray:
        # Explicit definition of tanh; np.tanh is the numerically stable alternative
        return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

    x = tanh(alpha * x)
    return (x * gamma + beta).round(4).tolist()


def test_dynamic_tanh():
    alpha = .5

    # Test 1
    x = np.array([[[0.14115588, 0.00372817, 0.24126647, 0.22183601],
                   [0.36301332, 0.67681456, 0.3723281, 0.62767559],
                   [0.94926205, 0.80230257, 0.19737574, 0.04460771],
                   [0.43777021, 0.95744001, 0.60795979, 0.58980314],
                   [0.27250625, 0.48053656, 0.11087151, 0.06228769]],
                  [[0.12620219, 0.63002473, 0.75673539, 0.60411435],
                   [0.3918192, 0.39810709, 0.42186426, 0.79954607],
                   [0.67730682, 0.96539769, 0.13366266, 0.44462357],
                   [0.31556188, 0.86050486, 0.96060468, 0.43953706],
                   [0.80002165, 0.39582123, 0.35731605, 0.83600622]]])
    gamma, beta = np.ones(shape=(x.shape[2])), np.zeros(shape=(x.shape[2]))
    expected_x = [[[0.0705, 0.0019, 0.1201, 0.1105],
                   [0.1795, 0.3261, 0.184, 0.3039],
                   [0.4419, 0.3809, 0.0984, 0.0223],
                   [0.2155, 0.4452, 0.295, 0.2866],
                   [0.1354, 0.2357, 0.0554, 0.0311]],
                  [[0.063, 0.305, 0.3613, 0.2932],
                   [0.1934, 0.1965, 0.2079, 0.3798],
                   [0.3263, 0.4484, 0.0667, 0.2187],
                   [0.1565, 0.4055, 0.4465, 0.2163],
                   [0.38, 0.1954, 0.1768, 0.3952]]]
    output_x = dynamic_tanh(x, alpha, gamma, beta)
    assert expected_x == output_x, 'Test case 1 failed'

    # Test 2
    x = np.array([[[0.20793482, 0.16989285, 0.03898972],
                   [0.17912554, 0.10962205, 0.3870742],
                   [0.00107181, 0.35807922, 0.15861333]]])
    gamma, beta = np.ones(shape=(x.shape[2])), np.zeros(shape=(x.shape[2]))
    expected_x = [[[0.1036, 0.0847, 0.0195],
                   [0.0893, 0.0548, 0.1912],
                   [0.0005, 0.1772, 0.0791]]]
    output_x = dynamic_tanh(x, alpha, gamma, beta)
    assert expected_x == output_x, 'Test case 2 failed'

    # Test 3
    x = np.array([[[0.94378259]], [[0.97754654]], [[0.36168351]], [[0.51821078]], [[0.76961589]]])
    gamma, beta = np.ones(shape=(x.shape[2])), np.zeros(shape=(x.shape[2]))
    expected_x = [[[0.4397]], [[0.4532]], [[0.1789]], [[0.2535]], [[0.3669]]]
    output_x = dynamic_tanh(x, alpha, gamma, beta)
    assert expected_x == output_x, 'Test case 3 failed'

    print('All tests passed')


if __name__ == '__main__':
    test_dynamic_tanh()
Lines changed: 38 additions & 0 deletions
<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>Gridworld Policy Evaluation</title>
</head>
<body>
    <h2>Gridworld Policy Evaluation</h2>
    <p>
        In reinforcement learning, <strong>policy evaluation</strong> is the process of computing the state-value function for a given policy. In a gridworld environment, this involves iteratively updating the value of each state based on the expected return from following the policy.
    </p>

    <h3>Key Concepts</h3>
    <ul>
        <li><strong>State-Value Function (V):</strong> The expected return when starting from a state and following a policy.</li>
        <li><strong>Policy:</strong> A mapping from states to probabilities for each available action.</li>
        <li><strong>Bellman Expectation Equation:</strong>
            <p>
                For each state \( s \):<br>
                \[
                V(s) = \sum_{a} \pi(a|s) \sum_{s'} P(s'|s,a) [R(s,a,s') + \gamma V(s')]
                \]
            </p>
        </li>
    </ul>

    <h3>Algorithm Overview</h3>
    <ol>
        <li><strong>Initialization:</strong> Start with an initial guess (e.g., zeros) for the state-value function \( V(s) \).</li>
        <li><strong>Iterative Update:</strong> Update the state value for each non-terminal state using the Bellman equation until the maximum change is less than a set threshold.</li>
        <li><strong>Terminal States:</strong> For this task, terminal states (the four corners) remain unchanged.</li>
    </ol>

    <p>
        This method provides a foundation for assessing the quality of states under a given policy, which is crucial for many reinforcement learning techniques.
    </p>
</body>
</html>
Lines changed: 35 additions & 0 deletions
# Gridworld Policy Evaluation

In reinforcement learning, **policy evaluation** is the process of computing the state-value function for a given policy. For a gridworld environment, this involves iteratively updating the value of each state based on the expected return from following the policy.

## Key Concepts

- **State-Value Function (V):**
  The expected return when starting from a state and following a given policy.

- **Policy:**
  A mapping from states to probabilities of selecting each available action.

- **Bellman Expectation Equation:**
  For each state $s$:
  $$
  V(s) = \sum_{a} \pi(a|s) \sum_{s'} P(s'|s,a) [R(s,a,s') + \gamma V(s')]
  $$
  where:
  - $\pi(a|s)$ is the probability of taking action $a$ in state $s$,
  - $P(s'|s,a)$ is the probability of transitioning to state $s'$,
  - $R(s,a,s')$ is the reward for that transition,
  - $\gamma$ is the discount factor.

## Algorithm Overview

1. **Initialization:**
   Start with an initial guess (commonly zeros) for the state-value function $V(s)$.

2. **Iterative Update:**
   For each non-terminal state, update the state value using the Bellman expectation equation. Continue updating until the maximum change in value (delta) is less than a given threshold.

3. **Terminal States:**
   For this example, the four corners of the grid are considered terminal, so their values remain unchanged.

This evaluation method is essential for understanding how "good" each state is under a specific policy, and it forms the basis for more advanced reinforcement learning algorithms.
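As a single worked step (using the setup of the task that follows: a uniform random policy, a constant reward of $-1$ per move, deterministic transitions, and $\gamma = 0.9$), a synchronous first sweep starting from $V(s) = 0$ updates each interior state to

$$
V(s) = \sum_{a} 0.25 \cdot \left[-1 + 0.9 \cdot 0\right] = -1,
$$

and later sweeps keep propagating these values until the largest per-sweep change falls below the threshold.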
Lines changed: 32 additions & 0 deletions
# Gridworld Policy Evaluation

Implement a function that evaluates the state-value function for a 5x5 gridworld under a given policy. In this gridworld, the agent can move in four directions: up, down, left, and right. Each move incurs a constant reward of -1, and terminal states (the four corners) remain unchanged. The policy is provided as a dictionary mapping each state (tuple: (row, col)) to a dictionary of action probabilities.

## Example

**Input:**

```python
policy = {
    (i, j): {'up': 0.25, 'down': 0.25, 'left': 0.25, 'right': 0.25}
    for i in range(5) for j in range(5)
}
gamma = 0.9
threshold = 0.001
```

**Output:**

A 5x5 list of state values that converges after iterative evaluation.

```
[0.0, -4.864480919478529, -6.078955203735765, -4.864480919478529, 0.0]
[-4.864480919478529, -6.23388594292537, -6.7676569349718365, -6.233885942925371, -4.864480919478529]
[-6.078955203735764, -6.7676569349718365, -7.090189335232064, -6.7676569349718365, -6.078955203735764]
[-4.864480919478529, -6.23388594292537, -6.7676569349718365, -6.233885942925371, -4.864480919478529]
[0.0, -4.864480919478529, -6.078955203735765, -4.864480919478529, 0.0]
```

## Reasoning

For each non-terminal state, compute the expected value over all possible actions using the policy. Update the state value iteratively using the Bellman expectation equation until the maximum change across states is below the threshold, ensuring that terminal states remain fixed. A minimal sketch of such an evaluation loop is shown below.
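The sketch below is illustrative rather than a reference solution: it assumes that a move which would leave the grid keeps the agent in place (the statement does not specify this), uses full synchronous sweeps, and fixes the four corner values at 0. The name `gridworld_policy_evaluation` is made up for the example, and the exact decimals of the result depend on details such as sweep order (in-place vs. synchronous updates).

```python
import numpy as np

def gridworld_policy_evaluation(policy, gamma, threshold, size=5):
    """Iteratively evaluate V(s) for a size x size gridworld (illustrative sketch)."""
    moves = {'up': (-1, 0), 'down': (1, 0), 'left': (0, -1), 'right': (0, 1)}
    terminals = {(0, 0), (0, size - 1), (size - 1, 0), (size - 1, size - 1)}
    V = np.zeros((size, size))

    while True:
        new_V = V.copy()
        for i in range(size):
            for j in range(size):
                if (i, j) in terminals:
                    continue  # terminal corners keep their value (0.0)
                value = 0.0
                for action, prob in policy[(i, j)].items():
                    di, dj = moves[action]
                    ni, nj = i + di, j + dj
                    if not (0 <= ni < size and 0 <= nj < size):
                        ni, nj = i, j  # assumption: off-grid moves keep the agent in place
                    value += prob * (-1 + gamma * V[ni, nj])  # constant reward of -1 per move
                new_V[i, j] = value
        delta = np.max(np.abs(new_V - V))
        V = new_V
        if delta < threshold:
            return V.tolist()


# Example usage with the uniform random policy from the statement above
policy = {
    (i, j): {'up': 0.25, 'down': 0.25, 'left': 0.25, 'right': 0.25}
    for i in range(5) for j in range(5)
}
for row in gridworld_policy_evaluation(policy, gamma=0.9, threshold=0.001):
    print(row)
```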
