Commit 13c1f21

Update README.md
1 parent 1d2696a commit 13c1f21

1 file changed: +3 -4 lines changed

README.md

Lines changed: 3 additions & 4 deletions
@@ -1,7 +1,6 @@
 # Exercise Reinforcement Learning
 This exercise explores Q-learning based on neural and table-based approaches.
-## Task 1 & 2: Q-Table based RL:
-### Frozen-Lake
+## Task 1: Q-Table based RL - Frozen-Lake

 1. Take a look at the [frozen-lake-documentation](https://gymnasium.farama.org/environments/toy_text/frozen_lake/), and familiarize yourself with the setup. Recall the Q-Table update Rule we saw during the lecture or take a look at [1, page 153]:

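For orientation, the tabular Q-learning update is commonly written as $Q(s,a) \leftarrow Q(s,a) + \alpha \left( r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right)$. The README's own formula is not part of this hunk and may use different notation. The sketch below is one possible JAX version of that textbook rule; the function name `q_update`, its arguments, and the default values for `alpha` and `gamma` are assumptions made for illustration and are not taken from the exercise code.

```python
import jax.numpy as jnp


def q_update(q_table, state, action, reward, next_state, done,
             alpha=0.1, gamma=0.9):
    """One tabular Q-learning step (hypothetical helper, illustration only).

    JAX arrays are immutable, so the updated table is returned.
    """
    # Terminal transitions have no future value to bootstrap from.
    future = 0.0 if done else jnp.max(q_table[next_state])
    target = reward + gamma * future
    # Temporal-difference error between the target and the current estimate.
    td_error = target - q_table[state, action]
    return q_table.at[state, action].add(alpha * td_error)


# Assumed sizes: the default 4x4 Frozen-Lake map has 16 states and 4 actions.
q_table = jnp.zeros((16, 4))
q_table = q_update(q_table, state=14, action=2, reward=1.0,
                   next_state=15, done=True, alpha=0.5)
```

Returning a new array through `.at[...].add(...)` keeps the function free of side effects, which is what JAX expects.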
@@ -31,7 +30,7 @@ cd /home/$USER/miniconda3/envs/aml/lib
 ```
 Source: [Stackoverflow](https://stackoverflow.com/questions/72110384/libgl-error-mesa-loader-failed-to-open-iris)

-### Tic-Tac-Toe
+## Task 2: Q-Table based RL - Tic-Tac-Toe
 1. Let's consider a more challenging example next. Navigate to `src/train_table_Q_tictactoe.py` and finish the `TicTacToeBoard` class in `gen.py`. Use `nox -s test` to check your
 progress. When `tests/test_board.py` checks out without an error, this task is done.

@@ -44,7 +43,7 @@ progress. When `tests/test_board.py` checks out without an error, this task is d
 5. Stop sampling from the Q-table every time for your agent. Sample the agent with probability $\epsilon$ and
 perform an exploratory move with probability $1 - \epsilon$. Remember to use `jax.random.split` to generate new seeds for `jax.random.uniform`.

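The move selection described in step 5 could be sketched as follows. This is a minimal illustration, not the exercise solution: the helper `choose_move`, its arguments, and the legal-move mask are hypothetical. It follows the README's convention that $\epsilon$ is the probability of consulting the Q-table and $1 - \epsilon$ the probability of exploring.

```python
import jax
import jax.numpy as jnp


def choose_move(key, q_values, free_mask, epsilon):
    """Pick a move index (hypothetical helper, illustration only).

    q_values:  (n_actions,) Q-table row for the current state.
    free_mask: (n_actions,) boolean array, True where a move is legal.
    epsilon:   probability of following the Q-table (README convention).
    """
    # Never reuse a key: split off one fresh key per random draw.
    key, coin_key, move_key = jax.random.split(key, 3)
    # Greedy move: mask illegal actions with -inf before taking the argmax.
    greedy = jnp.argmax(jnp.where(free_mask, q_values, -jnp.inf))
    # Exploratory move: uniform over the legal actions.
    probs = free_mask / jnp.sum(free_mask)
    explore = jax.random.choice(move_key, q_values.shape[0], p=probs)
    # Biased coin flip that decides between the two moves.
    use_table = jax.random.uniform(coin_key) < epsilon
    move = jnp.where(use_table, greedy, explore)
    return key, int(move)


key = jax.random.PRNGKey(0)
q_row = jnp.array([0.1, 0.9, 0.5])
free = jnp.array([True, False, True])
key_after, move = choose_move(key, q_row, free, epsilon=1.0)
```

The function returns the advanced key so that the caller can pass it into the next call instead of reusing the old one.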
-## Task 3: Q-Neural RL:
+## Task 3: Q-Neural RL - Tic-Tac-Toe
 1. Let's replace the Q-table with a neural network [2, Algorithm 1]. For simplicity, we do not implement batching, this works here, but won't scale to harder problems. Re-use as much code from task two as possible. Open `src/train_neural_Q_tictactoe.py`. Your `board_update`, `create_explore_move` functions are already imported.
 Recall the cost function for neural Q-Learning:
