Custom implementations of Dynamic Programming, Monte Carlo, Temporal Difference and Vanilla Policy Gradient for learning purposes.
- Each algorithm has its own folder (`dynamic_programming/`, `monte_carlo/`, `temporal_difference/`, `policy_gradients/`) containing code and related documentation.
- Each folder contains `src/` for the source code and `notebooks/` for testing the code and for documentation. `notebooks/` is under construction; in the meantime, the files in `src/` can be run directly to train and test the algorithms.
- `_environments/` contains a modified version of the Farama Foundation's FrozenLake environment that allows the reward structure to be changed.
- The code is designed to be readable and well-documented to aid understanding and learning.
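
To give a feel for why a configurable reward structure matters, here is a minimal, standard-library-only sketch. It is not the code in `_environments/` or `dynamic_programming/`: the names `MAP`, `step`, and `value_iteration` are hypothetical, and it assumes the standard 4x4 FrozenLake layout, deterministic (non-slippery) moves, and a per-tile reward table.

```python
# Hypothetical sketch, not the repository's environment or solver.
# Assumes the standard 4x4 FrozenLake layout and non-slippery moves.
MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]

# Action ids follow the FrozenLake convention: 0=left, 1=down, 2=right, 3=up.
ACTIONS = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}  # (d_row, d_col)


def step(state, action, rewards, grid=MAP):
    """Return (next_state, reward, done) for a deterministic move.

    `rewards` maps a tile letter ("S", "F", "H", "G") to the reward
    received on entering that tile; this is the configurable part.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    row, col = divmod(state, n_cols)
    d_row, d_col = ACTIONS[action]
    # Moves that would leave the grid keep the agent in place.
    row = min(max(row + d_row, 0), n_rows - 1)
    col = min(max(col + d_col, 0), n_cols - 1)
    tile = grid[row][col]
    return row * n_cols + col, rewards[tile], tile in "HG"


def value_iteration(rewards, gamma=0.9, tol=1e-8, grid=MAP):
    """In-place value iteration; returns a list of optimal state values."""
    n_cols = len(grid[0])
    values = [0.0] * (len(grid) * n_cols)
    while True:
        delta = 0.0
        for state in range(len(values)):
            row, col = divmod(state, n_cols)
            if grid[row][col] in "HG":
                continue  # terminal tiles keep a value of 0
            best = max(
                reward + (0.0 if done else gamma * values[nxt])
                for nxt, reward, done in (
                    step(state, action, rewards, grid) for action in ACTIONS
                )
            )
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values


# Sparse rewards: only reaching the goal pays out.
default_rewards = {"S": 0.0, "F": 0.0, "H": 0.0, "G": 1.0}
# Shaped rewards (illustrative values): small step cost and a hole penalty.
shaped_rewards = {"S": -0.01, "F": -0.01, "H": -1.0, "G": 1.0}

default_values = value_iteration(default_rewards)
shaped_values = value_iteration(shaped_rewards)
print("start-state value, sparse rewards:", round(default_values[0], 4))
print("start-state value, shaped rewards:", round(shaped_values[0], 4))
```

Swapping the reward table changes the optimal state values without touching the transition logic, which is the kind of experiment a modifiable reward structure is meant to support.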
