Explores Reinforcement Learning on the strategy game Nim.
Contents available on GitHub:

- `final_report.pdf`: primary artifact and final report
- `code` directory
  - `agents.py`: class definitions for the Q and Bayes agents
  - `learning.py`: primary script for generating results; writes output to the `final` directory
  - `nimUtils.py`, `play.py`, `statesActions.py`: helper functions for the primary script `learning.py`
  - `visualization.ipynb`: reads from the `final` directory to create the figures in the report (it also produces many figures not used in the final report)
- `final` directory
  - `Bayes`: logs from Bayes training, shown in figure 6.c
  - `Bayes_vis`: logs used to produce figure 4
  - `PvP`: logs from the final best-vs-all simulations, depicted in figure 5
  - `Q_vis`: logs used to produce figures 2 and 3
  - `QtvQt`: logs from $Q_t$ training, a small number of which are shown in figure 6.b
  - `QvQ`: logs from Q training, a small subset of which is shown in figure 6.a
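For orientation, the tabular Q-learning setup on Nim that these logs come from can be sketched roughly as follows. This is a hypothetical illustration only: none of the names (`legal_moves`, `apply_move`, `q_update`) come from the repository's code, and the reward and hyperparameter choices are assumptions.

```python
import random
from collections import defaultdict

# Hypothetical sketch of Q-learning on Nim, not the repository's code.
# A Nim state is a tuple of pile sizes; an action (i, k) removes k objects
# from pile i.

def legal_moves(state):
    """All (pile index, objects removed) pairs available from this state."""
    return [(i, k) for i, n in enumerate(state) for k in range(1, n + 1)]

def apply_move(state, move):
    """Return the successor state after removing k objects from pile i."""
    i, k = move
    piles = list(state)
    piles[i] -= k
    return tuple(piles)

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=1.0):
    """Standard tabular Q-learning update toward r + gamma * max_a' Q(s', a')."""
    best_next = max((Q[(s_next, a2)] for a2 in legal_moves(s_next)), default=0.0)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# One illustrative step from a random policy.
Q = defaultdict(float)
s = (1, 2)
a = random.choice(legal_moves(s))
q_update(Q, s, a, 0.0, apply_move(s, a))
```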
This project does not require a special runtime environment. It uses the latest versions of standard third-party packages (numpy, pandas, matplotlib) along with Python standard-library modules (functools, operator, sys, csv, datetime).
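A quick way to confirm the environment is ready is to import the third-party dependencies; the standard-library modules need no installation. This check is a sketch, not part of the repository:

```python
# Environment check (illustrative, not part of the repo): only numpy,
# pandas, and matplotlib are third-party; the rest ship with Python.
import numpy
import pandas
import matplotlib

print(numpy.__version__, pandas.__version__, matplotlib.__version__)
```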
If you wish to recreate any of the results in the final report, execute `learning.py`. Within `main`, comment or uncomment calls to the following functions:

- `vis_learning()` (15 minutes): visualize changes in the Q-table, figures 2 and 3. Note: this overwrites the contents of `final/Q_vis/`
- `QvQgrid()` (5 hours): the 75 trials between Q-agents detailed in the report. Adds timestamped files to `final/QvQ/`
- `QtvQtgrid()` (20 minutes): trials between $Q_t$-agents. Adds timestamped files to `final/QtvQt/`
- `bayesAgents()` (2.5 hours): trials between Bayes agents. Adds timestamped files to `final/Bayes/`
- `bayesVis()` (20 minutes): visualize Bayesian prior moments. Overwrites the contents of `final/Bayes_vis/`
- `BestvEachOther()` (1.5 hours): final comparison between the best agents. Adds timestamped files to `final/PvP/`
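The comment/uncomment workflow above might look like the following inside `learning.py`. The function names are taken from this README, but the structure of `main` is an assumption, and the stub bodies below stand in for the real implementations purely for illustration:

```python
# Hypothetical sketch of learning.py's entry point; stubs replace the real
# functions so the comment/uncomment pattern can be shown in isolation.
ran = []

def vis_learning():       # stub for the real ~15-minute run
    ran.append("vis_learning")

def QvQgrid():            # stub for the real ~5-hour run
    ran.append("QvQgrid")

def main():
    vis_learning()        # uncommented: this result is (re)generated
    # QvQgrid()           # commented out: this long run is skipped

main()
```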