General Overview
At first glance the repository is well organized, with Python files that are easy to navigate, and that's important!
I really appreciated the accuracy of the explanations given in the readme; it helped a lot to understand the reasoning behind the code.
3.1 Fixed Rules
The strategy is okay: quite simple, but effective, as the task asked.
3.2 Genetic Algorithm
Tuning the parameters in different phases sounds worthwhile, because the most important choices come in the last phases of the game, especially with large Nim values.
For the second strategy you proposed, too, the learning is divided into different parts to tune the parameters better. Personally, looking at the results, I don't understand the point of this strategy once the third one is proposed, but that's just an opinion. The third one is really good, guys. I noticed you used the XOR operation to implement it; I'm not sure that was permitted, because it helps the GA a lot to learn something close to the optimal nim-sum strategy... but seeing the results obtained in this case too, I can only be impressed!
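For reference, the optimal nim-sum play that the XOR trick lets the GA approximate can be written in a few lines. This is a generic sketch of the classic strategy, not code taken from your repo:

```python
from functools import reduce

def nim_sum(rows):
    # XOR of all heap sizes; non-zero means the player to move can force a win
    return reduce(lambda a, b: a ^ b, rows, 0)

def optimal_move(rows):
    # Return (row, amount) that leaves the opponent with nim-sum 0,
    # or a fallback move when the position is already losing.
    s = nim_sum(rows)
    for i, r in enumerate(rows):
        if r ^ s < r:
            return i, r - (r ^ s)
    # losing position: just take one object from the first non-empty row
    i = next(j for j, r in enumerate(rows) if r > 0)
    return i, 1
```

For example, from `[3, 4, 5]` (nim-sum 2) it removes 2 from the first row, leaving `[1, 4, 5]` with nim-sum 0 for the opponent.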
3.3 Min-Max
I find that leaving the previous version of the algorithm commented out is always a good idea, as it makes it easier to understand the changes made in the new proposal. The solution obtained is good, so I can state that it's a well-working min-max. The pruning phase does not interfere with the result, but it does speed up the inference time.
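As a side note, the fact that alpha-beta pruning changes only the amount of search, never the returned value, can be seen in a minimal Nim minimax. This is a hedged sketch under normal-play rules (last object taken wins), not your implementation:

```python
def minimax(rows, maximizing=True, alpha=-1, beta=1):
    # Returns +1 if the maximizing player wins with perfect play, -1 otherwise.
    if sum(rows) == 0:
        # the player to move has no objects left: they lost (normal play)
        return -1 if maximizing else 1
    best = -1 if maximizing else 1
    for i, r in enumerate(rows):
        for take in range(1, r + 1):
            child = list(rows)
            child[i] -= take
            val = minimax(child, not maximizing, alpha, beta)
            if maximizing:
                best = max(best, val)
                alpha = max(alpha, val)
            else:
                best = min(best, val)
                beta = min(beta, val)
            if beta <= alpha:
                return best  # prune: same result, smaller search tree
    return best
```

Deleting the `if beta <= alpha` check gives identical values for every position; only the number of explored nodes grows.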
3.4 Reinforcement Learning
As you said in the readme, RL is hard to evaluate because of the mutability of the values. I did not know anything about the bblais Game package, and building the RL implementation on the ideas of another repo is laudable. I would rather try more phases of training the RL agent against itself, and a lighter penalty when it loses... but that's just a consideration.
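Just to make the suggestion concrete, here is a tiny self-play sketch where the loss penalty is deliberately lighter than the win reward. Everything here is hypothetical and not based on your code: `win_reward`, `loss_penalty`, and the Monte-Carlo-style backward update (rather than one-step Q-learning) are my own assumptions:

```python
import random
from collections import defaultdict

def moves(state):
    return [(i, k) for i, r in enumerate(state) for k in range(1, r + 1)]

def apply_move(state, move):
    i, k = move
    s = list(state)
    s[i] -= k
    return tuple(s)

def train(episodes=5000, start=(3, 4), lr=0.3, eps=0.2,
          win_reward=1.0, loss_penalty=-0.5):
    # Both players share Q and play epsilon-greedily against each other.
    Q = defaultdict(float)
    for _ in range(episodes):
        state, history = start, []
        while sum(state) > 0:
            ms = moves(state)
            m = random.choice(ms) if random.random() < eps \
                else max(ms, key=lambda mv: Q[(state, mv)])
            history.append((state, m))
            state = apply_move(state, m)
        # normal play: the last mover wins; walk the game backwards,
        # alternating the (asymmetric) reward between the two players
        r = win_reward
        for (s, m) in reversed(history):
            Q[(s, m)] += lr * (r - Q[(s, m)])
            r = loss_penalty if r == win_reward else win_reward
    return Q
```

Raising `episodes` (more self-play phases) and softening `loss_penalty` are then just two knobs to experiment with.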
Overall, again, congratulations on your work. Very, very well done!