Commit 91f790c

Merge branch 'tcmxx/docs'

2 parents f3a0b52 + 9511b95

10 files changed: +131 -6 lines

Documents/IntelligentPoolDetails.md

Lines changed: 131 additions & 6 deletions
@@ -10,23 +10,30 @@ The general name of these types of games is cue sport (see [Wikipedia](http
alt="BilliardGame"
width="600" border="10" />
</p>
<p align="center">
<em>Image from Google</em>
</p>

During the development of the materials for the Computational Intelligence in Games course, we decided to develop a whole set of examples around the billiard game, using different technologies, to showcase the concepts and power of each. The examples start from a simple case where the AI only needs to hit the white ball once and try to pocket both of the red balls on the table, using the [MAES (CMA-ES)](https://en.wikipedia.org/wiki/CMA-ES) algorithm. The final goal is to develop an AI that can play a whole game with itself, where it has to plan not just one shot but multiple shots and prevent the opponent from gaining an advantage, using [PPO](https://arxiv.org/abs/1707.06347).

<p align="center">
<img src="Images/IntelligentPool/SimpleCase.png"
alt="SimpleCase"
width="600" border="10" />
</p>
<p align="center">
<em>Simple case scene</em>
</p>

In the end, at least so far, PPO is not working at all. We ended up making an even simpler case than the simple case at the beginning, with only one red ball and 4 pockets, where the game restarts after every shot.

<p align="center" id="simpler-case-image">
<img src="Images/IntelligentPool/SimplerCase.png"
alt="SimplerCase"
width="600" border="10" />
</p>
<p align="center">
<em>Even simpler case scene</em>
</p>

Here I will go through the development process, describe each example scene, explain how to play with them, and explain why I think the billiard game does not work directly with pure PPO or supervised learning.

## What we have tried
@@ -42,9 +49,127 @@ Here is the list of what we have tried and their results:
    - Does not work, and it won't work.
5. Even simpler case with 1 red ball and 4 pockets on a square table. Use `Supervised Learning` together with `MAES` for `one shot`. The reward function is heavily shaped.
    - The supervised learning can now learn to hit the red ball. Sometimes it can shoot well even without MAES.
6. Same as 5 but with GAN.
    - A demo to show how GAN works.

Next, these cases will be discussed one by one in detail.

### Case 1 - 2 red balls, 6 pockets, one shot, MAES
* Scenes: BilliardMAESOnly-OneShot-UseMAESDirectly and BilliardMAESOnly-OneShot-UseTrainer
* If the reward function is not shaped, that is, one point for each red ball that ends up in a pocket after one shot, MAES is very likely unable to find the best solution. The reason is simple: the initial random samples are likely to all score 0 points, so the algorithm cannot select a better next generation of children.
* If we shape the reward so that extra reward is given when the red balls end up close to the pockets, then MAES is able to find a good solution after a couple of iterations; a minimal sketch of such a shaped reward is shown below.
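
The actual reward used in the Unity scenes is not reproduced here; the following is only a minimal Python sketch of the idea, and the function name, the `shaping_weight` parameter, and the input arrays are hypothetical.

```python
import numpy as np

def shaped_reward(num_pocketed, red_ball_positions, pocket_positions, shaping_weight=0.5):
    """Sketch of a shaped one-shot billiard reward.

    num_pocketed:        how many red balls ended up in a pocket (0, 1 or 2)
    red_ball_positions:  (n, 2) positions of the red balls still on the table
    pocket_positions:    (6, 2) positions of the pockets
    """
    # Sparse part: one point per pocketed red ball (the unshaped reward).
    reward = float(num_pocketed)

    # Dense shaping part: extra reward when a remaining red ball stops close
    # to some pocket, so that early random samples are not all scored 0.
    pockets = np.asarray(pocket_positions, dtype=float)
    for ball in np.asarray(red_ball_positions, dtype=float).reshape(-1, 2):
        nearest = np.linalg.norm(pockets - ball, axis=1).min()
        reward += shaping_weight * np.exp(-nearest)
    return reward
```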

<p align="center">
<img src="Images/IntelligentPool/MAESDemo.gif"
alt="MAESDemo"
width="600" border="10" />
</p>
<p align="center">
<em>MAES Demo</em>
</p>

### Case 2 - 2 red balls, 6 pockets, two shots, MAES
* Scenes: BilliardSLAndMAES-MultiShot (select MAES only in the scene)
* Instead of trying to find the one best shot, in this case the optimizer tries to find the two consecutive shots that give the best result.
* Result: As in Case 1, it works with reward shaping. It is able to find a good action to take, but the optimization takes more time, because it needs to evaluate two shots and the action space dimension is 4 instead of 2, which is a larger space for the optimizer to search. A sketch of this search is shown below.
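
As a rough illustration of what the optimizer searches over here, this sketch uses the third-party `cma` package (pycma) as a stand-in for the project's MAES implementation; `simulate_two_shots` is a hypothetical placeholder for the Unity rollout and the shaped reward.

```python
import numpy as np
import cma  # pip install cma; used here as a stand-in for the project's MAES

def simulate_two_shots(action):
    """Hypothetical placeholder: play two consecutive shots
    [angle1, force1, angle2, force2] in the simulator and return the shaped
    reward. A toy quadratic stands in for the real physics rollout."""
    return -float(np.sum((action - np.array([0.3, 0.8, -0.5, 0.6])) ** 2))

def cost(action):
    return -simulate_two_shots(np.asarray(action))  # pycma minimizes

# 4-dimensional action space (two shots of [angle, force]) instead of the
# 2-dimensional one used for a single shot, hence more to search.
es = cma.CMAEvolutionStrategy(np.zeros(4), 0.5)
while not es.stop():
    candidates = es.ask()                        # sample one generation
    es.tell(candidates, [cost(c) for c in candidates])
best_two_shot_action = es.result.xbest
```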

### Case 3 - 2 red balls, 6 pockets, one shot, MAES and Supervised Learning
* Scenes: BilliardSLAndMAES-OneShot
* Result: the neural network does not learn anything meaningful to help MAES.

The goal for this case is to let the neural network learn an initial guess that is better than random; with the help of this guess, the optimizer should be able to find the best result faster. The neural network learns, using supervised learning, from data collected by running the scene with MAES only.

The collected data is a list of state (positions of the balls) / action (best action found by MAES) pairs. The supervised learning tries to make the neural network memorize these pairs, so that given an input state, the network outputs an action close to what MAES would find. A sketch of this kind of training is shown below.
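
The project's actual trainer is not reproduced here; this is only a minimal PyTorch sketch of such supervised (behavior-cloning) training on the collected pairs, with hypothetical state and action dimensions and placeholder data.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 6, 2   # hypothetical: ball positions in, [angle, force] out

# In the real setup these would be the pairs collected from the MAES-only runs.
states = torch.randn(20000, STATE_DIM)     # placeholder data
actions = torch.randn(20000, ACTION_DIM)   # placeholder data

model = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, ACTION_DIM),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(20):
    for i in range(0, len(states), 256):                     # simple mini-batching
        batch_states = states[i:i + 256]
        batch_actions = actions[i:i + 256]
        loss = loss_fn(model(batch_states), batch_actions)   # imitate what MAES found
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```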

The idea sounds good, and it has been proven to work well in some scenarios. But for our case, it just does not work. The trained neural network still often outputs an action that is not even close to the best action a human or MAES would choose. It misleads the MAES optimizer rather than helping it.

Why is it like this?

The first problem comes from the discontinuous distribution of optimal solutions. Let's look at the following image, with a heat map of all possible shots in a certain state.
<p align="center">
<img src="Images/IntelligentPool/HeatMap.png"
alt="HeatMap"
width="600" border="10" />
</p>
<p align="center">
<em>Heat Map of all possible shots</em>
</p>

In the heat map on the right side of the image above, the whiter a pixel is, the higher the score of shooting with the corresponding parameters. The center of the map corresponds to shooting with zero force, and the angle and distance from the center give the shot direction and force magnitude, respectively; a sketch of this mapping is shown below.
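
The exact resolution and scaling of the heat map are not given, so the mapping below is only a guessed sketch of the parameterization described above; the names and constants are hypothetical.

```python
import numpy as np

def pixel_to_shot(px, py, map_size=256, max_force=1.0):
    """Map a heat-map pixel to a shot: the center of the map is a zero-force
    shot, the angle around the center is the shot direction, and the distance
    from the center is the force magnitude."""
    cx = cy = map_size / 2.0
    dx, dy = px - cx, py - cy
    direction = np.arctan2(dy, dx)                            # shot direction (radians)
    force = max_force * np.hypot(dx, dy) / (map_size / 2.0)   # 0 at the center
    return direction, force
```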

The whitest pixels, which mean that both red balls will be pocketed by those shots, are very rare. However, there are plenty of less white pixels, which mean that one of the red balls will be pocketed, and they are scattered around quite a lot.

Since the perfectly white pixels are rare, MAES is not very likely to find the optimal solution every time. Also, because the less white pixels are scattered around, the sub-optimal solution found by MAES might be quite different every time as well. Therefore, when collecting the state/solution pair data from the MAES results, one thing happens: for states that are very similar, the solutions might vary a lot!

So what is the consequence of having varied solutions for similar states?

Supervised learning usually tries to reduce the error between the network's own outputs and the desired outputs in the training data. A regular neural network cannot produce multiple discontinuous outputs for the same input; instead, it tends to output an average of all the desired outputs in the training data.

In our case, the supervised learning neural network learns to output an average of some of the whiter-pixel positions. If you randomly pick some white pixels and look at their average position, it is likely to be just a grey or black pixel on the heat map! This is one of the reasons why our neural network cannot learn anything helpful; the tiny sketch below illustrates the effect.
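
A tiny numerical illustration of this averaging effect, with made-up numbers rather than the actual data: if two equally good shot parameters exist for essentially the same state, the mean-squared-error fit lands between them.

```python
import numpy as np

# Toy example: for essentially the same state, MAES sometimes returns a shot
# parameter of 0.25 and sometimes 0.75 (two different "white pixels").
collected_actions = np.array([0.25, 0.75, 0.25, 0.75])

# The single output that minimizes the mean squared error is just the mean,
# which corresponds to a grey or black pixel, i.e. a bad shot.
print(collected_actions.mean())   # 0.5
```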

You might ask: is there a way to solve this problem and let the neural network learn to generate multiple outputs? Some people might think of a [GAN](https://en.wikipedia.org/wiki/Generative_adversarial_network). However, considering that a neural network does not really represent discontinuous functions (you can still approximate them, but that is hard; see [Reference](https://www.quora.com/How-can-we-use-artificial-neural-networks-to-approximate-a-piecewise-continuous-function)) and that GANs are extremely hard to train, I don't think it is worth trying on this case.

Another, minor, reason why it is hard to train the neural network for our case is that the optimal outputs might change a lot with only a small change in the input state. This means more training data and a larger neural network are needed to remember all the different situations. I did not try to collect more data or use a larger network, because the first problem was already hindering me and I did not have enough time to collect the data.

In the end, I made a case simple enough that it does not have the problems above; before that, I tried PPO, which did not work as expected.

### Case 4 - 2 red balls, 6 pockets, one shot, PPO
* Scenes: BilliardRL-OneShot
* Result: I have not been able to produce a good result with a short training time yet.
### Case 5 - 1 red ball, 4 pockets, one shot, MAES and Supervised Learning
* Scenes: BilliardSLAndMAES-OneShotSimplified

In this case, the same method is used as in case 3, but with a much simpler scenario. Here is a screenshot of the scene and the heat map.

<p align="center">
<img src="Images/IntelligentPool/HeatMapSimpler.png"
alt="HeatMapSimpler"
width="400" border="10" />
</p>
<p align="center">
<em>Heat map</em>
</p>

According to the heat map, the "better solutions" are no longer as scattered as in case 3. That means it is easier for MAES to find the optimal solutions, and the average solutions learned by the neural network make more sense.

After collecting 20000 samples and training the neural network as in case 3 for a little while, the neural network is at least able to shoot at the red ball and sometimes pocket it.

<p align="center">
<img src="Images/IntelligentPool/SimplerDemo.gif"
alt="SimplerNeuralOnly"
width="600" border="10" />
</p>
<p align="center">
<em>Simpler case played by neural network only</em>
</p>

If we use the output of the neural network as the initial guess for the optimizer, the average iteration count needed to find a satisfying solution is reduced from about 10 to about 5 in our case. Nice! A minimal sketch of this warm start is shown below.
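
The following is only a minimal sketch of this warm start, again using the `cma` package as a stand-in for MAES; `model` is a network like the one in the case 3 sketch, and `objective` is the negated shaped reward of a single shot.

```python
import torch
import cma  # pip install cma; stand-in for the project's MAES

def warm_started_search(model, state, objective, sigma0=0.3):
    """Run the optimizer with the network's prediction as the initial mean of
    the search distribution, instead of an uninformed starting point."""
    with torch.no_grad():
        x0 = model(torch.as_tensor(state, dtype=torch.float32)).numpy()
    es = cma.CMAEvolutionStrategy(x0, sigma0)
    while not es.stop():
        candidates = es.ask()
        es.tell(candidates, [objective(c) for c in candidates])
    return es.result.xbest
```
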
<p align="center">
<img src="Images/IntelligentPool/ReducedMAESIteration.png"
alt="ReducedMAESIteration"
width="600" border="10" />
</p>
<p align="center">
<em>MAES iteration comparison when changing from MAES only to MAES with neural network - blue line: average iteration count; purple line: average score.</em>
</p>

### Case 6 - 1 red ball, 4 pockets, one shot, MAES and Supervised Learning using GAN
* Scenes: BilliardSLAndMAES-OneShotSimplified-GAN

This case is a demo to show what kind of data a GAN can generate. See the image below.
<p align="center">
<img src="Images/IntelligentPool/GAN.png"
alt="GAN"
width="600" border="10" />
</p>
<p align="center">
<em>Comparison of heat map and GAN samples</em>
</p>

In the heat map at the top right corner, the green dots are samples generated by the GAN. You can see that they are mostly in the white area of the heat map, which makes sense. A minimal sketch of such a conditional GAN is shown below.
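
The GAN used in the scene is not reproduced here; the following is only a minimal PyTorch sketch of a conditional GAN of this kind, in which the generator can produce several different candidate shots for the same state by sampling different noise vectors. All dimensions and names are hypothetical.

```python
import torch
import torch.nn as nn

STATE_DIM, NOISE_DIM, ACTION_DIM = 4, 8, 2   # hypothetical sizes

# Generator: (state, noise) -> candidate shot. Different noise vectors give
# different shots for the same state, which is how several "white regions"
# of the heat map can be covered at once.
G = nn.Sequential(nn.Linear(STATE_DIM + NOISE_DIM, 64), nn.ReLU(),
                  nn.Linear(64, 64), nn.ReLU(),
                  nn.Linear(64, ACTION_DIM), nn.Tanh())

# Discriminator: (state, shot) -> probability that the pair comes from the
# MAES-collected data rather than from the generator.
D = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
                  nn.Linear(64, 64), nn.ReLU(),
                  nn.Linear(64, 1), nn.Sigmoid())

bce = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

def train_step(states, actions):
    """One conditional-GAN update on a batch of state/action pairs."""
    noise = torch.randn(states.shape[0], NOISE_DIM)
    fake_actions = G(torch.cat([states, noise], dim=1))

    # Discriminator update: real pairs -> 1, generated pairs -> 0.
    d_real = D(torch.cat([states, actions], dim=1))
    d_fake = D(torch.cat([states, fake_actions.detach()], dim=1))
    d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator update: try to fool the discriminator.
    d_fake = D(torch.cat([states, fake_actions], dim=1))
    g_loss = bce(d_fake, torch.ones_like(d_fake))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```

After training, sampling several noise vectors for one state gives a set of candidate shots, like the green dots overlaid on the heat map above.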
