The general name of these types of games is cue sport (see Wikipedia).
alt="BilliardGame"
width="600" border="10" />
</p>
<p align="center">
<em>Image from Google</em>
</p>

During the development of the materials for the Computational Intelligence in Games course, we decided to develop a whole set of examples around the billiard game, using different technologies, to showcase the concepts and power of each. The examples start with a simple case where the AI only needs to hit the white ball once and try to pocket both of the red balls on the table, using the [MAES (CMA-ES)](https://en.wikipedia.org/wiki/CMA-ES) algorithm. The final goal is to develop an AI that can play a whole game against itself, where it should plan not just one shot but multiple shots and prevent the opponent from gaining an advantage, using [PPO](https://arxiv.org/abs/1707.06347).

<p align="center">
<img src="Images/IntelligentPool/SimpleCase.png"
alt="SimpleCase"
width="600" border="10" />
</p>
<p align="center">
<em>Simple case scene</em>
</p>

In the end, at least so far, PPO is not working at all. We ended up making an even simpler case than the simple case above, with only one red ball and 4 pockets, and the game restarts after every shot.

<p align="center" id="simpler-case-image">
<img src="Images/IntelligentPool/SimplerCase.png"
alt="SimplerCase"
width="600" border="10" />
</p>
<p align="center">
<em>Even simpler case scene</em>
</p>

Here I will go through the development process, describe each example scene, show how to play with them, and explain why I think the billiard game does not work directly with pure PPO or supervised learning.

## What we have tried

Here is the list of what we have tried and their results:

- Does not work, and it won't work.
5. Even simpler case with 1 red ball and 4 pockets on a square table. Use `Supervised Learning` together with `MAES` for `one shot`. The reward function is heavily shaped.
- The supervised learning can now learn to hit the red ball. Sometimes it can shoot well even without MAES.
6. Same as 5 but with GAN.
- A demo to show how GAN works.

Next, these cases will be discussed one by one in detail.

### Case 1 - 2 red balls, 6 pockets, one shot, MAES
* Scenes: BilliardMAESOnly-OneShot-UseMAESDirectly and BilliardMAESOnly-OneShot-UseTrainer
* If the reward function is not shaped, that is, one point for each red ball that ends up in a pocket after the shot, MAES is very likely unable to find the best solution. The reason is simple: the initial random samples are likely to all score 0 points, so the algorithm cannot select a better next generation of children.
* If we shape the reward so that extra reward is given when a red ball ends up close to a pocket, MAES is able to find a good solution after a couple of iterations.
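
To make the shaping idea concrete, here is a minimal sketch, not the project's actual trainer: `simulate_shot(angle, force)` is a hypothetical helper that runs the physics and returns the number of pocketed red balls together with the remaining balls' distances to their nearest pockets, and the 0.2 bonus weight is an arbitrary illustrative choice.

```python
import numpy as np

def shot_score(angle, force, simulate_shot):
    """Shaped reward: 1 point per pocketed red ball, plus a small bonus when the
    remaining red balls end up close to a pocket, so early random samples are not all 0."""
    pocketed, distances = simulate_shot(angle, force)
    return pocketed + 0.2 * sum(np.exp(-d) for d in distances)

def maes_like_search(simulate_shot, iterations=20, population=16, sigma=0.5):
    """A tiny (mu, lambda) evolution-strategy loop in the spirit of (C)MA-ES:
    sample shots around a mean, keep the best half, move the mean toward them.
    A full CMA-ES would also adapt the covariance of the sampling distribution."""
    mean = np.zeros(2)                                   # (angle, force), normalized
    for _ in range(iterations):
        samples = mean + sigma * np.random.randn(population, 2)
        scores = np.array([shot_score(a, f, simulate_shot) for a, f in samples])
        elite = samples[np.argsort(scores)[-population // 2:]]
        mean = elite.mean(axis=0)
        sigma *= 0.9                                     # shrink the search radius
    return mean                                          # best found (angle, force)
```

The bonus term is what gives the very first random generation a usable signal toward the pockets even when no ball has been pocketed yet.
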

<p align="center">
<img src="Images/IntelligentPool/MAESDemo.gif"
alt="MAESDemo"
width="600" border="10" />
</p>
<p align="center">
<em>MAES Demo</em>
</p>

### Case 2 - 2 red balls, 6 pockets, two shots, MAES
* Scenes: BilliardSLAndMAES-MultiShot (select MAES only in the scene)
* Instead of trying to find a single best shot, in this case the optimizer tries to find the two consecutive shots that give the best combined result.
* Result: as in Case 1, this works with reward shaping. The optimizer is able to find a good action to take, but the optimization takes more time, because it needs to evaluate two shots and the action space dimension is 4 instead of 2, which gives it a larger space to search (see the sketch below).

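
A minimal sketch of the two-shot objective, assuming a hypothetical `simulate_shot(state, angle, force)` variant that returns the shaped score of a shot together with the table state afterwards, so the second shot is evaluated from wherever the first one left the balls:

```python
def two_shot_score(action, state, simulate_shot):
    """action = (angle1, force1, angle2, force2): play both shots in sequence and
    sum their shaped scores, so the optimizer searches a 4-D space instead of 2-D."""
    angle1, force1, angle2, force2 = action
    score1, state_after_first = simulate_shot(state, angle1, force1)
    score2, _ = simulate_shot(state_after_first, angle2, force2)
    return score1 + score2
```

The same search loop as in Case 1 can then be run over 4-dimensional samples, which is why it needs more evaluations to converge.
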
### Case 3 - 2 red balls, 6 pockets, one shot, MAES and Supervised Learning
* Scenes: BilliardSLAndMAES-OneShot
* Result: the neural network does not learn anything meaningful to help MAES.

The goal for this case is to let the neural network learn an initial guess that is better than random; with the help of this guess, the optimizer should be able to find the best result faster. The neural network learns from the data collected by running the scene with MAES only, using supervised learning.

The collected data is a list of state (positions of the balls) / action (best action found by MAES) pairs. The supervised learning tries to make the neural network memorize these pairs, so that given an input state, the network outputs an action close to what MAES would find.

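
As a rough sketch of that imitation step (PyTorch here purely for illustration; the project uses its own training code, and the state encoding and network size are assumptions), it is plain regression from ball positions to the MAES action:

```python
import torch
import torch.nn as nn

# states: (N, 6) = x,z of the cue ball and the two red balls,
# actions: (N, 2) = (angle, force); both would come from the MAES-only scene.
states = torch.randn(1000, 6)      # placeholders for the collected data
actions = torch.randn(1000, 2)

model = nn.Sequential(
    nn.Linear(6, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()             # mean squared error: the source of the
                                   # "averaging" problem discussed below

for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(states), actions)
    loss.backward()
    optimizer.step()
```
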
The idea sounds good, and it has been proven to work well in some scenarios. But in our case, it just does not work. The trained neural network still often outputs an action that is not even close to the best action a human being or MAES would choose. It misleads the MAES optimizer rather than helping it.

Why is this the case?

The first problem comes from the discontinuous distribution of optimal solutions. Let's look at the following image with a heatmap of all possible shots for a certain state.

<p align="center">
<img src="Images/IntelligentPool/HeatMap.png"
alt="HeatMap"
width="600" border="10" />
</p>
<p align="center">
<em>Heat map of all possible shots</em>
</p>

In the heat map on the right side of the above image, the whiter a pixel is, the higher the score of shooting with the corresponding parameters. The center of the map represents shooting with zero force, and the angle/distance from the center represent the shot direction/force magnitude respectively.

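
For reference, here is a sketch of how such a heatmap could be produced, assuming a hypothetical `shot_score(angle, force)` helper that runs the physics and returns the shaped score of a single shot; the grid size and the polar mapping are illustrative.

```python
import numpy as np

def shot_heatmap(shot_score, size=101):
    """Score every pixel's (angle, force): the pixel's direction from the image
    center is the shot direction, its distance from the center the force magnitude."""
    heat = np.zeros((size, size))
    center = (size - 1) / 2.0
    for y in range(size):
        for x in range(size):
            dx, dy = x - center, y - center
            force = np.hypot(dx, dy) / center     # 0 at the center, 1 at the edge
            angle = np.arctan2(dy, dx)            # shot direction in radians
            if force <= 1.0:                      # skip corners outside the unit disc
                heat[y, x] = shot_score(angle, force)
    return heat
```
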
The whitest pixels, which basically mean that both red balls go into pockets with those shots, are very rare. However, there are plenty of less white pixels, which mean that one of the red balls will be pocketed, and these are scattered around quite a lot.

Since the perfect white pixels are rare, when optimizing with MAES it is not very likely that the optimal solution is found every time. Also, because the less white pixels are scattered around, the sub-optimal solution found by MAES might be quite different every time as well. Therefore, when collecting the state/solution pairs from the MAES results, one thing happens: for states that are very similar, the solutions might vary a lot!

So what is the consequence of having very different solutions for similar states?

Supervised learning usually tries to reduce the error between the network's own outputs and the desired outputs from the training data for different state inputs. A regular neural network cannot generate multiple discontinuous outputs for the same input; therefore it tends to output an average of all the desired outputs in the training data.

In our case, the supervised learning neural network learns to output the average of some of the whiter-pixel positions. If you randomly pick some white pixels and mark their average position, it is likely to be just a grey or black pixel on the heat map! This is one of the reasons why our neural network cannot learn anything helpful.

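
A tiny numeric illustration of that averaging effect (the numbers are made up): if, for nearly the same state, MAES sometimes records one good shot and sometimes another, a network trained with mean squared error converges to their mean, which is usually a bad shot.

```python
import numpy as np

# Two equally good solutions recorded for (almost) the same state, e.g. a shot
# that pockets the left red ball vs. one that pockets the right red ball.
good_actions = np.array([
    [+0.8, 0.6],    # (angle, force) of shot A
    [-0.7, 0.9],    # (angle, force) of shot B
])

# The output that minimizes mean squared error against both targets is their mean:
mse_optimal_output = good_actions.mean(axis=0)
print(mse_optimal_output)   # ~[0.05, 0.75]: an angle between the two shots, hitting neither ball
```
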
You might ask: is there a way to solve this problem and let the neural network learn to generate multiple outputs? Some people might think of a [GAN](https://en.wikipedia.org/wiki/Generative_adversarial_network). However, considering that a neural network cannot really represent discontinuous functions (you can still approximate one, but that is hard; see this [reference](https://www.quora.com/How-can-we-use-artificial-neural-networks-to-approximate-a-piecewise-continuous-function)), and that GANs are extremely hard to train, I don't think it is worth trying on this case.

Another, more minor reason why it is hard to train the neural network for our case is that the optimal outputs might change a lot with only a small change in the input state. This means more training data and a larger neural network are required to remember all the different situations. I did not try to collect more data or use a larger network, because the first problem was already blocking me and I did not have enough time to collect more data.

In the end, I made a case simple enough to avoid the problems above. Before that, I tried PPO, which did not work as expected either.


### Case 4 - 2 red balls, 6 pockets, one shot, PPO
* Scenes: BilliardRL-OneShot
* Result: I have not been able to produce a good result with a short training time yet.


### Case 5 - 1 red ball, 4 pockets, one shot, MAES and Supervised Learning
* Scenes: BilliardSLAndMAES-OneShotSimplified


In this case, the same method is used as in Case 3, but in a much simpler scenario. Here is the screenshot of the scene and the heatmap.

According to the heatmap, the "better solutions" are no longer as scattered as in Case 3. That means it is easier for MAES to find the optimal solutions, and the average solutions learned by the neural network make more sense.

After collecting 20000 samples and training the neural network as in Case 3 for a little while, the neural network is at least able to shoot at the red ball and sometimes pocket it.


<p align="center">
<img src="Images/IntelligentPool/SimplerDemo.gif"
alt="SimplerNeuralOnly"
width="600" border="10" />
</p>
<p align="center">
<em>Simpler case played by neural network only</em>
</p>

If we use the output of the neural network as the initial guess for the optimizer, the average iteration count needed to find a satisfying solution is roughly cut from 10 to 5 in our case. Nice! A small sketch of this warm start follows the comparison chart below.
<p align="center">
<em>MAES iteration comparison when changing from MAES only to MAES with neural network. Blue line: average iteration count. Purple line: average score.</em>
</p>

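
Here is a minimal sketch of that warm start, reusing the hypothetical `shot_score(angle, force)` helper and a supervised `model` like the one sketched in Case 3 (retrained on this simpler scene): instead of starting the evolution-strategy search from a zero mean, it starts from the network's prediction for the current state.

```python
import numpy as np
import torch

def warm_started_search(state, model, shot_score, iterations=5, population=16, sigma=0.3):
    """Same (mu, lambda) loop as in the Case 1 sketch, but the initial mean is the
    network's predicted (angle, force) instead of a zero/random guess."""
    with torch.no_grad():
        mean = model(torch.as_tensor(state, dtype=torch.float32)).numpy()
    for _ in range(iterations):
        samples = mean + sigma * np.random.randn(population, 2)
        scores = np.array([shot_score(a, f) for a, f in samples])
        elite = samples[np.argsort(scores)[-population // 2:]]
        mean = elite.mean(axis=0)
        sigma *= 0.9
    return mean
```
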

### Case 6 - 1 red ball, 4 pockets, one shot, MAES and Supervised Learning using GAN
* Scenes: BilliardSLAndMAES-OneShotSimplified-GAN

This case is a demo to show what kind of data a GAN can generate. See the image below.

<p align="center">
<img src="Images/IntelligentPool/GAN.png"
alt="GAN"
width="600" border="10" />
</p>
<p align="center">
<em>Comparison of heatmap and GAN samples</em>
</p>


In the heat map at the top right corner, the green dots are samples generated by the GAN. You can see that they are mostly in the white area of the heat map, which makes sense.
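
For completeness, here is a rough sketch of what a conditional GAN for this setup could look like (PyTorch purely for illustration; the layer sizes, noise dimension and training details are assumptions, not the project's actual model): the generator maps (state, noise) to a shot, and the discriminator judges whether a (state, shot) pair looks like it came from the MAES data.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, NOISE_DIM = 4, 2, 8     # cue + red ball positions; (angle, force)

generator = nn.Sequential(                      # (state, noise) -> shot
    nn.Linear(STATE_DIM + NOISE_DIM, 64), nn.ReLU(),
    nn.Linear(64, ACTION_DIM), nn.Tanh(),       # shots normalized to [-1, 1]
)
discriminator = nn.Sequential(                  # (state, shot) -> probability the pair is real
    nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),
)
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

def gan_step(states, real_actions):
    """One conditional-GAN update on a batch of (state, MAES action) pairs."""
    noise = torch.randn(states.size(0), NOISE_DIM)
    fake_actions = generator(torch.cat([states, noise], dim=1))

    # Discriminator: push real pairs toward 1 and generated pairs toward 0.
    d_opt.zero_grad()
    d_real = discriminator(torch.cat([states, real_actions], dim=1))
    d_fake = discriminator(torch.cat([states, fake_actions.detach()], dim=1))
    d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    d_loss.backward()
    d_opt.step()

    # Generator: try to make the discriminator accept its shots as real.
    g_opt.zero_grad()
    g_score = discriminator(torch.cat([states, fake_actions], dim=1))
    g_loss = bce(g_score, torch.ones_like(g_score))
    g_loss.backward()
    g_opt.step()
```

At play time one could sample several noise vectors for the current state and keep the generated shot that the simulator (or a short MAES refinement) scores highest; the green dots in the image above are such samples.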