Documents/Training-SL.md
Lines changed: 29 additions & 1 deletion
@@ -47,7 +47,7 @@ The example scene `UnityTensorflow/Examples/Pong/PongSL` shows how to use superv
47
47
### SupervisedLearningModel.cs
48
48
*`checkpointToLoad`: If you assign a model's saved checkpoint file to this field, it is loaded when the model is initialized, regardless of the trainer's loading. This can be useful when you are not using a trainer.
49
49
*`Network`: Assign a scriptable object that implements RLNetworkPPO.cs to this field.
50
-
*`optimizer`: The time of optimizer to use for this model when training. You can also set its parameters here.
50
+
*`optimizer`: The optimizer to use for this model when training. You can also set its parameters here.
51
51
52
52
### SupervisedLearningNetworkSimple
53
53
This is a simple implementation of SupervisedLearningNetwork that you can create and plug in as the neural network definition for any SupervisedLearningModel.
@@ -62,3 +62,31 @@ This is a simple implementation of SuperviseLearningNetowrk that you can create
62
62
-`minStd`: If the network outputs a variance for the action, the standard deviation will always be larger than this value.
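
One way to picture the `minStd` floor is the minimal Python sketch below. It is not code from this repository: the function name `effective_std` is hypothetical, and applying the floor as a simple clamp is an assumption, since the text only states that the standard deviation stays above `minStd`.

```python
import math


def effective_std(predicted_variance: float, min_std: float) -> float:
    """Return the standard deviation after applying a lower bound.

    Assumption: the floor is a plain clamp. The actual network may
    enforce it differently (for example by adding min_std).
    """
    return max(math.sqrt(predicted_variance), min_std)


# A very confident prediction is held at the floor.
small = effective_std(0.0001, 0.1)
# A wide prediction passes through unchanged.
large = effective_std(0.25, 0.1)
print(small, large)
```

A floor like this keeps the predicted action distribution from collapsing to a near-zero width during training.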
63
63
64
64
## Training using GAN
65
+
You can also use a [conditional GAN](https://arxiv.org/abs/1411.1784) model instead of a regular supervised learning model. A GAN might work better if the correct actions for the same observation do not follow a Gaussian distribution. However, GAN training is very unstable.
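
As a rough illustration of the non-Gaussian case, here is a toy sketch in plain Python. It is not code from this repository, and the data and the function name `best_mse_prediction` are made up. Suppose the same observation has two equally correct actions, -1 and +1. A model that outputs a single value trained with squared error drifts toward the mean, which is an action no demonstration ever took.

```python
import statistics

# Hypothetical demonstrations for ONE observation:
# the expert moves fully left (-1) or fully right (+1).
demo_actions = [-1.0, 1.0, -1.0, 1.0, 1.0, -1.0]


def best_mse_prediction(actions):
    """The single value that minimizes mean squared error is the sample mean."""
    return statistics.fmean(actions)


prediction = best_mse_prediction(demo_actions)
# Distance from the prediction to the nearest action the expert actually took.
closest_demo_distance = min(abs(a - prediction) for a in demo_actions)
print(prediction, closest_demo_distance)
```

A conditional generator fed with random noise can, in principle, output either mode instead of their average, which is the motivation for trying a GAN here.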
66
+
67
+
Note that the GAN network provided here does not currently support visual observations.
68
+
69
+
### Steps
70
+
Follow mostly the same steps as for regular [supervised learning](#overall-steps), but in step 3 create a GAN model, and in step 2-2 use `TrainerParamsGAN` instead of `TrainerParamsMimic`.
71
+
72
+
- Create a GAN model:
73
+
1. Attach a `GANModel.cs` to any GameObject.
74
+
2. Create a `GANNetworkDense` scriptable object in your project and assign it to the Network field in `GANModel.cs`.
75
+
3. Assign the created model to the `modelRef` field in `TrainerMimic.cs`.
76
+
77
+
### GANModel.cs
78
+
*`checkpointToLoad`: If you assign a model's saved checkpoint file to this field, it is loaded when the model is initialized, regardless of the trainer's loading. This can be useful when you are not using a trainer.
79
+
*`Network`: Assign a scriptable object that implements RLNetworkPPO.cs to this field.
80
+
*`generatorL2LossWeight`: L2 loss weight of the generator. Usually 0 is fine.
81
+
*`outputShape`: Output shape of GAN. For ML-Agent, you can keep it unmodified, and the trainer will set it for you.
82
+
*`inputNoiseShape`: Input noise shape of GAN. Usually it is the same as the output shape.
83
+
*`inputConditionShape`: The input observation shape. For ML-Agent, you can keep it unmodified, and the trainer will set it for you.
84
+
*`generatorOptimizer`: The optimizer this model uses to train the generator.
85
+
*`discriminatorOptimizer`: The optimizer this model uses to train the discriminator.
86
+
*`initializeOnAwake`: Whether to initialize the GAN model on awake based on the shapes defined above. For an ML-Agent environment, set this to false.
87
+
88
+
### TrainerParamsGAN
89
+
See [TrainerParamsMimic](#trainerparamsmimic) for other parameters not listed below.
90
+
*`discriminatorTrainCount`: How many times the discriminator will be trained each training step.
91
+
*`generatorTrainCount`: How many times the generator will be trained each training step.
92
+
*`usePrediction`: Whether to use the [prediction method](https://www.semanticscholar.org/paper/Stabilizing-Adversarial-Nets-With-Prediction-Yadav-Shah/ec25504486d8751e00e613ca6fa64b256e3581c8) to stabilize training.
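
The two train counts above can be pictured with the Python sketch below. It is a language-agnostic illustration, not the trainer's actual code: `GanSchedule` and `training_step` are hypothetical names, and running the discriminator updates before the generator updates is an assumption.

```python
from dataclasses import dataclass


@dataclass
class GanSchedule:
    # Fields mirror the parameters above; the class itself is hypothetical.
    discriminator_train_count: int = 1
    generator_train_count: int = 1


def training_step(schedule, train_discriminator, train_generator):
    """Run one training step with the configured number of updates.

    Assumption: discriminator updates run first, then generator updates.
    """
    for _ in range(schedule.discriminator_train_count):
        train_discriminator()
    for _ in range(schedule.generator_train_count):
        train_generator()


# Record the order of updates for a 2:1 discriminator-to-generator ratio.
calls = []
training_step(
    GanSchedule(discriminator_train_count=2, generator_train_count=1),
    lambda: calls.append("D"),
    lambda: calls.append("G"),
)
print(calls)
```

Raising one count relative to the other is a common lever when one side of the GAN overpowers the other during training.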