
Commit 874962a

Update Training-SL.md
1 parent baa31ca commit 874962a


1 file changed, +29 -1 lines changed


Documents/Training-SL.md

Lines changed: 29 additions & 1 deletion
@@ -47,7 +47,7 @@ The example scene `UnityTensorflow/Examples/Pong/PongSL` shows how to use superv
### SupervisedLearningModel.cs
* `checkpointToLoad`: If you assign a model's saved checkpoint file here, it will be loaded when the model is initialized, regardless of what the trainer loads. This is useful when you are not using a trainer.
* `Network`: Assign a scriptable object that implements `SupervisedLearningNetwork`, such as `SupervisedLearningNetworkSimple`.
* `optimizer`: The optimizer to use for this model when training. You can also set its parameters here.

### SupervisedLearningNetworkSimple
This is a simple implementation of `SupervisedLearningNetwork` that you can create and plug in as the neural network definition for any `SupervisedLearningModel`.
@@ -62,3 +62,31 @@ This is a simple implementation of SuperviseLearningNetowrk that you can create
- `minStd`: If the network outputs a variance for the action, the standard deviation will always be larger than this value.

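The `minStd` floor can be illustrated with a short sketch. It assumes the network outputs a variance for each action dimension and that the floor is applied with a plain `max`; the real network may enforce it differently, and the helper names here are hypothetical.

```python
import math
import random


def action_std(predicted_variance, min_std):
    """Turn a predicted variance into a standard deviation of at least min_std.

    Hypothetical helper: a simple max() floor is assumed.
    """
    std = math.sqrt(max(predicted_variance, 0.0))  # guard against small negative outputs
    return max(std, min_std)


def sample_action(mean, predicted_variance, min_std, rng):
    """Sample one action dimension from a Gaussian that uses the floored std."""
    return rng.gauss(mean, action_std(predicted_variance, min_std))


rng = random.Random(42)
print(action_std(4.0, 0.1))   # 2.0 (large variance, the floor has no effect)
print(action_std(1e-6, 0.1))  # 0.1 (tiny variance, std is raised to min_std)
print(sample_action(0.5, 1e-6, 0.1, rng))
```

A floor like this keeps sampled actions from collapsing onto the mean when the predicted variance becomes very small.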
## Training using GAN
You can also use a [conditional GAN](https://arxiv.org/abs/1411.1784) model instead of a regular supervised learning model. A GAN may work better when the correct actions for the same observation do not follow a Gaussian distribution, for example when two very different actions are both correct. However, GAN training can be very unstable.

Note that the current GAN network does not support visual observations.
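
The Gaussian limitation mentioned above can be shown with a small worked example. The demonstration data is made up: for one observation, the expert steers fully left half the time and fully right the other half. A single Gaussian fitted to these actions centers on the one action the expert never took.

```python
import statistics

# Hypothetical demonstrations for a single observation: the expert steered
# hard left (-1.0) half the time and hard right (+1.0) otherwise.
demonstrated_actions = [-1.0, 1.0, -1.0, 1.0, -1.0, 1.0]

# Maximum-likelihood Gaussian fit: sample mean and population standard deviation.
mu = statistics.mean(demonstrated_actions)
sigma = statistics.pstdev(demonstrated_actions)
fitted = statistics.NormalDist(mu, sigma)

print(mu, sigma)  # 0.0 1.0
# The fitted density peaks at 0.0, an action that never appears in the data.
print(fitted.pdf(0.0) > fitted.pdf(1.0))  # True
```

A conditional GAN makes no such distributional assumption, so in principle its generator can learn to output both -1.0 and +1.0 for this observation, depending on the input noise.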
### Steps
The steps are mostly the same as for regular [supervised learning](#overall-steps), except that step 3 creates a GAN model instead, and step 2-2 uses `TrainerParamsGAN` instead of `TrainerParamsMimic`.

- Create a GAN model:
    1. Attach `GANModel.cs` to any GameObject.
    2. Create a `GANNetworkDense` scriptable object in your project and assign it to the `Network` field of `GANModel.cs`.
    3. Assign the created model to the `modelRef` field of `TrainerMimic.cs`.
### GANModel.cs
* `checkpointToLoad`: If you assign a model's saved checkpoint file here, it will be loaded when the model is initialized, regardless of what the trainer loads. This is useful when you are not using a trainer.
* `Network`: Assign a scriptable object that defines the GAN network, such as `GANNetworkDense`.
* `generatorL2LossWeight`: Weight of the generator's L2 loss. Usually 0 is fine.
* `outputShape`: Output shape of the GAN. For ML-Agents, you can leave it unchanged; the trainer will set it for you.
* `inputNoiseShape`: Shape of the GAN's input noise. It is usually the same as the output shape.
* `inputConditionShape`: The input observation shape. For ML-Agents, you can leave it unchanged; the trainer will set it for you.
* `generatorOptimizer`: The optimizer used to train the generator.
* `discriminatorOptimizer`: The optimizer used to train the discriminator.
* `initializeOnAwake`: Whether to initialize the GAN model on `Awake` based on the shapes defined above. For ML-Agents environments, set this to false.
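
The roles of `inputNoiseShape`, `inputConditionShape`, `outputShape`, and `generatorL2LossWeight` can be seen in a toy conditional GAN forward pass. This is a plain-Python sketch with single dense layers, not the `GANNetworkDense` implementation. The sizes, the function names, and the form of the generator loss (a non-saturating adversarial term plus a weighted L2 term toward the demonstrated action) are assumptions made for illustration.

```python
import math
import random

# Hypothetical sizes standing in for the GANModel.cs fields (flat vectors only).
CONDITION_SIZE = 3  # inputConditionShape: the observation
NOISE_SIZE = 2      # inputNoiseShape: usually the same as outputShape
OUTPUT_SIZE = 2     # outputShape: the action


def make_dense(n_in, n_out, rng):
    """Random weights (one row per output unit) and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x):
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def generator(layer, noise, condition):
    """Map (noise, observation) to an action in [-1, 1]."""
    return [math.tanh(v) for v in dense(layer, noise + condition)]


def discriminator(layer, condition, action):
    """Probability that an (observation, action) pair comes from the demonstrations."""
    (logit,) = dense(layer, condition + action)
    return 1.0 / (1.0 + math.exp(-logit))


def generator_loss(d_fake, generated, demonstrated, l2_weight):
    """Adversarial term plus an optional L2 term, scaled like generatorL2LossWeight."""
    adversarial = -math.log(d_fake + 1e-8)  # non-saturating generator loss
    l2 = sum((g - t) ** 2 for g, t in zip(generated, demonstrated))
    return adversarial + l2_weight * l2


rng = random.Random(0)
gen_layer = make_dense(NOISE_SIZE + CONDITION_SIZE, OUTPUT_SIZE, rng)
disc_layer = make_dense(CONDITION_SIZE + OUTPUT_SIZE, 1, rng)

observation = [0.1, -0.2, 0.3]
noise = [rng.gauss(0.0, 1.0) for _ in range(NOISE_SIZE)]
action = generator(gen_layer, noise, observation)
score = discriminator(disc_layer, observation, action)

demonstrated_action = [1.0, 0.0]
print(len(action), 0.0 < score < 1.0)  # 2 True
print(generator_loss(score, action, demonstrated_action, 0.0))  # adversarial term only
print(generator_loss(score, action, demonstrated_action, 1.0))  # plus the L2 term
```

With a weight of 0 the generator is driven only by the discriminator; a positive weight additionally pulls generated actions toward the demonstrated ones.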
### TrainerParamsGAN
See [TrainerParamsMimic](#trainerparamsmimic) for the other parameters not listed below.
* `discriminatorTrainCount`: How many times the discriminator is trained in each training step.
* `generatorTrainCount`: How many times the generator is trained in each training step.
* `usePrediction`: Whether to use the [prediction method](https://www.semanticscholar.org/paper/Stabilizing-Adversarial-Nets-With-Prediction-Yadav-Shah/ec25504486d8751e00e613ca6fa64b256e3581c8) to stabilize training.

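The idea behind `usePrediction` can be tried on the classic toy game min over u, max over v of `u * v`, whose only equilibrium is (0, 0). The sketch follows the prediction step described in the linked paper: after the minimizing player u takes a gradient step, the maximizing player v is updated against the extrapolated point `u_next + (u_next - u_prev)` instead of `u_next`. This is not the library's trainer code; which GAN network plays which role there is not stated above, and the learning rate and step count are arbitrary.

```python
import math


def run(steps, lr, use_prediction):
    """Alternating gradient descent-ascent on L(u, v) = u * v.

    u minimizes L and v maximizes it; the unique equilibrium is (0, 0).
    Returns the final distance from the equilibrium.
    """
    u, v = 1.0, 1.0
    for _ in range(steps):
        u_prev = u
        u = u - lr * v  # descent on u: dL/du = v
        # Prediction step: extrapolate u one step ahead before v reacts to it.
        u_seen = u + (u - u_prev) if use_prediction else u
        v = v + lr * u_seen  # ascent on v: dL/dv = u
    return math.hypot(u, v)


plain = run(steps=2000, lr=0.1, use_prediction=False)
predicted = run(steps=2000, lr=0.1, use_prediction=True)
print(f"without prediction: {plain:.4f}")      # keeps circling, never approaches (0, 0)
print(f"with prediction:    {predicted:.6f}")  # spirals in toward (0, 0)
```

In this toy game the plain alternating updates orbit the equilibrium indefinitely, while the predicted updates contract toward it; damping this kind of oscillation is what the prediction method is meant to do.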