Update Training-SL.md

tcmxx · web-flow · commit 874962abee50 · 2018-08-10T15:30:36.000+03:00
diff --git a/Documents/Training-SL.md b/Documents/Training-SL.md
@@ -47,7 +47,7 @@ The example scene `UnityTensorflow/Examples/Pong/PongSL` shows how to use superv
 ### SupervisedLearningModel.cs
 * `checkpointToLoad`: If you assign a model's saved checkpoint file to it, this will be loaded when model is initialized, regardless of the trainer's loading. Might be used when you are not using a trainer.
 * `Network`: You need to assign this field with a scriptable object that implements RLNetworkPPO.cs. 
-* `optimizer`: The time of optimizer to use for this model when training. You can also set its parameters here.
+* `optimizer`: The optimizer to use for this model when training. You can also set its parameters here.
 
 ### SupervisedLearningNetworkSimple
 This is a simple implementation of SuperviseLearningNetowrk that you can create a plug it in as a neural network definition for any SupervisedLearningModel.
@@ -62,3 +62,31 @@ This is a simple implementation of SuperviseLearningNetowrk that you can create
 - `minStd`: If it does outputs a variance of the action, the standard deviation will always be larger than this value.
 
 ## Training using GAN
+You can also use a [conditional GAN](https://arxiv.org/abs/1411.1784) model instead of a regular supervised learning model. A GAN might work better if the correct actions for the same observation do not follow a Gaussian distribution. However, GAN training is very unstable.
+
+Note that the GAN network we made does not currently support visual observations.
+
+### Steps
+Follow mostly the same steps as for regular [supervised learning](#overall-steps), but change step 3 to create a GAN model, and change the `TrainerParamsMimic` in step 2-2 to `TrainerParamsGAN`.
+
+- Create a GAN model:
+    1. Attach a `GANModel.cs` to any GameObject.
+    2. Create a `GANNetworkDense` scriptable object in your project and assign it to the Network field in `GANModel.cs`.
+    3. Assign the created model to the `modelRef` field in `TrainerMimic.cs`.
+
+### GANModel.cs
+* `checkpointToLoad`: If you assign a model's saved checkpoint file to this field, it will be loaded when the model is initialized, regardless of what the trainer loads. This can be useful when you are not using a trainer.
+* `Network`: Assign a scriptable object that defines the GAN network to this field, such as a `GANNetworkDense`.
+* `generatorL2LossWeight`: L2 loss weight of the generator. Usually 0 is fine.
+* `outputShape`: Output shape of the GAN. For ML-Agents, you can leave it unmodified, and the trainer will set it for you.
+* `inputNoiseShape`: Input noise shape of the GAN. Usually it is the same as the output shape.
+* `inputConditionShape`: The input observation shape. For ML-Agents, you can leave it unmodified, and the trainer will set it for you.
+* `generatorOptimizer`: The optimizer used to train the generator.
+* `discriminatorOptimizer`: The optimizer used to train the discriminator.
+* `initializeOnAwake`: Whether to initialize the GAN model on awake based on the shapes defined above. For an ML-Agents environment, set this to false.
+
+### TrainerParamsGAN
+See [TrainerParamsMimic](#trainerparamsmimic) for other parameters not listed below.
+* `discriminatorTrainCount`: How many times the discriminator will be trained each training step.
+* `generatorTrainCount`: How many times the generator will be trained each training step.
+* `usePrediction`: Whether to use the [prediction method](https://www.semanticscholar.org/paper/Stabilizing-Adversarial-Nets-With-Prediction-Yadav-Shah/ec25504486d8751e00e613ca6fa64b256e3581c8) to stabilize training.
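
Note on the `TrainerParamsGAN` counts documented in this diff: one way to read `discriminatorTrainCount` and `generatorTrainCount` is as loop counts inside a single training step. The Python sketch below only illustrates that reading. The function name `run_training_step`, the callback parameters, and the discriminator-before-generator ordering are assumptions made for illustration and are not taken from the project's C# code.

```python
from typing import Callable, List


def run_training_step(
    train_discriminator: Callable[[], None],
    train_generator: Callable[[], None],
    discriminator_train_count: int,
    generator_train_count: int,
) -> None:
    """Run one training step of an alternating GAN schedule.

    Hypothetical helper: the discriminator is updated
    `discriminator_train_count` times, then the generator is updated
    `generator_train_count` times. The ordering is an assumption.
    """
    for _ in range(discriminator_train_count):
        train_discriminator()
    for _ in range(generator_train_count):
        train_generator()


# Usage: record which network would be updated, in order.
calls: List[str] = []
run_training_step(
    train_discriminator=lambda: calls.append("D"),
    train_generator=lambda: calls.append("G"),
    discriminator_train_count=2,
    generator_train_count=1,
)
print(calls)  # ['D', 'D', 'G']
```

Raising the discriminator count relative to the generator count gives the discriminator more updates per step. Whether that helps stability depends on the task and is not something this diff states.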