
Commit e5b78fc

Update Training-PPO.md
1 parent 165c1e4 commit e5b78fc


Documents/Training-PPO.md

Lines changed: 8 additions & 0 deletions
@@ -21,6 +21,8 @@ The example [Getting Started with the 3D Balance Ball Environment](Getting-Start
 4. Play and see how it works.
 
 ## Explanation of fields in the inspector
+We use similar parameters to those in Unity ML-Agents. If something is confusing, see their [document](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-PPO.md) for more details.
+
 ### TrainerPPO.cs
 * `isTraining`: Toggle this to switch between training and inference mode. Note that if `isTraining` is false when the game starts, the training part of the PPO model will not be initialized and you won't be able to train it in this run. Also,
 * `parameters`: You need to assign this field with a TrainerParamsPPO scriptable object.
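The two TrainerPPO.cs fields above are normally set in the Unity inspector. As a minimal sketch only, assuming `TrainerPPO` exposes `isTraining` and `parameters` as public fields (the wiring and the `TrainerSetupExample` name below are hypothetical, not this project's documented API), they could also be assigned from another script:

```csharp
using UnityEngine;

// Hypothetical sketch: assumes TrainerPPO exposes the public fields `isTraining`
// and `parameters` described above, and that a TrainerParamsPPO asset has been
// created in the project and assigned in the inspector.
public class TrainerSetupExample : MonoBehaviour
{
    public TrainerPPO trainer;               // the trainer component in the scene
    public TrainerParamsPPO trainingParams;  // the TrainerParamsPPO scriptable object asset

    void Awake()
    {
        // isTraining must already be true when the game starts; otherwise the
        // training part of the PPO model is never initialized for this run.
        trainer.isTraining = true;
        trainer.parameters = trainingParams;
    }
}
```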
@@ -59,6 +61,12 @@ This is a simple implementation of RLNetworkAC that you can create and plug it in
 - `actorOutputLayerBias`/`criticOutputLayerBias`/`visualEncoderBias`: Whether to use bias.
 
 ## Training with Heuristics
+If you already know a policy that is better than a random policy, you can give it to PPO as a hint to speed up the training a bit.
+
+1. Implement the [AgentDependentDecision - needs link](dfsdf) for your policy and attach it to the agents that you want to occasionally use this policy.
+2. In your trainer parameters, set `useHeuristicChance` to a value larger than 0.
+3. Use [TrainerParamOverride - needs link](asdfs) to decrease `useHeuristicChance` over time during the training.
+
 
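To illustrate step 1 of the heuristics workflow above: the sketch below is hypothetical, since the exact `AgentDependentDecision` base class and its method signature are not shown here; `MoveTowardTargetDecision`, `Decide`, and `target` are made-up names for illustration. The point is simply to encode a policy that beats random actions, which the trainer can then pick with probability `useHeuristicChance`.

```csharp
using UnityEngine;

// Hypothetical sketch of a heuristic decision; the real AgentDependentDecision
// base class in this project may define a different method to override.
public class MoveTowardTargetDecision : MonoBehaviour  // stand-in for AgentDependentDecision
{
    public Transform target;  // object the agent should move toward

    // Assumed hook, called when the trainer chooses the heuristic for this agent.
    public float[] Decide(Transform agentTransform)
    {
        // Simple heuristic: steer toward the target on the x/z plane,
        // which is already better than a uniformly random action.
        Vector3 dir = (target.position - agentTransform.position).normalized;
        return new float[] { dir.x, dir.z };
    }
}
```

A TrainerParamOverride (step 3) would then anneal `useHeuristicChance` toward 0 over the course of training, as the learned policy starts to outperform the heuristic.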