
Commit e5b78fc

Update Training-PPO.md
1 parent 165c1e4 commit e5b78fc


Documents/Training-PPO.md

Lines changed: 8 additions & 0 deletions
@@ -21,6 +21,8 @@ The example [Getting Started with the 3D Balance Ball Environment](Getting-Start
 4. Play and see how it works.
 
 ## Explanation of fields in the inspector
+We use similar parameters to those in Unity ML-Agents. If something is confusing, see their [document](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-PPO.md) for more details.
+
 ### TrainerPPO.cs
 * `isTraining`: Toggle this to switch between training and inference mode. Note that if `isTraining` is false when the game starts, the training part of the PPO model will not be initialized and you won't be able to train it in this run. Also,
 * `parameters`: You need to assign this field with a TrainerParamsPPO scriptable object.
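The two TrainerPPO.cs fields above are normally set in the Unity inspector. As a minimal sketch only, assuming `TrainerPPO` exposes `isTraining` and `parameters` as public fields (the wiring and the `TrainerSetupExample` name below are hypothetical, not this project's documented API), they could also be assigned from another script:

```csharp
using UnityEngine;

// Hypothetical sketch: assumes TrainerPPO exposes the public fields `isTraining`
// and `parameters` described above, and that a TrainerParamsPPO asset has been
// created in the project and assigned in the inspector.
public class TrainerSetupExample : MonoBehaviour
{
    public TrainerPPO trainer;               // the trainer component in the scene
    public TrainerParamsPPO trainingParams;  // the TrainerParamsPPO scriptable object asset

    void Awake()
    {
        // isTraining must already be true when the game starts; otherwise the
        // training part of the PPO model is never initialized for this run.
        trainer.isTraining = true;
        trainer.parameters = trainingParams;
    }
}
```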
@@ -59,6 +61,12 @@ This is a simple implementation of RLNetworkAC that you can create and plug it in
 - `actorOutputLayerBias`/`criticOutputLayerBias`/`visualEncoderBias`: Whether to use bias.
 
 ## Training with Heuristics
+If you already know a policy that is better than a random policy, you can give it to PPO as a hint to speed up the training a bit.
+
+1. Implement the [AgentDependentDecision - needs link](dfsdf) for your policy and attach it to the agents that you want to occasionally use this policy.
+2. In your trainer parameters, set `useHeuristicChance` to a value larger than 0.
+3. Use [TrainerParamOverride - needs link](asdfs) to decrease `useHeuristicChance` over time during the training.
+
 
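To illustrate step 1 of the heuristics workflow above: the sketch below is hypothetical, since the exact `AgentDependentDecision` base class and its method signature are not shown here; `MoveTowardTargetDecision`, `Decide`, and `target` are made-up names for illustration. The point is simply to encode a policy that beats random actions, which the trainer can then pick with probability `useHeuristicChance`.

```csharp
using UnityEngine;

// Hypothetical sketch of a heuristic decision; the real AgentDependentDecision
// base class in this project may define a different method to override.
public class MoveTowardTargetDecision : MonoBehaviour  // stand-in for AgentDependentDecision
{
    public Transform target;  // object the agent should move toward

    // Assumed hook, called when the trainer chooses the heuristic for this agent.
    public float[] Decide(Transform agentTransform)
    {
        // Simple heuristic: steer toward the target on the x/z plane,
        // which is already better than a uniformly random action.
        Vector3 dir = (target.position - agentTransform.position).normalized;
        return new float[] { dir.x, dir.z };
    }
}
```

A TrainerParamOverride (step 3) would then anneal `useHeuristicChance` toward 0 over the course of training, as the learned policy starts to outperform the heuristic.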