Documents/Training-PPO.md
## Training with Heuristics
If you already know a policy that is better than a random policy, you can give it to PPO as a hint to speed up training.
1. Implement the [AgentDependentDeicision](AgentDependentDeicision.md) for your policy and attach it to the agents that should occasionally use this policy.
2. In your trainer parameters, set `useHeuristicChance` to a value larger than 0.
3. Use [TrainerParamOverride](TrainerParamOverride.md) to decrease `useHeuristicChance` over time during training.
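The steps above can be sketched in pseudocode. The actual mechanism lives in the `AgentDependentDeicision` and `TrainerParamOverride` components; this Python sketch only illustrates the idea, and the function names (`choose_action`, `decayed_chance`) are hypothetical, not part of the library's API:

```python
import random

def choose_action(policy_action_fn, heuristic_action_fn, obs, use_heuristic_chance):
    # With probability use_heuristic_chance, act with the known heuristic
    # policy instead of sampling from the learned PPO policy.
    if random.random() < use_heuristic_chance:
        return heuristic_action_fn(obs)
    return policy_action_fn(obs)

def decayed_chance(initial_chance, step, decay_steps):
    # Linearly anneal the heuristic chance toward 0 over decay_steps,
    # mirroring what a parameter override schedule would do (step 3).
    return max(0.0, initial_chance * (1.0 - step / decay_steps))
```

Decaying the chance matters because the heuristic actions are still recorded and trained on like any other experience; keeping the chance high forever would bias the learned policy toward the heuristic instead of letting PPO improve past it.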