You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: Documents/Training-PPO.md
+1Lines changed: 1 addition & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -67,6 +67,7 @@ If you already know some policy that is better than random policy, you might giv
67
67
2. In your trainer parameters, set `useHeuristicChance` to larger than 0.
68
68
3. Use [TrainerParamOverride](TrainerParamOverride.md) to decrease the `useHeuristicChance` over time during the training.
69
69
70
+
Note that your AgentDependentDeicision is only used in training mode. The chance of using it in each step for agent with the script attached depends on `useHeuristicChance`.
0 commit comments