Update Training-PPO.md

tcmxx · web-flow · commit a8ab515d9773 · 2018-08-10T12:44:13.000+03:00
diff --git a/Documents/Training-PPO.md b/Documents/Training-PPO.md
@@ -67,6 +67,7 @@ If you already know some policy that is better than random policy, you might giv
 2. In your trainer parameters, set `useHeuristicChance` to larger than 0.
 3. Use [TrainerParamOverride](TrainerParamOverride.md) to decrease the `useHeuristicChance` over time during the training.
 
+Note that your `AgentDependentDecision` is only used in training mode. At each step, the chance that an agent with this script attached uses it is determined by `useHeuristicChance`.
 
 
 
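A minimal sketch of the per-step choice described in the added note. It assumes that, in training mode, an agent with the heuristic script attached draws a uniform random number each step and uses the heuristic when the draw is below `useHeuristicChance`, and otherwise uses the learned policy. Outside training the heuristic is never consulted. The names `choose_action`, `policy`, and `heuristic` are hypothetical illustrations, not the project's API.

```python
import random
from typing import Callable, Sequence

Action = Sequence[float]


def choose_action(
    observation: Sequence[float],
    policy: Callable[[Sequence[float]], Action],
    heuristic: Callable[[Sequence[float]], Action],
    use_heuristic_chance: float,
    training: bool,
    rng: random.Random,
) -> Action:
    """Pick the action for one agent step (hypothetical helper).

    Assumption: the heuristic is only consulted in training mode, and there
    it is used with probability use_heuristic_chance; otherwise the policy acts.
    """
    if training and rng.random() < use_heuristic_chance:
        return heuristic(observation)
    return policy(observation)


if __name__ == "__main__":
    rng = random.Random(0)

    def policy(obs):
        return [0.0]  # stand-in for the learned policy's action

    def heuristic(obs):
        return [1.0]  # stand-in for the scripted decision's action

    steps = 10_000
    used = sum(
        choose_action([0.0], policy, heuristic, 0.3, True, rng) == [1.0]
        for _ in range(steps)
    )
    print(f"heuristic used on {used / steps:.1%} of training steps")

    # Outside training the heuristic is never used, whatever the chance.
    assert choose_action([0.0], policy, heuristic, 1.0, False, rng) == [0.0]
```

Lowering `useHeuristicChance` over the course of training (for example with `TrainerParamOverride`, as the list above suggests) shifts more steps from the heuristic to the policy being learned.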
