Commit cdfe41e

created fallback implementation for legal_action_space_mask (#644)

* created fallback implementation for legal_action_space_mask
* fixed doc typo
1 parent: 5106305

2 files changed (+8 -3 lines)


src/ReinforcementLearningBase/src/interface.jl

Lines changed: 7 additions & 2 deletions
@@ -450,9 +450,14 @@ legal_action_space(::MinimalActionSet, env, player) = action_space(env)
 """
 legal_action_space_mask(env, player=current_player(env)) -> AbstractArray{Bool}
 
-Required for environments of [`FULL_ACTION_SET`](@ref).
+Required for environments of [`FULL_ACTION_SET`](@ref). As a default implementation,
+[`legal_action_space_mask`](@ref) creates a mask of [`action_space`](@ref) with
+the subset [`legal_action_space`](@ref).
 """
-@multi_agent_env_api legal_action_space_mask(env::AbstractEnv, player = current_player(env))
+@multi_agent_env_api legal_action_space_mask(env::AbstractEnv, player = current_player(env)) =
+    map(action_space(env, player)) do action
+        action in legal_action_space(env, player)
+    end
 
 """
 state(env, style=[DefaultStateStyle(env)], player=[current_player(env)])
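As a rough illustration of what the new fallback in interface.jl computes, here is a self-contained Julia sketch. It does not load ReinforcementLearningBase: `ToyEnv`, `toy_action_space`, `toy_legal_action_space` and `toy_legal_action_space_mask` are hypothetical stand-ins for the real interface functions. The sketch assumes the action space is an ordered collection (here `Base.OneTo(n)`), so that position `i` of the mask corresponds to action `i`.

```julia
# Hypothetical toy environment; NOT part of ReinforcementLearningBase.
struct ToyEnv
    n_actions::Int
    legal::Vector{Int}
end

# Hypothetical stand-ins for `action_space` and `legal_action_space`.
toy_action_space(env::ToyEnv) = Base.OneTo(env.n_actions)
toy_legal_action_space(env::ToyEnv) = env.legal

# Same pattern as the fallback in the commit: walk the full action space
# and mark each action `true` if it belongs to the legal subset.
toy_legal_action_space_mask(env::ToyEnv) =
    map(toy_action_space(env)) do action
        action in toy_legal_action_space(env)
    end

env = ToyEnv(5, [2, 4])
mask = toy_legal_action_space_mask(env)  # Vector{Bool} of length 5
```

Like the committed version, this calls the legal-action function once per action and uses `in`, which is a linear scan on a `Vector`. Hoisting that call out of the closure, and possibly collecting the legal actions into a `Set`, would reduce the cost for large action spaces; whether that matters depends on the environment.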

src/ReinforcementLearningCore/src/policies/q_based_policies/explorers/epsilon_greedy_explorer.jl

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ Two kinds of epsilon-decreasing strategy are implemented here (`linear` and `exp
 - `ϵ_stable::Float64`: the epsilon after `warmup_steps + decay_steps`.
 - `is_break_tie=false`: randomly select an action of the same maximum values if set to `true`.
 - `rng=Random.GLOBAL_RNG`: set the internal RNG.
-- `is_training=true`, in training mode, `step` will not be updated. And the `ϵ` will be set to 0.
+- `is_training=true`: when not in training mode, `step` will not be updated. And the `ϵ` will be set to 0.
 
 # Example
 
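The corrected docstring says that outside training mode the explorer neither advances `step` nor explores (`ϵ` is treated as 0). A minimal Julia sketch of that behaviour follows. `ToyEpsilonGreedy`, `current_ϵ` and `select_action` are hypothetical names, not the library's explorer API; the sketch uses a fixed `ϵ` instead of the `linear`/`exp` decay schedules, and it assumes that `is_break_tie=false` means deterministically taking the first maximum.

```julia
using Random

# Hypothetical minimal explorer; NOT the library's epsilon-greedy explorer.
mutable struct ToyEpsilonGreedy
    ϵ::Float64
    step::Int
    is_training::Bool
    rng::AbstractRNG
end

# Effective epsilon: 0 outside training mode, as the docstring describes.
current_ϵ(p::ToyEpsilonGreedy) = p.is_training ? p.ϵ : 0.0

function select_action(p::ToyEpsilonGreedy, q_values::AbstractVector, mask::AbstractVector{Bool})
    legal = findall(mask)
    isempty(legal) && error("no legal actions")
    # Only advance the step counter while training.
    p.is_training && (p.step += 1)
    if rand(p.rng) < current_ϵ(p)
        # Explore: uniform choice among legal actions.
        return rand(p.rng, legal)
    else
        # Exploit: greedy among legal actions; argmax returns the first maximum.
        return legal[argmax(q_values[legal])]
    end
end

explorer = ToyEpsilonGreedy(0.5, 0, false, MersenneTwister(1))
q_values = [0.1, 0.9, 0.3, 0.7]
mask = [true, false, true, true]
a = select_action(explorer, q_values, mask)  # greedy over actions 1, 3, 4
```

Action 2 has the highest value but is masked out, so in evaluation mode the sketch always returns the best legal action and leaves `step` untouched.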