Learner Get Variables Calls stuck at 17 (16 actors + 1 evaluator) despite millions of learner steps #30

@xiaoweibit

Description

After initialization, Learner Get Variables Calls = 17 (16 actors + 1 evaluator), which is expected.
However, the counter stays at 17 even after more than 4.7 million learner steps and about 79 million actor steps.
Fetching variables from the remote learner succeeds only when the call is made synchronously in the main thread; it fails when the call is submitted through a thread pool.
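To make the two code paths concrete, here is a minimal sketch of a client that polls the learner every `update_period` calls, either synchronously or through a single-worker thread pool. It is not the framework's actual client: `CountingSource`, `PeriodicClient`, and the `errors` list are hypothetical names introduced for illustration. The point it tries to show is that an exception raised inside the pool thread stays hidden unless the future is inspected, which is one possible way the counter could stay flat without any visible error.

```python
from concurrent import futures


class CountingSource:
    """Hypothetical stand-in for the remote learner handle."""

    def __init__(self, fail=False):
        self.calls = 0
        self._fail = fail

    def get_variables(self, names):
        if self._fail:
            # Simulates a remote call that breaks before reaching the learner.
            raise RuntimeError("remote get_variables failed")
        self.calls += 1
        return {name: self.calls for name in names}


class PeriodicClient:
    """Requests variables once every `update_period` calls to update()."""

    def __init__(self, source, update_period):
        self._source = source
        self._update_period = update_period
        self._calls_since_request = 0
        self._executor = futures.ThreadPoolExecutor(max_workers=1)
        self._future = None
        self.params = None
        self.errors = []  # exceptions surfaced from the pool thread

    def _request(self):
        return self._source.get_variables(["policy"])

    def update(self, wait=False):
        self._calls_since_request += 1
        if self._calls_since_request < self._update_period:
            return
        if wait:
            # Synchronous path: the request runs in the calling thread,
            # so any exception propagates immediately.
            self.params = self._request()
            self._calls_since_request = 0
            return
        # Asynchronous path: submit once, then poll on later calls.
        if self._future is None:
            self._future = self._executor.submit(self._request)
        if self._future.done():
            error = self._future.exception()
            if error is not None:
                self.errors.append(error)  # would otherwise go unnoticed
            else:
                self.params = self._future.result()
            self._future = None
            self._calls_since_request = 0

    def close(self):
        self._executor.shutdown(wait=True)
```

With a healthy source and `update(wait=True)`, 3000 calls at `update_period=1000` produce exactly 3 requests. With a failing source on the asynchronous path, `source.calls` stays at 0 and the only evidence is whatever ends up in `errors`. A call that hangs instead of raising would not show up here either; checking `future.done()` against a timeout would be needed for that case.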

[config]
actor_update_period=1000,

[Partial logs]
(Learner pid=1611473) [Learner] Actor Episodes = 22630 | Actor Steps = 4390187 | Critic Loss = 1.492 | Dual Alpha Mean = 3.0377115933788446e-08 | Dual Alpha Stddev = 1190.93408203125 | Dual Temperature = 0.0849275290966034 | Evaluator Episodes = 3620 | Evaluator Steps = 702280 | Kl Mean Rel = [0.25477177 0.03609788 0.2433558 0.17765276 0.24410644 0.12566373 0.20047836 0.215948 0.12209918 0.07083611 0.08791797 0.07737362] | Kl Q Rel = 1.0302705764770508 | Kl Stddev Rel = [2.606567 6.293708 3.1946325 5.158073 2.864661 7.876417 5.026118 3.5546534 4.3580575 7.471537 5.010251 7.091734] | Learner Get Variables Calls = 17 | Learner Steps = 261956 | Learner Walltime = 4200.173 | Loss Alpha = -0.005777047015726566 | Loss Policy = 59.64836883544922 | Loss Temperature = 14.739921569824219 | Penalty Kl Q Rel = 0.9972324967384338 | Pi Mean Abs Mean = 0.7514693737030029 | Pi Mean Stddev = 0.8732477426528931 | Pi Stddev Abs Mean = 0.5004111528396606 | Pi Stddev Cond = 1.738184928894043 | Pi Stddev Max = 1.0863072872161865 | Pi Stddev Min = 0.6287840604782104 | Pi Stddev Stddev = 0.1172175258398056 | Policy Loss = 74.390 | Q Max = 18.50464630126953 | Q Min = 18.358409881591797
(EnvironmentLoop pid=1611738) [Actor] Actor Episodes = 21143 | Actor Steps = 4101710 | Episode Length = 194 | Episode Return = 77.00794219970703 | Evaluator Episodes = 3379 | Evaluator Steps = 655526 | Learner Get Variables Calls = 17 | Learner Steps = 244631 | Learner Walltime = 3919.179 | Steps Per Second = 60.973
(EnvironmentLoop pid=1612739) [Evaluator] Actor Episodes = 22658 | Actor Steps = 4395619 | Episode Length = 194 | Episode Return = 80.77677917480469 | Evaluator Episodes = 3625 | Evaluator Steps = 703250 | Learner Get Variables Calls = 17 | Learner Steps = 262221 | Learner Walltime = 4204.792 | Steps Per Second = 144.578
(EnvironmentLoop pid=1612558) [Actor] Actor Episodes = 22667 | Actor Steps = 4397365 | Episode Length = 194 | Episode Return = 59.86222457885742 | Evaluator Episodes = 3626 | Evaluator Steps = 703444 | Learner Get Variables Calls = 17 | Learner Steps = 262301 | Learner Walltime = 4205.798 | Steps Per Second = 58.976
(EnvironmentLoop pid=1612618) [Actor] Actor Episodes = 22696 | Actor Steps = 4402991 | Episode Length = 194 | Episode Return = 67.46408081054688 | Evaluator Episodes = 3630 | Evaluator Steps = 704220 | Learner Get Variables Calls = 17 | Learner Steps = 262660 | Learner Walltime = 4211.852 | Steps Per Second = 61.372 [repeated 4x across cluster]
(EnvironmentLoop pid=1612167) [Actor] Actor Episodes = 22723 | Actor Steps = 4408229 | Episode Length = 194 | Episode Return = 68.89933013916016 | Evaluator Episodes = 3635 | Evaluator Steps = 705190 | Learner Get Variables Calls = 17 | Learner Steps = 262973 | Learner Walltime = 4216.909 | Steps Per Second = 67.217 [repeated 7x across cluster]
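
For scale, a rough estimate of what the counter should read at the first learner log line, given `actor_update_period=1000`. This is a back-of-the-envelope sketch under stated assumptions: each actor triggers one update check per environment step, the actor steps are spread evenly over the 16 actors, and the evaluator's periodic fetches are ignored. The function name is hypothetical.

```python
def expected_get_variables_calls(total_actor_steps, num_actors,
                                 update_period, initial_calls):
    """Estimates the counter value, assuming one update check per actor step."""
    steps_per_actor = total_actor_steps // num_actors
    periodic_calls = num_actors * (steps_per_actor // update_period)
    return initial_calls + periodic_calls


# Values taken from the config and the first [Learner] log line.
estimate = expected_get_variables_calls(
    total_actor_steps=4_390_187,
    num_actors=16,
    update_period=1000,
    initial_calls=17,  # 16 actors + 1 evaluator at initialization
)
print(estimate)  # 4401
```

Under these assumptions the counter would be in the thousands at that point, not 17, so none of the periodic requests appear to be reaching the learner.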
