Description
After initialization, Learner Get Variables Calls = 17 (16 actors + 1 evaluator), which is expected.
However, the counter stays stuck at 17 even after >4.7 million learner steps and ~79 million actor steps, which suggests that the actors and the evaluator never fetch updated variables from the learner after their initial request.
Remote learner variable synchronization succeeds only when it is called synchronously in the main thread; it fails when the call is issued through a thread pool.
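One way to narrow this down is to run the same fetch through a worker thread and make its outcome visible, since an exception or a hang inside a thread pool stays silent until someone inspects the future. The sketch below is a generic diagnostic that uses only the standard library; `fetch_variables` is a hypothetical stand-in for whatever callable performs the remote get-variables request, not a function from the framework.

```python
import concurrent.futures


def fetch_with_diagnostics(fetch_variables, timeout_s=5.0):
    """Run `fetch_variables` in a worker thread and report how it ended.

    `fetch_variables` is a hypothetical zero-argument callable standing in
    for the remote variable request. Returns a (status, payload) pair where
    status is "ok", "timeout", or "error".
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fetch_variables)
    try:
        # result() re-raises any exception that occurred in the worker thread.
        return "ok", future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The worker did not finish in time (for example, a blocked remote call).
        return "timeout", None
    except Exception as exc:
        # The worker raised; without calling result() this would go unnoticed.
        return "error", exc
    finally:
        # Do not block the caller on a worker that may still be stuck.
        executor.shutdown(wait=False)
```

A "timeout" status would point to the remote call blocking when made off the main thread, while an "error" status would expose an exception that the thread pool had been swallowing.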
[config]
actor_update_period=1000,
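For scale, here is a rough estimate of how many calls the counter should show if periodic updates were working. It assumes that every actor step triggers one variable-client update and that a fetch is issued once per `actor_update_period` updates; both are assumptions about the setup, and the helper name is hypothetical.

```python
def expected_get_variables_calls(actor_steps, update_period, initial_calls):
    """Rough expected value of the get-variables counter.

    Assumes one client update per actor step and one fetch every
    `update_period` updates, plus the initial fetch made by each worker.
    Per-actor rounding is ignored, so this is only an order-of-magnitude check.
    """
    return initial_calls + actor_steps // update_period


# Values taken from the Learner line in the partial logs of this report.
print(expected_get_variables_calls(4390187, 1000, 17))
```

Under these assumptions the counter should be in the thousands at that point in the run, not 17.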
[Partial logs]
(Learner pid=1611473) [Learner] Actor Episodes = 22630 | Actor Steps = 4390187 | Critic Loss = 1.492 | Dual Alpha Mean = 3.0377115933788446e-08 | Dual Alpha Stddev = 1190.93408203125 | Dual Temperature = 0.0849275290966034 | Evaluator Episodes = 3620 | Evaluator Steps = 702280 | Kl Mean Rel = [0.25477177 0.03609788 0.2433558 0.17765276 0.24410644 0.12566373
(Learner pid=1611473) 0.20047836 0.215948 0.12209918 0.07083611 0.08791797 0.07737362] | Kl Q Rel = 1.0302705764770508 | Kl Stddev Rel = [2.606567 6.293708 3.1946325 5.158073 2.864661 7.876417 5.026118
(Learner pid=1611473) 3.5546534 4.3580575 7.471537 5.010251 7.091734 ] | Learner Get Variables Calls = 17 | Learner Steps = 261956 | Learner Walltime = 4200.173 | Loss Alpha = -0.005777047015726566 | Loss Policy = 59.64836883544922 | Loss Temperature = 14.739921569824219 | Penalty Kl Q Rel = 0.9972324967384338 | Pi Mean Abs Mean = 0.7514693737030029 | Pi Mean Stddev = 0.8732477426528931 | Pi Stddev Abs Mean = 0.5004111528396606 | Pi Stddev Cond = 1.738184928894043 | Pi Stddev Max = 1.0863072872161865 | Pi Stddev Min = 0.6287840604782104 | Pi Stddev Stddev = 0.1172175258398056 | Policy Loss = 74.390 | Q Max = 18.50464630126953 | Q Min = 18.358409881591797
(EnvironmentLoop pid=1611738) [Actor] Actor Episodes = 21143 | Actor Steps = 4101710 | Episode Length = 194 | Episode Return = 77.00794219970703 | Evaluator Episodes = 3379 | Evaluator Steps = 655526 | Learner Get Variables Calls = 17 | Learner Steps = 244631 | Learner Walltime = 3919.179 | Steps Per Second = 60.973
(EnvironmentLoop pid=1612739) [Evaluator] Actor Episodes = 22658 | Actor Steps = 4395619 | Episode Length = 194 | Episode Return = 80.77677917480469 | Evaluator Episodes = 3625 | Evaluator Steps = 703250 | Learner Get Variables Calls = 17 | Learner Steps = 262221 | Learner Walltime = 4204.792 | Steps Per Second = 144.578
(EnvironmentLoop pid=1612558) [Actor] Actor Episodes = 22667 | Actor Steps = 4397365 | Episode Length = 194 | Episode Return = 59.86222457885742 | Evaluator Episodes = 3626 | Evaluator Steps = 703444 | Learner Get Variables Calls = 17 | Learner Steps = 262301 | Learner Walltime = 4205.798 | Steps Per Second = 58.976
(EnvironmentLoop pid=1612618) [Actor] Actor Episodes = 22696 | Actor Steps = 4402991 | Episode Length = 194 | Episode Return = 67.46408081054688 | Evaluator Episodes = 3630 | Evaluator Steps = 704220 | Learner Get Variables Calls = 17 | Learner Steps = 262660 | Learner Walltime = 4211.852 | Steps Per Second = 61.372 [repeated 4x across cluster]
(EnvironmentLoop pid=1612167) [Actor] Actor Episodes = 22723 | Actor Steps = 4408229 | Episode Length = 194 | Episode Return = 68.89933013916016 | Evaluator Episodes = 3635 | Evaluator Steps = 705190 | Learner Get Variables Calls = 17 | Learner Steps = 262973 | Learner Walltime = 4216.909 | Steps Per Second = 67.217 [repeated 7x across cluster]