[2024-08-14 01:18:01,575][01002] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-08-14 01:18:01,579][01002] Rollout worker 0 uses device cpu
[2024-08-14 01:18:01,580][01002] Rollout worker 1 uses device cpu
[2024-08-14 01:18:01,584][01002] Rollout worker 2 uses device cpu
[2024-08-14 01:18:01,585][01002] Rollout worker 3 uses device cpu
[2024-08-14 01:18:01,587][01002] Rollout worker 4 uses device cpu
[2024-08-14 01:18:01,589][01002] Rollout worker 5 uses device cpu
[2024-08-14 01:18:01,590][01002] Rollout worker 6 uses device cpu
[2024-08-14 01:18:01,591][01002] Rollout worker 7 uses device cpu
[2024-08-14 01:18:01,755][01002] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-14 01:18:01,756][01002] InferenceWorker_p0-w0: min num requests: 2
[2024-08-14 01:18:01,797][01002] Starting all processes...
[2024-08-14 01:18:01,799][01002] Starting process learner_proc0
[2024-08-14 01:18:01,847][01002] Starting all processes...
[2024-08-14 01:18:01,859][01002] Starting process inference_proc0-0
[2024-08-14 01:18:01,862][01002] Starting process rollout_proc0
[2024-08-14 01:18:01,862][01002] Starting process rollout_proc1
[2024-08-14 01:18:01,862][01002] Starting process rollout_proc2
[2024-08-14 01:18:01,862][01002] Starting process rollout_proc3
[2024-08-14 01:18:01,862][01002] Starting process rollout_proc4
[2024-08-14 01:18:01,862][01002] Starting process rollout_proc5
[2024-08-14 01:18:01,862][01002] Starting process rollout_proc6
[2024-08-14 01:18:01,862][01002] Starting process rollout_proc7
[2024-08-14 01:18:11,068][04339] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-14 01:18:11,068][04339] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-08-14 01:18:11,208][04339] Num visible devices: 1
[2024-08-14 01:18:11,277][04339] Starting seed is not provided
[2024-08-14 01:18:11,278][04339] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-14 01:18:11,279][04339] Initializing actor-critic model on device cuda:0
[2024-08-14 01:18:11,280][04339] RunningMeanStd input shape: (3, 72, 128)
[2024-08-14 01:18:11,282][04339] RunningMeanStd input shape: (1,)
[2024-08-14 01:18:11,465][04339] ConvEncoder: input_channels=3
[2024-08-14 01:18:13,274][04339] Conv encoder output size: 512
[2024-08-14 01:18:13,293][04339] Policy head output size: 512
[2024-08-14 01:18:13,520][04339] Created Actor Critic model with architecture:
[2024-08-14 01:18:13,534][04339] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
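For readers who want the printed module tree in plain PyTorch terms, here is a minimal sketch of a network with the same shape: a shared-weights actor-critic with a three-block Conv2d+ELU encoder, a GRU core, a scalar value head, and a 5-way action-logits head. The kernel sizes and strides are assumptions (the log only shows the layer types, the (3, 72, 128) input, and the 512-dim encoder output); this is not Sample Factory's actual class.

```python
# Minimal sketch of the architecture printed above. Kernel sizes/strides are
# assumptions; only the layer types, input shape, and 512-dim sizes come from
# the log. Not Sample Factory's ActorCriticSharedWeights implementation.
import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    def __init__(self, num_actions: int = 5, hidden: int = 512):
        super().__init__()
        self.conv_head = nn.Sequential(  # three Conv2d+ELU blocks, as in the log
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        with torch.no_grad():  # infer flattened conv output size from a dummy obs
            n_flat = self.conv_head(torch.zeros(1, 3, 72, 128)).flatten(1).shape[1]
        self.mlp = nn.Sequential(nn.Linear(n_flat, hidden), nn.ELU())  # mlp_layers
        self.core = nn.GRU(hidden, hidden)                  # ModelCoreRNN: GRU(512, 512)
        self.critic_linear = nn.Linear(hidden, 1)           # value head
        self.distribution_linear = nn.Linear(hidden, num_actions)  # action logits

    def forward(self, obs, rnn_state):
        x = self.mlp(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)  # (seq=1, batch, hidden)
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state
```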
[2024-08-14 01:18:13,572][04355] Worker 2 uses CPU cores [0]
[2024-08-14 01:18:13,615][04354] Worker 1 uses CPU cores [1]
[2024-08-14 01:18:13,665][04357] Worker 4 uses CPU cores [0]
[2024-08-14 01:18:13,750][04352] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-14 01:18:13,751][04352] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-08-14 01:18:13,839][04359] Worker 6 uses CPU cores [0]
[2024-08-14 01:18:13,840][04353] Worker 0 uses CPU cores [0]
[2024-08-14 01:18:13,884][04360] Worker 7 uses CPU cores [1]
[2024-08-14 01:18:13,897][04352] Num visible devices: 1
[2024-08-14 01:18:13,945][04356] Worker 3 uses CPU cores [1]
[2024-08-14 01:18:13,961][04358] Worker 5 uses CPU cores [1]
[2024-08-14 01:18:18,184][04339] Using optimizer
[2024-08-14 01:18:18,185][04339] No checkpoints found
[2024-08-14 01:18:18,186][04339] Did not load from checkpoint, starting from scratch!
[2024-08-14 01:18:18,186][04339] Initialized policy 0 weights for model version 0
[2024-08-14 01:18:18,189][04339] LearnerWorker_p0 finished initialization!
[2024-08-14 01:18:18,191][04339] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-14 01:18:18,290][04352] RunningMeanStd input shape: (3, 72, 128)
[2024-08-14 01:18:18,292][04352] RunningMeanStd input shape: (1,)
[2024-08-14 01:18:18,311][04352] ConvEncoder: input_channels=3
[2024-08-14 01:18:18,428][04352] Conv encoder output size: 512
[2024-08-14 01:18:18,429][04352] Policy head output size: 512
[2024-08-14 01:18:19,995][01002] Inference worker 0-0 is ready!
[2024-08-14 01:18:19,997][01002] All inference workers are ready! Signal rollout workers to start!
[2024-08-14 01:18:20,110][04357] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-14 01:18:20,143][04358] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-14 01:18:20,146][04355] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-14 01:18:20,163][04354] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-14 01:18:20,169][04356] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-14 01:18:20,169][04359] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-14 01:18:20,172][04353] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-14 01:18:20,182][04360] Doom resolution: 160x120, resize resolution: (128, 72)
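Each rollout worker renders Doom at 160x120 and downscales frames to (128, 72) before they reach the encoder. A hedged sketch of such a resize wrapper follows; the class and the OpenCV-based implementation are assumptions for illustration, not Sample Factory's actual wrapper (channel-order conversion to the (3, 72, 128) layout seen above is omitted).

```python
# Illustrative observation-resize wrapper matching the "Doom resolution: 160x120,
# resize resolution: (128, 72)" lines. cv2 usage and the class itself are
# assumptions, not Sample Factory's code.
import cv2
import gym
import numpy as np

class ResizeObservation(gym.ObservationWrapper):
    def __init__(self, env, width: int = 128, height: int = 72):
        super().__init__(env)
        self.width, self.height = width, height
        self.observation_space = gym.spaces.Box(
            low=0, high=255, shape=(height, width, 3), dtype=np.uint8
        )

    def observation(self, obs):
        # cv2.resize takes (width, height); INTER_AREA is a reasonable downscale filter
        return cv2.resize(obs, (self.width, self.height), interpolation=cv2.INTER_AREA)
```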
[2024-08-14 01:18:20,366][04355] VizDoom game.init() threw an exception ViZDoomUnexpectedExitException('Controlled ViZDoom instance exited unexpectedly.'). Terminate process...
[2024-08-14 01:18:20,368][04357] VizDoom game.init() threw an exception ViZDoomUnexpectedExitException('Controlled ViZDoom instance exited unexpectedly.'). Terminate process...
[2024-08-14 01:18:20,367][04356] VizDoom game.init() threw an exception ViZDoomUnexpectedExitException('Controlled ViZDoom instance exited unexpectedly.'). Terminate process...
[2024-08-14 01:18:20,370][04354] VizDoom game.init() threw an exception ViZDoomUnexpectedExitException('Controlled ViZDoom instance exited unexpectedly.'). Terminate process...
[2024-08-14 01:18:20,370][04355] EvtLoop [rollout_proc2_evt_loop, process=rollout_proc2] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 228, in _game_init
    self.game.init()
vizdoom.vizdoom.ViZDoomUnexpectedExitException: Controlled ViZDoom instance exited unexpectedly.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 150, in init
    env_runner.init(self.timing)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init
    self._reset()
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset
    observations, info = e.reset(seed=seed)  # new way of doing seeding since Gym 0.26.0
  File "/usr/local/lib/python3.10/dist-packages/gym/core.py", line 323, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 125, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 110, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 30, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/gym/core.py", line 379, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 84, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/gym/core.py", line 323, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 51, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 323, in reset
    self._ensure_initialized()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 274, in _ensure_initialized
    self.initialize()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 269, in initialize
    self._game_init()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 244, in _game_init
    raise EnvCriticalError()
sample_factory.envs.env_utils.EnvCriticalError
[2024-08-14 01:18:20,374][04355] Unhandled exception in evt loop rollout_proc2_evt_loop
[2024-08-14 01:18:20,370][04357] EvtLoop [rollout_proc4_evt_loop, process=rollout_proc4] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 228, in _game_init
    self.game.init()
vizdoom.vizdoom.ViZDoomUnexpectedExitException: Controlled ViZDoom instance exited unexpectedly.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 150, in init
    env_runner.init(self.timing)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init
    self._reset()
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset
    observations, info = e.reset(seed=seed)  # new way of doing seeding since Gym 0.26.0
  File "/usr/local/lib/python3.10/dist-packages/gym/core.py", line 323, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 125, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 110, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 30, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/gym/core.py", line 379, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 84, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/gym/core.py", line 323, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 51, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 323, in reset
    self._ensure_initialized()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 274, in _ensure_initialized
    self.initialize()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 269, in initialize
    self._game_init()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 244, in _game_init
    raise EnvCriticalError()
sample_factory.envs.env_utils.EnvCriticalError
[2024-08-14 01:18:20,375][04357] Unhandled exception in evt loop rollout_proc4_evt_loop
[2024-08-14 01:18:20,372][04356] EvtLoop [rollout_proc3_evt_loop, process=rollout_proc3] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 228, in _game_init
    self.game.init()
vizdoom.vizdoom.ViZDoomUnexpectedExitException: Controlled ViZDoom instance exited unexpectedly.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 150, in init
    env_runner.init(self.timing)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init
    self._reset()
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset
    observations, info = e.reset(seed=seed)  # new way of doing seeding since Gym 0.26.0
  File "/usr/local/lib/python3.10/dist-packages/gym/core.py", line 323, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 125, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 110, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 30, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/gym/core.py", line 379, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 84, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/gym/core.py", line 323, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 51, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 323, in reset
    self._ensure_initialized()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 274, in _ensure_initialized
    self.initialize()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 269, in initialize
    self._game_init()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 244, in _game_init
    raise EnvCriticalError()
sample_factory.envs.env_utils.EnvCriticalError
[2024-08-14 01:18:20,382][04356] Unhandled exception in evt loop rollout_proc3_evt_loop
[2024-08-14 01:18:20,373][04354] EvtLoop [rollout_proc1_evt_loop, process=rollout_proc1] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 228, in _game_init
    self.game.init()
vizdoom.vizdoom.ViZDoomUnexpectedExitException: Controlled ViZDoom instance exited unexpectedly.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 150, in init
    env_runner.init(self.timing)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init
    self._reset()
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset
    observations, info = e.reset(seed=seed)  # new way of doing seeding since Gym 0.26.0
  File "/usr/local/lib/python3.10/dist-packages/gym/core.py", line 323, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 125, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 110, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 30, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/gym/core.py", line 379, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 84, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/gym/core.py", line 323, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 51, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 323, in reset
    self._ensure_initialized()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 274, in _ensure_initialized
    self.initialize()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 269, in initialize
    self._game_init()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 244, in _game_init
    raise EnvCriticalError()
sample_factory.envs.env_utils.EnvCriticalError
[2024-08-14 01:18:20,400][04354] Unhandled exception in evt loop rollout_proc1_evt_loop
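Four of the eight rollout workers (procs 1-4) die here: their ViZDoom instances exit during game.init(), the wrapped EnvCriticalError goes unhandled in the worker event loop, and those workers never recover (the run continues on the remaining four). A hedged sketch of a retry guard one could wrap around ViZDoom initialization; init_with_retry is a hypothetical helper, not Sample Factory's recovery logic, which simply terminates the process:

```python
# Hypothetical retry guard around ViZDoom initialization. Instances can exit
# unexpectedly when many workers start at once; retrying init after a short,
# randomized backoff is one common mitigation. Illustration only.
import random
import time

import vizdoom as vzd

def init_with_retry(game: vzd.DoomGame, max_attempts: int = 5) -> None:
    for attempt in range(1, max_attempts + 1):
        try:
            game.init()
            return
        except vzd.ViZDoomUnexpectedExitException:
            if attempt == max_attempts:
                raise
            # back off briefly so simultaneously starting workers don't collide
            time.sleep(random.uniform(0.5, 2.0) * attempt)
```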
[2024-08-14 01:18:21,746][01002] Heartbeat connected on Batcher_0
[2024-08-14 01:18:21,754][01002] Heartbeat connected on LearnerWorker_p0
[2024-08-14 01:18:21,791][01002] Heartbeat connected on InferenceWorker_p0-w0
[2024-08-14 01:18:21,855][04353] Decorrelating experience for 0 frames...
[2024-08-14 01:18:21,873][04358] Decorrelating experience for 0 frames...
[2024-08-14 01:18:22,610][01002] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-14 01:18:22,649][04360] Decorrelating experience for 0 frames...
[2024-08-14 01:18:22,655][04353] Decorrelating experience for 32 frames...
[2024-08-14 01:18:22,673][04358] Decorrelating experience for 32 frames...
[2024-08-14 01:18:22,724][04359] Decorrelating experience for 0 frames...
[2024-08-14 01:18:23,536][04359] Decorrelating experience for 32 frames...
[2024-08-14 01:18:23,574][04360] Decorrelating experience for 32 frames...
[2024-08-14 01:18:23,589][04353] Decorrelating experience for 64 frames...
[2024-08-14 01:18:23,697][04358] Decorrelating experience for 64 frames...
[2024-08-14 01:18:24,475][04359] Decorrelating experience for 64 frames...
[2024-08-14 01:18:24,478][04353] Decorrelating experience for 96 frames...
[2024-08-14 01:18:24,510][04360] Decorrelating experience for 64 frames...
[2024-08-14 01:18:24,530][04358] Decorrelating experience for 96 frames...
[2024-08-14 01:18:24,637][01002] Heartbeat connected on RolloutWorker_w0
[2024-08-14 01:18:24,696][01002] Heartbeat connected on RolloutWorker_w5
[2024-08-14 01:18:25,118][04360] Decorrelating experience for 96 frames...
[2024-08-14 01:18:25,221][04359] Decorrelating experience for 96 frames...
[2024-08-14 01:18:25,323][01002] Heartbeat connected on RolloutWorker_w6
[2024-08-14 01:18:25,371][01002] Heartbeat connected on RolloutWorker_w7
[2024-08-14 01:18:27,610][01002] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 4.0. Samples: 20. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-14 01:18:30,795][04339] Signal inference workers to stop experience collection...
[2024-08-14 01:18:30,805][04352] InferenceWorker_p0-w0: stopping experience collection
[2024-08-14 01:18:32,587][04339] Signal inference workers to resume experience collection...
[2024-08-14 01:18:32,588][04352] InferenceWorker_p0-w0: resuming experience collection
[2024-08-14 01:18:32,611][01002] Fps is (10 sec: 409.6, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 4096. Throughput: 0: 220.6. Samples: 2206. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-08-14 01:18:32,614][01002] Avg episode reward: [(0, '3.332')]
[2024-08-14 01:18:37,610][01002] Fps is (10 sec: 2048.0, 60 sec: 1365.3, 300 sec: 1365.3). Total num frames: 20480. Throughput: 0: 252.1. Samples: 3782. Policy #0 lag: (min: 0.0, avg: 0.2, max: 2.0)
[2024-08-14 01:18:37,618][01002] Avg episode reward: [(0, '4.248')]
[2024-08-14 01:18:42,610][01002] Fps is (10 sec: 3276.8, 60 sec: 1843.2, 300 sec: 1843.2). Total num frames: 36864. Throughput: 0: 446.6. Samples: 8932. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:18:42,613][01002] Avg episode reward: [(0, '4.277')]
[2024-08-14 01:18:43,713][04352] Updated weights for policy 0, policy_version 10 (0.0017)
[2024-08-14 01:18:47,610][01002] Fps is (10 sec: 3276.8, 60 sec: 2129.9, 300 sec: 2129.9). Total num frames: 53248. Throughput: 0: 533.6. Samples: 13340. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:18:47,617][01002] Avg episode reward: [(0, '4.387')]
[2024-08-14 01:18:52,610][01002] Fps is (10 sec: 3686.4, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 73728. Throughput: 0: 546.3. Samples: 16390. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:18:52,616][01002] Avg episode reward: [(0, '4.350')]
[2024-08-14 01:18:54,343][04352] Updated weights for policy 0, policy_version 20 (0.0015)
[2024-08-14 01:18:57,615][01002] Fps is (10 sec: 3684.8, 60 sec: 2574.3, 300 sec: 2574.3). Total num frames: 90112. Throughput: 0: 633.1. Samples: 22160. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:18:57,617][01002] Avg episode reward: [(0, '4.513')]
[2024-08-14 01:19:02,610][01002] Fps is (10 sec: 3276.8, 60 sec: 2662.4, 300 sec: 2662.4). Total num frames: 106496. Throughput: 0: 659.2. Samples: 26370. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:19:02,616][01002] Avg episode reward: [(0, '4.471')]
[2024-08-14 01:19:02,619][04339] Saving new best policy, reward=4.471!
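The recurring "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" lines report environment-frame throughput averaged over three sliding windows, which is why the 60- and 300-second figures start at nan/0 and converge only as history accumulates. A minimal sketch of that bookkeeping (names are illustrative, not Sample Factory's internals):

```python
# Minimal sliding-window FPS bookkeeping, as implied by the log's
# "Fps is (10 sec / 60 sec / 300 sec)" lines. Illustration only.
import time
from collections import deque

class FpsTracker:
    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (timestamp, total_env_frames) pairs

    def record(self, total_frames: int) -> None:
        now = time.time()
        self.samples.append((now, total_frames))
        # keep just enough history for the longest window
        while self.samples and now - self.samples[0][0] > max(self.windows):
            self.samples.popleft()

    def fps(self) -> dict:
        now, latest = self.samples[-1]
        out = {}
        for w in self.windows:
            in_window = [(t, f) for t, f in self.samples if now - t <= w]
            t0, f0 = in_window[0]
            out[w] = (latest - f0) / (now - t0) if now > t0 else float("nan")
        return out
```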
[2024-08-14 01:19:06,685][04352] Updated weights for policy 0, policy_version 30 (0.0013)
[2024-08-14 01:19:07,610][01002] Fps is (10 sec: 3278.1, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 122880. Throughput: 0: 652.5. Samples: 29364. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:19:07,614][01002] Avg episode reward: [(0, '4.289')]
[2024-08-14 01:19:12,614][01002] Fps is (10 sec: 3685.2, 60 sec: 2867.0, 300 sec: 2867.0). Total num frames: 143360. Throughput: 0: 786.9. Samples: 35434. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:19:12,616][01002] Avg episode reward: [(0, '4.570')]
[2024-08-14 01:19:12,618][04339] Saving new best policy, reward=4.570!
[2024-08-14 01:19:17,610][01002] Fps is (10 sec: 3276.8, 60 sec: 2830.0, 300 sec: 2830.0). Total num frames: 155648. Throughput: 0: 822.8. Samples: 39234. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:19:17,614][01002] Avg episode reward: [(0, '4.651')]
[2024-08-14 01:19:17,623][04339] Saving new best policy, reward=4.651!
[2024-08-14 01:19:19,017][04352] Updated weights for policy 0, policy_version 40 (0.0012)
[2024-08-14 01:19:22,610][01002] Fps is (10 sec: 3277.9, 60 sec: 2935.5, 300 sec: 2935.5). Total num frames: 176128. Throughput: 0: 853.9. Samples: 42206. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:19:22,613][01002] Avg episode reward: [(0, '4.591')]
[2024-08-14 01:19:27,610][01002] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3024.7). Total num frames: 196608. Throughput: 0: 875.7. Samples: 48338. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:19:27,613][01002] Avg episode reward: [(0, '4.605')]
[2024-08-14 01:19:30,389][04352] Updated weights for policy 0, policy_version 50 (0.0021)
[2024-08-14 01:19:32,612][01002] Fps is (10 sec: 3276.2, 60 sec: 3413.2, 300 sec: 2984.2). Total num frames: 208896. Throughput: 0: 869.3. Samples: 52458. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:19:32,617][01002] Avg episode reward: [(0, '4.668')]
[2024-08-14 01:19:32,621][04339] Saving new best policy, reward=4.668!
[2024-08-14 01:19:37,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3058.3). Total num frames: 229376. Throughput: 0: 862.5. Samples: 55204. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:19:37,617][01002] Avg episode reward: [(0, '4.648')]
[2024-08-14 01:19:41,518][04352] Updated weights for policy 0, policy_version 60 (0.0013)
[2024-08-14 01:19:42,610][01002] Fps is (10 sec: 3687.1, 60 sec: 3481.6, 300 sec: 3072.0). Total num frames: 245760. Throughput: 0: 867.7. Samples: 61202. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:19:42,613][01002] Avg episode reward: [(0, '4.522')]
[2024-08-14 01:19:47,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3084.0). Total num frames: 262144. Throughput: 0: 871.6. Samples: 65594. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:19:47,613][01002] Avg episode reward: [(0, '4.545')]
[2024-08-14 01:19:52,612][01002] Fps is (10 sec: 3276.3, 60 sec: 3413.2, 300 sec: 3094.7). Total num frames: 278528. Throughput: 0: 859.2. Samples: 68028. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:19:52,618][01002] Avg episode reward: [(0, '4.654')]
[2024-08-14 01:19:53,939][04352] Updated weights for policy 0, policy_version 70 (0.0013)
[2024-08-14 01:19:57,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3481.8, 300 sec: 3147.5). Total num frames: 299008. Throughput: 0: 858.1. Samples: 74046. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:19:57,613][01002] Avg episode reward: [(0, '4.525')]
[2024-08-14 01:19:57,620][04339] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth...
[2024-08-14 01:20:02,610][01002] Fps is (10 sec: 3687.0, 60 sec: 3481.6, 300 sec: 3153.9). Total num frames: 315392. Throughput: 0: 879.2. Samples: 78796. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:20:02,617][01002] Avg episode reward: [(0, '4.580')]
[2024-08-14 01:20:06,415][04352] Updated weights for policy 0, policy_version 80 (0.0017)
[2024-08-14 01:20:07,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3159.8). Total num frames: 331776. Throughput: 0: 861.5. Samples: 80974. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:20:07,613][01002] Avg episode reward: [(0, '4.710')]
[2024-08-14 01:20:07,623][04339] Saving new best policy, reward=4.710!
[2024-08-14 01:20:12,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3481.8, 300 sec: 3202.3). Total num frames: 352256. Throughput: 0: 856.0. Samples: 86858. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:20:12,613][01002] Avg episode reward: [(0, '4.615')]
[2024-08-14 01:20:17,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3169.9). Total num frames: 364544. Throughput: 0: 873.5. Samples: 91762. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:20:17,616][01002] Avg episode reward: [(0, '4.436')]
[2024-08-14 01:20:18,222][04352] Updated weights for policy 0, policy_version 90 (0.0018)
[2024-08-14 01:20:22,610][01002] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3174.4). Total num frames: 380928. Throughput: 0: 853.6. Samples: 93618. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:20:22,613][01002] Avg episode reward: [(0, '4.580')]
[2024-08-14 01:20:27,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3211.3). Total num frames: 401408. Throughput: 0: 854.0. Samples: 99630. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:20:27,616][01002] Avg episode reward: [(0, '4.882')]
[2024-08-14 01:20:27,624][04339] Saving new best policy, reward=4.882!
[2024-08-14 01:20:29,148][04352] Updated weights for policy 0, policy_version 100 (0.0013)
[2024-08-14 01:20:32,611][01002] Fps is (10 sec: 3686.0, 60 sec: 3481.6, 300 sec: 3213.8). Total num frames: 417792. Throughput: 0: 871.4. Samples: 104806. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:20:32,620][01002] Avg episode reward: [(0, '4.943')]
[2024-08-14 01:20:32,622][04339] Saving new best policy, reward=4.943!
[2024-08-14 01:20:37,611][01002] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3216.1). Total num frames: 434176. Throughput: 0: 859.0. Samples: 106680. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:20:37,618][01002] Avg episode reward: [(0, '4.798')]
[2024-08-14 01:20:41,583][04352] Updated weights for policy 0, policy_version 110 (0.0014)
[2024-08-14 01:20:42,610][01002] Fps is (10 sec: 3277.1, 60 sec: 3413.3, 300 sec: 3218.3). Total num frames: 450560. Throughput: 0: 850.9. Samples: 112338. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:20:42,613][01002] Avg episode reward: [(0, '4.924')]
[2024-08-14 01:20:47,616][01002] Fps is (10 sec: 3684.5, 60 sec: 3481.3, 300 sec: 3248.4). Total num frames: 471040. Throughput: 0: 866.2. Samples: 117778. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:20:47,622][01002] Avg episode reward: [(0, '4.842')]
[2024-08-14 01:20:52,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3222.2). Total num frames: 483328. Throughput: 0: 860.8. Samples: 119710. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:20:52,617][01002] Avg episode reward: [(0, '4.805')]
[2024-08-14 01:20:54,077][04352] Updated weights for policy 0, policy_version 120 (0.0021)
[2024-08-14 01:20:57,610][01002] Fps is (10 sec: 3278.6, 60 sec: 3413.3, 300 sec: 3250.4). Total num frames: 503808. Throughput: 0: 849.6. Samples: 125088. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:20:57,618][01002] Avg episode reward: [(0, '4.755')]
[2024-08-14 01:21:02,614][01002] Fps is (10 sec: 4094.3, 60 sec: 3481.4, 300 sec: 3276.7). Total num frames: 524288. Throughput: 0: 872.1. Samples: 131010. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:21:02,617][01002] Avg episode reward: [(0, '5.075')]
[2024-08-14 01:21:02,619][04339] Saving new best policy, reward=5.075!
[2024-08-14 01:21:05,738][04352] Updated weights for policy 0, policy_version 130 (0.0015)
[2024-08-14 01:21:07,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3252.0). Total num frames: 536576. Throughput: 0: 871.3. Samples: 132826. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-14 01:21:07,619][01002] Avg episode reward: [(0, '5.008')]
[2024-08-14 01:21:12,610][01002] Fps is (10 sec: 2868.4, 60 sec: 3345.1, 300 sec: 3252.7). Total num frames: 552960. Throughput: 0: 852.6. Samples: 137998. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:21:12,612][01002] Avg episode reward: [(0, '4.937')]
[2024-08-14 01:21:16,759][04352] Updated weights for policy 0, policy_version 140 (0.0012)
[2024-08-14 01:21:17,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3276.8). Total num frames: 573440. Throughput: 0: 868.4. Samples: 143882. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:21:17,612][01002] Avg episode reward: [(0, '5.208')]
[2024-08-14 01:21:17,632][04339] Saving new best policy, reward=5.208!
[2024-08-14 01:21:22,611][01002] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3254.0). Total num frames: 585728. Throughput: 0: 870.3. Samples: 145844. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:21:22,617][01002] Avg episode reward: [(0, '5.020')]
[2024-08-14 01:21:27,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3276.8). Total num frames: 606208. Throughput: 0: 853.9. Samples: 150762. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:21:27,618][01002] Avg episode reward: [(0, '5.259')]
[2024-08-14 01:21:27,627][04339] Saving new best policy, reward=5.259!
[2024-08-14 01:21:29,164][04352] Updated weights for policy 0, policy_version 150 (0.0016)
[2024-08-14 01:21:32,610][01002] Fps is (10 sec: 4096.1, 60 sec: 3481.7, 300 sec: 3298.4). Total num frames: 626688. Throughput: 0: 865.5. Samples: 156722. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:21:32,617][01002] Avg episode reward: [(0, '5.506')]
[2024-08-14 01:21:32,620][04339] Saving new best policy, reward=5.506!
[2024-08-14 01:21:37,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3276.8). Total num frames: 638976. Throughput: 0: 872.8. Samples: 158988. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:21:37,613][01002] Avg episode reward: [(0, '5.609')]
[2024-08-14 01:21:37,635][04339] Saving new best policy, reward=5.609!
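The "Updated weights for policy 0, policy_version N" lines mark the inference worker picking up fresh learner weights (roughly every 10 versions here), and "Policy #0 lag" summarizes how many versions old the policy was when each sample in the reported batch was collected. A sketch of that statistic (illustrative, not the framework's code):

```python
# Illustrative computation of the "Policy #0 lag: (min, avg, max)" statistic:
# each collected sample remembers the policy_version that generated it, and
# lag is the learner's current version minus that. Not Sample Factory code.
def policy_lag(current_version: int, sample_versions: list[int]) -> dict:
    lags = [current_version - v for v in sample_versions]
    return {"min": min(lags), "avg": sum(lags) / len(lags), "max": max(lags)}

# e.g. learner at version 150 while samples came from versions 149-150:
# policy_lag(150, [150, 149, 150]) -> {'min': 0, 'avg': 0.33..., 'max': 1}
```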
[2024-08-14 01:21:41,630][04352] Updated weights for policy 0, policy_version 160 (0.0030)
[2024-08-14 01:21:42,610][01002] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3276.8). Total num frames: 655360. Throughput: 0: 853.1. Samples: 163478. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:21:42,615][01002] Avg episode reward: [(0, '5.567')]
[2024-08-14 01:21:47,611][01002] Fps is (10 sec: 3686.3, 60 sec: 3413.6, 300 sec: 3296.8). Total num frames: 675840. Throughput: 0: 852.1. Samples: 169352. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-14 01:21:47,618][01002] Avg episode reward: [(0, '5.663')]
[2024-08-14 01:21:47,630][04339] Saving new best policy, reward=5.663!
[2024-08-14 01:21:52,611][01002] Fps is (10 sec: 3686.2, 60 sec: 3481.6, 300 sec: 3296.3). Total num frames: 692224. Throughput: 0: 869.0. Samples: 171932. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:21:52,618][01002] Avg episode reward: [(0, '5.536')]
[2024-08-14 01:21:53,733][04352] Updated weights for policy 0, policy_version 170 (0.0012)
[2024-08-14 01:21:57,610][01002] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3295.9). Total num frames: 708608. Throughput: 0: 847.9. Samples: 176154. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:21:57,617][01002] Avg episode reward: [(0, '5.654')]
[2024-08-14 01:21:57,632][04339] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000173_708608.pth...
[2024-08-14 01:22:02,610][01002] Fps is (10 sec: 3686.5, 60 sec: 3413.6, 300 sec: 3314.0). Total num frames: 729088. Throughput: 0: 849.8. Samples: 182124. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:22:02,617][01002] Avg episode reward: [(0, '5.940')]
[2024-08-14 01:22:02,623][04339] Saving new best policy, reward=5.940!
[2024-08-14 01:22:04,485][04352] Updated weights for policy 0, policy_version 180 (0.0020)
[2024-08-14 01:22:07,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3313.2). Total num frames: 745472. Throughput: 0: 869.9. Samples: 184988. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:22:07,613][01002] Avg episode reward: [(0, '6.107')]
[2024-08-14 01:22:07,625][04339] Saving new best policy, reward=6.107!
[2024-08-14 01:22:12,612][01002] Fps is (10 sec: 2866.7, 60 sec: 3413.2, 300 sec: 3294.6). Total num frames: 757760. Throughput: 0: 845.1. Samples: 188794. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:22:12,622][01002] Avg episode reward: [(0, '5.770')]
[2024-08-14 01:22:17,104][04352] Updated weights for policy 0, policy_version 190 (0.0019)
[2024-08-14 01:22:17,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3311.7). Total num frames: 778240. Throughput: 0: 843.9. Samples: 194698. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-14 01:22:17,618][01002] Avg episode reward: [(0, '5.906')]
[2024-08-14 01:22:22,610][01002] Fps is (10 sec: 3687.1, 60 sec: 3481.6, 300 sec: 3310.9). Total num frames: 794624. Throughput: 0: 859.7. Samples: 197674. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:22:22,616][01002] Avg episode reward: [(0, '5.968')]
[2024-08-14 01:22:27,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3310.2). Total num frames: 811008. Throughput: 0: 851.8. Samples: 201810. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:22:27,618][01002] Avg episode reward: [(0, '5.999')]
[2024-08-14 01:22:29,432][04352] Updated weights for policy 0, policy_version 200 (0.0013)
[2024-08-14 01:22:32,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3326.0). Total num frames: 831488. Throughput: 0: 849.4. Samples: 207574. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:22:32,616][01002] Avg episode reward: [(0, '6.058')]
[2024-08-14 01:22:37,612][01002] Fps is (10 sec: 3685.9, 60 sec: 3481.5, 300 sec: 3325.0). Total num frames: 847872. Throughput: 0: 859.3. Samples: 210600. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:22:37,615][01002] Avg episode reward: [(0, '6.244')]
[2024-08-14 01:22:37,630][04339] Saving new best policy, reward=6.244!
[2024-08-14 01:22:41,607][04352] Updated weights for policy 0, policy_version 210 (0.0017)
[2024-08-14 01:22:42,611][01002] Fps is (10 sec: 2867.1, 60 sec: 3413.3, 300 sec: 3308.3). Total num frames: 860160. Throughput: 0: 862.1. Samples: 214950. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:22:42,613][01002] Avg episode reward: [(0, '6.418')]
[2024-08-14 01:22:42,620][04339] Saving new best policy, reward=6.418!
[2024-08-14 01:22:47,611][01002] Fps is (10 sec: 3277.1, 60 sec: 3413.3, 300 sec: 3323.2). Total num frames: 880640. Throughput: 0: 848.2. Samples: 220292. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:22:47,619][01002] Avg episode reward: [(0, '6.435')]
[2024-08-14 01:22:47,630][04339] Saving new best policy, reward=6.435!
[2024-08-14 01:22:52,321][04352] Updated weights for policy 0, policy_version 220 (0.0018)
[2024-08-14 01:22:52,614][01002] Fps is (10 sec: 4094.6, 60 sec: 3481.4, 300 sec: 3337.4). Total num frames: 901120. Throughput: 0: 849.9. Samples: 223236. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:22:52,618][01002] Avg episode reward: [(0, '6.873')]
[2024-08-14 01:22:52,626][04339] Saving new best policy, reward=6.873!
[2024-08-14 01:22:57,610][01002] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3321.5). Total num frames: 913408. Throughput: 0: 867.3. Samples: 227820. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:22:57,613][01002] Avg episode reward: [(0, '6.941')]
[2024-08-14 01:22:57,630][04339] Saving new best policy, reward=6.941!
[2024-08-14 01:23:02,610][01002] Fps is (10 sec: 2868.2, 60 sec: 3345.1, 300 sec: 3320.7). Total num frames: 929792. Throughput: 0: 851.5. Samples: 233014. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:23:02,618][01002] Avg episode reward: [(0, '7.122')]
[2024-08-14 01:23:02,692][04339] Saving new best policy, reward=7.122!
[2024-08-14 01:23:04,887][04352] Updated weights for policy 0, policy_version 230 (0.0024)
[2024-08-14 01:23:07,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3334.3). Total num frames: 950272. Throughput: 0: 850.4. Samples: 235944. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:23:07,613][01002] Avg episode reward: [(0, '7.367')]
[2024-08-14 01:23:07,628][04339] Saving new best policy, reward=7.367!
[2024-08-14 01:23:12,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3319.2). Total num frames: 962560. Throughput: 0: 866.8. Samples: 240816. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
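"Saving new best policy, reward=X!" fires whenever the smoothed average episode reward exceeds the previous best, which is why it appears on almost every step during this stretch of steady improvement. A sketch of that bookkeeping (names and the target file are hypothetical, not Sample Factory's internals):

```python
# Illustrative best-policy bookkeeping behind "Saving new best policy, reward=X!":
# track the best average episode reward seen so far and write a snapshot whenever
# the current average beats it. Path and class are hypothetical.
import torch

class BestPolicyTracker:
    def __init__(self, path: str = "best_policy.pth"):
        self.best_reward = float("-inf")
        self.path = path

    def maybe_save(self, avg_reward: float, model: torch.nn.Module) -> bool:
        if avg_reward > self.best_reward:
            self.best_reward = avg_reward
            torch.save(model.state_dict(), self.path)
            return True  # would be logged as "Saving new best policy, reward=...!"
        return False
```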
[2024-08-14 01:23:12,612][01002] Avg episode reward: [(0, '7.336')]
[2024-08-14 01:23:17,356][04352] Updated weights for policy 0, policy_version 240 (0.0013)
[2024-08-14 01:23:17,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3332.3). Total num frames: 983040. Throughput: 0: 848.1. Samples: 245738. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:23:17,615][01002] Avg episode reward: [(0, '7.314')]
[2024-08-14 01:23:22,610][01002] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 1003520. Throughput: 0: 847.8. Samples: 248752. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:23:22,613][01002] Avg episode reward: [(0, '7.373')]
[2024-08-14 01:23:22,618][04339] Saving new best policy, reward=7.373!
[2024-08-14 01:23:27,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 1015808. Throughput: 0: 862.4. Samples: 253758. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:23:27,616][01002] Avg episode reward: [(0, '7.209')]
[2024-08-14 01:23:29,821][04352] Updated weights for policy 0, policy_version 250 (0.0013)
[2024-08-14 01:23:32,610][01002] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 1032192. Throughput: 0: 848.9. Samples: 258492. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:23:32,612][01002] Avg episode reward: [(0, '8.279')]
[2024-08-14 01:23:32,622][04339] Saving new best policy, reward=8.279!
[2024-08-14 01:23:37,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3413.4, 300 sec: 3443.4). Total num frames: 1052672. Throughput: 0: 849.2. Samples: 261448. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:23:37,615][01002] Avg episode reward: [(0, '9.522')]
[2024-08-14 01:23:37,628][04339] Saving new best policy, reward=9.522!
[2024-08-14 01:23:40,420][04352] Updated weights for policy 0, policy_version 260 (0.0013)
[2024-08-14 01:23:42,611][01002] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 1069056. Throughput: 0: 867.3. Samples: 266850. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:23:42,616][01002] Avg episode reward: [(0, '9.961')]
[2024-08-14 01:23:42,623][04339] Saving new best policy, reward=9.961!
[2024-08-14 01:23:47,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3429.5). Total num frames: 1085440. Throughput: 0: 848.3. Samples: 271188. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:23:47,613][01002] Avg episode reward: [(0, '9.416')]
[2024-08-14 01:23:52,569][04352] Updated weights for policy 0, policy_version 270 (0.0016)
[2024-08-14 01:23:52,610][01002] Fps is (10 sec: 3686.6, 60 sec: 3413.5, 300 sec: 3443.5). Total num frames: 1105920. Throughput: 0: 849.6. Samples: 274174. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:23:52,618][01002] Avg episode reward: [(0, '7.699')]
[2024-08-14 01:23:57,611][01002] Fps is (10 sec: 3686.2, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 1122304. Throughput: 0: 867.5. Samples: 279852. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:23:57,613][01002] Avg episode reward: [(0, '7.219')]
[2024-08-14 01:23:57,627][04339] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000274_1122304.pth...
[2024-08-14 01:23:57,800][04339] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth
[2024-08-14 01:24:02,610][01002] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 1134592. Throughput: 0: 850.6. Samples: 284014. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
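Checkpoints are named checkpoint_<policy_version>_<env_frames>.pth, and the learner deletes the oldest one once a new checkpoint lands (see the Saving/Removing pair a few lines up). A sketch of that keep-last-N rotation; the naming scheme matches the log, while keep=2 is an assumption for illustration:

```python
# Sketch of the keep-last-N checkpoint rotation visible in the Saving/Removing
# pairs above. Zero-padding the version keeps lexicographic sort == age order.
from pathlib import Path

import torch

def save_checkpoint(ckpt_dir: Path, version: int, frames: int, state: dict, keep: int = 2):
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    path = ckpt_dir / f"checkpoint_{version:09d}_{frames}.pth"  # e.g. checkpoint_000000274_1122304.pth
    torch.save(state, path)
    # remove the oldest checkpoints, keeping only the most recent `keep`
    for old in sorted(ckpt_dir.glob("checkpoint_*.pth"))[:-keep]:
        old.unlink()
```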
[2024-08-14 01:24:02,613][01002] Avg episode reward: [(0, '7.557')]
[2024-08-14 01:24:04,960][04352] Updated weights for policy 0, policy_version 280 (0.0013)
[2024-08-14 01:24:07,610][01002] Fps is (10 sec: 3277.0, 60 sec: 3413.3, 300 sec: 3429.6). Total num frames: 1155072. Throughput: 0: 850.8. Samples: 287036. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:24:07,616][01002] Avg episode reward: [(0, '8.060')]
[2024-08-14 01:24:12,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 1171456. Throughput: 0: 872.5. Samples: 293022. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:24:12,613][01002] Avg episode reward: [(0, '8.682')]
[2024-08-14 01:24:17,294][04352] Updated weights for policy 0, policy_version 290 (0.0018)
[2024-08-14 01:24:17,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 1187840. Throughput: 0: 853.1. Samples: 296880. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:24:17,618][01002] Avg episode reward: [(0, '9.158')]
[2024-08-14 01:24:22,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 1208320. Throughput: 0: 854.0. Samples: 299876. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:24:22,613][01002] Avg episode reward: [(0, '8.821')]
[2024-08-14 01:24:27,615][01002] Fps is (10 sec: 3684.8, 60 sec: 3481.3, 300 sec: 3443.4). Total num frames: 1224704. Throughput: 0: 869.3. Samples: 305972. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:24:27,617][01002] Avg episode reward: [(0, '9.396')]
[2024-08-14 01:24:27,944][04352] Updated weights for policy 0, policy_version 300 (0.0020)
[2024-08-14 01:24:32,610][01002] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 1236992. Throughput: 0: 865.7. Samples: 310144. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:24:32,614][01002] Avg episode reward: [(0, '8.954')]
[2024-08-14 01:24:37,611][01002] Fps is (10 sec: 3688.0, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 1261568. Throughput: 0: 861.5. Samples: 312940. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:24:37,613][01002] Avg episode reward: [(0, '9.516')]
[2024-08-14 01:24:39,745][04352] Updated weights for policy 0, policy_version 310 (0.0023)
[2024-08-14 01:24:42,610][01002] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 1277952. Throughput: 0: 870.4. Samples: 319020. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:24:42,618][01002] Avg episode reward: [(0, '9.466')]
[2024-08-14 01:24:47,614][01002] Fps is (10 sec: 2866.3, 60 sec: 3413.1, 300 sec: 3429.5). Total num frames: 1290240. Throughput: 0: 876.5. Samples: 323460. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-14 01:24:47,626][01002] Avg episode reward: [(0, '9.586')]
[2024-08-14 01:24:51,945][04352] Updated weights for policy 0, policy_version 320 (0.0020)
[2024-08-14 01:24:52,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 1310720. Throughput: 0: 864.4. Samples: 325932. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:24:52,614][01002] Avg episode reward: [(0, '10.128')]
[2024-08-14 01:24:52,618][04339] Saving new best policy, reward=10.128!
[2024-08-14 01:24:57,611][01002] Fps is (10 sec: 4097.2, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 1331200. Throughput: 0: 866.1. Samples: 331998. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:24:57,620][01002] Avg episode reward: [(0, '9.549')]
[2024-08-14 01:25:02,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3443.4). Total num frames: 1347584. Throughput: 0: 887.0. Samples: 336794. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:25:02,613][01002] Avg episode reward: [(0, '10.042')]
[2024-08-14 01:25:04,266][04352] Updated weights for policy 0, policy_version 330 (0.0013)
[2024-08-14 01:25:07,610][01002] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 1363968. Throughput: 0: 868.8. Samples: 338970. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:25:07,615][01002] Avg episode reward: [(0, '9.623')]
[2024-08-14 01:25:12,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3457.3). Total num frames: 1384448. Throughput: 0: 868.5. Samples: 345050. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:25:12,613][01002] Avg episode reward: [(0, '10.537')]
[2024-08-14 01:25:12,616][04339] Saving new best policy, reward=10.537!
[2024-08-14 01:25:14,685][04352] Updated weights for policy 0, policy_version 340 (0.0022)
[2024-08-14 01:25:17,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 1396736. Throughput: 0: 884.4. Samples: 349942. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:25:17,613][01002] Avg episode reward: [(0, '11.956')]
[2024-08-14 01:25:17,630][04339] Saving new best policy, reward=11.956!
[2024-08-14 01:25:22,610][01002] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 1413120. Throughput: 0: 861.8. Samples: 351722. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:25:22,617][01002] Avg episode reward: [(0, '12.013')]
[2024-08-14 01:25:22,622][04339] Saving new best policy, reward=12.013!
[2024-08-14 01:25:26,947][04352] Updated weights for policy 0, policy_version 350 (0.0014)
[2024-08-14 01:25:27,611][01002] Fps is (10 sec: 3686.3, 60 sec: 3481.8, 300 sec: 3443.4). Total num frames: 1433600. Throughput: 0: 858.7. Samples: 357664. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:25:27,618][01002] Avg episode reward: [(0, '13.534')]
[2024-08-14 01:25:27,628][04339] Saving new best policy, reward=13.534!
[2024-08-14 01:25:32,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3443.4). Total num frames: 1449984. Throughput: 0: 877.7. Samples: 362952. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:25:32,615][01002] Avg episode reward: [(0, '13.657')]
[2024-08-14 01:25:32,623][04339] Saving new best policy, reward=13.657!
[2024-08-14 01:25:37,610][01002] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 1466368. Throughput: 0: 863.4. Samples: 364786. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:25:37,617][01002] Avg episode reward: [(0, '14.722')]
[2024-08-14 01:25:37,630][04339] Saving new best policy, reward=14.722!
[2024-08-14 01:25:39,445][04352] Updated weights for policy 0, policy_version 360 (0.0018)
[2024-08-14 01:25:42,611][01002] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3443.5). Total num frames: 1486848. Throughput: 0: 853.5. Samples: 370406. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:25:42,612][01002] Avg episode reward: [(0, '14.299')]
[2024-08-14 01:25:47,611][01002] Fps is (10 sec: 3686.3, 60 sec: 3550.0, 300 sec: 3457.3). Total num frames: 1503232. Throughput: 0: 872.7. Samples: 376064. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:25:47,613][01002] Avg episode reward: [(0, '12.927')]
[2024-08-14 01:25:51,906][04352] Updated weights for policy 0, policy_version 370 (0.0015)
[2024-08-14 01:25:52,610][01002] Fps is (10 sec: 2867.3, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 1515520. Throughput: 0: 865.4. Samples: 377912. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:25:52,618][01002] Avg episode reward: [(0, '13.545')]
[2024-08-14 01:25:57,613][01002] Fps is (10 sec: 3276.1, 60 sec: 3413.2, 300 sec: 3429.6). Total num frames: 1536000. Throughput: 0: 849.7. Samples: 383290. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-14 01:25:57,620][01002] Avg episode reward: [(0, '12.905')]
[2024-08-14 01:25:57,630][04339] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000375_1536000.pth...
[2024-08-14 01:25:57,738][04339] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000173_708608.pth
[2024-08-14 01:26:02,414][04352] Updated weights for policy 0, policy_version 380 (0.0016)
[2024-08-14 01:26:02,610][01002] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 1556480. Throughput: 0: 872.2. Samples: 389190. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:26:02,619][01002] Avg episode reward: [(0, '13.313')]
[2024-08-14 01:26:07,610][01002] Fps is (10 sec: 3277.6, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 1568768. Throughput: 0: 874.5. Samples: 391076. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:26:07,617][01002] Avg episode reward: [(0, '13.747')]
[2024-08-14 01:26:12,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 1589248. Throughput: 0: 854.0. Samples: 396094. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:26:12,614][01002] Avg episode reward: [(0, '13.237')]
[2024-08-14 01:26:14,482][04352] Updated weights for policy 0, policy_version 390 (0.0025)
[2024-08-14 01:26:17,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 1605632. Throughput: 0: 869.2. Samples: 402068. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:26:17,613][01002] Avg episode reward: [(0, '13.855')]
[2024-08-14 01:26:22,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 1622016. Throughput: 0: 874.1. Samples: 404122. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:26:22,613][01002] Avg episode reward: [(0, '13.911')]
[2024-08-14 01:26:26,898][04352] Updated weights for policy 0, policy_version 400 (0.0022)
[2024-08-14 01:26:27,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 1638400. Throughput: 0: 855.1. Samples: 408884. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:26:27,615][01002] Avg episode reward: [(0, '13.629')]
[2024-08-14 01:26:32,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 1658880. Throughput: 0: 864.5. Samples: 414964. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:26:32,616][01002] Avg episode reward: [(0, '13.902')]
[2024-08-14 01:26:37,613][01002] Fps is (10 sec: 3685.6, 60 sec: 3481.5, 300 sec: 3457.3). Total num frames: 1675264. Throughput: 0: 877.1. Samples: 417382. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:26:37,620][01002] Avg episode reward: [(0, '12.909')]
[2024-08-14 01:26:39,254][04352] Updated weights for policy 0, policy_version 410 (0.0013)
[2024-08-14 01:26:42,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 1691648. Throughput: 0: 857.9. Samples: 421894. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:26:42,613][01002] Avg episode reward: [(0, '12.409')]
[2024-08-14 01:26:47,610][01002] Fps is (10 sec: 3687.2, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 1712128. Throughput: 0: 861.8. Samples: 427970. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:26:47,617][01002] Avg episode reward: [(0, '13.705')]
[2024-08-14 01:26:49,789][04352] Updated weights for policy 0, policy_version 420 (0.0021)
[2024-08-14 01:26:52,611][01002] Fps is (10 sec: 3276.5, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 1724416. Throughput: 0: 875.8. Samples: 430486. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:26:52,617][01002] Avg episode reward: [(0, '14.214')]
[2024-08-14 01:26:57,610][01002] Fps is (10 sec: 2867.2, 60 sec: 3413.5, 300 sec: 3429.5). Total num frames: 1740800. Throughput: 0: 859.6. Samples: 434776. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:26:57,615][01002] Avg episode reward: [(0, '14.701')]
[2024-08-14 01:27:01,802][04352] Updated weights for policy 0, policy_version 430 (0.0014)
[2024-08-14 01:27:02,610][01002] Fps is (10 sec: 3686.7, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 1761280. Throughput: 0: 861.1. Samples: 440818. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:27:02,616][01002] Avg episode reward: [(0, '15.455')]
[2024-08-14 01:27:02,619][04339] Saving new best policy, reward=15.455!
[2024-08-14 01:27:07,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 1777664. Throughput: 0: 879.0. Samples: 443676. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-14 01:27:07,620][01002] Avg episode reward: [(0, '13.446')]
[2024-08-14 01:27:12,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 1794048. Throughput: 0: 862.6. Samples: 447700. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:27:12,612][01002] Avg episode reward: [(0, '13.456')]
[2024-08-14 01:27:14,182][04352] Updated weights for policy 0, policy_version 440 (0.0019)
[2024-08-14 01:27:17,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 1814528. Throughput: 0: 860.0. Samples: 453664. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-14 01:27:17,616][01002] Avg episode reward: [(0, '13.385')]
[2024-08-14 01:27:22,611][01002] Fps is (10 sec: 3686.2, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 1830912. Throughput: 0: 871.0. Samples: 456576. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-14 01:27:22,613][01002] Avg episode reward: [(0, '12.927')]
[2024-08-14 01:27:26,628][04352] Updated weights for policy 0, policy_version 450 (0.0013)
[2024-08-14 01:27:27,610][01002] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 1843200. Throughput: 0: 859.7. Samples: 460580. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-14 01:27:27,615][01002] Avg episode reward: [(0, '14.325')]
[2024-08-14 01:27:32,610][01002] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 1863680. Throughput: 0: 855.5. Samples: 466466. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-14 01:27:32,618][01002] Avg episode reward: [(0, '15.551')] [2024-08-14 01:27:32,667][04339] Saving new best policy, reward=15.551! [2024-08-14 01:27:37,308][04352] Updated weights for policy 0, policy_version 460 (0.0014) [2024-08-14 01:27:37,616][01002] Fps is (10 sec: 4093.8, 60 sec: 3481.4, 300 sec: 3471.1). Total num frames: 1884160. Throughput: 0: 863.6. Samples: 469350. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:27:37,620][01002] Avg episode reward: [(0, '15.938')] [2024-08-14 01:27:37,631][04339] Saving new best policy, reward=15.938! [2024-08-14 01:27:42,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 1896448. Throughput: 0: 861.4. Samples: 473540. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:27:42,618][01002] Avg episode reward: [(0, '16.240')] [2024-08-14 01:27:42,621][04339] Saving new best policy, reward=16.240! [2024-08-14 01:27:47,610][01002] Fps is (10 sec: 3278.6, 60 sec: 3413.3, 300 sec: 3443.5). Total num frames: 1916928. Throughput: 0: 849.1. Samples: 479028. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:27:47,618][01002] Avg episode reward: [(0, '16.582')] [2024-08-14 01:27:47,629][04339] Saving new best policy, reward=16.582! [2024-08-14 01:27:49,573][04352] Updated weights for policy 0, policy_version 470 (0.0012) [2024-08-14 01:27:52,614][01002] Fps is (10 sec: 3684.9, 60 sec: 3481.4, 300 sec: 3457.3). Total num frames: 1933312. Throughput: 0: 848.9. Samples: 481878. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:27:52,617][01002] Avg episode reward: [(0, '18.723')] [2024-08-14 01:27:52,619][04339] Saving new best policy, reward=18.723! [2024-08-14 01:27:57,612][01002] Fps is (10 sec: 2866.8, 60 sec: 3413.2, 300 sec: 3443.4). Total num frames: 1945600. Throughput: 0: 859.0. Samples: 486358. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:27:57,620][01002] Avg episode reward: [(0, '18.789')] [2024-08-14 01:27:57,634][04339] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000475_1945600.pth... [2024-08-14 01:27:57,632][01002] Components not started: RolloutWorker_w1, RolloutWorker_w2, RolloutWorker_w3, RolloutWorker_w4, wait_time=600.0 seconds [2024-08-14 01:27:57,818][04339] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000274_1122304.pth [2024-08-14 01:27:57,833][04339] Saving new best policy, reward=18.789! [2024-08-14 01:28:02,486][04352] Updated weights for policy 0, policy_version 480 (0.0013) [2024-08-14 01:28:02,610][01002] Fps is (10 sec: 3278.1, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 1966080. Throughput: 0: 838.4. Samples: 491392. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:28:02,614][01002] Avg episode reward: [(0, '18.660')] [2024-08-14 01:28:07,613][01002] Fps is (10 sec: 3685.8, 60 sec: 3413.2, 300 sec: 3457.3). Total num frames: 1982464. Throughput: 0: 840.1. Samples: 494384. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:28:07,619][01002] Avg episode reward: [(0, '19.572')] [2024-08-14 01:28:07,630][04339] Saving new best policy, reward=19.572! [2024-08-14 01:28:12,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 1998848. Throughput: 0: 856.4. Samples: 499120. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:28:12,615][01002] Avg episode reward: [(0, '20.464')] [2024-08-14 01:28:12,623][04339] Saving new best policy, reward=20.464! 
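The "Saving new best policy" entries above follow a simple high-water-mark rule: a separate snapshot is rewritten only when the smoothed episode reward exceeds every value seen so far, which is why these saves arrive at irregular intervals between the regular rolling checkpoints. A minimal sketch of that bookkeeping (the class name, file name, and save format are assumptions for illustration, not Sample Factory's actual implementation):

import os
import torch

class BestPolicySaver:
    # Rewrites a single "best" snapshot whenever the smoothed reward improves.
    def __init__(self, ckpt_dir):
        os.makedirs(ckpt_dir, exist_ok=True)
        self.path = os.path.join(ckpt_dir, "best_policy.pth")  # hypothetical name
        self.best_reward = float("-inf")

    def maybe_save(self, avg_episode_reward, model):
        if avg_episode_reward > self.best_reward:
            self.best_reward = avg_episode_reward
            torch.save({"model": model.state_dict(),
                        "reward": avg_episode_reward}, self.path)
            print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")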
[2024-08-14 01:28:15,026][04352] Updated weights for policy 0, policy_version 490 (0.0015) [2024-08-14 01:28:17,610][01002] Fps is (10 sec: 3277.8, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 2015232. Throughput: 0: 835.4. Samples: 504060. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:28:17,612][01002] Avg episode reward: [(0, '18.921')] [2024-08-14 01:28:22,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3413.4, 300 sec: 3457.3). Total num frames: 2035712. Throughput: 0: 835.2. Samples: 506928. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:28:22,612][01002] Avg episode reward: [(0, '18.682')] [2024-08-14 01:28:26,147][04352] Updated weights for policy 0, policy_version 500 (0.0013) [2024-08-14 01:28:27,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 2048000. Throughput: 0: 857.1. Samples: 512108. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-14 01:28:27,612][01002] Avg episode reward: [(0, '18.653')] [2024-08-14 01:28:32,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 2068480. Throughput: 0: 842.9. Samples: 516958. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:28:32,612][01002] Avg episode reward: [(0, '18.303')] [2024-08-14 01:28:37,572][04352] Updated weights for policy 0, policy_version 510 (0.0019) [2024-08-14 01:28:37,611][01002] Fps is (10 sec: 4095.9, 60 sec: 3413.6, 300 sec: 3457.3). Total num frames: 2088960. Throughput: 0: 846.7. Samples: 519978. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:28:37,613][01002] Avg episode reward: [(0, '17.758')] [2024-08-14 01:28:42,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 2101248. Throughput: 0: 867.8. Samples: 525406. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:28:42,617][01002] Avg episode reward: [(0, '17.938')] [2024-08-14 01:28:47,610][01002] Fps is (10 sec: 2867.3, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 2117632. Throughput: 0: 857.0. Samples: 529956. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:28:47,613][01002] Avg episode reward: [(0, '17.605')] [2024-08-14 01:28:49,900][04352] Updated weights for policy 0, policy_version 520 (0.0012) [2024-08-14 01:28:52,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3413.6, 300 sec: 3443.4). Total num frames: 2138112. Throughput: 0: 855.0. Samples: 532856. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:28:52,613][01002] Avg episode reward: [(0, '18.388')] [2024-08-14 01:28:57,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3481.7, 300 sec: 3457.3). Total num frames: 2154496. Throughput: 0: 877.2. Samples: 538594. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:28:57,616][01002] Avg episode reward: [(0, '18.649')] [2024-08-14 01:29:02,031][04352] Updated weights for policy 0, policy_version 530 (0.0015) [2024-08-14 01:29:02,611][01002] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 2170880. Throughput: 0: 862.0. Samples: 542848. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:29:02,618][01002] Avg episode reward: [(0, '19.399')] [2024-08-14 01:29:07,611][01002] Fps is (10 sec: 3686.3, 60 sec: 3481.8, 300 sec: 3457.3). Total num frames: 2191360. Throughput: 0: 866.4. Samples: 545916. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:29:07,618][01002] Avg episode reward: [(0, '20.772')] [2024-08-14 01:29:07,629][04339] Saving new best policy, reward=20.772! 
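Every five seconds the main process logs an "Fps is (...)" entry with 10-, 60-, and 300-second throughput windows, followed by the current "Avg episode reward". To turn this stream of entries into training curves, the log can be scraped directly; a minimal sketch (the file path is hypothetical, and the regexes simply mirror the entry formats visible above):

import re

with open("sf_log.txt") as f:  # hypothetical path to a copy of this log
    text = f.read()

fps_re = re.compile(
    r"Fps is \(10 sec: ([\d.]+), 60 sec: ([\d.]+), 300 sec: ([\d.]+)\)\. "
    r"Total num frames: (\d+)")
rew_re = re.compile(r"Avg episode reward: \[\(0, '(-?[\d.]+)'\)\]")

# (total env frames, 10-second FPS window) pairs -> throughput curve
frames = [(int(m.group(4)), float(m.group(1))) for m in fps_re.finditer(text)]
# smoothed episode rewards over time -> learning curve
rewards = [float(m.group(1)) for m in rew_re.finditer(text)]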
[2024-08-14 01:29:12,610][01002] Fps is (10 sec: 3686.5, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 2207744. Throughput: 0: 879.8. Samples: 551700. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:29:12,621][01002] Avg episode reward: [(0, '20.118')] [2024-08-14 01:29:13,216][04352] Updated weights for policy 0, policy_version 540 (0.0021) [2024-08-14 01:29:17,610][01002] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 2224128. Throughput: 0: 861.2. Samples: 555710. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:29:17,612][01002] Avg episode reward: [(0, '21.060')] [2024-08-14 01:29:17,623][04339] Saving new best policy, reward=21.060! [2024-08-14 01:29:22,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.5). Total num frames: 2240512. Throughput: 0: 857.3. Samples: 558558. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-14 01:29:22,613][01002] Avg episode reward: [(0, '20.337')] [2024-08-14 01:29:24,782][04352] Updated weights for policy 0, policy_version 550 (0.0014) [2024-08-14 01:29:27,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 2260992. Throughput: 0: 870.7. Samples: 564588. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-14 01:29:27,616][01002] Avg episode reward: [(0, '20.181')] [2024-08-14 01:29:32,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 2273280. Throughput: 0: 855.7. Samples: 568462. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-14 01:29:32,619][01002] Avg episode reward: [(0, '19.864')] [2024-08-14 01:29:37,234][04352] Updated weights for policy 0, policy_version 560 (0.0015) [2024-08-14 01:29:37,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 2293760. Throughput: 0: 858.8. Samples: 571500. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:29:37,613][01002] Avg episode reward: [(0, '19.783')] [2024-08-14 01:29:42,610][01002] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 2314240. Throughput: 0: 865.0. Samples: 577520. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:29:42,616][01002] Avg episode reward: [(0, '19.264')] [2024-08-14 01:29:47,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 2326528. Throughput: 0: 863.8. Samples: 581720. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:29:47,618][01002] Avg episode reward: [(0, '19.433')] [2024-08-14 01:29:49,691][04352] Updated weights for policy 0, policy_version 570 (0.0017) [2024-08-14 01:29:52,610][01002] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 2342912. Throughput: 0: 851.8. Samples: 584246. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:29:52,613][01002] Avg episode reward: [(0, '20.229')] [2024-08-14 01:29:57,611][01002] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 2363392. Throughput: 0: 855.9. Samples: 590214. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:29:57,613][01002] Avg episode reward: [(0, '20.656')] [2024-08-14 01:29:57,623][04339] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000577_2363392.pth... 
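The periodic checkpoint entries come in save/remove pairs: right after checkpoint_000000577_2363392.pth is written above, the oldest rolling checkpoint (checkpoint_000000375_1536000.pth) is deleted in the next entry, so only the two most recent rolling checkpoints survive while best-policy snapshots are kept separately. A hedged sketch of that rotation (a simplified stand-in under those assumptions, not Sample Factory's actual checkpointing code):

import glob
import os
import torch

def save_with_rotation(ckpt_dir, model, policy_version, env_steps, keep=2):
    # Name format matches the log: 9-digit policy version, then env frames.
    name = f"checkpoint_{policy_version:09d}_{env_steps}.pth"
    torch.save(model.state_dict(), os.path.join(ckpt_dir, name))
    # Zero-padded versions sort lexicographically, so glob + sort is oldest-first.
    ckpts = sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))
    for old in ckpts[:-keep]:  # mirrors the "Removing ..." entries in the log
        os.remove(old)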
[2024-08-14 01:29:57,722][04339] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000375_1536000.pth [2024-08-14 01:30:01,379][04352] Updated weights for policy 0, policy_version 580 (0.0017) [2024-08-14 01:30:02,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 2375680. Throughput: 0: 865.4. Samples: 594652. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:30:02,613][01002] Avg episode reward: [(0, '20.289')] [2024-08-14 01:30:07,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 2396160. Throughput: 0: 857.3. Samples: 597138. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:30:07,614][01002] Avg episode reward: [(0, '20.684')] [2024-08-14 01:30:12,390][04352] Updated weights for policy 0, policy_version 590 (0.0013) [2024-08-14 01:30:12,610][01002] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 2416640. Throughput: 0: 856.9. Samples: 603148. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:30:12,618][01002] Avg episode reward: [(0, '19.863')] [2024-08-14 01:30:17,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 2428928. Throughput: 0: 873.5. Samples: 607770. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:30:17,615][01002] Avg episode reward: [(0, '20.480')] [2024-08-14 01:30:22,610][01002] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 2445312. Throughput: 0: 853.7. Samples: 609916. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:30:22,617][01002] Avg episode reward: [(0, '21.105')] [2024-08-14 01:30:22,621][04339] Saving new best policy, reward=21.105! [2024-08-14 01:30:24,826][04352] Updated weights for policy 0, policy_version 600 (0.0025) [2024-08-14 01:30:27,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 2465792. Throughput: 0: 850.2. Samples: 615780. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-14 01:30:27,612][01002] Avg episode reward: [(0, '20.015')] [2024-08-14 01:30:32,611][01002] Fps is (10 sec: 3686.2, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 2482176. Throughput: 0: 867.2. Samples: 620746. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:30:32,614][01002] Avg episode reward: [(0, '20.201')] [2024-08-14 01:30:37,155][04352] Updated weights for policy 0, policy_version 610 (0.0014) [2024-08-14 01:30:37,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 2498560. Throughput: 0: 855.1. Samples: 622724. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:30:37,618][01002] Avg episode reward: [(0, '21.085')] [2024-08-14 01:30:42,610][01002] Fps is (10 sec: 3686.7, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 2519040. Throughput: 0: 855.8. Samples: 628726. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:30:42,616][01002] Avg episode reward: [(0, '19.658')] [2024-08-14 01:30:47,614][01002] Fps is (10 sec: 3685.1, 60 sec: 3481.4, 300 sec: 3457.3). Total num frames: 2535424. Throughput: 0: 872.1. Samples: 633900. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:30:47,616][01002] Avg episode reward: [(0, '18.419')] [2024-08-14 01:30:48,900][04352] Updated weights for policy 0, policy_version 620 (0.0013) [2024-08-14 01:30:52,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 2551808. Throughput: 0: 859.1. Samples: 635796. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:30:52,618][01002] Avg episode reward: [(0, '18.905')] [2024-08-14 01:30:57,611][01002] Fps is (10 sec: 3277.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 2568192. Throughput: 0: 853.2. Samples: 641542. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:30:57,617][01002] Avg episode reward: [(0, '17.852')] [2024-08-14 01:30:59,901][04352] Updated weights for policy 0, policy_version 630 (0.0017) [2024-08-14 01:31:02,614][01002] Fps is (10 sec: 3275.6, 60 sec: 3481.4, 300 sec: 3443.4). Total num frames: 2584576. Throughput: 0: 872.7. Samples: 647046. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:31:02,619][01002] Avg episode reward: [(0, '18.642')] [2024-08-14 01:31:07,610][01002] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 2600960. Throughput: 0: 867.5. Samples: 648952. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:31:07,617][01002] Avg episode reward: [(0, '20.375')] [2024-08-14 01:31:12,121][04352] Updated weights for policy 0, policy_version 640 (0.0014) [2024-08-14 01:31:12,611][01002] Fps is (10 sec: 3687.5, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 2621440. Throughput: 0: 860.1. Samples: 654484. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:31:12,616][01002] Avg episode reward: [(0, '20.638')] [2024-08-14 01:31:17,612][01002] Fps is (10 sec: 3685.8, 60 sec: 3481.5, 300 sec: 3443.4). Total num frames: 2637824. Throughput: 0: 879.3. Samples: 660314. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:31:17,616][01002] Avg episode reward: [(0, '22.220')] [2024-08-14 01:31:17,633][04339] Saving new best policy, reward=22.220! [2024-08-14 01:31:22,610][01002] Fps is (10 sec: 2867.4, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 2650112. Throughput: 0: 875.8. Samples: 662134. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-14 01:31:22,619][01002] Avg episode reward: [(0, '22.679')] [2024-08-14 01:31:22,660][04339] Saving new best policy, reward=22.679! [2024-08-14 01:31:24,763][04352] Updated weights for policy 0, policy_version 650 (0.0013) [2024-08-14 01:31:27,610][01002] Fps is (10 sec: 3277.3, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 2670592. Throughput: 0: 854.0. Samples: 667156. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:31:27,618][01002] Avg episode reward: [(0, '22.901')] [2024-08-14 01:31:27,627][04339] Saving new best policy, reward=22.901! [2024-08-14 01:31:32,610][01002] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 2691072. Throughput: 0: 871.8. Samples: 673126. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:31:32,613][01002] Avg episode reward: [(0, '20.945')] [2024-08-14 01:31:36,594][04352] Updated weights for policy 0, policy_version 660 (0.0013) [2024-08-14 01:31:37,614][01002] Fps is (10 sec: 3275.7, 60 sec: 3413.2, 300 sec: 3429.5). Total num frames: 2703360. Throughput: 0: 872.2. Samples: 675046. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-14 01:31:37,616][01002] Avg episode reward: [(0, '20.848')] [2024-08-14 01:31:42,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 2723840. Throughput: 0: 854.1. Samples: 679974. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:31:42,613][01002] Avg episode reward: [(0, '18.934')] [2024-08-14 01:31:47,383][04352] Updated weights for policy 0, policy_version 670 (0.0013) [2024-08-14 01:31:47,610][01002] Fps is (10 sec: 4097.3, 60 sec: 3481.8, 300 sec: 3457.3). Total num frames: 2744320. Throughput: 0: 865.4. Samples: 685984. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:31:47,613][01002] Avg episode reward: [(0, '18.758')] [2024-08-14 01:31:52,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 2756608. Throughput: 0: 874.4. Samples: 688302. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:31:52,617][01002] Avg episode reward: [(0, '18.602')] [2024-08-14 01:31:57,610][01002] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 2772992. Throughput: 0: 851.3. Samples: 692790. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:31:57,612][01002] Avg episode reward: [(0, '19.356')] [2024-08-14 01:31:57,626][04339] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000677_2772992.pth... [2024-08-14 01:31:57,757][04339] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000475_1945600.pth [2024-08-14 01:31:59,861][04352] Updated weights for policy 0, policy_version 680 (0.0015) [2024-08-14 01:32:02,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3481.8, 300 sec: 3443.4). Total num frames: 2793472. Throughput: 0: 854.6. Samples: 698770. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:32:02,613][01002] Avg episode reward: [(0, '20.079')] [2024-08-14 01:32:07,611][01002] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 2809856. Throughput: 0: 872.3. Samples: 701390. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:32:07,615][01002] Avg episode reward: [(0, '20.047')] [2024-08-14 01:32:12,212][04352] Updated weights for policy 0, policy_version 690 (0.0016) [2024-08-14 01:32:12,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3429.5). Total num frames: 2826240. Throughput: 0: 856.6. Samples: 705704. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:32:12,616][01002] Avg episode reward: [(0, '19.890')] [2024-08-14 01:32:17,610][01002] Fps is (10 sec: 3686.5, 60 sec: 3481.7, 300 sec: 3443.4). Total num frames: 2846720. Throughput: 0: 857.8. Samples: 711726. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:32:17,613][01002] Avg episode reward: [(0, '20.917')] [2024-08-14 01:32:22,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3457.3). Total num frames: 2863104. Throughput: 0: 878.1. Samples: 714558. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:32:22,618][01002] Avg episode reward: [(0, '20.675')] [2024-08-14 01:32:23,946][04352] Updated weights for policy 0, policy_version 700 (0.0013) [2024-08-14 01:32:27,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 2879488. Throughput: 0: 857.8. Samples: 718574. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:32:27,616][01002] Avg episode reward: [(0, '20.926')] [2024-08-14 01:32:32,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3443.5). Total num frames: 2899968. Throughput: 0: 859.4. Samples: 724658. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:32:32,612][01002] Avg episode reward: [(0, '21.741')] [2024-08-14 01:32:34,552][04352] Updated weights for policy 0, policy_version 710 (0.0017) [2024-08-14 01:32:37,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3550.1, 300 sec: 3457.3). Total num frames: 2916352. Throughput: 0: 875.6. Samples: 727702. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:32:37,613][01002] Avg episode reward: [(0, '21.889')] [2024-08-14 01:32:42,610][01002] Fps is (10 sec: 2867.1, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 2928640. Throughput: 0: 864.3. Samples: 731682. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:32:42,615][01002] Avg episode reward: [(0, '22.540')] [2024-08-14 01:32:46,945][04352] Updated weights for policy 0, policy_version 720 (0.0020) [2024-08-14 01:32:47,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.5). Total num frames: 2949120. Throughput: 0: 865.1. Samples: 737698. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:32:47,616][01002] Avg episode reward: [(0, '23.188')] [2024-08-14 01:32:47,625][04339] Saving new best policy, reward=23.188! [2024-08-14 01:32:52,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 2965504. Throughput: 0: 873.1. Samples: 740678. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:32:52,619][01002] Avg episode reward: [(0, '24.361')] [2024-08-14 01:32:52,664][04339] Saving new best policy, reward=24.361! [2024-08-14 01:32:57,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 2981888. Throughput: 0: 868.4. Samples: 744782. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-14 01:32:57,619][01002] Avg episode reward: [(0, '23.388')] [2024-08-14 01:32:59,469][04352] Updated weights for policy 0, policy_version 730 (0.0012) [2024-08-14 01:33:02,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 3002368. Throughput: 0: 860.3. Samples: 750440. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-14 01:33:02,613][01002] Avg episode reward: [(0, '21.822')] [2024-08-14 01:33:07,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 3018752. Throughput: 0: 863.8. Samples: 753428. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:33:07,612][01002] Avg episode reward: [(0, '23.618')] [2024-08-14 01:33:11,077][04352] Updated weights for policy 0, policy_version 740 (0.0015) [2024-08-14 01:33:12,611][01002] Fps is (10 sec: 2867.1, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 3031040. Throughput: 0: 873.9. Samples: 757900. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:33:12,618][01002] Avg episode reward: [(0, '23.409')] [2024-08-14 01:33:17,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 3051520. Throughput: 0: 857.9. Samples: 763262. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:33:17,616][01002] Avg episode reward: [(0, '23.336')] [2024-08-14 01:33:21,966][04352] Updated weights for policy 0, policy_version 750 (0.0014) [2024-08-14 01:33:22,610][01002] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3072000. Throughput: 0: 856.5. Samples: 766246. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:33:22,617][01002] Avg episode reward: [(0, '22.157')] [2024-08-14 01:33:27,611][01002] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 3084288. 
Throughput: 0: 874.5. Samples: 771036. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-14 01:33:27,616][01002] Avg episode reward: [(0, '22.111')] [2024-08-14 01:33:32,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 3104768. Throughput: 0: 857.3. Samples: 776276. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:33:32,612][01002] Avg episode reward: [(0, '23.935')] [2024-08-14 01:33:34,297][04352] Updated weights for policy 0, policy_version 760 (0.0013) [2024-08-14 01:33:37,610][01002] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3125248. Throughput: 0: 858.7. Samples: 779318. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:33:37,613][01002] Avg episode reward: [(0, '24.950')] [2024-08-14 01:33:37,629][04339] Saving new best policy, reward=24.950! [2024-08-14 01:33:42,612][01002] Fps is (10 sec: 3276.3, 60 sec: 3481.5, 300 sec: 3457.3). Total num frames: 3137536. Throughput: 0: 879.4. Samples: 784356. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:33:42,615][01002] Avg episode reward: [(0, '24.497')] [2024-08-14 01:33:46,502][04352] Updated weights for policy 0, policy_version 770 (0.0020) [2024-08-14 01:33:47,611][01002] Fps is (10 sec: 3276.6, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 3158016. Throughput: 0: 862.5. Samples: 789254. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-14 01:33:47,616][01002] Avg episode reward: [(0, '24.604')] [2024-08-14 01:33:52,610][01002] Fps is (10 sec: 3687.0, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 3174400. Throughput: 0: 862.4. Samples: 792238. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:33:52,621][01002] Avg episode reward: [(0, '26.330')] [2024-08-14 01:33:52,624][04339] Saving new best policy, reward=26.330! [2024-08-14 01:33:57,614][01002] Fps is (10 sec: 3275.7, 60 sec: 3481.4, 300 sec: 3457.3). Total num frames: 3190784. Throughput: 0: 877.4. Samples: 797384. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:33:57,616][01002] Avg episode reward: [(0, '27.235')] [2024-08-14 01:33:57,629][04339] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000779_3190784.pth... [2024-08-14 01:33:57,773][04339] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000577_2363392.pth [2024-08-14 01:33:57,789][04339] Saving new best policy, reward=27.235! [2024-08-14 01:33:58,790][04352] Updated weights for policy 0, policy_version 780 (0.0018) [2024-08-14 01:34:02,611][01002] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 3207168. Throughput: 0: 860.7. Samples: 801992. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:34:02,613][01002] Avg episode reward: [(0, '28.589')] [2024-08-14 01:34:02,617][04339] Saving new best policy, reward=28.589! [2024-08-14 01:34:07,610][01002] Fps is (10 sec: 3687.8, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 3227648. Throughput: 0: 860.2. Samples: 804954. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:34:07,614][01002] Avg episode reward: [(0, '27.841')] [2024-08-14 01:34:09,216][04352] Updated weights for policy 0, policy_version 790 (0.0018) [2024-08-14 01:34:12,611][01002] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3457.3). Total num frames: 3244032. Throughput: 0: 875.6. Samples: 810438. 
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-14 01:34:12,613][01002] Avg episode reward: [(0, '26.666')] [2024-08-14 01:34:17,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 3260416. Throughput: 0: 861.0. Samples: 815022. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:34:17,613][01002] Avg episode reward: [(0, '27.368')] [2024-08-14 01:34:21,469][04352] Updated weights for policy 0, policy_version 800 (0.0014) [2024-08-14 01:34:22,610][01002] Fps is (10 sec: 3686.6, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 3280896. Throughput: 0: 859.8. Samples: 818010. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:34:22,614][01002] Avg episode reward: [(0, '27.110')] [2024-08-14 01:34:27,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 3297280. Throughput: 0: 874.9. Samples: 823726. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:34:27,613][01002] Avg episode reward: [(0, '27.298')] [2024-08-14 01:34:32,610][01002] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 3309568. Throughput: 0: 859.2. Samples: 827918. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-14 01:34:32,612][01002] Avg episode reward: [(0, '25.413')] [2024-08-14 01:34:33,659][04352] Updated weights for policy 0, policy_version 810 (0.0016) [2024-08-14 01:34:37,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 3330048. Throughput: 0: 860.0. Samples: 830936. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:34:37,613][01002] Avg episode reward: [(0, '24.395')] [2024-08-14 01:34:42,611][01002] Fps is (10 sec: 4095.9, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 3350528. Throughput: 0: 881.4. Samples: 837044. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-14 01:34:42,616][01002] Avg episode reward: [(0, '25.019')] [2024-08-14 01:34:45,351][04352] Updated weights for policy 0, policy_version 820 (0.0027) [2024-08-14 01:34:47,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3457.3). Total num frames: 3362816. Throughput: 0: 865.5. Samples: 840938. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-14 01:34:47,617][01002] Avg episode reward: [(0, '24.909')] [2024-08-14 01:34:52,610][01002] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 3383296. Throughput: 0: 865.6. Samples: 843908. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:34:52,619][01002] Avg episode reward: [(0, '23.943')] [2024-08-14 01:34:56,209][04352] Updated weights for policy 0, policy_version 830 (0.0013) [2024-08-14 01:34:57,612][01002] Fps is (10 sec: 4095.4, 60 sec: 3550.0, 300 sec: 3485.1). Total num frames: 3403776. Throughput: 0: 876.5. Samples: 849882. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-14 01:34:57,616][01002] Avg episode reward: [(0, '24.559')] [2024-08-14 01:35:02,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 3416064. Throughput: 0: 865.3. Samples: 853960. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:35:02,613][01002] Avg episode reward: [(0, '24.664')] [2024-08-14 01:35:07,610][01002] Fps is (10 sec: 2867.6, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 3432448. Throughput: 0: 861.6. Samples: 856784. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:35:07,613][01002] Avg episode reward: [(0, '27.434')] [2024-08-14 01:35:08,649][04352] Updated weights for policy 0, policy_version 840 (0.0023) [2024-08-14 01:35:12,610][01002] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 3457024. Throughput: 0: 870.1. Samples: 862880. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:35:12,613][01002] Avg episode reward: [(0, '26.890')] [2024-08-14 01:35:17,614][01002] Fps is (10 sec: 3685.1, 60 sec: 3481.4, 300 sec: 3471.1). Total num frames: 3469312. Throughput: 0: 875.1. Samples: 867302. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-14 01:35:17,616][01002] Avg episode reward: [(0, '26.016')] [2024-08-14 01:35:20,840][04352] Updated weights for policy 0, policy_version 850 (0.0013) [2024-08-14 01:35:22,610][01002] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3485696. Throughput: 0: 865.6. Samples: 869886. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:35:22,618][01002] Avg episode reward: [(0, '27.537')] [2024-08-14 01:35:27,610][01002] Fps is (10 sec: 3687.6, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3506176. Throughput: 0: 862.5. Samples: 875858. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:35:27,613][01002] Avg episode reward: [(0, '28.506')] [2024-08-14 01:35:32,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 3518464. Throughput: 0: 878.7. Samples: 880480. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:35:32,613][01002] Avg episode reward: [(0, '27.959')] [2024-08-14 01:35:32,747][04352] Updated weights for policy 0, policy_version 860 (0.0013) [2024-08-14 01:35:37,611][01002] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 3538944. Throughput: 0: 866.3. Samples: 882890. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:35:37,618][01002] Avg episode reward: [(0, '25.837')] [2024-08-14 01:35:42,611][01002] Fps is (10 sec: 4095.9, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3559424. Throughput: 0: 869.2. Samples: 888996. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-14 01:35:42,620][01002] Avg episode reward: [(0, '24.209')] [2024-08-14 01:35:43,345][04352] Updated weights for policy 0, policy_version 870 (0.0016) [2024-08-14 01:35:47,610][01002] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 3571712. Throughput: 0: 885.3. Samples: 893800. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-14 01:35:47,620][01002] Avg episode reward: [(0, '25.037')] [2024-08-14 01:35:52,610][01002] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3592192. Throughput: 0: 870.6. Samples: 895962. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-14 01:35:52,621][01002] Avg episode reward: [(0, '24.098')] [2024-08-14 01:35:55,647][04352] Updated weights for policy 0, policy_version 880 (0.0022) [2024-08-14 01:35:57,610][01002] Fps is (10 sec: 4096.0, 60 sec: 3481.7, 300 sec: 3485.1). Total num frames: 3612672. Throughput: 0: 866.2. Samples: 901858. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:35:57,612][01002] Avg episode reward: [(0, '23.090')] [2024-08-14 01:35:57,629][04339] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000882_3612672.pth... 
[2024-08-14 01:35:57,749][04339] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000677_2772992.pth [2024-08-14 01:36:02,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3624960. Throughput: 0: 876.6. Samples: 906744. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:36:02,615][01002] Avg episode reward: [(0, '23.091')] [2024-08-14 01:36:07,610][01002] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 3641344. Throughput: 0: 861.2. Samples: 908640. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:36:07,620][01002] Avg episode reward: [(0, '23.174')] [2024-08-14 01:36:07,938][04352] Updated weights for policy 0, policy_version 890 (0.0026) [2024-08-14 01:36:12,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 3661824. Throughput: 0: 861.4. Samples: 914622. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:36:12,617][01002] Avg episode reward: [(0, '23.026')] [2024-08-14 01:36:17,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3481.8, 300 sec: 3485.1). Total num frames: 3678208. Throughput: 0: 878.6. Samples: 920018. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:36:17,616][01002] Avg episode reward: [(0, '23.617')] [2024-08-14 01:36:19,750][04352] Updated weights for policy 0, policy_version 900 (0.0018) [2024-08-14 01:36:22,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3694592. Throughput: 0: 867.0. Samples: 921906. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:36:22,619][01002] Avg episode reward: [(0, '24.219')] [2024-08-14 01:36:27,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3715072. Throughput: 0: 858.2. Samples: 927616. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:36:27,620][01002] Avg episode reward: [(0, '24.556')] [2024-08-14 01:36:30,590][04352] Updated weights for policy 0, policy_version 910 (0.0020) [2024-08-14 01:36:32,611][01002] Fps is (10 sec: 3686.2, 60 sec: 3549.8, 300 sec: 3485.1). Total num frames: 3731456. Throughput: 0: 874.4. Samples: 933148. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:36:32,618][01002] Avg episode reward: [(0, '23.819')] [2024-08-14 01:36:37,610][01002] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3743744. Throughput: 0: 869.3. Samples: 935082. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-14 01:36:37,613][01002] Avg episode reward: [(0, '23.500')] [2024-08-14 01:36:42,610][01002] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3764224. Throughput: 0: 858.9. Samples: 940508. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:36:42,618][01002] Avg episode reward: [(0, '24.491')] [2024-08-14 01:36:42,858][04352] Updated weights for policy 0, policy_version 920 (0.0013) [2024-08-14 01:36:47,612][01002] Fps is (10 sec: 4095.3, 60 sec: 3549.8, 300 sec: 3485.0). Total num frames: 3784704. Throughput: 0: 879.7. Samples: 946330. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:36:47,619][01002] Avg episode reward: [(0, '23.912')] [2024-08-14 01:36:52,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 3796992. Throughput: 0: 879.2. Samples: 948202. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:36:52,618][01002] Avg episode reward: [(0, '23.941')] [2024-08-14 01:36:55,333][04352] Updated weights for policy 0, policy_version 930 (0.0014) [2024-08-14 01:36:57,610][01002] Fps is (10 sec: 3277.4, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 3817472. Throughput: 0: 857.9. Samples: 953228. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:36:57,618][01002] Avg episode reward: [(0, '22.676')] [2024-08-14 01:37:02,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3833856. Throughput: 0: 872.1. Samples: 959262. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:37:02,619][01002] Avg episode reward: [(0, '23.737')] [2024-08-14 01:37:07,588][04352] Updated weights for policy 0, policy_version 940 (0.0016) [2024-08-14 01:37:07,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3850240. Throughput: 0: 873.2. Samples: 961202. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:37:07,616][01002] Avg episode reward: [(0, '23.752')] [2024-08-14 01:37:12,611][01002] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3866624. Throughput: 0: 856.4. Samples: 966154. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:37:12,618][01002] Avg episode reward: [(0, '23.832')] [2024-08-14 01:37:17,610][01002] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3887104. Throughput: 0: 866.9. Samples: 972160. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:37:17,613][01002] Avg episode reward: [(0, '23.790')] [2024-08-14 01:37:17,818][04352] Updated weights for policy 0, policy_version 950 (0.0016) [2024-08-14 01:37:22,611][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3899392. Throughput: 0: 874.0. Samples: 974412. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-14 01:37:22,613][01002] Avg episode reward: [(0, '23.244')] [2024-08-14 01:37:27,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3919872. Throughput: 0: 854.8. Samples: 978972. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:37:27,615][01002] Avg episode reward: [(0, '24.416')] [2024-08-14 01:37:30,354][04352] Updated weights for policy 0, policy_version 960 (0.0015) [2024-08-14 01:37:32,610][01002] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3940352. Throughput: 0: 859.8. Samples: 985018. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:37:32,616][01002] Avg episode reward: [(0, '24.496')] [2024-08-14 01:37:37,610][01002] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3952640. Throughput: 0: 873.7. Samples: 987520. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-14 01:37:37,615][01002] Avg episode reward: [(0, '23.770')] [2024-08-14 01:37:42,610][01002] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3969024. Throughput: 0: 857.7. Samples: 991826. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:37:42,618][01002] Avg episode reward: [(0, '24.162')] [2024-08-14 01:37:42,767][04352] Updated weights for policy 0, policy_version 970 (0.0014) [2024-08-14 01:37:47,619][01002] Fps is (10 sec: 3683.4, 60 sec: 3413.0, 300 sec: 3471.1). Total num frames: 3989504. Throughput: 0: 857.0. Samples: 997832. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-14 01:37:47,621][01002] Avg episode reward: [(0, '24.805')] [2024-08-14 01:37:51,661][04339] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-08-14 01:37:51,663][04339] Stopping Batcher_0... [2024-08-14 01:37:51,674][04339] Loop batcher_evt_loop terminating... [2024-08-14 01:37:51,673][01002] Component Batcher_0 stopped! [2024-08-14 01:37:51,677][01002] Component RolloutWorker_w1 process died already! Don't wait for it. [2024-08-14 01:37:51,683][01002] Component RolloutWorker_w2 process died already! Don't wait for it. [2024-08-14 01:37:51,688][01002] Component RolloutWorker_w3 process died already! Don't wait for it. [2024-08-14 01:37:51,699][01002] Component RolloutWorker_w4 process died already! Don't wait for it. [2024-08-14 01:37:51,733][01002] Component RolloutWorker_w5 stopped! [2024-08-14 01:37:51,736][04358] Stopping RolloutWorker_w5... [2024-08-14 01:37:51,747][04358] Loop rollout_proc5_evt_loop terminating... [2024-08-14 01:37:51,757][01002] Component RolloutWorker_w7 stopped! [2024-08-14 01:37:51,764][04360] Stopping RolloutWorker_w7... [2024-08-14 01:37:51,764][04360] Loop rollout_proc7_evt_loop terminating... [2024-08-14 01:37:51,785][04352] Weights refcount: 2 0 [2024-08-14 01:37:51,790][04352] Stopping InferenceWorker_p0-w0... [2024-08-14 01:37:51,790][04352] Loop inference_proc0-0_evt_loop terminating... [2024-08-14 01:37:51,791][01002] Component InferenceWorker_p0-w0 stopped! [2024-08-14 01:37:51,811][01002] Component RolloutWorker_w6 stopped! [2024-08-14 01:37:51,814][04359] Stopping RolloutWorker_w6... [2024-08-14 01:37:51,814][04359] Loop rollout_proc6_evt_loop terminating... [2024-08-14 01:37:51,826][04339] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000779_3190784.pth [2024-08-14 01:37:51,845][04339] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-08-14 01:37:51,860][01002] Component RolloutWorker_w0 stopped! [2024-08-14 01:37:51,860][04353] Stopping RolloutWorker_w0... [2024-08-14 01:37:51,876][04353] Loop rollout_proc0_evt_loop terminating... [2024-08-14 01:37:52,065][01002] Component LearnerWorker_p0 stopped! [2024-08-14 01:37:52,069][01002] Waiting for process learner_proc0 to stop... [2024-08-14 01:37:52,075][04339] Stopping LearnerWorker_p0... [2024-08-14 01:37:52,075][04339] Loop learner_proc0_evt_loop terminating... [2024-08-14 01:37:54,160][01002] Waiting for process inference_proc0-0 to join... [2024-08-14 01:37:54,427][01002] Waiting for process rollout_proc0 to join... [2024-08-14 01:37:55,059][01002] Waiting for process rollout_proc1 to join... [2024-08-14 01:37:55,060][01002] Waiting for process rollout_proc2 to join... [2024-08-14 01:37:55,062][01002] Waiting for process rollout_proc3 to join... [2024-08-14 01:37:55,064][01002] Waiting for process rollout_proc4 to join... [2024-08-14 01:37:55,065][01002] Waiting for process rollout_proc5 to join... [2024-08-14 01:37:55,068][01002] Waiting for process rollout_proc6 to join... [2024-08-14 01:37:55,074][01002] Waiting for process rollout_proc7 to join... 
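The teardown above takes two paths: components whose processes already exited (the four rollout workers w1 through w4 that never started) are skipped with "process died already! Don't wait for it.", while live components are stopped and joined in turn. A simplified illustration of that pattern with multiprocessing (a hypothetical helper, not the runner's real event-loop code):

from multiprocessing import Process

def shutdown(workers: "dict[str, Process]") -> None:
    for name, proc in workers.items():
        if proc.exitcode is not None:  # process already terminated
            print(f"Component {name} process died already! Don't wait for it.")
            continue
        print(f"Waiting for process {name} to join...")
        proc.join(timeout=5.0)
        if proc.is_alive():  # escalate if a worker hangs past the timeout
            proc.terminate()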
[2024-08-14 01:37:55,077][01002] Batcher 0 profile tree view:
batching: 22.4744, releasing_batches: 0.0225
[2024-08-14 01:37:55,079][01002] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 485.9292
update_model: 8.6423
  weight_update: 0.0013
one_step: 0.0179
  handle_policy_step: 627.3159
    deserialize: 16.4046, stack: 3.6338, obs_to_device_normalize: 135.2892, forward: 322.6475, send_messages: 24.4686
    prepare_outputs: 92.3061
      to_cpu: 58.9834
[2024-08-14 01:37:55,082][01002] Learner 0 profile tree view:
misc: 0.0055, prepare_batch: 14.8862
train: 69.4920
  epoch_init: 0.0058, minibatch_init: 0.0129, losses_postprocess: 0.5563, kl_divergence: 0.4851, after_optimizer: 32.2925
  calculate_losses: 21.9522
    losses_init: 0.0061, forward_head: 1.6063, bptt_initial: 14.4360, tail: 0.9763, advantages_returns: 0.2919, losses: 2.2250
    bptt: 2.1097
      bptt_forward_core: 2.0188
  update: 13.6081
    clip: 1.4773
[2024-08-14 01:37:55,084][01002] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.4866, enqueue_policy_requests: 179.8481, env_step: 810.1954, overhead: 21.6131, complete_rollouts: 6.9065
save_policy_outputs: 36.0065
  split_output_tensors: 12.0047
[2024-08-14 01:37:55,086][01002] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.5497, enqueue_policy_requests: 187.0485, env_step: 801.9753, overhead: 21.6396, complete_rollouts: 6.7246
save_policy_outputs: 35.8057
  split_output_tensors: 12.3263
[2024-08-14 01:37:55,087][01002] Loop Runner_EvtLoop terminating...
[2024-08-14 01:37:55,089][01002] Runner profile tree view:
main_loop: 1193.2924
[2024-08-14 01:37:55,090][01002] Collected {0: 4005888}, FPS: 3357.0
[2024-08-14 01:41:00,386][01002] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-08-14 01:41:00,387][01002] Overriding arg 'num_workers' with value 1 passed from command line [2024-08-14 01:41:00,390][01002] Adding new argument 'no_render'=True that is not in the saved config file! [2024-08-14 01:41:00,392][01002] Adding new argument 'save_video'=True that is not in the saved config file! [2024-08-14 01:41:00,396][01002] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-08-14 01:41:00,397][01002] Adding new argument 'video_name'=None that is not in the saved config file! [2024-08-14 01:41:00,399][01002] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-08-14 01:41:00,400][01002] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-08-14 01:41:00,402][01002] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-08-14 01:41:00,406][01002] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-08-14 01:41:00,407][01002] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-08-14 01:41:00,408][01002] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-08-14 01:41:00,409][01002] Adding new argument 'train_script'=None that is not in the saved config file! [2024-08-14 01:41:00,410][01002] Adding new argument 'enjoy_script'=None that is not in the saved config file!
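The Runner summary above is internally consistent: 4005888 collected frames over the 1193.2924 s main loop gives 4005888 / 1193.2924 ≈ 3357 frames per second, exactly the printed figure. The evaluation run that follows reloads the saved config.json and layers the command-line arguments over it, warning when a key is absent from the file; a minimal sketch of that merge logic (a hypothetical helper, not Sample Factory's actual loader, which only warns when a passed value actually differs):

import json

def load_with_overrides(config_path, cli_args):
    with open(config_path) as f:
        cfg = json.load(f)
    for key, value in cli_args.items():
        if key in cfg:
            print(f"Overriding arg '{key}' with value {value} passed from command line")
        else:
            print(f"Adding new argument '{key}'={value} that is not in the saved config file!")
        cfg[key] = value
    return cfg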
[2024-08-14 01:41:00,412][01002] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-08-14 01:41:00,428][01002] Doom resolution: 160x120, resize resolution: (128, 72) [2024-08-14 01:41:00,432][01002] RunningMeanStd input shape: (3, 72, 128) [2024-08-14 01:41:00,435][01002] RunningMeanStd input shape: (1,) [2024-08-14 01:41:00,454][01002] ConvEncoder: input_channels=3 [2024-08-14 01:41:00,601][01002] Conv encoder output size: 512 [2024-08-14 01:41:00,602][01002] Policy head output size: 512 [2024-08-14 01:41:02,234][01002] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-08-14 01:41:03,131][01002] Num frames 100... [2024-08-14 01:41:03,267][01002] Num frames 200... [2024-08-14 01:41:03,392][01002] Num frames 300... [2024-08-14 01:41:03,514][01002] Num frames 400... [2024-08-14 01:41:03,641][01002] Num frames 500... [2024-08-14 01:41:03,765][01002] Num frames 600... [2024-08-14 01:41:03,894][01002] Num frames 700... [2024-08-14 01:41:03,996][01002] Avg episode rewards: #0: 15.360, true rewards: #0: 7.360 [2024-08-14 01:41:03,998][01002] Avg episode reward: 15.360, avg true_objective: 7.360 [2024-08-14 01:41:04,076][01002] Num frames 800... [2024-08-14 01:41:04,195][01002] Num frames 900... [2024-08-14 01:41:04,324][01002] Num frames 1000... [2024-08-14 01:41:04,453][01002] Num frames 1100... [2024-08-14 01:41:04,582][01002] Num frames 1200... [2024-08-14 01:41:04,711][01002] Num frames 1300... [2024-08-14 01:41:04,838][01002] Num frames 1400... [2024-08-14 01:41:04,982][01002] Avg episode rewards: #0: 13.860, true rewards: #0: 7.360 [2024-08-14 01:41:04,984][01002] Avg episode reward: 13.860, avg true_objective: 7.360 [2024-08-14 01:41:05,021][01002] Num frames 1500... [2024-08-14 01:41:05,143][01002] Num frames 1600... [2024-08-14 01:41:05,269][01002] Num frames 1700... [2024-08-14 01:41:05,405][01002] Num frames 1800... [2024-08-14 01:41:05,490][01002] Avg episode rewards: #0: 10.747, true rewards: #0: 6.080 [2024-08-14 01:41:05,491][01002] Avg episode reward: 10.747, avg true_objective: 6.080 [2024-08-14 01:41:05,589][01002] Num frames 1900... [2024-08-14 01:41:05,719][01002] Num frames 2000... [2024-08-14 01:41:05,838][01002] Num frames 2100... [2024-08-14 01:41:05,965][01002] Num frames 2200... [2024-08-14 01:41:06,087][01002] Num frames 2300... [2024-08-14 01:41:06,206][01002] Num frames 2400... [2024-08-14 01:41:06,334][01002] Num frames 2500... [2024-08-14 01:41:06,461][01002] Num frames 2600... [2024-08-14 01:41:06,588][01002] Avg episode rewards: #0: 12.140, true rewards: #0: 6.640 [2024-08-14 01:41:06,591][01002] Avg episode reward: 12.140, avg true_objective: 6.640 [2024-08-14 01:41:06,646][01002] Num frames 2700... [2024-08-14 01:41:06,766][01002] Num frames 2800... [2024-08-14 01:41:06,889][01002] Num frames 2900... [2024-08-14 01:41:07,009][01002] Num frames 3000... [2024-08-14 01:41:07,131][01002] Num frames 3100... [2024-08-14 01:41:07,192][01002] Avg episode rewards: #0: 10.808, true rewards: #0: 6.208 [2024-08-14 01:41:07,193][01002] Avg episode reward: 10.808, avg true_objective: 6.208 [2024-08-14 01:41:07,317][01002] Num frames 3200... [2024-08-14 01:41:07,447][01002] Num frames 3300... [2024-08-14 01:41:07,578][01002] Num frames 3400... [2024-08-14 01:41:07,705][01002] Num frames 3500... [2024-08-14 01:41:07,830][01002] Num frames 3600... [2024-08-14 01:41:07,957][01002] Num frames 3700... [2024-08-14 01:41:08,079][01002] Num frames 3800... 
[2024-08-14 01:41:08,244][01002] Num frames 3900... [2024-08-14 01:41:08,416][01002] Num frames 4000... [2024-08-14 01:41:08,592][01002] Num frames 4100... [2024-08-14 01:41:08,757][01002] Num frames 4200... [2024-08-14 01:41:08,924][01002] Num frames 4300... [2024-08-14 01:41:09,093][01002] Num frames 4400... [2024-08-14 01:41:09,258][01002] Num frames 4500... [2024-08-14 01:41:09,440][01002] Num frames 4600... [2024-08-14 01:41:09,622][01002] Num frames 4700... [2024-08-14 01:41:09,810][01002] Avg episode rewards: #0: 15.793, true rewards: #0: 7.960 [2024-08-14 01:41:09,812][01002] Avg episode reward: 15.793, avg true_objective: 7.960 [2024-08-14 01:41:09,858][01002] Num frames 4800... [2024-08-14 01:41:10,033][01002] Num frames 4900... [2024-08-14 01:41:10,209][01002] Num frames 5000... [2024-08-14 01:41:10,380][01002] Num frames 5100... [2024-08-14 01:41:10,573][01002] Num frames 5200... [2024-08-14 01:41:10,737][01002] Num frames 5300... [2024-08-14 01:41:10,858][01002] Num frames 5400... [2024-08-14 01:41:10,982][01002] Num frames 5500... [2024-08-14 01:41:11,105][01002] Num frames 5600... [2024-08-14 01:41:11,223][01002] Num frames 5700... [2024-08-14 01:41:11,351][01002] Num frames 5800... [2024-08-14 01:41:11,479][01002] Num frames 5900... [2024-08-14 01:41:11,604][01002] Num frames 6000... [2024-08-14 01:41:11,731][01002] Num frames 6100... [2024-08-14 01:41:11,855][01002] Num frames 6200... [2024-08-14 01:41:11,975][01002] Num frames 6300... [2024-08-14 01:41:12,095][01002] Num frames 6400... [2024-08-14 01:41:12,219][01002] Num frames 6500... [2024-08-14 01:41:12,338][01002] Num frames 6600... [2024-08-14 01:41:12,468][01002] Num frames 6700... [2024-08-14 01:41:12,612][01002] Num frames 6800... [2024-08-14 01:41:12,761][01002] Avg episode rewards: #0: 22.248, true rewards: #0: 9.820 [2024-08-14 01:41:12,763][01002] Avg episode reward: 22.248, avg true_objective: 9.820 [2024-08-14 01:41:12,797][01002] Num frames 6900... [2024-08-14 01:41:12,920][01002] Num frames 7000... [2024-08-14 01:41:13,048][01002] Num frames 7100... [2024-08-14 01:41:13,173][01002] Num frames 7200... [2024-08-14 01:41:13,296][01002] Num frames 7300... [2024-08-14 01:41:13,421][01002] Num frames 7400... [2024-08-14 01:41:13,500][01002] Avg episode rewards: #0: 20.397, true rewards: #0: 9.272 [2024-08-14 01:41:13,502][01002] Avg episode reward: 20.397, avg true_objective: 9.272 [2024-08-14 01:41:13,611][01002] Num frames 7500... [2024-08-14 01:41:13,735][01002] Num frames 7600... [2024-08-14 01:41:13,858][01002] Num frames 7700... [2024-08-14 01:41:13,986][01002] Num frames 7800... [2024-08-14 01:41:14,108][01002] Num frames 7900... [2024-08-14 01:41:14,225][01002] Num frames 8000... [2024-08-14 01:41:14,349][01002] Num frames 8100... [2024-08-14 01:41:14,477][01002] Num frames 8200... [2024-08-14 01:41:14,608][01002] Num frames 8300... [2024-08-14 01:41:14,732][01002] Num frames 8400... [2024-08-14 01:41:14,857][01002] Num frames 8500... [2024-08-14 01:41:15,007][01002] Avg episode rewards: #0: 21.197, true rewards: #0: 9.530 [2024-08-14 01:41:15,009][01002] Avg episode reward: 21.197, avg true_objective: 9.530 [2024-08-14 01:41:15,042][01002] Num frames 8600... [2024-08-14 01:41:15,162][01002] Num frames 8700... [2024-08-14 01:41:15,280][01002] Num frames 8800... [2024-08-14 01:41:15,404][01002] Num frames 8900... [2024-08-14 01:41:15,533][01002] Num frames 9000... 
[2024-08-14 01:41:15,725][01002] Avg episode rewards: #0: 19.889, true rewards: #0: 9.089 [2024-08-14 01:41:15,728][01002] Avg episode reward: 19.889, avg true_objective: 9.089 [2024-08-14 01:42:15,953][01002] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-08-14 01:44:13,729][01002] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-08-14 01:44:13,731][01002] Overriding arg 'num_workers' with value 1 passed from command line [2024-08-14 01:44:13,733][01002] Adding new argument 'no_render'=True that is not in the saved config file! [2024-08-14 01:44:13,735][01002] Adding new argument 'save_video'=True that is not in the saved config file! [2024-08-14 01:44:13,737][01002] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-08-14 01:44:13,741][01002] Adding new argument 'video_name'=None that is not in the saved config file! [2024-08-14 01:44:13,745][01002] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-08-14 01:44:13,746][01002] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-08-14 01:44:13,748][01002] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-08-14 01:44:13,749][01002] Adding new argument 'hf_repository'='Emericzhito/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-08-14 01:44:13,750][01002] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-08-14 01:44:13,751][01002] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-08-14 01:44:13,752][01002] Adding new argument 'train_script'=None that is not in the saved config file! [2024-08-14 01:44:13,753][01002] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-08-14 01:44:13,754][01002] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-08-14 01:44:13,764][01002] RunningMeanStd input shape: (3, 72, 128) [2024-08-14 01:44:13,772][01002] RunningMeanStd input shape: (1,) [2024-08-14 01:44:13,785][01002] ConvEncoder: input_channels=3 [2024-08-14 01:44:13,821][01002] Conv encoder output size: 512 [2024-08-14 01:44:13,823][01002] Policy head output size: 512 [2024-08-14 01:44:13,842][01002] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-08-14 01:44:14,322][01002] Num frames 100... [2024-08-14 01:44:14,447][01002] Num frames 200... [2024-08-14 01:44:14,586][01002] Num frames 300... [2024-08-14 01:44:14,710][01002] Num frames 400... [2024-08-14 01:44:14,830][01002] Num frames 500... [2024-08-14 01:44:14,948][01002] Num frames 600... [2024-08-14 01:44:15,067][01002] Num frames 700... [2024-08-14 01:44:15,200][01002] Num frames 800... [2024-08-14 01:44:15,341][01002] Avg episode rewards: #0: 19.700, true rewards: #0: 8.700 [2024-08-14 01:44:15,342][01002] Avg episode reward: 19.700, avg true_objective: 8.700 [2024-08-14 01:44:15,383][01002] Num frames 900... [2024-08-14 01:44:15,511][01002] Num frames 1000... [2024-08-14 01:44:15,636][01002] Num frames 1100... [2024-08-14 01:44:15,760][01002] Num frames 1200... [2024-08-14 01:44:15,888][01002] Num frames 1300... [2024-08-14 01:44:16,008][01002] Num frames 1400... [2024-08-14 01:44:16,132][01002] Num frames 1500... [2024-08-14 01:44:16,258][01002] Num frames 1600... [2024-08-14 01:44:16,375][01002] Num frames 1700... 
[2024-08-14 01:44:16,507][01002] Num frames 1800...
[2024-08-14 01:44:16,629][01002] Num frames 1900...
[2024-08-14 01:44:16,751][01002] Num frames 2000...
[2024-08-14 01:44:16,876][01002] Num frames 2100...
[2024-08-14 01:44:16,999][01002] Num frames 2200...
[2024-08-14 01:44:17,127][01002] Num frames 2300...
[2024-08-14 01:44:17,259][01002] Num frames 2400...
[2024-08-14 01:44:17,384][01002] Num frames 2500...
[2024-08-14 01:44:17,566][01002] Avg episode rewards: #0: 31.990, true rewards: #0: 12.990
[2024-08-14 01:44:17,568][01002] Avg episode reward: 31.990, avg true_objective: 12.990
[2024-08-14 01:44:17,572][01002] Num frames 2600...
[2024-08-14 01:44:17,698][01002] Num frames 2700...
[2024-08-14 01:44:17,827][01002] Num frames 2800...
[2024-08-14 01:44:17,953][01002] Num frames 2900...
[2024-08-14 01:44:18,076][01002] Num frames 3000...
[2024-08-14 01:44:18,196][01002] Num frames 3100...
[2024-08-14 01:44:18,327][01002] Num frames 3200...
[2024-08-14 01:44:18,454][01002] Num frames 3300...
[2024-08-14 01:44:18,586][01002] Num frames 3400...
[2024-08-14 01:44:18,712][01002] Num frames 3500...
[2024-08-14 01:44:18,837][01002] Num frames 3600...
[2024-08-14 01:44:18,962][01002] Num frames 3700...
[2024-08-14 01:44:19,084][01002] Num frames 3800...
[2024-08-14 01:44:19,201][01002] Num frames 3900...
[2024-08-14 01:44:19,329][01002] Num frames 4000...
[2024-08-14 01:44:19,459][01002] Num frames 4100...
[2024-08-14 01:44:19,599][01002] Avg episode rewards: #0: 35.553, true rewards: #0: 13.887
[2024-08-14 01:44:19,601][01002] Avg episode reward: 35.553, avg true_objective: 13.887
[2024-08-14 01:44:19,644][01002] Num frames 4200...
[2024-08-14 01:44:19,768][01002] Num frames 4300...
[2024-08-14 01:44:19,892][01002] Num frames 4400...
[2024-08-14 01:44:20,013][01002] Num frames 4500...
[2024-08-14 01:44:20,135][01002] Num frames 4600...
[2024-08-14 01:44:20,254][01002] Num frames 4700...
[2024-08-14 01:44:20,385][01002] Num frames 4800...
[2024-08-14 01:44:20,518][01002] Num frames 4900...
[2024-08-14 01:44:20,642][01002] Num frames 5000...
[2024-08-14 01:44:20,770][01002] Num frames 5100...
[2024-08-14 01:44:20,896][01002] Num frames 5200...
[2024-08-14 01:44:21,018][01002] Num frames 5300...
[2024-08-14 01:44:21,139][01002] Num frames 5400...
[2024-08-14 01:44:21,259][01002] Num frames 5500...
[2024-08-14 01:44:21,399][01002] Num frames 5600...
[2024-08-14 01:44:21,531][01002] Num frames 5700...
[2024-08-14 01:44:21,651][01002] Num frames 5800...
[2024-08-14 01:44:21,768][01002] Num frames 5900...
[2024-08-14 01:44:21,894][01002] Num frames 6000...
[2024-08-14 01:44:22,013][01002] Num frames 6100...
[2024-08-14 01:44:22,137][01002] Num frames 6200...
[2024-08-14 01:44:22,270][01002] Avg episode rewards: #0: 41.914, true rewards: #0: 15.665
[2024-08-14 01:44:22,272][01002] Avg episode reward: 41.914, avg true_objective: 15.665
[2024-08-14 01:44:22,315][01002] Num frames 6300...
[2024-08-14 01:44:22,443][01002] Num frames 6400...
[2024-08-14 01:44:22,576][01002] Num frames 6500...
[2024-08-14 01:44:22,752][01002] Num frames 6600...
[2024-08-14 01:44:22,923][01002] Num frames 6700...
[2024-08-14 01:44:23,095][01002] Num frames 6800...
[2024-08-14 01:44:23,263][01002] Num frames 6900...
[2024-08-14 01:44:23,442][01002] Num frames 7000...
[2024-08-14 01:44:23,619][01002] Num frames 7100...
[2024-08-14 01:44:23,788][01002] Num frames 7200...
[2024-08-14 01:44:23,957][01002] Num frames 7300...
[2024-08-14 01:44:24,136][01002] Num frames 7400...
[2024-08-14 01:44:24,305][01002] Num frames 7500...
[2024-08-14 01:44:24,492][01002] Num frames 7600...
[2024-08-14 01:44:24,666][01002] Num frames 7700...
[2024-08-14 01:44:24,847][01002] Num frames 7800...
[2024-08-14 01:44:25,035][01002] Num frames 7900...
[2024-08-14 01:44:25,215][01002] Num frames 8000...
[2024-08-14 01:44:25,335][01002] Avg episode rewards: #0: 41.499, true rewards: #0: 16.100
[2024-08-14 01:44:25,336][01002] Avg episode reward: 41.499, avg true_objective: 16.100
[2024-08-14 01:44:25,399][01002] Num frames 8100...
[2024-08-14 01:44:25,537][01002] Num frames 8200...
[2024-08-14 01:44:25,666][01002] Num frames 8300...
[2024-08-14 01:44:25,791][01002] Num frames 8400...
[2024-08-14 01:44:25,918][01002] Num frames 8500...
[2024-08-14 01:44:26,045][01002] Num frames 8600...
[2024-08-14 01:44:26,167][01002] Num frames 8700...
[2024-08-14 01:44:26,289][01002] Num frames 8800...
[2024-08-14 01:44:26,414][01002] Num frames 8900...
[2024-08-14 01:44:26,555][01002] Num frames 9000...
[2024-08-14 01:44:26,679][01002] Num frames 9100...
[2024-08-14 01:44:26,806][01002] Num frames 9200...
[2024-08-14 01:44:26,936][01002] Num frames 9300...
[2024-08-14 01:44:27,063][01002] Num frames 9400...
[2024-08-14 01:44:27,194][01002] Num frames 9500...
[2024-08-14 01:44:27,317][01002] Num frames 9600...
[2024-08-14 01:44:27,442][01002] Num frames 9700...
[2024-08-14 01:44:27,614][01002] Num frames 9800...
[2024-08-14 01:44:27,743][01002] Num frames 9900...
[2024-08-14 01:44:27,874][01002] Num frames 10000...
[2024-08-14 01:44:28,002][01002] Num frames 10100...
[2024-08-14 01:44:28,121][01002] Avg episode rewards: #0: 43.916, true rewards: #0: 16.917
[2024-08-14 01:44:28,123][01002] Avg episode reward: 43.916, avg true_objective: 16.917
[2024-08-14 01:44:28,185][01002] Num frames 10200...
[2024-08-14 01:44:28,312][01002] Num frames 10300...
[2024-08-14 01:44:28,437][01002] Num frames 10400...
[2024-08-14 01:44:28,578][01002] Num frames 10500...
[2024-08-14 01:44:28,704][01002] Num frames 10600...
[2024-08-14 01:44:28,834][01002] Num frames 10700...
[2024-08-14 01:44:28,959][01002] Num frames 10800...
[2024-08-14 01:44:29,029][01002] Avg episode rewards: #0: 39.869, true rewards: #0: 15.441
[2024-08-14 01:44:29,031][01002] Avg episode reward: 39.869, avg true_objective: 15.441
[2024-08-14 01:44:29,141][01002] Num frames 10900...
[2024-08-14 01:44:29,264][01002] Num frames 11000...
[2024-08-14 01:44:29,387][01002] Num frames 11100...
[2024-08-14 01:44:29,518][01002] Num frames 11200...
[2024-08-14 01:44:29,650][01002] Num frames 11300...
[2024-08-14 01:44:29,774][01002] Num frames 11400...
[2024-08-14 01:44:29,898][01002] Num frames 11500...
[2024-08-14 01:44:30,005][01002] Avg episode rewards: #0: 36.801, true rewards: #0: 14.426
[2024-08-14 01:44:30,007][01002] Avg episode reward: 36.801, avg true_objective: 14.426
[2024-08-14 01:44:30,084][01002] Num frames 11600...
[2024-08-14 01:44:30,202][01002] Num frames 11700...
[2024-08-14 01:44:30,322][01002] Num frames 11800...
[2024-08-14 01:44:30,442][01002] Num frames 11900...
[2024-08-14 01:44:30,574][01002] Num frames 12000...
[2024-08-14 01:44:30,703][01002] Num frames 12100...
[2024-08-14 01:44:30,823][01002] Num frames 12200...
[2024-08-14 01:44:30,950][01002] Num frames 12300...
[2024-08-14 01:44:31,073][01002] Num frames 12400...
[2024-08-14 01:44:31,193][01002] Num frames 12500...
[2024-08-14 01:44:31,311][01002] Num frames 12600...
[2024-08-14 01:44:31,438][01002] Num frames 12700...
[2024-08-14 01:44:31,569][01002] Num frames 12800...
[2024-08-14 01:44:31,657][01002] Avg episode rewards: #0: 36.023, true rewards: #0: 14.246
[2024-08-14 01:44:31,659][01002] Avg episode reward: 36.023, avg true_objective: 14.246
[2024-08-14 01:44:31,753][01002] Num frames 12900...
[2024-08-14 01:44:31,879][01002] Num frames 13000...
[2024-08-14 01:44:31,998][01002] Num frames 13100...
[2024-08-14 01:44:32,119][01002] Num frames 13200...
[2024-08-14 01:44:32,236][01002] Num frames 13300...
[2024-08-14 01:44:32,355][01002] Num frames 13400...
[2024-08-14 01:44:32,487][01002] Num frames 13500...
[2024-08-14 01:44:32,580][01002] Avg episode rewards: #0: 34.025, true rewards: #0: 13.525
[2024-08-14 01:44:32,581][01002] Avg episode reward: 34.025, avg true_objective: 13.525
[2024-08-14 01:46:01,840][01002] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
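
Note: the second evaluation run above (ten episodes, replay video, upload to the Hub repository named in the config) corresponds to Sample Factory's `enjoy` entry point invoked with exactly the arguments echoed in the "Overriding arg" / "Adding new argument" lines. The following is a minimal sketch, not the original cell that produced this log: it assumes Sample Factory 2.x, the helper functions from the sf_examples VizDoom example (the Deep RL course notebook inlines equivalent ones), and the env name implied by the repository id.

    # Sketch (assumptions noted above): reproduce the logged evaluation + upload.
    from sample_factory.enjoy import enjoy
    # Assumed import path; the course notebook defines these helpers inline instead.
    from sf_examples.vizdoom.train_vizdoom import parse_vizdoom_cfg, register_vizdoom_components

    register_vizdoom_components()  # register Doom envs and the VizDoom encoder

    cfg = parse_vizdoom_cfg(
        argv=[
            "--env=doom_health_gathering_supreme",  # inferred from the hf_repository name
            "--num_workers=1",        # log: Overriding arg 'num_workers' with value 1
            "--no_render",
            "--save_video",           # produces replay.mp4 under the experiment train_dir
            "--max_num_episodes=10",  # ten "Avg episode reward" blocks above
            "--max_num_frames=100000",
            "--push_to_hub",
            "--hf_repository=Emericzhito/rl_course_vizdoom_health_gathering_supreme",
        ],
        evaluation=True,
    )
    status = enjoy(cfg)  # loads the latest checkpoint, rolls out episodes, saves video, pushes to the Hub

The reported "true rewards" (avg true_objective) are the env's native episode returns, while "Avg episode rewards" include reward shaping, which is why the two numbers differ in each summary pair.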