[2024-10-12 09:20:01,134][00738] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-10-12 09:20:01,139][00738] Rollout worker 0 uses device cpu
[2024-10-12 09:20:01,140][00738] Rollout worker 1 uses device cpu
[2024-10-12 09:20:01,142][00738] Rollout worker 2 uses device cpu
[2024-10-12 09:20:01,143][00738] Rollout worker 3 uses device cpu
[2024-10-12 09:20:01,144][00738] Rollout worker 4 uses device cpu
[2024-10-12 09:20:01,145][00738] Rollout worker 5 uses device cpu
[2024-10-12 09:20:01,146][00738] Rollout worker 6 uses device cpu
[2024-10-12 09:20:01,147][00738] Rollout worker 7 uses device cpu
[2024-10-12 09:20:01,298][00738] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-12 09:20:01,300][00738] InferenceWorker_p0-w0: min num requests: 2
[2024-10-12 09:20:01,337][00738] Starting all processes...
[2024-10-12 09:20:01,339][00738] Starting process learner_proc0
[2024-10-12 09:20:01,994][00738] Starting all processes...
[2024-10-12 09:20:02,004][00738] Starting process inference_proc0-0
[2024-10-12 09:20:02,005][00738] Starting process rollout_proc0
[2024-10-12 09:20:02,006][00738] Starting process rollout_proc1
[2024-10-12 09:20:02,007][00738] Starting process rollout_proc2
[2024-10-12 09:20:02,008][00738] Starting process rollout_proc3
[2024-10-12 09:20:02,008][00738] Starting process rollout_proc4
[2024-10-12 09:20:02,008][00738] Starting process rollout_proc5
[2024-10-12 09:20:02,008][00738] Starting process rollout_proc6
[2024-10-12 09:20:02,008][00738] Starting process rollout_proc7
[2024-10-12 09:20:17,797][03547] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-12 09:20:17,798][03547] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-10-12 09:20:17,882][03547] Num visible devices: 1
[2024-10-12 09:20:18,126][03557] Worker 6 uses CPU cores [0]
[2024-10-12 09:20:18,138][03549] Worker 2 uses CPU cores [0]
[2024-10-12 09:20:18,178][03558] Worker 5 uses CPU cores [1]
[2024-10-12 09:20:18,227][03551] Worker 4 uses CPU cores [0]
[2024-10-12 09:20:18,252][03550] Worker 1 uses CPU cores [1]
[2024-10-12 09:20:18,312][03559] Worker 7 uses CPU cores [1]
[2024-10-12 09:20:18,330][03556] Worker 3 uses CPU cores [1]
[2024-10-12 09:20:18,333][03534] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-12 09:20:18,334][03534] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-10-12 09:20:18,345][03548] Worker 0 uses CPU cores [0]
[2024-10-12 09:20:18,351][03534] Num visible devices: 1
[2024-10-12 09:20:18,358][03534] Starting seed is not provided
[2024-10-12 09:20:18,358][03534] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-12 09:20:18,359][03534] Initializing actor-critic model on device cuda:0
[2024-10-12 09:20:18,359][03534] RunningMeanStd input shape: (3, 72, 128)
[2024-10-12 09:20:18,362][03534] RunningMeanStd input shape: (1,)
[2024-10-12 09:20:18,373][03534] ConvEncoder: input_channels=3
[2024-10-12 09:20:18,607][03534] Conv encoder output size: 512
[2024-10-12 09:20:18,607][03534] Policy head output size: 512
[2024-10-12 09:20:18,665][03534] Created Actor Critic model with architecture:
[2024-10-12 09:20:18,665][03534] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
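The architecture printed above (conv encoder with ELU activations, a 512-unit MLP, a GRU(512, 512) core, and linear critic and action heads) can be sketched in plain PyTorch. This is an illustrative reconstruction, not Sample Factory's actual code; the conv kernel sizes and strides are assumed (typical Atari-style values), since the log only shows the layer types.

```python
import torch
import torch.nn as nn

# Illustrative reconstruction of the logged model (NOT Sample Factory's code).
# 3x72x128 observations -> 3 conv layers -> Linear(..., 512) -> GRU(512, 512)
# -> value head (1 output) and action-logit head (5 outputs, per the log).
class SharedWeightsActorCritic(nn.Module):
    def __init__(self, num_actions: int = 5):
        super().__init__()
        # Kernel sizes/strides are assumptions; the log omits them.
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        with torch.no_grad():
            conv_out = self.conv_head(torch.zeros(1, 3, 72, 128)).numel()
        self.mlp_layers = nn.Sequential(nn.Linear(conv_out, 512), nn.ELU())
        self.core = nn.GRU(512, 512)
        self.critic_linear = nn.Linear(512, 1)                   # value head
        self.distribution_linear = nn.Linear(512, num_actions)   # action logits

    def forward(self, obs, rnn_state=None):
        x = self.conv_head(obs).flatten(1)          # (batch, conv_out)
        x = self.mlp_layers(x)                      # (batch, 512)
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)  # seq len 1
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state
```

The observation/returns normalizers (RunningMeanStd) from the log are omitted here for brevity; they standardize inputs and value targets with running statistics.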
[2024-10-12 09:20:18,956][03534] Using optimizer <class 'torch.optim.adam.Adam'>
[2024-10-12 09:20:19,660][03534] No checkpoints found
[2024-10-12 09:20:19,660][03534] Did not load from checkpoint, starting from scratch!
[2024-10-12 09:20:19,661][03534] Initialized policy 0 weights for model version 0
[2024-10-12 09:20:19,665][03534] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-12 09:20:19,672][03534] LearnerWorker_p0 finished initialization!
[2024-10-12 09:20:19,760][03547] RunningMeanStd input shape: (3, 72, 128)
[2024-10-12 09:20:19,761][03547] RunningMeanStd input shape: (1,)
[2024-10-12 09:20:19,773][03547] ConvEncoder: input_channels=3
[2024-10-12 09:20:19,873][03547] Conv encoder output size: 512
[2024-10-12 09:20:19,873][03547] Policy head output size: 512
[2024-10-12 09:20:19,923][00738] Inference worker 0-0 is ready!
[2024-10-12 09:20:19,924][00738] All inference workers are ready! Signal rollout workers to start!
[2024-10-12 09:20:20,112][03556] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-12 09:20:20,115][03558] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-12 09:20:20,118][03550] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-12 09:20:20,119][03559] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-12 09:20:20,133][03549] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-12 09:20:20,135][03548] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-12 09:20:20,138][03551] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-12 09:20:20,142][03557] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-12 09:20:20,565][00738] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-12 09:20:20,785][03557] Decorrelating experience for 0 frames...
[2024-10-12 09:20:21,118][03558] Decorrelating experience for 0 frames...
[2024-10-12 09:20:21,118][03550] Decorrelating experience for 0 frames...
[2024-10-12 09:20:21,290][00738] Heartbeat connected on Batcher_0
[2024-10-12 09:20:21,295][00738] Heartbeat connected on LearnerWorker_p0
[2024-10-12 09:20:21,333][00738] Heartbeat connected on InferenceWorker_p0-w0
[2024-10-12 09:20:21,738][03548] Decorrelating experience for 0 frames...
[2024-10-12 09:20:21,748][03557] Decorrelating experience for 32 frames...
[2024-10-12 09:20:22,135][03551] Decorrelating experience for 0 frames...
[2024-10-12 09:20:22,216][03550] Decorrelating experience for 32 frames...
[2024-10-12 09:20:22,221][03558] Decorrelating experience for 32 frames...
[2024-10-12 09:20:22,311][03559] Decorrelating experience for 0 frames...
[2024-10-12 09:20:22,838][03550] Decorrelating experience for 64 frames...
[2024-10-12 09:20:23,009][03557] Decorrelating experience for 64 frames...
[2024-10-12 09:20:23,241][03551] Decorrelating experience for 32 frames...
[2024-10-12 09:20:23,242][03550] Decorrelating experience for 96 frames...
[2024-10-12 09:20:23,249][03548] Decorrelating experience for 32 frames...
[2024-10-12 09:20:23,366][00738] Heartbeat connected on RolloutWorker_w1
[2024-10-12 09:20:23,969][03558] Decorrelating experience for 64 frames...
[2024-10-12 09:20:24,642][03557] Decorrelating experience for 96 frames...
[2024-10-12 09:20:24,655][03549] Decorrelating experience for 0 frames...
[2024-10-12 09:20:24,887][00738] Heartbeat connected on RolloutWorker_w6
[2024-10-12 09:20:25,052][03551] Decorrelating experience for 64 frames...
[2024-10-12 09:20:25,076][03548] Decorrelating experience for 64 frames...
[2024-10-12 09:20:25,565][00738] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-12 09:20:25,907][03559] Decorrelating experience for 32 frames...
[2024-10-12 09:20:26,750][03549] Decorrelating experience for 32 frames...
[2024-10-12 09:20:27,523][03551] Decorrelating experience for 96 frames...
[2024-10-12 09:20:27,979][00738] Heartbeat connected on RolloutWorker_w4
[2024-10-12 09:20:28,882][03558] Decorrelating experience for 96 frames...
[2024-10-12 09:20:29,078][00738] Heartbeat connected on RolloutWorker_w5
[2024-10-12 09:20:30,389][03559] Decorrelating experience for 64 frames...
[2024-10-12 09:20:30,565][00738] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 127.0. Samples: 1270. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-12 09:20:30,569][00738] Avg episode reward: [(0, '3.135')]
[2024-10-12 09:20:32,094][03549] Decorrelating experience for 64 frames...
[2024-10-12 09:20:32,296][03548] Decorrelating experience for 96 frames...
[2024-10-12 09:20:32,768][00738] Heartbeat connected on RolloutWorker_w0
[2024-10-12 09:20:33,178][03534] Signal inference workers to stop experience collection...
[2024-10-12 09:20:33,197][03547] InferenceWorker_p0-w0: stopping experience collection
[2024-10-12 09:20:33,392][03559] Decorrelating experience for 96 frames...
[2024-10-12 09:20:33,607][03549] Decorrelating experience for 96 frames...
[2024-10-12 09:20:33,624][00738] Heartbeat connected on RolloutWorker_w7
[2024-10-12 09:20:33,688][00738] Heartbeat connected on RolloutWorker_w2
[2024-10-12 09:20:35,120][03534] Signal inference workers to resume experience collection...
[2024-10-12 09:20:35,122][03547] InferenceWorker_p0-w0: resuming experience collection
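The "Decorrelating experience" lines above show each rollout worker reporting warm-up progress in 32-frame stages before regular collection begins; the goal is to desynchronize workers so their trajectories are less correlated when training starts. A toy sketch of the staggering idea (hypothetical scheme, not Sample Factory's implementation):

```python
import random

# Toy illustration of experience decorrelation: give each worker a different
# warm-up length (in 32-frame stages) so rollouts start from desynchronized
# environment states. This is a sketch of the idea, not the real algorithm.
class ToyEnv:
    def __init__(self, seed: int):
        self.rng = random.Random(seed)
        self.t = 0  # frames stepped so far

    def step(self):
        self.t += 1
        return self.rng.random()

def decorrelate(worker_idx: int, env: ToyEnv,
                frames_per_stage: int = 32, num_stages: int = 4) -> int:
    warmup = (worker_idx % num_stages) * frames_per_stage  # 0, 32, 64, 96, ...
    for _ in range(warmup):
        env.step()
    return warmup

envs = [ToyEnv(seed=i) for i in range(8)]
warmups = [decorrelate(i, env) for i, env in enumerate(envs)]
# warmups == [0, 32, 64, 96, 0, 32, 64, 96]
```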
[2024-10-12 09:20:35,565][00738] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 178.1. Samples: 2672. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-10-12 09:20:35,568][00738] Avg episode reward: [(0, '3.640')]
[2024-10-12 09:20:40,565][00738] Fps is (10 sec: 2867.2, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 28672. Throughput: 0: 269.2. Samples: 5384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:20:40,568][00738] Avg episode reward: [(0, '3.830')]
[2024-10-12 09:20:44,305][03547] Updated weights for policy 0, policy_version 10 (0.0163)
[2024-10-12 09:20:45,565][00738] Fps is (10 sec: 3686.4, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 40960. Throughput: 0: 418.4. Samples: 10460. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-12 09:20:45,572][00738] Avg episode reward: [(0, '4.220')]
[2024-10-12 09:20:50,565][00738] Fps is (10 sec: 3276.8, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 61440. Throughput: 0: 510.8. Samples: 15324. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-10-12 09:20:50,568][00738] Avg episode reward: [(0, '4.474')]
[2024-10-12 09:20:54,797][03547] Updated weights for policy 0, policy_version 20 (0.0031)
[2024-10-12 09:20:55,565][00738] Fps is (10 sec: 4096.0, 60 sec: 2340.6, 300 sec: 2340.6). Total num frames: 81920. Throughput: 0: 538.6. Samples: 18852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-12 09:20:55,567][00738] Avg episode reward: [(0, '4.310')]
[2024-10-12 09:21:00,565][00738] Fps is (10 sec: 4096.0, 60 sec: 2560.0, 300 sec: 2560.0). Total num frames: 102400. Throughput: 0: 640.3. Samples: 25612. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:21:00,572][00738] Avg episode reward: [(0, '4.350')]
[2024-10-12 09:21:00,574][03534] Saving new best policy, reward=4.350!
[2024-10-12 09:21:05,565][00738] Fps is (10 sec: 3686.4, 60 sec: 2639.6, 300 sec: 2639.6). Total num frames: 118784. Throughput: 0: 664.8. Samples: 29914. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:21:05,568][00738] Avg episode reward: [(0, '4.368')]
[2024-10-12 09:21:05,574][03534] Saving new best policy, reward=4.368!
[2024-10-12 09:21:06,254][03547] Updated weights for policy 0, policy_version 30 (0.0040)
[2024-10-12 09:21:10,565][00738] Fps is (10 sec: 3686.4, 60 sec: 2785.3, 300 sec: 2785.3). Total num frames: 139264. Throughput: 0: 738.7. Samples: 33240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:21:10,568][00738] Avg episode reward: [(0, '4.376')]
[2024-10-12 09:21:10,635][03534] Saving new best policy, reward=4.376!
[2024-10-12 09:21:15,060][03547] Updated weights for policy 0, policy_version 40 (0.0031)
[2024-10-12 09:21:15,565][00738] Fps is (10 sec: 4505.6, 60 sec: 2978.9, 300 sec: 2978.9). Total num frames: 163840. Throughput: 0: 865.3. Samples: 40208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:21:15,567][00738] Avg episode reward: [(0, '4.416')]
[2024-10-12 09:21:15,583][03534] Saving new best policy, reward=4.416!
[2024-10-12 09:21:20,567][00738] Fps is (10 sec: 3685.8, 60 sec: 2935.4, 300 sec: 2935.4). Total num frames: 176128. Throughput: 0: 937.7. Samples: 44870. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:21:20,569][00738] Avg episode reward: [(0, '4.452')]
[2024-10-12 09:21:20,616][03534] Saving new best policy, reward=4.452!
[2024-10-12 09:21:25,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3087.8). Total num frames: 200704. Throughput: 0: 937.8. Samples: 47586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-12 09:21:25,567][00738] Avg episode reward: [(0, '4.515')]
[2024-10-12 09:21:25,580][03534] Saving new best policy, reward=4.515!
[2024-10-12 09:21:26,453][03547] Updated weights for policy 0, policy_version 50 (0.0019)
[2024-10-12 09:21:30,565][00738] Fps is (10 sec: 4096.6, 60 sec: 3618.1, 300 sec: 3101.3). Total num frames: 217088. Throughput: 0: 966.8. Samples: 53968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-12 09:21:30,568][00738] Avg episode reward: [(0, '4.565')]
[2024-10-12 09:21:30,571][03534] Saving new best policy, reward=4.565!
[2024-10-12 09:21:35,566][00738] Fps is (10 sec: 2867.0, 60 sec: 3754.6, 300 sec: 3058.3). Total num frames: 229376. Throughput: 0: 943.5. Samples: 57784. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:21:35,570][00738] Avg episode reward: [(0, '4.469')]
[2024-10-12 09:21:40,196][03547] Updated weights for policy 0, policy_version 60 (0.0029)
[2024-10-12 09:21:40,565][00738] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3072.0). Total num frames: 245760. Throughput: 0: 912.0. Samples: 59890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-12 09:21:40,567][00738] Avg episode reward: [(0, '4.542')]
[2024-10-12 09:21:45,582][00738] Fps is (10 sec: 3680.6, 60 sec: 3753.6, 300 sec: 3131.6). Total num frames: 266240. Throughput: 0: 900.6. Samples: 66154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:21:45,584][00738] Avg episode reward: [(0, '4.667')]
[2024-10-12 09:21:45,646][03534] Saving new best policy, reward=4.667!
[2024-10-12 09:21:49,268][03547] Updated weights for policy 0, policy_version 70 (0.0025)
[2024-10-12 09:21:50,567][00738] Fps is (10 sec: 4504.9, 60 sec: 3822.8, 300 sec: 3231.2). Total num frames: 290816. Throughput: 0: 953.7. Samples: 72832. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-12 09:21:50,569][00738] Avg episode reward: [(0, '4.509')]
[2024-10-12 09:21:55,565][00738] Fps is (10 sec: 3692.4, 60 sec: 3686.4, 300 sec: 3190.6). Total num frames: 303104. Throughput: 0: 928.3. Samples: 75014. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-12 09:21:55,572][00738] Avg episode reward: [(0, '4.401')]
[2024-10-12 09:21:55,581][03534] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth...
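Checkpoint files like `checkpoint_000000074_303104.pth` above encode the policy version (74) and total frame count (303104) in the filename. A minimal sketch of decoding that naming convention (assuming the pattern seen in this log; loading the weights themselves would use `torch.load` on the path):

```python
import re
from pathlib import Path

# Decode "checkpoint_<policy_version>_<total_frames>.pth" names as they
# appear in this training log. Only the filename convention is parsed here.
def parse_checkpoint_name(path: str) -> tuple[int, int]:
    m = re.fullmatch(r"checkpoint_(\d+)_(\d+)\.pth", Path(path).name)
    if m is None:
        raise ValueError(f"unexpected checkpoint name: {path}")
    return int(m.group(1)), int(m.group(2))

version, frames = parse_checkpoint_name(
    "/content/train_dir/default_experiment/checkpoint_p0/"
    "checkpoint_000000074_303104.pth"
)
# version == 74, frames == 303104
```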
[2024-10-12 09:22:00,513][03547] Updated weights for policy 0, policy_version 80 (0.0022)
[2024-10-12 09:22:00,565][00738] Fps is (10 sec: 3686.9, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 327680. Throughput: 0: 897.4. Samples: 80592. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:22:00,571][00738] Avg episode reward: [(0, '4.275')]
[2024-10-12 09:22:05,566][00738] Fps is (10 sec: 4505.2, 60 sec: 3822.9, 300 sec: 3315.8). Total num frames: 348160. Throughput: 0: 949.6. Samples: 87602. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-12 09:22:05,568][00738] Avg episode reward: [(0, '4.399')]
[2024-10-12 09:22:10,566][00738] Fps is (10 sec: 3686.0, 60 sec: 3754.6, 300 sec: 3314.0). Total num frames: 364544. Throughput: 0: 942.8. Samples: 90014. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:22:10,570][00738] Avg episode reward: [(0, '4.474')]
[2024-10-12 09:22:11,634][03547] Updated weights for policy 0, policy_version 90 (0.0033)
[2024-10-12 09:22:15,565][00738] Fps is (10 sec: 3686.8, 60 sec: 3686.4, 300 sec: 3348.0). Total num frames: 385024. Throughput: 0: 913.1. Samples: 95056. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:22:15,572][00738] Avg episode reward: [(0, '4.390')]
[2024-10-12 09:22:20,565][00738] Fps is (10 sec: 4096.4, 60 sec: 3823.0, 300 sec: 3379.2). Total num frames: 405504. Throughput: 0: 980.9. Samples: 101926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-12 09:22:20,568][00738] Avg episode reward: [(0, '4.360')]
[2024-10-12 09:22:20,705][03547] Updated weights for policy 0, policy_version 100 (0.0024)
[2024-10-12 09:22:25,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3407.9). Total num frames: 425984. Throughput: 0: 1006.4. Samples: 105180. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:22:25,568][00738] Avg episode reward: [(0, '4.309')]
[2024-10-12 09:22:30,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3402.8). Total num frames: 442368. Throughput: 0: 965.1. Samples: 109566. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:22:30,568][00738] Avg episode reward: [(0, '4.504')]
[2024-10-12 09:22:32,013][03547] Updated weights for policy 0, policy_version 110 (0.0042)
[2024-10-12 09:22:35,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3458.8). Total num frames: 466944. Throughput: 0: 972.1. Samples: 116574. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-12 09:22:35,570][00738] Avg episode reward: [(0, '4.570')]
[2024-10-12 09:22:40,567][00738] Fps is (10 sec: 3685.8, 60 sec: 3891.1, 300 sec: 3423.0). Total num frames: 479232. Throughput: 0: 992.2. Samples: 119666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-12 09:22:40,569][00738] Avg episode reward: [(0, '4.490')]
[2024-10-12 09:22:45,048][03547] Updated weights for policy 0, policy_version 120 (0.0020)
[2024-10-12 09:22:45,565][00738] Fps is (10 sec: 2457.6, 60 sec: 3755.7, 300 sec: 3389.8). Total num frames: 491520. Throughput: 0: 931.7. Samples: 122518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:22:45,568][00738] Avg episode reward: [(0, '4.493')]
[2024-10-12 09:22:50,565][00738] Fps is (10 sec: 3687.0, 60 sec: 3754.8, 300 sec: 3440.6). Total num frames: 516096. Throughput: 0: 917.0. Samples: 128868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-12 09:22:50,570][00738] Avg episode reward: [(0, '4.415')]
[2024-10-12 09:22:54,044][03547] Updated weights for policy 0, policy_version 130 (0.0020)
[2024-10-12 09:22:55,565][00738] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3461.8). Total num frames: 536576. Throughput: 0: 941.8. Samples: 132394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:22:55,570][00738] Avg episode reward: [(0, '4.582')]
[2024-10-12 09:23:00,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3456.0). Total num frames: 552960. Throughput: 0: 947.8. Samples: 137706. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-12 09:23:00,569][00738] Avg episode reward: [(0, '4.503')]
[2024-10-12 09:23:05,470][03547] Updated weights for policy 0, policy_version 140 (0.0025)
[2024-10-12 09:23:05,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3475.4). Total num frames: 573440. Throughput: 0: 919.2. Samples: 143290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:23:05,568][00738] Avg episode reward: [(0, '4.551')]
[2024-10-12 09:23:10,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3493.6). Total num frames: 593920. Throughput: 0: 925.6. Samples: 146834. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-12 09:23:10,572][00738] Avg episode reward: [(0, '4.565')]
[2024-10-12 09:23:15,505][03547] Updated weights for policy 0, policy_version 150 (0.0019)
[2024-10-12 09:23:15,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3510.9). Total num frames: 614400. Throughput: 0: 966.2. Samples: 153046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:23:15,574][00738] Avg episode reward: [(0, '4.442')]
[2024-10-12 09:23:20,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3504.4). Total num frames: 630784. Throughput: 0: 918.3. Samples: 157898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:23:20,573][00738] Avg episode reward: [(0, '4.510')]
[2024-10-12 09:23:25,473][03547] Updated weights for policy 0, policy_version 160 (0.0039)
[2024-10-12 09:23:25,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3542.5). Total num frames: 655360. Throughput: 0: 926.5. Samples: 161358. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-10-12 09:23:25,567][00738] Avg episode reward: [(0, '4.587')]
[2024-10-12 09:23:30,567][00738] Fps is (10 sec: 4095.5, 60 sec: 3822.8, 300 sec: 3535.5). Total num frames: 671744. Throughput: 0: 1014.5. Samples: 168172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:23:30,573][00738] Avg episode reward: [(0, '4.487')]
[2024-10-12 09:23:35,565][00738] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3528.9). Total num frames: 688128. Throughput: 0: 970.0. Samples: 172520. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:23:35,568][00738] Avg episode reward: [(0, '4.585')]
[2024-10-12 09:23:36,901][03547] Updated weights for policy 0, policy_version 170 (0.0035)
[2024-10-12 09:23:40,565][00738] Fps is (10 sec: 4096.6, 60 sec: 3891.3, 300 sec: 3563.5). Total num frames: 712704. Throughput: 0: 965.7. Samples: 175852. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:23:40,568][00738] Avg episode reward: [(0, '4.793')]
[2024-10-12 09:23:40,570][03534] Saving new best policy, reward=4.793!
[2024-10-12 09:23:45,565][00738] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3576.5). Total num frames: 733184. Throughput: 0: 1003.4. Samples: 182860. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-12 09:23:45,569][00738] Avg episode reward: [(0, '4.836')]
[2024-10-12 09:23:45,577][03534] Saving new best policy, reward=4.836!
[2024-10-12 09:23:45,922][03547] Updated weights for policy 0, policy_version 180 (0.0020)
[2024-10-12 09:23:50,565][00738] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3549.9). Total num frames: 745472. Throughput: 0: 983.0. Samples: 187524. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-12 09:23:50,571][00738] Avg episode reward: [(0, '4.970')]
[2024-10-12 09:23:50,574][03534] Saving new best policy, reward=4.970!
[2024-10-12 09:23:55,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3581.6). Total num frames: 770048. Throughput: 0: 963.2. Samples: 190180. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-12 09:23:55,568][00738] Avg episode reward: [(0, '4.940')]
[2024-10-12 09:23:55,579][03534] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000188_770048.pth...
[2024-10-12 09:23:57,228][03547] Updated weights for policy 0, policy_version 190 (0.0026)
[2024-10-12 09:24:00,565][00738] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3593.3). Total num frames: 790528. Throughput: 0: 977.2. Samples: 197018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:24:00,569][00738] Avg episode reward: [(0, '4.831')]
[2024-10-12 09:24:05,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3586.3). Total num frames: 806912. Throughput: 0: 980.9. Samples: 202038. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-12 09:24:05,568][00738] Avg episode reward: [(0, '4.943')]
[2024-10-12 09:24:10,411][03547] Updated weights for policy 0, policy_version 200 (0.0015)
[2024-10-12 09:24:10,567][00738] Fps is (10 sec: 2866.8, 60 sec: 3754.6, 300 sec: 3561.7). Total num frames: 819200. Throughput: 0: 944.6. Samples: 203868. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:24:10,571][00738] Avg episode reward: [(0, '4.893')]
[2024-10-12 09:24:15,565][00738] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3555.7). Total num frames: 835584. Throughput: 0: 890.2. Samples: 208228. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-12 09:24:15,572][00738] Avg episode reward: [(0, '4.832')]
[2024-10-12 09:24:20,561][03547] Updated weights for policy 0, policy_version 210 (0.0032)
[2024-10-12 09:24:20,571][00738] Fps is (10 sec: 4094.4, 60 sec: 3822.6, 300 sec: 3583.9). Total num frames: 860160. Throughput: 0: 941.6. Samples: 214898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:24:20,573][00738] Avg episode reward: [(0, '5.013')]
[2024-10-12 09:24:20,575][03534] Saving new best policy, reward=5.013!
[2024-10-12 09:24:25,567][00738] Fps is (10 sec: 3685.9, 60 sec: 3618.1, 300 sec: 3561.0). Total num frames: 872448. Throughput: 0: 913.2. Samples: 216948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-12 09:24:25,569][00738] Avg episode reward: [(0, '5.044')]
[2024-10-12 09:24:25,584][03534] Saving new best policy, reward=5.044!
[2024-10-12 09:24:30,572][00738] Fps is (10 sec: 3276.5, 60 sec: 3686.1, 300 sec: 3571.6). Total num frames: 892928. Throughput: 0: 886.8. Samples: 222774. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-12 09:24:30,578][00738] Avg episode reward: [(0, '5.119')]
[2024-10-12 09:24:30,635][03534] Saving new best policy, reward=5.119!
[2024-10-12 09:24:31,500][03547] Updated weights for policy 0, policy_version 220 (0.0024)
[2024-10-12 09:24:35,565][00738] Fps is (10 sec: 4506.2, 60 sec: 3822.9, 300 sec: 3598.1). Total num frames: 917504. Throughput: 0: 939.1. Samples: 229782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:24:35,572][00738] Avg episode reward: [(0, '5.137')]
[2024-10-12 09:24:35,582][03534] Saving new best policy, reward=5.137!
[2024-10-12 09:24:40,570][00738] Fps is (10 sec: 4096.8, 60 sec: 3686.1, 300 sec: 3591.8). Total num frames: 933888. Throughput: 0: 932.6. Samples: 232150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-12 09:24:40,572][00738] Avg episode reward: [(0, '5.272')]
[2024-10-12 09:24:40,574][03534] Saving new best policy, reward=5.272!
[2024-10-12 09:24:42,880][03547] Updated weights for policy 0, policy_version 230 (0.0042)
[2024-10-12 09:24:45,565][00738] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3585.9). Total num frames: 950272. Throughput: 0: 891.6. Samples: 237142. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-12 09:24:45,569][00738] Avg episode reward: [(0, '5.567')]
[2024-10-12 09:24:45,619][03534] Saving new best policy, reward=5.567!
[2024-10-12 09:24:50,565][00738] Fps is (10 sec: 4097.9, 60 sec: 3822.9, 300 sec: 3610.5). Total num frames: 974848. Throughput: 0: 934.9. Samples: 244108. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-12 09:24:50,569][00738] Avg episode reward: [(0, '5.874')]
[2024-10-12 09:24:50,571][03534] Saving new best policy, reward=5.874!
[2024-10-12 09:24:51,767][03547] Updated weights for policy 0, policy_version 240 (0.0042)
[2024-10-12 09:24:55,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3604.5). Total num frames: 991232. Throughput: 0: 962.5. Samples: 247180. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-12 09:24:55,568][00738] Avg episode reward: [(0, '5.515')]
[2024-10-12 09:25:00,565][00738] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3598.6). Total num frames: 1007616. Throughput: 0: 963.6. Samples: 251590. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:25:00,567][00738] Avg episode reward: [(0, '5.365')]
[2024-10-12 09:25:03,143][03547] Updated weights for policy 0, policy_version 250 (0.0034)
[2024-10-12 09:25:05,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3621.7). Total num frames: 1032192. Throughput: 0: 971.9. Samples: 258630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-12 09:25:05,567][00738] Avg episode reward: [(0, '5.705')]
[2024-10-12 09:25:10,569][00738] Fps is (10 sec: 4504.0, 60 sec: 3891.1, 300 sec: 3629.9). Total num frames: 1052672. Throughput: 0: 1005.2. Samples: 262186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-12 09:25:10,571][00738] Avg episode reward: [(0, '5.986')]
[2024-10-12 09:25:10,573][03534] Saving new best policy, reward=5.986!
[2024-10-12 09:25:13,966][03547] Updated weights for policy 0, policy_version 260 (0.0041)
[2024-10-12 09:25:15,565][00738] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3623.9). Total num frames: 1069056. Throughput: 0: 975.0. Samples: 266642. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-12 09:25:15,568][00738] Avg episode reward: [(0, '6.016')]
[2024-10-12 09:25:15,577][03534] Saving new best policy, reward=6.016!
[2024-10-12 09:25:20,565][00738] Fps is (10 sec: 3687.7, 60 sec: 3823.3, 300 sec: 3693.3). Total num frames: 1089536. Throughput: 0: 960.5. Samples: 273004. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:25:20,568][00738] Avg episode reward: [(0, '6.255')]
[2024-10-12 09:25:20,624][03534] Saving new best policy, reward=6.255!
[2024-10-12 09:25:23,269][03547] Updated weights for policy 0, policy_version 270 (0.0020)
[2024-10-12 09:25:25,565][00738] Fps is (10 sec: 4505.7, 60 sec: 4027.8, 300 sec: 3776.7). Total num frames: 1114112. Throughput: 0: 985.3. Samples: 276484. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:25:25,572][00738] Avg episode reward: [(0, '6.650')]
[2024-10-12 09:25:25,581][03534] Saving new best policy, reward=6.650!
[2024-10-12 09:25:30,567][00738] Fps is (10 sec: 3685.8, 60 sec: 3891.5, 300 sec: 3804.4). Total num frames: 1126400. Throughput: 0: 986.1. Samples: 281520. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:25:30,569][00738] Avg episode reward: [(0, '6.160')]
[2024-10-12 09:25:34,542][03547] Updated weights for policy 0, policy_version 280 (0.0013)
[2024-10-12 09:25:35,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1150976. Throughput: 0: 966.7. Samples: 287608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-12 09:25:35,568][00738] Avg episode reward: [(0, '6.427')]
[2024-10-12 09:25:40,565][00738] Fps is (10 sec: 4506.2, 60 sec: 3959.8, 300 sec: 3832.2). Total num frames: 1171456. Throughput: 0: 975.2. Samples: 291066. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:25:40,568][00738] Avg episode reward: [(0, '6.469')]
[2024-10-12 09:25:45,010][03547] Updated weights for policy 0, policy_version 290 (0.0033)
[2024-10-12 09:25:45,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 1187840. Throughput: 0: 1001.8. Samples: 296670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:25:45,572][00738] Avg episode reward: [(0, '6.743')]
[2024-10-12 09:25:45,581][03534] Saving new best policy, reward=6.743!
[2024-10-12 09:25:50,565][00738] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 1208320. Throughput: 0: 961.2. Samples: 301884. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:25:50,567][00738] Avg episode reward: [(0, '6.663')]
[2024-10-12 09:25:54,932][03547] Updated weights for policy 0, policy_version 300 (0.0026)
[2024-10-12 09:25:55,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 1228800. Throughput: 0: 960.7. Samples: 305416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:25:55,568][00738] Avg episode reward: [(0, '7.355')]
[2024-10-12 09:25:55,581][03534] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000300_1228800.pth...
[2024-10-12 09:25:55,697][03534] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth
[2024-10-12 09:25:55,714][03534] Saving new best policy, reward=7.355!
[2024-10-12 09:26:00,565][00738] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 1245184. Throughput: 0: 1000.0. Samples: 311644. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-12 09:26:00,571][00738] Avg episode reward: [(0, '7.507')]
[2024-10-12 09:26:00,573][03534] Saving new best policy, reward=7.507!
[2024-10-12 09:26:05,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 1265664. Throughput: 0: 961.8. Samples: 316286. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:26:05,572][00738] Avg episode reward: [(0, '8.241')]
[2024-10-12 09:26:05,581][03534] Saving new best policy, reward=8.241!
[2024-10-12 09:26:06,486][03547] Updated weights for policy 0, policy_version 310 (0.0024)
[2024-10-12 09:26:10,565][00738] Fps is (10 sec: 4096.1, 60 sec: 3891.4, 300 sec: 3804.4). Total num frames: 1286144. Throughput: 0: 959.6. Samples: 319666. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:26:10,571][00738] Avg episode reward: [(0, '7.705')]
[2024-10-12 09:26:15,569][00738] Fps is (10 sec: 4094.5, 60 sec: 3959.2, 300 sec: 3832.2). Total num frames: 1306624. Throughput: 0: 1001.0. Samples: 326566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-12 09:26:15,575][00738] Avg episode reward: [(0, '7.157')]
[2024-10-12 09:26:15,820][03547] Updated weights for policy 0, policy_version 320 (0.0018)
[2024-10-12 09:26:20,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1323008. Throughput: 0: 962.8. Samples: 330936. Policy #0 lag: (min: 0.0, avg: 0.2, max: 2.0)
[2024-10-12 09:26:20,568][00738] Avg episode reward: [(0, '7.113')]
[2024-10-12 09:26:25,565][00738] Fps is (10 sec: 3687.7, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 1343488. Throughput: 0: 956.0. Samples: 334084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-12 09:26:25,568][00738] Avg episode reward: [(0, '7.389')]
[2024-10-12 09:26:26,589][03547] Updated weights for policy 0, policy_version 330 (0.0028)
[2024-10-12 09:26:30,565][00738] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3860.0). Total num frames: 1368064. Throughput: 0: 989.9. Samples: 341214. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-12 09:26:30,568][00738] Avg episode reward: [(0, '7.906')]
[2024-10-12 09:26:35,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1384448. Throughput: 0: 990.8. Samples: 346468. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:26:35,567][00738] Avg episode reward: [(0, '8.148')]
[2024-10-12 09:26:37,890][03547] Updated weights for policy 0, policy_version 340 (0.0017)
[2024-10-12 09:26:40,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.2). Total num frames: 1404928. Throughput: 0: 962.2. Samples: 348714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-12 09:26:40,570][00738] Avg episode reward: [(0, '8.279')]
[2024-10-12 09:26:40,573][03534] Saving new best policy, reward=8.279!
[2024-10-12 09:26:45,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1425408. Throughput: 0: 982.3. Samples: 355846. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:26:45,571][00738] Avg episode reward: [(0, '7.473')]
[2024-10-12 09:26:47,956][03547] Updated weights for policy 0, policy_version 350 (0.0027)
[2024-10-12 09:26:50,567][00738] Fps is (10 sec: 3276.3, 60 sec: 3822.8, 300 sec: 3846.1). Total num frames: 1437696. Throughput: 0: 971.7. Samples: 360014. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:26:50,569][00738] Avg episode reward: [(0, '7.682')]
[2024-10-12 09:26:55,572][00738] Fps is (10 sec: 2456.0, 60 sec: 3686.0, 300 sec: 3804.3). Total num frames: 1449984. Throughput: 0: 938.6. Samples: 361908. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:26:55,575][00738] Avg episode reward: [(0, '8.217')]
[2024-10-12 09:27:00,189][03547] Updated weights for policy 0, policy_version 360 (0.0043)
[2024-10-12 09:27:00,565][00738] Fps is (10 sec: 3687.0, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 1474560. Throughput: 0: 912.9. Samples: 367642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:27:00,570][00738] Avg episode reward: [(0, '9.662')]
[2024-10-12 09:27:00,573][03534] Saving new best policy, reward=9.662!
[2024-10-12 09:27:05,565][00738] Fps is (10 sec: 4918.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1499136. Throughput: 0: 971.9. Samples: 374670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:27:05,570][00738] Avg episode reward: [(0, '11.005')]
[2024-10-12 09:27:05,578][03534] Saving new best policy, reward=11.005!
[2024-10-12 09:27:10,566][00738] Fps is (10 sec: 3686.2, 60 sec: 3754.6, 300 sec: 3818.3). Total num frames: 1511424. Throughput: 0: 958.6. Samples: 377220. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:27:10,569][00738] Avg episode reward: [(0, '10.968')]
[2024-10-12 09:27:11,010][03547] Updated weights for policy 0, policy_version 370 (0.0021)
[2024-10-12 09:27:15,565][00738] Fps is (10 sec: 3276.8, 60 sec: 3754.9, 300 sec: 3818.3). Total num frames: 1531904. Throughput: 0: 914.9. Samples: 382384. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:27:15,567][00738] Avg episode reward: [(0, '10.440')]
[2024-10-12 09:27:20,256][03547] Updated weights for policy 0, policy_version 380 (0.0059)
[2024-10-12 09:27:20,565][00738] Fps is (10 sec: 4505.8, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1556480. Throughput: 0: 949.8. Samples: 389210. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:27:20,568][00738] Avg episode reward: [(0, '10.177')]
[2024-10-12 09:27:25,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 1572864. Throughput: 0: 972.1. Samples: 392458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:27:25,568][00738] Avg episode reward: [(0, '11.011')]
[2024-10-12 09:27:25,584][03534] Saving new best policy, reward=11.011!
[2024-10-12 09:27:30,565][00738] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 1589248. Throughput: 0: 904.5. Samples: 396548. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:27:30,567][00738] Avg episode reward: [(0, '11.867')]
[2024-10-12 09:27:30,575][03534] Saving new best policy, reward=11.867!
[2024-10-12 09:27:31,890][03547] Updated weights for policy 0, policy_version 390 (0.0020)
[2024-10-12 09:27:35,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1613824. Throughput: 0: 966.2. Samples: 403490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-12 09:27:35,571][00738] Avg episode reward: [(0, '12.204')]
[2024-10-12 09:27:35,582][03534] Saving new best policy, reward=12.204!
[2024-10-12 09:27:40,565][00738] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 1634304. Throughput: 0: 997.7. Samples: 406800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:27:40,570][00738] Avg episode reward: [(0, '12.608')]
[2024-10-12 09:27:40,572][03534] Saving new best policy, reward=12.608!
[2024-10-12 09:27:42,072][03547] Updated weights for policy 0, policy_version 400 (0.0024)
[2024-10-12 09:27:45,565][00738] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 1646592. Throughput: 0: 972.7. Samples: 411412. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:27:45,568][00738] Avg episode reward: [(0, '12.178')]
[2024-10-12 09:27:50,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3846.1). Total num frames: 1671168. Throughput: 0: 956.3. Samples: 417702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:27:50,573][00738] Avg episode reward: [(0, '13.371')]
[2024-10-12 09:27:50,575][03534] Saving new best policy, reward=13.371!
[2024-10-12 09:27:52,201][03547] Updated weights for policy 0, policy_version 410 (0.0019)
[2024-10-12 09:27:55,565][00738] Fps is (10 sec: 4505.6, 60 sec: 4028.2, 300 sec: 3860.0). Total num frames: 1691648. Throughput: 0: 976.1. Samples: 421144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:27:55,572][00738] Avg episode reward: [(0, '13.328')]
[2024-10-12 09:27:55,587][03534] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000413_1691648.pth...
[2024-10-12 09:27:55,745][03534] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000188_770048.pth
[2024-10-12 09:28:00,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1708032. Throughput: 0: 977.1. Samples: 426354. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-12 09:28:00,567][00738] Avg episode reward: [(0, '13.763')]
[2024-10-12 09:28:00,570][03534] Saving new best policy, reward=13.763!
[2024-10-12 09:28:03,966][03547] Updated weights for policy 0, policy_version 420 (0.0015)
[2024-10-12 09:28:05,565][00738] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 1724416. Throughput: 0: 945.5. Samples: 431758. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:28:05,572][00738] Avg episode reward: [(0, '14.280')]
[2024-10-12 09:28:05,579][03534] Saving new best policy, reward=14.280!
[2024-10-12 09:28:10,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1748992. Throughput: 0: 945.7. Samples: 435016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:28:10,568][00738] Avg episode reward: [(0, '14.471')]
[2024-10-12 09:28:10,570][03534] Saving new best policy, reward=14.471!
[2024-10-12 09:28:13,891][03547] Updated weights for policy 0, policy_version 430 (0.0034)
[2024-10-12 09:28:15,568][00738] Fps is (10 sec: 4095.0, 60 sec: 3891.0, 300 sec: 3846.0). Total num frames: 1765376. Throughput: 0: 982.5. Samples: 440764. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:28:15,570][00738] Avg episode reward: [(0, '14.511')]
[2024-10-12 09:28:15,587][03534] Saving new best policy, reward=14.511!
[2024-10-12 09:28:20,565][00738] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 1781760. Throughput: 0: 932.0. Samples: 445430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-12 09:28:20,569][00738] Avg episode reward: [(0, '15.393')]
[2024-10-12 09:28:20,575][03534] Saving new best policy, reward=15.393!
[2024-10-12 09:28:24,716][03547] Updated weights for policy 0, policy_version 440 (0.0022)
[2024-10-12 09:28:25,565][00738] Fps is (10 sec: 4097.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1806336. Throughput: 0: 935.7. Samples: 448908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:28:25,569][00738] Avg episode reward: [(0, '15.641')]
[2024-10-12 09:28:25,579][03534] Saving new best policy, reward=15.641!
[2024-10-12 09:28:30,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1822720. Throughput: 0: 980.6. Samples: 455538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-12 09:28:30,569][00738] Avg episode reward: [(0, '14.977')]
[2024-10-12 09:28:35,566][00738] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3818.3). Total num frames: 1839104. Throughput: 0: 940.8. Samples: 460040. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:28:35,569][00738] Avg episode reward: [(0, '14.672')]
[2024-10-12 09:28:36,067][03547] Updated weights for policy 0, policy_version 450 (0.0018)
[2024-10-12 09:28:40,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 1863680. Throughput: 0: 936.9. Samples: 463304. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:28:40,568][00738] Avg episode reward: [(0, '14.356')]
[2024-10-12 09:28:44,889][03547] Updated weights for policy 0, policy_version 460 (0.0039)
[2024-10-12 09:28:45,565][00738] Fps is (10 sec: 4505.8, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1884160. Throughput: 0: 978.4. Samples: 470382. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:28:45,571][00738] Avg episode reward: [(0, '14.714')]
[2024-10-12 09:28:50,569][00738] Fps is (10 sec: 3275.6, 60 sec: 3754.4, 300 sec: 3818.3). Total num frames: 1896448. Throughput: 0: 960.6. Samples: 474990. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:28:50,571][00738] Avg episode reward: [(0, '14.840')]
[2024-10-12 09:28:55,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 1921024. Throughput: 0: 951.6. Samples: 477838. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:28:55,568][00738] Avg episode reward: [(0, '15.379')]
[2024-10-12 09:28:56,224][03547] Updated weights for policy 0, policy_version 470 (0.0026)
[2024-10-12 09:29:00,566][00738] Fps is (10 sec: 4916.8, 60 sec: 3959.4, 300 sec: 3860.0). Total num frames: 1945600. Throughput: 0: 982.8. Samples: 484986. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-12 09:29:00,571][00738] Avg episode reward: [(0, '16.835')]
[2024-10-12 09:29:00,573][03534] Saving new best policy, reward=16.835!
[2024-10-12 09:29:05,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1957888. Throughput: 0: 995.6. Samples: 490230. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:29:05,567][00738] Avg episode reward: [(0, '16.915')]
[2024-10-12 09:29:05,579][03534] Saving new best policy, reward=16.915!
[2024-10-12 09:29:07,672][03547] Updated weights for policy 0, policy_version 480 (0.0033)
[2024-10-12 09:29:10,565][00738] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 1978368. Throughput: 0: 965.0. Samples: 492334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-12 09:29:10,573][00738] Avg episode reward: [(0, '17.305')]
[2024-10-12 09:29:10,578][03534] Saving new best policy, reward=17.305!
[2024-10-12 09:29:15,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3891.4, 300 sec: 3860.0). Total num frames: 1998848. Throughput: 0: 971.0. Samples: 499234. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-12 09:29:15,569][00738] Avg episode reward: [(0, '18.185')]
[2024-10-12 09:29:15,579][03534] Saving new best policy, reward=18.185!
[2024-10-12 09:29:16,714][03547] Updated weights for policy 0, policy_version 490 (0.0028)
[2024-10-12 09:29:20,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2019328. Throughput: 0: 997.4. Samples: 504922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-12 09:29:20,572][00738] Avg episode reward: [(0, '17.825')]
[2024-10-12 09:29:25,565][00738] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3846.2). Total num frames: 2027520. Throughput: 0: 967.6. Samples: 506848. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-12 09:29:25,570][00738] Avg episode reward: [(0, '17.439')]
[2024-10-12 09:29:30,565][00738] Fps is (10 sec: 2457.6, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 2043904. Throughput: 0: 898.0. Samples: 510790. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:29:30,568][00738] Avg episode reward: [(0, '16.177')]
[2024-10-12 09:29:30,628][03547] Updated weights for policy 0, policy_version 500 (0.0042)
[2024-10-12 09:29:35,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3846.1). Total num frames: 2068480. Throughput: 0: 945.3. Samples: 517524. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:29:35,572][00738] Avg episode reward: [(0, '16.926')]
[2024-10-12 09:29:40,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 2084864. Throughput: 0: 939.0. Samples: 520094. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:29:40,572][00738] Avg episode reward: [(0, '17.185')]
[2024-10-12 09:29:41,708][03547] Updated weights for policy 0, policy_version 510 (0.0039)
[2024-10-12 09:29:45,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 2105344. Throughput: 0: 890.8. Samples: 525074. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-12 09:29:45,568][00738] Avg episode reward: [(0, '18.117')]
[2024-10-12 09:29:50,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3823.2, 300 sec: 3846.1). Total num frames: 2125824. Throughput: 0: 928.5. Samples: 532012. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:29:50,568][00738] Avg episode reward: [(0, '18.078')]
[2024-10-12 09:29:50,741][03547] Updated weights for policy 0, policy_version 520 (0.0017)
[2024-10-12 09:29:55,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 2146304. Throughput: 0: 953.5. Samples: 535242. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-12 09:29:55,569][00738] Avg episode reward: [(0, '18.115')]
[2024-10-12 09:29:55,588][03534] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000524_2146304.pth...
[2024-10-12 09:29:55,581][00738] Components not started: RolloutWorker_w3, wait_time=600.0 seconds
[2024-10-12 09:29:55,773][03534] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000300_1228800.pth
[2024-10-12 09:30:00,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3832.2). Total num frames: 2162688. Throughput: 0: 894.2. Samples: 539472. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:30:00,570][00738] Avg episode reward: [(0, '19.534')]
[2024-10-12 09:30:00,573][03534] Saving new best policy, reward=19.534!
[2024-10-12 09:30:02,301][03547] Updated weights for policy 0, policy_version 530 (0.0017)
[2024-10-12 09:30:05,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2183168. Throughput: 0: 921.1. Samples: 546370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-12 09:30:05,570][00738] Avg episode reward: [(0, '19.066')]
[2024-10-12 09:30:10,568][00738] Fps is (10 sec: 4095.0, 60 sec: 3754.5, 300 sec: 3846.0). Total num frames: 2203648. Throughput: 0: 956.8. Samples: 549906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-12 09:30:10,570][00738] Avg episode reward: [(0, '18.864')]
[2024-10-12 09:30:12,084][03547] Updated weights for policy 0, policy_version 540 (0.0018)
[2024-10-12 09:30:15,566][00738] Fps is (10 sec: 3686.1, 60 sec: 3686.3, 300 sec: 3832.2). Total num frames: 2220032. Throughput: 0: 976.2. Samples: 554720. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:30:15,573][00738] Avg episode reward: [(0, '18.430')]
[2024-10-12 09:30:20,565][00738] Fps is (10 sec: 4096.9, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2244608. Throughput: 0: 964.4. Samples: 560922. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:30:20,573][00738] Avg episode reward: [(0, '16.093')]
[2024-10-12 09:30:22,280][03547] Updated weights for policy 0, policy_version 550 (0.0014)
[2024-10-12 09:30:25,569][00738] Fps is (10 sec: 4504.4, 60 sec: 3959.2, 300 sec: 3859.9). Total num frames: 2265088. Throughput: 0: 985.9. Samples: 564462. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-12 09:30:25,574][00738] Avg episode reward: [(0, '16.367')]
[2024-10-12 09:30:30,565][00738] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 2281472. Throughput: 0: 996.5. Samples: 569918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-12 09:30:30,573][00738] Avg episode reward: [(0, '16.810')]
[2024-10-12 09:30:33,568][03547] Updated weights for policy 0, policy_version 560 (0.0021)
[2024-10-12 09:30:35,565][00738] Fps is (10 sec: 3687.7, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2301952. Throughput: 0: 971.8. Samples: 575742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:30:35,567][00738] Avg episode reward: [(0, '17.381')]
[2024-10-12 09:30:40,565][00738] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 2326528. Throughput: 0: 976.5. Samples: 579184. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-12 09:30:40,570][00738] Avg episode reward: [(0, '18.860')]
[2024-10-12 09:30:42,282][03547] Updated weights for policy 0, policy_version 570 (0.0037)
[2024-10-12 09:30:45,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2342912. Throughput: 0: 1016.9. Samples: 585232. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-12 09:30:45,567][00738] Avg episode reward: [(0, '17.950')]
[2024-10-12 09:30:50,565][00738] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2359296. Throughput: 0: 975.4. Samples: 590262. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-12 09:30:50,568][00738] Avg episode reward: [(0, '18.507')]
[2024-10-12 09:30:53,514][03547] Updated weights for policy 0, policy_version 580 (0.0031)
[2024-10-12 09:30:55,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2383872. Throughput: 0: 975.0. Samples: 593778. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:30:55,572][00738] Avg episode reward: [(0, '18.537')]
[2024-10-12 09:31:00,565][00738] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 2404352. Throughput: 0: 1019.0. Samples: 600576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-12 09:31:00,570][00738] Avg episode reward: [(0, '17.141')]
[2024-10-12 09:31:04,628][03547] Updated weights for policy 0, policy_version 590 (0.0022)
[2024-10-12 09:31:05,565][00738] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2416640. Throughput: 0: 977.1. Samples: 604890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-12 09:31:05,567][00738] Avg episode reward: [(0, '17.830')]
[2024-10-12 09:31:10,566][00738] Fps is (10 sec: 3686.3, 60 sec: 3959.6, 300 sec: 3846.1). Total num frames: 2441216. Throughput: 0: 976.6. Samples: 608404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-12 09:31:10,571][00738] Avg episode reward: [(0, '18.596')]
[2024-10-12 09:31:13,438][03547] Updated weights for policy 0, policy_version 600 (0.0018)
[2024-10-12 09:31:15,565][00738] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3860.0). Total num frames: 2461696. Throughput: 0: 1010.3. Samples: 615380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-12 09:31:15,568][00738] Avg episode reward: [(0, '19.230')]
[2024-10-12 09:31:20,565][00738] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2478080. Throughput: 0: 986.3. Samples: 620126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-12 09:31:20,568][00738] Avg episode reward: [(0, '19.628')]
[2024-10-12 09:31:20,571][03534] Saving new best policy, reward=19.628!
[2024-10-12 09:31:24,872][03547] Updated weights for policy 0, policy_version 610 (0.0033)
[2024-10-12 09:31:25,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3832.2). Total num frames: 2498560. Throughput: 0: 973.1. Samples: 622972. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:31:25,567][00738] Avg episode reward: [(0, '20.561')]
[2024-10-12 09:31:25,576][03534] Saving new best policy, reward=20.561!
[2024-10-12 09:31:30,565][00738] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 2523136. Throughput: 0: 993.6. Samples: 629946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:31:30,568][00738] Avg episode reward: [(0, '21.029')]
[2024-10-12 09:31:30,578][03534] Saving new best policy, reward=21.029!
[2024-10-12 09:31:35,094][03547] Updated weights for policy 0, policy_version 620 (0.0023)
[2024-10-12 09:31:35,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2539520. Throughput: 0: 1002.7. Samples: 635382. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-12 09:31:35,568][00738] Avg episode reward: [(0, '21.360')]
[2024-10-12 09:31:35,576][03534] Saving new best policy, reward=21.360!
[2024-10-12 09:31:40,570][00738] Fps is (10 sec: 3684.6, 60 sec: 3890.9, 300 sec: 3846.0). Total num frames: 2560000. Throughput: 0: 971.0. Samples: 637478. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:31:40,573][00738] Avg episode reward: [(0, '21.026')]
[2024-10-12 09:31:44,933][03547] Updated weights for policy 0, policy_version 630 (0.0038)
[2024-10-12 09:31:45,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.9). Total num frames: 2580480. Throughput: 0: 976.0. Samples: 644494. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:31:45,572][00738] Avg episode reward: [(0, '20.624')]
[2024-10-12 09:31:50,569][00738] Fps is (10 sec: 4096.4, 60 sec: 4027.5, 300 sec: 3901.6). Total num frames: 2600960. Throughput: 0: 1016.1. Samples: 650618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-12 09:31:50,572][00738] Avg episode reward: [(0, '20.973')]
[2024-10-12 09:31:55,566][00738] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2617344. Throughput: 0: 983.6. Samples: 652668. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:31:55,568][00738] Avg episode reward: [(0, '20.243')]
[2024-10-12 09:31:55,578][03534] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000639_2617344.pth...
[2024-10-12 09:31:55,728][03534] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000413_1691648.pth
[2024-10-12 09:31:56,365][03547] Updated weights for policy 0, policy_version 640 (0.0022)
[2024-10-12 09:32:00,565][00738] Fps is (10 sec: 3687.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2637824. Throughput: 0: 965.6. Samples: 658834. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:32:00,571][00738] Avg episode reward: [(0, '20.148')]
[2024-10-12 09:32:05,565][00738] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2650112. Throughput: 0: 955.0. Samples: 663100. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:32:05,567][00738] Avg episode reward: [(0, '19.401')]
[2024-10-12 09:32:09,389][03547] Updated weights for policy 0, policy_version 650 (0.0026)
[2024-10-12 09:32:10,565][00738] Fps is (10 sec: 2457.6, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 2662400. Throughput: 0: 935.7. Samples: 665078. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-12 09:32:10,575][00738] Avg episode reward: [(0, '19.297')]
[2024-10-12 09:32:15,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2686976. Throughput: 0: 900.4. Samples: 670464. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-12 09:32:15,570][00738] Avg episode reward: [(0, '19.752')]
[2024-10-12 09:32:18,846][03547] Updated weights for policy 0, policy_version 660 (0.0027)
[2024-10-12 09:32:20,565][00738] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2707456. Throughput: 0: 935.2. Samples: 677466. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-12 09:32:20,569][00738] Avg episode reward: [(0, '18.931')]
[2024-10-12 09:32:25,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2723840. Throughput: 0: 952.7. Samples: 680346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-12 09:32:25,575][00738] Avg episode reward: [(0, '18.974')]
[2024-10-12 09:32:30,173][03547] Updated weights for policy 0, policy_version 670 (0.0021)
[2024-10-12 09:32:30,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 2744320. Throughput: 0: 899.3. Samples: 684962. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:32:30,570][00738] Avg episode reward: [(0, '22.178')]
[2024-10-12 09:32:30,574][03534] Saving new best policy, reward=22.178!
[2024-10-12 09:32:35,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2764800. Throughput: 0: 919.9. Samples: 692010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:32:35,570][00738] Avg episode reward: [(0, '22.745')]
[2024-10-12 09:32:35,635][03534] Saving new best policy, reward=22.745!
[2024-10-12 09:32:39,518][03547] Updated weights for policy 0, policy_version 680 (0.0026)
[2024-10-12 09:32:40,565][00738] Fps is (10 sec: 4095.9, 60 sec: 3755.0, 300 sec: 3860.0). Total num frames: 2785280. Throughput: 0: 951.3. Samples: 695478. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-12 09:32:40,570][00738] Avg episode reward: [(0, '22.972')]
[2024-10-12 09:32:40,577][03534] Saving new best policy, reward=22.972!
[2024-10-12 09:32:45,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 2801664. Throughput: 0: 911.6. Samples: 699858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-12 09:32:45,567][00738] Avg episode reward: [(0, '24.581')]
[2024-10-12 09:32:45,579][03534] Saving new best policy, reward=24.581!
[2024-10-12 09:32:50,378][03547] Updated weights for policy 0, policy_version 690 (0.0018)
[2024-10-12 09:32:50,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3754.9, 300 sec: 3846.1). Total num frames: 2826240. Throughput: 0: 964.8. Samples: 706514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:32:50,573][00738] Avg episode reward: [(0, '24.904')]
[2024-10-12 09:32:50,576][03534] Saving new best policy, reward=24.904!
[2024-10-12 09:32:55,565][00738] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2846720. Throughput: 0: 995.0. Samples: 709852. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-12 09:32:55,569][00738] Avg episode reward: [(0, '23.078')]
[2024-10-12 09:33:00,565][00738] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 2859008. Throughput: 0: 987.9. Samples: 714918. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:33:00,575][00738] Avg episode reward: [(0, '22.035')]
[2024-10-12 09:33:01,650][03547] Updated weights for policy 0, policy_version 700 (0.0032)
[2024-10-12 09:33:05,565][00738] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2883584. Throughput: 0: 969.7. Samples: 721104. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:33:05,572][00738] Avg episode reward: [(0, '21.492')]
[2024-10-12 09:33:10,303][03547] Updated weights for policy 0, policy_version 710 (0.0016)
[2024-10-12 09:33:10,565][00738] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3873.9). Total num frames: 2908160. Throughput: 0: 983.8. Samples: 724618. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:33:10,567][00738] Avg episode reward: [(0, '20.965')]
[2024-10-12 09:33:15,567][00738] Fps is (10 sec: 4095.4, 60 sec: 3959.4, 300 sec: 3873.8). Total num frames: 2924544. Throughput: 0: 1010.1. Samples: 730420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-12 09:33:15,570][00738] Avg episode reward: [(0, '20.493')]
[2024-10-12 09:33:20,565][00738] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2940928. Throughput: 0: 974.6. Samples: 735868. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:33:20,571][00738] Avg episode reward: [(0, '19.475')]
[2024-10-12 09:33:21,681][03547] Updated weights for policy 0, policy_version 720 (0.0026)
[2024-10-12 09:33:25,565][00738] Fps is (10 sec: 4096.7, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 2965504. Throughput: 0: 971.2. Samples: 739182. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:33:25,571][00738] Avg episode reward: [(0, '22.151')]
[2024-10-12 09:33:30,566][00738] Fps is (10 sec: 4095.8, 60 sec: 3959.4, 300 sec: 3873.8). Total num frames: 2981888. Throughput: 0: 1014.7. Samples: 745522. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-12 09:33:30,569][00738] Avg episode reward: [(0, '24.526')]
[2024-10-12 09:33:32,345][03547] Updated weights for policy 0, policy_version 730 (0.0023)
[2024-10-12 09:33:35,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3002368. Throughput: 0: 975.6. Samples: 750418. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:33:35,567][00738] Avg episode reward: [(0, '24.290')]
[2024-10-12 09:33:40,565][00738] Fps is (10 sec: 4096.2, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3022848. Throughput: 0: 979.9. Samples: 753948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:33:40,572][00738] Avg episode reward: [(0, '24.719')]
[2024-10-12 09:33:41,647][03547] Updated weights for policy 0, policy_version 740 (0.0027)
[2024-10-12 09:33:45,565][00738] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3887.8). Total num frames: 3043328. Throughput: 0: 1020.4. Samples: 760836. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:33:45,569][00738] Avg episode reward: [(0, '24.806')]
[2024-10-12 09:33:50,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3059712. Throughput: 0: 981.9. Samples: 765290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-12 09:33:50,568][00738] Avg episode reward: [(0, '23.911')]
[2024-10-12 09:33:52,705][03547] Updated weights for policy 0, policy_version 750 (0.0048)
[2024-10-12 09:33:55,567][00738] Fps is (10 sec: 4095.4, 60 sec: 3959.4, 300 sec: 3859.9). Total num frames: 3084288. Throughput: 0: 978.3. Samples: 768644. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-12 09:33:55,569][00738] Avg episode reward: [(0, '22.284')]
[2024-10-12 09:33:55,584][03534] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000753_3084288.pth...
[2024-10-12 09:33:55,736][03534] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000524_2146304.pth
[2024-10-12 09:34:00,565][00738] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3887.7). Total num frames: 3104768. Throughput: 0: 1003.6. Samples: 775580. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:34:00,572][00738] Avg episode reward: [(0, '21.947')]
[2024-10-12 09:34:02,405][03547] Updated weights for policy 0, policy_version 760 (0.0026)
[2024-10-12 09:34:05,565][00738] Fps is (10 sec: 3687.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3121152. Throughput: 0: 991.3. Samples: 780476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:34:05,570][00738] Avg episode reward: [(0, '21.498')]
[2024-10-12 09:34:10,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3141632. Throughput: 0: 979.8. Samples: 783274. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:34:10,567][00738] Avg episode reward: [(0, '21.290')]
[2024-10-12 09:34:12,689][03547] Updated weights for policy 0, policy_version 770 (0.0027)
[2024-10-12 09:34:15,565][00738] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3887.7). Total num frames: 3166208. Throughput: 0: 998.7. Samples: 790462. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:34:15,568][00738] Avg episode reward: [(0, '21.810')]
[2024-10-12 09:34:20,567][00738] Fps is (10 sec: 4095.3, 60 sec: 4027.6, 300 sec: 3915.5). Total num frames: 3182592. Throughput: 0: 1010.3. Samples: 795882. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-12 09:34:20,571][00738] Avg episode reward: [(0, '23.361')]
[2024-10-12 09:34:24,093][03547] Updated weights for policy 0, policy_version 780 (0.0035)
[2024-10-12 09:34:25,565][00738] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3198976. Throughput: 0: 978.7. Samples: 797990. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:34:25,568][00738] Avg episode reward: [(0, '23.394')]
[2024-10-12 09:34:30,565][00738] Fps is (10 sec: 4096.7, 60 sec: 4027.8, 300 sec: 3915.5). Total num frames: 3223552. Throughput: 0: 979.5. Samples: 804912. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:34:30,567][00738] Avg episode reward: [(0, '24.503')]
[2024-10-12 09:34:32,769][03547] Updated weights for policy 0, policy_version 790 (0.0029)
[2024-10-12 09:34:35,565][00738] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 3244032. Throughput: 0: 1015.6. Samples: 810992. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-12 09:34:35,567][00738] Avg episode reward: [(0, '24.829')]
[2024-10-12 09:34:40,565][00738] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3256320. Throughput: 0: 987.9. Samples: 813096. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-12 09:34:40,568][00738] Avg episode reward: [(0, '24.287')]
[2024-10-12 09:34:45,565][00738] Fps is (10 sec: 2457.5, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 3268608. Throughput: 0: 921.3. Samples: 817040. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:34:45,573][00738] Avg episode reward: [(0, '24.290')]
[2024-10-12 09:34:46,533][03547] Updated weights for policy 0, policy_version 800 (0.0035)
[2024-10-12 09:34:50,568][00738] Fps is (10 sec: 3685.6, 60 sec: 3891.0, 300 sec: 3887.7). Total num frames: 3293184. Throughput: 0: 957.5. Samples: 823568. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-12 09:34:50,570][00738] Avg episode reward: [(0, '24.129')]
[2024-10-12 09:34:55,565][00738] Fps is (10 sec: 4096.1, 60 sec: 3754.8, 300 sec: 3887.7). Total num frames: 3309568. Throughput: 0: 947.2. Samples: 825898. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:34:55,569][00738] Avg episode reward: [(0, '24.787')]
[2024-10-12 09:34:57,985][03547] Updated weights for policy 0, policy_version 810 (0.0031)
[2024-10-12 09:35:00,565][00738] Fps is (10 sec: 3687.3, 60 sec: 3754.7, 300 sec: 3887.7). Total num frames: 3330048. Throughput: 0: 900.8. Samples: 830998. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-12 09:35:00,568][00738] Avg episode reward: [(0, '23.524')]
[2024-10-12 09:35:05,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3887.8). Total num frames: 3350528. Throughput: 0: 936.7. Samples: 838032. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:35:05,568][00738] Avg episode reward: [(0, '24.941')]
[2024-10-12 09:35:05,579][03534] Saving new best policy, reward=24.941!
[2024-10-12 09:35:06,846][03547] Updated weights for policy 0, policy_version 820 (0.0025)
[2024-10-12 09:35:10,569][00738] Fps is (10 sec: 3685.1, 60 sec: 3754.4, 300 sec: 3887.7). Total num frames: 3366912. Throughput: 0: 957.3. Samples: 841072. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-10-12 09:35:10,571][00738] Avg episode reward: [(0, '24.900')]
[2024-10-12 09:35:15,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3873.8). Total num frames: 3387392. Throughput: 0: 904.0. Samples: 845590. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-12 09:35:15,568][00738] Avg episode reward: [(0, '23.180')]
[2024-10-12 09:35:18,010][03547] Updated weights for policy 0, policy_version 830 (0.0039)
[2024-10-12 09:35:20,565][00738] Fps is (10 sec: 4097.4, 60 sec: 3754.8, 300 sec: 3873.9). Total num frames: 3407872. Throughput: 0: 926.8. Samples: 852696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-12 09:35:20,567][00738] Avg episode reward: [(0, '21.462')]
[2024-10-12 09:35:25,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3428352. Throughput: 0: 957.0. Samples: 856162. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-12 09:35:25,572][00738] Avg episode reward: [(0, '21.881')]
[2024-10-12 09:35:29,007][03547] Updated weights for policy 0, policy_version 840 (0.0017)
[2024-10-12 09:35:30,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3873.8). Total num frames: 3444736. Throughput: 0: 970.2. Samples: 860700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-12 09:35:30,571][00738] Avg episode reward: [(0, '21.274')]
[2024-10-12 09:35:35,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 3469312. Throughput: 0: 972.2. Samples: 867316. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-12 09:35:35,569][00738] Avg episode reward: [(0, '21.630')]
[2024-10-12 09:35:38,116][03547] Updated weights for policy 0, policy_version 850 (0.0026)
[2024-10-12 09:35:40,567][00738] Fps is (10 sec: 4504.9, 60 sec: 3891.1, 300 sec: 3887.7). Total num frames: 3489792. Throughput: 0: 1000.0. Samples: 870898. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-12 09:35:40,569][00738] Avg episode reward: [(0, '21.782')]
[2024-10-12 09:35:45,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3506176. Throughput: 0: 1001.8. Samples: 876080. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-12 09:35:45,573][00738] Avg episode reward: [(0, '22.979')]
[2024-10-12 09:35:49,260][03547] Updated weights for policy 0, policy_version 860 (0.0029)
[2024-10-12 09:35:50,565][00738] Fps is (10 sec: 3687.0, 60 sec: 3891.4, 300 sec: 3873.8). Total num frames: 3526656. Throughput: 0: 975.7. Samples: 881940. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:35:50,574][00738] Avg episode reward: [(0, '24.253')]
[2024-10-12 09:35:55,565][00738] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 3551232. Throughput: 0: 984.3. Samples: 885362. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:35:55,570][00738] Avg episode reward: [(0, '24.160')]
[2024-10-12 09:35:55,582][03534] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000867_3551232.pth...
[2024-10-12 09:35:55,745][03534] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000639_2617344.pth
[2024-10-12 09:35:59,201][03547] Updated weights for policy 0, policy_version 870 (0.0018)
[2024-10-12 09:36:00,568][00738] Fps is (10 sec: 4095.0, 60 sec: 3959.3, 300 sec: 3901.6). Total num frames: 3567616. Throughput: 0: 1013.2. Samples: 891188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:36:00,575][00738] Avg episode reward: [(0, '24.601')]
[2024-10-12 09:36:05,565][00738] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3584000. Throughput: 0: 970.4. Samples: 896366. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:36:05,567][00738] Avg episode reward: [(0, '24.925')]
[2024-10-12 09:36:09,426][03547] Updated weights for policy 0, policy_version 880 (0.0028)
[2024-10-12 09:36:10,565][00738] Fps is (10 sec: 4097.0, 60 sec: 4028.0, 300 sec: 3887.7). Total num frames: 3608576. Throughput: 0: 969.6. Samples: 899794. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:36:10,569][00738] Avg episode reward: [(0, '23.903')]
[2024-10-12 09:36:15,566][00738] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 3887.7). Total num frames: 3624960. Throughput: 0: 1012.8. Samples: 906276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-12 09:36:15,572][00738] Avg episode reward: [(0, '23.460')]
[2024-10-12 09:36:20,566][00738] Fps is (10 sec: 3276.6, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3641344. Throughput: 0: 967.2. Samples: 910840. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-12 09:36:20,570][00738] Avg episode reward: [(0, '23.377')]
[2024-10-12 09:36:20,810][03547] Updated weights for policy 0, policy_version 890 (0.0020)
[2024-10-12 09:36:25,565][00738] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3665920. Throughput: 0: 964.6. Samples: 914302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:36:25,568][00738] Avg episode reward: [(0, '23.035')]
[2024-10-12 09:36:29,554][03547] Updated weights for policy 0, policy_version 900 (0.0029)
[2024-10-12 09:36:30,565][00738] Fps is (10 sec: 4505.8, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 3686400. Throughput: 0: 1004.5. Samples: 921284. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:36:30,572][00738] Avg episode reward: [(0, '22.976')]
[2024-10-12 09:36:35,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 3702784. Throughput: 0: 972.8. Samples: 925714. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-12 09:36:35,568][00738] Avg episode reward: [(0, '21.965')]
[2024-10-12 09:36:40,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3873.8). Total num frames: 3723264. Throughput: 0: 966.8. Samples: 928870. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:36:40,567][00738] Avg episode reward: [(0, '22.098')]
[2024-10-12 09:36:40,805][03547] Updated weights for policy 0, policy_version 910 (0.0031)
[2024-10-12 09:36:45,565][00738] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3887.8). Total num frames: 3747840. Throughput: 0: 992.9. Samples: 935868. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:36:45,573][00738] Avg episode reward: [(0, '22.437')]
[2024-10-12 09:36:50,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3760128. Throughput: 0: 994.2. Samples: 941106. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:36:50,567][00738] Avg episode reward: [(0, '22.241')]
[2024-10-12 09:36:51,977][03547] Updated weights for policy 0, policy_version 920 (0.0032)
[2024-10-12 09:36:55,565][00738] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3780608. Throughput: 0: 970.8. Samples: 943478. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:36:55,568][00738] Avg episode reward: [(0, '22.302')]
[2024-10-12 09:37:00,565][00738] Fps is (10 sec: 4505.6, 60 sec: 3959.6, 300 sec: 3915.5). Total num frames: 3805184. Throughput: 0: 979.6. Samples: 950358. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-12 09:37:00,568][00738] Avg episode reward: [(0, '23.434')]
[2024-10-12 09:37:01,083][03547] Updated weights for policy 0, policy_version 930 (0.0025)
[2024-10-12 09:37:05,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3821568. Throughput: 0: 1009.3. Samples: 956256. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:37:05,574][00738] Avg episode reward: [(0, '23.921')]
[2024-10-12 09:37:10,565][00738] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3842048. Throughput: 0: 978.8. Samples: 958350. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:37:10,573][00738] Avg episode reward: [(0, '23.827')]
[2024-10-12 09:37:12,178][03547] Updated weights for policy 0, policy_version 940 (0.0023)
[2024-10-12 09:37:15,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3862528. Throughput: 0: 971.5. Samples: 965000. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:37:15,568][00738] Avg episode reward: [(0, '23.284')]
[2024-10-12 09:37:20,566][00738] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 3883008. Throughput: 0: 1018.6. Samples: 971550. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-12 09:37:20,573][00738] Avg episode reward: [(0, '23.211')]
[2024-10-12 09:37:22,495][03547] Updated weights for policy 0, policy_version 950 (0.0039)
[2024-10-12 09:37:25,565][00738] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3899392. Throughput: 0: 996.6. Samples: 973716. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:37:25,570][00738] Avg episode reward: [(0, '24.080')]
[2024-10-12 09:37:30,566][00738] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3919872. Throughput: 0: 965.1. Samples: 979298. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:37:30,568][00738] Avg episode reward: [(0, '24.734')]
[2024-10-12 09:37:32,643][03547] Updated weights for policy 0, policy_version 960 (0.0031)
[2024-10-12 09:37:35,565][00738] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3940352. Throughput: 0: 988.0. Samples: 985566. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-12 09:37:35,569][00738] Avg episode reward: [(0, '24.571')]
[2024-10-12 09:37:40,565][00738] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3952640. Throughput: 0: 975.1. Samples: 987358. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:37:40,571][00738] Avg episode reward: [(0, '25.133')]
[2024-10-12 09:37:40,576][03534] Saving new best policy, reward=25.133!
[2024-10-12 09:37:45,565][00738] Fps is (10 sec: 2457.6, 60 sec: 3618.1, 300 sec: 3860.0). Total num frames: 3964928. Throughput: 0: 903.7. Samples: 991024. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-12 09:37:45,570][00738] Avg episode reward: [(0, '24.550')]
[2024-10-12 09:37:46,452][03547] Updated weights for policy 0, policy_version 970 (0.0040)
[2024-10-12 09:37:50,565][00738] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3989504. Throughput: 0: 923.6. Samples: 997820. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-12 09:37:50,570][00738] Avg episode reward: [(0, '24.237')]
[2024-10-12 09:37:53,570][03534] Stopping Batcher_0...
[2024-10-12 09:37:53,570][03534] Loop batcher_evt_loop terminating...
[2024-10-12 09:37:53,572][03534] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-10-12 09:37:53,571][00738] Component Batcher_0 stopped!
[2024-10-12 09:37:53,574][00738] Component RolloutWorker_w3 process died already! Don't wait for it.
[2024-10-12 09:37:53,626][03547] Weights refcount: 2 0
[2024-10-12 09:37:53,629][00738] Component InferenceWorker_p0-w0 stopped!
[2024-10-12 09:37:53,635][03547] Stopping InferenceWorker_p0-w0...
[2024-10-12 09:37:53,636][03547] Loop inference_proc0-0_evt_loop terminating...
[2024-10-12 09:37:53,706][03534] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000753_3084288.pth
[2024-10-12 09:37:53,728][03534] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-10-12 09:37:53,950][03534] Stopping LearnerWorker_p0...
[2024-10-12 09:37:53,951][03534] Loop learner_proc0_evt_loop terminating...
[2024-10-12 09:37:53,950][00738] Component LearnerWorker_p0 stopped!
[2024-10-12 09:37:54,040][00738] Component RolloutWorker_w1 stopped!
[2024-10-12 09:37:54,042][03550] Stopping RolloutWorker_w1...
[2024-10-12 09:37:54,052][00738] Component RolloutWorker_w5 stopped!
[2024-10-12 09:37:54,054][03558] Stopping RolloutWorker_w5...
[2024-10-12 09:37:54,065][03558] Loop rollout_proc5_evt_loop terminating...
[2024-10-12 09:37:54,046][03550] Loop rollout_proc1_evt_loop terminating...
[2024-10-12 09:37:54,082][00738] Component RolloutWorker_w7 stopped!
[2024-10-12 09:37:54,082][03559] Stopping RolloutWorker_w7...
[2024-10-12 09:37:54,089][03559] Loop rollout_proc7_evt_loop terminating...
[2024-10-12 09:37:54,236][03557] Stopping RolloutWorker_w6...
[2024-10-12 09:37:54,236][00738] Component RolloutWorker_w6 stopped!
[2024-10-12 09:37:54,237][03557] Loop rollout_proc6_evt_loop terminating...
[2024-10-12 09:37:54,273][03551] Stopping RolloutWorker_w4...
[2024-10-12 09:37:54,273][00738] Component RolloutWorker_w4 stopped!
[2024-10-12 09:37:54,280][03549] Stopping RolloutWorker_w2...
[2024-10-12 09:37:54,280][00738] Component RolloutWorker_w2 stopped!
[2024-10-12 09:37:54,274][03551] Loop rollout_proc4_evt_loop terminating...
[2024-10-12 09:37:54,291][03549] Loop rollout_proc2_evt_loop terminating...
[2024-10-12 09:37:54,323][03548] Stopping RolloutWorker_w0...
[2024-10-12 09:37:54,323][00738] Component RolloutWorker_w0 stopped!
[2024-10-12 09:37:54,328][00738] Waiting for process learner_proc0 to stop...
[2024-10-12 09:37:54,334][03548] Loop rollout_proc0_evt_loop terminating...
[2024-10-12 09:37:55,878][00738] Waiting for process inference_proc0-0 to join...
[2024-10-12 09:37:55,886][00738] Waiting for process rollout_proc0 to join...
[2024-10-12 09:37:58,554][00738] Waiting for process rollout_proc1 to join...
[2024-10-12 09:37:58,557][00738] Waiting for process rollout_proc2 to join...
[2024-10-12 09:37:58,562][00738] Waiting for process rollout_proc3 to join...
[2024-10-12 09:37:58,564][00738] Waiting for process rollout_proc4 to join...
[2024-10-12 09:37:58,567][00738] Waiting for process rollout_proc5 to join...
[2024-10-12 09:37:58,572][00738] Waiting for process rollout_proc6 to join...
[2024-10-12 09:37:58,577][00738] Waiting for process rollout_proc7 to join...
[2024-10-12 09:37:58,579][00738] Batcher 0 profile tree view:
batching: 26.4827, releasing_batches: 0.0269
[2024-10-12 09:37:58,581][00738] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
wait_policy_total: 411.1960
update_model: 9.1016
weight_update: 0.0026
one_step: 0.0041
handle_policy_step: 591.2089
deserialize: 14.9245, stack: 3.1316, obs_to_device_normalize: 121.5230, forward: 318.9090, send_messages: 25.0899
prepare_outputs: 78.6140
to_cpu: 45.8851
[2024-10-12 09:37:58,583][00738] Learner 0 profile tree view:
misc: 0.0061, prepare_batch: 14.4363
train: 72.6413
epoch_init: 0.0061, minibatch_init: 0.0086, losses_postprocess: 0.5473, kl_divergence: 0.5781, after_optimizer: 33.4086
calculate_losses: 26.0833
losses_init: 0.0035, forward_head: 1.3224, bptt_initial: 17.7547, tail: 1.1922, advantages_returns: 0.2232, losses: 3.6330
bptt: 1.6604
bptt_forward_core: 1.5803
update: 11.3902
clip: 0.8216
[2024-10-12 09:37:58,586][00738] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.4128, enqueue_policy_requests: 94.1590, env_step: 824.9058, overhead: 13.8286, complete_rollouts: 8.5626
save_policy_outputs: 22.7102
split_output_tensors: 9.1195
[2024-10-12 09:37:58,587][00738] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3963, enqueue_policy_requests: 121.3798, env_step: 792.0611, overhead: 13.2457, complete_rollouts: 5.6515
save_policy_outputs: 21.0561
split_output_tensors: 8.5264
[2024-10-12 09:37:58,589][00738] Loop Runner_EvtLoop terminating...
[2024-10-12 09:37:58,591][00738] Runner profile tree view:
main_loop: 1077.2539
[2024-10-12 09:37:58,592][00738] Collected {0: 4005888}, FPS: 3718.6
[2024-10-12 09:42:13,964][00738] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-10-12 09:42:13,967][00738] Overriding arg 'num_workers' with value 1 passed from command line
[2024-10-12 09:42:13,968][00738] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-10-12 09:42:13,972][00738] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-10-12 09:42:13,975][00738] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-10-12 09:42:13,977][00738] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-10-12 09:42:13,979][00738] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-10-12 09:42:13,982][00738] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-10-12 09:42:13,983][00738] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-10-12 09:42:13,984][00738] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-10-12 09:42:13,985][00738] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-10-12 09:42:13,986][00738] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-10-12 09:42:13,987][00738] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-10-12 09:42:13,988][00738] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-10-12 09:42:13,989][00738] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-10-12 09:42:14,022][00738] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-12 09:42:14,026][00738] RunningMeanStd input shape: (3, 72, 128)
[2024-10-12 09:42:14,028][00738] RunningMeanStd input shape: (1,)
[2024-10-12 09:42:14,046][00738] ConvEncoder: input_channels=3
[2024-10-12 09:42:14,164][00738] Conv encoder output size: 512
[2024-10-12 09:42:14,166][00738] Policy head output size: 512
[2024-10-12 09:42:14,358][00738] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-10-12 09:42:15,162][00738] Num frames 100...
[2024-10-12 09:42:15,288][00738] Num frames 200...
[2024-10-12 09:42:15,422][00738] Num frames 300...
[2024-10-12 09:42:15,546][00738] Num frames 400...
[2024-10-12 09:42:15,667][00738] Num frames 500...
[2024-10-12 09:42:15,798][00738] Num frames 600...
[2024-10-12 09:42:15,920][00738] Num frames 700...
[2024-10-12 09:42:16,049][00738] Num frames 800...
[2024-10-12 09:42:16,102][00738] Avg episode rewards: #0: 15.000, true rewards: #0: 8.000
[2024-10-12 09:42:16,103][00738] Avg episode reward: 15.000, avg true_objective: 8.000
[2024-10-12 09:42:16,234][00738] Num frames 900...
[2024-10-12 09:42:16,370][00738] Num frames 1000...
[2024-10-12 09:42:16,492][00738] Num frames 1100...
[2024-10-12 09:42:16,616][00738] Num frames 1200...
[2024-10-12 09:42:16,738][00738] Num frames 1300...
[2024-10-12 09:42:16,859][00738] Num frames 1400...
[2024-10-12 09:42:16,986][00738] Num frames 1500...
[2024-10-12 09:42:17,110][00738] Num frames 1600...
[2024-10-12 09:42:17,243][00738] Num frames 1700...
[2024-10-12 09:42:17,378][00738] Num frames 1800...
[2024-10-12 09:42:17,505][00738] Num frames 1900...
[2024-10-12 09:42:17,626][00738] Num frames 2000...
[2024-10-12 09:42:17,755][00738] Num frames 2100...
[2024-10-12 09:42:17,881][00738] Num frames 2200...
[2024-10-12 09:42:18,004][00738] Num frames 2300...
[2024-10-12 09:42:18,137][00738] Num frames 2400...
[2024-10-12 09:42:18,262][00738] Num frames 2500...
[2024-10-12 09:42:18,377][00738] Avg episode rewards: #0: 28.725, true rewards: #0: 12.725
[2024-10-12 09:42:18,381][00738] Avg episode reward: 28.725, avg true_objective: 12.725
[2024-10-12 09:42:18,457][00738] Num frames 2600...
[2024-10-12 09:42:18,584][00738] Num frames 2700...
[2024-10-12 09:42:18,707][00738] Num frames 2800...
[2024-10-12 09:42:18,831][00738] Num frames 2900...
[2024-10-12 09:42:18,958][00738] Num frames 3000...
[2024-10-12 09:42:19,084][00738] Num frames 3100...
[2024-10-12 09:42:19,213][00738] Num frames 3200...
[2024-10-12 09:42:19,379][00738] Num frames 3300...
[2024-10-12 09:42:19,564][00738] Num frames 3400...
[2024-10-12 09:42:19,737][00738] Num frames 3500...
[2024-10-12 09:42:19,901][00738] Num frames 3600...
[2024-10-12 09:42:20,068][00738] Num frames 3700...
[2024-10-12 09:42:20,251][00738] Num frames 3800...
[2024-10-12 09:42:20,420][00738] Num frames 3900...
[2024-10-12 09:42:20,583][00738] Avg episode rewards: #0: 30.510, true rewards: #0: 13.177
[2024-10-12 09:42:20,586][00738] Avg episode reward: 30.510, avg true_objective: 13.177
[2024-10-12 09:42:20,670][00738] Num frames 4000...
[2024-10-12 09:42:20,842][00738] Num frames 4100...
[2024-10-12 09:42:21,018][00738] Num frames 4200...
[2024-10-12 09:42:21,206][00738] Num frames 4300...
[2024-10-12 09:42:21,380][00738] Num frames 4400...
[2024-10-12 09:42:21,569][00738] Num frames 4500...
[2024-10-12 09:42:21,730][00738] Num frames 4600...
[2024-10-12 09:42:21,857][00738] Num frames 4700...
[2024-10-12 09:42:21,976][00738] Num frames 4800...
[2024-10-12 09:42:22,102][00738] Num frames 4900...
[2024-10-12 09:42:22,234][00738] Num frames 5000...
[2024-10-12 09:42:22,360][00738] Num frames 5100...
[2024-10-12 09:42:22,482][00738] Num frames 5200...
[2024-10-12 09:42:22,616][00738] Num frames 5300...
[2024-10-12 09:42:22,741][00738] Num frames 5400...
[2024-10-12 09:42:22,869][00738] Num frames 5500...
[2024-10-12 09:42:22,992][00738] Num frames 5600...
[2024-10-12 09:42:23,123][00738] Num frames 5700...
[2024-10-12 09:42:23,251][00738] Num frames 5800...
[2024-10-12 09:42:23,375][00738] Num frames 5900...
[2024-10-12 09:42:23,500][00738] Num frames 6000...
[2024-10-12 09:42:23,621][00738] Avg episode rewards: #0: 37.132, true rewards: #0: 15.133
[2024-10-12 09:42:23,623][00738] Avg episode reward: 37.132, avg true_objective: 15.133
[2024-10-12 09:42:23,682][00738] Num frames 6100...
[2024-10-12 09:42:23,803][00738] Num frames 6200...
[2024-10-12 09:42:23,926][00738] Num frames 6300...
[2024-10-12 09:42:24,050][00738] Num frames 6400...
[2024-10-12 09:42:24,180][00738] Num frames 6500...
[2024-10-12 09:42:24,303][00738] Num frames 6600...
[2024-10-12 09:42:24,423][00738] Num frames 6700...
[2024-10-12 09:42:24,544][00738] Num frames 6800...
[2024-10-12 09:42:24,684][00738] Num frames 6900...
[2024-10-12 09:42:24,807][00738] Num frames 7000...
[2024-10-12 09:42:24,972][00738] Avg episode rewards: #0: 35.378, true rewards: #0: 14.178
[2024-10-12 09:42:24,974][00738] Avg episode reward: 35.378, avg true_objective: 14.178
[2024-10-12 09:42:24,993][00738] Num frames 7100...
[2024-10-12 09:42:25,115][00738] Num frames 7200...
[2024-10-12 09:42:25,249][00738] Num frames 7300...
[2024-10-12 09:42:25,372][00738] Num frames 7400...
[2024-10-12 09:42:25,497][00738] Num frames 7500...
[2024-10-12 09:42:25,620][00738] Num frames 7600...
[2024-10-12 09:42:25,753][00738] Num frames 7700...
[2024-10-12 09:42:25,873][00738] Num frames 7800...
[2024-10-12 09:42:25,995][00738] Num frames 7900...
[2024-10-12 09:42:26,120][00738] Num frames 8000...
[2024-10-12 09:42:26,249][00738] Num frames 8100...
[2024-10-12 09:42:26,371][00738] Num frames 8200...
[2024-10-12 09:42:26,440][00738] Avg episode rewards: #0: 33.515, true rewards: #0: 13.682
[2024-10-12 09:42:26,442][00738] Avg episode reward: 33.515, avg true_objective: 13.682
[2024-10-12 09:42:26,556][00738] Num frames 8300...
[2024-10-12 09:42:26,686][00738] Num frames 8400...
[2024-10-12 09:42:26,808][00738] Num frames 8500...
[2024-10-12 09:42:26,932][00738] Num frames 8600...
[2024-10-12 09:42:27,055][00738] Num frames 8700...
[2024-10-12 09:42:27,190][00738] Num frames 8800...
[2024-10-12 09:42:27,269][00738] Avg episode rewards: #0: 30.024, true rewards: #0: 12.596
[2024-10-12 09:42:27,270][00738] Avg episode reward: 30.024, avg true_objective: 12.596
[2024-10-12 09:42:27,372][00738] Num frames 8900...
[2024-10-12 09:42:27,497][00738] Num frames 9000...
[2024-10-12 09:42:27,618][00738] Num frames 9100...
[2024-10-12 09:42:27,748][00738] Num frames 9200...
[2024-10-12 09:42:27,873][00738] Num frames 9300...
[2024-10-12 09:42:27,994][00738] Num frames 9400...
[2024-10-12 09:42:28,125][00738] Num frames 9500...
[2024-10-12 09:42:28,258][00738] Num frames 9600...
[2024-10-12 09:42:28,380][00738] Num frames 9700...
[2024-10-12 09:42:28,504][00738] Num frames 9800...
[2024-10-12 09:42:28,625][00738] Num frames 9900...
[2024-10-12 09:42:28,688][00738] Avg episode rewards: #0: 29.256, true rewards: #0: 12.381
[2024-10-12 09:42:28,692][00738] Avg episode reward: 29.256, avg true_objective: 12.381
[2024-10-12 09:42:28,815][00738] Num frames 10000...
[2024-10-12 09:42:28,937][00738] Num frames 10100...
[2024-10-12 09:42:29,064][00738] Num frames 10200...
[2024-10-12 09:42:29,193][00738] Num frames 10300...
[2024-10-12 09:42:29,316][00738] Avg episode rewards: #0: 26.948, true rewards: #0: 11.503
[2024-10-12 09:42:29,317][00738] Avg episode reward: 26.948, avg true_objective: 11.503
[2024-10-12 09:42:29,376][00738] Num frames 10400...
[2024-10-12 09:42:29,496][00738] Num frames 10500...
[2024-10-12 09:42:29,621][00738] Num frames 10600...
[2024-10-12 09:42:29,749][00738] Num frames 10700...
[2024-10-12 09:42:29,875][00738] Num frames 10800...
[2024-10-12 09:42:30,001][00738] Num frames 10900...
[2024-10-12 09:42:30,138][00738] Num frames 11000...
[2024-10-12 09:42:30,262][00738] Num frames 11100...
[2024-10-12 09:42:30,387][00738] Num frames 11200...
[2024-10-12 09:42:30,513][00738] Num frames 11300...
[2024-10-12 09:42:30,637][00738] Num frames 11400...
[2024-10-12 09:42:30,774][00738] Num frames 11500...
[2024-10-12 09:42:30,898][00738] Num frames 11600...
[2024-10-12 09:42:31,020][00738] Num frames 11700...
[2024-10-12 09:42:31,157][00738] Num frames 11800...
[2024-10-12 09:42:31,281][00738] Num frames 11900...
[2024-10-12 09:42:31,444][00738] Avg episode rewards: #0: 28.385, true rewards: #0: 11.985
[2024-10-12 09:42:31,446][00738] Avg episode reward: 28.385, avg true_objective: 11.985
[2024-10-12 09:43:43,092][00738] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-10-12 09:45:55,511][00738] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-10-12 09:45:55,513][00738] Overriding arg 'num_workers' with value 1 passed from command line
[2024-10-12 09:45:55,516][00738] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-10-12 09:45:55,517][00738] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-10-12 09:45:55,519][00738] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-10-12 09:45:55,521][00738] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-10-12 09:45:55,522][00738] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-10-12 09:45:55,523][00738] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-10-12 09:45:55,524][00738] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-10-12 09:45:55,525][00738] Adding new argument 'hf_repository'='pableitorr/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-10-12 09:45:55,526][00738] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-10-12 09:45:55,527][00738] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-10-12 09:45:55,528][00738] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-10-12 09:45:55,529][00738] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-10-12 09:45:55,530][00738] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-10-12 09:45:55,562][00738] RunningMeanStd input shape: (3, 72, 128)
[2024-10-12 09:45:55,563][00738] RunningMeanStd input shape: (1,)
[2024-10-12 09:45:55,576][00738] ConvEncoder: input_channels=3
[2024-10-12 09:45:55,611][00738] Conv encoder output size: 512
[2024-10-12 09:45:55,613][00738] Policy head output size: 512
[2024-10-12 09:45:55,631][00738] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-10-12 09:45:56,100][00738] Num frames 100...
[2024-10-12 09:45:56,247][00738] Num frames 200...
[2024-10-12 09:45:56,371][00738] Num frames 300...
[2024-10-12 09:45:56,491][00738] Num frames 400...
[2024-10-12 09:45:56,614][00738] Num frames 500...
[2024-10-12 09:45:56,735][00738] Num frames 600...
[2024-10-12 09:45:56,855][00738] Num frames 700...
[2024-10-12 09:45:56,959][00738] Avg episode rewards: #0: 15.370, true rewards: #0: 7.370
[2024-10-12 09:45:56,961][00738] Avg episode reward: 15.370, avg true_objective: 7.370
[2024-10-12 09:45:57,039][00738] Num frames 800...
[2024-10-12 09:45:57,168][00738] Num frames 900...
[2024-10-12 09:45:57,289][00738] Num frames 1000...
[2024-10-12 09:45:57,414][00738] Num frames 1100...
[2024-10-12 09:45:57,576][00738] Num frames 1200...
[2024-10-12 09:45:57,742][00738] Num frames 1300...
[2024-10-12 09:45:57,905][00738] Num frames 1400...
[2024-10-12 09:45:58,083][00738] Num frames 1500...
[2024-10-12 09:45:58,268][00738] Num frames 1600...
[2024-10-12 09:45:58,434][00738] Num frames 1700...
[2024-10-12 09:45:58,595][00738] Num frames 1800...
[2024-10-12 09:45:58,803][00738] Avg episode rewards: #0: 22.445, true rewards: #0: 9.445
[2024-10-12 09:45:58,805][00738] Avg episode reward: 22.445, avg true_objective: 9.445
[2024-10-12 09:45:58,831][00738] Num frames 1900...
[2024-10-12 09:45:59,006][00738] Num frames 2000...
[2024-10-12 09:45:59,187][00738] Num frames 2100...
[2024-10-12 09:45:59,373][00738] Num frames 2200...
[2024-10-12 09:45:59,550][00738] Num frames 2300...
[2024-10-12 09:45:59,725][00738] Num frames 2400...
[2024-10-12 09:45:59,918][00738] Avg episode rewards: #0: 18.263, true rewards: #0: 8.263
[2024-10-12 09:45:59,920][00738] Avg episode reward: 18.263, avg true_objective: 8.263
[2024-10-12 09:45:59,960][00738] Num frames 2500...
[2024-10-12 09:46:00,099][00738] Num frames 2600...
[2024-10-12 09:46:00,231][00738] Num frames 2700...
[2024-10-12 09:46:00,352][00738] Num frames 2800...
[2024-10-12 09:46:00,476][00738] Num frames 2900...
[2024-10-12 09:46:00,597][00738] Num frames 3000...
[2024-10-12 09:46:00,721][00738] Num frames 3100...
[2024-10-12 09:46:00,844][00738] Num frames 3200...
[2024-10-12 09:46:00,963][00738] Num frames 3300...
[2024-10-12 09:46:01,102][00738] Num frames 3400...
[2024-10-12 09:46:01,233][00738] Num frames 3500...
[2024-10-12 09:46:01,356][00738] Num frames 3600...
[2024-10-12 09:46:01,478][00738] Num frames 3700...
[2024-10-12 09:46:01,600][00738] Num frames 3800...
[2024-10-12 09:46:01,685][00738] Avg episode rewards: #0: 21.558, true rewards: #0: 9.557
[2024-10-12 09:46:01,686][00738] Avg episode reward: 21.558, avg true_objective: 9.557
[2024-10-12 09:46:01,780][00738] Num frames 3900...
[2024-10-12 09:46:01,905][00738] Num frames 4000...
[2024-10-12 09:46:02,027][00738] Num frames 4100...
[2024-10-12 09:46:02,171][00738] Num frames 4200...
[2024-10-12 09:46:02,293][00738] Num frames 4300...
[2024-10-12 09:46:02,414][00738] Num frames 4400...
[2024-10-12 09:46:02,535][00738] Num frames 4500...
[2024-10-12 09:46:02,657][00738] Num frames 4600...
[2024-10-12 09:46:02,792][00738] Num frames 4700...
[2024-10-12 09:46:02,876][00738] Avg episode rewards: #0: 20.646, true rewards: #0: 9.446
[2024-10-12 09:46:02,878][00738] Avg episode reward: 20.646, avg true_objective: 9.446
[2024-10-12 09:46:02,976][00738] Num frames 4800...
[2024-10-12 09:46:03,107][00738] Num frames 4900...
[2024-10-12 09:46:03,251][00738] Num frames 5000...
[2024-10-12 09:46:03,372][00738] Num frames 5100...
[2024-10-12 09:46:03,491][00738] Num frames 5200...
[2024-10-12 09:46:03,661][00738] Avg episode rewards: #0: 19.327, true rewards: #0: 8.827
[2024-10-12 09:46:03,663][00738] Avg episode reward: 19.327, avg true_objective: 8.827
[2024-10-12 09:46:03,673][00738] Num frames 5300...
[2024-10-12 09:46:03,792][00738] Num frames 5400...
[2024-10-12 09:46:03,915][00738] Num frames 5500...
[2024-10-12 09:46:04,036][00738] Num frames 5600...
[2024-10-12 09:46:04,177][00738] Num frames 5700...
[2024-10-12 09:46:04,301][00738] Num frames 5800...
[2024-10-12 09:46:04,368][00738] Avg episode rewards: #0: 17.583, true rewards: #0: 8.297
[2024-10-12 09:46:04,370][00738] Avg episode reward: 17.583, avg true_objective: 8.297
[2024-10-12 09:46:04,488][00738] Num frames 5900...
[2024-10-12 09:46:04,607][00738] Num frames 6000...
[2024-10-12 09:46:04,731][00738] Num frames 6100...
[2024-10-12 09:46:04,850][00738] Num frames 6200...
[2024-10-12 09:46:04,972][00738] Num frames 6300...
[2024-10-12 09:46:05,096][00738] Num frames 6400...
[2024-10-12 09:46:05,242][00738] Num frames 6500...
[2024-10-12 09:46:05,363][00738] Num frames 6600...
[2024-10-12 09:46:05,490][00738] Num frames 6700...
[2024-10-12 09:46:05,613][00738] Num frames 6800...
[2024-10-12 09:46:05,787][00738] Avg episode rewards: #0: 18.369, true rewards: #0: 8.619
[2024-10-12 09:46:05,789][00738] Avg episode reward: 18.369, avg true_objective: 8.619
[2024-10-12 09:46:05,798][00738] Num frames 6900...
[2024-10-12 09:46:05,915][00738] Num frames 7000...
[2024-10-12 09:46:06,040][00738] Num frames 7100...
[2024-10-12 09:46:06,172][00738] Num frames 7200...
[2024-10-12 09:46:06,301][00738] Num frames 7300...
[2024-10-12 09:46:06,420][00738] Num frames 7400...
[2024-10-12 09:46:06,549][00738] Num frames 7500...
[2024-10-12 09:46:06,670][00738] Num frames 7600...
[2024-10-12 09:46:06,793][00738] Num frames 7700...
[2024-10-12 09:46:06,910][00738] Num frames 7800...
[2024-10-12 09:46:07,037][00738] Num frames 7900...
[2024-10-12 09:46:07,205][00738] Avg episode rewards: #0: 18.648, true rewards: #0: 8.870
[2024-10-12 09:46:07,206][00738] Avg episode reward: 18.648, avg true_objective: 8.870
[2024-10-12 09:46:07,231][00738] Num frames 8000...
[2024-10-12 09:46:07,363][00738] Num frames 8100...
[2024-10-12 09:46:07,483][00738] Num frames 8200...
[2024-10-12 09:46:07,607][00738] Num frames 8300...
[2024-10-12 09:46:07,727][00738] Num frames 8400...
[2024-10-12 09:46:07,854][00738] Num frames 8500...
[2024-10-12 09:46:07,975][00738] Num frames 8600...
[2024-10-12 09:46:08,101][00738] Num frames 8700...
[2024-10-12 09:46:08,231][00738] Num frames 8800...
[2024-10-12 09:46:08,360][00738] Num frames 8900...
[2024-10-12 09:46:08,519][00738] Avg episode rewards: #0: 18.687, true rewards: #0: 8.987
[2024-10-12 09:46:08,522][00738] Avg episode reward: 18.687, avg true_objective: 8.987
[2024-10-12 09:47:00,432][00738] Replay video saved to /content/train_dir/default_experiment/replay.mp4!