[2024-11-18 09:42:43,707][01550] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-11-18 09:42:43,710][01550] Rollout worker 0 uses device cpu
[2024-11-18 09:42:43,711][01550] Rollout worker 1 uses device cpu
[2024-11-18 09:42:43,713][01550] Rollout worker 2 uses device cpu
[2024-11-18 09:42:43,714][01550] Rollout worker 3 uses device cpu
[2024-11-18 09:42:43,716][01550] Rollout worker 4 uses device cpu
[2024-11-18 09:42:43,717][01550] Rollout worker 5 uses device cpu
[2024-11-18 09:42:43,718][01550] Rollout worker 6 uses device cpu
[2024-11-18 09:42:43,720][01550] Rollout worker 7 uses device cpu
[2024-11-18 09:42:43,869][01550] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-18 09:42:43,871][01550] InferenceWorker_p0-w0: min num requests: 2
[2024-11-18 09:42:43,910][01550] Starting all processes...
[2024-11-18 09:42:43,912][01550] Starting process learner_proc0
[2024-11-18 09:42:43,960][01550] Starting all processes...
[2024-11-18 09:42:43,969][01550] Starting process inference_proc0-0
[2024-11-18 09:42:43,970][01550] Starting process rollout_proc0
[2024-11-18 09:42:43,971][01550] Starting process rollout_proc1
[2024-11-18 09:42:43,971][01550] Starting process rollout_proc2
[2024-11-18 09:42:43,971][01550] Starting process rollout_proc3
[2024-11-18 09:42:43,972][01550] Starting process rollout_proc4
[2024-11-18 09:42:43,972][01550] Starting process rollout_proc5
[2024-11-18 09:42:43,972][01550] Starting process rollout_proc6
[2024-11-18 09:42:43,972][01550] Starting process rollout_proc7
[2024-11-18 09:42:54,723][04620] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-18 09:42:54,729][04620] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-11-18 09:42:54,778][04620] Num visible devices: 1
[2024-11-18 09:42:54,818][04620] Starting seed is not provided
[2024-11-18 09:42:54,819][04620] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-18 09:42:54,820][04620] Initializing actor-critic model on device cuda:0
[2024-11-18 09:42:54,820][04620] RunningMeanStd input shape: (3, 72, 128)
[2024-11-18 09:42:54,822][04620] RunningMeanStd input shape: (1,)
[2024-11-18 09:42:54,899][04620] ConvEncoder: input_channels=3
[2024-11-18 09:42:55,219][04637] Worker 3 uses CPU cores [1]
[2024-11-18 09:42:55,250][04633] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-18 09:42:55,258][04633] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-11-18 09:42:55,291][04635] Worker 1 uses CPU cores [1]
[2024-11-18 09:42:55,356][04633] Num visible devices: 1
[2024-11-18 09:42:55,437][04640] Worker 6 uses CPU cores [0]
[2024-11-18 09:42:55,543][04636] Worker 2 uses CPU cores [0]
[2024-11-18 09:42:55,571][04634] Worker 0 uses CPU cores [0]
[2024-11-18 09:42:55,577][04639] Worker 4 uses CPU cores [0]
[2024-11-18 09:42:55,581][04638] Worker 5 uses CPU cores [1]
[2024-11-18 09:42:55,634][04620] Conv encoder output size: 512
[2024-11-18 09:42:55,637][04620] Policy head output size: 512
[2024-11-18 09:42:55,651][04641] Worker 7 uses CPU cores [1]
[2024-11-18 09:42:55,660][04620] Created Actor Critic model with architecture:
[2024-11-18 09:42:55,660][04620] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-11-18 09:42:59,523][04620] Using optimizer
[2024-11-18 09:42:59,524][04620] No checkpoints found
[2024-11-18 09:42:59,524][04620] Did not load from checkpoint, starting from scratch!
[2024-11-18 09:42:59,525][04620] Initialized policy 0 weights for model version 0
[2024-11-18 09:42:59,528][04620] LearnerWorker_p0 finished initialization!
[2024-11-18 09:42:59,528][04620] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-18 09:42:59,703][04633] RunningMeanStd input shape: (3, 72, 128)
[2024-11-18 09:42:59,704][04633] RunningMeanStd input shape: (1,)
[2024-11-18 09:42:59,716][04633] ConvEncoder: input_channels=3
[2024-11-18 09:42:59,824][04633] Conv encoder output size: 512
[2024-11-18 09:42:59,825][04633] Policy head output size: 512
[2024-11-18 09:43:00,336][01550] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-18 09:43:01,350][01550] Inference worker 0-0 is ready!
[2024-11-18 09:43:01,352][01550] All inference workers are ready! Signal rollout workers to start!
[2024-11-18 09:43:01,469][04634] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-18 09:43:01,479][04639] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-18 09:43:01,480][04636] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-18 09:43:01,483][04641] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-18 09:43:01,490][04640] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-18 09:43:01,489][04638] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-18 09:43:01,503][04637] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-18 09:43:01,513][04635] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-18 09:43:03,029][04637] Decorrelating experience for 0 frames...
[2024-11-18 09:43:03,031][04640] Decorrelating experience for 0 frames...
[2024-11-18 09:43:03,030][04641] Decorrelating experience for 0 frames...
[2024-11-18 09:43:03,032][04639] Decorrelating experience for 0 frames...
[2024-11-18 09:43:03,031][04635] Decorrelating experience for 0 frames...
[2024-11-18 09:43:03,033][04636] Decorrelating experience for 0 frames...
[2024-11-18 09:43:03,031][04634] Decorrelating experience for 0 frames...
[2024-11-18 09:43:03,860][01550] Heartbeat connected on Batcher_0
[2024-11-18 09:43:03,875][01550] Heartbeat connected on LearnerWorker_p0
[2024-11-18 09:43:03,926][01550] Heartbeat connected on InferenceWorker_p0-w0
[2024-11-18 09:43:04,456][04640] Decorrelating experience for 32 frames...
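Note: the module tree printed above fully determines the network's high-level shape: normalized (3, 72, 128) observations go through a three-layer Conv2d/ELU stack and a Linear/ELU projection to the 512-dimensional "conv encoder output", then a GRU(512, 512) core, then a 1-unit value head and a 5-way action-logits head. The following is a minimal PyTorch sketch of that structure for illustration only; the log does not print kernel sizes or strides, so the 8/4/3 convolution settings below are assumptions, and only the 512/1/5 sizes come from the printout. It is not Sample Factory's own code.

    import torch
    import torch.nn as nn

    class ActorCriticSketch(nn.Module):
        """Illustrative reconstruction of the printed module tree (not Sample Factory's implementation)."""

        def __init__(self, num_actions: int = 5):
            super().__init__()
            # (encoder): Conv2d/ELU x3 followed by Linear/ELU to the 512-dim encoder output
            self.conv_head = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),    # kernel/stride values assumed
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
                nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
            )
            self.mlp_layers = nn.Sequential(nn.LazyLinear(512), nn.ELU())
            # (core): ModelCoreRNN with GRU(512, 512)
            self.core = nn.GRU(512, 512)
            # (critic_linear) and (action_parameterization): value head and 5-way action logits
            self.critic_linear = nn.Linear(512, 1)
            self.distribution_linear = nn.Linear(512, num_actions)

        def forward(self, obs, rnn_state):
            # obs: (batch, 3, 72, 128) normalized frames; rnn_state: (1, batch, 512)
            x = self.mlp_layers(self.conv_head(obs).flatten(1))
            x, rnn_state = self.core(x.unsqueeze(0), rnn_state)
            x = x.squeeze(0)
            return self.distribution_linear(x), self.critic_linear(x), rnn_state

    model = ActorCriticSketch()
    logits, value, h = model(torch.randn(4, 3, 72, 128), torch.zeros(1, 4, 512))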
[2024-11-18 09:43:04,478][04634] Decorrelating experience for 32 frames...
[2024-11-18 09:43:04,485][04636] Decorrelating experience for 32 frames...
[2024-11-18 09:43:04,710][04635] Decorrelating experience for 32 frames...
[2024-11-18 09:43:04,715][04637] Decorrelating experience for 32 frames...
[2024-11-18 09:43:04,718][04641] Decorrelating experience for 32 frames...
[2024-11-18 09:43:04,772][04638] Decorrelating experience for 0 frames...
[2024-11-18 09:43:05,339][01550] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-18 09:43:06,608][04639] Decorrelating experience for 32 frames...
[2024-11-18 09:43:06,926][04640] Decorrelating experience for 64 frames...
[2024-11-18 09:43:06,919][04634] Decorrelating experience for 64 frames...
[2024-11-18 09:43:06,923][04636] Decorrelating experience for 64 frames...
[2024-11-18 09:43:07,092][04638] Decorrelating experience for 32 frames...
[2024-11-18 09:43:07,366][04641] Decorrelating experience for 64 frames...
[2024-11-18 09:43:07,379][04637] Decorrelating experience for 64 frames...
[2024-11-18 09:43:08,575][04635] Decorrelating experience for 64 frames...
[2024-11-18 09:43:08,613][04634] Decorrelating experience for 96 frames...
[2024-11-18 09:43:08,620][04640] Decorrelating experience for 96 frames...
[2024-11-18 09:43:08,987][04641] Decorrelating experience for 96 frames...
[2024-11-18 09:43:09,020][01550] Heartbeat connected on RolloutWorker_w0
[2024-11-18 09:43:09,033][01550] Heartbeat connected on RolloutWorker_w6
[2024-11-18 09:43:09,054][04639] Decorrelating experience for 64 frames...
[2024-11-18 09:43:09,416][01550] Heartbeat connected on RolloutWorker_w7
[2024-11-18 09:43:09,777][04636] Decorrelating experience for 96 frames...
[2024-11-18 09:43:09,975][01550] Heartbeat connected on RolloutWorker_w2
[2024-11-18 09:43:10,336][01550] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-18 09:43:10,675][04637] Decorrelating experience for 96 frames...
[2024-11-18 09:43:10,676][04638] Decorrelating experience for 64 frames...
[2024-11-18 09:43:10,896][04635] Decorrelating experience for 96 frames...
[2024-11-18 09:43:11,057][01550] Heartbeat connected on RolloutWorker_w3
[2024-11-18 09:43:11,259][01550] Heartbeat connected on RolloutWorker_w1
[2024-11-18 09:43:11,743][04639] Decorrelating experience for 96 frames...
[2024-11-18 09:43:11,878][01550] Heartbeat connected on RolloutWorker_w4
[2024-11-18 09:43:11,974][04638] Decorrelating experience for 96 frames...
[2024-11-18 09:43:12,081][01550] Heartbeat connected on RolloutWorker_w5
[2024-11-18 09:43:15,136][04620] Signal inference workers to stop experience collection...
[2024-11-18 09:43:15,149][04633] InferenceWorker_p0-w0: stopping experience collection
[2024-11-18 09:43:15,336][01550] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 60.9. Samples: 914. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-18 09:43:15,338][01550] Avg episode reward: [(0, '1.790')]
[2024-11-18 09:43:17,187][04620] Signal inference workers to resume experience collection...
[2024-11-18 09:43:17,189][04633] InferenceWorker_p0-w0: resuming experience collection
[2024-11-18 09:43:20,336][01550] Fps is (10 sec: 1638.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 16384. Throughput: 0: 178.0. Samples: 3560.
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-18 09:43:20,343][01550] Avg episode reward: [(0, '2.991')] [2024-11-18 09:43:25,336][01550] Fps is (10 sec: 2867.2, 60 sec: 1146.9, 300 sec: 1146.9). Total num frames: 28672. Throughput: 0: 251.6. Samples: 6290. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-11-18 09:43:25,342][01550] Avg episode reward: [(0, '3.625')] [2024-11-18 09:43:28,715][04633] Updated weights for policy 0, policy_version 10 (0.0022) [2024-11-18 09:43:30,336][01550] Fps is (10 sec: 2867.2, 60 sec: 1501.9, 300 sec: 1501.9). Total num frames: 45056. Throughput: 0: 340.9. Samples: 10226. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-11-18 09:43:30,338][01550] Avg episode reward: [(0, '4.318')] [2024-11-18 09:43:35,336][01550] Fps is (10 sec: 3686.4, 60 sec: 1872.5, 300 sec: 1872.5). Total num frames: 65536. Throughput: 0: 476.3. Samples: 16670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-18 09:43:35,342][01550] Avg episode reward: [(0, '4.457')] [2024-11-18 09:43:38,234][04633] Updated weights for policy 0, policy_version 20 (0.0022) [2024-11-18 09:43:40,336][01550] Fps is (10 sec: 4096.0, 60 sec: 2150.4, 300 sec: 2150.4). Total num frames: 86016. Throughput: 0: 499.7. Samples: 19988. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-18 09:43:40,338][01550] Avg episode reward: [(0, '4.333')] [2024-11-18 09:43:45,336][01550] Fps is (10 sec: 3276.8, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 98304. Throughput: 0: 545.8. Samples: 24562. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-18 09:43:45,339][01550] Avg episode reward: [(0, '4.388')] [2024-11-18 09:43:50,206][04633] Updated weights for policy 0, policy_version 30 (0.0024) [2024-11-18 09:43:50,336][01550] Fps is (10 sec: 3686.5, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 122880. Throughput: 0: 669.8. Samples: 30138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:43:50,344][01550] Avg episode reward: [(0, '4.583')] [2024-11-18 09:43:50,350][04620] Saving new best policy, reward=4.583! [2024-11-18 09:43:55,336][01550] Fps is (10 sec: 4505.6, 60 sec: 2606.5, 300 sec: 2606.5). Total num frames: 143360. Throughput: 0: 740.9. Samples: 33340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-18 09:43:55,341][01550] Avg episode reward: [(0, '4.664')] [2024-11-18 09:43:55,345][04620] Saving new best policy, reward=4.664! [2024-11-18 09:44:00,336][01550] Fps is (10 sec: 3276.8, 60 sec: 2594.1, 300 sec: 2594.1). Total num frames: 155648. Throughput: 0: 846.0. Samples: 38984. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-18 09:44:00,339][01550] Avg episode reward: [(0, '4.612')] [2024-11-18 09:44:02,113][04633] Updated weights for policy 0, policy_version 40 (0.0027) [2024-11-18 09:44:05,336][01550] Fps is (10 sec: 3276.8, 60 sec: 2935.6, 300 sec: 2709.7). Total num frames: 176128. Throughput: 0: 888.8. Samples: 43558. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-18 09:44:05,343][01550] Avg episode reward: [(0, '4.343')] [2024-11-18 09:44:10,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 2808.7). Total num frames: 196608. Throughput: 0: 902.7. Samples: 46910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:44:10,339][01550] Avg episode reward: [(0, '4.448')] [2024-11-18 09:44:11,529][04633] Updated weights for policy 0, policy_version 50 (0.0012) [2024-11-18 09:44:15,336][01550] Fps is (10 sec: 4095.9, 60 sec: 3618.1, 300 sec: 2894.5). Total num frames: 217088. Throughput: 0: 960.9. Samples: 53466. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-18 09:44:15,339][01550] Avg episode reward: [(0, '4.605')] [2024-11-18 09:44:20,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 2867.2). Total num frames: 229376. Throughput: 0: 907.4. Samples: 57504. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-18 09:44:20,340][01550] Avg episode reward: [(0, '4.555')] [2024-11-18 09:44:24,709][04633] Updated weights for policy 0, policy_version 60 (0.0021) [2024-11-18 09:44:25,336][01550] Fps is (10 sec: 2867.3, 60 sec: 3618.1, 300 sec: 2891.3). Total num frames: 245760. Throughput: 0: 887.8. Samples: 59940. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-18 09:44:25,343][01550] Avg episode reward: [(0, '4.415')] [2024-11-18 09:44:30,336][01550] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3003.7). Total num frames: 270336. Throughput: 0: 922.4. Samples: 66070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:44:30,340][01550] Avg episode reward: [(0, '4.668')] [2024-11-18 09:44:30,352][04620] Saving new best policy, reward=4.668! [2024-11-18 09:44:35,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 2975.0). Total num frames: 282624. Throughput: 0: 906.8. Samples: 70942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:44:35,338][01550] Avg episode reward: [(0, '4.616')] [2024-11-18 09:44:36,362][04633] Updated weights for policy 0, policy_version 70 (0.0021) [2024-11-18 09:44:40,336][01550] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 2990.1). Total num frames: 299008. Throughput: 0: 880.0. Samples: 72942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:44:40,340][01550] Avg episode reward: [(0, '4.300')] [2024-11-18 09:44:40,351][04620] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth... [2024-11-18 09:44:45,335][01550] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3081.8). Total num frames: 323584. Throughput: 0: 897.7. Samples: 79380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-18 09:44:45,338][01550] Avg episode reward: [(0, '4.428')] [2024-11-18 09:44:46,164][04633] Updated weights for policy 0, policy_version 80 (0.0028) [2024-11-18 09:44:50,337][01550] Fps is (10 sec: 3276.2, 60 sec: 3481.5, 300 sec: 3016.1). Total num frames: 331776. Throughput: 0: 890.1. Samples: 83616. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-18 09:44:50,341][01550] Avg episode reward: [(0, '4.554')] [2024-11-18 09:44:55,336][01550] Fps is (10 sec: 2457.6, 60 sec: 3413.3, 300 sec: 3027.5). Total num frames: 348160. Throughput: 0: 858.4. Samples: 85536. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-18 09:44:55,338][01550] Avg episode reward: [(0, '4.676')] [2024-11-18 09:44:55,343][04620] Saving new best policy, reward=4.676! [2024-11-18 09:44:59,702][04633] Updated weights for policy 0, policy_version 90 (0.0017) [2024-11-18 09:45:00,337][01550] Fps is (10 sec: 3686.4, 60 sec: 3549.8, 300 sec: 3072.0). Total num frames: 368640. Throughput: 0: 839.0. Samples: 91224. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:45:00,341][01550] Avg episode reward: [(0, '4.409')] [2024-11-18 09:45:05,336][01550] Fps is (10 sec: 4095.9, 60 sec: 3549.9, 300 sec: 3113.0). Total num frames: 389120. Throughput: 0: 896.4. Samples: 97842. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-18 09:45:05,342][01550] Avg episode reward: [(0, '4.227')] [2024-11-18 09:45:10,338][01550] Fps is (10 sec: 3276.5, 60 sec: 3413.2, 300 sec: 3087.7). Total num frames: 401408. 
Throughput: 0: 887.7. Samples: 99888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:45:10,340][01550] Avg episode reward: [(0, '4.344')] [2024-11-18 09:45:11,867][04633] Updated weights for policy 0, policy_version 100 (0.0022) [2024-11-18 09:45:15,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3125.1). Total num frames: 421888. Throughput: 0: 858.4. Samples: 104698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:45:15,338][01550] Avg episode reward: [(0, '4.558')] [2024-11-18 09:45:20,336][01550] Fps is (10 sec: 4506.8, 60 sec: 3618.1, 300 sec: 3189.0). Total num frames: 446464. Throughput: 0: 895.1. Samples: 111220. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-18 09:45:20,338][01550] Avg episode reward: [(0, '4.503')] [2024-11-18 09:45:21,089][04633] Updated weights for policy 0, policy_version 110 (0.0026) [2024-11-18 09:45:25,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3192.1). Total num frames: 462848. Throughput: 0: 917.2. Samples: 114216. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-18 09:45:25,341][01550] Avg episode reward: [(0, '4.429')] [2024-11-18 09:45:30,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3194.9). Total num frames: 479232. Throughput: 0: 867.9. Samples: 118436. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-11-18 09:45:30,338][01550] Avg episode reward: [(0, '4.471')] [2024-11-18 09:45:33,099][04633] Updated weights for policy 0, policy_version 120 (0.0019) [2024-11-18 09:45:35,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3224.0). Total num frames: 499712. Throughput: 0: 912.3. Samples: 124668. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-11-18 09:45:35,338][01550] Avg episode reward: [(0, '4.570')] [2024-11-18 09:45:40,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3251.2). Total num frames: 520192. Throughput: 0: 946.1. Samples: 128110. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-18 09:45:40,340][01550] Avg episode reward: [(0, '4.597')] [2024-11-18 09:45:44,707][04633] Updated weights for policy 0, policy_version 130 (0.0029) [2024-11-18 09:45:45,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3227.2). Total num frames: 532480. Throughput: 0: 922.2. Samples: 132720. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-18 09:45:45,342][01550] Avg episode reward: [(0, '4.591')] [2024-11-18 09:45:50,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3252.7). Total num frames: 552960. Throughput: 0: 899.3. Samples: 138312. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-18 09:45:50,338][01550] Avg episode reward: [(0, '4.481')] [2024-11-18 09:45:54,366][04633] Updated weights for policy 0, policy_version 140 (0.0018) [2024-11-18 09:45:55,336][01550] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3300.2). Total num frames: 577536. Throughput: 0: 930.5. Samples: 141758. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-18 09:45:55,338][01550] Avg episode reward: [(0, '4.551')] [2024-11-18 09:46:00,337][01550] Fps is (10 sec: 3685.7, 60 sec: 3686.4, 300 sec: 3276.8). Total num frames: 589824. Throughput: 0: 951.6. Samples: 147520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:46:00,340][01550] Avg episode reward: [(0, '4.374')] [2024-11-18 09:46:05,336][01550] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3298.9). Total num frames: 610304. Throughput: 0: 909.5. Samples: 152148. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:46:05,343][01550] Avg episode reward: [(0, '4.122')] [2024-11-18 09:46:06,154][04633] Updated weights for policy 0, policy_version 150 (0.0032) [2024-11-18 09:46:10,336][01550] Fps is (10 sec: 4096.8, 60 sec: 3823.1, 300 sec: 3319.9). Total num frames: 630784. Throughput: 0: 918.4. Samples: 155542. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-18 09:46:10,338][01550] Avg episode reward: [(0, '4.248')] [2024-11-18 09:46:15,336][01550] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3339.8). Total num frames: 651264. Throughput: 0: 973.9. Samples: 162260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:46:15,339][01550] Avg episode reward: [(0, '4.384')] [2024-11-18 09:46:16,694][04633] Updated weights for policy 0, policy_version 160 (0.0015) [2024-11-18 09:46:20,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3317.8). Total num frames: 663552. Throughput: 0: 925.5. Samples: 166316. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:46:20,338][01550] Avg episode reward: [(0, '4.393')] [2024-11-18 09:46:25,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3356.7). Total num frames: 688128. Throughput: 0: 915.7. Samples: 169316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:46:25,341][01550] Avg episode reward: [(0, '4.326')] [2024-11-18 09:46:27,125][04633] Updated weights for policy 0, policy_version 170 (0.0015) [2024-11-18 09:46:30,341][01550] Fps is (10 sec: 4503.1, 60 sec: 3822.6, 300 sec: 3374.2). Total num frames: 708608. Throughput: 0: 963.7. Samples: 176090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:46:30,344][01550] Avg episode reward: [(0, '4.443')] [2024-11-18 09:46:35,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3372.1). Total num frames: 724992. Throughput: 0: 949.4. Samples: 181036. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-11-18 09:46:35,340][01550] Avg episode reward: [(0, '4.584')] [2024-11-18 09:46:38,809][04633] Updated weights for policy 0, policy_version 180 (0.0015) [2024-11-18 09:46:40,336][01550] Fps is (10 sec: 3278.6, 60 sec: 3686.4, 300 sec: 3369.9). Total num frames: 741376. Throughput: 0: 919.9. Samples: 183152. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-11-18 09:46:40,342][01550] Avg episode reward: [(0, '4.466')] [2024-11-18 09:46:40,352][04620] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000181_741376.pth... [2024-11-18 09:46:45,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3386.0). Total num frames: 761856. Throughput: 0: 939.7. Samples: 189806. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-11-18 09:46:45,338][01550] Avg episode reward: [(0, '4.524')] [2024-11-18 09:46:48,447][04633] Updated weights for policy 0, policy_version 190 (0.0020) [2024-11-18 09:46:50,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3401.5). Total num frames: 782336. Throughput: 0: 965.2. Samples: 195584. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-18 09:46:50,340][01550] Avg episode reward: [(0, '4.682')] [2024-11-18 09:46:50,355][04620] Saving new best policy, reward=4.682! [2024-11-18 09:46:55,335][01550] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3398.8). Total num frames: 798720. Throughput: 0: 934.2. Samples: 197580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-18 09:46:55,340][01550] Avg episode reward: [(0, '4.854')] [2024-11-18 09:46:55,343][04620] Saving new best policy, reward=4.854! 
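Note: each "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" line above reports the environment frame rate over trailing 10-, 60-, and 300-second windows alongside the cumulative frame count and sampling throughput. The sketch below shows one way such windowed rates can be derived from (timestamp, total_frames) samples; it is an illustration of the idea only, not Sample Factory's actual reporting code, and the FpsTracker name is made up for the example.

    import time
    from collections import deque

    class FpsTracker:
        """Trailing-window frame-rate estimates from (timestamp, total_frames) samples."""

        def __init__(self, windows=(10, 60, 300)):
            self.windows = windows
            self.samples = deque()  # (timestamp, total_frames) pairs, oldest first

        def record(self, total_frames, now=None):
            now = time.time() if now is None else now
            self.samples.append((now, total_frames))
            # Keep only as much history as the largest window needs.
            while now - self.samples[0][0] > max(self.windows):
                self.samples.popleft()

        def fps(self):
            now, frames = self.samples[-1]
            rates = {}
            for w in self.windows:
                # Oldest retained sample that still falls inside this window.
                past_t, past_f = next(((t, f) for t, f in self.samples if now - t <= w),
                                      (now, frames))
                dt = now - past_t
                rates[w] = (frames - past_f) / dt if dt > 0 else float("nan")
            return rates

    tracker = FpsTracker()
    tracker.record(0, now=0.0)
    tracker.record(16384, now=10.0)
    print(tracker.fps())  # {10: 1638.4, 60: 1638.4, 300: 1638.4}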
[2024-11-18 09:47:00,080][04633] Updated weights for policy 0, policy_version 200 (0.0027) [2024-11-18 09:47:00,336][01550] Fps is (10 sec: 3686.3, 60 sec: 3823.0, 300 sec: 3413.3). Total num frames: 819200. Throughput: 0: 912.9. Samples: 203342. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-18 09:47:00,348][01550] Avg episode reward: [(0, '4.712')] [2024-11-18 09:47:05,336][01550] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3427.3). Total num frames: 839680. Throughput: 0: 974.1. Samples: 210152. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-11-18 09:47:05,342][01550] Avg episode reward: [(0, '4.406')] [2024-11-18 09:47:10,336][01550] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3407.9). Total num frames: 851968. Throughput: 0: 953.6. Samples: 212228. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-11-18 09:47:10,339][01550] Avg episode reward: [(0, '4.507')] [2024-11-18 09:47:12,049][04633] Updated weights for policy 0, policy_version 210 (0.0025) [2024-11-18 09:47:15,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3421.4). Total num frames: 872448. Throughput: 0: 912.3. Samples: 217138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:47:15,338][01550] Avg episode reward: [(0, '4.525')] [2024-11-18 09:47:20,337][01550] Fps is (10 sec: 4095.6, 60 sec: 3822.8, 300 sec: 3434.3). Total num frames: 892928. Throughput: 0: 949.8. Samples: 223778. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-11-18 09:47:20,342][01550] Avg episode reward: [(0, '4.556')] [2024-11-18 09:47:21,314][04633] Updated weights for policy 0, policy_version 220 (0.0014) [2024-11-18 09:47:25,337][01550] Fps is (10 sec: 3685.7, 60 sec: 3686.3, 300 sec: 3431.3). Total num frames: 909312. Throughput: 0: 972.8. Samples: 226932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-18 09:47:25,341][01550] Avg episode reward: [(0, '4.637')] [2024-11-18 09:47:30,336][01550] Fps is (10 sec: 3277.3, 60 sec: 3618.5, 300 sec: 3428.5). Total num frames: 925696. Throughput: 0: 918.3. Samples: 231130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:47:30,338][01550] Avg episode reward: [(0, '4.496')] [2024-11-18 09:47:32,897][04633] Updated weights for policy 0, policy_version 230 (0.0012) [2024-11-18 09:47:35,336][01550] Fps is (10 sec: 4096.8, 60 sec: 3754.7, 300 sec: 3455.5). Total num frames: 950272. Throughput: 0: 935.6. Samples: 237688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:47:35,339][01550] Avg episode reward: [(0, '4.355')] [2024-11-18 09:47:40,336][01550] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3467.0). Total num frames: 970752. Throughput: 0: 965.2. Samples: 241016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:47:40,338][01550] Avg episode reward: [(0, '4.359')] [2024-11-18 09:47:43,856][04633] Updated weights for policy 0, policy_version 240 (0.0020) [2024-11-18 09:47:45,336][01550] Fps is (10 sec: 3276.6, 60 sec: 3686.4, 300 sec: 3449.3). Total num frames: 983040. Throughput: 0: 945.1. Samples: 245872. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-18 09:47:45,345][01550] Avg episode reward: [(0, '4.427')] [2024-11-18 09:47:50,336][01550] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3460.4). Total num frames: 1003520. Throughput: 0: 912.8. Samples: 251228. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-18 09:47:50,338][01550] Avg episode reward: [(0, '4.330')] [2024-11-18 09:47:54,218][04633] Updated weights for policy 0, policy_version 250 (0.0019) [2024-11-18 09:47:55,336][01550] Fps is (10 sec: 4505.9, 60 sec: 3822.9, 300 sec: 3485.1). Total num frames: 1028096. Throughput: 0: 939.1. Samples: 254488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:47:55,338][01550] Avg episode reward: [(0, '4.722')] [2024-11-18 09:48:00,336][01550] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3526.8). Total num frames: 1040384. Throughput: 0: 960.4. Samples: 260354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:48:00,341][01550] Avg episode reward: [(0, '4.864')] [2024-11-18 09:48:00,356][04620] Saving new best policy, reward=4.864! [2024-11-18 09:48:05,336][01550] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 1056768. Throughput: 0: 910.8. Samples: 264764. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:48:05,338][01550] Avg episode reward: [(0, '4.650')] [2024-11-18 09:48:06,271][04633] Updated weights for policy 0, policy_version 260 (0.0039) [2024-11-18 09:48:10,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3665.6). Total num frames: 1081344. Throughput: 0: 914.4. Samples: 268080. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:48:10,347][01550] Avg episode reward: [(0, '4.538')] [2024-11-18 09:48:15,336][01550] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3679.5). Total num frames: 1101824. Throughput: 0: 972.4. Samples: 274886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-18 09:48:15,342][01550] Avg episode reward: [(0, '4.520')] [2024-11-18 09:48:16,534][04633] Updated weights for policy 0, policy_version 270 (0.0020) [2024-11-18 09:48:20,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3679.5). Total num frames: 1114112. Throughput: 0: 916.7. Samples: 278938. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-18 09:48:20,340][01550] Avg episode reward: [(0, '4.585')] [2024-11-18 09:48:25,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3707.2). Total num frames: 1138688. Throughput: 0: 907.7. Samples: 281864. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-11-18 09:48:25,340][01550] Avg episode reward: [(0, '4.754')] [2024-11-18 09:48:27,251][04633] Updated weights for policy 0, policy_version 280 (0.0013) [2024-11-18 09:48:30,336][01550] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3707.2). Total num frames: 1159168. Throughput: 0: 948.9. Samples: 288574. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:48:30,342][01550] Avg episode reward: [(0, '4.749')] [2024-11-18 09:48:35,340][01550] Fps is (10 sec: 3275.5, 60 sec: 3686.1, 300 sec: 3679.4). Total num frames: 1171456. Throughput: 0: 943.6. Samples: 293694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-18 09:48:35,346][01550] Avg episode reward: [(0, '4.735')] [2024-11-18 09:48:38,972][04633] Updated weights for policy 0, policy_version 290 (0.0023) [2024-11-18 09:48:40,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 1191936. Throughput: 0: 918.2. Samples: 295808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:48:40,340][01550] Avg episode reward: [(0, '4.626')] [2024-11-18 09:48:40,357][04620] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000291_1191936.pth... 
[2024-11-18 09:48:40,504][04620] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth [2024-11-18 09:48:45,336][01550] Fps is (10 sec: 4097.7, 60 sec: 3823.0, 300 sec: 3693.3). Total num frames: 1212416. Throughput: 0: 932.0. Samples: 302296. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-18 09:48:45,339][01550] Avg episode reward: [(0, '4.581')] [2024-11-18 09:48:48,833][04633] Updated weights for policy 0, policy_version 300 (0.0012) [2024-11-18 09:48:50,338][01550] Fps is (10 sec: 3685.4, 60 sec: 3754.5, 300 sec: 3679.4). Total num frames: 1228800. Throughput: 0: 958.3. Samples: 307890. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-18 09:48:50,340][01550] Avg episode reward: [(0, '4.567')] [2024-11-18 09:48:55,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 1245184. Throughput: 0: 926.1. Samples: 309756. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-18 09:48:55,338][01550] Avg episode reward: [(0, '4.768')] [2024-11-18 09:49:00,336][01550] Fps is (10 sec: 3687.4, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 1265664. Throughput: 0: 888.0. Samples: 314844. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-18 09:49:00,339][01550] Avg episode reward: [(0, '4.836')] [2024-11-18 09:49:01,588][04633] Updated weights for policy 0, policy_version 310 (0.0038) [2024-11-18 09:49:05,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 1282048. Throughput: 0: 935.4. Samples: 321032. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-18 09:49:05,338][01550] Avg episode reward: [(0, '4.568')] [2024-11-18 09:49:10,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 1298432. Throughput: 0: 922.0. Samples: 323356. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:49:10,341][01550] Avg episode reward: [(0, '4.645')] [2024-11-18 09:49:13,719][04633] Updated weights for policy 0, policy_version 320 (0.0012) [2024-11-18 09:49:15,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3679.5). Total num frames: 1314816. Throughput: 0: 872.8. Samples: 327852. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-18 09:49:15,342][01550] Avg episode reward: [(0, '4.883')] [2024-11-18 09:49:15,348][04620] Saving new best policy, reward=4.883! [2024-11-18 09:49:20,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 1335296. Throughput: 0: 901.8. Samples: 334270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:49:20,338][01550] Avg episode reward: [(0, '4.764')] [2024-11-18 09:49:23,247][04633] Updated weights for policy 0, policy_version 330 (0.0014) [2024-11-18 09:49:25,337][01550] Fps is (10 sec: 4095.3, 60 sec: 3618.0, 300 sec: 3679.4). Total num frames: 1355776. Throughput: 0: 929.1. Samples: 337620. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-18 09:49:25,340][01550] Avg episode reward: [(0, '4.979')] [2024-11-18 09:49:25,344][04620] Saving new best policy, reward=4.979! [2024-11-18 09:49:30,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3679.5). Total num frames: 1368064. Throughput: 0: 877.4. Samples: 341778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:49:30,340][01550] Avg episode reward: [(0, '5.123')] [2024-11-18 09:49:30,349][04620] Saving new best policy, reward=5.123! 
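Note: the learner entries above show the checkpoint bookkeeping that repeats throughout this log: milestone files named checkpoint_<policy_version>_<env_frames>.pth are written periodically, the oldest milestone is removed once newer ones exist, and a separate snapshot is saved whenever the average episode reward sets a new record ("Saving new best policy, reward=..."). The following sketch mimics that observed behaviour; it is illustrative only, not Sample Factory's implementation, and the keep_latest count and the save_state callable are assumptions.

    from pathlib import Path

    def save_milestone(ckpt_dir: Path, policy_version: int, env_frames: int,
                       save_state, keep_latest: int = 2):
        """Write checkpoint_<version>_<frames>.pth and prune the oldest milestones."""
        ckpt_dir.mkdir(parents=True, exist_ok=True)
        path = ckpt_dir / f"checkpoint_{policy_version:09d}_{env_frames}.pth"
        print(f"Saving {path}...")
        save_state(path)  # e.g. lambda p: torch.save(model.state_dict(), p)
        # Zero-padded version numbers keep lexicographic order equal to chronological order.
        milestones = sorted(ckpt_dir.glob("checkpoint_*.pth"))
        for stale in milestones[:-keep_latest]:
            print(f"Removing {stale}")
            stale.unlink()

    class BestPolicyTracker:
        """Mirrors the 'Saving new best policy, reward=...' lines."""

        def __init__(self):
            self.best_reward = float("-inf")

        def update(self, avg_reward: float, ckpt_dir: Path, save_state):
            if avg_reward > self.best_reward:
                self.best_reward = avg_reward
                print(f"Saving new best policy, reward={avg_reward:.3f}!")
                save_state(ckpt_dir / "best_policy.pth")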
[2024-11-18 09:49:35,324][04633] Updated weights for policy 0, policy_version 340 (0.0013) [2024-11-18 09:49:35,336][01550] Fps is (10 sec: 3687.0, 60 sec: 3686.7, 300 sec: 3707.2). Total num frames: 1392640. Throughput: 0: 885.4. Samples: 347730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:49:35,342][01550] Avg episode reward: [(0, '5.291')] [2024-11-18 09:49:35,344][04620] Saving new best policy, reward=5.291! [2024-11-18 09:49:40,336][01550] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 1413120. Throughput: 0: 916.1. Samples: 350982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:49:40,340][01550] Avg episode reward: [(0, '5.206')] [2024-11-18 09:49:45,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3707.3). Total num frames: 1425408. Throughput: 0: 914.8. Samples: 356008. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-18 09:49:45,338][01550] Avg episode reward: [(0, '5.202')] [2024-11-18 09:49:47,559][04633] Updated weights for policy 0, policy_version 350 (0.0022) [2024-11-18 09:49:50,336][01550] Fps is (10 sec: 2867.2, 60 sec: 3550.0, 300 sec: 3707.2). Total num frames: 1441792. Throughput: 0: 888.2. Samples: 361002. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:49:50,341][01550] Avg episode reward: [(0, '5.116')] [2024-11-18 09:49:55,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1466368. Throughput: 0: 910.3. Samples: 364318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:49:55,338][01550] Avg episode reward: [(0, '5.143')] [2024-11-18 09:49:56,964][04633] Updated weights for policy 0, policy_version 360 (0.0016) [2024-11-18 09:50:00,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 1482752. Throughput: 0: 944.4. Samples: 370352. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-18 09:50:00,338][01550] Avg episode reward: [(0, '4.995')] [2024-11-18 09:50:05,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 1499136. Throughput: 0: 895.6. Samples: 374570. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:50:05,338][01550] Avg episode reward: [(0, '4.991')] [2024-11-18 09:50:08,994][04633] Updated weights for policy 0, policy_version 370 (0.0016) [2024-11-18 09:50:10,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1519616. Throughput: 0: 891.1. Samples: 377716. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:50:10,338][01550] Avg episode reward: [(0, '5.035')] [2024-11-18 09:50:15,339][01550] Fps is (10 sec: 4094.6, 60 sec: 3754.5, 300 sec: 3707.2). Total num frames: 1540096. Throughput: 0: 946.0. Samples: 384352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:50:15,341][01550] Avg episode reward: [(0, '5.291')] [2024-11-18 09:50:20,338][01550] Fps is (10 sec: 3275.9, 60 sec: 3618.0, 300 sec: 3693.3). Total num frames: 1552384. Throughput: 0: 912.4. Samples: 388790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:50:20,343][01550] Avg episode reward: [(0, '5.433')] [2024-11-18 09:50:20,356][04620] Saving new best policy, reward=5.433! [2024-11-18 09:50:20,712][04633] Updated weights for policy 0, policy_version 380 (0.0013) [2024-11-18 09:50:25,336][01550] Fps is (10 sec: 3277.9, 60 sec: 3618.2, 300 sec: 3707.2). Total num frames: 1572864. Throughput: 0: 888.8. Samples: 390978. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:50:25,341][01550] Avg episode reward: [(0, '5.194')] [2024-11-18 09:50:30,336][01550] Fps is (10 sec: 4097.0, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 1593344. Throughput: 0: 924.6. Samples: 397616. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-18 09:50:30,346][01550] Avg episode reward: [(0, '5.445')] [2024-11-18 09:50:30,359][04620] Saving new best policy, reward=5.445! [2024-11-18 09:50:30,596][04633] Updated weights for policy 0, policy_version 390 (0.0012) [2024-11-18 09:50:35,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 1609728. Throughput: 0: 937.1. Samples: 403172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:50:35,340][01550] Avg episode reward: [(0, '5.450')] [2024-11-18 09:50:35,346][04620] Saving new best policy, reward=5.450! [2024-11-18 09:50:40,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3707.2). Total num frames: 1626112. Throughput: 0: 906.6. Samples: 405116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:50:40,338][01550] Avg episode reward: [(0, '5.461')] [2024-11-18 09:50:40,349][04620] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000397_1626112.pth... [2024-11-18 09:50:40,478][04620] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000181_741376.pth [2024-11-18 09:50:40,491][04620] Saving new best policy, reward=5.461! [2024-11-18 09:50:42,969][04633] Updated weights for policy 0, policy_version 400 (0.0021) [2024-11-18 09:50:45,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 1646592. Throughput: 0: 897.6. Samples: 410742. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-18 09:50:45,342][01550] Avg episode reward: [(0, '5.660')] [2024-11-18 09:50:45,345][04620] Saving new best policy, reward=5.660! [2024-11-18 09:50:50,344][01550] Fps is (10 sec: 4092.5, 60 sec: 3754.1, 300 sec: 3693.2). Total num frames: 1667072. Throughput: 0: 945.4. Samples: 417120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:50:50,347][01550] Avg episode reward: [(0, '5.831')] [2024-11-18 09:50:50,358][04620] Saving new best policy, reward=5.831! [2024-11-18 09:50:54,613][04633] Updated weights for policy 0, policy_version 410 (0.0018) [2024-11-18 09:50:55,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3693.4). Total num frames: 1679360. Throughput: 0: 919.5. Samples: 419094. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-11-18 09:50:55,338][01550] Avg episode reward: [(0, '5.842')] [2024-11-18 09:50:55,341][04620] Saving new best policy, reward=5.842! [2024-11-18 09:51:00,336][01550] Fps is (10 sec: 3279.6, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 1699840. Throughput: 0: 881.4. Samples: 424012. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-18 09:51:00,339][01550] Avg episode reward: [(0, '6.266')] [2024-11-18 09:51:00,351][04620] Saving new best policy, reward=6.266! [2024-11-18 09:51:04,344][04633] Updated weights for policy 0, policy_version 420 (0.0029) [2024-11-18 09:51:05,336][01550] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 1724416. Throughput: 0: 928.2. Samples: 430556. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-18 09:51:05,338][01550] Avg episode reward: [(0, '6.947')] [2024-11-18 09:51:05,342][04620] Saving new best policy, reward=6.947! 
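Note: the "Policy #0 lag" triple in the status lines can be read as the gap between the learner's current policy version (the number in the "Updated weights for policy 0, policy_version N" entries) and the version that generated the experience currently being consumed. That reading is an assumption about what the log reports rather than something stated in it; under it, the min/avg/max values come from a simple per-batch calculation like the one below, with made-up version numbers.

    # Hypothetical batch: policy versions attached to eight rollouts vs. the learner's version.
    learner_policy_version = 420
    rollout_policy_versions = [420, 419, 420, 418, 420, 420, 419, 420]

    lags = [learner_policy_version - v for v in rollout_policy_versions]
    print(f"Policy #0 lag: (min: {min(lags):.1f}, avg: {sum(lags) / len(lags):.1f}, max: {max(lags):.1f})")
    # -> Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)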
[2024-11-18 09:51:10,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 1736704. Throughput: 0: 944.2. Samples: 433468. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:51:10,342][01550] Avg episode reward: [(0, '6.881')] [2024-11-18 09:51:15,336][01550] Fps is (10 sec: 2867.2, 60 sec: 3550.1, 300 sec: 3693.3). Total num frames: 1753088. Throughput: 0: 886.4. Samples: 437504. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-18 09:51:15,341][01550] Avg episode reward: [(0, '7.383')] [2024-11-18 09:51:15,347][04620] Saving new best policy, reward=7.383! [2024-11-18 09:51:16,539][04633] Updated weights for policy 0, policy_version 430 (0.0019) [2024-11-18 09:51:20,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3754.8, 300 sec: 3693.3). Total num frames: 1777664. Throughput: 0: 909.0. Samples: 444078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:51:20,342][01550] Avg episode reward: [(0, '7.783')] [2024-11-18 09:51:20,350][04620] Saving new best policy, reward=7.783! [2024-11-18 09:51:25,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1794048. Throughput: 0: 938.0. Samples: 447328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:51:25,342][01550] Avg episode reward: [(0, '8.371')] [2024-11-18 09:51:25,344][04620] Saving new best policy, reward=8.371! [2024-11-18 09:51:27,564][04633] Updated weights for policy 0, policy_version 440 (0.0016) [2024-11-18 09:51:30,336][01550] Fps is (10 sec: 2867.0, 60 sec: 3549.8, 300 sec: 3665.6). Total num frames: 1806336. Throughput: 0: 912.6. Samples: 451810. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-18 09:51:30,340][01550] Avg episode reward: [(0, '8.338')] [2024-11-18 09:51:35,335][01550] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 1830912. Throughput: 0: 904.8. Samples: 457828. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:51:35,342][01550] Avg episode reward: [(0, '8.878')] [2024-11-18 09:51:35,345][04620] Saving new best policy, reward=8.878! [2024-11-18 09:51:37,551][04633] Updated weights for policy 0, policy_version 450 (0.0013) [2024-11-18 09:51:40,335][01550] Fps is (10 sec: 4915.5, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 1855488. Throughput: 0: 935.3. Samples: 461182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:51:40,343][01550] Avg episode reward: [(0, '8.463')] [2024-11-18 09:51:45,339][01550] Fps is (10 sec: 3685.3, 60 sec: 3686.2, 300 sec: 3679.4). Total num frames: 1867776. Throughput: 0: 945.4. Samples: 466556. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-18 09:51:45,346][01550] Avg episode reward: [(0, '8.583')] [2024-11-18 09:51:49,392][04633] Updated weights for policy 0, policy_version 460 (0.0020) [2024-11-18 09:51:50,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3686.9, 300 sec: 3693.3). Total num frames: 1888256. Throughput: 0: 909.9. Samples: 471502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:51:50,338][01550] Avg episode reward: [(0, '8.822')] [2024-11-18 09:51:55,336][01550] Fps is (10 sec: 4097.3, 60 sec: 3822.9, 300 sec: 3693.3). Total num frames: 1908736. Throughput: 0: 919.7. Samples: 474854. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:51:55,338][01550] Avg episode reward: [(0, '9.218')] [2024-11-18 09:51:55,343][04620] Saving new best policy, reward=9.218! 
[2024-11-18 09:51:59,486][04633] Updated weights for policy 0, policy_version 470 (0.0029) [2024-11-18 09:52:00,336][01550] Fps is (10 sec: 3686.2, 60 sec: 3754.6, 300 sec: 3679.5). Total num frames: 1925120. Throughput: 0: 969.9. Samples: 481150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:52:00,343][01550] Avg episode reward: [(0, '9.658')] [2024-11-18 09:52:00,359][04620] Saving new best policy, reward=9.658! [2024-11-18 09:52:05,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3693.4). Total num frames: 1941504. Throughput: 0: 915.0. Samples: 485254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-18 09:52:05,342][01550] Avg episode reward: [(0, '9.236')] [2024-11-18 09:52:10,312][04633] Updated weights for policy 0, policy_version 480 (0.0016) [2024-11-18 09:52:10,336][01550] Fps is (10 sec: 4096.2, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 1966080. Throughput: 0: 918.4. Samples: 488654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:52:10,338][01550] Avg episode reward: [(0, '10.238')] [2024-11-18 09:52:10,350][04620] Saving new best policy, reward=10.238! [2024-11-18 09:52:15,335][01550] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3707.2). Total num frames: 1986560. Throughput: 0: 968.4. Samples: 495388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-18 09:52:15,338][01550] Avg episode reward: [(0, '10.125')] [2024-11-18 09:52:20,336][01550] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3693.4). Total num frames: 1998848. Throughput: 0: 935.3. Samples: 499918. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-18 09:52:20,339][01550] Avg episode reward: [(0, '10.715')] [2024-11-18 09:52:20,358][04620] Saving new best policy, reward=10.715! [2024-11-18 09:52:22,319][04633] Updated weights for policy 0, policy_version 490 (0.0040) [2024-11-18 09:52:25,337][01550] Fps is (10 sec: 3276.2, 60 sec: 3754.6, 300 sec: 3707.2). Total num frames: 2019328. Throughput: 0: 913.5. Samples: 502292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-18 09:52:25,341][01550] Avg episode reward: [(0, '11.405')] [2024-11-18 09:52:25,350][04620] Saving new best policy, reward=11.405! [2024-11-18 09:52:30,336][01550] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3693.3). Total num frames: 2039808. Throughput: 0: 945.3. Samples: 509090. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-18 09:52:30,340][01550] Avg episode reward: [(0, '11.265')] [2024-11-18 09:52:31,556][04633] Updated weights for policy 0, policy_version 500 (0.0018) [2024-11-18 09:52:35,336][01550] Fps is (10 sec: 3687.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 2056192. Throughput: 0: 958.8. Samples: 514648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:52:35,338][01550] Avg episode reward: [(0, '11.484')] [2024-11-18 09:52:35,340][04620] Saving new best policy, reward=11.484! [2024-11-18 09:52:40,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 2072576. Throughput: 0: 930.0. Samples: 516704. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:52:40,338][01550] Avg episode reward: [(0, '11.204')] [2024-11-18 09:52:40,348][04620] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000506_2072576.pth... 
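Note: the milestone file saved above is a regular PyTorch checkpoint, so it can be inspected offline with plain torch.load. The path below is copied from the log entry; the dictionary layout inside the file is trainer-specific and not shown anywhere in this log, so the 'model' key is a guess and the snippet falls back to printing whatever keys are actually present.

    import torch

    # Path copied from the log entry above; adjust to whichever milestone you want to inspect.
    ckpt_path = "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000506_2072576.pth"
    # CPU is enough for offline inspection; on newer PyTorch you may need weights_only=False
    # if the file stores config or optimizer objects in addition to tensors.
    checkpoint = torch.load(ckpt_path, map_location="cpu")

    if isinstance(checkpoint, dict):
        print(list(checkpoint.keys()))
        state_dict = checkpoint.get("model", checkpoint)  # 'model' key is an assumption
    else:
        state_dict = checkpoint

    if hasattr(state_dict, "items"):
        for name, value in list(state_dict.items())[:5]:
            print(name, tuple(value.shape) if hasattr(value, "shape") else type(value))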
[2024-11-18 09:52:40,480][04620] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000291_1191936.pth [2024-11-18 09:52:43,406][04633] Updated weights for policy 0, policy_version 510 (0.0018) [2024-11-18 09:52:45,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3707.2). Total num frames: 2097152. Throughput: 0: 927.2. Samples: 522874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:52:45,340][01550] Avg episode reward: [(0, '11.005')] [2024-11-18 09:52:50,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 2113536. Throughput: 0: 978.8. Samples: 529298. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-18 09:52:50,339][01550] Avg episode reward: [(0, '11.030')] [2024-11-18 09:52:54,711][04633] Updated weights for policy 0, policy_version 520 (0.0012) [2024-11-18 09:52:55,336][01550] Fps is (10 sec: 3276.6, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2129920. Throughput: 0: 949.8. Samples: 531394. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-11-18 09:52:55,339][01550] Avg episode reward: [(0, '10.963')] [2024-11-18 09:53:00,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2150400. Throughput: 0: 916.0. Samples: 536608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:53:00,337][01550] Avg episode reward: [(0, '11.278')] [2024-11-18 09:53:04,161][04633] Updated weights for policy 0, policy_version 530 (0.0014) [2024-11-18 09:53:05,336][01550] Fps is (10 sec: 4505.9, 60 sec: 3891.2, 300 sec: 3707.2). Total num frames: 2174976. Throughput: 0: 966.1. Samples: 543390. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-18 09:53:05,343][01550] Avg episode reward: [(0, '11.304')] [2024-11-18 09:53:10,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 2187264. Throughput: 0: 973.3. Samples: 546088. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-18 09:53:10,339][01550] Avg episode reward: [(0, '11.754')] [2024-11-18 09:53:10,355][04620] Saving new best policy, reward=11.754! [2024-11-18 09:53:15,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2207744. Throughput: 0: 915.1. Samples: 550268. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-18 09:53:15,343][01550] Avg episode reward: [(0, '12.181')] [2024-11-18 09:53:15,346][04620] Saving new best policy, reward=12.181! [2024-11-18 09:53:16,218][04633] Updated weights for policy 0, policy_version 540 (0.0018) [2024-11-18 09:53:20,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3693.3). Total num frames: 2228224. Throughput: 0: 937.1. Samples: 556816. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:53:20,339][01550] Avg episode reward: [(0, '13.650')] [2024-11-18 09:53:20,362][04620] Saving new best policy, reward=13.650! [2024-11-18 09:53:25,336][01550] Fps is (10 sec: 3686.1, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 2244608. Throughput: 0: 964.8. Samples: 560122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:53:25,342][01550] Avg episode reward: [(0, '14.503')] [2024-11-18 09:53:25,347][04620] Saving new best policy, reward=14.503! [2024-11-18 09:53:27,406][04633] Updated weights for policy 0, policy_version 550 (0.0020) [2024-11-18 09:53:30,338][01550] Fps is (10 sec: 2866.4, 60 sec: 3618.0, 300 sec: 3679.5). Total num frames: 2256896. Throughput: 0: 926.3. Samples: 564560. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-18 09:53:30,341][01550] Avg episode reward: [(0, '15.854')] [2024-11-18 09:53:30,383][04620] Saving new best policy, reward=15.854! [2024-11-18 09:53:35,336][01550] Fps is (10 sec: 3686.6, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 2281472. Throughput: 0: 913.5. Samples: 570406. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:53:35,341][01550] Avg episode reward: [(0, '17.202')] [2024-11-18 09:53:35,344][04620] Saving new best policy, reward=17.202! [2024-11-18 09:53:37,761][04633] Updated weights for policy 0, policy_version 560 (0.0013) [2024-11-18 09:53:40,336][01550] Fps is (10 sec: 4506.8, 60 sec: 3822.9, 300 sec: 3693.3). Total num frames: 2301952. Throughput: 0: 941.1. Samples: 573742. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-18 09:53:40,338][01550] Avg episode reward: [(0, '16.175')] [2024-11-18 09:53:45,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3693.4). Total num frames: 2318336. Throughput: 0: 945.3. Samples: 579148. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:53:45,340][01550] Avg episode reward: [(0, '15.169')] [2024-11-18 09:53:49,416][04633] Updated weights for policy 0, policy_version 570 (0.0018) [2024-11-18 09:53:50,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2338816. Throughput: 0: 905.6. Samples: 584144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:53:50,338][01550] Avg episode reward: [(0, '15.226')] [2024-11-18 09:53:55,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3707.2). Total num frames: 2359296. Throughput: 0: 920.1. Samples: 587492. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-18 09:53:55,342][01550] Avg episode reward: [(0, '13.162')] [2024-11-18 09:53:59,319][04633] Updated weights for policy 0, policy_version 580 (0.0017) [2024-11-18 09:54:00,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2375680. Throughput: 0: 967.4. Samples: 593802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:54:00,341][01550] Avg episode reward: [(0, '13.804')] [2024-11-18 09:54:05,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 2392064. Throughput: 0: 917.6. Samples: 598108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-18 09:54:05,338][01550] Avg episode reward: [(0, '14.407')] [2024-11-18 09:54:10,141][04633] Updated weights for policy 0, policy_version 590 (0.0015) [2024-11-18 09:54:10,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 2416640. Throughput: 0: 918.6. Samples: 601460. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-18 09:54:10,343][01550] Avg episode reward: [(0, '14.877')] [2024-11-18 09:54:15,336][01550] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 2437120. Throughput: 0: 971.1. Samples: 608256. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:54:15,341][01550] Avg episode reward: [(0, '15.853')] [2024-11-18 09:54:20,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2449408. Throughput: 0: 940.9. Samples: 612746. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-18 09:54:20,337][01550] Avg episode reward: [(0, '16.856')] [2024-11-18 09:54:22,151][04633] Updated weights for policy 0, policy_version 600 (0.0020) [2024-11-18 09:54:25,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2469888. 
Throughput: 0: 921.5. Samples: 615208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:54:25,343][01550] Avg episode reward: [(0, '16.894')] [2024-11-18 09:54:30,336][01550] Fps is (10 sec: 4505.6, 60 sec: 3959.6, 300 sec: 3735.0). Total num frames: 2494464. Throughput: 0: 952.0. Samples: 621986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-18 09:54:30,341][01550] Avg episode reward: [(0, '16.451')] [2024-11-18 09:54:31,203][04633] Updated weights for policy 0, policy_version 610 (0.0025) [2024-11-18 09:54:35,341][01550] Fps is (10 sec: 3684.5, 60 sec: 3754.4, 300 sec: 3707.2). Total num frames: 2506752. Throughput: 0: 964.8. Samples: 627566. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-18 09:54:35,343][01550] Avg episode reward: [(0, '16.775')] [2024-11-18 09:54:40,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2527232. Throughput: 0: 935.6. Samples: 629596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:54:40,343][01550] Avg episode reward: [(0, '16.687')] [2024-11-18 09:54:40,357][04620] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000617_2527232.pth... [2024-11-18 09:54:40,481][04620] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000397_1626112.pth [2024-11-18 09:54:43,075][04633] Updated weights for policy 0, policy_version 620 (0.0031) [2024-11-18 09:54:45,336][01550] Fps is (10 sec: 4098.1, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2547712. Throughput: 0: 935.1. Samples: 635882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:54:45,338][01550] Avg episode reward: [(0, '15.877')] [2024-11-18 09:54:50,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 2568192. Throughput: 0: 984.7. Samples: 642418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:54:50,342][01550] Avg episode reward: [(0, '15.855')] [2024-11-18 09:54:54,247][04633] Updated weights for policy 0, policy_version 630 (0.0013) [2024-11-18 09:54:55,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2580480. Throughput: 0: 953.0. Samples: 644344. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-18 09:54:55,338][01550] Avg episode reward: [(0, '16.100')] [2024-11-18 09:55:00,335][01550] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2605056. Throughput: 0: 921.6. Samples: 649726. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:55:00,339][01550] Avg episode reward: [(0, '17.042')] [2024-11-18 09:55:03,910][04633] Updated weights for policy 0, policy_version 640 (0.0015) [2024-11-18 09:55:05,336][01550] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 2625536. Throughput: 0: 970.7. Samples: 656426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:55:05,343][01550] Avg episode reward: [(0, '17.552')] [2024-11-18 09:55:05,346][04620] Saving new best policy, reward=17.552! [2024-11-18 09:55:10,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2641920. Throughput: 0: 975.4. Samples: 659102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:55:10,342][01550] Avg episode reward: [(0, '18.869')] [2024-11-18 09:55:10,357][04620] Saving new best policy, reward=18.869! [2024-11-18 09:55:15,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 2658304. Throughput: 0: 921.5. Samples: 663454. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:55:15,342][01550] Avg episode reward: [(0, '18.464')] [2024-11-18 09:55:15,757][04633] Updated weights for policy 0, policy_version 650 (0.0015) [2024-11-18 09:55:20,336][01550] Fps is (10 sec: 3686.2, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2678784. Throughput: 0: 948.6. Samples: 670250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:55:20,339][01550] Avg episode reward: [(0, '18.139')] [2024-11-18 09:55:25,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2699264. Throughput: 0: 975.6. Samples: 673496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:55:25,340][01550] Avg episode reward: [(0, '19.270')] [2024-11-18 09:55:25,348][04620] Saving new best policy, reward=19.270! [2024-11-18 09:55:26,150][04633] Updated weights for policy 0, policy_version 660 (0.0029) [2024-11-18 09:55:30,336][01550] Fps is (10 sec: 3277.0, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 2711552. Throughput: 0: 933.3. Samples: 677882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:55:30,338][01550] Avg episode reward: [(0, '19.241')] [2024-11-18 09:55:35,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3823.3, 300 sec: 3762.8). Total num frames: 2736128. Throughput: 0: 923.5. Samples: 683976. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:55:35,341][01550] Avg episode reward: [(0, '18.709')] [2024-11-18 09:55:36,817][04633] Updated weights for policy 0, policy_version 670 (0.0017) [2024-11-18 09:55:40,336][01550] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2756608. Throughput: 0: 955.3. Samples: 687334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:55:40,341][01550] Avg episode reward: [(0, '18.493')] [2024-11-18 09:55:45,336][01550] Fps is (10 sec: 3686.1, 60 sec: 3754.6, 300 sec: 3749.0). Total num frames: 2772992. Throughput: 0: 954.8. Samples: 692692. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:55:45,339][01550] Avg episode reward: [(0, '18.926')] [2024-11-18 09:55:48,454][04633] Updated weights for policy 0, policy_version 680 (0.0019) [2024-11-18 09:55:50,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2793472. Throughput: 0: 922.9. Samples: 697956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:55:50,343][01550] Avg episode reward: [(0, '18.309')] [2024-11-18 09:55:55,336][01550] Fps is (10 sec: 4096.3, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2813952. Throughput: 0: 935.5. Samples: 701200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:55:55,341][01550] Avg episode reward: [(0, '19.633')] [2024-11-18 09:55:55,344][04620] Saving new best policy, reward=19.633! [2024-11-18 09:55:58,200][04633] Updated weights for policy 0, policy_version 690 (0.0014) [2024-11-18 09:56:00,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2830336. Throughput: 0: 973.9. Samples: 707278. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:56:00,340][01550] Avg episode reward: [(0, '20.145')] [2024-11-18 09:56:00,350][04620] Saving new best policy, reward=20.145! [2024-11-18 09:56:05,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 2846720. Throughput: 0: 917.9. Samples: 711556. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:56:05,338][01550] Avg episode reward: [(0, '20.298')] [2024-11-18 09:56:05,345][04620] Saving new best policy, reward=20.298! [2024-11-18 09:56:09,641][04633] Updated weights for policy 0, policy_version 700 (0.0012) [2024-11-18 09:56:10,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.6). Total num frames: 2867200. Throughput: 0: 916.8. Samples: 714750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:56:10,340][01550] Avg episode reward: [(0, '21.031')] [2024-11-18 09:56:10,350][04620] Saving new best policy, reward=21.031! [2024-11-18 09:56:15,336][01550] Fps is (10 sec: 4095.7, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2887680. Throughput: 0: 966.7. Samples: 721386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:56:15,338][01550] Avg episode reward: [(0, '21.230')] [2024-11-18 09:56:15,340][04620] Saving new best policy, reward=21.230! [2024-11-18 09:56:20,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 2899968. Throughput: 0: 923.6. Samples: 725536. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-18 09:56:20,339][01550] Avg episode reward: [(0, '21.723')] [2024-11-18 09:56:20,351][04620] Saving new best policy, reward=21.723! [2024-11-18 09:56:22,079][04633] Updated weights for policy 0, policy_version 710 (0.0026) [2024-11-18 09:56:25,336][01550] Fps is (10 sec: 3277.0, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 2920448. Throughput: 0: 902.2. Samples: 727932. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:56:25,338][01550] Avg episode reward: [(0, '21.150')] [2024-11-18 09:56:30,336][01550] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2945024. Throughput: 0: 936.2. Samples: 734818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-18 09:56:30,338][01550] Avg episode reward: [(0, '21.491')] [2024-11-18 09:56:30,987][04633] Updated weights for policy 0, policy_version 720 (0.0015) [2024-11-18 09:56:35,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2961408. Throughput: 0: 939.3. Samples: 740224. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-18 09:56:35,342][01550] Avg episode reward: [(0, '21.077')] [2024-11-18 09:56:40,337][01550] Fps is (10 sec: 3276.3, 60 sec: 3686.3, 300 sec: 3762.8). Total num frames: 2977792. Throughput: 0: 913.6. Samples: 742314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:56:40,346][01550] Avg episode reward: [(0, '21.519')] [2024-11-18 09:56:40,360][04620] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000727_2977792.pth... [2024-11-18 09:56:40,496][04620] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000506_2072576.pth [2024-11-18 09:56:42,835][04633] Updated weights for policy 0, policy_version 730 (0.0025) [2024-11-18 09:56:45,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2998272. Throughput: 0: 921.4. Samples: 748742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-18 09:56:45,340][01550] Avg episode reward: [(0, '20.209')] [2024-11-18 09:56:50,336][01550] Fps is (10 sec: 4096.5, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3018752. Throughput: 0: 967.2. Samples: 755080. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:56:50,343][01550] Avg episode reward: [(0, '20.642')] [2024-11-18 09:56:54,144][04633] Updated weights for policy 0, policy_version 740 (0.0016) [2024-11-18 09:56:55,336][01550] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 3031040. Throughput: 0: 941.6. Samples: 757124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:56:55,341][01550] Avg episode reward: [(0, '20.516')] [2024-11-18 09:57:00,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3055616. Throughput: 0: 914.1. Samples: 762518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:57:00,342][01550] Avg episode reward: [(0, '20.206')] [2024-11-18 09:57:03,667][04633] Updated weights for policy 0, policy_version 750 (0.0013) [2024-11-18 09:57:05,336][01550] Fps is (10 sec: 4505.8, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3076096. Throughput: 0: 974.5. Samples: 769388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:57:05,338][01550] Avg episode reward: [(0, '19.454')] [2024-11-18 09:57:10,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3092480. Throughput: 0: 980.6. Samples: 772060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:57:10,338][01550] Avg episode reward: [(0, '18.977')] [2024-11-18 09:57:15,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 3108864. Throughput: 0: 925.9. Samples: 776484. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:57:15,339][01550] Avg episode reward: [(0, '18.447')] [2024-11-18 09:57:15,355][04633] Updated weights for policy 0, policy_version 760 (0.0023) [2024-11-18 09:57:20,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 3133440. Throughput: 0: 958.5. Samples: 783356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:57:20,342][01550] Avg episode reward: [(0, '19.444')] [2024-11-18 09:57:25,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3149824. Throughput: 0: 983.3. Samples: 786562. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:57:25,339][01550] Avg episode reward: [(0, '19.175')] [2024-11-18 09:57:25,753][04633] Updated weights for policy 0, policy_version 770 (0.0021) [2024-11-18 09:57:30,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 3166208. Throughput: 0: 935.6. Samples: 790846. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-18 09:57:30,341][01550] Avg episode reward: [(0, '20.862')] [2024-11-18 09:57:35,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3186688. Throughput: 0: 935.0. Samples: 797154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:57:35,342][01550] Avg episode reward: [(0, '20.156')] [2024-11-18 09:57:36,360][04633] Updated weights for policy 0, policy_version 780 (0.0015) [2024-11-18 09:57:40,336][01550] Fps is (10 sec: 4505.6, 60 sec: 3891.3, 300 sec: 3776.6). Total num frames: 3211264. Throughput: 0: 965.8. Samples: 800584. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:57:40,338][01550] Avg episode reward: [(0, '19.965')] [2024-11-18 09:57:45,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3223552. Throughput: 0: 960.4. Samples: 805738. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:57:45,338][01550] Avg episode reward: [(0, '20.127')] [2024-11-18 09:57:48,221][04633] Updated weights for policy 0, policy_version 790 (0.0018) [2024-11-18 09:57:50,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3244032. Throughput: 0: 921.8. Samples: 810868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:57:50,340][01550] Avg episode reward: [(0, '20.891')] [2024-11-18 09:57:55,336][01550] Fps is (10 sec: 4095.8, 60 sec: 3891.2, 300 sec: 3776.6). Total num frames: 3264512. Throughput: 0: 935.3. Samples: 814148. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:57:55,338][01550] Avg episode reward: [(0, '19.847')] [2024-11-18 09:57:57,739][04633] Updated weights for policy 0, policy_version 800 (0.0014) [2024-11-18 09:58:00,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3280896. Throughput: 0: 973.9. Samples: 820308. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-18 09:58:00,338][01550] Avg episode reward: [(0, '21.128')] [2024-11-18 09:58:05,336][01550] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 3297280. Throughput: 0: 914.8. Samples: 824520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-18 09:58:05,342][01550] Avg episode reward: [(0, '22.314')] [2024-11-18 09:58:05,344][04620] Saving new best policy, reward=22.314! [2024-11-18 09:58:09,396][04633] Updated weights for policy 0, policy_version 810 (0.0022) [2024-11-18 09:58:10,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3321856. Throughput: 0: 915.4. Samples: 827754. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:58:10,341][01550] Avg episode reward: [(0, '21.687')] [2024-11-18 09:58:15,336][01550] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 3342336. Throughput: 0: 970.7. Samples: 834526. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:58:15,341][01550] Avg episode reward: [(0, '21.950')] [2024-11-18 09:58:20,337][01550] Fps is (10 sec: 3276.3, 60 sec: 3686.3, 300 sec: 3762.8). Total num frames: 3354624. Throughput: 0: 926.2. Samples: 838834. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:58:20,343][01550] Avg episode reward: [(0, '21.636')] [2024-11-18 09:58:21,217][04633] Updated weights for policy 0, policy_version 820 (0.0024) [2024-11-18 09:58:25,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.6). Total num frames: 3375104. Throughput: 0: 905.5. Samples: 841330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:58:25,338][01550] Avg episode reward: [(0, '20.703')] [2024-11-18 09:58:30,338][01550] Fps is (10 sec: 4096.6, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3395584. Throughput: 0: 936.3. Samples: 847872. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:58:30,341][01550] Avg episode reward: [(0, '20.953')] [2024-11-18 09:58:30,735][04633] Updated weights for policy 0, policy_version 830 (0.0012) [2024-11-18 09:58:35,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3411968. Throughput: 0: 934.5. Samples: 852920. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-18 09:58:35,341][01550] Avg episode reward: [(0, '21.579')] [2024-11-18 09:58:40,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 3428352. Throughput: 0: 904.7. Samples: 854858. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:58:40,340][01550] Avg episode reward: [(0, '22.005')] [2024-11-18 09:58:40,349][04620] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000837_3428352.pth... [2024-11-18 09:58:40,505][04620] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000617_2527232.pth [2024-11-18 09:58:43,192][04633] Updated weights for policy 0, policy_version 840 (0.0028) [2024-11-18 09:58:45,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3448832. Throughput: 0: 902.9. Samples: 860940. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-18 09:58:45,340][01550] Avg episode reward: [(0, '22.288')] [2024-11-18 09:58:50,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 3465216. Throughput: 0: 946.8. Samples: 867126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:58:50,340][01550] Avg episode reward: [(0, '23.499')] [2024-11-18 09:58:50,351][04620] Saving new best policy, reward=23.499! [2024-11-18 09:58:55,281][04633] Updated weights for policy 0, policy_version 850 (0.0023) [2024-11-18 09:58:55,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3748.9). Total num frames: 3481600. Throughput: 0: 917.3. Samples: 869032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:58:55,341][01550] Avg episode reward: [(0, '23.644')] [2024-11-18 09:58:55,344][04620] Saving new best policy, reward=23.644! [2024-11-18 09:59:00,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 3502080. Throughput: 0: 884.3. Samples: 874318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:59:00,338][01550] Avg episode reward: [(0, '21.923')] [2024-11-18 09:59:04,775][04633] Updated weights for policy 0, policy_version 860 (0.0012) [2024-11-18 09:59:05,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3522560. Throughput: 0: 934.1. Samples: 880866. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-18 09:59:05,342][01550] Avg episode reward: [(0, '22.023')] [2024-11-18 09:59:10,337][01550] Fps is (10 sec: 3276.5, 60 sec: 3549.8, 300 sec: 3721.1). Total num frames: 3534848. Throughput: 0: 933.2. Samples: 883324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:59:10,343][01550] Avg episode reward: [(0, '21.364')] [2024-11-18 09:59:15,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3748.9). Total num frames: 3555328. Throughput: 0: 886.3. Samples: 887754. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-18 09:59:15,344][01550] Avg episode reward: [(0, '19.683')] [2024-11-18 09:59:16,745][04633] Updated weights for policy 0, policy_version 870 (0.0023) [2024-11-18 09:59:20,336][01550] Fps is (10 sec: 4096.4, 60 sec: 3686.5, 300 sec: 3748.9). Total num frames: 3575808. Throughput: 0: 922.6. Samples: 894436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 09:59:20,343][01550] Avg episode reward: [(0, '19.726')] [2024-11-18 09:59:25,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3596288. Throughput: 0: 952.3. Samples: 897710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-18 09:59:25,343][01550] Avg episode reward: [(0, '21.323')] [2024-11-18 09:59:28,192][04633] Updated weights for policy 0, policy_version 880 (0.0019) [2024-11-18 09:59:30,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3735.1). Total num frames: 3608576. 
Throughput: 0: 906.6. Samples: 901736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-18 09:59:30,337][01550] Avg episode reward: [(0, '20.334')] [2024-11-18 09:59:35,336][01550] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 3633152. Throughput: 0: 903.1. Samples: 907764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:59:35,343][01550] Avg episode reward: [(0, '21.051')] [2024-11-18 09:59:38,037][04633] Updated weights for policy 0, policy_version 890 (0.0014) [2024-11-18 09:59:40,336][01550] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3653632. Throughput: 0: 934.9. Samples: 911104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:59:40,338][01550] Avg episode reward: [(0, '22.023')] [2024-11-18 09:59:45,336][01550] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 3665920. Throughput: 0: 927.9. Samples: 916074. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 09:59:45,342][01550] Avg episode reward: [(0, '22.418')] [2024-11-18 09:59:50,127][04633] Updated weights for policy 0, policy_version 900 (0.0020) [2024-11-18 09:59:50,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 3686400. Throughput: 0: 898.5. Samples: 921298. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-18 09:59:50,341][01550] Avg episode reward: [(0, '21.604')] [2024-11-18 09:59:55,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3706880. Throughput: 0: 917.5. Samples: 924612. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-18 09:59:55,338][01550] Avg episode reward: [(0, '21.446')] [2024-11-18 10:00:00,337][01550] Fps is (10 sec: 3685.9, 60 sec: 3686.3, 300 sec: 3721.1). Total num frames: 3723264. Throughput: 0: 943.2. Samples: 930198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 10:00:00,342][01550] Avg episode reward: [(0, '21.514')] [2024-11-18 10:00:01,921][04633] Updated weights for policy 0, policy_version 910 (0.0020) [2024-11-18 10:00:05,336][01550] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3707.2). Total num frames: 3735552. Throughput: 0: 888.4. Samples: 934414. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 10:00:05,338][01550] Avg episode reward: [(0, '21.001')] [2024-11-18 10:00:10,336][01550] Fps is (10 sec: 3686.9, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3760128. Throughput: 0: 886.0. Samples: 937582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 10:00:10,338][01550] Avg episode reward: [(0, '21.351')] [2024-11-18 10:00:12,028][04633] Updated weights for policy 0, policy_version 920 (0.0015) [2024-11-18 10:00:15,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 3776512. Throughput: 0: 939.6. Samples: 944020. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-18 10:00:15,342][01550] Avg episode reward: [(0, '20.557')] [2024-11-18 10:00:20,339][01550] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 3792896. Throughput: 0: 897.2. Samples: 948140. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-18 10:00:20,342][01550] Avg episode reward: [(0, '22.299')] [2024-11-18 10:00:24,261][04633] Updated weights for policy 0, policy_version 930 (0.0020) [2024-11-18 10:00:25,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 3813376. Throughput: 0: 882.9. Samples: 950834. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-18 10:00:25,344][01550] Avg episode reward: [(0, '22.342')] [2024-11-18 10:00:30,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3833856. Throughput: 0: 918.3. Samples: 957396. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-18 10:00:30,339][01550] Avg episode reward: [(0, '22.929')] [2024-11-18 10:00:35,174][04633] Updated weights for policy 0, policy_version 940 (0.0015) [2024-11-18 10:00:35,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3707.2). Total num frames: 3850240. Throughput: 0: 912.8. Samples: 962374. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-18 10:00:35,337][01550] Avg episode reward: [(0, '23.225')] [2024-11-18 10:00:40,336][01550] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3707.2). Total num frames: 3866624. Throughput: 0: 884.6. Samples: 964418. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-11-18 10:00:40,341][01550] Avg episode reward: [(0, '23.945')] [2024-11-18 10:00:40,351][04620] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000944_3866624.pth... [2024-11-18 10:00:40,482][04620] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000727_2977792.pth [2024-11-18 10:00:40,499][04620] Saving new best policy, reward=23.945! [2024-11-18 10:00:45,336][01550] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 3887104. Throughput: 0: 900.7. Samples: 970728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 10:00:45,338][01550] Avg episode reward: [(0, '24.404')] [2024-11-18 10:00:45,346][04620] Saving new best policy, reward=24.404! [2024-11-18 10:00:45,741][04633] Updated weights for policy 0, policy_version 950 (0.0026) [2024-11-18 10:00:50,336][01550] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 3907584. Throughput: 0: 938.7. Samples: 976656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-18 10:00:50,342][01550] Avg episode reward: [(0, '23.864')] [2024-11-18 10:00:55,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3693.3). Total num frames: 3919872. Throughput: 0: 913.1. Samples: 978670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-18 10:00:55,339][01550] Avg episode reward: [(0, '24.861')] [2024-11-18 10:00:55,343][04620] Saving new best policy, reward=24.861! [2024-11-18 10:00:58,096][04633] Updated weights for policy 0, policy_version 960 (0.0019) [2024-11-18 10:01:00,336][01550] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3707.2). Total num frames: 3940352. Throughput: 0: 886.8. Samples: 983928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 10:01:00,343][01550] Avg episode reward: [(0, '23.465')] [2024-11-18 10:01:05,337][01550] Fps is (10 sec: 4095.6, 60 sec: 3754.6, 300 sec: 3707.2). Total num frames: 3960832. Throughput: 0: 941.7. Samples: 990518. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-18 10:01:05,341][01550] Avg episode reward: [(0, '23.087')] [2024-11-18 10:01:09,108][04633] Updated weights for policy 0, policy_version 970 (0.0024) [2024-11-18 10:01:10,341][01550] Fps is (10 sec: 3274.9, 60 sec: 3549.5, 300 sec: 3679.4). Total num frames: 3973120. Throughput: 0: 932.1. Samples: 992782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-18 10:01:10,344][01550] Avg episode reward: [(0, '22.562')] [2024-11-18 10:01:15,336][01550] Fps is (10 sec: 3277.1, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 3993600. Throughput: 0: 889.1. Samples: 997406. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-18 10:01:15,342][01550] Avg episode reward: [(0, '21.848')] [2024-11-18 10:01:17,647][04620] Stopping Batcher_0... [2024-11-18 10:01:17,648][04620] Loop batcher_evt_loop terminating... [2024-11-18 10:01:17,649][04620] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-18 10:01:17,647][01550] Component Batcher_0 stopped! [2024-11-18 10:01:17,702][04633] Weights refcount: 2 0 [2024-11-18 10:01:17,705][04633] Stopping InferenceWorker_p0-w0... [2024-11-18 10:01:17,705][01550] Component InferenceWorker_p0-w0 stopped! [2024-11-18 10:01:17,709][04633] Loop inference_proc0-0_evt_loop terminating... [2024-11-18 10:01:17,775][04620] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000837_3428352.pth [2024-11-18 10:01:17,793][04620] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-18 10:01:17,972][01550] Component LearnerWorker_p0 stopped! [2024-11-18 10:01:17,971][04620] Stopping LearnerWorker_p0... [2024-11-18 10:01:17,983][04620] Loop learner_proc0_evt_loop terminating... [2024-11-18 10:01:18,062][04636] Stopping RolloutWorker_w2... [2024-11-18 10:01:18,064][04636] Loop rollout_proc2_evt_loop terminating... [2024-11-18 10:01:18,063][01550] Component RolloutWorker_w2 stopped! [2024-11-18 10:01:18,073][04634] Stopping RolloutWorker_w0... [2024-11-18 10:01:18,073][01550] Component RolloutWorker_w0 stopped! [2024-11-18 10:01:18,077][04634] Loop rollout_proc0_evt_loop terminating... [2024-11-18 10:01:18,081][04640] Stopping RolloutWorker_w6... [2024-11-18 10:01:18,081][01550] Component RolloutWorker_w6 stopped! [2024-11-18 10:01:18,087][04639] Stopping RolloutWorker_w4... [2024-11-18 10:01:18,087][01550] Component RolloutWorker_w4 stopped! [2024-11-18 10:01:18,084][04640] Loop rollout_proc6_evt_loop terminating... [2024-11-18 10:01:18,088][04639] Loop rollout_proc4_evt_loop terminating... [2024-11-18 10:01:18,159][04635] Stopping RolloutWorker_w1... [2024-11-18 10:01:18,158][01550] Component RolloutWorker_w1 stopped! [2024-11-18 10:01:18,168][04635] Loop rollout_proc1_evt_loop terminating... [2024-11-18 10:01:18,184][01550] Component RolloutWorker_w7 stopped! [2024-11-18 10:01:18,184][04641] Stopping RolloutWorker_w7... [2024-11-18 10:01:18,196][04641] Loop rollout_proc7_evt_loop terminating... [2024-11-18 10:01:18,228][04637] Stopping RolloutWorker_w3... [2024-11-18 10:01:18,228][04637] Loop rollout_proc3_evt_loop terminating... [2024-11-18 10:01:18,227][01550] Component RolloutWorker_w3 stopped! [2024-11-18 10:01:18,252][04638] Stopping RolloutWorker_w5... [2024-11-18 10:01:18,253][04638] Loop rollout_proc5_evt_loop terminating... [2024-11-18 10:01:18,252][01550] Component RolloutWorker_w5 stopped! [2024-11-18 10:01:18,255][01550] Waiting for process learner_proc0 to stop... [2024-11-18 10:01:19,592][01550] Waiting for process inference_proc0-0 to join... [2024-11-18 10:01:19,809][01550] Waiting for process rollout_proc0 to join... [2024-11-18 10:01:21,160][01550] Waiting for process rollout_proc1 to join... [2024-11-18 10:01:21,167][01550] Waiting for process rollout_proc2 to join... [2024-11-18 10:01:21,171][01550] Waiting for process rollout_proc3 to join... [2024-11-18 10:01:21,173][01550] Waiting for process rollout_proc4 to join... [2024-11-18 10:01:21,176][01550] Waiting for process rollout_proc5 to join... [2024-11-18 10:01:21,179][01550] Waiting for process rollout_proc6 to join... 
[2024-11-18 10:01:21,182][01550] Waiting for process rollout_proc7 to join... [2024-11-18 10:01:21,184][01550] Batcher 0 profile tree view: batching: 26.4614, releasing_batches: 0.0265 [2024-11-18 10:01:21,186][01550] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0059 wait_policy_total: 482.7375 update_model: 7.4400 weight_update: 0.0023 one_step: 0.0025 handle_policy_step: 560.9062 deserialize: 15.2493, stack: 2.9520, obs_to_device_normalize: 115.5646, forward: 282.1535, send_messages: 28.2654 prepare_outputs: 88.2760 to_cpu: 54.8426 [2024-11-18 10:01:21,187][01550] Learner 0 profile tree view: misc: 0.0048, prepare_batch: 15.4161 train: 74.6290 epoch_init: 0.0059, minibatch_init: 0.0137, losses_postprocess: 0.5614, kl_divergence: 0.6066, after_optimizer: 33.7343 calculate_losses: 25.0321 losses_init: 0.0036, forward_head: 1.7016, bptt_initial: 16.0953, tail: 1.1221, advantages_returns: 0.2990, losses: 3.0513 bptt: 2.3942 bptt_forward_core: 2.2775 update: 14.0420 clip: 1.5157 [2024-11-18 10:01:21,189][01550] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.3853, enqueue_policy_requests: 120.6240, env_step: 845.2245, overhead: 14.1168, complete_rollouts: 7.9993 save_policy_outputs: 25.2445 split_output_tensors: 8.3152 [2024-11-18 10:01:21,190][01550] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.2875, enqueue_policy_requests: 122.6786, env_step: 844.5417, overhead: 13.9220, complete_rollouts: 6.5416 save_policy_outputs: 25.4384 split_output_tensors: 8.8300 [2024-11-18 10:01:21,192][01550] Loop Runner_EvtLoop terminating... [2024-11-18 10:01:21,194][01550] Runner profile tree view: main_loop: 1117.2843 [2024-11-18 10:01:21,195][01550] Collected {0: 4005888}, FPS: 3585.4 [2024-11-18 10:12:19,845][01550] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-11-18 10:12:19,846][01550] Overriding arg 'num_workers' with value 1 passed from command line [2024-11-18 10:12:19,848][01550] Adding new argument 'no_render'=True that is not in the saved config file! [2024-11-18 10:12:19,850][01550] Adding new argument 'save_video'=True that is not in the saved config file! [2024-11-18 10:12:19,852][01550] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-11-18 10:12:19,854][01550] Adding new argument 'video_name'=None that is not in the saved config file! [2024-11-18 10:12:19,856][01550] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-11-18 10:12:19,857][01550] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-11-18 10:12:19,859][01550] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-11-18 10:12:19,860][01550] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-11-18 10:12:19,861][01550] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-11-18 10:12:19,862][01550] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-11-18 10:12:19,863][01550] Adding new argument 'train_script'=None that is not in the saved config file! [2024-11-18 10:12:19,864][01550] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2024-11-18 10:12:19,871][01550] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-18 10:12:19,886][01550] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-18 10:12:19,888][01550] RunningMeanStd input shape: (3, 72, 128) [2024-11-18 10:12:19,892][01550] RunningMeanStd input shape: (1,) [2024-11-18 10:12:19,911][01550] ConvEncoder: input_channels=3 [2024-11-18 10:12:20,036][01550] Conv encoder output size: 512 [2024-11-18 10:12:20,040][01550] Policy head output size: 512 [2024-11-18 10:12:21,778][01550] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-18 10:12:22,822][01550] Num frames 100... [2024-11-18 10:12:22,982][01550] Num frames 200... [2024-11-18 10:12:23,149][01550] Num frames 300... [2024-11-18 10:12:23,309][01550] Num frames 400... [2024-11-18 10:12:23,452][01550] Avg episode rewards: #0: 7.510, true rewards: #0: 4.510 [2024-11-18 10:12:23,454][01550] Avg episode reward: 7.510, avg true_objective: 4.510 [2024-11-18 10:12:23,540][01550] Num frames 500... [2024-11-18 10:12:23,702][01550] Num frames 600... [2024-11-18 10:12:23,882][01550] Num frames 700... [2024-11-18 10:12:24,051][01550] Num frames 800... [2024-11-18 10:12:24,219][01550] Num frames 900... [2024-11-18 10:12:24,394][01550] Num frames 1000... [2024-11-18 10:12:24,575][01550] Avg episode rewards: #0: 9.955, true rewards: #0: 5.455 [2024-11-18 10:12:24,576][01550] Avg episode reward: 9.955, avg true_objective: 5.455 [2024-11-18 10:12:24,592][01550] Num frames 1100... [2024-11-18 10:12:24,709][01550] Num frames 1200... [2024-11-18 10:12:24,833][01550] Num frames 1300... [2024-11-18 10:12:24,960][01550] Num frames 1400... [2024-11-18 10:12:25,071][01550] Avg episode rewards: #0: 8.493, true rewards: #0: 4.827 [2024-11-18 10:12:25,073][01550] Avg episode reward: 8.493, avg true_objective: 4.827 [2024-11-18 10:12:25,137][01550] Num frames 1500... [2024-11-18 10:12:25,264][01550] Num frames 1600... [2024-11-18 10:12:25,380][01550] Num frames 1700... [2024-11-18 10:12:25,503][01550] Num frames 1800... [2024-11-18 10:12:25,628][01550] Num frames 1900... [2024-11-18 10:12:25,755][01550] Avg episode rewards: #0: 9.150, true rewards: #0: 4.900 [2024-11-18 10:12:25,757][01550] Avg episode reward: 9.150, avg true_objective: 4.900 [2024-11-18 10:12:25,807][01550] Num frames 2000... [2024-11-18 10:12:25,933][01550] Num frames 2100... [2024-11-18 10:12:26,048][01550] Num frames 2200... [2024-11-18 10:12:26,192][01550] Num frames 2300... [2024-11-18 10:12:26,314][01550] Num frames 2400... [2024-11-18 10:12:26,432][01550] Num frames 2500... [2024-11-18 10:12:26,548][01550] Num frames 2600... [2024-11-18 10:12:26,663][01550] Num frames 2700... [2024-11-18 10:12:26,779][01550] Num frames 2800... [2024-11-18 10:12:26,903][01550] Num frames 2900... [2024-11-18 10:12:26,983][01550] Avg episode rewards: #0: 11.640, true rewards: #0: 5.840 [2024-11-18 10:12:26,985][01550] Avg episode reward: 11.640, avg true_objective: 5.840 [2024-11-18 10:12:27,083][01550] Num frames 3000... [2024-11-18 10:12:27,200][01550] Num frames 3100... [2024-11-18 10:12:27,326][01550] Num frames 3200... [2024-11-18 10:12:27,446][01550] Num frames 3300... [2024-11-18 10:12:27,561][01550] Num frames 3400... [2024-11-18 10:12:27,677][01550] Num frames 3500... [2024-11-18 10:12:27,800][01550] Num frames 3600... 
[2024-11-18 10:12:27,926][01550] Avg episode rewards: #0: 11.760, true rewards: #0: 6.093 [2024-11-18 10:12:27,929][01550] Avg episode reward: 11.760, avg true_objective: 6.093 [2024-11-18 10:12:27,982][01550] Num frames 3700... [2024-11-18 10:12:28,100][01550] Num frames 3800... [2024-11-18 10:12:28,222][01550] Num frames 3900... [2024-11-18 10:12:28,348][01550] Num frames 4000... [2024-11-18 10:12:28,465][01550] Num frames 4100... [2024-11-18 10:12:28,586][01550] Num frames 4200... [2024-11-18 10:12:28,705][01550] Num frames 4300... [2024-11-18 10:12:28,822][01550] Num frames 4400... [2024-11-18 10:12:28,952][01550] Num frames 4500... [2024-11-18 10:12:29,076][01550] Num frames 4600... [2024-11-18 10:12:29,194][01550] Num frames 4700... [2024-11-18 10:12:29,314][01550] Num frames 4800... [2024-11-18 10:12:29,442][01550] Num frames 4900... [2024-11-18 10:12:29,560][01550] Num frames 5000... [2024-11-18 10:12:29,677][01550] Num frames 5100... [2024-11-18 10:12:29,794][01550] Num frames 5200... [2024-11-18 10:12:29,920][01550] Num frames 5300... [2024-11-18 10:12:30,042][01550] Num frames 5400... [2024-11-18 10:12:30,161][01550] Num frames 5500... [2024-11-18 10:12:30,280][01550] Num frames 5600... [2024-11-18 10:12:30,408][01550] Num frames 5700... [2024-11-18 10:12:30,530][01550] Avg episode rewards: #0: 18.223, true rewards: #0: 8.223 [2024-11-18 10:12:30,531][01550] Avg episode reward: 18.223, avg true_objective: 8.223 [2024-11-18 10:12:30,586][01550] Num frames 5800... [2024-11-18 10:12:30,704][01550] Num frames 5900... [2024-11-18 10:12:30,822][01550] Num frames 6000... [2024-11-18 10:12:30,951][01550] Num frames 6100... [2024-11-18 10:12:31,070][01550] Num frames 6200... [2024-11-18 10:12:31,187][01550] Num frames 6300... [2024-11-18 10:12:31,308][01550] Num frames 6400... [2024-11-18 10:12:31,435][01550] Num frames 6500... [2024-11-18 10:12:31,574][01550] Avg episode rewards: #0: 18.214, true rewards: #0: 8.214 [2024-11-18 10:12:31,576][01550] Avg episode reward: 18.214, avg true_objective: 8.214 [2024-11-18 10:12:31,614][01550] Num frames 6600... [2024-11-18 10:12:31,732][01550] Num frames 6700... [2024-11-18 10:12:31,911][01550] Num frames 6800... [2024-11-18 10:12:32,084][01550] Num frames 6900... [2024-11-18 10:12:32,255][01550] Num frames 7000... [2024-11-18 10:12:32,636][01550] Num frames 7100... [2024-11-18 10:12:32,756][01550] Num frames 7200... [2024-11-18 10:12:32,878][01550] Num frames 7300... [2024-11-18 10:12:33,002][01550] Num frames 7400... [2024-11-18 10:12:33,120][01550] Num frames 7500... [2024-11-18 10:12:33,244][01550] Num frames 7600... [2024-11-18 10:12:33,366][01550] Num frames 7700... [2024-11-18 10:12:33,495][01550] Num frames 7800... [2024-11-18 10:12:33,612][01550] Num frames 7900... [2024-11-18 10:12:33,769][01550] Avg episode rewards: #0: 19.762, true rewards: #0: 8.873 [2024-11-18 10:12:33,771][01550] Avg episode reward: 19.762, avg true_objective: 8.873 [2024-11-18 10:12:33,791][01550] Num frames 8000... [2024-11-18 10:12:33,918][01550] Num frames 8100... [2024-11-18 10:12:34,036][01550] Num frames 8200... [2024-11-18 10:12:34,154][01550] Num frames 8300... [2024-11-18 10:12:34,272][01550] Num frames 8400... [2024-11-18 10:12:34,392][01550] Num frames 8500... [2024-11-18 10:12:34,552][01550] Num frames 8600... [2024-11-18 10:12:34,716][01550] Num frames 8700... [2024-11-18 10:12:34,882][01550] Num frames 8800... [2024-11-18 10:12:35,045][01550] Num frames 8900... [2024-11-18 10:12:35,210][01550] Num frames 9000... 
[2024-11-18 10:12:35,373][01550] Num frames 9100... [2024-11-18 10:12:35,535][01550] Num frames 9200... [2024-11-18 10:12:35,703][01550] Num frames 9300... [2024-11-18 10:12:35,886][01550] Num frames 9400... [2024-11-18 10:12:36,065][01550] Num frames 9500... [2024-11-18 10:12:36,228][01550] Num frames 9600... [2024-11-18 10:12:36,407][01550] Num frames 9700... [2024-11-18 10:12:36,572][01550] Avg episode rewards: #0: 21.862, true rewards: #0: 9.762 [2024-11-18 10:12:36,574][01550] Avg episode reward: 21.862, avg true_objective: 9.762 [2024-11-18 10:13:35,566][01550] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-11-18 10:14:12,465][01550] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-11-18 10:14:12,467][01550] Overriding arg 'num_workers' with value 1 passed from command line [2024-11-18 10:14:12,469][01550] Adding new argument 'no_render'=True that is not in the saved config file! [2024-11-18 10:14:12,471][01550] Adding new argument 'save_video'=True that is not in the saved config file! [2024-11-18 10:14:12,472][01550] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-11-18 10:14:12,474][01550] Adding new argument 'video_name'=None that is not in the saved config file! [2024-11-18 10:14:12,476][01550] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-11-18 10:14:12,477][01550] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-11-18 10:14:12,478][01550] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-11-18 10:14:12,479][01550] Adding new argument 'hf_repository'='averydd/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-11-18 10:14:12,480][01550] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-11-18 10:14:12,481][01550] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-11-18 10:14:12,482][01550] Adding new argument 'train_script'=None that is not in the saved config file! [2024-11-18 10:14:12,483][01550] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-11-18 10:14:12,484][01550] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-18 10:14:12,492][01550] RunningMeanStd input shape: (3, 72, 128) [2024-11-18 10:14:12,501][01550] RunningMeanStd input shape: (1,) [2024-11-18 10:14:12,522][01550] ConvEncoder: input_channels=3 [2024-11-18 10:14:12,558][01550] Conv encoder output size: 512 [2024-11-18 10:14:12,559][01550] Policy head output size: 512 [2024-11-18 10:14:12,578][01550] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-18 10:14:13,061][01550] Num frames 100... [2024-11-18 10:14:13,177][01550] Num frames 200... [2024-11-18 10:14:13,295][01550] Num frames 300... [2024-11-18 10:14:13,409][01550] Num frames 400... [2024-11-18 10:14:13,530][01550] Num frames 500... [2024-11-18 10:14:13,655][01550] Num frames 600... [2024-11-18 10:14:13,772][01550] Num frames 700... [2024-11-18 10:14:13,896][01550] Num frames 800... [2024-11-18 10:14:14,043][01550] Num frames 900... [2024-11-18 10:14:14,207][01550] Num frames 1000... [2024-11-18 10:14:14,370][01550] Num frames 1100... [2024-11-18 10:14:14,535][01550] Num frames 1200... 
[2024-11-18 10:14:14,623][01550] Avg episode rewards: #0: 24.160, true rewards: #0: 12.160 [2024-11-18 10:14:14,625][01550] Avg episode reward: 24.160, avg true_objective: 12.160 [2024-11-18 10:14:14,762][01550] Num frames 1300... [2024-11-18 10:14:14,942][01550] Num frames 1400... [2024-11-18 10:14:15,116][01550] Num frames 1500... [2024-11-18 10:14:15,281][01550] Num frames 1600... [2024-11-18 10:14:15,446][01550] Num frames 1700... [2024-11-18 10:14:15,615][01550] Num frames 1800... [2024-11-18 10:14:15,786][01550] Num frames 1900... [2024-11-18 10:14:15,958][01550] Num frames 2000... [2024-11-18 10:14:16,122][01550] Num frames 2100... [2024-11-18 10:14:16,292][01550] Num frames 2200... [2024-11-18 10:14:16,445][01550] Num frames 2300... [2024-11-18 10:14:16,563][01550] Num frames 2400... [2024-11-18 10:14:16,689][01550] Num frames 2500... [2024-11-18 10:14:16,813][01550] Num frames 2600... [2024-11-18 10:14:16,941][01550] Num frames 2700... [2024-11-18 10:14:17,062][01550] Num frames 2800... [2024-11-18 10:14:17,190][01550] Num frames 2900... [2024-11-18 10:14:17,313][01550] Num frames 3000... [2024-11-18 10:14:17,434][01550] Num frames 3100... [2024-11-18 10:14:17,495][01550] Avg episode rewards: #0: 35.020, true rewards: #0: 15.520 [2024-11-18 10:14:17,497][01550] Avg episode reward: 35.020, avg true_objective: 15.520 [2024-11-18 10:14:17,612][01550] Num frames 3200... [2024-11-18 10:14:17,739][01550] Num frames 3300... [2024-11-18 10:14:17,861][01550] Num frames 3400... [2024-11-18 10:14:17,986][01550] Num frames 3500... [2024-11-18 10:14:18,103][01550] Num frames 3600... [2024-11-18 10:14:18,223][01550] Num frames 3700... [2024-11-18 10:14:18,343][01550] Num frames 3800... [2024-11-18 10:14:18,409][01550] Avg episode rewards: #0: 29.027, true rewards: #0: 12.693 [2024-11-18 10:14:18,410][01550] Avg episode reward: 29.027, avg true_objective: 12.693 [2024-11-18 10:14:18,519][01550] Num frames 3900... [2024-11-18 10:14:18,636][01550] Num frames 4000... [2024-11-18 10:14:18,760][01550] Num frames 4100... [2024-11-18 10:14:18,880][01550] Num frames 4200... [2024-11-18 10:14:19,001][01550] Num frames 4300... [2024-11-18 10:14:19,120][01550] Num frames 4400... [2024-11-18 10:14:19,245][01550] Num frames 4500... [2024-11-18 10:14:19,368][01550] Num frames 4600... [2024-11-18 10:14:19,488][01550] Num frames 4700... [2024-11-18 10:14:19,607][01550] Num frames 4800... [2024-11-18 10:14:19,729][01550] Num frames 4900... [2024-11-18 10:14:19,851][01550] Num frames 5000... [2024-11-18 10:14:19,942][01550] Avg episode rewards: #0: 28.310, true rewards: #0: 12.560 [2024-11-18 10:14:19,944][01550] Avg episode reward: 28.310, avg true_objective: 12.560 [2024-11-18 10:14:20,035][01550] Num frames 5100... [2024-11-18 10:14:20,158][01550] Num frames 5200... [2024-11-18 10:14:20,280][01550] Num frames 5300... [2024-11-18 10:14:20,399][01550] Num frames 5400... [2024-11-18 10:14:20,468][01550] Avg episode rewards: #0: 23.820, true rewards: #0: 10.820 [2024-11-18 10:14:20,470][01550] Avg episode reward: 23.820, avg true_objective: 10.820 [2024-11-18 10:14:20,577][01550] Num frames 5500... [2024-11-18 10:14:20,696][01550] Num frames 5600... [2024-11-18 10:14:20,829][01550] Num frames 5700... [2024-11-18 10:14:20,957][01550] Num frames 5800... [2024-11-18 10:14:21,081][01550] Num frames 5900... [2024-11-18 10:14:21,201][01550] Num frames 6000... [2024-11-18 10:14:21,322][01550] Num frames 6100... [2024-11-18 10:14:21,440][01550] Num frames 6200... [2024-11-18 10:14:21,557][01550] Num frames 6300... 
[2024-11-18 10:14:21,681][01550] Num frames 6400... [2024-11-18 10:14:21,812][01550] Num frames 6500... [2024-11-18 10:14:21,937][01550] Num frames 6600... [2024-11-18 10:14:22,058][01550] Num frames 6700... [2024-11-18 10:14:22,180][01550] Num frames 6800... [2024-11-18 10:14:22,303][01550] Num frames 6900... [2024-11-18 10:14:22,418][01550] Avg episode rewards: #0: 26.750, true rewards: #0: 11.583 [2024-11-18 10:14:22,420][01550] Avg episode reward: 26.750, avg true_objective: 11.583 [2024-11-18 10:14:22,481][01550] Num frames 7000... [2024-11-18 10:14:22,597][01550] Num frames 7100... [2024-11-18 10:14:22,727][01550] Num frames 7200... [2024-11-18 10:14:22,854][01550] Num frames 7300... [2024-11-18 10:14:22,982][01550] Num frames 7400... [2024-11-18 10:14:23,098][01550] Num frames 7500... [2024-11-18 10:14:23,216][01550] Num frames 7600... [2024-11-18 10:14:23,336][01550] Num frames 7700... [2024-11-18 10:14:23,453][01550] Num frames 7800... [2024-11-18 10:14:23,573][01550] Num frames 7900... [2024-11-18 10:14:23,690][01550] Num frames 8000... [2024-11-18 10:14:23,808][01550] Num frames 8100... [2024-11-18 10:14:23,945][01550] Num frames 8200... [2024-11-18 10:14:24,063][01550] Num frames 8300... [2024-11-18 10:14:24,182][01550] Num frames 8400... [2024-11-18 10:14:24,308][01550] Num frames 8500... [2024-11-18 10:14:24,425][01550] Num frames 8600... [2024-11-18 10:14:24,543][01550] Num frames 8700... [2024-11-18 10:14:24,665][01550] Num frames 8800... [2024-11-18 10:14:24,788][01550] Num frames 8900... [2024-11-18 10:14:24,922][01550] Num frames 9000... [2024-11-18 10:14:25,036][01550] Avg episode rewards: #0: 30.357, true rewards: #0: 12.929 [2024-11-18 10:14:25,038][01550] Avg episode reward: 30.357, avg true_objective: 12.929 [2024-11-18 10:14:25,099][01550] Num frames 9100... [2024-11-18 10:14:25,220][01550] Num frames 9200... [2024-11-18 10:14:25,341][01550] Num frames 9300... [2024-11-18 10:14:25,458][01550] Num frames 9400... [2024-11-18 10:14:25,578][01550] Num frames 9500... [2024-11-18 10:14:25,697][01550] Num frames 9600... [2024-11-18 10:14:25,816][01550] Num frames 9700... [2024-11-18 10:14:25,958][01550] Num frames 9800... [2024-11-18 10:14:26,090][01550] Avg episode rewards: #0: 28.956, true rewards: #0: 12.331 [2024-11-18 10:14:26,093][01550] Avg episode reward: 28.956, avg true_objective: 12.331 [2024-11-18 10:14:26,135][01550] Num frames 9900... [2024-11-18 10:14:26,253][01550] Num frames 10000... [2024-11-18 10:14:26,371][01550] Num frames 10100... [2024-11-18 10:14:26,527][01550] Num frames 10200... [2024-11-18 10:14:26,706][01550] Num frames 10300... [2024-11-18 10:14:26,790][01550] Avg episode rewards: #0: 26.348, true rewards: #0: 11.459 [2024-11-18 10:14:26,792][01550] Avg episode reward: 26.348, avg true_objective: 11.459 [2024-11-18 10:14:26,955][01550] Num frames 10400... [2024-11-18 10:14:27,113][01550] Num frames 10500... [2024-11-18 10:14:27,276][01550] Num frames 10600... [2024-11-18 10:14:27,437][01550] Num frames 10700... [2024-11-18 10:14:27,595][01550] Num frames 10800... [2024-11-18 10:14:27,760][01550] Num frames 10900... [2024-11-18 10:14:27,926][01550] Num frames 11000... [2024-11-18 10:14:28,092][01550] Num frames 11100... [2024-11-18 10:14:28,277][01550] Avg episode rewards: #0: 25.377, true rewards: #0: 11.177 [2024-11-18 10:14:28,279][01550] Avg episode reward: 25.377, avg true_objective: 11.177 [2024-11-18 10:15:33,245][01550] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
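This second evaluation run repeats the first but adds the Hub-upload arguments seen in its config overrides (push_to_hub=True, hf_repository='averydd/rl_course_vizdoom_health_gathering_supreme', max_num_frames=100000) before saving the final replay.mp4. A sketch under the same assumptions as the previous example:

```python
# Sketch only: the push-to-hub evaluation run, mirroring the overrides in this log.
# Assumes Sample Factory 2.x with its VizDoom examples installed and a valid
# Hugging Face login (huggingface-cli login or HF_TOKEN) so the upload can succeed.
import subprocess
import sys

cmd = [
    sys.executable, "-m", "sf_examples.vizdoom.enjoy_vizdoom",  # assumed entry point
    "--env=doom_health_gathering_supreme",  # assumed env (inferred from the repo name)
    "--num_workers=1",
    "--no_render",
    "--save_video",
    "--max_num_episodes=10",
    "--max_num_frames=100000",              # 'max_num_frames'=100000 in this run
    "--push_to_hub",                        # 'push_to_hub'=True
    "--hf_repository=averydd/rl_course_vizdoom_health_gathering_supreme",
    "--train_dir=/content/train_dir",
    "--experiment=default_experiment",
]
subprocess.run(cmd, check=True)
```

With push_to_hub set, the experiment directory (config, latest checkpoint, and the replay.mp4 written in the last log line) is expected to be uploaded to the named repository once the ten evaluation episodes finish.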