[2023-10-13 02:59:14,478] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-10-13 02:59:16,541] [WARNING] [runner.py:196:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-10-13 02:59:16,541] [INFO] [runner.py:555:main] cmd = /usr/local/miniconda3/envs/llava/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None llava/train/train_mem_video.py --deepspeed ./scripts/zero2.json --lora_enable True --model_name_or_path /hy-tmp/vicuna-7b-v1.3 --version v1 --data_path ./data/avsd_train_omni.json --video_folder /hy-tmp/Charades_v1_480 --vision_tower /hy-tmp/clip-vit-large-patch14 --pretrain_mm_mlp_adapter /hy-tmp/llava-pretrain-vicuna-7b-v1.3/mm_projector.bin --mm_vision_select_layer -2 --mm_use_im_start_end False --mm_use_im_patch_token False --bf16 True --output_dir /hy-tmp/checkpoints/omni-vicuna-7b-v1.3-finetune_lora --num_train_epochs 8 --per_device_train_batch_size 8 --per_device_eval_batch_size 4 --gradient_accumulation_steps 8 --evaluation_strategy no --save_strategy steps --save_steps 100 --save_total_limit 3 --learning_rate 2e-5 --weight_decay 0. --warmup_ratio 0.03 --lr_scheduler_type cosine --logging_steps 1 --tf32 True --model_max_length 2048 --gradient_checkpointing True --lazy_preprocess True --dataloader_num_workers 8 --report_to wandb
[2023-10-13 02:59:17,802] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-10-13 02:59:19,574] [INFO] [launch.py:138:main] 0 NCCL_P2P_LEVEL=NVL
[2023-10-13 02:59:19,574] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2023-10-13 02:59:19,574] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=2, node_rank=0
[2023-10-13 02:59:19,574] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2023-10-13 02:59:19,574] [INFO] [launch.py:163:main] dist_world_size=2
[2023-10-13 02:59:19,574] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1
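The `--world_info` value in the launcher command above appears to be a base64-encoded JSON mapping of hostnames to GPU indices. The sketch below decodes it under that assumption, to show how it lines up with the `WORLD INFO DICT` and `dist_world_size` lines. The variable names are illustrative and are not taken from the DeepSpeed source.

```python
import base64
import json

# Value copied from the --world_info flag in the launcher command in this log.
world_info_b64 = "eyJsb2NhbGhvc3QiOiBbMCwgMV19"

# Assumption: the payload is URL-safe base64 wrapping a JSON object.
world_info = json.loads(base64.urlsafe_b64decode(world_info_b64))
print(world_info)  # {'localhost': [0, 1]}

# One process per listed GPU, summed across hosts.
dist_world_size = sum(len(gpus) for gpus in world_info.values())
print(dist_world_size)  # 2
```

The decoded mapping matches the `{'localhost': [0, 1]}` and `dist_world_size=2` values reported by `launch.py`.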
[2023-10-13 02:59:22,389] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-10-13 02:59:22,433] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-10-13 02:59:22,977] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-10-13 02:59:22,977] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-10-13 02:59:22,977] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-10-13 02:59:23,051] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-10-13 02:59:23,051] [INFO] [comm.py:594:init_distributed] cdb=None
You are using a model of type llama to instantiate a model of type omni. This is not supported for all configurations of models and can yield errors.
You are using a model of type llama to instantiate a model of type omni. This is not supported for all configurations of models and can yield errors.
Loading checkpoint shards: 100%|██████████| 2/2 [00:24<00:00, 12.07s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:35<00:00, 17.72s/it]
Adding LoRA adapters...
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
Formatting inputs...Skip in lazy mode
Rank: 0 partition count [2, 2] and sizes[(82444288, False), (2176, False)]
Rank: 1 partition count [2, 2] and sizes[(82444288, False), (2176, False)]
wandb: Currently logged in as: wanghao-cst. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.12
wandb: Run data is saved locally in /root/Omni-LLM/wandb/run-20231013_030309-30lhy90r
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run fiery-dew-9
wandb: ⭐️ View project at https://wandb.ai/wanghao-cst/huggingface
wandb: 🚀 View run at https://wandb.ai/wanghao-cst/huggingface/runs/30lhy90r
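The `learning_rate` values logged at each training step in this log are consistent with linear warmup followed by cosine decay, using the flags from the launcher command (`--learning_rate 2e-5`, `--warmup_ratio 0.03`, `--lr_scheduler_type cosine`) and the 616 total steps shown in the progress counter. The following is a minimal sketch of that schedule, not the Trainer's own code. The function name `lr_at` and the assumption that warmup steps are rounded up are mine.

```python
import math

# Values taken from the launcher command and progress counter in this log.
BASE_LR = 2e-5
TOTAL_STEPS = 616
WARMUP_RATIO = 0.03

# Batch bookkeeping: 8 per device x 2 GPUs x 8 accumulation steps.
effective_batch = 8 * 2 * 8              # 128 samples per optimizer step
steps_per_epoch = TOTAL_STEPS // 8       # 8 epochs -> 77 steps per epoch


def lr_at(step: int) -> float:
    """Hypothetical helper: learning rate logged at a given optimizer step."""
    # Assumption: warmup length is ceil(total_steps * warmup_ratio) = 19.
    warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)
    if step < warmup_steps:
        # Linear ramp from 0 up to BASE_LR.
        return BASE_LR * step / warmup_steps
    # Half-cosine decay from BASE_LR down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (TOTAL_STEPS - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (1, 19, 20, 100):
    print(step, lr_at(step))
```

Under these assumptions the peak of 2e-05 is reached at step 19, and epoch 1.0 falls at step 77, both of which agree with the logged records.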
0% | 1/616 [01:39<17:00:22, 99.55s/it] {'loss': 12.2148, 'learning_rate': 1.0526315789473685e-06, 'epoch': 0.01}
0% | 2/616 [02:35<12:33:55, 73.67s/it] {'loss': 12.0312, 'learning_rate': 2.105263157894737e-06, 'epoch': 0.03}
0% | 3/616 [03:30<11:05:55, 65.18s/it] {'loss': 12.3086, 'learning_rate': 3.157894736842105e-06, 'epoch': 0.04}
1% | 4/616 [04:24<10:22:36, 61.04s/it] {'loss': 12.1172, 'learning_rate': 4.210526315789474e-06, 'epoch': 0.05}
1% | 5/616 [05:20<10:01:41, 59.09s/it] {'loss': 12.0117, 'learning_rate': 5.263157894736842e-06, 'epoch': 0.06}
1% | 6/616 [06:15<9:47:18, 57.77s/it] {'loss': 12.2656, 'learning_rate': 6.31578947368421e-06, 'epoch': 0.08}
1% | 7/616 [07:11<9:39:26, 57.09s/it] {'loss': 12.125, 'learning_rate': 7.368421052631579e-06, 'epoch': 0.09}
1% | 8/616 [08:05<9:30:10, 56.27s/it] {'loss': 11.2266, 'learning_rate': 8.421052631578948e-06, 'epoch': 0.1}
1% | 9/616 [09:01<9:28:26, 56.19s/it] {'loss': 11.1523, 'learning_rate': 9.473684210526315e-06, 'epoch': 0.12}
2% | 10/616 [09:56<9:23:58, 55.84s/it] {'loss': 9.5234, 'learning_rate': 1.0526315789473684e-05, 'epoch': 0.13}
2% | 11/616 [10:52<9:23:12, 55.86s/it] {'loss': 9.4688, 'learning_rate': 1.1578947368421053e-05, 'epoch': 0.14}
2% | 12/616 [11:49<9:24:18, 56.06s/it] {'loss': 9.25, 'learning_rate': 1.263157894736842e-05, 'epoch': 0.16}
2% | 13/616 [12:44<9:20:17, 55.75s/it] {'loss': 7.7285, 'learning_rate': 1.3684210526315791e-05, 'epoch': 0.17}
2% | 14/616 [13:39<9:16:20, 55.45s/it] {'loss': 7.6367, 'learning_rate': 1.4736842105263159e-05, 'epoch': 0.18}
2% | 15/616 [14:34<9:16:23, 55.55s/it] {'loss': 7.4844, 'learning_rate': 1.578947368421053e-05, 'epoch': 0.19}
3% | 16/616 [15:30<9:16:29, 55.65s/it] {'loss': 7.2422, 'learning_rate': 1.6842105263157896e-05, 'epoch': 0.21}
3% | 17/616 [16:27<9:18:45, 55.97s/it] {'loss': 7.0938, 'learning_rate': 1.7894736842105264e-05, 'epoch': 0.22}
3% | 18/616 [17:22<9:14:48, 55.67s/it] {'loss': 6.7266, 'learning_rate': 1.894736842105263e-05, 'epoch': 0.23}
3% | 19/616 [18:17<9:10:47, 55.36s/it] {'loss': 6.5234, 'learning_rate': 2e-05, 'epoch': 0.25}
3% | 20/616 [19:13<9:11:42, 55.54s/it] {'loss': 6.3477, 'learning_rate': 1.9999861541352416e-05, 'epoch': 0.26}
3% | 21/616 [20:08<9:10:46, 55.54s/it] {'loss': 6.127, 'learning_rate': 1.9999446169243816e-05, 'epoch': 0.27}
4% | 22/616 [21:03<9:07:59, 55.35s/it] {'loss': 5.8555, 'learning_rate': 1.9998753895176576e-05, 'epoch': 0.29}
4% | 23/616 [22:00<9:11:42, 55.82s/it] {'loss': 5.7402, 'learning_rate': 1.999778473832096e-05, 'epoch': 0.3}
4% | 24/616 [22:54<9:06:29, 55.39s/it] {'loss': 5.5605, 'learning_rate': 1.9996538725514597e-05, 'epoch': 0.31}
4% | 25/616 [23:50<9:05:28, 55.38s/it] {'loss': 5.4199, 'learning_rate': 1.999501589126174e-05, 'epoch': 0.32}
4% | 26/616 [24:46<9:07:10, 55.65s/it] {'loss': 5.3242, 'learning_rate': 1.9993216277732302e-05, 'epoch': 0.34}
4% | 27/616 [25:42<9:05:58, 55.62s/it] {'loss': 5.2148, 'learning_rate': 1.999113993476069e-05, 'epoch': 0.35}
5% | 28/616 [26:37<9:04:07, 55.52s/it] {'loss': 5.1016, 'learning_rate': 1.9988786919844437e-05, 'epoch': 0.36}
5% | 29/616 [27:34<9:06:56, 55.91s/it] {'loss': 5.0488, 'learning_rate': 1.9986157298142595e-05, 'epoch': 0.38}
5% | 30/616 [28:28<9:02:34, 55.55s/it] {'loss': 4.9258, 'learning_rate': 1.9983251142473935e-05, 'epoch': 0.39}
5% | 31/616 [29:26<9:06:21, 56.04s/it] {'loss': 4.9531, 'learning_rate': 1.9980068533314937e-05, 'epoch': 0.4}
5% | 32/616 [30:21<9:05:01, 56.00s/it] {'loss': 4.8535, 'learning_rate': 1.9976609558797545e-05, 'epoch': 0.42}
5% | 33/616 [31:17<9:02:02, 55.79s/it] {'loss': 4.8203, 'learning_rate': 1.9972874314706755e-05, 'epoch': 0.43}
6% | 34/616 [32:12<8:58:28, 55.51s/it] {'loss': 4.8535, 'learning_rate': 1.9968862904477936e-05, 'epoch': 0.44}
6% | 35/616 [33:07<8:56:38, 55.42s/it] {'loss': 4.7168, 'learning_rate': 1.9964575439193966e-05, 'epoch': 0.45}
6% | 36/616 [34:02<8:53:51, 55.23s/it] {'loss': 4.6875, 'learning_rate': 1.996001203758218e-05, 'epoch': 0.47}
6% | 37/616 [34:56<8:50:03, 54.93s/it] {'loss': 4.6172, 'learning_rate': 1.995517282601106e-05, 'epoch': 0.48}
6% | 38/616 [35:52<8:51:28, 55.17s/it] {'loss': 4.6523, 'learning_rate': 1.9950057938486745e-05, 'epoch': 0.49}
6% | 39/616 [36:48<8:54:36, 55.59s/it] {'loss': 4.5195, 'learning_rate': 1.994466751664932e-05, 'epoch': 0.51}
6% | 40/616 [37:44<8:54:53, 55.72s/it] {'loss': 4.5117, 'learning_rate': 1.993900170976888e-05, 'epoch': 0.52}
7% | 41/616 [38:39<8:52:15, 55.54s/it] {'loss': 4.4141, 'learning_rate': 1.9933060674741422e-05, 'epoch': 0.53}
7% | 42/616 [39:35<8:52:13, 55.63s/it] {'loss': 4.3398, 'learning_rate': 1.9926844576084483e-05, 'epoch': 0.55}
7% | 43/616 [40:32<8:54:09, 55.93s/it] {'loss': 4.3232, 'learning_rate': 1.992035358593258e-05, 'epoch': 0.56}
7% | 44/616 [41:28<8:52:43, 55.88s/it] {'loss': 4.2305, 'learning_rate': 1.991358788403246e-05, 'epoch': 0.57}
7% | 45/616 [42:23<8:50:59, 55.80s/it] {'loss': 4.1641, 'learning_rate': 1.990654765773811e-05, 'epoch': 0.58}
7% | 46/616 [43:18<8:48:37, 55.65s/it] {'loss': 4.0674, 'learning_rate': 1.9899233102005573e-05, 'epoch': 0.6}
8% | 47/616 [44:14<8:48:51, 55.77s/it] {'loss': 3.915, 'learning_rate': 1.9891644419387545e-05, 'epoch': 0.61}
8% | 48/616 [45:10<8:46:06, 55.58s/it] {'loss': 3.7822, 'learning_rate': 1.9883781820027777e-05, 'epoch': 0.62}
8% | 49/616 [46:06<8:47:09, 55.78s/it] {'loss': 3.709, 'learning_rate': 1.987564552165524e-05, 'epoch': 0.64}
8% | 50/616 [47:03<8:49:27, 56.13s/it] {'loss': 3.4131, 'learning_rate': 1.9867235749578108e-05, 'epoch': 0.65}
8% | 51/616 [47:59<8:47:39, 56.03s/it] {'loss': 3.1318, 'learning_rate': 1.9858552736677516e-05, 'epoch': 0.66}
8% | 52/616 [48:56<8:49:19, 56.31s/it] {'loss': 2.834, 'learning_rate': 1.984959672340111e-05, 'epoch': 0.68}
9% | 53/616 [49:52<8:48:34, 56.33s/it] {'loss': 2.5654, 'learning_rate': 1.984036795775638e-05, 'epoch': 0.69}
9% | 54/616 [50:48<8:47:14, 56.29s/it] {'loss': 2.417, 'learning_rate': 1.9830866695303817e-05, 'epoch': 0.7}
9% | 55/616 [51:45<8:46:52, 56.35s/it] {'loss': 2.1909, 'learning_rate': 1.9821093199149806e-05, 'epoch': 0.71}
9% | 56/616 [52:41<8:47:14, 56.49s/it] {'loss': 2.2568, 'learning_rate': 1.981104773993936e-05, 'epoch': 0.73}
9% | 57/616 [53:37<8:44:24, 56.29s/it] {'loss': 2.2744, 'learning_rate': 1.980073059584862e-05, 'epoch': 0.74}
9% | 58/616 [54:34<8:43:43, 56.31s/it] {'loss': 2.0771, 'learning_rate': 1.9790142052577148e-05, 'epoch': 0.75}
10% | 59/616 [55:29<8:41:20, 56.16s/it] {'loss': 2.1729, 'learning_rate': 1.977928240334002e-05, 'epoch': 0.77}
10% | 60/616 [56:25<8:37:59, 55.90s/it] {'loss': 2.123, 'learning_rate': 1.9768151948859705e-05, 'epoch': 0.78}
10% | 61/616 [57:21<8:38:57, 56.10s/it] {'loss': 2.0356, 'learning_rate': 1.9756750997357738e-05, 'epoch': 0.79}
10% | 62/616 [58:17<8:37:46, 56.08s/it] {'loss': 2.0142, 'learning_rate': 1.9745079864546184e-05, 'epoch': 0.81}
10% | 63/616 [59:12<8:34:10, 55.79s/it] {'loss': 2.061, 'learning_rate': 1.97331388736189e-05, 'epoch': 0.82}
10% | 64/616 [1:00:08<8:31:47, 55.63s/it] {'loss': 2.0508, 'learning_rate': 1.972092835524257e-05, 'epoch': 0.83}
11% | 65/616 [1:01:05<8:36:39, 56.26s/it] {'loss': 2.0171, 'learning_rate': 1.9708448647547575e-05, 'epoch': 0.84}
11% | 66/616 [1:02:02<8:37:02, 56.40s/it] {'loss': 2.1284, 'learning_rate': 1.9695700096118594e-05, 'epoch': 0.86}
11% | 67/616 [1:02:58<8:35:36, 56.35s/it] {'loss': 2.0166, 'learning_rate': 1.9682683053985073e-05, 'epoch': 0.87}
11% | 68/616 [1:03:54<8:33:34, 56.23s/it] {'loss': 2.062, 'learning_rate': 1.966939788161142e-05, 'epoch': 0.88}
11% | 69/616 [1:04:50<8:31:20, 56.09s/it] {'loss': 2.0142, 'learning_rate': 1.9655844946887035e-05, 'epoch': 0.9}
11% | 70/616 [1:05:45<8:27:52, 55.81s/it] {'loss': 2.0103, 'learning_rate': 1.9642024625116117e-05, 'epoch': 0.91}
12% | 71/616 [1:06:41<8:26:19, 55.74s/it] {'loss': 1.9956, 'learning_rate': 1.9627937299007286e-05, 'epoch': 0.92}
12% | 72/616 [1:07:38<8:29:09, 56.16s/it] {'loss': 1.9868, 'learning_rate': 1.961358335866296e-05, 'epoch': 0.94}
12% | 73/616 [1:08:34<8:27:42, 56.10s/it] {'loss': 2.0435, 'learning_rate': 1.959896320156857e-05, 'epoch': 0.95}
12% | 74/616 [1:09:31<8:28:19, 56.27s/it] {'loss': 2.0112, 'learning_rate': 1.958407723258156e-05, 'epoch': 0.96}
12% | 75/616 [1:10:26<8:24:52, 55.99s/it] {'loss': 2.0908, 'learning_rate': 1.9568925863920155e-05, 'epoch': 0.97}
12% | 76/616 [1:11:22<8:25:13, 56.14s/it] {'loss': 1.9795, 'learning_rate': 1.955350951515195e-05, 'epoch': 0.99}
12% | 77/616 [1:12:19<8:25:03, 56.22s/it] {'loss': 2.0112, 'learning_rate': 1.9537828613182314e-05, 'epoch': 1.0}
13% | 78/616 [1:13:47<9:49:34, 65.75s/it] {'loss': 2.0459, 'learning_rate': 1.9521883592242537e-05, 'epoch': 1.01}
13% | 79/616 [1:14:43<9:21:38, 62.75s/it] {'loss': 2.0117, 'learning_rate': 1.950567489387783e-05, 'epoch': 1.03}
13% | 80/616 [1:15:37<8:59:10, 60.35s/it] {'loss': 2.0156, 'learning_rate': 1.9489202966935084e-05, 'epoch': 1.04}
13% | 81/616 [1:16:33<8:45:30, 58.93s/it] {'loss': 2.0547, 'learning_rate': 1.947246826755044e-05, 'epoch': 1.05}
13% | 82/616 [1:17:29<8:37:02, 58.09s/it] {'loss': 1.9639, 'learning_rate': 1.945547125913667e-05, 'epoch': 1.06}
13% | 83/616 [1:18:25<8:29:53, 57.40s/it] {'loss': 2.019, 'learning_rate': 1.943821241237034e-05, 'epoch': 1.08}
14% | 84/616 [1:19:20<8:23:52, 56.83s/it] {'loss': 1.9771, 'learning_rate': 1.9420692205178753e-05, 'epoch': 1.09}
14% | 85/616 [1:20:16<8:21:01, 56.61s/it] {'loss': 1.9492, 'learning_rate': 1.9402911122726756e-05, 'epoch': 1.1}
14% | 86/616 [1:21:11<8:14:46, 56.01s/it] {'loss': 1.9702, 'learning_rate': 1.9384869657403277e-05, 'epoch': 1.12}
14% | 87/616 [1:22:06<8:11:43, 55.77s/it] {'loss': 1.9946, 'learning_rate': 1.9366568308807685e-05, 'epoch': 1.13}
14% | 88/616 [1:23:01<8:09:00, 55.57s/it] {'loss': 1.9854, 'learning_rate': 1.9348007583735985e-05, 'epoch': 1.14}
14% | 89/616 [1:23:57<8:06:59, 55.45s/it] {'loss': 1.959, 'learning_rate': 1.9329187996166747e-05, 'epoch': 1.16}
15% | 90/616 [1:24:52<8:07:03, 55.56s/it] {'loss': 1.9722, 'learning_rate': 1.9310110067246905e-05, 'epoch': 1.17}
15% | 91/616 [1:25:48<8:07:08, 55.67s/it] {'loss': 2.0376, 'learning_rate': 1.9290774325277305e-05, 'epoch': 1.18}
15% | 92/616 [1:26:44<8:06:06, 55.66s/it] {'loss': 1.9834, 'learning_rate': 1.9271181305698084e-05, 'epoch': 1.19}
15% | 93/616 [1:27:40<8:05:12, 55.66s/it] {'loss': 2.0049, 'learning_rate': 1.9251331551073843e-05, 'epoch': 1.21}
15% | 94/616 [1:28:35<8:03:16, 55.55s/it] {'loss': 1.9824, 'learning_rate': 1.923122561107861e-05, 'epoch': 1.22}
15% | 95/616 [1:29:30<8:02:27, 55.56s/it] {'loss': 1.9624, 'learning_rate': 1.9210864042480645e-05, 'epoch': 1.23}
16% | 96/616 [1:30:26<8:02:30, 55.67s/it] {'loss': 1.9395, 'learning_rate': 1.9190247409126993e-05, 'epoch': 1.25}
16% | 97/616 [1:31:22<8:01:13, 55.63s/it] {'loss': 1.9746, 'learning_rate': 1.916937628192789e-05, 'epoch': 1.26}
16% | 98/616 [1:32:17<7:59:31, 55.54s/it] {'loss': 1.9507, 'learning_rate': 1.9148251238840947e-05, 'epoch': 1.27}
16% | 99/616 [1:33:13<7:59:26, 55.64s/it] {'loss': 2.0054, 'learning_rate': 1.9126872864855142e-05, 'epoch': 1.29}
16% | 100/616 [1:34:09<7:58:14, 55.61s/it] {'loss': 1.9409, 'learning_rate': 1.9105241751974624e-05, 'epoch': 1.3}
/usr/local/miniconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
/usr/local/miniconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
16% | 101/616 [1:36:09<10:43:30, 74.97s/it] {'loss': 1.9912, 'learning_rate': 1.9083358499202323e-05, 'epoch': 1.31}
17% | 102/616 [1:37:04<9:52:10, 69.12s/it] {'loss': 1.9404, 'learning_rate': 1.9061223712523352e-05, 'epoch': 1.32}
17% | 103/616 [1:38:00<9:16:14, 65.06s/it] {'loss': 1.9102, 'learning_rate': 1.903883800488824e-05, 'epoch': 1.34}
17% | 104/616 [1:38:55<8:49:52, 62.09s/it] {'loss': 1.9248, 'learning_rate': 1.9016201996195943e-05, 'epoch': 1.35}
17% | 105/616 [1:39:51<8:32:54, 60.22s/it] {'loss': 1.8984, 'learning_rate': 1.8993316313276694e-05, 'epoch': 1.36}
17% | 106/616 [1:40:46<8:19:20, 58.75s/it] {'loss': 1.9331, 'learning_rate': 1.8970181589874637e-05, 'epoch': 1.38}
17% | 107/616 [1:41:42<8:11:28, 57.93s/it] {'loss': 1.9561, 'learning_rate': 1.894679846663027e-05, 'epoch': 1.39}
18% | 108/616 [1:42:38<8:04:50, 57.26s/it] {'loss': 1.8901, 'learning_rate': 1.8923167591062723e-05, 'epoch': 1.4}
18% | 109/616 [1:43:34<8:00:58, 56.92s/it] {'loss': 1.9922, 'learning_rate': 1.8899289617551803e-05, 'epoch': 1.42}
18% | 110/616 [1:44:29<7:55:58, 56.44s/it] {'loss': 1.9277, 'learning_rate': 1.8875165207319902e-05, 'epoch': 1.43}
18% | 111/616 [1:45:25<7:53:10, 56.22s/it] {'loss': 1.9185, 'learning_rate': 1.8850795028413658e-05, 'epoch': 1.44}
18% | 112/616 [1:46:21<7:50:46, 56.04s/it] {'loss': 1.9575, 'learning_rate': 1.882617975568547e-05, 'epoch': 1.45}
18% | 113/616 [1:47:15<7:46:22, 55.63s/it] {'loss': 1.957, 'learning_rate': 1.880132007077482e-05, 'epoch': 1.47}
19% | 114/616 [1:48:11<7:45:53, 55.69s/it] {'loss': 1.8984, 'learning_rate': 1.8776216662089373e-05, 'epoch': 1.48}
19% | 115/616 [1:49:08<7:46:53, 55.92s/it] {'loss': 1.9429, 'learning_rate': 1.875087022478594e-05, 'epoch': 1.49}
19% | 116/616 [1:50:03<7:45:27, 55.86s/it] {'loss': 1.8701, 'learning_rate': 1.8725281460751198e-05, 'epoch': 1.51}
19% | 117/616 [1:50:59<7:43:13, 55.70s/it] {'loss': 1.9497, 'learning_rate': 1.869945107858228e-05, 'epoch': 1.52}
19% | 118/616 [1:51:55<7:44:34, 55.97s/it] {'loss': 1.8921, 'learning_rate': 1.867337979356715e-05, 'epoch': 1.53}
19% | 119/616 [1:52:51<7:42:44, 55.86s/it] {'loss': 1.8569, 'learning_rate': 1.8647068327664774e-05, 'epoch': 1.55}
19% | 120/616 [1:53:47<7:42:20, 55.93s/it] {'loss': 1.8882, 'learning_rate': 1.8620517409485148e-05, 'epoch': 1.56}
20% | 121/616 [1:54:43<7:40:16, 55.79s/it] {'loss': 1.8765, 'learning_rate': 1.8593727774269122e-05, 'epoch': 1.57}
20% | 122/616 [1:55:36<7:34:53, 55.25s/it] {'loss': 1.9282, 'learning_rate': 1.8566700163868027e-05, 'epoch': 1.58}
20% | 123/616 [1:56:32<7:34:34, 55.32s/it] {'loss': 1.8384, 'learning_rate': 1.8539435326723135e-05, 'epoch': 1.6}
20% | 124/616 [1:57:28<7:35:47, 55.58s/it] {'loss': 1.9185, 'learning_rate': 1.851193401784495e-05, 'epoch': 1.61}
20% | 125/616 [1:58:23<7:32:30, 55.30s/it] {'loss': 1.834, 'learning_rate': 1.848419699879227e-05, 'epoch': 1.62}
20% | 126/616 [1:59:19<7:32:47, 55.44s/it] {'loss': 1.8657, 'learning_rate': 1.845622503765113e-05, 'epoch': 1.64}
21% | 127/616 [2:00:14<7:31:52, 55.44s/it] {'loss': 1.8457, 'learning_rate': 1.842801890901351e-05, 'epoch': 1.65}
21% | 128/616 [2:01:09<7:30:42, 55.41s/it] {'loss': 1.7671, 'learning_rate': 1.8399579393955893e-05, 'epoch': 1.66}
21% | 129/616 [2:02:04<7:28:38, 55.27s/it] {'loss': 1.8462, 'learning_rate': 1.837090728001764e-05, 'epoch': 1.68}
21% | 130/616 [2:03:00<7:28:17, 55.34s/it] {'loss': 1.8296, 'learning_rate': 1.834200336117918e-05, 'epoch': 1.69}
21% | 131/616 [2:03:55<7:27:18, 55.34s/it] {'loss': 1.8262, 'learning_rate': 1.8312868437840002e-05, 'epoch': 1.7}
21% | 132/616 [2:04:50<7:25:41, 55.25s/it] {'loss': 1.835, 'learning_rate': 1.8283503316796536e-05, 'epoch': 1.71}
22% | 133/616 [2:05:46<7:26:05, 55.42s/it] {'loss': 1.8979, 'learning_rate': 1.8253908811219764e-05, 'epoch': 1.73}
22% | 134/616 [2:06:43<7:28:04, 55.78s/it] {'loss': 1.8496, 'learning_rate': 1.822408574063273e-05, 'epoch': 1.74}
22% | 135/616 [2:07:39<7:27:34, 55.83s/it] {'loss': 1.8252, 'learning_rate': 1.8194034930887842e-05, 'epoch': 1.75}
22% | 136/616 [2:08:34<7:26:39, 55.83s/it] {'loss': 1.7812, 'learning_rate': 1.8163757214143993e-05, 'epoch': 1.77}
22% | 137/616 [2:09:29<7:23:29, 55.55s/it] {'loss': 1.8364, 'learning_rate': 1.8133253428843524e-05, 'epoch': 1.78}
22% | 138/616 [2:10:25<7:21:53, 55.47s/it] {'loss': 1.8013, 'learning_rate': 1.810252441968901e-05, 'epoch': 1.79}
23% | 139/616 [2:11:20<7:21:37, 55.55s/it] {'loss': 1.8203, 'learning_rate': 1.8071571037619856e-05, 'epoch': 1.81}
23% | 140/616 [2:12:17<7:22:24, 55.77s/it] {'loss': 1.7729, 'learning_rate': 1.804039413978875e-05, 'epoch': 1.82}
23% | 141/616 [2:13:12<7:20:17, 55.62s/it] {'loss': 1.8491, 'learning_rate': 1.8008994589537913e-05, 'epoch': 1.83}
23% | 142/616 [2:14:08<7:20:03, 55.70s/it] {'loss': 1.7998, 'learning_rate': 1.7977373256375194e-05, 'epoch': 1.84}
23% | 143/616 [2:15:03<7:18:17, 55.60s/it] {'loss': 1.8364, 'learning_rate': 1.7945531015950008e-05, 'epoch': 1.86}
23% | 144/616 [2:16:00<7:19:32, 55.87s/it] {'loss': 1.8125, 'learning_rate': 1.791346875002905e-05, 'epoch': 1.87}
24% | 145/616 [2:16:56<7:20:03, 56.06s/it] {'loss': 1.832, 'learning_rate': 1.7881187346471924e-05, 'epoch': 1.88}
24% | 146/616 [2:17:52<7:19:22, 56.09s/it] {'loss': 1.8271, 'learning_rate': 1.784868769920653e-05, 'epoch': 1.9}
24% | 147/616 [2:18:48<7:18:03, 56.04s/it] {'loss': 1.7959, 'learning_rate': 1.7815970708204296e-05, 'epoch': 1.91}
24% | 148/616 [2:19:44<7:16:24, 55.95s/it] {'loss': 1.7798, 'learning_rate': 1.77830372794553e-05, 'epoch': 1.92}
24% | 149/616 [2:20:39<7:14:15, 55.79s/it] {'loss': 1.7651, 'learning_rate': 1.774988832494314e-05, 'epoch': 1.94}
24% | 150/616 [2:21:34<7:11:42, 55.58s/it] {'loss': 1.8076, 'learning_rate': 1.7716524762619695e-05, 'epoch': 1.95}
25% | 151/616 [2:22:30<7:09:34, 55.43s/it] {'loss': 1.8379, 'learning_rate': 1.7682947516379706e-05, 'epoch': 1.96}
25% | 152/616 [2:23:24<7:06:59, 55.21s/it] {'loss': 1.8228, 'learning_rate': 1.7649157516035205e-05, 'epoch': 1.97}
25% | 153/616 [2:24:20<7:06:55, 55.32s/it] {'loss': 1.7783, 'learning_rate': 1.7615155697289734e-05, 'epoch': 1.99}
25% | 154/616 [2:25:16<7:07:25, 55.51s/it] {'loss': 1.8188, 'learning_rate': 1.7580943001712457e-05, 'epoch': 2.0}
25%|██▌ | 155/616 [2:26:40<8:11:56, 64.03s/it] {'loss': 1.7974, 'learning_rate': 1.7546520376712093e-05, 'epoch': 2.01}
25%|██▌ | 156/616 [2:27:36<7:52:12, 61.59s/it] {'loss': 1.7964, 'learning_rate': 1.7511888775510662e-05, 'epoch': 2.03}
25%|██▌ | 157/616 [2:28:31<7:36:15, 59.64s/it] {'loss': 1.7515, 'learning_rate': 1.7477049157117093e-05, 'epoch': 2.04}
26%|██▌ | 158/616 [2:29:26<7:25:42, 58.39s/it] {'loss': 1.7725, 'learning_rate': 1.744200248630068e-05, 'epoch': 2.05}
26%|██▌ | 159/616 [2:30:22<7:18:37, 57.59s/it] {'loss': 1.7534, 'learning_rate': 1.7406749733564344e-05, 'epoch': 2.06}
26%|██▌ | 160/616 [2:31:18<7:13:47, 57.08s/it] {'loss': 1.8408, 'learning_rate': 1.737129187511779e-05, 'epoch': 2.08}
26%|██▌ | 161/616 [2:32:13<7:09:24, 56.63s/it] {'loss': 1.7686, 'learning_rate': 1.7335629892850436e-05, 'epoch': 2.09}
26%|██▋ | 162/616 [2:33:10<7:08:30, 56.63s/it] {'loss': 1.7642, 'learning_rate': 1.729976477430425e-05, 'epoch': 2.1}
26%|██▋ | 163/616 [2:34:06<7:05:21, 56.34s/it] {'loss': 1.8047, 'learning_rate': 1.7263697512646397e-05, 'epoch': 2.12}
27%|██▋ | 164/616 [2:35:02<7:03:59, 56.28s/it] {'loss': 1.8301, 'learning_rate': 1.7227429106641726e-05, 'epoch': 2.13}
27%|██▋ | 165/616 [2:35:58<7:02:14, 56.17s/it] {'loss': 1.7588, 'learning_rate': 1.7190960560625127e-05, 'epoch': 2.14}
27%|██▋ | 166/616 [2:36:53<6:59:31, 55.94s/it] {'loss': 1.7749, 'learning_rate': 1.7154292884473712e-05, 'epoch': 2.16}
27%|██▋ | 167/616 [2:37:49<6:58:57, 55.99s/it] {'loss': 1.7251, 'learning_rate': 1.711742709357886e-05, 'epoch': 2.17}
27%|██▋ | 168/616 [2:38:44<6:56:05, 55.73s/it] {'loss': 1.7603, 'learning_rate': 1.708036420881807e-05, 'epoch': 2.18}
27%|██▋ | 169/616 [2:39:41<6:56:55, 55.96s/it] {'loss': 1.7339, 'learning_rate': 1.7043105256526723e-05, 'epoch': 2.19}
28%|██▊ | 170/616 [2:40:38<6:58:19, 56.28s/it] {'loss': 1.731, 'learning_rate': 1.7005651268469652e-05, 'epoch': 2.21}
28%|██▊ | 171/616 [2:41:33<6:54:13, 55.85s/it] {'loss': 1.7598, 'learning_rate': 1.6968003281812563e-05, 'epoch': 2.22}
28%|██▊ | 172/616 [2:42:29<6:53:17, 55.85s/it] {'loss': 1.7007, 'learning_rate': 1.693016233909332e-05, 'epoch': 2.23}
28%|██▊ | 173/616 [2:43:24<6:51:59, 55.80s/it] {'loss': 1.7183, 'learning_rate': 1.689212948819307e-05, 'epoch': 2.25}
28%|██▊ | 174/616 [2:44:18<6:47:09, 55.27s/it] {'loss': 1.7173, 'learning_rate': 1.6853905782307235e-05, 'epoch': 2.26}
28%|██▊ | 175/616 [2:45:16<6:51:57, 56.05s/it] {'loss': 1.7856, 'learning_rate': 1.681549227991634e-05, 'epoch': 2.27}
29%|██▊ | 176/616 [2:46:11<6:48:50, 55.75s/it] {'loss': 1.7329, 'learning_rate': 1.67768900447567e-05, 'epoch': 2.29}
29%|██▊ | 177/616 [2:47:07<6:46:59, 55.63s/it] {'loss': 1.7578, 'learning_rate': 1.6738100145790977e-05, 'epoch': 2.3}
29%|██▉ | 178/616 [2:48:03<6:46:51, 55.73s/it] {'loss': 1.6846, 'learning_rate': 1.6699123657178553e-05, 'epoch': 2.31}
29%|██▉ | 179/616 [2:48:57<6:43:53, 55.45s/it] {'loss': 1.791, 'learning_rate': 1.6659961658245813e-05, 'epoch': 2.32}
29%|██▉ | 180/616 [2:49:53<6:43:27, 55.52s/it] {'loss': 1.7798, 'learning_rate': 1.6620615233456235e-05, 'epoch': 2.34}
29%|██▉ | 181/616 [2:50:49<6:43:06, 55.60s/it] {'loss': 1.6987, 'learning_rate': 1.658108547238038e-05, 'epoch': 2.35}
30%|██▉ | 182/616 [2:51:45<6:42:48, 55.69s/it] {'loss': 1.7202, 'learning_rate': 1.6541373469665688e-05, 'epoch': 2.36}
30%|██▉ | 183/616 [2:52:40<6:40:16, 55.46s/it] {'loss': 1.7285, 'learning_rate': 1.6501480325006206e-05, 'epoch': 2.38}
30%|██▉ | 184/616 [2:53:35<6:38:17, 55.32s/it] {'loss': 1.7417, 'learning_rate': 1.64614071431121e-05, 'epoch': 2.39}
30%|███ | 185/616 [2:54:31<6:38:58, 55.54s/it] {'loss': 1.79, 'learning_rate': 1.6421155033679085e-05, 'epoch': 2.4}
30%|███ | 186/616 [2:55:27<6:38:52, 55.66s/it] {'loss': 1.7876, 'learning_rate': 1.6380725111357693e-05, 'epoch': 2.42}
30%|███ | 187/616 [2:56:23<6:39:32, 55.88s/it] {'loss': 1.7734, 'learning_rate': 1.634011849572239e-05, 'epoch': 2.43}
31%|███ | 188/616 [2:57:18<6:37:16, 55.69s/it] {'loss': 1.7686, 'learning_rate': 1.6299336311240593e-05, 'epoch': 2.44}
31%|███ | 189/616 [2:58:15<6:38:07, 55.94s/it] {'loss': 1.7993, 'learning_rate': 1.6258379687241533e-05, 'epoch': 2.45}
31%|███ | 190/616 [2:59:09<6:34:19, 55.54s/it] {'loss': 1.708, 'learning_rate': 1.6217249757884954e-05, 'epoch': 2.47}
31%|███ | 191/616 [3:00:05<6:33:15, 55.52s/it] {'loss': 1.7065, 'learning_rate': 1.6175947662129735e-05, 'epoch': 2.48}
31%|███ | 192/616 [3:01:00<6:32:25, 55.53s/it] {'loss': 1.7324, 'learning_rate': 1.6134474543702353e-05, 'epoch': 2.49}
31%|███▏ | 193/616 [3:01:56<6:31:58, 55.60s/it] {'loss': 1.7686, 'learning_rate': 1.609283155106517e-05, 'epoch': 2.51}
31%|███▏ | 194/616 [3:02:51<6:30:30, 55.52s/it] {'loss': 1.7563, 'learning_rate': 1.605101983738468e-05, 'epoch': 2.52}
32%|███▏ | 195/616 [3:03:48<6:31:31, 55.80s/it] {'loss': 1.7373, 'learning_rate': 1.6009040560499548e-05, 'epoch': 2.53}
32%|███▏ | 196/616 [3:04:44<6:32:05, 56.01s/it] {'loss': 1.7104, 'learning_rate': 1.596689488288856e-05, 'epoch': 2.55}
32%|███▏ | 197/616 [3:05:40<6:29:58, 55.84s/it] {'loss': 1.7368, 'learning_rate': 1.5924583971638416e-05, 'epoch': 2.56}
32%|███▏ | 198/616 [3:06:36<6:30:01, 55.99s/it] {'loss': 1.7886, 'learning_rate': 1.5882108998411427e-05, 'epoch': 2.57}
32%|███▏ | 199/616 [3:07:32<6:28:20, 55.88s/it] {'loss': 1.6855, 'learning_rate': 1.5839471139413065e-05, 'epoch': 2.58}
32%|███▏ | 200/616 [3:08:27<6:25:31, 55.60s/it] {'loss': 1.7158, 'learning_rate': 1.5796671575359382e-05, 'epoch': 2.6}
33%|███▎ | 201/616 [3:10:31<8:46:36, 76.14s/it] {'loss': 1.7144, 'learning_rate': 1.5753711491444336e-05, 'epoch': 2.61}
33%|███▎ | 202/616 [3:11:27<8:03:20, 70.05s/it] {'loss': 1.6909, 'learning_rate': 1.571059207730695e-05, 'epoch': 2.62}
33%|███▎ | 203/616 [3:12:23<7:33:14, 65.85s/it] {'loss': 1.8003, 'learning_rate': 1.5667314526998373e-05, 'epoch': 2.64}
33%|███▎ | 204/616 [3:13:19<7:11:50, 62.89s/it] {'loss': 1.7231, 'learning_rate': 1.5623880038948828e-05, 'epoch': 2.65}
33%|███▎ | 205/616 [3:14:14<6:55:21, 60.64s/it] {'loss': 1.6816, 'learning_rate': 1.55802898159344e-05, 'epoch': 2.66}
33%|███▎ | 206/616 [3:15:10<6:43:56, 59.11s/it] {'loss': 1.6826, 'learning_rate': 1.553654506504377e-05, 'epoch': 2.68}
34%|███▎ | 207/616 [3:16:06<6:36:32, 58.17s/it] {'loss': 1.7085, 'learning_rate': 1.5492646997644737e-05, 'epoch': 2.69}
34%|███▍ | 208/616 [3:17:01<6:29:54, 57.34s/it] {'loss': 1.6797, 'learning_rate': 1.5448596829350706e-05, 'epoch': 2.7}
34%|███▍ | 209/616 [3:17:56<6:24:38, 56.70s/it] {'loss': 1.708, 'learning_rate': 1.540439577998703e-05, 'epoch': 2.71}
34%|███▍ | 210/616 [3:18:51<6:20:13, 56.19s/it] {'loss': 1.7036, 'learning_rate': 1.5360045073557214e-05, 'epoch': 2.73}
34%|███▍ | 211/616 [3:19:47<6:17:35, 55.94s/it] {'loss': 1.7129, 'learning_rate': 1.5315545938209016e-05, 'epoch': 2.74}
34%|███▍ | 212/616 [3:20:42<6:15:56, 55.83s/it] {'loss': 1.6855, 'learning_rate': 1.527089960620046e-05, 'epoch': 2.75}
35%|███▍ | 213/616 [3:21:37<6:12:54, 55.52s/it] {'loss': 1.645, 'learning_rate': 1.5226107313865701e-05, 'epoch': 2.77}
35%|███▍ | 214/616 [3:22:32<6:11:06, 55.39s/it] {'loss': 1.6982, 'learning_rate': 1.5181170301580776e-05, 'epoch': 2.78}
35%|███▍ | 215/616 [3:23:27<6:09:14, 55.25s/it] {'loss': 1.731, 'learning_rate': 1.5136089813729276e-05, 'epoch': 2.79}
35%|███▌ | 216/616 [3:24:22<6:08:42, 55.31s/it] {'loss': 1.7192, 'learning_rate': 1.509086709866788e-05, 'epoch': 2.81}
35%|███▌ | 217/616 [3:25:18<6:09:08, 55.51s/it] {'loss': 1.6982, 'learning_rate': 1.5045503408691776e-05, 'epoch': 2.82}
35%|███▌ | 218/616 [3:26:15<6:10:32, 55.86s/it] {'loss': 1.7266, 'learning_rate': 1.5000000000000002e-05, 'epoch': 2.83}
36%|███▌ | 219/616 [3:27:11<6:08:45, 55.73s/it] {'loss': 1.6958, 'learning_rate': 1.495435813266064e-05, 'epoch': 2.84}
36%|███▌ | 220/616 [3:28:06<6:07:56, 55.75s/it] {'loss': 1.7056, 'learning_rate': 1.4908579070575936e-05, 'epoch': 2.86}
36%|███▌ | 221/616 [3:29:02<6:07:44, 55.86s/it] {'loss': 1.6943, 'learning_rate': 1.4862664081447297e-05, 'epoch': 2.87}
36%|███▌ | 222/616 [3:29:57<6:04:46, 55.55s/it] {'loss': 1.6724, 'learning_rate': 1.4816614436740184e-05, 'epoch': 2.88}
36%|███▌ | 223/616 [3:30:52<6:02:26, 55.34s/it] {'loss': 1.6641, 'learning_rate': 1.4770431411648898e-05, 'epoch': 2.9}
36%|███▋ | 224/616 [3:31:48<6:02:46, 55.53s/it] {'loss': 1.7461, 'learning_rate': 1.4724116285061278e-05, 'epoch': 2.91}
37%|███▋ | 225/616 [3:32:43<5:59:56, 55.23s/it] {'loss': 1.7207, 'learning_rate': 1.4677670339523285e-05, 'epoch': 2.92}
37%|███▋ | 226/616 [3:33:39<6:02:09, 55.72s/it] {'loss': 1.7061, 'learning_rate': 1.4631094861203478e-05, 'epoch': 2.94}
37%|███▋ | 227/616 [3:34:35<6:00:28, 55.60s/it] {'loss': 1.6758, 'learning_rate': 1.4584391139857407e-05, 'epoch': 2.95}
37%|███▋ | 228/616 [3:35:31<6:00:26, 55.74s/it] {'loss': 1.73, 'learning_rate': 1.4537560468791889e-05, 'epoch': 2.96}
37%|███▋ | 229/616 [3:36:26<5:57:53, 55.49s/it] {'loss': 1.7314, 'learning_rate': 1.4490604144829204e-05, 'epoch': 2.97}
37%|███▋ | 230/616 [3:37:21<5:56:16, 55.38s/it] {'loss': 1.7114, 'learning_rate': 1.4443523468271168e-05, 'epoch': 2.99}
38%|███▊ | 231/616 [3:38:18<5:58:35, 55.89s/it] {'loss': 1.7212, 'learning_rate': 1.4396319742863145e-05, 'epoch': 3.0}
38%|███▊ | 232/616 [3:39:42<6:51:47, 64.34s/it] {'loss': 1.7036, 'learning_rate': 1.4348994275757933e-05, 'epoch': 3.01}
38%|███▊ | 233/616 [3:40:38<6:34:52, 61.86s/it] {'loss': 1.71, 'learning_rate': 1.4301548377479562e-05, 'epoch': 3.03}
38%|███▊ | 234/616 [3:41:33<6:20:43, 59.80s/it] {'loss': 1.7432, 'learning_rate': 1.4253983361887017e-05, 'epoch': 3.04}
38%|███▊ | 235/616 [3:42:29<6:12:23, 58.65s/it] {'loss': 1.6992, 'learning_rate': 1.4206300546137844e-05, 'epoch': 3.05}
38%|███▊ | 236/616 [3:43:24<6:05:20, 57.69s/it] {'loss': 1.7271, 'learning_rate': 1.415850125065168e-05, 'epoch': 3.06}
38%|███▊ | 237/616 [3:44:19<5:59:01, 56.84s/it] {'loss': 1.6792, 'learning_rate': 1.4110586799073684e-05, 'epoch': 3.08}
39%|███▊ | 238/616 [3:45:15<5:56:01, 56.51s/it] {'loss': 1.73, 'learning_rate': 1.4062558518237893e-05, 'epoch': 3.09}
39%|███▉ | 239/616 [3:46:11<5:53:55, 56.33s/it] {'loss': 1.7192, 'learning_rate': 1.4014417738130464e-05, 'epoch': 3.1}
39%|███▉ | 240/616 [3:47:06<5:50:00, 55.85s/it] {'loss': 1.7476, 'learning_rate': 1.3966165791852862e-05, 'epoch': 3.12}
39%|███▉ | 241/616 [3:48:02<5:49:47, 55.97s/it] {'loss': 1.6958, 'learning_rate': 1.3917804015584932e-05, 'epoch': 3.13}
39%|███▉ | 242/616 [3:48:57<5:47:38, 55.77s/it] {'loss': 1.6865, 'learning_rate': 1.3869333748547901e-05, 'epoch': 3.14}
39%|███▉ | 243/616 [3:49:53<5:46:15, 55.70s/it] {'loss': 1.668, 'learning_rate': 1.3820756332967294e-05, 'epoch': 3.16}
40%|███▉ | 244/616 [3:50:48<5:44:15, 55.53s/it] {'loss': 1.6826, 'learning_rate': 1.3772073114035762e-05, 'epoch': 3.17}
40%|███▉ | 245/616 [3:51:43<5:42:32, 55.40s/it] {'loss': 1.7227, 'learning_rate': 1.3723285439875836e-05, 'epoch': 3.18}
40%|███▉ | 246/616 [3:52:39<5:41:59, 55.46s/it] {'loss': 1.7163, 'learning_rate': 1.3674394661502595e-05, 'epoch': 3.19}
40%|████ | 247/616 [3:53:35<5:42:19, 55.66s/it] {'loss': 1.6606, 'learning_rate': 1.3625402132786247e-05, 'epoch': 3.21}
40%|████ | 248/616 [3:54:31<5:42:14, 55.80s/it] {'loss': 1.7085, 'learning_rate': 1.3576309210414646e-05, 'epoch': 3.22}
40%|████ | 249/616 [3:55:26<5:40:19, 55.64s/it] {'loss': 1.668, 'learning_rate': 1.352711725385572e-05, 'epoch': 3.23}
41%|████ | 250/616 [3:56:22<5:39:13, 55.61s/it] {'loss': 1.7173, 'learning_rate': 1.3477827625319826e-05, 'epoch': 3.25}
41%|████ | 251/616 [3:57:17<5:38:23, 55.63s/it] {'loss': 1.7656, 'learning_rate': 1.3428441689722023e-05, 'epoch': 3.26}
41%|████ | 252/616 [3:58:14<5:38:25, 55.78s/it] {'loss': 1.6812, 'learning_rate': 1.3378960814644283e-05, 'epoch': 3.27}
41%|████ | 253/616 [3:59:09<5:36:11, 55.57s/it] {'loss': 1.6953, 'learning_rate': 1.3329386370297615e-05, 'epoch': 3.29}
41%|████ | 254/616 [4:00:04<5:35:02, 55.53s/it] {'loss': 1.665, 'learning_rate': 1.3279719729484117e-05, 'epoch': 3.3}
41%|████▏ | 255/616 [4:00:59<5:33:43, 55.47s/it] {'loss': 1.6587, 'learning_rate': 1.3229962267558982e-05, 'epoch': 3.31}
42%|████▏ | 256/616 [4:01:55<5:33:39, 55.61s/it] {'loss': 1.6797, 'learning_rate': 1.3180115362392383e-05, 'epoch': 3.32}
42%|████▏ | 257/616 [4:02:51<5:32:48, 55.62s/it] {'loss': 1.6992, 'learning_rate': 1.3130180394331335e-05, 'epoch': 3.34}
42%|████▏ | 258/616 [4:03:47<5:32:16, 55.69s/it] {'loss': 1.6567, 'learning_rate': 1.3080158746161468e-05, 'epoch': 3.35}
42%|████▏ | 259/616 [4:04:42<5:31:01, 55.63s/it] {'loss': 1.6641, 'learning_rate': 1.3030051803068729e-05, 'epoch': 3.36}
42%|████▏ | 260/616 [4:05:39<5:31:17, 55.84s/it] {'loss': 1.6841, 'learning_rate': 1.2979860952601038e-05, 'epoch': 3.38}
42%|████▏ | 261/616 [4:06:33<5:28:37, 55.54s/it] {'loss': 1.6777, 'learning_rate': 1.2929587584629845e-05, 'epoch': 3.39}
43%|████▎ | 262/616 [4:07:30<5:29:36, 55.87s/it] {'loss': 1.7065, 'learning_rate': 1.2879233091311667e-05, 'epoch': 3.4}
43%|████▎ | 263/616 [4:08:26<5:28:11, 55.78s/it] {'loss': 1.6997, 'learning_rate': 1.2828798867049504e-05, 'epoch': 3.42}
43%|████▎ | 264/616 [4:09:21<5:27:20, 55.80s/it] {'loss': 1.6704, 'learning_rate': 1.2778286308454255e-05, 'epoch': 3.43}
43%|████▎ | 265/616 [4:10:16<5:24:37, 55.49s/it] {'loss': 1.6489, 'learning_rate': 1.2727696814306034e-05, 'epoch': 3.44}
43%|████▎ | 266/616 [4:11:12<5:23:30, 55.46s/it] {'loss': 1.6777, 'learning_rate': 1.2677031785515423e-05, 'epoch': 3.45}
43%|████▎ | 267/616 [4:12:07<5:22:50, 55.50s/it] {'loss': 1.6284, 'learning_rate': 1.26262926250847e-05, 'epoch': 3.47}
44%|████▎ | 268/616 [4:13:03<5:21:36, 55.45s/it] {'loss': 1.6445, 'learning_rate': 1.2575480738068971e-05, 'epoch': 3.48}
44%|████▎ | 269/616 [4:13:58<5:20:21, 55.39s/it] {'loss': 1.626, 'learning_rate': 1.2524597531537261e-05, 'epoch': 3.49}
44%|████▍ | 270/616 [4:14:54<5:19:56, 55.48s/it] {'loss': 1.626, 'learning_rate': 1.2473644414533573e-05, 'epoch': 3.51}
44%|████▍ | 271/616 [4:15:50<5:20:41, 55.77s/it] {'loss': 1.6919, 'learning_rate': 1.2422622798037833e-05, 'epoch': 3.52}
44%|████▍ | 272/616 [4:16:46<5:20:14, 55.86s/it] {'loss': 1.6602, 'learning_rate': 1.2371534094926852e-05, 'epoch': 3.53}
44%|████▍ | 273/616 [4:17:42<5:18:58, 55.80s/it] {'loss': 1.6401, 'learning_rate': 1.232037971993517e-05, 'epoch': 3.55}
44%|████▍ | 274/616 [4:18:36<5:16:22, 55.50s/it] {'loss': 1.7026, 'learning_rate': 1.2269161089615902e-05, 'epoch': 3.56}
45%|████▍ | 275/616 [4:19:32<5:15:51, 55.58s/it] {'loss': 1.6875, 'learning_rate': 1.2217879622301514e-05, 'epoch': 3.57}
45%|████▍ | 276/616 [4:20:27<5:14:12, 55.45s/it] {'loss': 1.6646, 'learning_rate': 1.2166536738064523e-05, 'epoch': 3.58}
45%|████▍ | 277/616 [4:21:23<5:13:32, 55.49s/it] {'loss': 1.6631, 'learning_rate': 1.2115133858678192e-05, 'epoch': 3.6}
45%|████▌ | 278/616 [4:22:19<5:13:43, 55.69s/it] {'loss': 1.6196, 'learning_rate': 1.2063672407577154e-05, 'epoch': 3.61}
45%|████▌ | 279/616 [4:23:14<5:11:50, 55.52s/it] {'loss': 1.6606, 'learning_rate': 1.2012153809817992e-05, 'epoch': 3.62}
45%|████▌ | 280/616 [4:24:10<5:11:51, 55.69s/it] {'loss': 1.6719, 'learning_rate': 1.1960579492039783e-05, 'epoch': 3.64}
46%|████▌ | 281/616 [4:25:07<5:11:43, 55.83s/it] {'loss': 1.6958, 'learning_rate': 1.1908950882424581e-05, 'epoch': 3.65}
46%|████▌ | 282/616 [4:26:03<5:12:04, 56.06s/it] {'loss': 1.645, 'learning_rate': 1.1857269410657883e-05, 'epoch': 3.66}
46%|████▌ | 283/616 [4:27:01<5:13:38, 56.51s/it] {'loss': 1.6782, 'learning_rate': 1.1805536507889021e-05, 'epoch': 3.68}
46%|████▌ | 284/616 [4:27:56<5:10:37, 56.14s/it] {'loss': 1.6724, 'learning_rate': 1.1753753606691554e-05, 'epoch': 3.69}
46%|████▋ | 285/616 [4:28:52<5:09:53, 56.17s/it] {'loss': 1.6108, 'learning_rate': 1.1701922141023566e-05, 'epoch': 3.7}
46%|████▋ | 286/616 [4:29:47<5:06:06, 55.66s/it] {'loss': 1.6313, 'learning_rate': 1.1650043546187994e-05, 'epoch': 3.71}
47%|████▋ | 287/616 [4:30:42<5:05:23, 55.70s/it] {'loss': 1.647, 'learning_rate': 1.1598119258792848e-05, 'epoch': 3.73}
47%|████▋ | 288/616 [4:31:38<5:04:18, 55.67s/it] {'loss': 1.6816, 'learning_rate': 1.1546150716711448e-05, 'epoch': 3.74}
47%|████▋ | 289/616 [4:32:34<5:03:48, 55.74s/it] {'loss': 1.6846, 'learning_rate': 1.1494139359042612e-05, 'epoch': 3.75}
47%|████▋ | 290/616 [4:33:30<5:04:10, 55.98s/it] {'loss': 1.6602, 'learning_rate': 1.1442086626070781e-05, 'epoch': 3.77}
47%|████▋ | 291/616 [4:34:26<5:02:43, 55.89s/it] {'loss': 1.6133, 'learning_rate': 1.1389993959226163e-05, 'epoch': 3.78}
47%|████▋ | 292/616 [4:35:22<5:01:18, 55.80s/it] {'loss': 1.6997, 'learning_rate': 1.1337862801044792e-05, 'epoch': 3.79}
48%|████▊ | 293/616 [4:36:18<5:00:40, 55.85s/it] {'loss': 1.6172, 'learning_rate': 1.1285694595128606e-05, 'epoch': 3.81}
48%|████▊ | 294/616 [4:37:13<4:59:35, 55.82s/it] {'loss': 1.6479, 'learning_rate': 1.123349078610545e-05, 'epoch': 3.82}
48%|████▊ | 295/616 [4:38:10<4:59:14, 55.93s/it] {'loss': 1.6851, 'learning_rate': 1.1181252819589081e-05, 'epoch': 3.83}
48%|████▊ | 296/616 [4:39:06<4:58:40, 56.00s/it] {'loss': 1.6533, 'learning_rate': 1.1128982142139142e-05, 'epoch': 3.84}
48%|████▊ | 297/616 [4:40:02<4:58:04, 56.06s/it] {'loss': 1.6367, 'learning_rate': 1.1076680201221093e-05, 'epoch': 3.86}
48%|████▊ | 298/616 [4:40:58<4:56:22, 55.92s/it] {'loss': 1.6426, 'learning_rate': 1.1024348445166133e-05, 'epoch': 3.87}
49%|████▊ | 299/616 [4:41:54<4:56:48, 56.18s/it] {'loss': 1.6509, 'learning_rate': 1.0971988323131099e-05, 'epoch': 3.88}
49%|████▊ | 300/616 [4:42:49<4:53:36, 55.75s/it] {'loss': 1.6997, 'learning_rate': 1.091960128505833e-05, 'epoch': 3.9}
49%|████▉ | 301/616 [4:44:56<6:44:24, 77.03s/it] {'loss': 1.6187, 'learning_rate': 1.086718878163551e-05, 'epoch': 3.91}
49%|████▉ | 302/616 [4:45:52<6:09:55, 70.69s/it] {'loss': 1.6914, 'learning_rate': 1.0814752264255508e-05, 'epoch': 3.92}
49%|████▉ | 303/616 [4:46:47<5:44:48, 66.10s/it] {'loss': 1.6421, 'learning_rate': 1.0762293184976178e-05, 'epoch': 3.94}
49%|████▉ | 304/616 [4:47:42<5:26:46, 62.84s/it] {'loss': 1.6631, 'learning_rate': 1.070981299648016e-05, 'epoch': 3.95}
50%|████▉ | 305/616 [4:48:38<5:14:34, 60.69s/it] {'loss': 1.7046, 'learning_rate': 1.0657313152034634e-05, 'epoch': 3.96}
50%|████▉ | 306/616 [4:49:33<5:04:42, 58.97s/it] {'loss': 1.5845, 'learning_rate': 1.0604795105451096e-05, 'epoch': 3.97}
50%|████▉ | 307/616 [4:50:29<4:58:34, 57.97s/it] {'loss': 1.6621, 'learning_rate': 1.0552260311045082e-05, 'epoch': 3.99}
50%|█████ | 308/616 [4:51:24<4:53:57, 57.26s/it] {'loss': 1.6782, 'learning_rate': 1.0499710223595913e-05, 'epoch': 4.0}
50%|█████ | 309/616 [4:52:56<5:46:30, 67.72s/it] {'loss': 1.6611, 'learning_rate': 1.0447146298306394e-05, 'epoch': 4.01}
50%|█████ | 310/616 [4:53:52<5:26:19, 63.98s/it] {'loss': 1.6626, 'learning_rate': 1.0394569990762528e-05, 'epoch': 4.03}
50%|█████ | 311/616 [4:54:47<5:11:51, 61.35s/it] {'loss': 1.6406, 'learning_rate': 1.0341982756893203e-05, 'epoch': 4.04}
51%|█████ | 312/616 [4:55:42<5:01:12, 59.45s/it] {'loss': 1.6455, 'learning_rate': 1.0289386052929874e-05, 'epoch': 4.05}
51%|█████ | 313/616 [4:56:37<4:53:24, 58.10s/it] {'loss': 1.7051, 'learning_rate': 1.0236781335366239e-05, 'epoch': 4.06}
51%|█████ | 314/616 [4:57:32<4:47:47, 57.18s/it] {'loss': 1.5967, 'learning_rate': 1.0184170060917914e-05, 'epoch': 4.08}
51%|█████ | 315/616 [4:58:28<4:45:07, 56.84s/it] {'loss': 1.6772, 'learning_rate': 1.0131553686482077e-05, 'epoch': 4.09}
51%|█████▏ | 316/616 [4:59:24<4:42:42, 56.54s/it] {'loss': 1.625, 'learning_rate': 1.0078933669097135e-05, 'epoch': 4.1}
51%|█████▏ | 317/616 [5:00:19<4:40:23, 56.27s/it] {'loss': 1.6572, 'learning_rate': 1.002631146590238e-05, 'epoch': 4.12}
52%|█████▏ | 318/616 [5:01:17<4:41:30, 56.68s/it] {'loss': 1.6694, 'learning_rate': 9.973688534097624e-06, 'epoch': 4.13}
52%|█████▏ | 319/616 [5:02:13<4:39:49, 56.53s/it] {'loss': 1.6377, 'learning_rate': 9.92106633090287e-06, 'epoch': 4.14}
52%|█████▏ | 320/616 [5:03:08<4:36:51, 56.12s/it] {'loss': 1.6782, 'learning_rate': 9.868446313517927e-06, 'epoch': 4.16}
52%|█████▏ | 321/616 [5:04:04<4:35:33, 56.04s/it] {'loss': 1.6147, 'learning_rate': 9.815829939082087e-06, 'epoch': 4.17}
52%|█████▏ | 322/616 [5:05:00<4:33:42, 55.86s/it] {'loss': 1.6826, 'learning_rate': 9.763218664633763e-06, 'epoch': 4.18}
52%|█████▏ | 323/616 [5:05:56<4:32:53, 55.88s/it] {'loss': 1.7041, 'learning_rate': 9.710613947070127e-06, 'epoch': 4.19}
52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 323/616 [5:05:56<4:32:53, 55.88s/it] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 324/616 [5:06:51<4:31:22, 55.76s/it] {'loss': 1.6343, 'learning_rate': 9.658017243106802e-06, 'epoch': 4.21}
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 324/616 [5:06:51<4:31:22, 55.76s/it] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 325/616 [5:07:47<4:30:07, 55.69s/it] {'loss': 1.6724, 'learning_rate': 9.605430009237474e-06, 'epoch': 4.22}
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 325/616 [5:07:47<4:30:07, 55.69s/it] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 326/616 [5:08:42<4:28:06, 55.47s/it] {'loss': 1.6812, 'learning_rate': 9.552853701693606e-06, 'epoch': 4.23}
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 326/616 [5:08:42<4:28:06, 55.47s/it] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 327/616 [5:09:37<4:27:10, 55.47s/it] {'loss': 1.6289, 'learning_rate': 9.50028977640409e-06, 'epoch': 4.25}
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 327/616 [5:09:37<4:27:10, 55.47s/it] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 328/616 [5:10:33<4:26:54, 55.60s/it] {'loss': 1.6313, 'learning_rate': 9.44773968895492e-06, 'epoch': 4.26}
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 328/616 [5:10:33<4:26:54, 55.60s/it] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 329/616 [5:11:29<4:26:11, 55.65s/it] {'loss': 1.6274, 'learning_rate': 9.395204894548907e-06, 'epoch': 4.27}
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 329/616 [5:11:29<4:26:11, 55.65s/it] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 330/616 [5:12:24<4:24:35, 55.51s/it] {'loss': 1.6572, 'learning_rate': 9.342686847965367e-06, 'epoch': 4.29}
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 330/616 [5:12:24<4:24:35, 55.51s/it] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 331/616 [5:13:20<4:25:05, 55.81s/it] {'loss': 1.6333, 'learning_rate': 9.290187003519841e-06, 'epoch': 4.3}
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 331/616 [5:13:20<4:25:05, 55.81s/it] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 332/616 [5:14:15<4:22:49, 55.53s/it] {'loss': 1.687, 'learning_rate': 9.237706815023824e-06, 'epoch': 4.31}
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 332/616 [5:14:15<4:22:49, 55.53s/it] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 333/616 [5:15:11<4:22:17, 55.61s/it] {'loss': 1.6626, 'learning_rate': 9.185247735744495e-06, 'epoch': 4.32}
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 333/616 [5:15:11<4:22:17, 55.61s/it] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 334/616 [5:16:08<4:22:56, 55.94s/it] {'loss': 1.6431, 'learning_rate': 9.132811218364494e-06, 'epoch': 4.34}
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 334/616 [5:16:08<4:22:56, 55.94s/it] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 335/616 [5:17:04<4:22:14, 56.00s/it] {'loss': 1.6562, 'learning_rate': 9.080398714941672e-06, 'epoch': 4.35}
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 335/616 [5:17:04<4:22:14, 56.00s/it] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 336/616 [5:17:59<4:20:36, 55.84s/it] {'loss': 1.6714, 'learning_rate': 9.028011676868901e-06, 'epoch': 4.36}
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 336/616 [5:17:59<4:20:36, 55.84s/it] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 337/616 [5:18:55<4:20:00, 55.92s/it] {'loss': 1.604, 'learning_rate': 8.975651554833869e-06, 'epoch': 4.38}
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 337/616 [5:18:55<4:20:00, 55.92s/it] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 338/616 [5:19:51<4:18:46, 55.85s/it] {'loss': 1.6719, 'learning_rate': 8.92331979877891e-06, 'epoch': 4.39}
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 338/616 [5:19:51<4:18:46, 55.85s/it] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 339/616 [5:20:47<4:18:26, 55.98s/it] {'loss': 1.707, 'learning_rate': 8.871017857860863e-06, 'epoch': 4.4}
55%|█████▌    | 340/616 [5:21:42<4:15:46, 55.60s/it] {'loss': 1.647, 'learning_rate': 8.81874718041092e-06, 'epoch': 4.42}
55%|█████▌    | 341/616 [5:22:38<4:14:55, 55.62s/it] {'loss': 1.6675, 'learning_rate': 8.766509213894552e-06, 'epoch': 4.43}
56%|█████▌    | 342/616 [5:23:34<4:14:39, 55.76s/it] {'loss': 1.6636, 'learning_rate': 8.714305404871397e-06, 'epoch': 4.44}
56%|█████▌    | 343/616 [5:24:29<4:12:18, 55.45s/it] {'loss': 1.6768, 'learning_rate': 8.662137198955211e-06, 'epoch': 4.45}
56%|█████▌    | 344/616 [5:25:23<4:10:08, 55.18s/it] {'loss': 1.5864, 'learning_rate': 8.610006040773844e-06, 'epoch': 4.47}
56%|█████▌    | 345/616 [5:26:19<4:10:41, 55.50s/it] {'loss': 1.6304, 'learning_rate': 8.557913373929222e-06, 'epoch': 4.48}
56%|█████▌    | 346/616 [5:27:15<4:10:23, 55.64s/it] {'loss': 1.6289, 'learning_rate': 8.50586064095739e-06, 'epoch': 4.49}
56%|█████▋    | 347/616 [5:28:11<4:09:24, 55.63s/it] {'loss': 1.6436, 'learning_rate': 8.453849283288554e-06, 'epoch': 4.51}
56%|█████▋    | 348/616 [5:29:07<4:08:28, 55.63s/it] {'loss': 1.6221, 'learning_rate': 8.401880741207155e-06, 'epoch': 4.52}
57%|█████▋    | 349/616 [5:30:03<4:08:31, 55.85s/it] {'loss': 1.6904, 'learning_rate': 8.349956453812009e-06, 'epoch': 4.53}
57%|█████▋    | 350/616 [5:30:58<4:06:06, 55.51s/it] {'loss': 1.5898, 'learning_rate': 8.298077858976435e-06, 'epoch': 4.55}
57%|█████▋    | 351/616 [5:31:53<4:04:15, 55.30s/it] {'loss': 1.667, 'learning_rate': 8.246246393308448e-06, 'epoch': 4.56}
57%|█████▋    | 352/616 [5:32:48<4:03:37, 55.37s/it] {'loss': 1.6543, 'learning_rate': 8.194463492110982e-06, 'epoch': 4.57}
57%|█████▋    | 353/616 [5:33:44<4:03:10, 55.48s/it] {'loss': 1.6572, 'learning_rate': 8.142730589342119e-06, 'epoch': 4.58}
57%|█████▋    | 354/616 [5:34:40<4:03:36, 55.79s/it] {'loss': 1.6685, 'learning_rate': 8.091049117575424e-06, 'epoch': 4.6}
58%|█████▊    | 355/616 [5:35:36<4:02:46, 55.81s/it] {'loss': 1.6484, 'learning_rate': 8.03942050796022e-06, 'epoch': 4.61}
58%|█████▊    | 356/616 [5:36:31<4:00:57, 55.61s/it] {'loss': 1.5405, 'learning_rate': 7.98784619018201e-06, 'epoch': 4.62}
58%|█████▊    | 357/616 [5:37:26<3:59:25, 55.46s/it] {'loss': 1.644, 'learning_rate': 7.93632759242285e-06, 'epoch': 4.64}
58%|█████▊    | 358/616 [5:38:22<3:58:25, 55.45s/it] {'loss': 1.6206, 'learning_rate': 7.884866141321811e-06, 'epoch': 4.65}
58%|█████▊    | 359/616 [5:39:17<3:57:30, 55.45s/it] {'loss': 1.6079, 'learning_rate': 7.833463261935482e-06, 'epoch': 4.66}
58%|█████▊    | 360/616 [5:40:13<3:56:23, 55.40s/it] {'loss': 1.6108, 'learning_rate': 7.782120377698489e-06, 'epoch': 4.68}
59%|█████▊    | 361/616 [5:41:09<3:56:28, 55.64s/it] {'loss': 1.5625, 'learning_rate': 7.730838910384098e-06, 'epoch': 4.69}
59%|█████▉    | 362/616 [5:42:04<3:54:35, 55.42s/it] {'loss': 1.647, 'learning_rate': 7.679620280064837e-06, 'epoch': 4.7}
59%|█████▉    | 363/616 [5:43:00<3:54:24, 55.59s/it] {'loss': 1.5493, 'learning_rate': 7.6284659050731525e-06, 'epoch': 4.71}
59%|█████▉    | 364/616 [5:43:55<3:53:37, 55.62s/it] {'loss': 1.6362, 'learning_rate': 7.57737720196217e-06, 'epoch': 4.73}
59%|█████▉    | 365/616 [5:44:51<3:53:05, 55.72s/it] {'loss': 1.6294, 'learning_rate': 7.526355585466432e-06, 'epoch': 4.74}
59%|█████▉    | 366/616 [5:45:48<3:53:24, 56.02s/it] {'loss': 1.6675, 'learning_rate': 7.4754024684627405e-06, 'epoch': 4.75}
60%|█████▉    | 367/616 [5:46:44<3:51:58, 55.90s/it] {'loss': 1.6519, 'learning_rate': 7.424519261931036e-06, 'epoch': 4.77}
60%|█████▉    | 368/616 [5:47:39<3:50:52, 55.86s/it] {'loss': 1.6807, 'learning_rate': 7.373707374915303e-06, 'epoch': 4.78}
60%|█████▉    | 369/616 [5:48:35<3:49:39, 55.79s/it] {'loss': 1.6221, 'learning_rate': 7.322968214484583e-06, 'epoch': 4.79}
60%|██████    | 370/616 [5:49:30<3:48:07, 55.64s/it] {'loss': 1.6523, 'learning_rate': 7.27230318569397e-06, 'epoch': 4.81}
60%|██████    | 371/616 [5:50:26<3:47:32, 55.72s/it] {'loss': 1.6118, 'learning_rate': 7.221713691545746e-06, 'epoch': 4.82}
60%|██████    | 372/616 [5:51:22<3:46:53, 55.79s/it] {'loss': 1.6279, 'learning_rate': 7.171201132950502e-06, 'epoch': 4.83}
61%|██████    | 373/616 [5:52:18<3:45:21, 55.64s/it] {'loss': 1.6416, 'learning_rate': 7.1207669086883366e-06, 'epoch': 4.84}
61%|██████    | 374/616 [5:53:13<3:44:41, 55.71s/it] {'loss': 1.605, 'learning_rate': 7.070412415370158e-06, 'epoch': 4.86}
61%|██████    | 375/616 [5:54:10<3:44:46, 55.96s/it] {'loss': 1.627, 'learning_rate': 7.020139047398966e-06, 'epoch': 4.87}
61%|██████    | 376/616 [5:55:04<3:41:28, 55.37s/it] {'loss': 1.6123, 'learning_rate': 6.969948196931272e-06, 'epoch': 4.88}
61%|██████    | 377/616 [5:55:59<3:40:43, 55.41s/it] {'loss': 1.6333, 'learning_rate': 6.919841253838537e-06, 'epoch': 4.9}
61%|██████▏   | 378/616 [5:56:56<3:40:57, 55.70s/it] {'loss': 1.5981, 'learning_rate': 6.869819605668669e-06, 'epoch': 4.91}
62%|██████▏   | 379/616 [5:57:51<3:39:35, 55.59s/it] {'loss': 1.646, 'learning_rate': 6.819884637607619e-06, 'epoch': 4.92}
62%|██████▏   | 380/616 [5:58:46<3:38:23, 55.52s/it] {'loss': 1.6641, 'learning_rate': 6.770037732441019e-06, 'epoch': 4.94}
62%|██████▏   | 381/616 [5:59:42<3:36:55, 55.38s/it] {'loss': 1.6362, 'learning_rate': 6.720280270515882e-06, 'epoch': 4.95}
62%|██████▏   | 382/616 [6:00:38<3:36:40, 55.56s/it] {'loss': 1.6562, 'learning_rate': 6.670613629702391e-06, 'epoch': 4.96}
62%|██████▏   | 383/616 [6:01:33<3:35:24, 55.47s/it] {'loss': 1.6772, 'learning_rate': 6.62103918535572e-06, 'epoch': 4.97}
62%|██████▏   | 384/616 [6:02:29<3:34:58, 55.60s/it] {'loss': 1.6729, 'learning_rate': 6.5715583102779815e-06, 'epoch': 4.99}
62%|██████▎   | 385/616 [6:03:24<3:33:52, 55.55s/it] {'loss': 1.6597, 'learning_rate': 6.522172374680177e-06, 'epoch': 5.0}
63%|██████▎   | 386/616 [6:04:53<4:11:03, 65.49s/it] {'loss': 1.6348, 'learning_rate': 6.472882746144282e-06, 'epoch': 5.01}
63%|██████▎   | 387/616 [6:05:49<3:59:32, 62.76s/it] {'loss': 1.6108, 'learning_rate': 6.423690789585359e-06, 'epoch': 5.03}
63%|██████▎   | 388/616 [6:06:45<3:50:45, 60.73s/it] {'loss': 1.6421, 'learning_rate': 6.374597867213756e-06, 'epoch': 5.04}
63%|██████▎   | 389/616 [6:07:41<3:43:48, 59.16s/it] {'loss': 1.6455, 'learning_rate': 6.3256053384974105e-06, 'epoch': 5.05}
63%|██████▎   | 390/616 [6:08:37<3:39:10, 58.19s/it] {'loss': 1.6616, 'learning_rate': 6.276714560124166e-06, 'epoch': 5.06}
63%|██████▎   | 391/616 [6:09:31<3:34:03, 57.08s/it] {'loss': 1.6162, 'learning_rate': 6.2279268859642396e-06, 'epoch': 5.08}
64%|██████▎   | 392/616 [6:10:27<3:31:15, 56.59s/it] {'loss': 1.6646, 'learning_rate': 6.179243667032709e-06, 'epoch': 5.09}
64%|██████▍   | 393/616 [6:11:22<3:29:23, 56.34s/it] {'loss': 1.6445, 'learning_rate': 6.130666251452102e-06, 'epoch': 5.1}
64%|██████▍   | 394/616 [6:12:18<3:28:01, 56.22s/it] {'loss': 1.6299, 'learning_rate': 6.082195984415069e-06, 'epoch': 5.12}
64%|██████▍   | 395/616 [6:13:13<3:25:56, 55.91s/it] {'loss': 1.6221, 'learning_rate': 6.03383420814714e-06, 'epoch': 5.13}
64%|██████▍   | 396/616 [6:14:08<3:24:04, 55.65s/it] {'loss': 1.647, 'learning_rate': 5.9855822618695385e-06, 'epoch': 5.14}
64%|██████▍   | 397/616 [6:15:04<3:22:35, 55.50s/it] {'loss': 1.6147, 'learning_rate': 5.937441481762112e-06, 'epoch': 5.16}
65%|██████▍   | 398/616 [6:15:59<3:21:06, 55.35s/it] {'loss': 1.6025, 'learning_rate': 5.889413200926317e-06, 'epoch': 5.17}
65%|██████▍   | 399/616 [6:16:54<3:19:48, 55.25s/it] {'loss': 1.6064, 'learning_rate': 5.841498749348322e-06, 'epoch': 5.18}
65%|██████▍   | 400/616 [6:17:50<3:19:49, 55.50s/it] {'loss': 1.6587, 'learning_rate': 5.793699453862161e-06, 'epoch': 5.19}
65%|██████▌   | 401/616 [6:19:54<4:33:18, 76.27s/it] {'loss': 1.6255, 'learning_rate': 5.746016638112986e-06, 'epoch': 5.21}
65%|██████▌   | 402/616 [6:20:50<4:09:42, 70.01s/it] {'loss': 1.6523, 'learning_rate': 5.698451622520442e-06, 'epoch': 5.22}
65%|██████▌   | 403/616 [6:21:45<3:52:28, 65.49s/it] {'loss': 1.6367, 'learning_rate': 5.651005724242072e-06, 'epoch': 5.23}
66%|██████▌   | 404/616 [6:22:40<3:40:53, 62.52s/it] {'loss': 1.6006, 'learning_rate': 5.603680257136857e-06, 'epoch': 5.25}
66%|██████▌   | 405/616 [6:23:37<3:33:15, 60.64s/it] {'loss': 1.6294, 'learning_rate': 5.556476531728836e-06, 'epoch': 5.26}
66%|██████▌   | 406/616 [6:24:33<3:27:23, 59.26s/it] {'loss': 1.6284, 'learning_rate': 5.509395855170798e-06, 'epoch': 5.27}
66%|██████▌   | 407/616 [6:25:29<3:23:21, 58.38s/it] {'loss': 1.6392, 'learning_rate': 5.4624395312081125e-06, 'epoch': 5.29}
66%|██████▌   | 408/616 [6:26:25<3:20:23, 57.80s/it] {'loss': 1.625, 'learning_rate': 5.415608860142593e-06, 'epoch': 5.3}
66%|██████▋   | 409/616 [6:27:21<3:17:10, 57.15s/it] {'loss': 1.6162, 'learning_rate': 5.368905138796523e-06, 'epoch': 5.31}
67%|██████▋   | 410/616 [6:28:17<3:14:41, 56.71s/it] {'loss': 1.5752, 'learning_rate': 5.322329660476715e-06, 'epoch': 5.32}
67%|██████▋   | 411/616 [6:29:12<3:12:29, 56.34s/it] {'loss': 1.6655, 'learning_rate': 5.275883714938726e-06, 'epoch': 5.34}
67%|██████▋   | 412/616 [6:30:08<3:10:27, 56.02s/it] {'loss': 1.5972, 'learning_rate': 5.2295685883511086e-06, 'epoch': 5.35}
67%|██████▋   | 413/616 [6:31:03<3:09:04, 55.88s/it] {'loss': 1.6421, 'learning_rate': 5.183385563259819e-06, 'epoch': 5.36}
67%|██████▋   | 414/616 [6:31:59<3:07:42, 55.76s/it] {'loss': 1.5869, 'learning_rate': 5.137335918552702e-06, 'epoch': 5.38}
67%|██████▋   | 415/616 [6:32:54<3:06:39, 55.72s/it] {'loss': 1.6333, 'learning_rate': 5.091420929424065e-06, 'epoch': 5.39}
68%|██████▊   | 416/616 [6:33:50<3:05:36, 55.68s/it] {'loss': 1.6445, 'learning_rate': 5.045641867339361e-06, 'epoch': 5.4}
68%|██████▊   | 417/616 [6:34:47<3:05:53, 56.05s/it] {'loss': 1.6597, 'learning_rate': 5.000000000000003e-06, 'epoch': 5.42}
68%|██████▊   | 418/616 [6:35:42<3:04:23, 55.88s/it] {'loss': 1.6387, 'learning_rate': 4.954496591308227e-06, 'epoch': 5.43}
68%|██████▊   | 419/616 [6:36:38<3:03:44, 55.96s/it] {'loss': 1.6489, 'learning_rate': 4.909132901332122e-06, 'epoch': 5.44}
68%|██████▊   | 420/616 [6:37:35<3:03:39, 56.22s/it] {'loss': 1.6318, 'learning_rate': 4.863910186270726e-06, 'epoch': 5.45}
68%|██████▊   | 421/616 [6:38:31<3:02:23, 56.12s/it] {'loss': 1.6841, 'learning_rate': 4.818829698419225e-06, 'epoch': 5.47}
69%|██████▊   | 422/616 [6:39:27<3:01:16, 56.07s/it] {'loss': 1.666, 'learning_rate': 4.773892686134301e-06, 'epoch': 5.48}
69%|██████▊   | 423/616 [6:40:22<2:59:42, 55.87s/it] {'loss': 1.6162, 'learning_rate': 4.729100393799538e-06, 'epoch': 5.49}
69%|██████▉   | 424/616 [6:41:19<2:59:17, 56.03s/it] {'loss': 1.5957, 'learning_rate': 4.684454061790987e-06, 'epoch': 5.51}
69%|██████▉   | 425/616 [6:42:13<2:56:55, 55.58s/it] {'loss': 1.6201, 'learning_rate': 4.639954926442792e-06, 'epoch': 5.52}
69%|██████▉   | 426/616 [6:43:09<2:55:42, 55.49s/it] {'loss': 1.6533, 'learning_rate': 4.5956042200129725e-06, 'epoch': 5.53}
69%|██████▉   | 427/616 [6:44:05<2:55:42, 55.78s/it] {'loss': 1.624, 'learning_rate': 4.551403170649299e-06, 'epoch': 5.55}
69%|██████▉   | 428/616 [6:45:00<2:53:57, 55.52s/it] {'loss': 1.604, 'learning_rate': 4.507353002355269e-06, 'epoch': 5.56}
70%|██████▉   | 429/616 [6:45:56<2:53:53, 55.80s/it] {'loss': 1.6089, 'learning_rate': 4.4634549349562315e-06, 'epoch': 5.57}
70%|██████▉   | 430/616 [6:46:52<2:52:24, 55.62s/it] {'loss': 1.5962, 'learning_rate': 4.4197101840656e-06, 'epoch': 5.58}
70%|██████▉   | 431/616 [6:47:48<2:51:48, 55.72s/it] {'loss': 1.5962, 'learning_rate': 4.376119961051175e-06, 'epoch': 5.6}
70%|███████   | 432/616 [6:48:43<2:50:43, 55.67s/it] {'loss': 1.6313, 'learning_rate': 4.33268547300163e-06, 'epoch': 5.61}
70%|███████   | 433/616 [6:49:38<2:48:53, 55.37s/it] {'loss': 1.6626, 'learning_rate': 4.289407922693053e-06, 'epoch': 5.62}
70%|███████   | 434/616 [6:50:34<2:48:44, 55.63s/it] {'loss': 1.5796, 'learning_rate': 4.2462885085556635e-06, 'epoch': 5.64}
71%|███████   | 435/616 [6:51:30<2:48:04, 55.72s/it] {'loss': 1.6836, 'learning_rate': 4.203328424640619e-06, 'epoch': 5.65}
71%|███████   | 436/616 [6:52:25<2:46:41, 55.56s/it] {'loss': 1.6675, 'learning_rate': 4.1605288605869365e-06, 'epoch': 5.66}
71%|███████   | 437/616 [6:53:21<2:46:12, 55.71s/it] {'loss': 1.6807, 'learning_rate': 4.117891001588574e-06, 'epoch': 5.68}
71%|███████   | 438/616 [6:54:17<2:45:16, 55.71s/it] {'loss': 1.6167, 'learning_rate': 4.075416028361584e-06, 'epoch': 5.69}
71%|███████▏  | 439/616 [6:55:14<2:45:15, 56.02s/it] {'loss': 1.6851, 'learning_rate': 4.033105117111441e-06, 'epoch': 5.7}
71%|███████▏  | 440/616 [6:56:09<2:43:49, 55.85s/it] {'loss': 1.6191, 'learning_rate': 3.9909594395004545e-06, 'epoch': 5.71}
72%|███████▏  | 441/616 [6:57:05<2:43:19, 56.00s/it] {'loss': 1.6362, 'learning_rate': 3.948980162615323e-06, 'epoch': 5.73}
72%|███████▏  | 442/616 [6:58:01<2:41:55, 55.84s/it] {'loss': 1.5825, 'learning_rate': 3.907168448934836e-06, 'epoch': 5.74}
72%|███████▏  | 443/616 [6:58:56<2:40:39, 55.72s/it] {'loss': 1.6182, 'learning_rate': 3.865525456297652e-06, 'epoch': 5.75}
72%|███████▏  | 444/616 [6:59:51<2:39:13, 55.54s/it] {'loss': 1.5908, 'learning_rate': 3.824052337870263e-06, 'epoch': 5.77}
72%|███████▏  | 445/616 [7:00:47<2:38:13, 55.52s/it] {'loss': 1.6162, 'learning_rate': 3.7827502421150497e-06, 'epoch': 5.78}
72%|███████▏  | 446/616 [7:01:43<2:38:07, 55.81s/it] {'loss': 1.6021, 'learning_rate': 3.741620312758469e-06, 'epoch': 5.79}
73%|███████▎  | 447/616 [7:02:39<2:37:12, 55.81s/it] {'loss': 1.6479, 'learning_rate': 3.7006636887594095e-06, 'epoch': 5.81}
73%|███████▎  | 448/616 [7:03:34<2:35:15, 55.45s/it] {'loss': 1.6294, 'learning_rate': 3.6598815042776135e-06, 'epoch': 5.82}
73%|███████▎  | 449/616 [7:04:29<2:34:24, 55.48s/it] {'loss': 1.6914, 'learning_rate': 3.619274888642309e-06, 'epoch': 5.83}
73%|███████▎  | 450/616 [7:05:25<2:33:44, 55.57s/it] {'loss': 1.6226, 'learning_rate': 3.578844966320917e-06, 'epoch': 5.84}
73%|███████▎  | 451/616 [7:06:20<2:32:34, 55.48s/it] {'loss': 1.6196, 'learning_rate': 3.5385928568879012e-06, 'epoch': 5.86}
73%|███████▎  | 452/616 [7:07:16<2:31:53, 55.57s/it] {'loss': 1.5977, 'learning_rate': 3.4985196749937976e-06, 'epoch': 5.87}
74%|███████▎  | 453/616 [7:08:12<2:31:23, 55.73s/it] {'loss': 1.5786, 'learning_rate': 3.458626530334316e-06, 'epoch': 5.88}
74%|███████▎  | 454/616 [7:09:08<2:30:20, 55.68s/it] {'loss': 1.6113, 'learning_rate': 3.4189145276196244e-06, 'epoch': 5.9}
74%|███████▍  | 455/616 [7:10:03<2:28:44, 55.43s/it] {'loss': 1.6025, 'learning_rate': 3.3793847665437674e-06, 'epoch': 5.91}
74%|███████▍  | 456/616 [7:10:58<2:27:26, 55.29s/it] {'loss': 1.6191, 'learning_rate': 3.340038341754189e-06, 'epoch': 5.92}
74%|███████▍  | 457/616 [7:11:53<2:26:08, 55.15s/it] {'loss': 1.604, 'learning_rate': 3.300876342821451e-06, 'epoch': 5.94}
74%|███████▍  | 458/616 [7:12:48<2:25:24, 55.22s/it] {'loss': 1.6274, 'learning_rate': 3.2618998542090263e-06, 'epoch': 5.95}
75%|███████▍  | 459/616 [7:13:44<2:24:50, 55.36s/it] {'loss': 1.6543, 'learning_rate': 3.2231099552433e-06, 'epoch': 5.96}
75%|███████▍  | 460/616 [7:14:40<2:24:28, 55.57s/it] {'loss': 1.6265, 'learning_rate': 3.1845077200836638e-06, 'epoch': 5.97}
75%|███████▍  | 461/616 [7:15:35<2:23:23, 55.51s/it] {'loss': 1.6123, 'learning_rate': 3.1460942176927666e-06, 'epoch': 5.99}
75%|███████▌  | 462/616 [7:16:31<2:22:50, 55.66s/it] {'loss': 1.6401, 'learning_rate': 3.107870511806934e-06, 'epoch': 6.0}
75%|███████▌  | 463/616 [7:17:59<2:46:44, 65.39s/it] {'loss': 1.6094, 'learning_rate': 3.0698376609066828e-06, 'epoch': 6.01}
75%|███████▌  | 464/616 [7:18:54<2:37:50, 62.31s/it] {'loss': 1.5859, 'learning_rate': 3.0319967181874366e-06, 'epoch': 6.03}
75%|███████▌  | 465/616 [7:19:49<2:31:01, 60.01s/it] {'loss': 1.6182, 'learning_rate': 2.9943487315303486e-06, 'epoch': 6.04}
76%|███████▌  | 466/616 [7:20:44<2:26:06, 58.44s/it] {'loss': 1.6196, 'learning_rate': 2.9568947434732777e-06, 'epoch': 6.05}
76%|███████▌  | 467/616 [7:21:39<2:22:56, 57.56s/it] {'loss': 1.6367, 'learning_rate': 2.919635791181934e-06, 'epoch': 6.06}
76%|███████▌  | 468/616 [7:22:34<2:19:59, 56.76s/it] {'loss': 1.7124, 'learning_rate': 2.882572906421145e-06, 'epoch': 6.08}
76%|███████▌  | 469/616 [7:23:29<2:17:48, 56.25s/it] {'loss': 1.623, 'learning_rate': 2.8457071155262885e-06, 'epoch': 6.09}
76%|███████▋  | 470/616 [7:24:25<2:16:44, 56.19s/it] {'loss': 1.5874, 'learning_rate': 2.809039439374878e-06, 'epoch': 6.1}
76%|███████▋  | 471/616 [7:25:22<2:16:04, 56.31s/it] {'loss': 1.6362, 'learning_rate': 2.7725708933582785e-06, 'epoch': 6.12}
77%|███████▋  | 472/616 [7:26:17<2:14:30, 56.05s/it] {'loss': 1.6221, 'learning_rate': 2.7363024873536093e-06, 'epoch': 6.13}
77%|███████▋  | 473/616 [7:27:13<2:13:28, 56.01s/it] {'loss': 1.6416, 'learning_rate': 2.700235225695752e-06, 'epoch': 6.14}
77%|███████▋  | 474/616 [7:28:08<2:11:56, 55.75s/it] {'loss': 1.668, 'learning_rate': 2.6643701071495644e-06, 'epoch': 6.16}
77%|███████▋  | 475/616 [7:29:04<2:10:54, 55.70s/it] {'loss': 1.5928, 'learning_rate': 2.628708124882212e-06, 'epoch': 6.17}
77%|███████▋  | 476/616 [7:30:00<2:09:59, 55.71s/it] {'loss': 1.6172, 'learning_rate': 2.5932502664356553e-06, 'epoch': 6.18}
77%|███████▋  | 477/616 [7:30:56<2:09:20, 55.83s/it] {'loss': 1.6162, 'learning_rate': 2.5579975136993253e-06, 'epoch': 6.19}
78%|███████▊  | 478/616 [7:31:51<2:07:52, 55.60s/it] {'loss': 1.6636, 'learning_rate': 2.52295084288291e-06, 'epoch': 6.21}
78%|███████▊  | 479/616 [7:32:47<2:07:11, 55.70s/it] {'loss': 1.6748, 'learning_rate': 2.4881112244893403e-06, 'epoch': 6.22}
78%|███████▊  | 480/616 [7:33:42<2:06:17, 55.71s/it] {'loss': 1.6167, 'learning_rate': 2.453479623287909e-06, 'epoch': 6.23}
78%|███████▊  | 481/616 [7:34:39<2:05:45, 55.90s/it] {'loss': 1.6763, 'learning_rate': 2.419056998287547e-06, 'epoch': 6.25}
78%|███████▊  | 482/616 [7:35:36<2:05:33, 56.22s/it] {'loss': 1.6587, 'learning_rate': 2.3848443027102706e-06, 'epoch': 6.26}
78%|███████▊  | 483/616 [7:36:31<2:03:57, 55.92s/it] {'loss': 1.6538, 'learning_rate': 2.3508424839647994e-06, 'epoch': 6.27}
79%|███████▊  | 484/616 [7:37:27<2:03:00, 55.91s/it] {'loss': 1.5952, 'learning_rate': 2.3170524836202936e-06, 'epoch': 6.29}
79%|███████▊  | 485/616 [7:38:24<2:02:39, 56.18s/it] {'loss': 1.6348, 'learning_rate': 2.2834752373803094e-06, 'epoch': 6.3}
79%|███████▉  | 486/616 [7:39:20<2:01:47, 56.21s/it] {'loss': 1.6074, 'learning_rate': 2.250111675056863e-06, 'epoch': 6.31}
79%|███████▉  | 487/616 [7:40:17<2:01:07, 56.34s/it] {'loss': 1.6284, 'learning_rate': 2.216962720544703e-06, 'epoch': 6.32}
79%|███████▉  | 488/616 [7:41:12<1:59:49, 56.16s/it] {'loss': 1.6143, 'learning_rate': 2.184029291795705e-06, 'epoch': 6.34}
79%|███████▉  | 489/616 [7:42:08<1:58:41, 56.07s/it] {'loss': 1.6323, 'learning_rate': 2.151312300793473e-06, 'epoch': 6.35}
80%|███████▉  | 490/616 [7:43:04<1:57:44, 56.07s/it] {'loss': 1.6387, 'learning_rate': 2.118812653528077e-06, 'epoch': 6.36}
80%|███████▉  | 491/616 [7:43:59<1:56:07, 55.74s/it] {'loss': 1.6016, 'learning_rate': 2.086531249970952e-06, 'epoch': 6.38}
80%|███████▉  | 492/616 [7:44:56<1:55:33, 55.91s/it] {'loss': 1.6616, 'learning_rate': 2.0544689840499988e-06, 'epoch': 6.39}
80%|████████  | 493/616 [7:45:51<1:54:35, 55.90s/it] {'loss': 1.6211, 'learning_rate': 2.022626743624807e-06, 'epoch': 6.4}
80%|████████  | 494/616 [7:46:47<1:53:29, 55.82s/it] {'loss': 1.6504, 'learning_rate': 1.991005410462089e-06, 'epoch': 6.42}
80%|████████  | 495/616 [7:47:44<1:52:57, 56.01s/it] {'loss': 1.6748, 'learning_rate': 1.9596058602112533e-06, 'epoch': 6.43}
81%|████████  | 496/616 [7:48:41<1:53:07, 56.56s/it] {'loss': 1.6597, 'learning_rate': 1.928428962380148e-06, 'epoch': 6.44}
81%|████████  | 497/616 [7:49:37<1:51:38, 56.29s/it] {'loss': 1.6133, 'learning_rate': 1.8974755803109968e-06, 'epoch': 6.45}
81%|████████  | 498/616 [7:50:33<1:50:23, 56.14s/it] {'loss': 1.6294, 'learning_rate': 1.866746571156479e-06, 'epoch': 6.47}
81%|████████  | 499/616 [7:51:29<1:49:28, 56.14s/it] {'loss': 1.6074, 'learning_rate': 1.8362427858560094e-06, 'epoch': 6.48}
81%|████████ | 500/616 [7:52:25<1:48:36, 56.18s/it] {'loss': 1.645, 'learning_rate': 1.8059650691121611e-06, 'epoch': 6.49}
81%|████████▏ | 501/616 [7:54:18<2:20:08, 73.11s/it] {'loss': 1.5884, 'learning_rate': 1.7759142593672707e-06, 'epoch': 6.51}
81%|████████▏ | 502/616 [7:55:13<2:08:26, 67.60s/it] {'loss': 1.6245, 'learning_rate': 1.74609118878024e-06, 'epoch': 6.52}
82%|████████▏ | 503/616 [7:56:08<2:00:33, 64.02s/it] {'loss': 1.6309, 'learning_rate': 1.7164966832034668e-06, 'epoch': 6.53}
82%|████████▏ | 504/616 [7:57:05<1:55:25, 61.83s/it] {'loss': 1.6035, 'learning_rate': 1.6871315621599982e-06, 'epoch': 6.55}
82%|████████▏ | 505/616 [7:58:01<1:51:20, 60.18s/it] {'loss': 1.5688, 'learning_rate': 1.6579966388208257e-06, 'epoch': 6.56}
82%|████████▏ | 506/616 [7:58:57<1:47:46, 58.79s/it] {'loss': 1.5762, 'learning_rate': 1.6290927199823604e-06, 'epoch': 6.57}
82%|████████▏ | 507/616 [7:59:52<1:44:51, 57.72s/it] {'loss': 1.6323, 'learning_rate': 1.6004206060441096e-06, 'epoch': 6.58}
82%|████████▏ | 508/616 [8:00:48<1:42:45, 57.09s/it] {'loss': 1.5884, 'learning_rate': 1.5719810909864941e-06, 'epoch': 6.6}
83%|████████▎ | 509/616 [8:01:44<1:41:20, 56.83s/it] {'loss': 1.6382, 'learning_rate': 1.543774962348874e-06, 'epoch': 6.61}
83%|████████▎ | 510/616 [8:02:40<1:39:51, 56.52s/it] {'loss': 1.6279, 'learning_rate': 1.5158030012077329e-06, 'epoch': 6.62}
83%|████████▎ | 511/616 [8:03:37<1:39:08, 56.65s/it] {'loss': 1.6304, 'learning_rate': 1.4880659821550547e-06, 'epoch': 6.64}
83%|████████▎ | 512/616 [8:04:32<1:37:24, 56.20s/it] {'loss': 1.6289, 'learning_rate': 1.4605646732768685e-06, 'epoch': 6.65}
83%|████████▎ | 513/616 [8:05:27<1:36:08, 56.00s/it] {'loss': 1.5889, 'learning_rate': 1.4332998361319783e-06, 'epoch': 6.66}
83%|████████▎ | 514/616 [8:06:24<1:35:25, 56.13s/it] {'loss': 1.6221, 'learning_rate': 1.4062722257308803e-06, 'epoch': 6.68}
84%|████████▎ | 515/616 [8:07:20<1:34:25, 56.09s/it] {'loss': 1.604, 'learning_rate': 1.3794825905148557e-06, 'epoch': 6.69}
84%|████████▍ | 516/616 [8:08:15<1:33:07, 55.88s/it] {'loss': 1.6099, 'learning_rate': 1.3529316723352303e-06, 'epoch': 6.7}
84%|████████▍ | 517/616 [8:09:11<1:32:09, 55.85s/it] {'loss': 1.6045, 'learning_rate': 1.3266202064328548e-06, 'epoch': 6.71}
84%|████████▍ | 518/616 [8:10:06<1:31:00, 55.72s/it] {'loss': 1.6289, 'learning_rate': 1.3005489214177213e-06, 'epoch': 6.73}
84%|████████▍ | 519/616 [8:11:03<1:30:18, 55.86s/it] {'loss': 1.6519, 'learning_rate': 1.2747185392488048e-06, 'epoch': 6.74}
84%|████████▍ | 520/616 [8:11:57<1:28:46, 55.48s/it] {'loss': 1.6338, 'learning_rate': 1.249129775214064e-06, 'epoch': 6.75}
85%|████████▍ | 521/616 [8:12:53<1:28:06, 55.64s/it] {'loss': 1.6196, 'learning_rate': 1.2237833379106257e-06, 'epoch': 6.77}
85%|████████▍ | 522/616 [8:13:48<1:26:42, 55.35s/it] {'loss': 1.6104, 'learning_rate': 1.1986799292251816e-06, 'epoch': 6.78}
85%|████████▍ | 523/616 [8:14:44<1:26:06, 55.56s/it] {'loss': 1.6309, 'learning_rate': 1.1738202443145307e-06, 'epoch': 6.79}
85%|████████▌ | 524/616 [8:15:40<1:25:39, 55.86s/it] {'loss': 1.5845, 'learning_rate': 1.1492049715863464e-06, 'epoch': 6.81}
85%|████████▌ | 525/616 [8:16:35<1:24:11, 55.52s/it] {'loss': 1.582, 'learning_rate': 1.1248347926801029e-06, 'epoch': 6.82}
85%|████████▌ | 526/616 [8:17:31<1:23:34, 55.71s/it] {'loss': 1.6553, 'learning_rate': 1.100710382448198e-06, 'epoch': 6.83}
86%|████████▌ | 527/616 [8:18:27<1:22:36, 55.69s/it] {'loss': 1.5771, 'learning_rate': 1.0768324089372816e-06, 'epoch': 6.84}
86%|████████▌ | 528/616 [8:19:22<1:21:27, 55.54s/it] {'loss': 1.6611, 'learning_rate': 1.053201533369731e-06, 'epoch': 6.86}
86%|████████▌ | 529/616 [8:20:18<1:20:36, 55.59s/it] {'loss': 1.6128, 'learning_rate': 1.029818410125365e-06, 'epoch': 6.87}
86%|████████▌ | 530/616 [8:21:14<1:19:49, 55.69s/it] {'loss': 1.5957, 'learning_rate': 1.0066836867233087e-06, 'epoch': 6.88}
86%|████████▌ | 531/616 [8:22:10<1:19:18, 55.98s/it] {'loss': 1.6299, 'learning_rate': 9.837980038040607e-07, 'epoch': 6.9}
86%|████████▋ | 532/616 [8:23:07<1:18:41, 56.21s/it] {'loss': 1.6147, 'learning_rate': 9.611619951117657e-07, 'epoch': 6.91}
87%|████████▋ | 533/616 [8:24:03<1:17:41, 56.16s/it] {'loss': 1.5864, 'learning_rate': 9.387762874766515e-07, 'epoch': 6.92}
87%|████████▋ | 534/616 [8:24:59<1:16:26, 55.94s/it] {'loss': 1.6245, 'learning_rate': 9.166415007976803e-07, 'epoch': 6.94}
87%|████████▋ | 535/616 [8:25:55<1:15:36, 56.01s/it] {'loss': 1.5781, 'learning_rate': 8.94758248025378e-07, 'epoch': 6.95}
87%|████████▋ | 536/616 [8:26:51<1:14:55, 56.19s/it] {'loss': 1.5845, 'learning_rate': 8.7312713514486e-07, 'epoch': 6.96}
87%|████████▋ | 537/616 [8:27:46<1:13:23, 55.74s/it] {'loss': 1.624, 'learning_rate': 8.517487611590558e-07, 'epoch': 6.97}
87%|████████▋ | 538/616 [8:28:41<1:12:04, 55.45s/it] {'loss': 1.5811, 'learning_rate': 8.306237180721121e-07, 'epoch': 6.99}
88%|████████▊ | 539/616 [8:29:37<1:11:36, 55.80s/it] {'loss': 1.5898, 'learning_rate': 8.097525908730108e-07, 'epoch': 7.0}
88%|████████▊ | 540/616 [8:30:59<1:20:35, 63.62s/it] {'loss': 1.5542, 'learning_rate': 7.891359575193613e-07, 'epoch': 7.01}
88%|████████▊ | 541/616 [8:31:55<1:16:28, 61.19s/it] {'loss': 1.6382, 'learning_rate': 7.687743889213939e-07, 'epoch': 7.03}
88%|████████▊ | 542/616 [8:32:51<1:13:34, 59.65s/it] {'loss': 1.6597, 'learning_rate': 7.486684489261609e-07, 'epoch': 7.04}
88%|████████▊ | 543/616 [8:33:46<1:10:46, 58.18s/it] {'loss': 1.5918, 'learning_rate': 7.288186943019171e-07, 'epoch': 7.05}
88%|████████▊ | 544/616 [8:34:42<1:09:00, 57.51s/it] {'loss': 1.6226, 'learning_rate': 7.092256747226944e-07, 'epoch': 7.06}
88%|████████▊ | 545/616 [8:35:38<1:07:30, 57.04s/it] {'loss': 1.563, 'learning_rate': 6.89889932753095e-07, 'epoch': 7.08}
89%|████████▊ | 546/616 [8:36:33<1:06:06, 56.67s/it] {'loss': 1.6348, 'learning_rate': 6.708120038332533e-07, 'epoch': 7.09}
89%|████████▉ | 547/616 [8:37:29<1:04:41, 56.25s/it] {'loss': 1.6089, 'learning_rate': 6.519924162640168e-07, 'epoch': 7.1}
89%|████████▉ | 548/616 [8:38:25<1:03:42, 56.22s/it] {'loss': 1.6143, 'learning_rate': 6.334316911923155e-07, 'epoch': 7.12}
89%|████████▉ | 549/616 [8:39:21<1:02:51, 56.29s/it] {'loss': 1.6396, 'learning_rate': 6.151303425967259e-07, 'epoch': 7.13}
89%|████████▉ | 550/616 [8:40:18<1:01:58, 56.34s/it] {'loss': 1.6387, 'learning_rate': 5.970888772732453e-07, 'epoch': 7.14}
89%|████████▉ | 551/616 [8:41:13<1:00:45, 56.08s/it] {'loss': 1.5835, 'learning_rate': 5.793077948212478e-07, 'epoch': 7.16}
90%|████████▉ | 552/616 [8:42:09<59:35, 55.87s/it] {'loss': 1.6489, 'learning_rate': 5.617875876296641e-07, 'epoch': 7.17}
90%|████████▉ | 553/616 [8:43:05<58:43, 55.92s/it] {'loss': 1.6318, 'learning_rate': 5.445287408633304e-07, 'epoch': 7.18}
90%|████████▉ | 554/616 [8:44:00<57:42, 55.85s/it] {'loss': 1.645, 'learning_rate': 5.27531732449561e-07, 'epoch': 7.19}
90%|█████████ | 555/616 [8:44:56<56:50, 55.91s/it] {'loss': 1.5996, 'learning_rate': 5.107970330649204e-07, 'epoch': 7.21}
90%|█████████ | 556/616 [8:45:52<55:49, 55.83s/it] {'loss': 1.5962, 'learning_rate': 4.943251061221721e-07, 'epoch': 7.22}
90%|█████████ | 557/616 [8:46:48<54:52, 55.81s/it] {'loss': 1.6211, 'learning_rate': 4.78116407757464e-07, 'epoch': 7.23}
91%|█████████ | 558/616 [8:47:45<54:16, 56.15s/it] {'loss': 1.6011, 'learning_rate': 4.6217138681769026e-07, 'epoch': 7.25}
91%|█████████ | 559/616 [8:48:40<53:13, 56.03s/it] {'loss': 1.6392, 'learning_rate': 4.464904848480522e-07, 'epoch': 7.26}
91%|█████████ | 560/616 [8:49:38<52:36, 56.37s/it] {'loss': 1.6265, 'learning_rate': 4.310741360798498e-07, 'epoch': 7.27}
91%|█████████ | 561/616 [8:50:33<51:24, 56.08s/it] {'loss': 1.6255, 'learning_rate': 4.1592276741844075e-07, 'epoch': 7.29}
91%|█████████ | 562/616 [8:51:29<50:22, 55.97s/it] {'loss': 1.6196, 'learning_rate': 4.0103679843142895e-07, 'epoch': 7.3}
91%|█████████▏| 563/616 [8:52:26<49:45, 56.33s/it] {'loss': 1.6201, 'learning_rate': 3.864166413370429e-07, 'epoch': 7.31}
92%|█████████▏| 564/616 [8:53:22<48:45, 56.25s/it] {'loss': 1.6396, 'learning_rate': 3.720627009927158e-07, 'epoch': 7.32}
92%|█████████▏| 565/616 [8:54:18<47:48, 56.25s/it] {'loss': 1.6553, 'learning_rate': 3.5797537488388326e-07, 'epoch': 7.34}
92%|█████████▏| 566/616 [8:55:15<46:58, 56.37s/it] {'loss': 1.6431, 'learning_rate': 3.441550531129667e-07, 'epoch': 7.35}
92%|█████████▏| 567/616 [8:56:10<45:50, 56.13s/it] {'loss': 1.6362, 'learning_rate': 3.3060211838858104e-07, 'epoch': 7.36}
92%|█████████▏| 568/616 [8:57:05<44:38, 55.80s/it] {'loss': 1.5654, 'learning_rate': 3.1731694601492834e-07, 'epoch': 7.38}
92%|█████████▏| 569/616 [8:58:01<43:33, 55.61s/it] {'loss': 1.6074, 'learning_rate': 3.042999038814076e-07, 'epoch': 7.39}
93%|█████████▎| 570/616 [8:58:56<42:35, 55.55s/it] {'loss': 1.6094, 'learning_rate': 2.915513524524294e-07, 'epoch': 7.4}
93%|█████████▎| 571/616 [8:59:52<41:47, 55.72s/it] {'loss': 1.6758, 'learning_rate': 2.790716447574304e-07, 'epoch': 7.42}
93%|█████████▎| 572/616 [9:00:49<41:08, 56.11s/it] {'loss': 1.6313, 'learning_rate': 2.668611263811016e-07, 'epoch': 7.43}
93%|█████████▎| 573/616 [9:01:45<40:05, 55.95s/it] {'loss': 1.5835, 'learning_rate': 2.5492013545381666e-07, 'epoch': 7.44}
93%|█████████▎| 574/616 [9:02:41<39:14, 56.06s/it] {'loss': 1.5972, 'learning_rate': 2.4324900264226405e-07, 'epoch': 7.45}
93%|█████████▎| 575/616 [9:03:37<38:16, 56.02s/it] {'loss': 1.6689, 'learning_rate': 2.3184805114029872e-07, 'epoch': 7.47}
94%|█████████▎| 576/616 [9:04:32<37:07, 55.68s/it] {'loss': 1.6304, 'learning_rate': 2.2071759665998282e-07, 'epoch': 7.48}
94%|█████████▎| 577/616 [9:05:27<36:06, 55.55s/it] {'loss': 1.6035, 'learning_rate': 2.098579474228546e-07, 'epoch': 7.49}
94%|█████████▍| 578/616 [9:06:23<35:13, 55.63s/it] {'loss': 1.5952, 'learning_rate': 1.9926940415138206e-07, 'epoch': 7.51}
94%|█████████▍| 579/616 [9:07:20<34:30, 55.97s/it] {'loss': 1.584, 'learning_rate': 1.8895226006064084e-07, 'epoch': 7.52}
94%|█████████▍| 580/616 [9:08:16<33:42, 56.18s/it] {'loss': 1.6064, 'learning_rate': 1.7890680085019597e-07, 'epoch': 7.53}
94%|█████████▍| 581/616 [9:09:13<32:50, 56.30s/it] {'loss': 1.6235, 'learning_rate': 1.6913330469618628e-07, 'epoch': 7.55}
94%|█████████▍| 582/616 [9:10:10<32:00, 56.48s/it] {'loss': 1.6294, 'learning_rate': 1.5963204224362261e-07, 'epoch': 7.56}
95%|█████████▍| 583/616 [9:11:06<30:57, 56.29s/it] {'loss': 1.6055, 'learning_rate': 1.504032765988961e-07, 'epoch': 7.57}
95%|█████████▍| 584/616 [9:12:02<29:59, 56.23s/it] {'loss': 1.6353, 'learning_rate': 1.4144726332248726e-07, 'epoch': 7.58}
95%|█████████▍| 585/616 [9:12:58<29:03, 56.25s/it] {'loss': 1.6108, 'learning_rate': 1.327642504218951e-07, 'epoch': 7.6}
95%|█████████▌| 586/616 [9:13:54<28:00, 56.00s/it] {'loss': 1.6201, 'learning_rate': 1.2435447834476254e-07, 'epoch': 7.61}
95%|█████████▌| 587/616 [9:14:50<27:10, 56.22s/it] {'loss': 1.6128, 'learning_rate': 1.1621817997222507e-07, 'epoch': 7.62}
95%|█████████▌| 588/616 [9:15:46<26:06, 55.95s/it] {'loss': 1.6196, 'learning_rate': 1.0835558061245587e-07, 'epoch': 7.64}
96%|█████████▌| 589/616 [9:16:42<25:16, 56.17s/it] {'loss': 1.6621, 'learning_rate': 1.0076689799442874e-07, 'epoch': 7.65}
96%|█████████▌| 590/616 [9:17:38<24:14, 55.95s/it] {'loss': 1.6216, 'learning_rate': 9.34523422618916e-08, 'epoch': 7.66}
96%|█████████▌| 591/616 [9:18:35<23:27, 56.31s/it] {'loss': 1.6289, 'learning_rate': 8.641211596754129e-08, 'epoch': 7.68}
96%|█████████▌| 592/616 [9:19:31<22:28, 56.19s/it] {'loss': 1.6279, 'learning_rate': 7.964641406742135e-08, 'epoch': 7.69}
96%|█████████▋| 593/616 [9:20:28<21:37, 56.42s/it] {'loss': 1.6187, 'learning_rate': 7.315542391551966e-08, 'epoch': 7.7}
96%|█████████▋| 594/616 [9:21:24<20:42, 56.46s/it] {'loss': 1.6445, 'learning_rate': 6.693932525857927e-08, 'epoch': 7.71}
97%|█████████▋| 595/616 [9:22:21<19:45, 56.45s/it] {'loss': 1.6226, 'learning_rate': 6.099829023112236e-08, 'epoch': 7.73}
97%|█████████▋| 596/616 [9:23:17<18:47, 56.35s/it] {'loss': 1.6025, 'learning_rate': 5.533248335068409e-08, 'epoch': 7.74}
97%|█████████▋| 597/616 [9:24:13<17:49, 56.29s/it] {'loss': 1.5981, 'learning_rate': 4.994206151325509e-08, 'epoch': 7.75}
97%|█████████▋| 598/616 [9:25:10<16:58, 56.58s/it] {'loss': 1.6479, 'learning_rate': 4.482717398894165e-08, 'epoch': 7.77}
97%|█████████▋| 599/616 [9:26:07<16:03, 56.66s/it] {'loss': 1.6494, 'learning_rate': 3.998796241782232e-08, 'epoch': 7.78}
97%|█████████▋| 600/616 [9:27:02<14:58, 56.13s/it] {'loss': 1.6328, 'learning_rate': 3.5424560806036625e-08, 'epoch': 7.79}
98%|█████████▊| 601/616 [9:28:54<18:13, 72.93s/it] {'loss': 1.5732, 'learning_rate': 3.1137095522068006e-08, 'epoch': 7.81}
98%|█████████▊| 602/616 [9:29:49<15:47, 67.67s/it] {'loss': 1.6196, 'learning_rate': 2.7125685293245552e-08, 'epoch': 7.82}
98%|█████████▊| 603/616 [9:30:46<13:56, 64.33s/it] {'loss': 1.5894, 'learning_rate': 2.3390441202455484e-08, 'epoch': 7.83}
98%|█████████▊| 604/616 [9:31:42<12:22, 61.86s/it] {'loss': 1.6172, 'learning_rate': 1.993146668506585e-08, 'epoch': 7.84}
98%|█████████▊| 605/616 [9:32:40<11:07, 60.66s/it] {'loss': 1.604, 'learning_rate': 1.6748857526066588e-08, 'epoch': 7.86}
98%|█████████▊| 606/616 [9:33:36<09:53, 59.31s/it] {'loss': 1.6172, 'learning_rate': 1.3842701857406104e-08, 'epoch': 7.87}
99%|█████████▊| 607/616 [9:34:32<08:43, 58.22s/it] {'loss': 1.6377, 'learning_rate': 1.1213080155564327e-08, 'epoch': 7.88}
99%|█████████▊| 608/616 [9:35:27<07:39, 57.40s/it] {'loss': 1.6064, 'learning_rate': 8.860065239311155e-09, 'epoch': 7.9}
99%|█████████▉| 609/616 [9:36:23<06:38, 56.99s/it] {'loss': 1.6211, 'learning_rate': 6.783722267701409e-09, 'epoch': 7.91}
99%|█████████▉| 610/616 [9:37:19<05:39, 56.62s/it] {'loss': 1.6274, 'learning_rate': 4.984108738261828e-09, 'epoch': 7.92}
99%|█████████▉| 611/616 [9:38:16<04:43, 56.63s/it] {'loss': 1.6328, 'learning_rate': 3.4612744854045645e-09, 'epoch': 7.94}
99%|█████████▉| 612/616 [9:39:11<03:45, 56.36s/it] {'loss': 1.583, 'learning_rate': 2.215261679042735e-09, 'epoch': 7.95}
100%|█████████▉| 613/616 [9:40:08<02:49, 56.35s/it] {'loss': 1.6318, 'learning_rate': 1.246104823426908e-09, 'epoch': 7.96}
100%|█████████▉| 614/616 [9:41:03<01:52, 56.01s/it] {'loss': 1.6265, 'learning_rate': 5.538307561858691e-10, 'epoch': 7.97}
100%|█████████▉| 615/616 [9:41:58<00:55, 55.78s/it] {'loss': 1.605, 'learning_rate': 1.3845864758610384e-10, 'epoch': 7.99}
100%|██████████| 616/616 [9:42:54<00:00, 55.92s/it] {'loss': 1.6025, 'learning_rate': 0.0, 'epoch': 8.0}
100%|██████████| 616/616 [9:42:54<00:00, 55.92s/it] {'train_runtime': 34978.7912, 'train_samples_per_second': 2.252, 'train_steps_per_second': 0.018, 'train_loss': 2.115578391335227, 'epoch': 8.0}
100%|██████████| 616/616 [9:42:54<00:00, 56.78s/it]
Non lora weights: dict_keys(['base_model.model.model.mm_projector.weight', 'base_model.model.model.mm_projector.bias', 'base_model.model.model.frames_conv.weight', 'base_model.model.model.frames_conv.bias'])
Non lora weights: dict_keys(['base_model.model.model.mm_projector.weight', 'base_model.model.model.mm_projector.bias', 'base_model.model.model.frames_conv.weight', 'base_model.model.model.frames_conv.bias'])
wandb: Waiting for W&B process to finish... (success).
[2023-10-13 12:46:18,400] [INFO] [launch.py:347:main] Process 1707 exits successfully.
wandb:
wandb: Run history:
wandb: train/epoch ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb: train/global_step ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb: train/learning_rate ▄▇██████▇▇▇▇▇▆▆▆▆▅▅▅▅▄▄▄▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁
wandb: train/loss █▅▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb: train/total_flos ▁
wandb: train/train_loss ▁
wandb: train/train_runtime ▁
wandb: train/train_samples_per_second ▁
wandb: train/train_steps_per_second ▁
wandb:
wandb: Run summary:
wandb: train/epoch 8.0
wandb: train/global_step 616
wandb: train/learning_rate 0.0
wandb: train/loss 1.6025
wandb: train/total_flos 1.5114021399418634e+18
wandb: train/train_loss 2.11558
wandb: train/train_runtime 34978.7912
wandb: train/train_samples_per_second 2.252
wandb: train/train_steps_per_second 0.018
wandb:
wandb: 🚀 View run fiery-dew-9 at: https://wandb.ai/wanghao-cst/huggingface/runs/30lhy90r
wandb: ⚡ View job at https://wandb.ai/wanghao-cst/huggingface/jobs/QXJ0aWZhY3RDb2xsZWN0aW9uOjEwNTk0Mjk1MA==/version_details/v2
wandb: Synced 5 W&B file(s), 0 media file(s), 2 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20231013_030309-30lhy90r/logs
[2023-10-13 12:46:56,444] [INFO] [launch.py:347:main] Process 1706 exits successfully.