[2023-10-13 02:59:14,478] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-10-13 02:59:16,541] [WARNING] [runner.py:196:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-10-13 02:59:16,541] [INFO] [runner.py:555:main] cmd = /usr/local/miniconda3/envs/llava/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None llava/train/train_mem_video.py --deepspeed ./scripts/zero2.json --lora_enable True --model_name_or_path /hy-tmp/vicuna-7b-v1.3 --version v1 --data_path ./data/avsd_train_omni.json --video_folder /hy-tmp/Charades_v1_480 --vision_tower /hy-tmp/clip-vit-large-patch14 --pretrain_mm_mlp_adapter /hy-tmp/llava-pretrain-vicuna-7b-v1.3/mm_projector.bin --mm_vision_select_layer -2 --mm_use_im_start_end False --mm_use_im_patch_token False --bf16 True --output_dir /hy-tmp/checkpoints/omni-vicuna-7b-v1.3-finetune_lora --num_train_epochs 8 --per_device_train_batch_size 8 --per_device_eval_batch_size 4 --gradient_accumulation_steps 8 --evaluation_strategy no --save_strategy steps --save_steps 100 --save_total_limit 3 --learning_rate 2e-5 --weight_decay 0. --warmup_ratio 0.03 --lr_scheduler_type cosine --logging_steps 1 --tf32 True --model_max_length 2048 --gradient_checkpointing True --lazy_preprocess True --dataloader_num_workers 8 --report_to wandb
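
The `--world_info` value in the launcher command above is base64-encoded JSON; decoding it gives back the same host-to-GPU mapping that `launch.py` prints a few lines later as `WORLD INFO DICT`. A minimal standard-library sketch (the helper names are illustrative, not DeepSpeed's own functions):

```python
import base64
import json

# Copied verbatim from the --world_info flag in the launcher command.
WORLD_INFO = "eyJsb2NhbGhvc3QiOiBbMCwgMV19"


def decode_world_info(encoded: str) -> dict:
    """Decode a base64 (URL-safe alphabet) JSON blob into a host -> GPU-index mapping."""
    return json.loads(base64.urlsafe_b64decode(encoded))


def world_size(info: dict) -> int:
    """Total number of processes: one per listed GPU index across all hosts."""
    return sum(len(gpus) for gpus in info.values())


if __name__ == "__main__":
    info = decode_world_info(WORLD_INFO)
    print(info)              # {'localhost': [0, 1]}
    print(world_size(info))  # 2
```
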
[2023-10-13 02:59:17,802] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-10-13 02:59:19,574] [INFO] [launch.py:138:main] 0 NCCL_P2P_LEVEL=NVL
[2023-10-13 02:59:19,574] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2023-10-13 02:59:19,574] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=2, node_rank=0
[2023-10-13 02:59:19,574] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2023-10-13 02:59:19,574] [INFO] [launch.py:163:main] dist_world_size=2
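
The launcher starts two processes (`dist_world_size=2`), so with `--per_device_train_batch_size 8` and `--gradient_accumulation_steps 8` each optimizer step consumes 2 × 8 × 8 = 128 samples. The progress bar reports 616 optimizer steps for `--num_train_epochs 8`, i.e. 77 steps per epoch, and the `'epoch'` field of each metrics line is approximately the step count divided by 77, rounded to two decimals. A sketch of that arithmetic, assuming each epoch holds a whole number of accumulation windows (the Trainer's exact dataloader-length rounding is not reproduced):

```python
def effective_batch_size(world_size: int, per_device: int, grad_accum: int) -> int:
    """Samples consumed by one optimizer step across all ranks."""
    return world_size * per_device * grad_accum


def steps_per_epoch(total_steps: int, epochs: int) -> int:
    """Optimizer steps per epoch, derived from the progress-bar total."""
    return total_steps // epochs


def logged_epoch(step: int, per_epoch: int) -> float:
    """Fractional epoch after `step` optimizer steps, rounded like the metrics dicts."""
    return round(step / per_epoch, 2)


if __name__ == "__main__":
    per_epoch = steps_per_epoch(616, 8)
    print(effective_batch_size(2, 8, 8))  # 128
    print(per_epoch)                      # 77
    print(logged_epoch(100, per_epoch))   # 1.3
```
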
[2023-10-13 02:59:19,574] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2023-10-13 02:59:22,389] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-10-13 02:59:22,433] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-10-13 02:59:22,977] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-10-13 02:59:22,977] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-10-13 02:59:22,977] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-10-13 02:59:23,051] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-10-13 02:59:23,051] [INFO] [comm.py:594:init_distributed] cdb=None
You are using a model of type llama to instantiate a model of type omni. This is not supported for all configurations of models and can yield errors.
You are using a model of type llama to instantiate a model of type omni. This is not supported for all configurations of models and can yield errors.
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 50%|█████ | 1/2 [00:17<00:17, 17.74s/it]
Loading checkpoint shards: 50%|█████ | 1/2 [00:24<00:24, 24.14s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:24<00:00, 11.07s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:24<00:00, 12.07s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:35<00:00, 16.58s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:35<00:00, 17.72s/it]
Adding LoRA adapters...
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
Formatting inputs...Skip in lazy mode
Rank: 0 partition count [2, 2] and sizes[(82444288, False), (2176, False)]
Rank: 1 partition count [2, 2] and sizes[(82444288, False), (2176, False)]
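
Given `./scripts/zero2.json`, this is presumably ZeRO stage 2: each rank keeps only its own shard of the fp32 master copy of the trainable parameters, so the `sizes` above are per-rank element counts for the two optimizer parameter groups (the boolean appears to be DeepSpeed's MoE-group flag). The sketch below multiplies the counts back out and estimates the per-rank optimizer-state footprint. It assumes an Adam-family optimizer (fp32 master weights plus two fp32 moment buffers, 12 bytes per element) and ignores any alignment padding included in the counts:

```python
# Per-rank element counts copied from the "Rank: N partition count [2, 2] and sizes[...]" lines.
PARTITION_COUNT = 2
PER_RANK_SIZES = [82_444_288, 2_176]

# Assumption: fp32 master weights + Adam exp_avg + exp_avg_sq = 3 x 4 bytes per element.
BYTES_PER_ELEMENT = 12


def unpartitioned_sizes(per_rank: list[int], partitions: int) -> list[int]:
    """Element count of each parameter group before sharding (may include padding)."""
    return [n * partitions for n in per_rank]


def optimizer_bytes_per_rank(per_rank: list[int], bytes_per_element: int = BYTES_PER_ELEMENT) -> int:
    """Approximate optimizer-state memory held by a single rank."""
    return sum(per_rank) * bytes_per_element


if __name__ == "__main__":
    print(unpartitioned_sizes(PER_RANK_SIZES, PARTITION_COUNT))  # [164888576, 4352]
    gib = optimizer_bytes_per_rank(PER_RANK_SIZES) / 2**30
    print(f"{gib:.2f} GiB per rank")                              # 0.92 GiB per rank
```

So roughly 165 million trainable elements are sharded across the two GPUs.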
wandb: Currently logged in as: wanghao-cst. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.12
wandb: Run data is saved locally in /root/Omni-LLM/wandb/run-20231013_030309-30lhy90r
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run fiery-dew-9
wandb: ⭐️ View project at https://wandb.ai/wanghao-cst/huggingface
wandb: 🚀 View run at https://wandb.ai/wanghao-cst/huggingface/runs/30lhy90r
0%| | 0/616 [00:00<?, ?it/s]
0%| | 1/616 [01:39<17:00:22, 99.55s/it]
{'loss': 12.2148, 'learning_rate': 1.0526315789473685e-06, 'epoch': 0.01}
0%| | 2/616 [02:35<12:33:55, 73.67s/it]
{'loss': 12.0312, 'learning_rate': 2.105263157894737e-06, 'epoch': 0.03}
0%| | 3/616 [03:30<11:05:55, 65.18s/it]
{'loss': 12.3086, 'learning_rate': 3.157894736842105e-06, 'epoch': 0.04}
1%| | 4/616 [04:24<10:22:36, 61.04s/it]
{'loss': 12.1172, 'learning_rate': 4.210526315789474e-06, 'epoch': 0.05}
1%| | 5/616 [05:20<10:01:41, 59.09s/it]
{'loss': 12.0117, 'learning_rate': 5.263157894736842e-06, 'epoch': 0.06}
1%| | 6/616 [06:15<9:47:18, 57.77s/it]
{'loss': 12.2656, 'learning_rate': 6.31578947368421e-06, 'epoch': 0.08}
1%| | 7/616 [07:11<9:39:26, 57.09s/it]
{'loss': 12.125, 'learning_rate': 7.368421052631579e-06, 'epoch': 0.09}
1%|▏ | 8/616 [08:05<9:30:10, 56.27s/it]
{'loss': 11.2266, 'learning_rate': 8.421052631578948e-06, 'epoch': 0.1}
1%|▏ | 9/616 [09:01<9:28:26, 56.19s/it]
{'loss': 11.1523, 'learning_rate': 9.473684210526315e-06, 'epoch': 0.12}
2%|▏ | 10/616 [09:56<9:23:58, 55.84s/it]
{'loss': 9.5234, 'learning_rate': 1.0526315789473684e-05, 'epoch': 0.13}
2%|▏ | 11/616 [10:52<9:23:12, 55.86s/it]
{'loss': 9.4688, 'learning_rate': 1.1578947368421053e-05, 'epoch': 0.14}
2%|▏ | 12/616 [11:49<9:24:18, 56.06s/it]
{'loss': 9.25, 'learning_rate': 1.263157894736842e-05, 'epoch': 0.16}
2%|▏ | 13/616 [12:44<9:20:17, 55.75s/it]
{'loss': 7.7285, 'learning_rate': 1.3684210526315791e-05, 'epoch': 0.17}
2%|▏ | 14/616 [13:39<9:16:20, 55.45s/it]
{'loss': 7.6367, 'learning_rate': 1.4736842105263159e-05, 'epoch': 0.18}
2%|▏ | 15/616 [14:34<9:16:23, 55.55s/it]
{'loss': 7.4844, 'learning_rate': 1.578947368421053e-05, 'epoch': 0.19}
3%|▎ | 16/616 [15:30<9:16:29, 55.65s/it]
{'loss': 7.2422, 'learning_rate': 1.6842105263157896e-05, 'epoch': 0.21}
3%|▎ | 17/616 [16:27<9:18:45, 55.97s/it]
{'loss': 7.0938, 'learning_rate': 1.7894736842105264e-05, 'epoch': 0.22}
3%|▎ | 18/616 [17:22<9:14:48, 55.67s/it]
{'loss': 6.7266, 'learning_rate': 1.894736842105263e-05, 'epoch': 0.23}
3%|▎ | 19/616 [18:17<9:10:47, 55.36s/it]
{'loss': 6.5234, 'learning_rate': 2e-05, 'epoch': 0.25}
3%|▎ | 20/616 [19:13<9:11:42, 55.54s/it]
{'loss': 6.3477, 'learning_rate': 1.9999861541352416e-05, 'epoch': 0.26}
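
These learning rates follow a linear warmup and then a half-cosine decay, as selected by `--lr_scheduler_type cosine --warmup_ratio 0.03`: with 616 total steps, warmup lasts ceil(0.03 × 616) = 19 steps, so the peak 2e-05 is reached exactly at step 19 and decay begins at step 20. The function below is an independent re-derivation modelled on `transformers.get_cosine_schedule_with_warmup` (it does not import transformers) and reproduces the logged values to within floating-point noise:

```python
import math

PEAK_LR = 2e-5       # --learning_rate
TOTAL_STEPS = 616    # optimizer steps reported by the progress bar
WARMUP_STEPS = math.ceil(0.03 * TOTAL_STEPS)  # --warmup_ratio 0.03 -> 19


def lr_at(step: int, peak: float = PEAK_LR, warmup: int = WARMUP_STEPS,
          total: int = TOTAL_STEPS) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then half-cosine decay to 0."""
    if step < warmup:
        factor = step / max(1, warmup)
    else:
        progress = (step - warmup) / max(1, total - warmup)
        factor = max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
    return peak * factor


if __name__ == "__main__":
    for step in (1, 19, 20, 100):
        print(step, lr_at(step))
```

For example, `lr_at(1)` is 2e-5 / 19 ≈ 1.0526e-06, matching the first metrics line, and `lr_at(616)` reaches 0.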
3%|▎ | 21/616 [20:08<9:10:46, 55.54s/it]
{'loss': 6.127, 'learning_rate': 1.9999446169243816e-05, 'epoch': 0.27}
4%|▎ | 22/616 [21:03<9:07:59, 55.35s/it]
{'loss': 5.8555, 'learning_rate': 1.9998753895176576e-05, 'epoch': 0.29}
4%|▎ | 23/616 [22:00<9:11:42, 55.82s/it]
{'loss': 5.7402, 'learning_rate': 1.999778473832096e-05, 'epoch': 0.3}
4%|▍ | 24/616 [22:54<9:06:29, 55.39s/it]
{'loss': 5.5605, 'learning_rate': 1.9996538725514597e-05, 'epoch': 0.31}
4%|▍ | 25/616 [23:50<9:05:28, 55.38s/it]
{'loss': 5.4199, 'learning_rate': 1.999501589126174e-05, 'epoch': 0.32}
4%|▍ | 26/616 [24:46<9:07:10, 55.65s/it]
{'loss': 5.3242, 'learning_rate': 1.9993216277732302e-05, 'epoch': 0.34}
4%|▍ | 27/616 [25:42<9:05:58, 55.62s/it]
{'loss': 5.2148, 'learning_rate': 1.999113993476069e-05, 'epoch': 0.35}
5%|▍ | 28/616 [26:37<9:04:07, 55.52s/it]
{'loss': 5.1016, 'learning_rate': 1.9988786919844437e-05, 'epoch': 0.36}
5%|▍ | 29/616 [27:34<9:06:56, 55.91s/it]
{'loss': 5.0488, 'learning_rate': 1.9986157298142595e-05, 'epoch': 0.38}
5%|▍ | 30/616 [28:28<9:02:34, 55.55s/it]
{'loss': 4.9258, 'learning_rate': 1.9983251142473935e-05, 'epoch': 0.39}
5%|▌ | 31/616 [29:26<9:06:21, 56.04s/it]
{'loss': 4.9531, 'learning_rate': 1.9980068533314937e-05, 'epoch': 0.4}
5%|▌ | 32/616 [30:21<9:05:01, 56.00s/it]
{'loss': 4.8535, 'learning_rate': 1.9976609558797545e-05, 'epoch': 0.42}
5%|▌ | 33/616 [31:17<9:02:02, 55.79s/it]
{'loss': 4.8203, 'learning_rate': 1.9972874314706755e-05, 'epoch': 0.43}
6%|▌ | 34/616 [32:12<8:58:28, 55.51s/it]
{'loss': 4.8535, 'learning_rate': 1.9968862904477936e-05, 'epoch': 0.44}
6%|▌ | 35/616 [33:07<8:56:38, 55.42s/it]
{'loss': 4.7168, 'learning_rate': 1.9964575439193966e-05, 'epoch': 0.45}
6%|▌ | 36/616 [34:02<8:53:51, 55.23s/it]
{'loss': 4.6875, 'learning_rate': 1.996001203758218e-05, 'epoch': 0.47}
6%|▌ | 37/616 [34:56<8:50:03, 54.93s/it]
{'loss': 4.6172, 'learning_rate': 1.995517282601106e-05, 'epoch': 0.48}
6%|▌ | 38/616 [35:52<8:51:28, 55.17s/it]
{'loss': 4.6523, 'learning_rate': 1.9950057938486745e-05, 'epoch': 0.49}
6%|▋ | 39/616 [36:48<8:54:36, 55.59s/it]
{'loss': 4.5195, 'learning_rate': 1.994466751664932e-05, 'epoch': 0.51}
6%|▋ | 40/616 [37:44<8:54:53, 55.72s/it]
{'loss': 4.5117, 'learning_rate': 1.993900170976888e-05, 'epoch': 0.52}
7%|▋ | 41/616 [38:39<8:52:15, 55.54s/it]
{'loss': 4.4141, 'learning_rate': 1.9933060674741422e-05, 'epoch': 0.53}
7%|▋ | 42/616 [39:35<8:52:13, 55.63s/it]
{'loss': 4.3398, 'learning_rate': 1.9926844576084483e-05, 'epoch': 0.55}
7%|▋ | 43/616 [40:32<8:54:09, 55.93s/it]
{'loss': 4.3232, 'learning_rate': 1.992035358593258e-05, 'epoch': 0.56}
7%|▋ | 44/616 [41:28<8:52:43, 55.88s/it]
{'loss': 4.2305, 'learning_rate': 1.991358788403246e-05, 'epoch': 0.57}
7%|▋ | 45/616 [42:23<8:50:59, 55.80s/it]
{'loss': 4.1641, 'learning_rate': 1.990654765773811e-05, 'epoch': 0.58}
7%|▋ | 46/616 [43:18<8:48:37, 55.65s/it]
{'loss': 4.0674, 'learning_rate': 1.9899233102005573e-05, 'epoch': 0.6}
8%|▊ | 47/616 [44:14<8:48:51, 55.77s/it]
{'loss': 3.915, 'learning_rate': 1.9891644419387545e-05, 'epoch': 0.61}
8%|▊ | 48/616 [45:10<8:46:06, 55.58s/it]
{'loss': 3.7822, 'learning_rate': 1.9883781820027777e-05, 'epoch': 0.62}
8%|▊ | 49/616 [46:06<8:47:09, 55.78s/it]
{'loss': 3.709, 'learning_rate': 1.987564552165524e-05, 'epoch': 0.64}
8%|▊ | 50/616 [47:03<8:49:27, 56.13s/it]
{'loss': 3.4131, 'learning_rate': 1.9867235749578108e-05, 'epoch': 0.65}
8%|▊ | 51/616 [47:59<8:47:39, 56.03s/it]
{'loss': 3.1318, 'learning_rate': 1.9858552736677516e-05, 'epoch': 0.66}
8%|▊ | 52/616 [48:56<8:49:19, 56.31s/it]
{'loss': 2.834, 'learning_rate': 1.984959672340111e-05, 'epoch': 0.68}
9%|▊ | 53/616 [49:52<8:48:34, 56.33s/it]
{'loss': 2.5654, 'learning_rate': 1.984036795775638e-05, 'epoch': 0.69}
9%|▉ | 54/616 [50:48<8:47:14, 56.29s/it]
{'loss': 2.417, 'learning_rate': 1.9830866695303817e-05, 'epoch': 0.7}
9%|▉ | 55/616 [51:45<8:46:52, 56.35s/it]
{'loss': 2.1909, 'learning_rate': 1.9821093199149806e-05, 'epoch': 0.71}
9%|▉ | 56/616 [52:41<8:47:14, 56.49s/it]
{'loss': 2.2568, 'learning_rate': 1.981104773993936e-05, 'epoch': 0.73}
9%|▉ | 57/616 [53:37<8:44:24, 56.29s/it]
{'loss': 2.2744, 'learning_rate': 1.980073059584862e-05, 'epoch': 0.74}
9%|▉ | 58/616 [54:34<8:43:43, 56.31s/it]
{'loss': 2.0771, 'learning_rate': 1.9790142052577148e-05, 'epoch': 0.75}
10%|▉ | 59/616 [55:29<8:41:20, 56.16s/it]
{'loss': 2.1729, 'learning_rate': 1.977928240334002e-05, 'epoch': 0.77}
10%|▉ | 60/616 [56:25<8:37:59, 55.90s/it]
{'loss': 2.123, 'learning_rate': 1.9768151948859705e-05, 'epoch': 0.78}
10%|▉ | 61/616 [57:21<8:38:57, 56.10s/it]
{'loss': 2.0356, 'learning_rate': 1.9756750997357738e-05, 'epoch': 0.79}
10%|█ | 62/616 [58:17<8:37:46, 56.08s/it]
{'loss': 2.0142, 'learning_rate': 1.9745079864546184e-05, 'epoch': 0.81}
10%|█ | 63/616 [59:12<8:34:10, 55.79s/it]
{'loss': 2.061, 'learning_rate': 1.97331388736189e-05, 'epoch': 0.82}
10%|█ | 64/616 [1:00:08<8:31:47, 55.63s/it]
{'loss': 2.0508, 'learning_rate': 1.972092835524257e-05, 'epoch': 0.83}
11%|█ | 65/616 [1:01:05<8:36:39, 56.26s/it]
{'loss': 2.0171, 'learning_rate': 1.9708448647547575e-05, 'epoch': 0.84}
11%|█ | 66/616 [1:02:02<8:37:02, 56.40s/it]
{'loss': 2.1284, 'learning_rate': 1.9695700096118594e-05, 'epoch': 0.86}
11%|█ | 67/616 [1:02:58<8:35:36, 56.35s/it]
{'loss': 2.0166, 'learning_rate': 1.9682683053985073e-05, 'epoch': 0.87}
11%|█ | 68/616 [1:03:54<8:33:34, 56.23s/it]
{'loss': 2.062, 'learning_rate': 1.966939788161142e-05, 'epoch': 0.88}
11%|█ | 69/616 [1:04:50<8:31:20, 56.09s/it]
{'loss': 2.0142, 'learning_rate': 1.9655844946887035e-05, 'epoch': 0.9}
11%|█▏ | 70/616 [1:05:45<8:27:52, 55.81s/it]
{'loss': 2.0103, 'learning_rate': 1.9642024625116117e-05, 'epoch': 0.91}
12%|█▏ | 71/616 [1:06:41<8:26:19, 55.74s/it]
{'loss': 1.9956, 'learning_rate': 1.9627937299007286e-05, 'epoch': 0.92}
12%|█▏ | 72/616 [1:07:38<8:29:09, 56.16s/it]
{'loss': 1.9868, 'learning_rate': 1.961358335866296e-05, 'epoch': 0.94}
12%|█▏ | 73/616 [1:08:34<8:27:42, 56.10s/it]
{'loss': 2.0435, 'learning_rate': 1.959896320156857e-05, 'epoch': 0.95}
12%|█▏ | 74/616 [1:09:31<8:28:19, 56.27s/it]
{'loss': 2.0112, 'learning_rate': 1.958407723258156e-05, 'epoch': 0.96}
12%|█▏ | 75/616 [1:10:26<8:24:52, 55.99s/it]
{'loss': 2.0908, 'learning_rate': 1.9568925863920155e-05, 'epoch': 0.97}
12%|█▏ | 76/616 [1:11:22<8:25:13, 56.14s/it]
{'loss': 1.9795, 'learning_rate': 1.955350951515195e-05, 'epoch': 0.99}
12%|█▎ | 77/616 [1:12:19<8:25:03, 56.22s/it]
{'loss': 2.0112, 'learning_rate': 1.9537828613182314e-05, 'epoch': 1.0}
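
The first epoch ends at step 77 with `'epoch': 1.0`, by which point the loss has fallen from about 12.2 to about 2.0. Because each metrics line is a Python dict literal, the curve can be recovered from a saved copy of this log with `ast.literal_eval` instead of ad-hoc regexes. A sketch, assuming the log text is available as a string; the sample holds the last three records of epoch 1, copied from above:

```python
import ast
from statistics import mean

# Last three metrics records of epoch 1, copied from the log above.
SAMPLE = """\
{'loss': 2.0908, 'learning_rate': 1.9568925863920155e-05, 'epoch': 0.97}
{'loss': 1.9795, 'learning_rate': 1.955350951515195e-05, 'epoch': 0.99}
{'loss': 2.0112, 'learning_rate': 1.9537828613182314e-05, 'epoch': 1.0}
"""


def parse_metrics(text: str) -> list[dict]:
    """Collect every line that is a dict literal containing a 'loss' key."""
    records = []
    for raw in text.splitlines():
        line = raw.strip().rstrip("|").strip()  # tolerate a trailing table pipe
        if not line.startswith("{"):
            continue
        try:
            record = ast.literal_eval(line)
        except (ValueError, SyntaxError):
            continue  # e.g. a partially written line
        if isinstance(record, dict) and "loss" in record:
            records.append(record)
    return records


if __name__ == "__main__":
    records = parse_metrics(SAMPLE)
    print(len(records))                                # 3
    print(round(mean(r["loss"] for r in records), 4))  # 2.0272
```
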
13%|█▎ | 78/616 [1:13:47<9:49:34, 65.75s/it]
{'loss': 2.0459, 'learning_rate': 1.9521883592242537e-05, 'epoch': 1.01}
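
Step 78, the first step of the second epoch, took 88 s against the usual ~56 s, and step 101, right after the step-100 checkpoint, shows a similar jump (elapsed goes from 1:34:09 to 1:36:09). One plausible cause at the epoch boundary is the 8 dataloader workers (`--dataloader_num_workers 8`) being re-created; that is an inference from the timing, not something the log states. Per-step wall time can be recovered from the elapsed field of consecutive tqdm lines. A standard-library sketch, using the step 76 to 78 lines from above with the bar prefix omitted:

```python
import re

# tqdm renders "n/total [elapsed<remaining, rate]"; elapsed is MM:SS or H:MM:SS.
PROGRESS_RE = re.compile(r"(\d+)/(\d+) \[([\d:]+)<")

SAMPLE = """\
76/616 [1:11:22<8:25:13, 56.14s/it]
77/616 [1:12:19<8:25:03, 56.22s/it]
78/616 [1:13:47<9:49:34, 65.75s/it]
"""


def to_seconds(stamp: str) -> int:
    """Convert 'MM:SS' or 'H:MM:SS' to seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def step_durations(text: str) -> dict[int, int]:
    """Map each step to the seconds elapsed since the previous distinct step seen."""
    elapsed = {}
    for step, _total, stamp in PROGRESS_RE.findall(text):
        elapsed[int(step)] = to_seconds(stamp)  # redraws just overwrite the same value
    steps = sorted(elapsed)
    return {cur: elapsed[cur] - elapsed[prev] for prev, cur in zip(steps, steps[1:])}


if __name__ == "__main__":
    print(step_durations(SAMPLE))  # {77: 57, 78: 88}
```
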
13%|█▎ | 79/616 [1:14:43<9:21:38, 62.75s/it]
{'loss': 2.0117, 'learning_rate': 1.950567489387783e-05, 'epoch': 1.03}
13%|█▎ | 80/616 [1:15:37<8:59:10, 60.35s/it]
{'loss': 2.0156, 'learning_rate': 1.9489202966935084e-05, 'epoch': 1.04}
13%|█▎ | 81/616 [1:16:33<8:45:30, 58.93s/it]
{'loss': 2.0547, 'learning_rate': 1.947246826755044e-05, 'epoch': 1.05}
13%|█▎ | 82/616 [1:17:29<8:37:02, 58.09s/it]
{'loss': 1.9639, 'learning_rate': 1.945547125913667e-05, 'epoch': 1.06}
13%|█▎ | 83/616 [1:18:25<8:29:53, 57.40s/it]
{'loss': 2.019, 'learning_rate': 1.943821241237034e-05, 'epoch': 1.08}
14%|█▎ | 84/616 [1:19:20<8:23:52, 56.83s/it]
{'loss': 1.9771, 'learning_rate': 1.9420692205178753e-05, 'epoch': 1.09}
14%|█▍ | 85/616 [1:20:16<8:21:01, 56.61s/it]
{'loss': 1.9492, 'learning_rate': 1.9402911122726756e-05, 'epoch': 1.1}
14%|█▍ | 86/616 [1:21:11<8:14:46, 56.01s/it]
{'loss': 1.9702, 'learning_rate': 1.9384869657403277e-05, 'epoch': 1.12}
14%|█▍ | 87/616 [1:22:06<8:11:43, 55.77s/it]
{'loss': 1.9946, 'learning_rate': 1.9366568308807685e-05, 'epoch': 1.13}
14%|█▍ | 88/616 [1:23:01<8:09:00, 55.57s/it]
{'loss': 1.9854, 'learning_rate': 1.9348007583735985e-05, 'epoch': 1.14}
14%|█▍ | 89/616 [1:23:57<8:06:59, 55.45s/it]
{'loss': 1.959, 'learning_rate': 1.9329187996166747e-05, 'epoch': 1.16}
15%|█▍ | 90/616 [1:24:52<8:07:03, 55.56s/it]
{'loss': 1.9722, 'learning_rate': 1.9310110067246905e-05, 'epoch': 1.17}
15%|█▍ | 91/616 [1:25:48<8:07:08, 55.67s/it]
{'loss': 2.0376, 'learning_rate': 1.9290774325277305e-05, 'epoch': 1.18}
15%|█▍ | 92/616 [1:26:44<8:06:06, 55.66s/it]
{'loss': 1.9834, 'learning_rate': 1.9271181305698084e-05, 'epoch': 1.19}
15%|█▌ | 93/616 [1:27:40<8:05:12, 55.66s/it]
{'loss': 2.0049, 'learning_rate': 1.9251331551073843e-05, 'epoch': 1.21}
15%|█▌ | 94/616 [1:28:35<8:03:16, 55.55s/it]
{'loss': 1.9824, 'learning_rate': 1.923122561107861e-05, 'epoch': 1.22}
15%|█▌ | 95/616 [1:29:30<8:02:27, 55.56s/it]
{'loss': 1.9624, 'learning_rate': 1.9210864042480645e-05, 'epoch': 1.23}
16%|█▌ | 96/616 [1:30:26<8:02:30, 55.67s/it]
{'loss': 1.9395, 'learning_rate': 1.9190247409126993e-05, 'epoch': 1.25}
16%|█▌ | 97/616 [1:31:22<8:01:13, 55.63s/it]
{'loss': 1.9746, 'learning_rate': 1.916937628192789e-05, 'epoch': 1.26}
16%|█▌ | 98/616 [1:32:17<7:59:31, 55.54s/it]
{'loss': 1.9507, 'learning_rate': 1.9148251238840947e-05, 'epoch': 1.27}
16%|█▌ | 99/616 [1:33:13<7:59:26, 55.64s/it]
{'loss': 2.0054, 'learning_rate': 1.9126872864855142e-05, 'epoch': 1.29}
16%|█▌ | 100/616 [1:34:09<7:58:14, 55.61s/it]
{'loss': 1.9409, 'learning_rate': 1.9105241751974624e-05, 'epoch': 1.3}
/usr/local/miniconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/usr/local/miniconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
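
The two `state_dict` UserWarnings appear immediately after step 100, which coincides with the first checkpoint save under `--save_strategy steps --save_steps 100`; they are deprecation notices, and training continues at step 101. With `--save_total_limit 3`, older `checkpoint-*` directories are deleted once more than three exist. The simulation below sketches which checkpoints would remain if the run reaches step 616. It assumes plain oldest-first rotation (no `load_best_model_at_end`) and no extra save at the final step, and is an illustration rather than the Trainer's own logic:

```python
def retained_checkpoints(max_steps: int, save_steps: int, limit: int) -> list[int]:
    """Simulate oldest-first rotation of step-based checkpoints.

    Assumes a checkpoint every `save_steps` steps, none at the final step unless
    it is a multiple of `save_steps`, and no best-model exemption.
    """
    kept: list[int] = []
    for step in range(save_steps, max_steps + 1, save_steps):
        kept.append(step)         # checkpoint-<step> is written
        while len(kept) > limit:  # rotation deletes the oldest
            kept.pop(0)
    return kept


if __name__ == "__main__":
    print(retained_checkpoints(172, 100, 3))  # [100] (where this log ends)
    print(retained_checkpoints(616, 100, 3))  # [400, 500, 600]
```
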
16%|ββ | 101/616 [1:36:09<10:43:30, 74.97s/it]
{'loss': 1.9912, 'learning_rate': 1.9083358499202323e-05, 'epoch': 1.31} |
|
16%|ββ | 101/616 [1:36:09<10:43:30, 74.97s/it]
17%|ββ | 102/616 [1:37:04<9:52:10, 69.12s/it]
{'loss': 1.9404, 'learning_rate': 1.9061223712523352e-05, 'epoch': 1.32} |
|
17%|ββ | 102/616 [1:37:04<9:52:10, 69.12s/it]
17%|ββ | 103/616 [1:38:00<9:16:14, 65.06s/it]
{'loss': 1.9102, 'learning_rate': 1.903883800488824e-05, 'epoch': 1.34} |
|
17%|ββ | 103/616 [1:38:00<9:16:14, 65.06s/it]
17%|ββ | 104/616 [1:38:55<8:49:52, 62.09s/it]
{'loss': 1.9248, 'learning_rate': 1.9016201996195943e-05, 'epoch': 1.35} |
|
17%|ββ | 104/616 [1:38:55<8:49:52, 62.09s/it]
17%|ββ | 105/616 [1:39:51<8:32:54, 60.22s/it]
{'loss': 1.8984, 'learning_rate': 1.8993316313276694e-05, 'epoch': 1.36} |
|
17%|ββ | 105/616 [1:39:51<8:32:54, 60.22s/it]
17%|ββ | 106/616 [1:40:46<8:19:20, 58.75s/it]
{'loss': 1.9331, 'learning_rate': 1.8970181589874637e-05, 'epoch': 1.38} |
|
17%|ββ | 106/616 [1:40:46<8:19:20, 58.75s/it]
17%|ββ | 107/616 [1:41:42<8:11:28, 57.93s/it]
{'loss': 1.9561, 'learning_rate': 1.894679846663027e-05, 'epoch': 1.39} |
|
17%|ββ | 107/616 [1:41:42<8:11:28, 57.93s/it]
18%|ββ | 108/616 [1:42:38<8:04:50, 57.26s/it]
{'loss': 1.8901, 'learning_rate': 1.8923167591062723e-05, 'epoch': 1.4} |
|
18%|ββ | 108/616 [1:42:38<8:04:50, 57.26s/it]
18%|ββ | 109/616 [1:43:34<8:00:58, 56.92s/it]
{'loss': 1.9922, 'learning_rate': 1.8899289617551803e-05, 'epoch': 1.42} |
|
18%|ββ | 109/616 [1:43:34<8:00:58, 56.92s/it]
18%|ββ | 110/616 [1:44:29<7:55:58, 56.44s/it]
{'loss': 1.9277, 'learning_rate': 1.8875165207319902e-05, 'epoch': 1.43} |
|
18%|ββ | 110/616 [1:44:29<7:55:58, 56.44s/it]
18%|ββ | 111/616 [1:45:25<7:53:10, 56.22s/it]
{'loss': 1.9185, 'learning_rate': 1.8850795028413658e-05, 'epoch': 1.44} |
|
18%|ββ | 111/616 [1:45:25<7:53:10, 56.22s/it]
18%|ββ | 112/616 [1:46:21<7:50:46, 56.04s/it]
{'loss': 1.9575, 'learning_rate': 1.882617975568547e-05, 'epoch': 1.45} |
|
18%|ββ | 112/616 [1:46:21<7:50:46, 56.04s/it]
18%|ββ | 113/616 [1:47:15<7:46:22, 55.63s/it]
{'loss': 1.957, 'learning_rate': 1.880132007077482e-05, 'epoch': 1.47} |
|
18%|ββ | 113/616 [1:47:15<7:46:22, 55.63s/it]
19%|ββ | 114/616 [1:48:11<7:45:53, 55.69s/it]
{'loss': 1.8984, 'learning_rate': 1.8776216662089373e-05, 'epoch': 1.48} |
|
19%|ββ | 114/616 [1:48:11<7:45:53, 55.69s/it]
19%|ββ | 115/616 [1:49:08<7:46:53, 55.92s/it]
{'loss': 1.9429, 'learning_rate': 1.875087022478594e-05, 'epoch': 1.49} |
|
19%|ββ | 115/616 [1:49:08<7:46:53, 55.92s/it]
19%|ββ | 116/616 [1:50:03<7:45:27, 55.86s/it]
{'loss': 1.8701, 'learning_rate': 1.8725281460751198e-05, 'epoch': 1.51} |
|
19%|ββ | 116/616 [1:50:03<7:45:27, 55.86s/it]
19%|ββ | 117/616 [1:50:59<7:43:13, 55.70s/it]
{'loss': 1.9497, 'learning_rate': 1.869945107858228e-05, 'epoch': 1.52} |
|
19%|ββ | 117/616 [1:50:59<7:43:13, 55.70s/it]
19%|ββ | 118/616 [1:51:55<7:44:34, 55.97s/it]
{'loss': 1.8921, 'learning_rate': 1.867337979356715e-05, 'epoch': 1.53} |
|
19%|ββ | 118/616 [1:51:55<7:44:34, 55.97s/it]
19%|ββ | 119/616 [1:52:51<7:42:44, 55.86s/it]
{'loss': 1.8569, 'learning_rate': 1.8647068327664774e-05, 'epoch': 1.55} |
|
19%|ββ | 119/616 [1:52:51<7:42:44, 55.86s/it]
19%|ββ | 120/616 [1:53:47<7:42:20, 55.93s/it]
{'loss': 1.8882, 'learning_rate': 1.8620517409485148e-05, 'epoch': 1.56} |
|
19%|ββ | 120/616 [1:53:47<7:42:20, 55.93s/it]
20%|ββ | 121/616 [1:54:43<7:40:16, 55.79s/it]
{'loss': 1.8765, 'learning_rate': 1.8593727774269122e-05, 'epoch': 1.57} |
|
20%|ββ | 121/616 [1:54:43<7:40:16, 55.79s/it]
20%|ββ | 122/616 [1:55:36<7:34:53, 55.25s/it]
{'loss': 1.9282, 'learning_rate': 1.8566700163868027e-05, 'epoch': 1.58} |
|
20%|ββ | 122/616 [1:55:36<7:34:53, 55.25s/it]
20%|ββ | 123/616 [1:56:32<7:34:34, 55.32s/it]
{'loss': 1.8384, 'learning_rate': 1.8539435326723135e-05, 'epoch': 1.6} |
|
20%|ββ | 123/616 [1:56:32<7:34:34, 55.32s/it]
20%|ββ | 124/616 [1:57:28<7:35:47, 55.58s/it]
{'loss': 1.9185, 'learning_rate': 1.851193401784495e-05, 'epoch': 1.61} |
|
20%|ββ | 124/616 [1:57:28<7:35:47, 55.58s/it]
20%|ββ | 125/616 [1:58:23<7:32:30, 55.30s/it]
{'loss': 1.834, 'learning_rate': 1.848419699879227e-05, 'epoch': 1.62} |
|
20%|ββ | 125/616 [1:58:23<7:32:30, 55.30s/it]
20%|ββ | 126/616 [1:59:19<7:32:47, 55.44s/it]
{'loss': 1.8657, 'learning_rate': 1.845622503765113e-05, 'epoch': 1.64} |
|
20%|ββ | 126/616 [1:59:19<7:32:47, 55.44s/it]
21%|ββ | 127/616 [2:00:14<7:31:52, 55.44s/it]
{'loss': 1.8457, 'learning_rate': 1.842801890901351e-05, 'epoch': 1.65} |
|
21%|ββ | 127/616 [2:00:14<7:31:52, 55.44s/it]
21%|ββ | 128/616 [2:01:09<7:30:42, 55.41s/it]
{'loss': 1.7671, 'learning_rate': 1.8399579393955893e-05, 'epoch': 1.66} |
|
21%|ββ | 128/616 [2:01:09<7:30:42, 55.41s/it]
21%|ββ | 129/616 [2:02:04<7:28:38, 55.27s/it]
{'loss': 1.8462, 'learning_rate': 1.837090728001764e-05, 'epoch': 1.68} |
|
21%|ββ | 129/616 [2:02:04<7:28:38, 55.27s/it]
21%|ββ | 130/616 [2:03:00<7:28:17, 55.34s/it]
{'loss': 1.8296, 'learning_rate': 1.834200336117918e-05, 'epoch': 1.69} |
|
21%|ββ | 130/616 [2:03:00<7:28:17, 55.34s/it]
21%|βββ | 131/616 [2:03:55<7:27:18, 55.34s/it]
{'loss': 1.8262, 'learning_rate': 1.8312868437840002e-05, 'epoch': 1.7} |
|
21%|βββ | 131/616 [2:03:55<7:27:18, 55.34s/it]
21%|βββ | 132/616 [2:04:50<7:25:41, 55.25s/it]
{'loss': 1.835, 'learning_rate': 1.8283503316796536e-05, 'epoch': 1.71} |
|
21%|βββ | 132/616 [2:04:50<7:25:41, 55.25s/it]
22%|βββ | 133/616 [2:05:46<7:26:05, 55.42s/it]
{'loss': 1.8979, 'learning_rate': 1.8253908811219764e-05, 'epoch': 1.73} |
|
22%|βββ | 133/616 [2:05:46<7:26:05, 55.42s/it]
22%|βββ | 134/616 [2:06:43<7:28:04, 55.78s/it]
{'loss': 1.8496, 'learning_rate': 1.822408574063273e-05, 'epoch': 1.74} |
|
22%|βββ | 134/616 [2:06:43<7:28:04, 55.78s/it]
22%|βββ | 135/616 [2:07:39<7:27:34, 55.83s/it]
{'loss': 1.8252, 'learning_rate': 1.8194034930887842e-05, 'epoch': 1.75} |
|
22%|βββ | 135/616 [2:07:39<7:27:34, 55.83s/it]
22%|βββ | 136/616 [2:08:34<7:26:39, 55.83s/it]
{'loss': 1.7812, 'learning_rate': 1.8163757214143993e-05, 'epoch': 1.77} |
|
22%|βββ | 136/616 [2:08:34<7:26:39, 55.83s/it]
22%|βββ | 137/616 [2:09:29<7:23:29, 55.55s/it]
{'loss': 1.8364, 'learning_rate': 1.8133253428843524e-05, 'epoch': 1.78} |
|
22%|βββ | 137/616 [2:09:29<7:23:29, 55.55s/it]
22%|βββ | 138/616 [2:10:25<7:21:53, 55.47s/it]
{'loss': 1.8013, 'learning_rate': 1.810252441968901e-05, 'epoch': 1.79} |
|
22%|βββ | 138/616 [2:10:25<7:21:53, 55.47s/it]
23%|βββ | 139/616 [2:11:20<7:21:37, 55.55s/it]
{'loss': 1.8203, 'learning_rate': 1.8071571037619856e-05, 'epoch': 1.81} |
|
23%|βββ | 139/616 [2:11:20<7:21:37, 55.55s/it]
23%|βββ | 140/616 [2:12:17<7:22:24, 55.77s/it]
{'loss': 1.7729, 'learning_rate': 1.804039413978875e-05, 'epoch': 1.82} |
|
23%|βββ | 140/616 [2:12:17<7:22:24, 55.77s/it]
23%|βββ | 141/616 [2:13:12<7:20:17, 55.62s/it]
{'loss': 1.8491, 'learning_rate': 1.8008994589537913e-05, 'epoch': 1.83} |
|
23%|βββ | 141/616 [2:13:12<7:20:17, 55.62s/it]
23%|βββ | 142/616 [2:14:08<7:20:03, 55.70s/it]
{'loss': 1.7998, 'learning_rate': 1.7977373256375194e-05, 'epoch': 1.84} |
|
23%|βββ | 142/616 [2:14:08<7:20:03, 55.70s/it]
23%|βββ | 143/616 [2:15:03<7:18:17, 55.60s/it]
{'loss': 1.8364, 'learning_rate': 1.7945531015950008e-05, 'epoch': 1.86} |
|
23%|βββ | 143/616 [2:15:03<7:18:17, 55.60s/it]
23%|βββ | 144/616 [2:16:00<7:19:32, 55.87s/it]
{'loss': 1.8125, 'learning_rate': 1.791346875002905e-05, 'epoch': 1.87} |
|
23%|βββ | 144/616 [2:16:00<7:19:32, 55.87s/it]
24%|βββ | 145/616 [2:16:56<7:20:03, 56.06s/it]
{'loss': 1.832, 'learning_rate': 1.7881187346471924e-05, 'epoch': 1.88} |
|
24%|βββ | 145/616 [2:16:56<7:20:03, 56.06s/it]
24%|βββ | 146/616 [2:17:52<7:19:22, 56.09s/it]
{'loss': 1.8271, 'learning_rate': 1.784868769920653e-05, 'epoch': 1.9} |
|
24%|βββ | 146/616 [2:17:52<7:19:22, 56.09s/it]
24%|βββ | 147/616 [2:18:48<7:18:03, 56.04s/it]
{'loss': 1.7959, 'learning_rate': 1.7815970708204296e-05, 'epoch': 1.91} |
|
24%|βββ | 147/616 [2:18:48<7:18:03, 56.04s/it]
24%|βββ | 148/616 [2:19:44<7:16:24, 55.95s/it]
{'loss': 1.7798, 'learning_rate': 1.77830372794553e-05, 'epoch': 1.92} |
|
24%|βββ | 148/616 [2:19:44<7:16:24, 55.95s/it]
24%|βββ | 149/616 [2:20:39<7:14:15, 55.79s/it]
{'loss': 1.7651, 'learning_rate': 1.774988832494314e-05, 'epoch': 1.94} |
|
24%|βββ | 149/616 [2:20:39<7:14:15, 55.79s/it]
24%|βββ | 150/616 [2:21:34<7:11:42, 55.58s/it]
{'loss': 1.8076, 'learning_rate': 1.7716524762619695e-05, 'epoch': 1.95} |
|
24%|βββ | 150/616 [2:21:34<7:11:42, 55.58s/it]
25%|βββ | 151/616 [2:22:30<7:09:34, 55.43s/it]
{'loss': 1.8379, 'learning_rate': 1.7682947516379706e-05, 'epoch': 1.96} |
|
25%|βββ | 151/616 [2:22:30<7:09:34, 55.43s/it]
25%|βββ | 152/616 [2:23:24<7:06:59, 55.21s/it]
{'loss': 1.8228, 'learning_rate': 1.7649157516035205e-05, 'epoch': 1.97} |
|
25%|βββ | 152/616 [2:23:24<7:06:59, 55.21s/it]
25%|βββ | 153/616 [2:24:20<7:06:55, 55.32s/it]
{'loss': 1.7783, 'learning_rate': 1.7615155697289734e-05, 'epoch': 1.99} |
|
25%|βββ | 153/616 [2:24:20<7:06:55, 55.32s/it]
25%|βββ | 154/616 [2:25:16<7:07:25, 55.51s/it]
{'loss': 1.8188, 'learning_rate': 1.7580943001712457e-05, 'epoch': 2.0} |
|
25%|βββ | 154/616 [2:25:16<7:07:25, 55.51s/it]
 25%|██▌       | 155/616 [2:26:40<8:11:56, 64.03s/it]
{'loss': 1.7974, 'learning_rate': 1.7546520376712093e-05, 'epoch': 2.01}
 25%|██▌       | 156/616 [2:27:36<7:52:12, 61.59s/it]
{'loss': 1.7964, 'learning_rate': 1.7511888775510662e-05, 'epoch': 2.03}
 25%|██▌       | 157/616 [2:28:31<7:36:15, 59.64s/it]
{'loss': 1.7515, 'learning_rate': 1.7477049157117093e-05, 'epoch': 2.04}
 26%|██▌       | 158/616 [2:29:26<7:25:42, 58.39s/it]
{'loss': 1.7725, 'learning_rate': 1.744200248630068e-05, 'epoch': 2.05}
 26%|██▌       | 159/616 [2:30:22<7:18:37, 57.59s/it]
{'loss': 1.7534, 'learning_rate': 1.7406749733564344e-05, 'epoch': 2.06}
 26%|██▌       | 160/616 [2:31:18<7:13:47, 57.08s/it]
{'loss': 1.8408, 'learning_rate': 1.737129187511779e-05, 'epoch': 2.08}
 26%|██▌       | 161/616 [2:32:13<7:09:24, 56.63s/it]
{'loss': 1.7686, 'learning_rate': 1.7335629892850436e-05, 'epoch': 2.09}
 26%|██▋       | 162/616 [2:33:10<7:08:30, 56.63s/it]
{'loss': 1.7642, 'learning_rate': 1.729976477430425e-05, 'epoch': 2.1}
 26%|██▋       | 163/616 [2:34:06<7:05:21, 56.34s/it]
{'loss': 1.8047, 'learning_rate': 1.7263697512646397e-05, 'epoch': 2.12}
 27%|██▋       | 164/616 [2:35:02<7:03:59, 56.28s/it]
{'loss': 1.8301, 'learning_rate': 1.7227429106641726e-05, 'epoch': 2.13}
 27%|██▋       | 165/616 [2:35:58<7:02:14, 56.17s/it]
{'loss': 1.7588, 'learning_rate': 1.7190960560625127e-05, 'epoch': 2.14}
 27%|██▋       | 166/616 [2:36:53<6:59:31, 55.94s/it]
{'loss': 1.7749, 'learning_rate': 1.7154292884473712e-05, 'epoch': 2.16}
 27%|██▋       | 167/616 [2:37:49<6:58:57, 55.99s/it]
{'loss': 1.7251, 'learning_rate': 1.711742709357886e-05, 'epoch': 2.17}
 27%|██▋       | 168/616 [2:38:44<6:56:05, 55.73s/it]
{'loss': 1.7603, 'learning_rate': 1.708036420881807e-05, 'epoch': 2.18}
 27%|██▋       | 169/616 [2:39:41<6:56:55, 55.96s/it]
{'loss': 1.7339, 'learning_rate': 1.7043105256526723e-05, 'epoch': 2.19}
 28%|██▊       | 170/616 [2:40:38<6:58:19, 56.28s/it]
{'loss': 1.731, 'learning_rate': 1.7005651268469652e-05, 'epoch': 2.21}
 28%|██▊       | 171/616 [2:41:33<6:54:13, 55.85s/it]
{'loss': 1.7598, 'learning_rate': 1.6968003281812563e-05, 'epoch': 2.22}
 28%|██▊       | 172/616 [2:42:29<6:53:17, 55.85s/it]
{'loss': 1.7007, 'learning_rate': 1.693016233909332e-05, 'epoch': 2.23}
 28%|██▊       | 173/616 [2:43:24<6:51:59, 55.80s/it]
{'loss': 1.7183, 'learning_rate': 1.689212948819307e-05, 'epoch': 2.25}
 28%|██▊       | 174/616 [2:44:18<6:47:09, 55.27s/it]
{'loss': 1.7173, 'learning_rate': 1.6853905782307235e-05, 'epoch': 2.26}
 28%|██▊       | 175/616 [2:45:16<6:51:57, 56.05s/it]
{'loss': 1.7856, 'learning_rate': 1.681549227991634e-05, 'epoch': 2.27}
 29%|██▊       | 176/616 [2:46:11<6:48:50, 55.75s/it]
{'loss': 1.7329, 'learning_rate': 1.67768900447567e-05, 'epoch': 2.29}
 29%|██▊       | 177/616 [2:47:07<6:46:59, 55.63s/it]
{'loss': 1.7578, 'learning_rate': 1.6738100145790977e-05, 'epoch': 2.3}
 29%|██▉       | 178/616 [2:48:03<6:46:51, 55.73s/it]
{'loss': 1.6846, 'learning_rate': 1.6699123657178553e-05, 'epoch': 2.31}
 29%|██▉       | 179/616 [2:48:57<6:43:53, 55.45s/it]
{'loss': 1.791, 'learning_rate': 1.6659961658245813e-05, 'epoch': 2.32}
 29%|██▉       | 180/616 [2:49:53<6:43:27, 55.52s/it]
{'loss': 1.7798, 'learning_rate': 1.6620615233456235e-05, 'epoch': 2.34}
 29%|██▉       | 181/616 [2:50:49<6:43:06, 55.60s/it]
{'loss': 1.6987, 'learning_rate': 1.658108547238038e-05, 'epoch': 2.35}
 30%|██▉       | 182/616 [2:51:45<6:42:48, 55.69s/it]
{'loss': 1.7202, 'learning_rate': 1.6541373469665688e-05, 'epoch': 2.36}
 30%|██▉       | 183/616 [2:52:40<6:40:16, 55.46s/it]
{'loss': 1.7285, 'learning_rate': 1.6501480325006206e-05, 'epoch': 2.38}
 30%|██▉       | 184/616 [2:53:35<6:38:17, 55.32s/it]
{'loss': 1.7417, 'learning_rate': 1.64614071431121e-05, 'epoch': 2.39}
 30%|███       | 185/616 [2:54:31<6:38:58, 55.54s/it]
{'loss': 1.79, 'learning_rate': 1.6421155033679085e-05, 'epoch': 2.4}
 30%|███       | 186/616 [2:55:27<6:38:52, 55.66s/it]
{'loss': 1.7876, 'learning_rate': 1.6380725111357693e-05, 'epoch': 2.42}
 30%|███       | 187/616 [2:56:23<6:39:32, 55.88s/it]
{'loss': 1.7734, 'learning_rate': 1.634011849572239e-05, 'epoch': 2.43}
 31%|███       | 188/616 [2:57:18<6:37:16, 55.69s/it]
{'loss': 1.7686, 'learning_rate': 1.6299336311240593e-05, 'epoch': 2.44}
 31%|███       | 189/616 [2:58:15<6:38:07, 55.94s/it]
{'loss': 1.7993, 'learning_rate': 1.6258379687241533e-05, 'epoch': 2.45}
 31%|███       | 190/616 [2:59:09<6:34:19, 55.54s/it]
{'loss': 1.708, 'learning_rate': 1.6217249757884954e-05, 'epoch': 2.47}
 31%|███       | 191/616 [3:00:05<6:33:15, 55.52s/it]
{'loss': 1.7065, 'learning_rate': 1.6175947662129735e-05, 'epoch': 2.48}
 31%|███       | 192/616 [3:01:00<6:32:25, 55.53s/it]
{'loss': 1.7324, 'learning_rate': 1.6134474543702353e-05, 'epoch': 2.49}
 31%|███▏      | 193/616 [3:01:56<6:31:58, 55.60s/it]
{'loss': 1.7686, 'learning_rate': 1.609283155106517e-05, 'epoch': 2.51}
 31%|███▏      | 194/616 [3:02:51<6:30:30, 55.52s/it]
{'loss': 1.7563, 'learning_rate': 1.605101983738468e-05, 'epoch': 2.52}
 32%|███▏      | 195/616 [3:03:48<6:31:31, 55.80s/it]
{'loss': 1.7373, 'learning_rate': 1.6009040560499548e-05, 'epoch': 2.53}
 32%|███▏      | 196/616 [3:04:44<6:32:05, 56.01s/it]
{'loss': 1.7104, 'learning_rate': 1.596689488288856e-05, 'epoch': 2.55}
 32%|███▏      | 197/616 [3:05:40<6:29:58, 55.84s/it]
{'loss': 1.7368, 'learning_rate': 1.5924583971638416e-05, 'epoch': 2.56}
 32%|███▏      | 198/616 [3:06:36<6:30:01, 55.99s/it]
{'loss': 1.7886, 'learning_rate': 1.5882108998411427e-05, 'epoch': 2.57}
 32%|███▏      | 199/616 [3:07:32<6:28:20, 55.88s/it]
{'loss': 1.6855, 'learning_rate': 1.5839471139413065e-05, 'epoch': 2.58}
 32%|███▏      | 200/616 [3:08:27<6:25:31, 55.60s/it]
{'loss': 1.7158, 'learning_rate': 1.5796671575359382e-05, 'epoch': 2.6}
 33%|███▎      | 201/616 [3:10:31<8:46:36, 76.14s/it]
{'loss': 1.7144, 'learning_rate': 1.5753711491444336e-05, 'epoch': 2.61}
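Step 201 takes about 124 s (elapsed 3:08:27 to 3:10:31) against a typical ~56 s. With `--save_steps 100`, a plausible explanation is the checkpoint written after step 200; a similar spike appears at step 301, and at the first step of each epoch (155, 232, 309), where restarting the 8 dataloader workers is one possible cause. These attributions are inferences; the log does not state them. A minimal sketch for locating such spikes, assuming tqdm's `[elapsed<remaining, rate]` layout; `parse` and `slow_steps` are hypothetical helpers and the 1.5× median cutoff is an arbitrary choice.

```python
# Sketch: recover per-step wall time from tqdm's elapsed field and flag slow steps.
import re
from statistics import median

# Progress lines copied from this log (steps 199-203).
LINES = [
    " 32%|███▏      | 199/616 [3:07:32<6:28:20, 55.88s/it]",
    " 32%|███▏      | 200/616 [3:08:27<6:25:31, 55.60s/it]",
    " 33%|███▎      | 201/616 [3:10:31<8:46:36, 76.14s/it]",
    " 33%|███▎      | 202/616 [3:11:27<8:03:20, 70.05s/it]",
    " 33%|███▎      | 203/616 [3:12:23<7:33:14, 65.85s/it]",
]

# "| <step>/<total> [H:MM:SS<" : capture the step and the elapsed time.
PATTERN = re.compile(r"\|\s*(\d+)/\d+ \[(\d+):(\d{2}):(\d{2})<")


def parse(line: str) -> tuple[int, int]:
    """Return (step, elapsed_seconds) from one tqdm progress line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError(f"not a progress line: {line!r}")
    step, h, mnt, s = (int(g) for g in m.groups())
    return step, h * 3600 + mnt * 60 + s


def slow_steps(lines: list[str], factor: float = 1.5) -> list[int]:
    """Steps whose wall time exceeds `factor` times the median step time."""
    points = [parse(line) for line in lines]
    # Duration of step b = elapsed(b) - elapsed(previous line).
    durations = {b[0]: b[1] - a[1] for a, b in zip(points, points[1:])}
    cutoff = factor * median(durations.values())
    return [step for step, d in durations.items() if d > cutoff]


if __name__ == "__main__":
    print(slow_steps(LINES))  # [201]
```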
 33%|███▎      | 202/616 [3:11:27<8:03:20, 70.05s/it]
{'loss': 1.6909, 'learning_rate': 1.571059207730695e-05, 'epoch': 2.62}
 33%|███▎      | 203/616 [3:12:23<7:33:14, 65.85s/it]
{'loss': 1.8003, 'learning_rate': 1.5667314526998373e-05, 'epoch': 2.64}
 33%|███▎      | 204/616 [3:13:19<7:11:50, 62.89s/it]
{'loss': 1.7231, 'learning_rate': 1.5623880038948828e-05, 'epoch': 2.65}
 33%|███▎      | 205/616 [3:14:14<6:55:21, 60.64s/it]
{'loss': 1.6816, 'learning_rate': 1.55802898159344e-05, 'epoch': 2.66}
 33%|███▎      | 206/616 [3:15:10<6:43:56, 59.11s/it]
{'loss': 1.6826, 'learning_rate': 1.553654506504377e-05, 'epoch': 2.68}
 34%|███▎      | 207/616 [3:16:06<6:36:32, 58.17s/it]
{'loss': 1.7085, 'learning_rate': 1.5492646997644737e-05, 'epoch': 2.69}
 34%|███▍      | 208/616 [3:17:01<6:29:54, 57.34s/it]
{'loss': 1.6797, 'learning_rate': 1.5448596829350706e-05, 'epoch': 2.7}
 34%|███▍      | 209/616 [3:17:56<6:24:38, 56.70s/it]
{'loss': 1.708, 'learning_rate': 1.540439577998703e-05, 'epoch': 2.71}
 34%|███▍      | 210/616 [3:18:51<6:20:13, 56.19s/it]
{'loss': 1.7036, 'learning_rate': 1.5360045073557214e-05, 'epoch': 2.73}
 34%|███▍      | 211/616 [3:19:47<6:17:35, 55.94s/it]
{'loss': 1.7129, 'learning_rate': 1.5315545938209016e-05, 'epoch': 2.74}
 34%|███▍      | 212/616 [3:20:42<6:15:56, 55.83s/it]
{'loss': 1.6855, 'learning_rate': 1.527089960620046e-05, 'epoch': 2.75}
 35%|███▍      | 213/616 [3:21:37<6:12:54, 55.52s/it]
{'loss': 1.645, 'learning_rate': 1.5226107313865701e-05, 'epoch': 2.77}
 35%|███▍      | 214/616 [3:22:32<6:11:06, 55.39s/it]
{'loss': 1.6982, 'learning_rate': 1.5181170301580776e-05, 'epoch': 2.78}
 35%|███▍      | 215/616 [3:23:27<6:09:14, 55.25s/it]
{'loss': 1.731, 'learning_rate': 1.5136089813729276e-05, 'epoch': 2.79}
 35%|███▌      | 216/616 [3:24:22<6:08:42, 55.31s/it]
{'loss': 1.7192, 'learning_rate': 1.509086709866788e-05, 'epoch': 2.81}
 35%|███▌      | 217/616 [3:25:18<6:09:08, 55.51s/it]
{'loss': 1.6982, 'learning_rate': 1.5045503408691776e-05, 'epoch': 2.82}
 35%|███▌      | 218/616 [3:26:15<6:10:32, 55.86s/it]
{'loss': 1.7266, 'learning_rate': 1.5000000000000002e-05, 'epoch': 2.83}
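The `learning_rate` values follow cosine decay after a short linear warmup. Assuming the Hugging Face Trainer's cosine-with-warmup formula, and that `--warmup_ratio 0.03` is rounded up to ceil(0.03 × 616) = 19 warmup steps, step 218 sits exactly one third of the way through the 597 decay steps, so the rate is 2e-5 × 0.5 × (1 + cos(π/3)) = 1.5e-5, matching the logged 1.5000000000000002e-05. A sketch of that schedule; the function names are hypothetical.

```python
# Sketch: cosine decay with linear warmup, assuming the formula behind
# --lr_scheduler_type cosine and --warmup_ratio 0.03 in this run.
import math

BASE_LR = 2e-5        # --learning_rate
TOTAL_STEPS = 616     # progress-bar total
WARMUP_RATIO = 0.03   # --warmup_ratio


def warmup_steps(total: int, ratio: float) -> int:
    """Convert the warmup ratio to a whole number of steps, rounding up."""
    return math.ceil(total * ratio)


def cosine_lr(step: int, base_lr: float = BASE_LR, total: int = TOTAL_STEPS,
              ratio: float = WARMUP_RATIO) -> float:
    """Learning rate after `step` scheduler steps."""
    warm = warmup_steps(total, ratio)
    if step < warm:
        return base_lr * step / max(1, warm)            # linear warmup from 0
    progress = (step - warm) / max(1, total - warm)     # 0.0 -> 1.0 over the decay
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print(warmup_steps(TOTAL_STEPS, WARMUP_RATIO))  # 19
    for step in (150, 218, 317, 318):
        print(step, cosine_lr(step))
```

The decay midpoint falls between steps 317 and 318, which is why the logged rates there (1.0026e-05 and 9.9737e-06) sit symmetrically around 1e-05.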
 36%|███▌      | 219/616 [3:27:11<6:08:45, 55.73s/it]
{'loss': 1.6958, 'learning_rate': 1.495435813266064e-05, 'epoch': 2.84}
 36%|███▌      | 220/616 [3:28:06<6:07:56, 55.75s/it]
{'loss': 1.7056, 'learning_rate': 1.4908579070575936e-05, 'epoch': 2.86}
 36%|███▌      | 221/616 [3:29:02<6:07:44, 55.86s/it]
{'loss': 1.6943, 'learning_rate': 1.4862664081447297e-05, 'epoch': 2.87}
 36%|███▌      | 222/616 [3:29:57<6:04:46, 55.55s/it]
{'loss': 1.6724, 'learning_rate': 1.4816614436740184e-05, 'epoch': 2.88}
 36%|███▌      | 223/616 [3:30:52<6:02:26, 55.34s/it]
{'loss': 1.6641, 'learning_rate': 1.4770431411648898e-05, 'epoch': 2.9}
 36%|███▋      | 224/616 [3:31:48<6:02:46, 55.53s/it]
{'loss': 1.7461, 'learning_rate': 1.4724116285061278e-05, 'epoch': 2.91}
 37%|███▋      | 225/616 [3:32:43<5:59:56, 55.23s/it]
{'loss': 1.7207, 'learning_rate': 1.4677670339523285e-05, 'epoch': 2.92}
 37%|███▋      | 226/616 [3:33:39<6:02:09, 55.72s/it]
{'loss': 1.7061, 'learning_rate': 1.4631094861203478e-05, 'epoch': 2.94}
 37%|███▋      | 227/616 [3:34:35<6:00:28, 55.60s/it]
{'loss': 1.6758, 'learning_rate': 1.4584391139857407e-05, 'epoch': 2.95}
 37%|███▋      | 228/616 [3:35:31<6:00:26, 55.74s/it]
{'loss': 1.73, 'learning_rate': 1.4537560468791889e-05, 'epoch': 2.96}
 37%|███▋      | 229/616 [3:36:26<5:57:53, 55.49s/it]
{'loss': 1.7314, 'learning_rate': 1.4490604144829204e-05, 'epoch': 2.97}
 37%|███▋      | 230/616 [3:37:21<5:56:16, 55.38s/it]
{'loss': 1.7114, 'learning_rate': 1.4443523468271168e-05, 'epoch': 2.99}
 38%|███▊      | 231/616 [3:38:18<5:58:35, 55.89s/it]
{'loss': 1.7212, 'learning_rate': 1.4396319742863145e-05, 'epoch': 3.0}
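The per-step records are Python dict literals (single-quoted keys), so `json.loads` would reject them, while `ast.literal_eval` parses them safely. A sketch that averages the loss per epoch, using lines copied from around the epoch 3.0 boundary. Grouping by `math.ceil(epoch)`, so that the record tagged 3.0 closes epoch 3, is an assumption of this sketch rather than anything the Trainer does, and `mean_loss_per_epoch` is a hypothetical helper.

```python
# Sketch: parse the per-step log dicts and average the loss per (1-indexed) epoch.
import ast
import math
from collections import defaultdict

# Dict lines copied from this log (steps 229-233).
LOG_LINES = [
    "{'loss': 1.7314, 'learning_rate': 1.4490604144829204e-05, 'epoch': 2.97}",
    "{'loss': 1.7114, 'learning_rate': 1.4443523468271168e-05, 'epoch': 2.99}",
    "{'loss': 1.7212, 'learning_rate': 1.4396319742863145e-05, 'epoch': 3.0}",
    "{'loss': 1.7036, 'learning_rate': 1.4348994275757933e-05, 'epoch': 3.01}",
    "{'loss': 1.71, 'learning_rate': 1.4301548377479562e-05, 'epoch': 3.03}",
]


def mean_loss_per_epoch(lines: list[str]) -> dict[int, float]:
    """Map 1-indexed epoch -> mean loss of the records that fall in it."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for line in lines:
        record = ast.literal_eval(line.strip())  # parses literals only, never executes code
        buckets[math.ceil(record["epoch"])].append(record["loss"])
    return {epoch: sum(v) / len(v) for epoch, v in sorted(buckets.items())}


if __name__ == "__main__":
    for epoch, loss in mean_loss_per_epoch(LOG_LINES).items():
        print(f"epoch {epoch}: mean loss {loss:.4f}")
```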
 38%|███▊      | 232/616 [3:39:42<6:51:47, 64.34s/it]
{'loss': 1.7036, 'learning_rate': 1.4348994275757933e-05, 'epoch': 3.01}
 38%|███▊      | 233/616 [3:40:38<6:34:52, 61.86s/it]
{'loss': 1.71, 'learning_rate': 1.4301548377479562e-05, 'epoch': 3.03}
 38%|███▊      | 234/616 [3:41:33<6:20:43, 59.80s/it]
{'loss': 1.7432, 'learning_rate': 1.4253983361887017e-05, 'epoch': 3.04}
 38%|███▊      | 235/616 [3:42:29<6:12:23, 58.65s/it]
{'loss': 1.6992, 'learning_rate': 1.4206300546137844e-05, 'epoch': 3.05}
 38%|███▊      | 236/616 [3:43:24<6:05:20, 57.69s/it]
{'loss': 1.7271, 'learning_rate': 1.415850125065168e-05, 'epoch': 3.06}
 38%|███▊      | 237/616 [3:44:19<5:59:01, 56.84s/it]
{'loss': 1.6792, 'learning_rate': 1.4110586799073684e-05, 'epoch': 3.08}
 39%|███▊      | 238/616 [3:45:15<5:56:01, 56.51s/it]
{'loss': 1.73, 'learning_rate': 1.4062558518237893e-05, 'epoch': 3.09}
 39%|███▉      | 239/616 [3:46:11<5:53:55, 56.33s/it]
{'loss': 1.7192, 'learning_rate': 1.4014417738130464e-05, 'epoch': 3.1}
 39%|███▉      | 240/616 [3:47:06<5:50:00, 55.85s/it]
{'loss': 1.7476, 'learning_rate': 1.3966165791852862e-05, 'epoch': 3.12}
 39%|███▉      | 241/616 [3:48:02<5:49:47, 55.97s/it]
{'loss': 1.6958, 'learning_rate': 1.3917804015584932e-05, 'epoch': 3.13}
 39%|███▉      | 242/616 [3:48:57<5:47:38, 55.77s/it]
{'loss': 1.6865, 'learning_rate': 1.3869333748547901e-05, 'epoch': 3.14}
 39%|███▉      | 243/616 [3:49:53<5:46:15, 55.70s/it]
{'loss': 1.668, 'learning_rate': 1.3820756332967294e-05, 'epoch': 3.16}
 40%|███▉      | 244/616 [3:50:48<5:44:15, 55.53s/it]
{'loss': 1.6826, 'learning_rate': 1.3772073114035762e-05, 'epoch': 3.17}
 40%|███▉      | 245/616 [3:51:43<5:42:32, 55.40s/it]
{'loss': 1.7227, 'learning_rate': 1.3723285439875836e-05, 'epoch': 3.18}
 40%|███▉      | 246/616 [3:52:39<5:41:59, 55.46s/it]
{'loss': 1.7163, 'learning_rate': 1.3674394661502595e-05, 'epoch': 3.19}
 40%|████      | 247/616 [3:53:35<5:42:19, 55.66s/it]
{'loss': 1.6606, 'learning_rate': 1.3625402132786247e-05, 'epoch': 3.21}
 40%|████      | 248/616 [3:54:31<5:42:14, 55.80s/it]
{'loss': 1.7085, 'learning_rate': 1.3576309210414646e-05, 'epoch': 3.22}
 40%|████      | 249/616 [3:55:26<5:40:19, 55.64s/it]
{'loss': 1.668, 'learning_rate': 1.352711725385572e-05, 'epoch': 3.23}
 41%|████      | 250/616 [3:56:22<5:39:13, 55.61s/it]
{'loss': 1.7173, 'learning_rate': 1.3477827625319826e-05, 'epoch': 3.25}
 41%|████      | 251/616 [3:57:17<5:38:23, 55.63s/it]
{'loss': 1.7656, 'learning_rate': 1.3428441689722023e-05, 'epoch': 3.26}
 41%|████      | 252/616 [3:58:14<5:38:25, 55.78s/it]
{'loss': 1.6812, 'learning_rate': 1.3378960814644283e-05, 'epoch': 3.27}
 41%|████      | 253/616 [3:59:09<5:36:11, 55.57s/it]
{'loss': 1.6953, 'learning_rate': 1.3329386370297615e-05, 'epoch': 3.29}
 41%|████      | 254/616 [4:00:04<5:35:02, 55.53s/it]
{'loss': 1.665, 'learning_rate': 1.3279719729484117e-05, 'epoch': 3.3}
 41%|████▏     | 255/616 [4:00:59<5:33:43, 55.47s/it]
{'loss': 1.6587, 'learning_rate': 1.3229962267558982e-05, 'epoch': 3.31}
 42%|████▏     | 256/616 [4:01:55<5:33:39, 55.61s/it]
{'loss': 1.6797, 'learning_rate': 1.3180115362392383e-05, 'epoch': 3.32}
 42%|████▏     | 257/616 [4:02:51<5:32:48, 55.62s/it]
{'loss': 1.6992, 'learning_rate': 1.3130180394331335e-05, 'epoch': 3.34}
 42%|████▏     | 258/616 [4:03:47<5:32:16, 55.69s/it]
{'loss': 1.6567, 'learning_rate': 1.3080158746161468e-05, 'epoch': 3.35}
 42%|████▏     | 259/616 [4:04:42<5:31:01, 55.63s/it]
{'loss': 1.6641, 'learning_rate': 1.3030051803068729e-05, 'epoch': 3.36}
 42%|████▏     | 260/616 [4:05:39<5:31:17, 55.84s/it]
{'loss': 1.6841, 'learning_rate': 1.2979860952601038e-05, 'epoch': 3.38}
 42%|████▏     | 261/616 [4:06:33<5:28:37, 55.54s/it]
{'loss': 1.6777, 'learning_rate': 1.2929587584629845e-05, 'epoch': 3.39}
 43%|████▎     | 262/616 [4:07:30<5:29:36, 55.87s/it]
{'loss': 1.7065, 'learning_rate': 1.2879233091311667e-05, 'epoch': 3.4}
 43%|████▎     | 263/616 [4:08:26<5:28:11, 55.78s/it]
{'loss': 1.6997, 'learning_rate': 1.2828798867049504e-05, 'epoch': 3.42}
 43%|████▎     | 264/616 [4:09:21<5:27:20, 55.80s/it]
{'loss': 1.6704, 'learning_rate': 1.2778286308454255e-05, 'epoch': 3.43}
 43%|████▎     | 265/616 [4:10:16<5:24:37, 55.49s/it]
{'loss': 1.6489, 'learning_rate': 1.2727696814306034e-05, 'epoch': 3.44}
 43%|████▎     | 266/616 [4:11:12<5:23:30, 55.46s/it]
{'loss': 1.6777, 'learning_rate': 1.2677031785515423e-05, 'epoch': 3.45}
 43%|████▎     | 267/616 [4:12:07<5:22:50, 55.50s/it]
{'loss': 1.6284, 'learning_rate': 1.26262926250847e-05, 'epoch': 3.47}
 44%|████▎     | 268/616 [4:13:03<5:21:36, 55.45s/it]
{'loss': 1.6445, 'learning_rate': 1.2575480738068971e-05, 'epoch': 3.48}
 44%|████▎     | 269/616 [4:13:58<5:20:21, 55.39s/it]
{'loss': 1.626, 'learning_rate': 1.2524597531537261e-05, 'epoch': 3.49}
 44%|████▍     | 270/616 [4:14:54<5:19:56, 55.48s/it]
{'loss': 1.626, 'learning_rate': 1.2473644414533573e-05, 'epoch': 3.51}
 44%|████▍     | 271/616 [4:15:50<5:20:41, 55.77s/it]
{'loss': 1.6919, 'learning_rate': 1.2422622798037833e-05, 'epoch': 3.52}
 44%|████▍     | 272/616 [4:16:46<5:20:14, 55.86s/it]
{'loss': 1.6602, 'learning_rate': 1.2371534094926852e-05, 'epoch': 3.53}
 44%|████▍     | 273/616 [4:17:42<5:18:58, 55.80s/it]
{'loss': 1.6401, 'learning_rate': 1.232037971993517e-05, 'epoch': 3.55}
 44%|████▍     | 274/616 [4:18:36<5:16:22, 55.50s/it]
{'loss': 1.7026, 'learning_rate': 1.2269161089615902e-05, 'epoch': 3.56}
 45%|████▍     | 275/616 [4:19:32<5:15:51, 55.58s/it]
{'loss': 1.6875, 'learning_rate': 1.2217879622301514e-05, 'epoch': 3.57}
 45%|████▍     | 276/616 [4:20:27<5:14:12, 55.45s/it]
{'loss': 1.6646, 'learning_rate': 1.2166536738064523e-05, 'epoch': 3.58}
 45%|████▍     | 277/616 [4:21:23<5:13:32, 55.49s/it]
{'loss': 1.6631, 'learning_rate': 1.2115133858678192e-05, 'epoch': 3.6}
 45%|████▌     | 278/616 [4:22:19<5:13:43, 55.69s/it]
{'loss': 1.6196, 'learning_rate': 1.2063672407577154e-05, 'epoch': 3.61}
 45%|████▌     | 279/616 [4:23:14<5:11:50, 55.52s/it]
{'loss': 1.6606, 'learning_rate': 1.2012153809817992e-05, 'epoch': 3.62}
 45%|████▌     | 280/616 [4:24:10<5:11:51, 55.69s/it]
{'loss': 1.6719, 'learning_rate': 1.1960579492039783e-05, 'epoch': 3.64}
 46%|████▌     | 281/616 [4:25:07<5:11:43, 55.83s/it]
{'loss': 1.6958, 'learning_rate': 1.1908950882424581e-05, 'epoch': 3.65}
 46%|████▌     | 282/616 [4:26:03<5:12:04, 56.06s/it]
{'loss': 1.645, 'learning_rate': 1.1857269410657883e-05, 'epoch': 3.66}
 46%|████▌     | 283/616 [4:27:01<5:13:38, 56.51s/it]
{'loss': 1.6782, 'learning_rate': 1.1805536507889021e-05, 'epoch': 3.68}
 46%|████▌     | 284/616 [4:27:56<5:10:37, 56.14s/it]
{'loss': 1.6724, 'learning_rate': 1.1753753606691554e-05, 'epoch': 3.69}
 46%|████▋     | 285/616 [4:28:52<5:09:53, 56.17s/it]
{'loss': 1.6108, 'learning_rate': 1.1701922141023566e-05, 'epoch': 3.7}
 46%|████▋     | 286/616 [4:29:47<5:06:06, 55.66s/it]
{'loss': 1.6313, 'learning_rate': 1.1650043546187994e-05, 'epoch': 3.71}
 47%|████▋     | 287/616 [4:30:42<5:05:23, 55.70s/it]
{'loss': 1.647, 'learning_rate': 1.1598119258792848e-05, 'epoch': 3.73}
 47%|████▋     | 288/616 [4:31:38<5:04:18, 55.67s/it]
{'loss': 1.6816, 'learning_rate': 1.1546150716711448e-05, 'epoch': 3.74}
 47%|████▋     | 289/616 [4:32:34<5:03:48, 55.74s/it]
{'loss': 1.6846, 'learning_rate': 1.1494139359042612e-05, 'epoch': 3.75}
 47%|████▋     | 290/616 [4:33:30<5:04:10, 55.98s/it]
{'loss': 1.6602, 'learning_rate': 1.1442086626070781e-05, 'epoch': 3.77}
 47%|████▋     | 291/616 [4:34:26<5:02:43, 55.89s/it]
{'loss': 1.6133, 'learning_rate': 1.1389993959226163e-05, 'epoch': 3.78}
 47%|████▋     | 292/616 [4:35:22<5:01:18, 55.80s/it]
{'loss': 1.6997, 'learning_rate': 1.1337862801044792e-05, 'epoch': 3.79}
 48%|████▊     | 293/616 [4:36:18<5:00:40, 55.85s/it]
{'loss': 1.6172, 'learning_rate': 1.1285694595128606e-05, 'epoch': 3.81}
 48%|████▊     | 294/616 [4:37:13<4:59:35, 55.82s/it]
{'loss': 1.6479, 'learning_rate': 1.123349078610545e-05, 'epoch': 3.82}
 48%|████▊     | 295/616 [4:38:10<4:59:14, 55.93s/it]
{'loss': 1.6851, 'learning_rate': 1.1181252819589081e-05, 'epoch': 3.83}
 48%|████▊     | 296/616 [4:39:06<4:58:40, 56.00s/it]
{'loss': 1.6533, 'learning_rate': 1.1128982142139142e-05, 'epoch': 3.84}
 48%|████▊     | 297/616 [4:40:02<4:58:04, 56.06s/it]
{'loss': 1.6367, 'learning_rate': 1.1076680201221093e-05, 'epoch': 3.86}
 48%|████▊     | 298/616 [4:40:58<4:56:22, 55.92s/it]
{'loss': 1.6426, 'learning_rate': 1.1024348445166133e-05, 'epoch': 3.87}
 49%|████▊     | 299/616 [4:41:54<4:56:48, 56.18s/it]
{'loss': 1.6509, 'learning_rate': 1.0971988323131099e-05, 'epoch': 3.88}
 49%|████▊     | 300/616 [4:42:49<4:53:36, 55.75s/it]
{'loss': 1.6997, 'learning_rate': 1.091960128505833e-05, 'epoch': 3.9}
 49%|████▉     | 301/616 [4:44:56<6:44:24, 77.03s/it]
{'loss': 1.6187, 'learning_rate': 1.086718878163551e-05, 'epoch': 3.91}
 49%|████▉     | 302/616 [4:45:52<6:09:55, 70.69s/it]
{'loss': 1.6914, 'learning_rate': 1.0814752264255508e-05, 'epoch': 3.92}
 49%|████▉     | 303/616 [4:46:47<5:44:48, 66.10s/it]
{'loss': 1.6421, 'learning_rate': 1.0762293184976178e-05, 'epoch': 3.94}
 49%|████▉     | 304/616 [4:47:42<5:26:46, 62.84s/it]
{'loss': 1.6631, 'learning_rate': 1.070981299648016e-05, 'epoch': 3.95}
 50%|████▉     | 305/616 [4:48:38<5:14:34, 60.69s/it]
{'loss': 1.7046, 'learning_rate': 1.0657313152034634e-05, 'epoch': 3.96}
 50%|████▉     | 306/616 [4:49:33<5:04:42, 58.97s/it]
{'loss': 1.5845, 'learning_rate': 1.0604795105451096e-05, 'epoch': 3.97}
 50%|████▉     | 307/616 [4:50:29<4:58:34, 57.97s/it]
{'loss': 1.6621, 'learning_rate': 1.0552260311045082e-05, 'epoch': 3.99}
 50%|█████     | 308/616 [4:51:24<4:53:57, 57.26s/it]
{'loss': 1.6782, 'learning_rate': 1.0499710223595913e-05, 'epoch': 4.0}
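The remaining-time estimate in each progress line is the number of steps left multiplied by the seconds-per-iteration printed next to it (tqdm smooths that rate over recent steps). At the halfway point, step 308 at 57.26 s/it, this gives 308 × 57.26 ≈ 17,636 s ≈ 4:53:56, one second off the logged 4:53:57 because the printed rate is rounded. A minimal sketch; `eta_seconds` and `fmt_hms` are hypothetical names.

```python
# Sketch: reproduce tqdm's remaining-time estimate from the printed rate.
def eta_seconds(n: int, total: int, sec_per_it: float) -> float:
    """Seconds left if the remaining iterations run at the current rate."""
    return (total - n) * sec_per_it


def fmt_hms(seconds: float) -> str:
    """Format as H:MM:SS, the layout tqdm uses for durations of an hour or more."""
    s = int(round(seconds))
    h, rem = divmod(s, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


if __name__ == "__main__":
    print(fmt_hms(eta_seconds(308, 616, 57.26)))  # 4:53:56 (log shows 4:53:57)
```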
 50%|█████     | 309/616 [4:52:56<5:46:30, 67.72s/it]
{'loss': 1.6611, 'learning_rate': 1.0447146298306394e-05, 'epoch': 4.01}
 50%|█████     | 310/616 [4:53:52<5:26:19, 63.98s/it]
{'loss': 1.6626, 'learning_rate': 1.0394569990762528e-05, 'epoch': 4.03}
 50%|█████     | 311/616 [4:54:47<5:11:51, 61.35s/it]
{'loss': 1.6406, 'learning_rate': 1.0341982756893203e-05, 'epoch': 4.04}
 51%|█████     | 312/616 [4:55:42<5:01:12, 59.45s/it]
{'loss': 1.6455, 'learning_rate': 1.0289386052929874e-05, 'epoch': 4.05}
 51%|█████     | 313/616 [4:56:37<4:53:24, 58.10s/it]
{'loss': 1.7051, 'learning_rate': 1.0236781335366239e-05, 'epoch': 4.06}
 51%|█████     | 314/616 [4:57:32<4:47:47, 57.18s/it]
{'loss': 1.5967, 'learning_rate': 1.0184170060917914e-05, 'epoch': 4.08}
 51%|█████     | 315/616 [4:58:28<4:45:07, 56.84s/it]
{'loss': 1.6772, 'learning_rate': 1.0131553686482077e-05, 'epoch': 4.09}
 51%|█████▏    | 316/616 [4:59:24<4:42:42, 56.54s/it]
{'loss': 1.625, 'learning_rate': 1.0078933669097135e-05, 'epoch': 4.1}
 51%|█████▏    | 317/616 [5:00:19<4:40:23, 56.27s/it]
{'loss': 1.6572, 'learning_rate': 1.002631146590238e-05, 'epoch': 4.12}
 52%|█████▏    | 318/616 [5:01:17<4:41:30, 56.68s/it]
{'loss': 1.6694, 'learning_rate': 9.973688534097624e-06, 'epoch': 4.13}
 52%|█████▏    | 319/616 [5:02:13<4:39:49, 56.53s/it]
{'loss': 1.6377, 'learning_rate': 9.92106633090287e-06, 'epoch': 4.14}
 52%|█████▏    | 320/616 [5:03:08<4:36:51, 56.12s/it]
{'loss': 1.6782, 'learning_rate': 9.868446313517927e-06, 'epoch': 4.16}
 52%|█████▏    | 321/616 [5:04:04<4:35:33, 56.04s/it]
{'loss': 1.6147, 'learning_rate': 9.815829939082087e-06, 'epoch': 4.17}
 52%|█████▏    | 322/616 [5:05:00<4:33:42, 55.86s/it]
{'loss': 1.6826, 'learning_rate': 9.763218664633763e-06, 'epoch': 4.18}
 52%|█████▏    | 323/616 [5:05:56<4:32:53, 55.88s/it]
{'loss': 1.7041, 'learning_rate': 9.710613947070127e-06, 'epoch': 4.19}
 53%|█████▎    | 324/616 [5:06:51<4:31:22, 55.76s/it]
{'loss': 1.6343, 'learning_rate': 9.658017243106802e-06, 'epoch': 4.21}
 53%|█████▎    | 325/616 [5:07:47<4:30:07, 55.69s/it]
{'loss': 1.6724, 'learning_rate': 9.605430009237474e-06, 'epoch': 4.22}
 53%|█████▎    | 326/616 [5:08:42<4:28:06, 55.47s/it]
{'loss': 1.6812, 'learning_rate': 9.552853701693606e-06, 'epoch': 4.23}
 53%|█████▎    | 327/616 [5:09:37<4:27:10, 55.47s/it]
{'loss': 1.6289, 'learning_rate': 9.50028977640409e-06, 'epoch': 4.25}
 53%|█████▎    | 328/616 [5:10:33<4:26:54, 55.60s/it]
{'loss': 1.6313, 'learning_rate': 9.44773968895492e-06, 'epoch': 4.26}
 53%|█████▎    | 329/616 [5:11:29<4:26:11, 55.65s/it]
{'loss': 1.6274, 'learning_rate': 9.395204894548907e-06, 'epoch': 4.27}
 54%|█████▎    | 330/616 [5:12:24<4:24:35, 55.51s/it]
{'loss': 1.6572, 'learning_rate': 9.342686847965367e-06, 'epoch': 4.29}
 54%|█████▎    | 331/616 [5:13:20<4:25:05, 55.81s/it]
{'loss': 1.6333, 'learning_rate': 9.290187003519841e-06, 'epoch': 4.3}
 54%|█████▍    | 332/616 [5:14:15<4:22:49, 55.53s/it]
{'loss': 1.687, 'learning_rate': 9.237706815023824e-06, 'epoch': 4.31}
 54%|█████▍    | 333/616 [5:15:11<4:22:17, 55.61s/it]
{'loss': 1.6626, 'learning_rate': 9.185247735744495e-06, 'epoch': 4.32}
 54%|█████▍    | 334/616 [5:16:08<4:22:56, 55.94s/it]
{'loss': 1.6431, 'learning_rate': 9.132811218364494e-06, 'epoch': 4.34}
 54%|█████▍    | 335/616 [5:17:04<4:22:14, 56.00s/it]
{'loss': 1.6562, 'learning_rate': 9.080398714941672e-06, 'epoch': 4.35}
 55%|█████▍    | 336/616 [5:17:59<4:20:36, 55.84s/it]
{'loss': 1.6714, 'learning_rate': 9.028011676868901e-06, 'epoch': 4.36}
 55%|█████▍    | 337/616 [5:18:55<4:20:00, 55.92s/it]
{'loss': 1.604, 'learning_rate': 8.975651554833869e-06, 'epoch': 4.38}
 55%|█████▍    | 338/616 [5:19:51<4:18:46, 55.85s/it]
{'loss': 1.6719, 'learning_rate': 8.92331979877891e-06, 'epoch': 4.39}
 55%|█████▌    | 339/616 [5:20:47<4:18:26, 55.98s/it]
{'loss': 1.707, 'learning_rate': 8.871017857860863e-06, 'epoch': 4.4}
 55%|█████▌    | 340/616 [5:21:42<4:15:46, 55.60s/it]
{'loss': 1.647, 'learning_rate': 8.81874718041092e-06, 'epoch': 4.42}
 55%|█████▌    | 341/616 [5:22:38<4:14:55, 55.62s/it]
{'loss': 1.6675, 'learning_rate': 8.766509213894552e-06, 'epoch': 4.43}
 56%|█████▌    | 342/616 [5:23:34<4:14:39, 55.76s/it]
{'loss': 1.6636, 'learning_rate': 8.714305404871397e-06, 'epoch': 4.44}
 56%|█████▌    | 343/616 [5:24:29<4:12:18, 55.45s/it]
{'loss': 1.6768, 'learning_rate': 8.662137198955211e-06, 'epoch': 4.45}
 56%|█████▌    | 344/616 [5:25:23<4:10:08, 55.18s/it]
{'loss': 1.5864, 'learning_rate': 8.610006040773844e-06, 'epoch': 4.47}
 56%|█████▌    | 345/616 [5:26:19<4:10:41, 55.50s/it]
{'loss': 1.6304, 'learning_rate': 8.557913373929222e-06, 'epoch': 4.48}
 56%|█████▌    | 346/616 [5:27:15<4:10:23, 55.64s/it]
{'loss': 1.6289, 'learning_rate': 8.50586064095739e-06, 'epoch': 4.49}
 56%|█████▋    | 347/616 [5:28:11<4:09:24, 55.63s/it]
{'loss': 1.6436, 'learning_rate': 8.453849283288554e-06, 'epoch': 4.51}
 56%|█████▋    | 348/616 [5:29:07<4:08:28, 55.63s/it]
{'loss': 1.6221, 'learning_rate': 8.401880741207155e-06, 'epoch': 4.52}
 57%|█████▋    | 349/616 [5:30:03<4:08:31, 55.85s/it]
{'loss': 1.6904, 'learning_rate': 8.349956453812009e-06, 'epoch': 4.53}
 57%|█████▋    | 350/616 [5:30:58<4:06:06, 55.51s/it]
{'loss': 1.5898, 'learning_rate': 8.298077858976435e-06, 'epoch': 4.55}
 57%|█████▋    | 351/616 [5:31:53<4:04:15, 55.30s/it]
{'loss': 1.667, 'learning_rate': 8.246246393308448e-06, 'epoch': 4.56}
 57%|█████▋    | 352/616 [5:32:48<4:03:37, 55.37s/it]
{'loss': 1.6543, 'learning_rate': 8.194463492110982e-06, 'epoch': 4.57}
 57%|█████▋    | 353/616 [5:33:44<4:03:10, 55.48s/it]
{'loss': 1.6572, 'learning_rate': 8.142730589342119e-06, 'epoch': 4.58}
 57%|█████▋    | 354/616 [5:34:40<4:03:36, 55.79s/it]
{'loss': 1.6685, 'learning_rate': 8.091049117575424e-06, 'epoch': 4.6}
 58%|█████▊    | 355/616 [5:35:36<4:02:46, 55.81s/it]
{'loss': 1.6484, 'learning_rate': 8.03942050796022e-06, 'epoch': 4.61}
 58%|█████▊    | 356/616 [5:36:31<4:00:57, 55.61s/it]
{'loss': 1.5405, 'learning_rate': 7.98784619018201e-06, 'epoch': 4.62}
 58%|█████▊    | 357/616 [5:37:26<3:59:25, 55.46s/it]
{'loss': 1.644, 'learning_rate': 7.93632759242285e-06, 'epoch': 4.64} |
|
58%|ββββββ | 357/616 [5:37:26<3:59:25, 55.46s/it]
58%|ββββββ | 358/616 [5:38:22<3:58:25, 55.45s/it]
{'loss': 1.6206, 'learning_rate': 7.884866141321811e-06, 'epoch': 4.65} |
|
58%|ββββββ | 358/616 [5:38:22<3:58:25, 55.45s/it]
58%|ββββββ | 359/616 [5:39:17<3:57:30, 55.45s/it]
{'loss': 1.6079, 'learning_rate': 7.833463261935482e-06, 'epoch': 4.66} |
|
58%|ββββββ | 359/616 [5:39:17<3:57:30, 55.45s/it]
58%|ββββββ | 360/616 [5:40:13<3:56:23, 55.40s/it]
{'loss': 1.6108, 'learning_rate': 7.782120377698489e-06, 'epoch': 4.68} |
59%|█████▊    | 361/616 [5:41:09<3:56:28, 55.64s/it]
{'loss': 1.5625, 'learning_rate': 7.730838910384098e-06, 'epoch': 4.69}
59%|█████▉    | 362/616 [5:42:04<3:54:35, 55.42s/it]
{'loss': 1.647, 'learning_rate': 7.679620280064837e-06, 'epoch': 4.7}
59%|█████▉    | 363/616 [5:43:00<3:54:24, 55.59s/it]
{'loss': 1.5493, 'learning_rate': 7.6284659050731525e-06, 'epoch': 4.71}
59%|█████▉    | 364/616 [5:43:55<3:53:37, 55.62s/it]
{'loss': 1.6362, 'learning_rate': 7.57737720196217e-06, 'epoch': 4.73}
59%|█████▉    | 365/616 [5:44:51<3:53:05, 55.72s/it]
{'loss': 1.6294, 'learning_rate': 7.526355585466432e-06, 'epoch': 4.74}
59%|█████▉    | 366/616 [5:45:48<3:53:24, 56.02s/it]
{'loss': 1.6675, 'learning_rate': 7.4754024684627405e-06, 'epoch': 4.75}
60%|█████▉    | 367/616 [5:46:44<3:51:58, 55.90s/it]
{'loss': 1.6519, 'learning_rate': 7.424519261931036e-06, 'epoch': 4.77}
60%|█████▉    | 368/616 [5:47:39<3:50:52, 55.86s/it]
{'loss': 1.6807, 'learning_rate': 7.373707374915303e-06, 'epoch': 4.78}
60%|█████▉    | 369/616 [5:48:35<3:49:39, 55.79s/it]
{'loss': 1.6221, 'learning_rate': 7.322968214484583e-06, 'epoch': 4.79}
60%|██████    | 370/616 [5:49:30<3:48:07, 55.64s/it]
{'loss': 1.6523, 'learning_rate': 7.27230318569397e-06, 'epoch': 4.81}
60%|██████    | 371/616 [5:50:26<3:47:32, 55.72s/it]
{'loss': 1.6118, 'learning_rate': 7.221713691545746e-06, 'epoch': 4.82}
60%|██████    | 372/616 [5:51:22<3:46:53, 55.79s/it]
{'loss': 1.6279, 'learning_rate': 7.171201132950502e-06, 'epoch': 4.83}
61%|██████    | 373/616 [5:52:18<3:45:21, 55.64s/it]
{'loss': 1.6416, 'learning_rate': 7.1207669086883366e-06, 'epoch': 4.84}
61%|██████    | 374/616 [5:53:13<3:44:41, 55.71s/it]
{'loss': 1.605, 'learning_rate': 7.070412415370158e-06, 'epoch': 4.86}
61%|██████    | 375/616 [5:54:10<3:44:46, 55.96s/it]
{'loss': 1.627, 'learning_rate': 7.020139047398966e-06, 'epoch': 4.87}
61%|██████    | 376/616 [5:55:04<3:41:28, 55.37s/it]
{'loss': 1.6123, 'learning_rate': 6.969948196931272e-06, 'epoch': 4.88}
61%|██████    | 377/616 [5:55:59<3:40:43, 55.41s/it]
{'loss': 1.6333, 'learning_rate': 6.919841253838537e-06, 'epoch': 4.9}
61%|██████▏   | 378/616 [5:56:56<3:40:57, 55.70s/it]
{'loss': 1.5981, 'learning_rate': 6.869819605668669e-06, 'epoch': 4.91}
62%|██████▏   | 379/616 [5:57:51<3:39:35, 55.59s/it]
{'loss': 1.646, 'learning_rate': 6.819884637607619e-06, 'epoch': 4.92}
62%|██████▏   | 380/616 [5:58:46<3:38:23, 55.52s/it]
{'loss': 1.6641, 'learning_rate': 6.770037732441019e-06, 'epoch': 4.94}
62%|██████▏   | 381/616 [5:59:42<3:36:55, 55.38s/it]
{'loss': 1.6362, 'learning_rate': 6.720280270515882e-06, 'epoch': 4.95}
62%|██████▏   | 382/616 [6:00:38<3:36:40, 55.56s/it]
{'loss': 1.6562, 'learning_rate': 6.670613629702391e-06, 'epoch': 4.96}
62%|██████▏   | 383/616 [6:01:33<3:35:24, 55.47s/it]
{'loss': 1.6772, 'learning_rate': 6.62103918535572e-06, 'epoch': 4.97}
62%|██████▏   | 384/616 [6:02:29<3:34:58, 55.60s/it]
{'loss': 1.6729, 'learning_rate': 6.5715583102779815e-06, 'epoch': 4.99}
62%|██████▎   | 385/616 [6:03:24<3:33:52, 55.55s/it]
{'loss': 1.6597, 'learning_rate': 6.522172374680177e-06, 'epoch': 5.0}
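The epoch counter reads exactly 5.0 at step 385, so the 616-step run covers its 8 epochs (`--num_train_epochs 8`) at 77 optimizer steps per epoch. With `--per_device_train_batch_size 8` on the two local ranks listed in `WORLD INFO DICT` and `--gradient_accumulation_steps 8`, each optimizer step consumes 8 × 2 × 8 = 128 samples. A minimal sketch of that bookkeeping; the constants are copied from the launch command, the helper names are hypothetical, and `epoch_at` only approximates how the Trainer derives the fractional epoch it logs:

```python
# Step/epoch bookkeeping re-derived from the launch flags (hypothetical helpers).
PER_DEVICE_BATCH = 8   # --per_device_train_batch_size 8
NUM_RANKS = 2          # WORLD INFO DICT: {'localhost': [0, 1]}
GRAD_ACCUM = 8         # --gradient_accumulation_steps 8
NUM_EPOCHS = 8         # --num_train_epochs 8
TOTAL_STEPS = 616      # total shown by the progress bar


def samples_per_step() -> int:
    """Samples consumed by one optimizer step, summed over all ranks."""
    return PER_DEVICE_BATCH * NUM_RANKS * GRAD_ACCUM


def steps_per_epoch() -> int:
    """Optimizer steps per epoch implied by the progress-bar total."""
    return TOTAL_STEPS // NUM_EPOCHS


def epoch_at(step: int) -> float:
    """Approximate fractional epoch after `step` steps, rounded to 2 decimals like the log."""
    return round(step / steps_per_epoch(), 2)


if __name__ == "__main__":
    print(samples_per_step())              # 128
    print(steps_per_epoch())               # 77
    print(epoch_at(385), epoch_at(417))    # 5.0 5.42
```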
63%|██████▎   | 386/616 [6:04:53<4:11:03, 65.49s/it]
{'loss': 1.6348, 'learning_rate': 6.472882746144282e-06, 'epoch': 5.01}
63%|██████▎   | 387/616 [6:05:49<3:59:32, 62.76s/it]
{'loss': 1.6108, 'learning_rate': 6.423690789585359e-06, 'epoch': 5.03}
63%|██████▎   | 388/616 [6:06:45<3:50:45, 60.73s/it]
{'loss': 1.6421, 'learning_rate': 6.374597867213756e-06, 'epoch': 5.04}
63%|██████▎   | 389/616 [6:07:41<3:43:48, 59.16s/it]
{'loss': 1.6455, 'learning_rate': 6.3256053384974105e-06, 'epoch': 5.05}
63%|██████▎   | 390/616 [6:08:37<3:39:10, 58.19s/it]
{'loss': 1.6616, 'learning_rate': 6.276714560124166e-06, 'epoch': 5.06}
63%|██████▎   | 391/616 [6:09:31<3:34:03, 57.08s/it]
{'loss': 1.6162, 'learning_rate': 6.2279268859642396e-06, 'epoch': 5.08}
64%|██████▎   | 392/616 [6:10:27<3:31:15, 56.59s/it]
{'loss': 1.6646, 'learning_rate': 6.179243667032709e-06, 'epoch': 5.09}
64%|██████▍   | 393/616 [6:11:22<3:29:23, 56.34s/it]
{'loss': 1.6445, 'learning_rate': 6.130666251452102e-06, 'epoch': 5.1}
64%|██████▍   | 394/616 [6:12:18<3:28:01, 56.22s/it]
{'loss': 1.6299, 'learning_rate': 6.082195984415069e-06, 'epoch': 5.12}
64%|██████▍   | 395/616 [6:13:13<3:25:56, 55.91s/it]
{'loss': 1.6221, 'learning_rate': 6.03383420814714e-06, 'epoch': 5.13}
64%|██████▍   | 396/616 [6:14:08<3:24:04, 55.65s/it]
{'loss': 1.647, 'learning_rate': 5.9855822618695385e-06, 'epoch': 5.14}
64%|██████▍   | 397/616 [6:15:04<3:22:35, 55.50s/it]
{'loss': 1.6147, 'learning_rate': 5.937441481762112e-06, 'epoch': 5.16}
65%|██████▍   | 398/616 [6:15:59<3:21:06, 55.35s/it]
{'loss': 1.6025, 'learning_rate': 5.889413200926317e-06, 'epoch': 5.17}
65%|██████▍   | 399/616 [6:16:54<3:19:48, 55.25s/it]
{'loss': 1.6064, 'learning_rate': 5.841498749348322e-06, 'epoch': 5.18}
65%|██████▍   | 400/616 [6:17:50<3:19:49, 55.50s/it]
{'loss': 1.6587, 'learning_rate': 5.793699453862161e-06, 'epoch': 5.19}
65%|██████▌   | 401/616 [6:19:54<4:33:18, 76.27s/it]
{'loss': 1.6255, 'learning_rate': 5.746016638112986e-06, 'epoch': 5.21}
65%|██████▌   | 402/616 [6:20:50<4:09:42, 70.01s/it]
{'loss': 1.6523, 'learning_rate': 5.698451622520442e-06, 'epoch': 5.22}
65%|██████▌   | 403/616 [6:21:45<3:52:28, 65.49s/it]
{'loss': 1.6367, 'learning_rate': 5.651005724242072e-06, 'epoch': 5.23}
66%|██████▌   | 404/616 [6:22:40<3:40:53, 62.52s/it]
{'loss': 1.6006, 'learning_rate': 5.603680257136857e-06, 'epoch': 5.25}
66%|██████▌   | 405/616 [6:23:37<3:33:15, 60.64s/it]
{'loss': 1.6294, 'learning_rate': 5.556476531728836e-06, 'epoch': 5.26}
66%|██████▌   | 406/616 [6:24:33<3:27:23, 59.26s/it]
{'loss': 1.6284, 'learning_rate': 5.509395855170798e-06, 'epoch': 5.27}
66%|██████▌   | 407/616 [6:25:29<3:23:21, 58.38s/it]
{'loss': 1.6392, 'learning_rate': 5.4624395312081125e-06, 'epoch': 5.29}
66%|██████▌   | 408/616 [6:26:25<3:20:23, 57.80s/it]
{'loss': 1.625, 'learning_rate': 5.415608860142593e-06, 'epoch': 5.3}
66%|██████▋   | 409/616 [6:27:21<3:17:10, 57.15s/it]
{'loss': 1.6162, 'learning_rate': 5.368905138796523e-06, 'epoch': 5.31}
67%|██████▋   | 410/616 [6:28:17<3:14:41, 56.71s/it]
{'loss': 1.5752, 'learning_rate': 5.322329660476715e-06, 'epoch': 5.32}
67%|██████▋   | 411/616 [6:29:12<3:12:29, 56.34s/it]
{'loss': 1.6655, 'learning_rate': 5.275883714938726e-06, 'epoch': 5.34}
67%|██████▋   | 412/616 [6:30:08<3:10:27, 56.02s/it]
{'loss': 1.5972, 'learning_rate': 5.2295685883511086e-06, 'epoch': 5.35}
67%|██████▋   | 413/616 [6:31:03<3:09:04, 55.88s/it]
{'loss': 1.6421, 'learning_rate': 5.183385563259819e-06, 'epoch': 5.36}
67%|██████▋   | 414/616 [6:31:59<3:07:42, 55.76s/it]
{'loss': 1.5869, 'learning_rate': 5.137335918552702e-06, 'epoch': 5.38}
67%|██████▋   | 415/616 [6:32:54<3:06:39, 55.72s/it]
{'loss': 1.6333, 'learning_rate': 5.091420929424065e-06, 'epoch': 5.39}
68%|██████▊   | 416/616 [6:33:50<3:05:36, 55.68s/it]
{'loss': 1.6445, 'learning_rate': 5.045641867339361e-06, 'epoch': 5.4}
68%|██████▊   | 417/616 [6:34:47<3:05:53, 56.05s/it]
{'loss': 1.6597, 'learning_rate': 5.000000000000003e-06, 'epoch': 5.42}
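At step 417 the logged learning rate is 5.000000000000003e-06, a quarter of the peak `--learning_rate 2e-5`. That is what a linear-warmup, cosine-decay schedule (`--lr_scheduler_type cosine`) gives exactly two-thirds of the way through the decay phase, since 0.5 × (1 + cos(2π/3)) = 0.25. The sketch below assumes the half-cosine shape used by Hugging Face `transformers`' `get_cosine_schedule_with_warmup` and a warmup of ceil(0.03 × 616) = 19 steps derived from `--warmup_ratio 0.03`; under those assumptions (417 − 19) / (616 − 19) = 398 / 597 = 2/3, and the logged rates are reproduced:

```python
import math

PEAK_LR = 2e-5                                 # --learning_rate 2e-5
TOTAL_STEPS = 616                              # progress-bar total
WARMUP_STEPS = math.ceil(0.03 * TOTAL_STEPS)   # --warmup_ratio 0.03 -> 19 (ceil assumed)


def cosine_lr(step: int) -> float:
    """Learning rate after `step` optimizer steps: linear warmup to PEAK_LR,
    then a half-cosine decay to 0 over the remaining steps."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # (step, logged learning_rate) pairs copied from the log
    for step, logged in [(334, 9.132811218364494e-06),
                         (385, 6.522172374680177e-06),
                         (417, 5.000000000000003e-06)]:
        print(step, f"{cosine_lr(step):.6e}", f"{logged:.6e}")
```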
68%|██████▊   | 418/616 [6:35:42<3:04:23, 55.88s/it]
{'loss': 1.6387, 'learning_rate': 4.954496591308227e-06, 'epoch': 5.43}
68%|██████▊   | 419/616 [6:36:38<3:03:44, 55.96s/it]
{'loss': 1.6489, 'learning_rate': 4.909132901332122e-06, 'epoch': 5.44}
68%|██████▊   | 420/616 [6:37:35<3:03:39, 56.22s/it]
{'loss': 1.6318, 'learning_rate': 4.863910186270726e-06, 'epoch': 5.45}
68%|██████▊   | 421/616 [6:38:31<3:02:23, 56.12s/it]
{'loss': 1.6841, 'learning_rate': 4.818829698419225e-06, 'epoch': 5.47}
69%|██████▊   | 422/616 [6:39:27<3:01:16, 56.07s/it]
{'loss': 1.666, 'learning_rate': 4.773892686134301e-06, 'epoch': 5.48}
69%|██████▊   | 423/616 [6:40:22<2:59:42, 55.87s/it]
{'loss': 1.6162, 'learning_rate': 4.729100393799538e-06, 'epoch': 5.49}
69%|██████▉   | 424/616 [6:41:19<2:59:17, 56.03s/it]
{'loss': 1.5957, 'learning_rate': 4.684454061790987e-06, 'epoch': 5.51}
69%|██████▉   | 425/616 [6:42:13<2:56:55, 55.58s/it]
{'loss': 1.6201, 'learning_rate': 4.639954926442792e-06, 'epoch': 5.52}
69%|██████▉   | 426/616 [6:43:09<2:55:42, 55.49s/it]
{'loss': 1.6533, 'learning_rate': 4.5956042200129725e-06, 'epoch': 5.53}
69%|██████▉   | 427/616 [6:44:05<2:55:42, 55.78s/it]
{'loss': 1.624, 'learning_rate': 4.551403170649299e-06, 'epoch': 5.55}
69%|██████▉   | 428/616 [6:45:00<2:53:57, 55.52s/it]
{'loss': 1.604, 'learning_rate': 4.507353002355269e-06, 'epoch': 5.56}
70%|██████▉   | 429/616 [6:45:56<2:53:53, 55.80s/it]
{'loss': 1.6089, 'learning_rate': 4.4634549349562315e-06, 'epoch': 5.57}
70%|██████▉   | 430/616 [6:46:52<2:52:24, 55.62s/it]
{'loss': 1.5962, 'learning_rate': 4.4197101840656e-06, 'epoch': 5.58}
70%|██████▉   | 431/616 [6:47:48<2:51:48, 55.72s/it]
{'loss': 1.5962, 'learning_rate': 4.376119961051175e-06, 'epoch': 5.6}
70%|███████   | 432/616 [6:48:43<2:50:43, 55.67s/it]
{'loss': 1.6313, 'learning_rate': 4.33268547300163e-06, 'epoch': 5.61}
70%|███████   | 433/616 [6:49:38<2:48:53, 55.37s/it]
{'loss': 1.6626, 'learning_rate': 4.289407922693053e-06, 'epoch': 5.62}
70%|███████   | 434/616 [6:50:34<2:48:44, 55.63s/it]
{'loss': 1.5796, 'learning_rate': 4.2462885085556635e-06, 'epoch': 5.64}
71%|███████   | 435/616 [6:51:30<2:48:04, 55.72s/it]
{'loss': 1.6836, 'learning_rate': 4.203328424640619e-06, 'epoch': 5.65}
71%|███████   | 436/616 [6:52:25<2:46:41, 55.56s/it]
{'loss': 1.6675, 'learning_rate': 4.1605288605869365e-06, 'epoch': 5.66}
71%|███████   | 437/616 [6:53:21<2:46:12, 55.71s/it]
{'loss': 1.6807, 'learning_rate': 4.117891001588574e-06, 'epoch': 5.68}
71%|███████   | 438/616 [6:54:17<2:45:16, 55.71s/it]
{'loss': 1.6167, 'learning_rate': 4.075416028361584e-06, 'epoch': 5.69}
71%|███████▏  | 439/616 [6:55:14<2:45:15, 56.02s/it]
{'loss': 1.6851, 'learning_rate': 4.033105117111441e-06, 'epoch': 5.7}
71%|███████▏  | 440/616 [6:56:09<2:43:49, 55.85s/it]
{'loss': 1.6191, 'learning_rate': 3.9909594395004545e-06, 'epoch': 5.71}
72%|███████▏  | 441/616 [6:57:05<2:43:19, 56.00s/it]
{'loss': 1.6362, 'learning_rate': 3.948980162615323e-06, 'epoch': 5.73}
72%|███████▏  | 442/616 [6:58:01<2:41:55, 55.84s/it]
{'loss': 1.5825, 'learning_rate': 3.907168448934836e-06, 'epoch': 5.74}
72%|███████▏  | 443/616 [6:58:56<2:40:39, 55.72s/it]
{'loss': 1.6182, 'learning_rate': 3.865525456297652e-06, 'epoch': 5.75}
72%|███████▏  | 444/616 [6:59:51<2:39:13, 55.54s/it]
{'loss': 1.5908, 'learning_rate': 3.824052337870263e-06, 'epoch': 5.77}
72%|███████▏  | 445/616 [7:00:47<2:38:13, 55.52s/it]
{'loss': 1.6162, 'learning_rate': 3.7827502421150497e-06, 'epoch': 5.78}
72%|███████▏  | 446/616 [7:01:43<2:38:07, 55.81s/it]
{'loss': 1.6021, 'learning_rate': 3.741620312758469e-06, 'epoch': 5.79}
73%|███████▎  | 447/616 [7:02:39<2:37:12, 55.81s/it]
{'loss': 1.6479, 'learning_rate': 3.7006636887594095e-06, 'epoch': 5.81}
73%|███████▎  | 448/616 [7:03:34<2:35:15, 55.45s/it]
{'loss': 1.6294, 'learning_rate': 3.6598815042776135e-06, 'epoch': 5.82}
73%|███████▎  | 449/616 [7:04:29<2:34:24, 55.48s/it]
{'loss': 1.6914, 'learning_rate': 3.619274888642309e-06, 'epoch': 5.83}
73%|███████▎  | 450/616 [7:05:25<2:33:44, 55.57s/it]
{'loss': 1.6226, 'learning_rate': 3.578844966320917e-06, 'epoch': 5.84}
73%|███████▎  | 451/616 [7:06:20<2:32:34, 55.48s/it]
{'loss': 1.6196, 'learning_rate': 3.5385928568879012e-06, 'epoch': 5.86}
73%|███████▎  | 452/616 [7:07:16<2:31:53, 55.57s/it]
{'loss': 1.5977, 'learning_rate': 3.4985196749937976e-06, 'epoch': 5.87}
74%|███████▎  | 453/616 [7:08:12<2:31:23, 55.73s/it]
{'loss': 1.5786, 'learning_rate': 3.458626530334316e-06, 'epoch': 5.88}
74%|███████▎  | 454/616 [7:09:08<2:30:20, 55.68s/it]
{'loss': 1.6113, 'learning_rate': 3.4189145276196244e-06, 'epoch': 5.9}
74%|███████▍  | 455/616 [7:10:03<2:28:44, 55.43s/it]
{'loss': 1.6025, 'learning_rate': 3.3793847665437674e-06, 'epoch': 5.91}
74%|███████▍  | 456/616 [7:10:58<2:27:26, 55.29s/it]
{'loss': 1.6191, 'learning_rate': 3.340038341754189e-06, 'epoch': 5.92}
74%|███████▍  | 457/616 [7:11:53<2:26:08, 55.15s/it]
{'loss': 1.604, 'learning_rate': 3.300876342821451e-06, 'epoch': 5.94}
74%|███████▍  | 458/616 [7:12:48<2:25:24, 55.22s/it]
{'loss': 1.6274, 'learning_rate': 3.2618998542090263e-06, 'epoch': 5.95}
75%|███████▍  | 459/616 [7:13:44<2:24:50, 55.36s/it]
{'loss': 1.6543, 'learning_rate': 3.2231099552433e-06, 'epoch': 5.96}
75%|███████▍  | 460/616 [7:14:40<2:24:28, 55.57s/it]
{'loss': 1.6265, 'learning_rate': 3.1845077200836638e-06, 'epoch': 5.97}
75%|███████▍  | 461/616 [7:15:35<2:23:23, 55.51s/it]
{'loss': 1.6123, 'learning_rate': 3.1460942176927666e-06, 'epoch': 5.99}
75%|███████▌  | 462/616 [7:16:31<2:22:50, 55.66s/it]
{'loss': 1.6401, 'learning_rate': 3.107870511806934e-06, 'epoch': 6.0}
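Every metrics line is a Python dict literal, so the losses can be pulled out with `ast.literal_eval` and summarized per epoch. The logged `epoch` reaches k.0 on the last step of epoch k (5.0 at step 385, 6.0 at step 462), so rounding it up gives the 1-based epoch a step belongs to. A minimal sketch, assuming the log lines are already available as strings (for example read from a saved copy of this log; no particular file name is implied), with a hypothetical helper name:

```python
import ast
import math
from collections import defaultdict
from typing import Iterable


def mean_loss_per_epoch(lines: Iterable[str]) -> dict[int, float]:
    """Average the logged 'loss' values per 1-based training epoch.

    Only lines that start with "{'loss'" are parsed; progress-bar lines are skipped.
    """
    sums: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line.startswith("{'loss'"):
            continue
        record = ast.literal_eval(line)       # parses Python literals only
        epoch = math.ceil(record["epoch"])    # 4.01 .. 5.0 -> epoch 5
        sums[epoch] += record["loss"]
        counts[epoch] += 1
    return {epoch: sums[epoch] / counts[epoch] for epoch in sorted(sums)}


if __name__ == "__main__":
    sample = [
        "{'loss': 1.6597, 'learning_rate': 6.522172374680177e-06, 'epoch': 5.0}",
        "{'loss': 1.6348, 'learning_rate': 6.472882746144282e-06, 'epoch': 5.01}",
    ]
    print(mean_loss_per_epoch(sample))  # {5: 1.6597, 6: 1.6348}
```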
75%|███████▌  | 463/616 [7:17:59<2:46:44, 65.39s/it]
{'loss': 1.6094, 'learning_rate': 3.0698376609066828e-06, 'epoch': 6.01}
75%|███████▌  | 464/616 [7:18:54<2:37:50, 62.31s/it]
{'loss': 1.5859, 'learning_rate': 3.0319967181874366e-06, 'epoch': 6.03}
75%|███████▌  | 465/616 [7:19:49<2:31:01, 60.01s/it]
{'loss': 1.6182, 'learning_rate': 2.9943487315303486e-06, 'epoch': 6.04}
76%|███████▌  | 466/616 [7:20:44<2:26:06, 58.44s/it]
{'loss': 1.6196, 'learning_rate': 2.9568947434732777e-06, 'epoch': 6.05}
76%|███████▌  | 467/616 [7:21:39<2:22:56, 57.56s/it]
{'loss': 1.6367, 'learning_rate': 2.919635791181934e-06, 'epoch': 6.06}
76%|███████▌  | 468/616 [7:22:34<2:19:59, 56.76s/it]
{'loss': 1.7124, 'learning_rate': 2.882572906421145e-06, 'epoch': 6.08}
76%|███████▌  | 469/616 [7:23:29<2:17:48, 56.25s/it]
{'loss': 1.623, 'learning_rate': 2.8457071155262885e-06, 'epoch': 6.09}
76%|███████▋  | 470/616 [7:24:25<2:16:44, 56.19s/it]
{'loss': 1.5874, 'learning_rate': 2.809039439374878e-06, 'epoch': 6.1}
76%|███████▋  | 471/616 [7:25:22<2:16:04, 56.31s/it]
{'loss': 1.6362, 'learning_rate': 2.7725708933582785e-06, 'epoch': 6.12}
77%|███████▋  | 472/616 [7:26:17<2:14:30, 56.05s/it]
{'loss': 1.6221, 'learning_rate': 2.7363024873536093e-06, 'epoch': 6.13}
77%|███████▋  | 473/616 [7:27:13<2:13:28, 56.01s/it]
{'loss': 1.6416, 'learning_rate': 2.700235225695752e-06, 'epoch': 6.14}
77%|███████▋  | 474/616 [7:28:08<2:11:56, 55.75s/it]
{'loss': 1.668, 'learning_rate': 2.6643701071495644e-06, 'epoch': 6.16}
77%|███████▋  | 475/616 [7:29:04<2:10:54, 55.70s/it]
{'loss': 1.5928, 'learning_rate': 2.628708124882212e-06, 'epoch': 6.17}
77%|███████▋  | 476/616 [7:30:00<2:09:59, 55.71s/it]
{'loss': 1.6172, 'learning_rate': 2.5932502664356553e-06, 'epoch': 6.18}
77%|███████▋  | 477/616 [7:30:56<2:09:20, 55.83s/it]
{'loss': 1.6162, 'learning_rate': 2.5579975136993253e-06, 'epoch': 6.19}
78%|███████▊  | 478/616 [7:31:51<2:07:52, 55.60s/it]
{'loss': 1.6636, 'learning_rate': 2.52295084288291e-06, 'epoch': 6.21}
78%|███████▊  | 479/616 [7:32:47<2:07:11, 55.70s/it]
{'loss': 1.6748, 'learning_rate': 2.4881112244893403e-06, 'epoch': 6.22}
78%|███████▊  | 480/616 [7:33:42<2:06:17, 55.71s/it]
{'loss': 1.6167, 'learning_rate': 2.453479623287909e-06, 'epoch': 6.23}
78%|███████▊  | 481/616 [7:34:39<2:05:45, 55.90s/it]
{'loss': 1.6763, 'learning_rate': 2.419056998287547e-06, 'epoch': 6.25}
78%|███████▊  | 482/616 [7:35:36<2:05:33, 56.22s/it]
{'loss': 1.6587, 'learning_rate': 2.3848443027102706e-06, 'epoch': 6.26}
78%|███████▊  | 483/616 [7:36:31<2:03:57, 55.92s/it]
{'loss': 1.6538, 'learning_rate': 2.3508424839647994e-06, 'epoch': 6.27}
79%|███████▊  | 484/616 [7:37:27<2:03:00, 55.91s/it]
{'loss': 1.5952, 'learning_rate': 2.3170524836202936e-06, 'epoch': 6.29}
79%|███████▊  | 485/616 [7:38:24<2:02:39, 56.18s/it]
{'loss': 1.6348, 'learning_rate': 2.2834752373803094e-06, 'epoch': 6.3}
79%|███████▉  | 486/616 [7:39:20<2:01:47, 56.21s/it]
{'loss': 1.6074, 'learning_rate': 2.250111675056863e-06, 'epoch': 6.31}
79%|███████▉  | 487/616 [7:40:17<2:01:07, 56.34s/it]
{'loss': 1.6284, 'learning_rate': 2.216962720544703e-06, 'epoch': 6.32}
79%|███████▉  | 488/616 [7:41:12<1:59:49, 56.16s/it]
{'loss': 1.6143, 'learning_rate': 2.184029291795705e-06, 'epoch': 6.34}
79%|███████▉  | 489/616 [7:42:08<1:58:41, 56.07s/it]
{'loss': 1.6323, 'learning_rate': 2.151312300793473e-06, 'epoch': 6.35}
80%|███████▉  | 490/616 [7:43:04<1:57:44, 56.07s/it]
{'loss': 1.6387, 'learning_rate': 2.118812653528077e-06, 'epoch': 6.36}
80%|███████▉  | 491/616 [7:43:59<1:56:07, 55.74s/it]
{'loss': 1.6016, 'learning_rate': 2.086531249970952e-06, 'epoch': 6.38}
80%|███████▉  | 492/616 [7:44:56<1:55:33, 55.91s/it]
{'loss': 1.6616, 'learning_rate': 2.0544689840499988e-06, 'epoch': 6.39}
80%|████████  | 493/616 [7:45:51<1:54:35, 55.90s/it]
{'loss': 1.6211, 'learning_rate': 2.022626743624807e-06, 'epoch': 6.4}
80%|████████  | 494/616 [7:46:47<1:53:29, 55.82s/it]
{'loss': 1.6504, 'learning_rate': 1.991005410462089e-06, 'epoch': 6.42}
80%|████████  | 495/616 [7:47:44<1:52:57, 56.01s/it]
{'loss': 1.6748, 'learning_rate': 1.9596058602112533e-06, 'epoch': 6.43}
81%|████████  | 496/616 [7:48:41<1:53:07, 56.56s/it]
{'loss': 1.6597, 'learning_rate': 1.928428962380148e-06, 'epoch': 6.44}
81%|████████  | 497/616 [7:49:37<1:51:38, 56.29s/it]
{'loss': 1.6133, 'learning_rate': 1.8974755803109968e-06, 'epoch': 6.45}
81%|████████  | 498/616 [7:50:33<1:50:23, 56.14s/it]
{'loss': 1.6294, 'learning_rate': 1.866746571156479e-06, 'epoch': 6.47}
81%|████████  | 499/616 [7:51:29<1:49:28, 56.14s/it]
{'loss': 1.6074, 'learning_rate': 1.8362427858560094e-06, 'epoch': 6.48}
81%|████████  | 500/616 [7:52:25<1:48:36, 56.18s/it]
{'loss': 1.645, 'learning_rate': 1.8059650691121611e-06, 'epoch': 6.49}
81%|████████▏ | 501/616 [7:54:18<2:20:08, 73.11s/it]
{'loss': 1.5884, 'learning_rate': 1.7759142593672707e-06, 'epoch': 6.51}
81%|████████▏ | 502/616 [7:55:13<2:08:26, 67.60s/it]
{'loss': 1.6245, 'learning_rate': 1.74609118878024e-06, 'epoch': 6.52}
82%|████████▏ | 503/616 [7:56:08<2:00:33, 64.02s/it]
{'loss': 1.6309, 'learning_rate': 1.7164966832034668e-06, 'epoch': 6.53}
82%|████████▏ | 504/616 [7:57:05<1:55:25, 61.83s/it]
{'loss': 1.6035, 'learning_rate': 1.6871315621599982e-06, 'epoch': 6.55}
82%|████████▏ | 505/616 [7:58:01<1:51:20, 60.18s/it]
{'loss': 1.5688, 'learning_rate': 1.6579966388208257e-06, 'epoch': 6.56}
82%|████████▏ | 506/616 [7:58:57<1:47:46, 58.79s/it]
{'loss': 1.5762, 'learning_rate': 1.6290927199823604e-06, 'epoch': 6.57}
82%|████████▏ | 507/616 [7:59:52<1:44:51, 57.72s/it]
{'loss': 1.6323, 'learning_rate': 1.6004206060441096e-06, 'epoch': 6.58}
82%|████████▏ | 508/616 [8:00:48<1:42:45, 57.09s/it]
{'loss': 1.5884, 'learning_rate': 1.5719810909864941e-06, 'epoch': 6.6}
83%|████████▎ | 509/616 [8:01:44<1:41:20, 56.83s/it]
{'loss': 1.6382, 'learning_rate': 1.543774962348874e-06, 'epoch': 6.61}
83%|████████▎ | 510/616 [8:02:40<1:39:51, 56.52s/it]
{'loss': 1.6279, 'learning_rate': 1.5158030012077329e-06, 'epoch': 6.62}
83%|████████▎ | 511/616 [8:03:37<1:39:08, 56.65s/it]
{'loss': 1.6304, 'learning_rate': 1.4880659821550547e-06, 'epoch': 6.64}
83%|████████▎ | 512/616 [8:04:32<1:37:24, 56.20s/it]
{'loss': 1.6289, 'learning_rate': 1.4605646732768685e-06, 'epoch': 6.65}
|
83%|βββββββββ | 512/616 [8:04:32<1:37:24, 56.20s/it]
83%|βββββββββ | 513/616 [8:05:27<1:36:08, 56.00s/it]
{'loss': 1.5889, 'learning_rate': 1.4332998361319783e-06, 'epoch': 6.66} |
|
83%|βββββββββ | 513/616 [8:05:27<1:36:08, 56.00s/it]
83%|βββββββββ | 514/616 [8:06:24<1:35:25, 56.13s/it]
{'loss': 1.6221, 'learning_rate': 1.4062722257308803e-06, 'epoch': 6.68} |
|
83%|βββββββββ | 514/616 [8:06:24<1:35:25, 56.13s/it]
84%|βββββββββ | 515/616 [8:07:20<1:34:25, 56.09s/it]
{'loss': 1.604, 'learning_rate': 1.3794825905148557e-06, 'epoch': 6.69} |
|
84%|βββββββββ | 515/616 [8:07:20<1:34:25, 56.09s/it]
84%|βββββββββ | 516/616 [8:08:15<1:33:07, 55.88s/it]
{'loss': 1.6099, 'learning_rate': 1.3529316723352303e-06, 'epoch': 6.7} |
|
84%|βββββββββ | 516/616 [8:08:15<1:33:07, 55.88s/it]
84%|βββββββββ | 517/616 [8:09:11<1:32:09, 55.85s/it]
{'loss': 1.6045, 'learning_rate': 1.3266202064328548e-06, 'epoch': 6.71} |
|
84%|βββββββββ | 517/616 [8:09:11<1:32:09, 55.85s/it]
84%|βββββββββ | 518/616 [8:10:06<1:31:00, 55.72s/it]
{'loss': 1.6289, 'learning_rate': 1.3005489214177213e-06, 'epoch': 6.73} |
|
84%|βββββββββ | 518/616 [8:10:06<1:31:00, 55.72s/it]
84%|βββββββββ | 519/616 [8:11:03<1:30:18, 55.86s/it]
{'loss': 1.6519, 'learning_rate': 1.2747185392488048e-06, 'epoch': 6.74} |
|
84%|βββββββββ | 519/616 [8:11:03<1:30:18, 55.86s/it]
84%|βββββββββ | 520/616 [8:11:57<1:28:46, 55.48s/it]
{'loss': 1.6338, 'learning_rate': 1.249129775214064e-06, 'epoch': 6.75} |
|
84%|βββββββββ | 520/616 [8:11:57<1:28:46, 55.48s/it]
85%|βββββββββ | 521/616 [8:12:53<1:28:06, 55.64s/it]
{'loss': 1.6196, 'learning_rate': 1.2237833379106257e-06, 'epoch': 6.77} |
|
85%|βββββββββ | 521/616 [8:12:53<1:28:06, 55.64s/it]
85%|βββββββββ | 522/616 [8:13:48<1:26:42, 55.35s/it]
{'loss': 1.6104, 'learning_rate': 1.1986799292251816e-06, 'epoch': 6.78} |
|
85%|βββββββββ | 522/616 [8:13:48<1:26:42, 55.35s/it]
85%|βββββββββ | 523/616 [8:14:44<1:26:06, 55.56s/it]
{'loss': 1.6309, 'learning_rate': 1.1738202443145307e-06, 'epoch': 6.79} |
|
85%|βββββββββ | 523/616 [8:14:44<1:26:06, 55.56s/it]
85%|████████▌ | 524/616 [8:15:40<1:25:39, 55.86s/it]
{'loss': 1.5845, 'learning_rate': 1.1492049715863464e-06, 'epoch': 6.81}
85%|████████▌ | 525/616 [8:16:35<1:24:11, 55.52s/it]
{'loss': 1.582, 'learning_rate': 1.1248347926801029e-06, 'epoch': 6.82}
85%|████████▌ | 526/616 [8:17:31<1:23:34, 55.71s/it]
{'loss': 1.6553, 'learning_rate': 1.100710382448198e-06, 'epoch': 6.83}
86%|████████▌ | 527/616 [8:18:27<1:22:36, 55.69s/it]
{'loss': 1.5771, 'learning_rate': 1.0768324089372816e-06, 'epoch': 6.84}
86%|████████▌ | 528/616 [8:19:22<1:21:27, 55.54s/it]
{'loss': 1.6611, 'learning_rate': 1.053201533369731e-06, 'epoch': 6.86}
86%|████████▌ | 529/616 [8:20:18<1:20:36, 55.59s/it]
{'loss': 1.6128, 'learning_rate': 1.029818410125365e-06, 'epoch': 6.87}
86%|████████▌ | 530/616 [8:21:14<1:19:49, 55.69s/it]
{'loss': 1.5957, 'learning_rate': 1.0066836867233087e-06, 'epoch': 6.88}
86%|████████▌ | 531/616 [8:22:10<1:19:18, 55.98s/it]
{'loss': 1.6299, 'learning_rate': 9.837980038040607e-07, 'epoch': 6.9}
86%|████████▋ | 532/616 [8:23:07<1:18:41, 56.21s/it]
{'loss': 1.6147, 'learning_rate': 9.611619951117657e-07, 'epoch': 6.91}
87%|████████▋ | 533/616 [8:24:03<1:17:41, 56.16s/it]
{'loss': 1.5864, 'learning_rate': 9.387762874766515e-07, 'epoch': 6.92}
87%|████████▋ | 534/616 [8:24:59<1:16:26, 55.94s/it]
{'loss': 1.6245, 'learning_rate': 9.166415007976803e-07, 'epoch': 6.94}
87%|████████▋ | 535/616 [8:25:55<1:15:36, 56.01s/it]
{'loss': 1.5781, 'learning_rate': 8.94758248025378e-07, 'epoch': 6.95}
87%|████████▋ | 536/616 [8:26:51<1:14:55, 56.19s/it]
{'loss': 1.5845, 'learning_rate': 8.7312713514486e-07, 'epoch': 6.96}
87%|████████▋ | 537/616 [8:27:46<1:13:23, 55.74s/it]
{'loss': 1.624, 'learning_rate': 8.517487611590558e-07, 'epoch': 6.97}
87%|████████▋ | 538/616 [8:28:41<1:12:04, 55.45s/it]
{'loss': 1.5811, 'learning_rate': 8.306237180721121e-07, 'epoch': 6.99}
88%|████████▊ | 539/616 [8:29:37<1:11:36, 55.80s/it]
{'loss': 1.5898, 'learning_rate': 8.097525908730108e-07, 'epoch': 7.0}
88%|████████▊ | 540/616 [8:30:59<1:20:35, 63.62s/it]
{'loss': 1.5542, 'learning_rate': 7.891359575193613e-07, 'epoch': 7.01}
88%|████████▊ | 541/616 [8:31:55<1:16:28, 61.19s/it]
{'loss': 1.6382, 'learning_rate': 7.687743889213939e-07, 'epoch': 7.03}
88%|████████▊ | 542/616 [8:32:51<1:13:34, 59.65s/it]
{'loss': 1.6597, 'learning_rate': 7.486684489261609e-07, 'epoch': 7.04}
88%|████████▊ | 543/616 [8:33:46<1:10:46, 58.18s/it]
{'loss': 1.5918, 'learning_rate': 7.288186943019171e-07, 'epoch': 7.05}
88%|████████▊ | 544/616 [8:34:42<1:09:00, 57.51s/it]
{'loss': 1.6226, 'learning_rate': 7.092256747226944e-07, 'epoch': 7.06}
88%|████████▊ | 545/616 [8:35:38<1:07:30, 57.04s/it]
{'loss': 1.563, 'learning_rate': 6.89889932753095e-07, 'epoch': 7.08}
89%|████████▊ | 546/616 [8:36:33<1:06:06, 56.67s/it]
{'loss': 1.6348, 'learning_rate': 6.708120038332533e-07, 'epoch': 7.09}
89%|████████▉ | 547/616 [8:37:29<1:04:41, 56.25s/it]
{'loss': 1.6089, 'learning_rate': 6.519924162640168e-07, 'epoch': 7.1}
89%|████████▉ | 548/616 [8:38:25<1:03:42, 56.22s/it]
{'loss': 1.6143, 'learning_rate': 6.334316911923155e-07, 'epoch': 7.12}
89%|████████▉ | 549/616 [8:39:21<1:02:51, 56.29s/it]
{'loss': 1.6396, 'learning_rate': 6.151303425967259e-07, 'epoch': 7.13}
89%|████████▉ | 550/616 [8:40:18<1:01:58, 56.34s/it]
{'loss': 1.6387, 'learning_rate': 5.970888772732453e-07, 'epoch': 7.14}
89%|████████▉ | 551/616 [8:41:13<1:00:45, 56.08s/it]
{'loss': 1.5835, 'learning_rate': 5.793077948212478e-07, 'epoch': 7.16}
90%|████████▉ | 552/616 [8:42:09<59:35, 55.87s/it]
{'loss': 1.6489, 'learning_rate': 5.617875876296641e-07, 'epoch': 7.17}
90%|████████▉ | 553/616 [8:43:05<58:43, 55.92s/it]
{'loss': 1.6318, 'learning_rate': 5.445287408633304e-07, 'epoch': 7.18}
90%|████████▉ | 554/616 [8:44:00<57:42, 55.85s/it]
{'loss': 1.645, 'learning_rate': 5.27531732449561e-07, 'epoch': 7.19}
90%|█████████ | 555/616 [8:44:56<56:50, 55.91s/it]
{'loss': 1.5996, 'learning_rate': 5.107970330649204e-07, 'epoch': 7.21}
90%|█████████ | 556/616 [8:45:52<55:49, 55.83s/it]
{'loss': 1.5962, 'learning_rate': 4.943251061221721e-07, 'epoch': 7.22}
90%|█████████ | 557/616 [8:46:48<54:52, 55.81s/it]
{'loss': 1.6211, 'learning_rate': 4.78116407757464e-07, 'epoch': 7.23}
91%|█████████ | 558/616 [8:47:45<54:16, 56.15s/it]
{'loss': 1.6011, 'learning_rate': 4.6217138681769026e-07, 'epoch': 7.25}
91%|█████████ | 559/616 [8:48:40<53:13, 56.03s/it]
{'loss': 1.6392, 'learning_rate': 4.464904848480522e-07, 'epoch': 7.26}
91%|█████████ | 560/616 [8:49:38<52:36, 56.37s/it]
{'loss': 1.6265, 'learning_rate': 4.310741360798498e-07, 'epoch': 7.27}
91%|█████████ | 561/616 [8:50:33<51:24, 56.08s/it]
{'loss': 1.6255, 'learning_rate': 4.1592276741844075e-07, 'epoch': 7.29}
91%|█████████ | 562/616 [8:51:29<50:22, 55.97s/it]
{'loss': 1.6196, 'learning_rate': 4.0103679843142895e-07, 'epoch': 7.3}
91%|█████████▏| 563/616 [8:52:26<49:45, 56.33s/it]
{'loss': 1.6201, 'learning_rate': 3.864166413370429e-07, 'epoch': 7.31}
92%|█████████▏| 564/616 [8:53:22<48:45, 56.25s/it]
{'loss': 1.6396, 'learning_rate': 3.720627009927158e-07, 'epoch': 7.32}
92%|█████████▏| 565/616 [8:54:18<47:48, 56.25s/it]
{'loss': 1.6553, 'learning_rate': 3.5797537488388326e-07, 'epoch': 7.34}
92%|█████████▏| 566/616 [8:55:15<46:58, 56.37s/it]
{'loss': 1.6431, 'learning_rate': 3.441550531129667e-07, 'epoch': 7.35}
92%|█████████▏| 567/616 [8:56:10<45:50, 56.13s/it]
{'loss': 1.6362, 'learning_rate': 3.3060211838858104e-07, 'epoch': 7.36}
92%|█████████▏| 568/616 [8:57:05<44:38, 55.80s/it]
{'loss': 1.5654, 'learning_rate': 3.1731694601492834e-07, 'epoch': 7.38}
92%|█████████▏| 569/616 [8:58:01<43:33, 55.61s/it]
{'loss': 1.6074, 'learning_rate': 3.042999038814076e-07, 'epoch': 7.39}
93%|█████████▎| 570/616 [8:58:56<42:35, 55.55s/it]
{'loss': 1.6094, 'learning_rate': 2.915513524524294e-07, 'epoch': 7.4}
93%|█████████▎| 571/616 [8:59:52<41:47, 55.72s/it]
{'loss': 1.6758, 'learning_rate': 2.790716447574304e-07, 'epoch': 7.42}
93%|█████████▎| 572/616 [9:00:49<41:08, 56.11s/it]
{'loss': 1.6313, 'learning_rate': 2.668611263811016e-07, 'epoch': 7.43}
93%|█████████▎| 573/616 [9:01:45<40:05, 55.95s/it]
{'loss': 1.5835, 'learning_rate': 2.5492013545381666e-07, 'epoch': 7.44}
93%|█████████▎| 574/616 [9:02:41<39:14, 56.06s/it]
{'loss': 1.5972, 'learning_rate': 2.4324900264226405e-07, 'epoch': 7.45}
93%|█████████▎| 575/616 [9:03:37<38:16, 56.02s/it]
{'loss': 1.6689, 'learning_rate': 2.3184805114029872e-07, 'epoch': 7.47}
94%|█████████▎| 576/616 [9:04:32<37:07, 55.68s/it]
{'loss': 1.6304, 'learning_rate': 2.2071759665998282e-07, 'epoch': 7.48}
94%|█████████▎| 577/616 [9:05:27<36:06, 55.55s/it]
{'loss': 1.6035, 'learning_rate': 2.098579474228546e-07, 'epoch': 7.49}
94%|█████████▍| 578/616 [9:06:23<35:13, 55.63s/it]
{'loss': 1.5952, 'learning_rate': 1.9926940415138206e-07, 'epoch': 7.51}
94%|█████████▍| 579/616 [9:07:20<34:30, 55.97s/it]
{'loss': 1.584, 'learning_rate': 1.8895226006064084e-07, 'epoch': 7.52}
94%|█████████▍| 580/616 [9:08:16<33:42, 56.18s/it]
{'loss': 1.6064, 'learning_rate': 1.7890680085019597e-07, 'epoch': 7.53}
94%|█████████▍| 581/616 [9:09:13<32:50, 56.30s/it]
{'loss': 1.6235, 'learning_rate': 1.6913330469618628e-07, 'epoch': 7.55}
94%|█████████▍| 582/616 [9:10:10<32:00, 56.48s/it]
{'loss': 1.6294, 'learning_rate': 1.5963204224362261e-07, 'epoch': 7.56}
95%|█████████▍| 583/616 [9:11:06<30:57, 56.29s/it]
{'loss': 1.6055, 'learning_rate': 1.504032765988961e-07, 'epoch': 7.57}
95%|█████████▍| 584/616 [9:12:02<29:59, 56.23s/it]
{'loss': 1.6353, 'learning_rate': 1.4144726332248726e-07, 'epoch': 7.58}
95%|█████████▍| 585/616 [9:12:58<29:03, 56.25s/it]
{'loss': 1.6108, 'learning_rate': 1.327642504218951e-07, 'epoch': 7.6}
95%|█████████▌| 586/616 [9:13:54<28:00, 56.00s/it]
{'loss': 1.6201, 'learning_rate': 1.2435447834476254e-07, 'epoch': 7.61}
95%|█████████▌| 587/616 [9:14:50<27:10, 56.22s/it]
{'loss': 1.6128, 'learning_rate': 1.1621817997222507e-07, 'epoch': 7.62}
95%|█████████▌| 588/616 [9:15:46<26:06, 55.95s/it]
{'loss': 1.6196, 'learning_rate': 1.0835558061245587e-07, 'epoch': 7.64}
96%|█████████▌| 589/616 [9:16:42<25:16, 56.17s/it]
{'loss': 1.6621, 'learning_rate': 1.0076689799442874e-07, 'epoch': 7.65}
96%|█████████▌| 590/616 [9:17:38<24:14, 55.95s/it]
{'loss': 1.6216, 'learning_rate': 9.34523422618916e-08, 'epoch': 7.66}
96%|█████████▌| 591/616 [9:18:35<23:27, 56.31s/it]
{'loss': 1.6289, 'learning_rate': 8.641211596754129e-08, 'epoch': 7.68}
96%|█████████▌| 592/616 [9:19:31<22:28, 56.19s/it]
{'loss': 1.6279, 'learning_rate': 7.964641406742135e-08, 'epoch': 7.69}
96%|█████████▋| 593/616 [9:20:28<21:37, 56.42s/it]
{'loss': 1.6187, 'learning_rate': 7.315542391551966e-08, 'epoch': 7.7}
96%|█████████▋| 594/616 [9:21:24<20:42, 56.46s/it]
{'loss': 1.6445, 'learning_rate': 6.693932525857927e-08, 'epoch': 7.71}
97%|█████████▋| 595/616 [9:22:21<19:45, 56.45s/it]
{'loss': 1.6226, 'learning_rate': 6.099829023112236e-08, 'epoch': 7.73}
97%|█████████▋| 596/616 [9:23:17<18:47, 56.35s/it]
{'loss': 1.6025, 'learning_rate': 5.533248335068409e-08, 'epoch': 7.74}
97%|█████████▋| 597/616 [9:24:13<17:49, 56.29s/it]
{'loss': 1.5981, 'learning_rate': 4.994206151325509e-08, 'epoch': 7.75}
97%|█████████▋| 598/616 [9:25:10<16:58, 56.58s/it]
{'loss': 1.6479, 'learning_rate': 4.482717398894165e-08, 'epoch': 7.77}
97%|█████████▋| 599/616 [9:26:07<16:03, 56.66s/it]
{'loss': 1.6494, 'learning_rate': 3.998796241782232e-08, 'epoch': 7.78}
97%|█████████▋| 600/616 [9:27:02<14:58, 56.13s/it]
{'loss': 1.6328, 'learning_rate': 3.5424560806036625e-08, 'epoch': 7.79}
98%|█████████▊| 601/616 [9:28:54<18:13, 72.93s/it]
{'loss': 1.5732, 'learning_rate': 3.1137095522068006e-08, 'epoch': 7.81}
98%|█████████▊| 602/616 [9:29:49<15:47, 67.67s/it]
{'loss': 1.6196, 'learning_rate': 2.7125685293245552e-08, 'epoch': 7.82}
98%|█████████▊| 603/616 [9:30:46<13:56, 64.33s/it]
{'loss': 1.5894, 'learning_rate': 2.3390441202455484e-08, 'epoch': 7.83}
98%|█████████▊| 604/616 [9:31:42<12:22, 61.86s/it]
{'loss': 1.6172, 'learning_rate': 1.993146668506585e-08, 'epoch': 7.84}
98%|█████████▊| 605/616 [9:32:40<11:07, 60.66s/it]
{'loss': 1.604, 'learning_rate': 1.6748857526066588e-08, 'epoch': 7.86}
98%|█████████▊| 606/616 [9:33:36<09:53, 59.31s/it]
{'loss': 1.6172, 'learning_rate': 1.3842701857406104e-08, 'epoch': 7.87}
99%|█████████▊| 607/616 [9:34:32<08:43, 58.22s/it]
{'loss': 1.6377, 'learning_rate': 1.1213080155564327e-08, 'epoch': 7.88}
99%|█████████▊| 608/616 [9:35:27<07:39, 57.40s/it]
{'loss': 1.6064, 'learning_rate': 8.860065239311155e-09, 'epoch': 7.9}
99%|█████████▉| 609/616 [9:36:23<06:38, 56.99s/it]
{'loss': 1.6211, 'learning_rate': 6.783722267701409e-09, 'epoch': 7.91}
99%|█████████▉| 610/616 [9:37:19<05:39, 56.62s/it]
{'loss': 1.6274, 'learning_rate': 4.984108738261828e-09, 'epoch': 7.92}
99%|█████████▉| 611/616 [9:38:16<04:43, 56.63s/it]
{'loss': 1.6328, 'learning_rate': 3.4612744854045645e-09, 'epoch': 7.94}
99%|█████████▉| 612/616 [9:39:11<03:45, 56.36s/it]
{'loss': 1.583, 'learning_rate': 2.215261679042735e-09, 'epoch': 7.95}
100%|█████████▉| 613/616 [9:40:08<02:49, 56.35s/it]
{'loss': 1.6318, 'learning_rate': 1.246104823426908e-09, 'epoch': 7.96}
100%|█████████▉| 614/616 [9:41:03<01:52, 56.01s/it]
{'loss': 1.6265, 'learning_rate': 5.538307561858691e-10, 'epoch': 7.97}
100%|█████████▉| 615/616 [9:41:58<00:55, 55.78s/it]
{'loss': 1.605, 'learning_rate': 1.3845864758610384e-10, 'epoch': 7.99}
100%|██████████| 616/616 [9:42:54<00:00, 55.92s/it]
{'loss': 1.6025, 'learning_rate': 0.0, 'epoch': 8.0}
{'train_runtime': 34978.7912, 'train_samples_per_second': 2.252, 'train_steps_per_second': 0.018, 'train_loss': 2.115578391335227, 'epoch': 8.0}
100%|██████████| 616/616 [9:42:54<00:00, 56.78s/it]
Non lora weights: dict_keys(['base_model.model.model.mm_projector.weight', 'base_model.model.model.mm_projector.bias', 'base_model.model.model.frames_conv.weight', 'base_model.model.model.frames_conv.bias'])
Non lora weights: dict_keys(['base_model.model.model.mm_projector.weight', 'base_model.model.model.mm_projector.bias', 'base_model.model.model.frames_conv.weight', 'base_model.model.model.frames_conv.bias'])
wandb: Waiting for W&B process to finish... (success).
[2023-10-13 12:46:18,400] [INFO] [launch.py:347:main] Process 1707 exits successfully.
wandb:
wandb: Run history:
wandb: train/epoch [sparkline]
wandb: train/global_step [sparkline]
wandb: train/learning_rate [sparkline]
wandb: train/loss [sparkline]
wandb: train/total_flos ▁
wandb: train/train_loss ▁
wandb: train/train_runtime ▁
wandb: train/train_samples_per_second ▁
wandb: train/train_steps_per_second ▁
wandb:
wandb: Run summary:
wandb: train/epoch 8.0
wandb: train/global_step 616
wandb: train/learning_rate 0.0
wandb: train/loss 1.6025
wandb: train/total_flos 1.5114021399418634e+18
wandb: train/train_loss 2.11558
wandb: train/train_runtime 34978.7912
wandb: train/train_samples_per_second 2.252
wandb: train/train_steps_per_second 0.018
wandb:
wandb: 🚀 View run fiery-dew-9 at: https://wandb.ai/wanghao-cst/huggingface/runs/30lhy90r
wandb: ⚡ View job at https://wandb.ai/wanghao-cst/huggingface/jobs/QXJ0aWZhY3RDb2xsZWN0aW9uOjEwNTk0Mjk1MA==/version_details/v2
wandb: Synced 5 W&B file(s), 0 media file(s), 2 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20231013_030309-30lhy90r/logs
[2023-10-13 12:46:56,444] [INFO] [launch.py:347:main] Process 1706 exits successfully.
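The learning_rate and epoch columns above can be cross-checked against the launch settings (--learning_rate 2e-5, --warmup_ratio 0.03, --lr_scheduler_type cosine, --num_train_epochs 8, 616 optimizer steps in total). The sketch below is not part of the run. It assumes the schedule is the Hugging Face Trainer's linear warmup followed by half-cosine decay, with warmup = ceil(616 × 0.03) = 19 steps. It also assumes 616 / 8 = 77 optimizer steps per epoch. The helpers cosine_lr and logged_epoch are hypothetical names.

```python
import math

# Settings from the launch command; the schedule shape itself is an assumption.
BASE_LR = 2e-5
TOTAL_STEPS = 616
WARMUP_STEPS = math.ceil(TOTAL_STEPS * 0.03)  # assumed ceil rounding: 18.48 -> 19
STEPS_PER_EPOCH = TOTAL_STEPS // 8            # 77 optimizer steps per epoch


def cosine_lr(step: int) -> float:
    """Hypothetical helper: linear warmup, then half-cosine decay to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


def logged_epoch(step: int) -> float:
    """Hypothetical helper: fractional epoch rounded to two decimals."""
    return round(step / STEPS_PER_EPOCH, 2)


# (step, learning_rate, epoch) copied from the log above.
LOGGED = [
    (514, 1.4062722257308803e-06, 6.68),
    (539, 8.097525908730108e-07, 7.0),
    (600, 3.5424560806036625e-08, 7.79),
    (615, 1.3845864758610384e-10, 7.99),
    (616, 0.0, 8.0),
]

for step, lr, epoch in LOGGED:
    print(f"step {step}: logged lr={lr:.6e} predicted={cosine_lr(step):.6e}, "
          f"logged epoch={epoch} predicted={logged_epoch(step)}")
```

Under these assumptions the predicted values agree with the logged ones to within floating-point rounding. The 0.0 at step 616 is simply the end point of the half-cosine (cos π = -1).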
|
|