2024-02-08 18:45:21,177 INFO StreamThr :789 [internal.py:wandb_internal():86] W&B internal server running at pid: 789, started at: 2024-02-08 18:45:21.176939
2024-02-08 18:45:21,181 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: status
2024-02-08 18:45:21,182 INFO WriterThread:789 [datastore.py:open_for_write():85] open: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_184521-kk66dbgv/run-kk66dbgv.wandb
2024-02-08 18:45:21,183 DEBUG SenderThread:789 [sender.py:send():382] send: header
2024-02-08 18:45:21,183 DEBUG SenderThread:789 [sender.py:send():382] send: run
2024-02-08 18:45:21,486 INFO SenderThread:789 [dir_watcher.py:__init__():211] watching files in: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_184521-kk66dbgv/files
2024-02-08 18:45:21,486 INFO SenderThread:789 [sender.py:_start_run_threads():1136] run started: kk66dbgv with start time 1707417921.176525
2024-02-08 18:45:21,490 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: check_version
2024-02-08 18:45:21,491 DEBUG SenderThread:789 [sender.py:send_request():409] send_request: check_version
2024-02-08 18:45:21,534 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: run_start
2024-02-08 18:45:21,565 DEBUG HandlerThread:789 [system_info.py:__init__():32] System info init
2024-02-08 18:45:21,565 DEBUG HandlerThread:789 [system_info.py:__init__():47] System info init done
2024-02-08 18:45:21,565 INFO HandlerThread:789 [system_monitor.py:start():194] Starting system monitor
2024-02-08 18:45:21,565 INFO SystemMonitor:789 [system_monitor.py:_start():158] Starting system asset monitoring threads
2024-02-08 18:45:21,566 INFO HandlerThread:789 [system_monitor.py:probe():214] Collecting system info
2024-02-08 18:45:21,566 INFO SystemMonitor:789 [interfaces.py:start():190] Started cpu monitoring
2024-02-08 18:45:21,567 INFO SystemMonitor:789 [interfaces.py:start():190] Started disk monitoring
2024-02-08 18:45:21,568 INFO SystemMonitor:789 [interfaces.py:start():190] Started gpu monitoring
2024-02-08 18:45:21,568 INFO SystemMonitor:789 [interfaces.py:start():190] Started memory monitoring
2024-02-08 18:45:21,569 INFO SystemMonitor:789 [interfaces.py:start():190] Started network monitoring
2024-02-08 18:45:21,624 DEBUG HandlerThread:789 [system_info.py:probe():196] Probing system
2024-02-08 18:45:21,626 DEBUG HandlerThread:789 [gitlib.py:_init_repo():56] git repository is invalid
2024-02-08 18:45:21,626 DEBUG HandlerThread:789 [system_info.py:probe():244] Probing system done
2024-02-08 18:45:21,626 DEBUG HandlerThread:789 [system_monitor.py:probe():223] {'os': 'Linux-4.14.336-253.554.amzn2.x86_64-x86_64-with-glibc2.35', 'python': '3.10.13', 'heartbeatAt': '2024-02-08T18:45:21.624647', 'startedAt': '2024-02-08T18:45:21.172989', 'docker': None, 'cuda': None, 'args': (), 'state': 'running', 'program': '/home/sagemaker-user/output-7b-26k-lora/../lora_finetuning_push_to_hub_save_local_latest.py', 'codePathLocal': None, 'host': 'default', 'username': 'sagemaker-user', 'executable': '/opt/conda/bin/python3', 'cpu_count': 96, 'cpu_count_logical': 192, 'cpu_freq': {'current': 3165.433677083333, 'min': 0.0, 'max': 0.0}, 'cpu_freq_per_core': [{'current': 3299.529, 'min': 0.0, 'max': 0.0}, {'current': 3299.574, 'min': 0.0, 'max': 0.0}, {'current': 3299.907, 'min': 0.0, 'max': 0.0}, {'current': 3300.263, 'min': 0.0, 'max': 0.0}, {'current': 3300.897, 'min': 0.0, 'max': 0.0}, {'current': 3300.385, 'min': 0.0, 'max': 0.0}, {'current': 3298.81, 'min': 0.0, 'max': 0.0}, {'current': 3299.625, 'min': 0.0, 'max': 0.0}, {'current': 3299.926, 'min': 0.0, 'max': 0.0}, {'current': 3300.158, 'min': 0.0, 'max': 0.0}, {'current': 3300.475, 'min': 0.0, 'max': 0.0}, {'current': 3300.688, 'min': 0.0, 'max': 0.0}, {'current': 3295.206, 'min': 0.0, 'max': 0.0}, {'current': 3294.95, 'min': 0.0, 'max': 0.0}, {'current': 3296.334, 'min': 0.0, 'max': 0.0}, {'current': 3297.722, 'min': 0.0, 'max': 0.0}, {'current': 3296.096, 'min': 0.0, 'max': 0.0}, {'current': 3298.885, 'min': 0.0, 'max': 0.0}, {'current': 3297.66, 'min': 0.0, 'max': 0.0}, {'current': 3297.613, 'min': 0.0, 'max': 0.0}, {'current': 3300.244, 'min': 0.0, 'max': 0.0}, {'current': 3241.28, 'min': 0.0, 'max': 0.0}, {'current': 3298.967, 'min': 0.0, 'max': 0.0}, {'current': 3298.457, 'min': 0.0, 'max': 0.0}, {'current': 3298.049, 'min': 0.0, 'max': 0.0}, {'current': 3299.552, 'min': 0.0, 'max': 0.0}, {'current': 3299.807, 'min': 0.0, 'max': 0.0}, {'current': 3242.538, 'min': 0.0, 'max': 0.0}, {'current': 
3299.129, 'min': 0.0, 'max': 0.0}, {'current': 3263.29, 'min': 0.0, 'max': 0.0}, {'current': 3298.421, 'min': 0.0, 'max': 0.0}, {'current': 3299.256, 'min': 0.0, 'max': 0.0}, {'current': 3298.723, 'min': 0.0, 'max': 0.0}, {'current': 3299.38, 'min': 0.0, 'max': 0.0}, {'current': 3299.22, 'min': 0.0, 'max': 0.0}, {'current': 3298.243, 'min': 0.0, 'max': 0.0}, {'current': 3259.228, 'min': 0.0, 'max': 0.0}, {'current': 3297.656, 'min': 0.0, 'max': 0.0}, {'current': 3299.572, 'min': 0.0, 'max': 0.0}, {'current': 3299.246, 'min': 0.0, 'max': 0.0}, {'current': 3299.507, 'min': 0.0, 'max': 0.0}, {'current': 3298.177, 'min': 0.0, 'max': 0.0}, {'current': 3299.762, 'min': 0.0, 'max': 0.0}, {'current': 3300.244, 'min': 0.0, 'max': 0.0}, {'current': 3299.764, 'min': 0.0, 'max': 0.0}, {'current': 3299.71, 'min': 0.0, 'max': 0.0}, {'current': 3299.323, 'min': 0.0, 'max': 0.0}, {'current': 3298.972, 'min': 0.0, 'max': 0.0}, {'current': 2825.298, 'min': 0.0, 'max': 0.0}, {'current': 3300.031, 'min': 0.0, 'max': 0.0}, {'current': 3299.524, 'min': 0.0, 'max': 0.0}, {'current': 3300.753, 'min': 0.0, 'max': 0.0}, {'current': 3300.281, 'min': 0.0, 'max': 0.0}, {'current': 3300.549, 'min': 0.0, 'max': 0.0}, {'current': 3299.256, 'min': 0.0, 'max': 0.0}, {'current': 3300.719, 'min': 0.0, 'max': 0.0}, {'current': 3299.975, 'min': 0.0, 'max': 0.0}, {'current': 3300.721, 'min': 0.0, 'max': 0.0}, {'current': 3300.6, 'min': 0.0, 'max': 0.0}, {'current': 3300.408, 'min': 0.0, 'max': 0.0}, {'current': 3299.691, 'min': 0.0, 'max': 0.0}, {'current': 3299.817, 'min': 0.0, 'max': 0.0}, {'current': 3044.848, 'min': 0.0, 'max': 0.0}, {'current': 3299.416, 'min': 0.0, 'max': 0.0}, {'current': 3300.089, 'min': 0.0, 'max': 0.0}, {'current': 3299.351, 'min': 0.0, 'max': 0.0}, {'current': 2807.753, 'min': 0.0, 'max': 0.0}, {'current': 2853.085, 'min': 0.0, 'max': 0.0}, {'current': 3299.456, 'min': 0.0, 'max': 0.0}, {'current': 3300.145, 'min': 0.0, 'max': 0.0}, {'current': 3299.532, 'min': 0.0, 'max': 
0.0}, {'current': 3300.121, 'min': 0.0, 'max': 0.0}, {'current': 3298.716, 'min': 0.0, 'max': 0.0}, {'current': 2964.818, 'min': 0.0, 'max': 0.0}, {'current': 3299.325, 'min': 0.0, 'max': 0.0}, {'current': 3053.968, 'min': 0.0, 'max': 0.0}, {'current': 3027.575, 'min': 0.0, 'max': 0.0}, {'current': 3034.933, 'min': 0.0, 'max': 0.0}, {'current': 3046.955, 'min': 0.0, 'max': 0.0}, {'current': 3017.189, 'min': 0.0, 'max': 0.0}, {'current': 3052.512, 'min': 0.0, 'max': 0.0}, {'current': 3049.645, 'min': 0.0, 'max': 0.0}, {'current': 3056.957, 'min': 0.0, 'max': 0.0}, {'current': 3063.442, 'min': 0.0, 'max': 0.0}, {'current': 3026.186, 'min': 0.0, 'max': 0.0}, {'current': 3059.995, 'min': 0.0, 'max': 0.0}, {'current': 3058.868, 'min': 0.0, 'max': 0.0}, {'current': 3059.978, 'min': 0.0, 'max': 0.0}, {'current': 2639.982, 'min': 0.0, 'max': 0.0}, {'current': 3043.356, 'min': 0.0, 'max': 0.0}, {'current': 3032.312, 'min': 0.0, 'max': 0.0}, {'current': 3024.784, 'min': 0.0, 'max': 0.0}, {'current': 3309.767, 'min': 0.0, 'max': 0.0}, {'current': 3044.167, 'min': 0.0, 'max': 0.0}, {'current': 3074.821, 'min': 0.0, 'max': 0.0}, {'current': 2744.486, 'min': 0.0, 'max': 0.0}, {'current': 2948.546, 'min': 0.0, 'max': 0.0}, {'current': 3265.131, 'min': 0.0, 'max': 0.0}, {'current': 3260.141, 'min': 0.0, 'max': 0.0}, {'current': 3264.163, 'min': 0.0, 'max': 0.0}, {'current': 3299.133, 'min': 0.0, 'max': 0.0}, {'current': 3260.992, 'min': 0.0, 'max': 0.0}, {'current': 3299.601, 'min': 0.0, 'max': 0.0}, {'current': 3266.096, 'min': 0.0, 'max': 0.0}, {'current': 3299.245, 'min': 0.0, 'max': 0.0}, {'current': 3298.423, 'min': 0.0, 'max': 0.0}, {'current': 3262.508, 'min': 0.0, 'max': 0.0}, {'current': 3270.751, 'min': 0.0, 'max': 0.0}, {'current': 3265.57, 'min': 0.0, 'max': 0.0}, {'current': 3268.221, 'min': 0.0, 'max': 0.0}, {'current': 3262.709, 'min': 0.0, 'max': 0.0}, {'current': 3262.206, 'min': 0.0, 'max': 0.0}, {'current': 3270.565, 'min': 0.0, 'max': 0.0}, {'current': 3298.66, 
'min': 0.0, 'max': 0.0}, {'current': 3271.159, 'min': 0.0, 'max': 0.0}, {'current': 3269.543, 'min': 0.0, 'max': 0.0}, {'current': 2891.532, 'min': 0.0, 'max': 0.0}, {'current': 3299.121, 'min': 0.0, 'max': 0.0}, {'current': 3267.57, 'min': 0.0, 'max': 0.0}, {'current': 3273.911, 'min': 0.0, 'max': 0.0}, {'current': 3271.579, 'min': 0.0, 'max': 0.0}, {'current': 3271.885, 'min': 0.0, 'max': 0.0}, {'current': 3269.181, 'min': 0.0, 'max': 0.0}, {'current': 3299.12, 'min': 0.0, 'max': 0.0}, {'current': 3272.274, 'min': 0.0, 'max': 0.0}, {'current': 3298.966, 'min': 0.0, 'max': 0.0}, {'current': 3298.849, 'min': 0.0, 'max': 0.0}, {'current': 3298.555, 'min': 0.0, 'max': 0.0}, {'current': 3298.44, 'min': 0.0, 'max': 0.0}, {'current': 3299.027, 'min': 0.0, 'max': 0.0}, {'current': 3299.417, 'min': 0.0, 'max': 0.0}, {'current': 3298.561, 'min': 0.0, 'max': 0.0}, {'current': 3298.684, 'min': 0.0, 'max': 0.0}, {'current': 3298.308, 'min': 0.0, 'max': 0.0}, {'current': 3299.07, 'min': 0.0, 'max': 0.0}, {'current': 3297.982, 'min': 0.0, 'max': 0.0}, {'current': 3298.738, 'min': 0.0, 'max': 0.0}, {'current': 3297.558, 'min': 0.0, 'max': 0.0}, {'current': 3297.74, 'min': 0.0, 'max': 0.0}, {'current': 3299.099, 'min': 0.0, 'max': 0.0}, {'current': 3299.072, 'min': 0.0, 'max': 0.0}, {'current': 3298.608, 'min': 0.0, 'max': 0.0}, {'current': 3299.045, 'min': 0.0, 'max': 0.0}, {'current': 3293.695, 'min': 0.0, 'max': 0.0}, {'current': 3299.228, 'min': 0.0, 'max': 0.0}, {'current': 3299.509, 'min': 0.0, 'max': 0.0}, {'current': 3298.722, 'min': 0.0, 'max': 0.0}, {'current': 3299.9, 'min': 0.0, 'max': 0.0}, {'current': 3299.551, 'min': 0.0, 'max': 0.0}, {'current': 3299.029, 'min': 0.0, 'max': 0.0}, {'current': 3299.307, 'min': 0.0, 'max': 0.0}, {'current': 3298.752, 'min': 0.0, 'max': 0.0}, {'current': 3299.526, 'min': 0.0, 'max': 0.0}, {'current': 3299.18, 'min': 0.0, 'max': 0.0}, {'current': 3299.048, 'min': 0.0, 'max': 0.0}, {'current': 3299.113, 'min': 0.0, 'max': 0.0}, 
{'current': 3299.319, 'min': 0.0, 'max': 0.0}, {'current': 3299.493, 'min': 0.0, 'max': 0.0}, {'current': 3299.269, 'min': 0.0, 'max': 0.0}, {'current': 3299.472, 'min': 0.0, 'max': 0.0}, {'current': 3299.484, 'min': 0.0, 'max': 0.0}, {'current': 3299.416, 'min': 0.0, 'max': 0.0}, {'current': 3299.596, 'min': 0.0, 'max': 0.0}, {'current': 3299.52, 'min': 0.0, 'max': 0.0}, {'current': 3298.897, 'min': 0.0, 'max': 0.0}, {'current': 3299.216, 'min': 0.0, 'max': 0.0}, {'current': 3299.001, 'min': 0.0, 'max': 0.0}, {'current': 3300.316, 'min': 0.0, 'max': 0.0}, {'current': 2995.097, 'min': 0.0, 'max': 0.0}, {'current': 2690.969, 'min': 0.0, 'max': 0.0}, {'current': 3300.22, 'min': 0.0, 'max': 0.0}, {'current': 3008.014, 'min': 0.0, 'max': 0.0}, {'current': 3299.622, 'min': 0.0, 'max': 0.0}, {'current': 2987.966, 'min': 0.0, 'max': 0.0}, {'current': 3021.177, 'min': 0.0, 'max': 0.0}, {'current': 3032.724, 'min': 0.0, 'max': 0.0}, {'current': 2997.024, 'min': 0.0, 'max': 0.0}, {'current': 3036.103, 'min': 0.0, 'max': 0.0}, {'current': 2998.071, 'min': 0.0, 'max': 0.0}, {'current': 3298.959, 'min': 0.0, 'max': 0.0}, {'current': 3043.183, 'min': 0.0, 'max': 0.0}, {'current': 3299.567, 'min': 0.0, 'max': 0.0}, {'current': 3027.171, 'min': 0.0, 'max': 0.0}, {'current': 2961.029, 'min': 0.0, 'max': 0.0}, {'current': 3059.873, 'min': 0.0, 'max': 0.0}, {'current': 3037.985, 'min': 0.0, 'max': 0.0}, {'current': 3009.778, 'min': 0.0, 'max': 0.0}, {'current': 3032.565, 'min': 0.0, 'max': 0.0}, {'current': 3272.763, 'min': 0.0, 'max': 0.0}, {'current': 3109.523, 'min': 0.0, 'max': 0.0}, {'current': 3299.902, 'min': 0.0, 'max': 0.0}, {'current': 3283.894, 'min': 0.0, 'max': 0.0}], 'disk': {'/': {'total': 32.0, 'used': 0.012481689453125}}, 'gpu': 'NVIDIA A10G', 'gpu_count': 8, 'gpu_devices': [{'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 
'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}], 'memory': {'total': 747.9597625732422}}
2024-02-08 18:45:21,626 INFO HandlerThread:789 [system_monitor.py:probe():224] Finished collecting system info
2024-02-08 18:45:21,626 INFO HandlerThread:789 [system_monitor.py:probe():227] Publishing system info
2024-02-08 18:45:21,626 DEBUG HandlerThread:789 [system_info.py:_save_pip():52] Saving list of pip packages installed into the current environment
2024-02-08 18:45:21,627 DEBUG HandlerThread:789 [system_info.py:_save_pip():68] Saving pip packages done
2024-02-08 18:45:21,627 DEBUG HandlerThread:789 [system_info.py:_save_conda():75] Saving list of conda packages installed into the current environment
2024-02-08 18:45:22,487 INFO Thread-12 :789 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_184521-kk66dbgv/files/conda-environment.yaml
2024-02-08 18:45:22,487 INFO Thread-12 :789 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_184521-kk66dbgv/files/requirements.txt
2024-02-08 18:45:35,889 DEBUG HandlerThread:789 [system_info.py:_save_conda():87] Saving conda packages done
2024-02-08 18:45:35,890 INFO HandlerThread:789 [system_monitor.py:probe():229] Finished publishing system info
2024-02-08 18:45:35,894 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: status_report
2024-02-08 18:45:35,894 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: keepalive
2024-02-08 18:45:35,894 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: status_report
2024-02-08 18:45:35,894 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: keepalive
2024-02-08 18:45:35,895 DEBUG SenderThread:789 [sender.py:send():382] send: files
2024-02-08 18:45:35,895 INFO SenderThread:789 [sender.py:_save_file():1392] saving file wandb-metadata.json with policy now
2024-02-08 18:45:35,899 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: stop_status
2024-02-08 18:45:35,900 DEBUG SenderThread:789 [sender.py:send_request():409] send_request: stop_status
2024-02-08 18:45:35,906 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: internal_messages
2024-02-08 18:45:36,041 DEBUG SenderThread:789 [sender.py:send():382] send: telemetry
2024-02-08 18:45:36,042 DEBUG SenderThread:789 [sender.py:send():382] send: config
2024-02-08 18:45:36,042 DEBUG SenderThread:789 [sender.py:send():382] send: metric
2024-02-08 18:45:36,042 DEBUG SenderThread:789 [sender.py:send():382] send: telemetry
2024-02-08 18:45:36,042 DEBUG SenderThread:789 [sender.py:send():382] send: metric
2024-02-08 18:45:36,042 WARNING SenderThread:789 [sender.py:send_metric():1343] Seen metric with glob (shouldn't happen)
2024-02-08 18:45:36,244 INFO wandb-upload_0:789 [upload_job.py:push():131] Uploaded file /tmp/tmph6r9wm0rwandb/nitc481h-wandb-metadata.json
2024-02-08 18:45:36,488 INFO Thread-12 :789 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_184521-kk66dbgv/files/conda-environment.yaml
2024-02-08 18:45:36,488 INFO Thread-12 :789 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_184521-kk66dbgv/files/wandb-metadata.json
2024-02-08 18:45:36,488 INFO Thread-12 :789 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_184521-kk66dbgv/files/output.log
2024-02-08 18:45:36,754 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: status_report
2024-02-08 18:45:38,488 INFO Thread-12 :789 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_184521-kk66dbgv/files/output.log
2024-02-08 18:45:38,842 DEBUG SenderThread:789 [sender.py:send():382] send: exit
2024-02-08 18:45:38,842 INFO SenderThread:789 [sender.py:send_exit():589] handling exit code: 1
2024-02-08 18:45:38,842 INFO SenderThread:789 [sender.py:send_exit():591] handling runtime: 17
2024-02-08 18:45:38,842 INFO SenderThread:789 [sender.py:_save_file():1392] saving file wandb-summary.json with policy end
2024-02-08 18:45:38,842 INFO SenderThread:789 [sender.py:send_exit():597] send defer
2024-02-08 18:45:38,843 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:45:38,843 INFO HandlerThread:789 [handler.py:handle_request_defer():172] handle defer: 0
2024-02-08 18:45:38,843 DEBUG SenderThread:789 [sender.py:send_request():409] send_request: defer
2024-02-08 18:45:38,843 INFO SenderThread:789 [sender.py:send_request_defer():613] handle sender defer: 0
2024-02-08 18:45:38,843 INFO SenderThread:789 [sender.py:transition_state():617] send defer: 1
2024-02-08 18:45:38,843 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:45:38,843 INFO HandlerThread:789 [handler.py:handle_request_defer():172] handle defer: 1
2024-02-08 18:45:38,843 DEBUG SenderThread:789 [sender.py:send_request():409] send_request: defer
2024-02-08 18:45:38,843 INFO SenderThread:789 [sender.py:send_request_defer():613] handle sender defer: 1
2024-02-08 18:45:38,843 INFO SenderThread:789 [sender.py:transition_state():617] send defer: 2
2024-02-08 18:45:38,843 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:45:38,843 INFO HandlerThread:789 [handler.py:handle_request_defer():172] handle defer: 2
2024-02-08 18:45:38,843 INFO HandlerThread:789 [system_monitor.py:finish():203] Stopping system monitor
2024-02-08 18:45:38,844 DEBUG SystemMonitor:789 [system_monitor.py:_start():172] Starting system metrics aggregation loop
2024-02-08 18:45:38,844 INFO HandlerThread:789 [interfaces.py:finish():202] Joined cpu monitor
2024-02-08 18:45:38,844 DEBUG SystemMonitor:789 [system_monitor.py:_start():179] Finished system metrics aggregation loop
2024-02-08 18:45:38,845 INFO HandlerThread:789 [interfaces.py:finish():202] Joined disk monitor
2024-02-08 18:45:38,845 DEBUG SystemMonitor:789 [system_monitor.py:_start():183] Publishing last batch of metrics
2024-02-08 18:45:38,884 INFO HandlerThread:789 [interfaces.py:finish():202] Joined gpu monitor
2024-02-08 18:45:38,884 INFO HandlerThread:789 [interfaces.py:finish():202] Joined memory monitor
2024-02-08 18:45:38,884 INFO HandlerThread:789 [interfaces.py:finish():202] Joined network monitor
2024-02-08 18:45:38,885 DEBUG SenderThread:789 [sender.py:send_request():409] send_request: defer
2024-02-08 18:45:38,885 INFO SenderThread:789 [sender.py:send_request_defer():613] handle sender defer: 2
2024-02-08 18:45:38,885 INFO SenderThread:789 [sender.py:transition_state():617] send defer: 3
2024-02-08 18:45:38,885 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:45:38,885 INFO HandlerThread:789 [handler.py:handle_request_defer():172] handle defer: 3
2024-02-08 18:45:38,885 DEBUG SenderThread:789 [sender.py:send():382] send: stats
2024-02-08 18:45:38,886 DEBUG SenderThread:789 [sender.py:send_request():409] send_request: defer
2024-02-08 18:45:38,886 INFO SenderThread:789 [sender.py:send_request_defer():613] handle sender defer: 3
2024-02-08 18:45:38,886 INFO SenderThread:789 [sender.py:transition_state():617] send defer: 4
2024-02-08 18:45:38,886 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:45:38,886 INFO HandlerThread:789 [handler.py:handle_request_defer():172] handle defer: 4
2024-02-08 18:45:38,886 DEBUG SenderThread:789 [sender.py:send_request():409] send_request: defer
2024-02-08 18:45:38,886 INFO SenderThread:789 [sender.py:send_request_defer():613] handle sender defer: 4
2024-02-08 18:45:38,886 INFO SenderThread:789 [sender.py:transition_state():617] send defer: 5
2024-02-08 18:45:38,886 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:45:38,886 INFO HandlerThread:789 [handler.py:handle_request_defer():172] handle defer: 5
2024-02-08 18:45:38,887 DEBUG SenderThread:789 [sender.py:send():382] send: summary
2024-02-08 18:45:38,888 INFO SenderThread:789 [sender.py:_save_file():1392] saving file wandb-summary.json with policy end
2024-02-08 18:45:38,888 DEBUG SenderThread:789 [sender.py:send_request():409] send_request: defer
2024-02-08 18:45:38,888 INFO SenderThread:789 [sender.py:send_request_defer():613] handle sender defer: 5
2024-02-08 18:45:38,888 INFO SenderThread:789 [sender.py:transition_state():617] send defer: 6
2024-02-08 18:45:38,888 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:45:38,888 INFO HandlerThread:789 [handler.py:handle_request_defer():172] handle defer: 6
2024-02-08 18:45:38,888 DEBUG SenderThread:789 [sender.py:send_request():409] send_request: defer
2024-02-08 18:45:38,888 INFO SenderThread:789 [sender.py:send_request_defer():613] handle sender defer: 6
2024-02-08 18:45:38,893 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: status_report
2024-02-08 18:45:39,019 INFO SenderThread:789 [sender.py:transition_state():617] send defer: 7
2024-02-08 18:45:39,019 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:45:39,019 INFO HandlerThread:789 [handler.py:handle_request_defer():172] handle defer: 7
2024-02-08 18:45:39,019 DEBUG SenderThread:789 [sender.py:send_request():409] send_request: defer
2024-02-08 18:45:39,020 INFO SenderThread:789 [sender.py:send_request_defer():613] handle sender defer: 7
2024-02-08 18:45:39,489 INFO Thread-12 :789 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_184521-kk66dbgv/files/config.yaml
2024-02-08 18:45:39,489 INFO Thread-12 :789 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_184521-kk66dbgv/files/wandb-summary.json
2024-02-08 18:45:39,842 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: poll_exit
2024-02-08 18:45:40,054 INFO SenderThread:789 [sender.py:transition_state():617] send defer: 8
2024-02-08 18:45:40,054 DEBUG SenderThread:789 [sender.py:send_request():409] send_request: poll_exit
2024-02-08 18:45:40,054 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:45:40,055 INFO HandlerThread:789 [handler.py:handle_request_defer():172] handle defer: 8
2024-02-08 18:45:40,055 DEBUG SenderThread:789 [sender.py:send_request():409] send_request: defer
2024-02-08 18:45:40,055 INFO SenderThread:789 [sender.py:send_request_defer():613] handle sender defer: 8
2024-02-08 18:45:40,055 INFO SenderThread:789 [job_builder.py:build():298] Attempting to build job artifact
2024-02-08 18:45:40,056 INFO SenderThread:789 [job_builder.py:_get_source_type():439] no source found
2024-02-08 18:45:40,056 INFO SenderThread:789 [sender.py:transition_state():617] send defer: 9
2024-02-08 18:45:40,056 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:45:40,056 INFO HandlerThread:789 [handler.py:handle_request_defer():172] handle defer: 9
2024-02-08 18:45:40,056 DEBUG SenderThread:789 [sender.py:send_request():409] send_request: defer
2024-02-08 18:45:40,057 INFO SenderThread:789 [sender.py:send_request_defer():613] handle sender defer: 9
2024-02-08 18:45:40,057 INFO SenderThread:789 [dir_watcher.py:finish():358] shutting down directory watcher
2024-02-08 18:45:40,489 INFO Thread-12 :789 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_184521-kk66dbgv/files/output.log
2024-02-08 18:45:40,489 INFO SenderThread:789 [dir_watcher.py:finish():388] scan: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_184521-kk66dbgv/files
2024-02-08 18:45:40,490 INFO SenderThread:789 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_184521-kk66dbgv/files/config.yaml config.yaml
2024-02-08 18:45:40,490 INFO SenderThread:789 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_184521-kk66dbgv/files/requirements.txt requirements.txt
2024-02-08 18:45:40,490 INFO SenderThread:789 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_184521-kk66dbgv/files/conda-environment.yaml conda-environment.yaml
2024-02-08 18:45:40,490 INFO SenderThread:789 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_184521-kk66dbgv/files/wandb-metadata.json wandb-metadata.json
2024-02-08 18:45:40,490 INFO SenderThread:789 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_184521-kk66dbgv/files/output.log output.log
2024-02-08 18:45:40,490 INFO SenderThread:789 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_184521-kk66dbgv/files/wandb-summary.json wandb-summary.json
2024-02-08 18:45:40,493 INFO SenderThread:789 [sender.py:transition_state():617] send defer: 10
2024-02-08 18:45:40,493 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:45:40,493 INFO HandlerThread:789 [handler.py:handle_request_defer():172] handle defer: 10
2024-02-08 18:45:40,502 DEBUG SenderThread:789 [sender.py:send_request():409] send_request: defer
2024-02-08 18:45:40,502 INFO SenderThread:789 [sender.py:send_request_defer():613] handle sender defer: 10
2024-02-08 18:45:40,502 INFO SenderThread:789 [file_pusher.py:finish():175] shutting down file pusher
2024-02-08 18:45:40,709 INFO wandb-upload_1:789 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_184521-kk66dbgv/files/config.yaml
2024-02-08 18:45:40,784 INFO wandb-upload_2:789 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_184521-kk66dbgv/files/conda-environment.yaml
2024-02-08 18:45:40,825 INFO wandb-upload_4:789 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_184521-kk66dbgv/files/wandb-summary.json
2024-02-08 18:45:40,825 INFO wandb-upload_3:789 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_184521-kk66dbgv/files/output.log
2024-02-08 18:45:40,843 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: poll_exit
2024-02-08 18:45:40,843 DEBUG SenderThread:789 [sender.py:send_request():409] send_request: poll_exit
2024-02-08 18:45:40,852 INFO wandb-upload_0:789 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_184521-kk66dbgv/files/requirements.txt
2024-02-08 18:45:41,053 INFO Thread-11 (_thread_body):789 [sender.py:transition_state():617] send defer: 11
2024-02-08 18:45:41,053 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:45:41,053 INFO HandlerThread:789 [handler.py:handle_request_defer():172] handle defer: 11
2024-02-08 18:45:41,053 DEBUG SenderThread:789 [sender.py:send_request():409] send_request: defer
2024-02-08 18:45:41,053 INFO SenderThread:789 [sender.py:send_request_defer():613] handle sender defer: 11
2024-02-08 18:45:41,053 INFO SenderThread:789 [file_pusher.py:join():181] waiting for file pusher
2024-02-08 18:45:41,054 INFO SenderThread:789 [sender.py:transition_state():617] send defer: 12
2024-02-08 18:45:41,054 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:45:41,054 INFO HandlerThread:789 [handler.py:handle_request_defer():172] handle defer: 12
2024-02-08 18:45:41,054 DEBUG SenderThread:789 [sender.py:send_request():409] send_request: defer
2024-02-08 18:45:41,054 INFO SenderThread:789 [sender.py:send_request_defer():613] handle sender defer: 12
2024-02-08 18:45:41,054 INFO SenderThread:789 [file_stream.py:finish():595] file stream finish called
2024-02-08 18:45:41,126 INFO SenderThread:789 [file_stream.py:finish():599] file stream finish is done
2024-02-08 18:45:41,126 INFO SenderThread:789 [sender.py:transition_state():617] send defer: 13
2024-02-08 18:45:41,126 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:45:41,126 INFO HandlerThread:789 [handler.py:handle_request_defer():172] handle defer: 13
2024-02-08 18:45:41,127 DEBUG SenderThread:789 [sender.py:send_request():409] send_request: defer
2024-02-08 18:45:41,127 INFO SenderThread:789 [sender.py:send_request_defer():613] handle sender defer: 13
2024-02-08 18:45:41,127 INFO SenderThread:789 [sender.py:transition_state():617] send defer: 14
2024-02-08 18:45:41,127 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:45:41,127 INFO HandlerThread:789 [handler.py:handle_request_defer():172] handle defer: 14
2024-02-08 18:45:41,127 DEBUG SenderThread:789 [sender.py:send():382] send: final
2024-02-08 18:45:41,127 DEBUG SenderThread:789 [sender.py:send():382] send: footer
2024-02-08 18:45:41,127 DEBUG SenderThread:789 [sender.py:send_request():409] send_request: defer
2024-02-08 18:45:41,127 INFO SenderThread:789 [sender.py:send_request_defer():613] handle sender defer: 14
2024-02-08 18:45:41,128 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: poll_exit
2024-02-08 18:45:41,128 DEBUG SenderThread:789 [sender.py:send_request():409] send_request: poll_exit
2024-02-08 18:45:41,128 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: poll_exit
2024-02-08 18:45:41,129 DEBUG SenderThread:789 [sender.py:send_request():409] send_request: poll_exit
2024-02-08 18:45:41,129 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: server_info
2024-02-08 18:45:41,129 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: get_summary
2024-02-08 18:45:41,130 DEBUG SenderThread:789 [sender.py:send_request():409] send_request: server_info
2024-02-08 18:45:41,131 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: sampled_history
2024-02-08 18:45:41,132 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: internal_messages
2024-02-08 18:45:41,132 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: job_info
2024-02-08 18:45:41,196 DEBUG SenderThread:789 [sender.py:send_request():409] send_request: job_info
2024-02-08 18:45:41,197 INFO MainThread:789 [wandb_run.py:_footer_history_summary_info():3837] rendering history
2024-02-08 18:45:41,197 INFO MainThread:789 [wandb_run.py:_footer_history_summary_info():3869] rendering summary
2024-02-08 18:45:41,197 INFO MainThread:789 [wandb_run.py:_footer_sync_info():3796] logging synced files
2024-02-08 18:45:41,197 DEBUG HandlerThread:789 [handler.py:handle_request():146] handle_request: shutdown
2024-02-08 18:45:41,197 INFO HandlerThread:789 [handler.py:finish():866] shutting down handler
2024-02-08 18:45:42,132 INFO WriterThread:789 [datastore.py:close():294] close: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_184521-kk66dbgv/run-kk66dbgv.wandb
2024-02-08 18:45:42,197 INFO SenderThread:789 [sender.py:finish():1548] shutting down sender
2024-02-08 18:45:42,197 INFO SenderThread:789 [file_pusher.py:finish():175] shutting down file pusher
2024-02-08 18:45:42,197 INFO SenderThread:789 [file_pusher.py:join():181] waiting for file pusher