# code
https://huggingface.co/vincentoh/llama3_70b_no_robot_fsdp_qlora
# model
wget "https://huggingface.co/vincentoh/llama3-70b-GGUF/resolve/main/vincentoh/llama3-70b-GGUF"
# memory usage
Thu May 16 15:53:07 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA H100 PCIe               On  | 00000000:08:00.0 Off |                    0 |
| N/A   37C    P0              76W / 350W |  40441MiB / 81559MiB |     24%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     17735      C   ./main                                    40428MiB |
+---------------------------------------------------------------------------------------+
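To put the memory figure in context, the sketch below works out how much of the card's memory the `./main` run occupies. It assumes the `used / total` field is copied by hand from the nvidia-smi output above; `parse_memory` is a hypothetical helper written for this note and not part of any NVIDIA tool.

```python
import re

# Memory-Usage field copied from the nvidia-smi output above.
MEMORY_FIELD = "40441MiB / 81559MiB"


def parse_memory(field: str) -> tuple[int, int]:
    """Split a 'usedMiB / totalMiB' field into two integers (MiB)."""
    match = re.fullmatch(r"\s*(\d+)MiB\s*/\s*(\d+)MiB\s*", field)
    if match is None:
        raise ValueError(f"unexpected memory field: {field!r}")
    used, total = match.groups()
    return int(used), int(total)


used_mib, total_mib = parse_memory(MEMORY_FIELD)
used_fraction = used_mib / total_mib   # share of the card in use
free_mib = total_mib - used_mib        # headroom left on the card

print(f"used: {used_mib} MiB ({used_fraction:.1%}), free: {free_mib} MiB")
```

On these numbers the model takes roughly half of the H100's memory, which leaves room for a larger context window or a second process.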
# token speed
<|begin_of_text|>Why is the sky blue? The sky is blue due to a phenomenon called Rayleigh scattering. This scattering refers to the scattering of electromagnetic radiation (light) by particles much smaller than the wavelength of the light. The short-wavelength blue light is scattered more than the other colors of visible light, resulting in more blue light reaching the observer than the other colors of light.<|end_of_text|> [end of text]
llama_print_timings: load time = 6244.37 ms
llama_print_timings: sample time = 4.39 ms / 69 runs ( 0.06 ms per token, 15710.38 tokens per second)
llama_print_timings: prompt eval time = 90.86 ms / 7 tokens ( 12.98 ms per token, 77.05 tokens per second)
llama_print_timings: eval time = 2334.73 ms / 68 runs ( 34.33 ms per token, 29.13 tokens per second)
llama_print_timings: total time = 2486.72 ms / 75 tokens
Log end
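The tokens-per-second figures in the log can be re-derived from the reported durations and counts. The sketch below is a minimal illustration of that arithmetic, assuming the `llama_print_timings` line format shown above; the `throughput` helper and its regular expression are hypothetical, not part of llama.cpp.

```python
import re

# Timing lines copied from the log above.
LOG = """\
llama_print_timings: prompt eval time = 90.86 ms / 7 tokens ( 12.98 ms per token, 77.05 tokens per second)
llama_print_timings: eval time = 2334.73 ms / 68 runs ( 34.33 ms per token, 29.13 tokens per second)
"""

# Captures the phase label, the duration in ms, and the token/run count.
PATTERN = re.compile(
    r"llama_print_timings:\s+(.+?) time\s*=\s*([\d.]+) ms\s*/\s*(\d+) (?:tokens|runs)"
)


def throughput(log: str) -> dict[str, float]:
    """Return tokens per second for each timed phase found in the log."""
    rates = {}
    for label, ms, count in PATTERN.findall(log):
        rates[label] = int(count) / (float(ms) / 1000.0)
    return rates


rates = throughput(LOG)
for label, rate in rates.items():
    print(f"{label}: {rate:.2f} tokens/s")
```

The recomputed values can differ from the logged ones in the last digit, since the log rounds the durations to two decimals before printing. The `eval` rate (about 29 tokens per second) is the generation speed; `prompt eval` covers only the 7 prompt tokens.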