13

Iwan Kawrakow

ikawrakow

AI & ML interests

None yet

Recent Activity

replied to bartowski's post 2 months ago

Decided to check how many weights in a 70B F32 model would be squashed when converted to F16 (spoiler: shockingly few).

The reason for this comparison is that it should represent the same percentage of squishing as BF16 to FP16.

I had Claude make me a script, ran it on the new Reflection-70B, and these are the results:

Total weights: 70553706496
Fully representable: 70530215524
Squashed: 23490972
Percentage squashed: 0.03%

0.03%!!!!

A couple of things to note: this uses a roundtrip of F32 -> F16 -> F32 and then torch.isclose to account for rounding errors that come up by the very nature of extremely accurate numbers, but it uses VERY small tolerances (rtol=1e-5, atol=1e-8).

This also examines EVERY weight that was stored at F32. Most layers had somewhere between 0% and 0.03% of their weights squashed, with no major outliers.

Overall, I feel even safer converting to F16 for llama.cpp. The extremely small number of weights that fall outside the range are likely so small that they don't actually play a role in the final output of the model at inference anyway.
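The roundtrip check described in the post can be sketched roughly as follows. This is not the script from the post: it is a minimal standard-library illustration that uses Python floats (F64) instead of F32 tensors, the `struct` half-precision format instead of a torch cast, and a hand-written tolerance test of the same form as `torch.isclose`. The helper names `roundtrip_f16`, `is_squashed`, and `squashed_fraction` are hypothetical, and taking the original value as the reference in the tolerance test is an assumption.

```python
import struct


def roundtrip_f16(x: float) -> float:
    """Round x to IEEE 754 half precision (F16) and convert it back."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses values beyond the F16 range (max finite value 65504);
        # treat them as overflowing to infinity.
        return float("inf") if x > 0 else float("-inf")


def is_squashed(x: float, rtol: float = 1e-5, atol: float = 1e-8) -> bool:
    """True if the F16 roundtrip moves x outside the isclose-style tolerance."""
    y = roundtrip_f16(x)
    # Same form as the torch.isclose criterion: |y - x| <= atol + rtol * |x|,
    # taking the original value x as the reference (an assumption here).
    return not (abs(y - x) <= atol + rtol * abs(x))


def squashed_fraction(values) -> float:
    """Fraction of values that do not survive the roundtrip (hypothetical helper)."""
    values = list(values)
    return sum(is_squashed(v) for v in values) / len(values)


print(squashed_fraction([0.5, 0.25, 0.1, 1e5]))
```

In this toy input, 0.5 and 0.25 are exactly representable in F16, 0.1 is rounded by more than the tolerance allows, and 1e5 exceeds the F16 range, so half of the values count as squashed.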

Organizations

None yet

models 9

ikawrakow/mixtral-instruct-8x7b-quantized-gguf

Updated Feb 1 • 94 • 22

ikawrakow/mixtral-8x7b-quantized-gguf

Updated Jan 31 • 112 • 7

ikawrakow/qwen-14b-chat-gguf

Updated Jan 12 • 17 • 5

ikawrakow/various-2bit-sota-gguf

Updated Jan 10 • 325 • 80

ikawrakow/open-hermes-2.5-mistral-7b-quantized-gguf

Updated Jan 8 • 42 • 3

ikawrakow/mistral-instruct-7b-quantized-gguf

Updated Jan 8 • 12

ikawrakow/mistral-7b-quantized-gguf

Updated Dec 7, 2023 • 45 • 5

ikawrakow/llama-v1-2bit-gguf

Updated Dec 7, 2023 • 15 • 1

ikawrakow/llama-v2-2bit-gguf

Updated Dec 6, 2023 • 12 • 8

datasets 3

ikawrakow/validation-datasets-for-llama.cpp

Updated Mar 11 • 247 • 14

ikawrakow/winogrande-eval-for-llama.cpp

Viewer • Updated Jan 18 • 1.27k • 54 • 1

ikawrakow/imatrix-from-wiki-train

Updated Jan 14 • 92 • 13