Inference
#1 by GaneshMystic - opened
Inference on Colab for testing
Hi,
Can you share the inference code that you use?
I am trying to compare the performance of AWQ-quantized Llama 3.1 on both llama.cpp and ONNX Runtime.
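For context, here is a minimal sketch of the kind of side-by-side harness I have in mind. It assumes the llama-cpp-python and onnxruntime-genai packages; the model paths are placeholders, and the onnxruntime-genai calls follow its older 0.2-style API, which changed in later releases:

```python
import time

from llama_cpp import Llama
import onnxruntime_genai as og

PROMPT = "Explain quantization in one sentence."
MAX_TOKENS = 64

# --- llama.cpp side: load a GGUF file (path is a placeholder) ---
llm = Llama(model_path="llama-3.1-8b-awq.gguf", n_ctx=2048, verbose=False)
t0 = time.perf_counter()
out = llm(PROMPT, max_tokens=MAX_TOKENS)
t1 = time.perf_counter()
n_new = out["usage"]["completion_tokens"]
print(f"llama.cpp: {n_new / (t1 - t0):.1f} tok/s")
print(out["choices"][0]["text"])

# --- ONNX Runtime GenAI side: load an exported model folder (placeholder path) ---
model = og.Model("llama-3.1-8b-onnx")
tokenizer = og.Tokenizer(model)
params = og.GeneratorParams(model)
params.set_search_options(max_length=MAX_TOKENS)  # max_length counts prompt + new tokens
params.input_ids = tokenizer.encode(PROMPT)  # 0.2-style API; newer versions use Generator.append_tokens
t0 = time.perf_counter()
output_tokens = model.generate(params)
t1 = time.perf_counter()
# Rough throughput: the returned sequence includes the prompt tokens as well.
print(f"ONNX Runtime GenAI: {len(output_tokens[0]) / (t1 - t0):.1f} tok/s")
print(tokenizer.decode(output_tokens[0]))
```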
Hi,
Do you know how to get Llama 3.1 8B in f16 with the Attention node? How did you convert the model so that it ends up with the Attention node?
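In case it is useful, the only conversion route I have come across is the onnxruntime-genai model builder; that is an assumption on my part, not something confirmed here. As far as I know it emits fused GroupQueryAttention (or MultiHeadAttention) nodes rather than a plain Attention node, and the flags below may differ between versions:

```
# model id, output dir, precision, and provider below are placeholders; check --help
python -m onnxruntime_genai.models.builder \
    -m meta-llama/Llama-3.1-8B \
    -o ./llama-3.1-8b-onnx \
    -p fp16 \
    -e cuda
```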
Thank you