Inference
#1 by GaneshMystic - opened
Inference on Colab for testing
Hi,
Can you share the inference code that you use?
I am trying to compare the performance of AWQ-quantized Llama 3.1 on both llama.cpp and ONNX Runtime.
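For context, here is a minimal sketch of the kind of side-by-side harness I have in mind. It assumes the llama-cpp-python and onnxruntime-genai packages; the model paths are placeholders, and the onnxruntime-genai calls follow its older 0.2-style API, which changed in later releases:

```python
import time

from llama_cpp import Llama
import onnxruntime_genai as og

PROMPT = "Explain quantization in one sentence."
MAX_TOKENS = 64

# --- llama.cpp side: load a GGUF file (path is a placeholder) ---
llm = Llama(model_path="llama-3.1-8b-awq.gguf", n_ctx=2048, verbose=False)
t0 = time.perf_counter()
out = llm(PROMPT, max_tokens=MAX_TOKENS)
t1 = time.perf_counter()
n_new = out["usage"]["completion_tokens"]
print(f"llama.cpp: {n_new / (t1 - t0):.1f} tok/s")
print(out["choices"][0]["text"])

# --- ONNX Runtime GenAI side: load an exported model folder (placeholder path) ---
model = og.Model("llama-3.1-8b-onnx")
tokenizer = og.Tokenizer(model)
params = og.GeneratorParams(model)
params.set_search_options(max_length=MAX_TOKENS)  # max_length counts prompt + new tokens
params.input_ids = tokenizer.encode(PROMPT)  # 0.2-style API; newer versions use Generator.append_tokens
t0 = time.perf_counter()
output_tokens = model.generate(params)
t1 = time.perf_counter()
# Rough throughput: the returned sequence includes the prompt tokens as well.
print(f"ONNX Runtime GenAI: {len(output_tokens[0]) / (t1 - t0):.1f} tok/s")
print(tokenizer.decode(output_tokens[0]))
```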
Hi,
Do you know how to get Llama 3.1 8B in f16 with the Attention node? How did you convert the model so that it ends up with the Attention node?
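In case it is useful, the only conversion route I have come across is the onnxruntime-genai model builder; that is an assumption on my part, not something confirmed here. As far as I know it emits fused GroupQueryAttention (or MultiHeadAttention) nodes rather than a plain Attention node, and the flags below may differ between versions:

```
# model id, output dir, precision, and provider below are placeholders; check --help
python -m onnxruntime_genai.models.builder \
    -m meta-llama/Llama-3.1-8B \
    -o ./llama-3.1-8b-onnx \
    -p fp16 \
    -e cuda
```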
Thank you