Llama can now see and run on your device - welcome Llama 3.2
•
163
SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")
. Does your model not have an ONNX or OpenVINO file yet? No worries - it'll be autoexported for you. Thank me later 😉from_model2vec
or with from_distillation
where you do the distillation yourself. It'll only take 5 seconds on GPU & 2 minutes on CPU, no dataset needed.