All responses come back as "!!!!!..." repeated like 100 times
#10 opened 14 days ago by jamie-de
Inference speed of the INT8 quantized model is slower than the non-quantized version
#9 opened 4 months ago by fliu1998
Access request FAQ
#8 opened 4 months ago by samuelselvan
Anyone able to run this on vLLM?
#7 opened 4 months ago by xfalcox