hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4

Text Generation

text-generation-inference

Inference Endpoints

4-bit precision

num_key_value_heads=16 instead of 8 in the original model

#21 opened 24 days ago by

Fix eos_token and model_max_length in tokenizer_config

#20 opened about 2 months ago by

AshtonIsNotHere

Update README.md

#19 opened 3 months ago by

MironVeryanskiy

Update tokenizer_config.json

#18 opened 3 months ago by

Running on multi-node infrastructure

#17 opened 3 months ago by

Update generation_config

#16 opened 4 months ago by

error when quantizing my finetuned 405b model using autoawq

#13 opened 4 months ago by

Atomheart-Father

Any chance of an AWQ version of the 405B base model?

#12 opened 4 months ago by

lodrick-the-lafted

Cuda failure 1 'invalid argument'

#8 opened 4 months ago by