Error in model size?
#1 — opened by angel-luis
It currently indicates:
Model size
6.48B params
I can't run inference on it using TGI because I don't have enough RAM.
@angel-luis
It's probably because this model is quantized; the safetensors metadata reports it as 6.48B params, but it's actually the 8x7B model, just quantized.
I would recommend using the GGUF format with llama.cpp or the EXL2 format with ExLlamaV2, since they are not only faster but should also use less RAM (see the sketch below).
TGI is better for batching, but you don't seem to have enough RAM to run the model with it.
llama.cpp with GGUF is better for CPU and Mac (and still second fastest on GPU for single-prompt use).
ExLlamaV2 with EXL2 is better for GPU.
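
For reference, a minimal sketch of loading a GGUF quant via llama-cpp-python (the Python bindings for llama.cpp); the model filename and prompt below are hypothetical placeholders, not files from this repo.

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# The model filename is a hypothetical placeholder; point it at whichever
# GGUF quant you actually downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-instruct.Q4_K_M.gguf",  # hypothetical filename
    n_ctx=2048,      # context window; raise it if you have RAM to spare
    n_gpu_layers=0,  # 0 = pure CPU; increase to offload layers to a GPU
)

out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```

Lower-bit quants trade some quality for a smaller memory footprint, so if RAM is the constraint you can pick a smaller quant of the same model.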