Idefics3-8B-Llama3-bnb_nf4
BitsAndBytes NF4 quantization.
Quantization
Quantization created with:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
model_id = "HuggingFaceM4/Idefics3-8B-Llama3"
nf4_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_use_double_quant=True,
llm_int8_enable_fp32_cpu_offload=True,
llm_int8_skip_modules=["lm_head", "model.vision_model", "model.connector"],
)
model_nf4 = AutoModelForVision2Seq.from_pretrained(model_id, quantization_config=nf4_config)
- Downloads last month
- 107
Inference API (serverless) does not yet support transformers models for this pipeline type.
Model tree for thwin27/Idefics3-8B-Llama3-bnb_nf4
Base model
HuggingFaceM4/Idefics3-8B-Llama3