This repository (uisikdag/SmolVLM-Instruct-4bit-bitsnbytes-nf4) contains a 4-bit NF4-quantized version of HuggingFaceTB/SmolVLM-Instruct. The code used to generate the quantized model is shown below.

import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

# NF4 4-bit quantization with nested (double) quantization;
# compute is carried out in bfloat16.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_nf4 = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceTB/SmolVLM-Instruct",
    quantization_config=nf4_config,
)
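
The quantized weights can then be saved and uploaded. A minimal sketch, assuming a transformers/bitsandbytes combination recent enough to serialize 4-bit weights (the target repo name matches this repository):

# Serialize the 4-bit weights locally and/or push them to the Hub.
# Requires 4-bit serialization support in transformers/bitsandbytes.
model_nf4.save_pretrained("SmolVLM-Instruct-4bit-bitsnbytes-nf4")
model_nf4.push_to_hub("uisikdag/SmolVLM-Instruct-4bit-bitsnbytes-nf4")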
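Once loaded, the quantized model is used like the original. Below is a minimal inference sketch following the standard SmolVLM processor/chat-template pattern; the image URL and prompt are illustrative placeholders:

import requests
from PIL import Image
from transformers import AutoProcessor

# The processor comes from the base model; it is not affected by quantization.
processor = AutoProcessor.from_pretrained("HuggingFaceTB/SmolVLM-Instruct")

# Placeholder image; any PIL image works here.
image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model_nf4.device)

generated_ids = model_nf4.generate(**inputs, max_new_tokens=200)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])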