from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", device_map="auto", load_in_4bit=True
)
You'll notice two flags in the from_pretrained call:

device_map="auto" automatically places the model's weights on your available GPU(s)

load_in_4bit applies 4-bit quantization via the bitsandbytes library, massively reducing the memory needed to load the model (a more explicit form is sketched below)
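
Note that 4-bit loading depends on the bitsandbytes and accelerate packages being installed. Newer transformers releases deprecate the bare load_in_4bit flag in favor of an explicit quantization config; here is a minimal equivalent sketch (the quant_config variable name is just an illustrative choice):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Explicit 4-bit quantization config, equivalent to load_in_4bit=True
quant_config = BitsAndBytesConfig(load_in_4bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    device_map="auto",                 # place weights on available GPU(s)
    quantization_config=quant_config,  # 4-bit weights via bitsandbytes
)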
There are other ways to initialize a model, but this is a good baseline for getting started with an LLM.
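
As a quick sanity check that the model loaded correctly, here is a minimal generation sketch; the prompt text and max_new_tokens value are illustrative choices, not prescribed by anything above:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Tokenize a prompt and move it to the device the model ended up on
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)

# Generate a short completion with default sampling settings
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))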