How to speed up model generation

#13
by LiPengtao12138 - opened


No description provided.
Meta Llama org

@LiPengtao12138 please make sure you are using a GPU during inference

What I like to do is explicitly set the device to "cuda", e.g. device = torch.device("cuda") and then model.to(device), where model is an instance of, say, AutoModelForCausalLM (I'm using Hugging Face's transformers library).
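A minimal sketch of that device placement. Note this uses a tiny stand-in torch.nn.Linear module instead of an actual Llama checkpoint (loading one would require downloading weights), and adds a CPU fallback so the script still runs on machines without a GPU:

```python
import torch

# Prefer the GPU when available; fall back to CPU so the script still runs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for a real model. In practice you would instead do:
#   model = AutoModelForCausalLM.from_pretrained("<your-checkpoint>")
model = torch.nn.Linear(4, 2).to(device)

# Inputs must live on the same device as the model.
x = torch.randn(1, 4, device=device)
out = model(x)
print(out.shape)  # torch.Size([1, 2])
```

The same pattern applies to a real AutoModelForCausalLM: call .to(device) on the model once after loading, and make sure the tokenized input tensors are moved to the same device before calling generate.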
