I am trying to load the vicuna-7B-1.1-HF model on an EC2 instance with an A10G GPU and CUDA build cuda_11.7.r11.7/compiler.31442593_0. The model loads fine, but the load time is high: 100-120 sec. Any suggestions on how to reduce it?
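To make the number easier to compare across setups, the load could be timed roughly like this. This is only a minimal sketch: `time_call` is a hypothetical helper (not part of any library), and the stand-in workload below would be swapped for the actual model-loading call.

```python
import time


def time_call(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds).

    Hypothetical helper for illustration; uses wall-clock time, so it
    includes disk reads and any host-to-GPU transfer done inside fn.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Stand-in workload; replace with the real loader call.
    _, seconds = time_call(sum, range(1_000_000))
    print(f"call took {seconds:.3f} s")
```

With Transformers installed, the same helper could wrap the loader, e.g. `time_call(AutoModelForCausalLM.from_pretrained, model_path)`, where `model_path` is whatever local path or repo id is being loaded. Timing a second run separately would show how much of the 100-120 sec is a cold disk read.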