Error using model when deployed on Inference Endpoints
I fine-tuned the Falcon 7B model on Google Colab. Since I wanted to use it in an application and create an endpoint, I tried deploying it on SageMaker. That gave me an error where it couldn't recognise "falcon". I tried multiple ways but couldn't succeed.
I decided to move on to Inference Endpoints for deployment. I started by deploying the base model directly as a check, but I am getting this error there:
module 'torch.nn.functional' has no attribute 'scaled_dot_product_attention'
I don't know how to solve this; can someone please guide me?
Hello, I don't have experience with SageMaker, but 'scaled_dot_product_attention' is a PyTorch 2.0 feature.
I know AWS supports Text Generation Inference (TGI); you can try it on RunPod too.
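As a quick sanity check, you could confirm which side of the PyTorch 2.0 boundary a runtime sits on before deploying. This is a minimal sketch (the version strings are illustrative examples; the only hard requirement is torch >= 2.0, since `scaled_dot_product_attention` first shipped in PyTorch 2.0):

```python
def has_sdpa(torch_version: str) -> bool:
    """True if this torch version ships torch.nn.functional.scaled_dot_product_attention.

    The function was introduced in PyTorch 2.0, so any 1.x runtime
    (common in older inference containers) raises the
    "has no attribute 'scaled_dot_product_attention'" error.
    """
    # Strip local build suffixes like "+cu117" before comparing the major version.
    major = int(torch_version.split("+")[0].split(".")[0])
    return major >= 2

# Inside a live runtime the equivalent check is simply:
#   import torch.nn.functional as F
#   hasattr(F, "scaled_dot_product_attention")
print(has_sdpa("1.13.1+cu117"))  # False -> the error seen on the endpoint
print(has_sdpa("2.0.1"))         # True
```

If this returns False for the container your endpoint uses, the fix is an image (or dependency pin) with torch 2.0 or newer, not a change to the model itself.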
But how can I resolve the PyTorch 2.0 feature issue? I am uploading the model directly without any changes just to check the deployment, so I have nowhere to interact with PyTorch.
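One way to influence the runtime without touching any code might be dependency pins in the model repository. This is a hedged sketch: Hugging Face Inference Endpoints install a `requirements.txt` from the repo when a custom handler is used, and whether your particular deployment path picks it up is an assumption. The `transformers>=4.33` pin reflects that native Falcon support landed around that release; verify the exact versions against your setup.

```text
# requirements.txt (sketch) -- pin a runtime new enough for
# scaled_dot_product_attention and the falcon architecture
torch>=2.0
transformers>=4.33
```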
I would try asking the people at Hugging Face, since it is their service.
Which platform would you suggest for deployment?
My PC is not powerful enough for this model.
For tests, try it on Google Colab or Kaggle kernels;
for production, use a cloud provider.
Sorry for the delay.