403 Forbidden error when accessing the model
from langchain_huggingface import HuggingFaceEndpoint

model_id = "elyza/ELYZA-japanese-Llama-2-7b-instruct"
llm_hub = HuggingFaceEndpoint(
    repo_id=model_id,
    temperature=0.1,
    max_new_tokens=600,
    model_kwargs={"max_length": 600},
)
I am using the above code to load the model. Since the model is larger than my available RAM, I guess it won't be possible to load it locally, so I want to use the Inference API instead.
I am setting the Hugging Face token via os.environ["HUGGINGFACEHUB_API_TOKEN"], but I still get the following error:

requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://api-inference.huggingface.co/models/elyza/ELYZA-japanese-Llama-2-7b-instruct
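For what it's worth, I can build the same request outside LangChain to rule out the wrapper. This is just my reproduction sketch (the actual call is commented out, and the token comes from my environment):

```python
import json
import os
import urllib.request

model_id = "elyza/ELYZA-japanese-Llama-2-7b-instruct"
api_url = f"https://api-inference.huggingface.co/models/{model_id}"
token = os.environ.get("HUGGINGFACEHUB_API_TOKEN", "")

# Same URL and auth header that HuggingFaceEndpoint uses under the hood
req = urllib.request.Request(
    api_url,
    data=json.dumps({"inputs": "Hello"}).encode(),
    headers={"Authorization": f"Bearer {token}"},
)
# urllib.request.urlopen(req)  # raises HTTPError 403, matching the LangChain error
print(req.full_url)
```

So the 403 comes straight from the Inference API endpoint itself, not from anything LangChain adds.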
The same code works for other large models. I even tried changing the access token type from Inference to Read and then Write, but that did not work either.
Does this have something to do with my Hugging Face plan? Can anyone please help me with this?