Llama3 Not Running

#157
by Mattb0124 - opened

Hi, I recently downloaded Llama 3 and am trying to run it in VS Code. I've installed all of the prerequisites, and my computer seems to meet the hardware requirements (a Lenovo ThinkPad with 16 GB of RAM). However, when I execute model.generate, it acts like it is running the model, but nothing is ever generated. My memory usage spikes from 6 GB to 13 GB and hovers there until I restart my computer. Any idea what is going on?
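For reference, here is a quick check I ran before loading (a minimal sketch; psutil is just my choice for reading free RAM, not something transformers requires). Since 8B parameters in bfloat16 are roughly 16 GB of weights alone, I suspect the model may not fit in free memory and is swapping:

import torch
import psutil

# Likely False on a laptop without a dedicated GPU, in which case
# device_map="auto" places the whole model on the CPU.
print("CUDA available:", torch.cuda.is_available())

# Free system RAM before loading; 8e9 params * 2 bytes (bfloat16) is about
# 16 GB for the weights alone, so less free RAM than that means swapping.
free_gb = psutil.virtual_memory().available / 1024**3
print(f"Available RAM: {free_gb:.1f} GB")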

Here is my code:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import os
from dotenv import load_dotenv

load_dotenv()

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# Load the Hugging Face token from the environment variable
huggingface_token = os.getenv("HUGGINGFACE_TOKEN")

tokenizer = AutoTokenizer.from_pretrained(model_id, token=huggingface_token)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate library
    token=huggingface_token,  # use_auth_token is deprecated in recent transformers versions
)
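# Note (my own guess, not confirmed): with no CUDA device visible,
# device_map="auto" places the entire model on the CPU, and 8B parameters
# in bfloat16 are roughly 16 GB of weights, which exceeds the free RAM
# on a 16 GB machine, so loading/generation may be stalling in swap.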

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Explicitly set pad_token to eos_token if no pad token is defined
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=[128001, 128009],
    pad_token_id=tokenizer.pad_token_id,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
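
To see whether any tokens are being produced at all, I also tried attaching a streamer (a minimal sketch using transformers' TextStreamer; everything else is the same setup as above):

from transformers import TextStreamer

# Print tokens to stdout as they are generated, skipping the prompt,
# so slow-but-working generation is distinguishable from a real hang.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    streamer=streamer,
    eos_token_id=[128001, 128009],
    pad_token_id=tokenizer.pad_token_id,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

If tokens trickle out slowly here, it is a memory/speed problem rather than a code problem.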
(Screenshot: Anaconda Prompt (miniconda3) running python ESG_doc_ready_llama.py)

I am having the same issue. Any updates?

Any help would be appreciated!
