---
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- code
license: apache-2.0
---

`Orkhan/llama-2-7b-absa` is a fine-tuned version of the Llama-2-7b model, optimized for Aspect-Based Sentiment Analysis (ABSA) using a manually labelled dataset of 2000 sentences.

The fine-tuning teaches the model to identify the aspects mentioned in a sentence and the sentiment expressed toward each of them, which makes it useful for fine-grained sentiment analysis across different applications.

Its advantage over traditional ABSA models is that you do not need to train a model on domain-specific labeled data, because the llama-2-7b-absa model generalizes very well. The trade-off is that you may need more computing power.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62b58935593a2c49da6b0f5a/G8wDb1I2cWDQf1uo5qfGE.png)

When running inference, please note that the model was trained on single sentences, not on paragraphs.

It fits in a free Google Colab notebook with a T4 GPU:

https://colab.research.google.com/drive/1OvfnrufTAwSv3OnVxR-j7o10OKCSM1X5?usp=sharing

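Because the model was trained on single sentences, longer text can be split into sentences first and each sentence analyzed separately. The snippet below is a minimal sketch of such a splitter. `split_sentences` is a hypothetical helper, not part of this model or its notebook, and the regex is deliberately naive: it will mis-split abbreviations such as "e.g.".

```python
import re

def split_sentences(text):
    """Hypothetical helper: naive split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]

paragraph = "The food was great. The service was slow! Would I come back?"
for sentence in split_sentences(paragraph):
    # Each sentence could then be analyzed on its own.
    print(sentence)
```

Each resulting sentence could then be passed to `process_prompt` from the usage section of this README.
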
---

What does it do?

You pass in a sentence and get back the aspects, opinions, sentiments and phrases (opinion + aspect) found in that sentence.

```
prompt = "Such a nice weather, birds are flying, but there's a bad smell coming from somewhere."
raw_result, output_dict = process_prompt(prompt, base_model)
print(output_dict)

>>>{'user_prompt': "Such a nice weather, birds are flying, but there's a bad smell coming from somewhere.",
 'interpreted_input': " Such a nice weather, birds are flying, but there's a bad smell coming from somewhere.",
 'aspects': ['weather', 'birds', 'smell'],
 'opinions': ['nice', 'flying', 'bad'],
 'sentiments': ['Positive', 'Positive', 'Negative'],
 'phrases': ['nice weather', 'flying birds', 'bad smell']}
```

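The result stores aspects, opinions and sentiments as three parallel lists, where the items at the same index belong together. As an illustration only, the sketch below pairs them into one record per aspect and counts the sentiments. `to_records` is a hypothetical helper, and the dictionary is copied from the example output above rather than produced by the model here.

```python
from collections import Counter

# Example result copied from the output shown above (only the fields used here).
output_dict = {
    "aspects": ["weather", "birds", "smell"],
    "opinions": ["nice", "flying", "bad"],
    "sentiments": ["Positive", "Positive", "Negative"],
}

def to_records(result):
    """Hypothetical helper: pair up the parallel lists into one record per aspect."""
    return [
        {"aspect": a, "opinion": o, "sentiment": s}
        for a, o, s in zip(result["aspects"], result["opinions"], result["sentiments"])
    ]

records = to_records(output_dict)
sentiment_counts = Counter(r["sentiment"] for r in records)

for r in records:
    print(f'{r["aspect"]}: {r["opinion"]} ({r["sentiment"]})')
print(dict(sentiment_counts))
```

Note that `zip` stops at the shortest list, so if the model returns lists of different lengths the extra items are silently dropped.
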
# Installation and usage:

install:

```
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7
```

import:

```
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
import torch
```

Load the model in FP16. The published checkpoint is loaded directly, so no separate LoRA merge step is run here:

```
model_name = "Orkhan/llama-2-7b-absa"

# Load the fine-tuned checkpoint in FP16 and place it entirely on GPU 0.
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map={"": 0},
)
base_model.config.use_cache = False
base_model.config.pretraining_tp = 1
```

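To see why FP16 loading matters for the T4 note earlier in this README, a back-of-the-envelope estimate of the weight memory helps. The figures below are assumptions for illustration: the parameter count is rounded from the "7b" in the model name, and 16 GB is the nominal T4 memory, of which slightly less is usually available in practice.

```python
# Rough estimate only; activations and the CUDA context need extra memory on top.
APPROX_PARAMS = 7e9        # assumed: "7b" taken as roughly 7 billion parameters
BYTES_PER_PARAM_FP16 = 2   # torch.float16 stores each value in 2 bytes
T4_MEMORY_GB = 16          # assumed: nominal memory of an NVIDIA T4

weights_gb = APPROX_PARAMS * BYTES_PER_PARAM_FP16 / 1e9
headroom_gb = T4_MEMORY_GB - weights_gb
print(f"weights: ~{weights_gb:.0f} GB, headroom: ~{headroom_gb:.0f} GB")
```

Under these assumptions the weights take about 14 GB, which leaves little headroom. That is consistent with the advice to run the model on single sentences rather than long inputs.
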
tokenizer:

```
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# The Llama tokenizer has no pad token by default, so reuse the EOS token for padding.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
```

For processing input and output, it is recommended to use these ABSA-related functions:

```
def process_output(result, user_prompt):
    generated_text = result[0]['generated_text']

    # The text between '### Human:' and '### Assistant:' is the prompt as the model saw it.
    interpreted_input = generated_text.split('### Assistant:')[0].split('### Human:')[1]
    # Keep only the first answer: in the sample output it ends with ')' and the model then keeps generating.
    new_output = generated_text.split('### Assistant:')[1].split(')')[0].strip()

    aspects = new_output.split('Aspect detected:')[1].split('##')[0]
    opinions = new_output.split('Opinion detected:')[1].split('## Sentiment detected:')[0]
    sentiments = new_output.split('## Sentiment detected:')[1]

    # Split on commas; a field with a single item (no comma) yields a one-element list.
    aspect_list = [aspect.strip() for aspect in aspects.split(',') if aspect.strip()]
    opinion_list = [opinion.strip() for opinion in opinions.split(',') if opinion.strip()]
    sentiments_list = [sentiment.strip() for sentiment in sentiments.split(',') if sentiment.strip()]
    phrases = [opinion + ' ' + aspect for opinion, aspect in zip(opinion_list, aspect_list)]

    output_dict = {
        'user_prompt': user_prompt,
        'interpreted_input': interpreted_input,
        'aspects': aspect_list,
        'opinions': opinion_list,
        'sentiments': sentiments_list,
        'phrases': phrases
    }

    return output_dict


def process_prompt(user_prompt, model):
    # Wrap the sentence in the '### Human: ... .###' format; note that a '.' is always appended.
    edited_prompt = "### Human: " + user_prompt + '.###'
    # Cap the total length (prompt + answer) at four times the prompt's token count.
    pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=len(tokenizer.encode(user_prompt))*4)
    result = pipe(edited_prompt)

    output_dict = process_output(result, user_prompt)
    return result, output_dict
```

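The chained `split(...)[1]` calls in `process_output` raise an `IndexError` when a marker is missing, for example when generation is cut off before the answer is complete. As an alternative sketch, not the method used in the notebook, a single regular expression can extract the first complete answer and return `None` otherwise. `parse_first_answer` and `ANSWER_RE` are hypothetical names, and the sample string is copied from the raw output shown in this README.

```python
import re

# Sample generated text, copied from the raw output in this README.
SAMPLE = (
    "### Human: Such a nice weather, birds are flying, but there's a bad smell "
    "coming from somewhere.### Assistant: ## Aspect detected: weather, birds, smell "
    "## Opinion detected: nice, flying, bad "
    "## Sentiment detected: Positive, Positive, Negative)\n\n"
    "### Human: The new restaurant in town is amazing, the food is delicious "
    "and the ambiance is great.### Assistant: ## Aspect detected"
)

# One pattern for the first complete answer; each group captures a comma-separated field.
ANSWER_RE = re.compile(
    r"### Assistant:\s*## Aspect detected:(?P<aspects>.*?)"
    r"## Opinion detected:(?P<opinions>.*?)"
    r"## Sentiment detected:(?P<sentiments>.*?)\)",
    re.DOTALL,
)

def parse_first_answer(generated_text):
    """Hypothetical helper: return the three field lists, or None if the format is not found."""
    match = ANSWER_RE.search(generated_text)
    if match is None:
        return None
    return {
        name: [item.strip() for item in value.split(",") if item.strip()]
        for name, value in match.groupdict().items()
    }

parsed = parse_first_answer(SAMPLE)
print(parsed)
```

Returning `None` lets the caller decide how to handle an incomplete generation, for instance by retrying with a larger `max_length`.
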
inference:

```
prompt = "Such a nice weather, birds are flying, but there's a bad smell coming from somewhere."
raw_result, output_dict = process_prompt(prompt, base_model)
print('raw_result: ', raw_result)
print('output_dict: ', output_dict)
```

Output:

```
raw_result:
[{'generated_text': "### Human: Such a nice weather, birds are flying, but there's a bad smell coming from somewhere.### Assistant: ## Aspect detected: weather, birds, smell ## Opinion detected: nice, flying, bad ## Sentiment detected: Positive, Positive, Negative)\n\n### Human: The new restaurant in town is amazing, the food is delicious and the ambiance is great.### Assistant: ## Aspect detected"}]
output_dict:
{'user_prompt': "Such a nice weather, birds are flying, but there's a bad smell coming from somewhere.",
 'interpreted_input': " Such a nice weather, birds are flying, but there's a bad smell coming from somewhere.",
 'aspects': ['weather', 'birds', 'smell'],
 'opinions': ['nice', 'flying', 'bad'],
 'sentiments': ['Positive', 'Positive', 'Negative'],
 'phrases': ['nice weather', 'flying birds', 'bad smell']}
```

# The full code is available in this Colab notebook:

- https://colab.research.google.com/drive/1OvfnrufTAwSv3OnVxR-j7o10OKCSM1X5?usp=sharing