	---
	language:
	- en
	library_name: transformers
	pipeline_tag: text-generation
	tags:
	- code
	license: apache-2.0
	---
	`Orkhan/llama-2-7b-absa` is a fine-tuned version of the Llama-2-7b model for Aspect-Based Sentiment Analysis (ABSA), trained on a manually labelled dataset of 2000 sentences.
	Given a sentence, it identifies the aspects mentioned and the sentiment expressed toward each of them.
	Its advantage over traditional ABSA models is that you do not need to train a model on domain-specific labelled data, because llama-2-7b-absa generalizes very well. The trade-off is that it may need more computing power.


	![image/png](https://cdn-uploads.huggingface.co/production/uploads/62b58935593a2c49da6b0f5a/G8wDb1I2cWDQf1uo5qfGE.png)

	At inference time, note that the model was trained on single sentences, not on paragraphs.
	It fits in a free Google Colab notebook with a T4 GPU:
	https://colab.research.google.com/drive/1OvfnrufTAwSv3OnVxR-j7o10OKCSM1X5?usp=sharing

	---

	What does it do?
	You pass in a sentence and get back the aspects, opinions, sentiments and phrases (opinion + aspect) found in it.
	```
	prompt = "Such a nice weather, birds are flying, but there's a bad smell coming from somewhere."
	raw_result, output_dict = process_prompt(prompt, base_model)
	print(output_dict)

	# Output:
	{'user_prompt': "Such a nice weather, birds are flying, but there's a bad smell coming from somewhere.",
	 'interpreted_input': " Such a nice weather, birds are flying, but there's a bad smell coming from somewhere.",
	 'aspects': ['weather', 'birds', 'smell'],
	 'opinions': ['nice', 'flying', 'bad'],
	 'sentiments': ['Positive', 'Positive', 'Negative'],
	 'phrases': ['nice weather', 'flying birds', 'bad smell']}
	```
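The lists in `output_dict` are aligned by position, so the first aspect belongs with the first opinion and the first sentiment. A minimal sketch of how the result could be consumed downstream follows; `to_triples` is a hypothetical helper that is not part of the original notebook, and the sample values are copied from the example output above.

```python
# Hypothetical helper, not part of the original notebook.
def to_triples(output_dict):
    """Pair each aspect with its opinion and sentiment by list position."""
    return list(
        zip(output_dict["aspects"], output_dict["opinions"], output_dict["sentiments"])
    )


# Values copied from the example output above.
sample = {
    "aspects": ["weather", "birds", "smell"],
    "opinions": ["nice", "flying", "bad"],
    "sentiments": ["Positive", "Positive", "Negative"],
}

triples = to_triples(sample)
# Keep only the aspects that were labelled Negative.
negative_aspects = [aspect for aspect, _, sentiment in triples if sentiment == "Negative"]

for aspect, opinion, sentiment in triples:
    print(f"{aspect}: {opinion} ({sentiment})")
print(negative_aspects)
```

Because `zip` stops at the shortest list, a malformed answer with mismatched list lengths is silently truncated rather than raising an error.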

	# Installation and usage:

	install:

	```
	!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7
	```

	import:
	```
	from transformers import (
	    AutoModelForCausalLM,
	    AutoTokenizer,
	    BitsAndBytesConfig,
	    HfArgumentParser,
	    TrainingArguments,
	    pipeline,
	    logging,
	)
	from peft import LoraConfig, PeftModel
	import torch
	```

	Load the model in FP16. The code below loads the repository checkpoint directly and does not run a separate LoRA merge step:
	```
	model_name = "Orkhan/llama-2-7b-absa"
	# Load the model in FP16
	base_model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    low_cpu_mem_usage=True,
	    return_dict=True,
	    torch_dtype=torch.float16,
	    device_map={"": 0},  # place the whole model on GPU 0
	)
	base_model.config.use_cache = False
	base_model.config.pretraining_tp = 1
	```

	tokenizer:
	```
	tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
	tokenizer.pad_token = tokenizer.eos_token
	tokenizer.padding_side = "right"
	```

	To process the input and output, it is recommended to use these ABSA-related functions:
	```
	def process_output(result, user_prompt):
	    # 'generated_text' holds the prompt followed by the model's completion.
	    generated_text = result[0]['generated_text']
	    interpreted_input = generated_text.split('### Assistant:')[0].split('### Human:')[1]
	    # Keep only the first answer; in the sample output it ends at the first ')'.
	    new_output = generated_text.split('### Assistant:')[1].split(')')[0].strip()

	    aspects = new_output.split('Aspect detected:')[1].split('##')[0]
	    opinions = new_output.split('Opinion detected:')[1].split('## Sentiment detected:')[0]
	    sentiments = new_output.split('## Sentiment detected:')[1]

	    # Split each field on commas; a field with a single item yields a one-element list.
	    aspect_list = [aspect.strip() for aspect in aspects.split(',') if aspect.strip()]
	    opinion_list = [opinion.strip() for opinion in opinions.split(',') if opinion.strip()]
	    sentiments_list = [sentiment.strip() for sentiment in sentiments.split(',') if sentiment.strip()]
	    phrases = [opinion + ' ' + aspect for opinion, aspect in zip(opinion_list, aspect_list)]

	    output_dict = {
	        'user_prompt': user_prompt,
	        'interpreted_input': interpreted_input,
	        'aspects': aspect_list,
	        'opinions': opinion_list,
	        'sentiments': sentiments_list,
	        'phrases': phrases
	    }

	    return output_dict


	def process_prompt(user_prompt, model):
	    # Prompt format: "### Human: <sentence>.###" (rstrip avoids doubling a trailing period).
	    edited_prompt = "### Human: " + user_prompt.rstrip('.') + '.###'
	    # max_length covers prompt plus completion: 4x the prompt's token count.
	    pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=len(tokenizer.encode(user_prompt))*4)
	    result = pipe(edited_prompt)

	    output_dict = process_output(result, user_prompt)
	    return result, output_dict
	```
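`process_output` indexes into the result of each `split`, so it raises an `IndexError` whenever the model's answer is missing one of the markers. A more defensive parser could return empty lists instead. The following is a sketch under that assumption: `parse_generated_text` and `_field` are hypothetical names, and the marker strings are taken from the sample output in this README, not from a documented output format.

```python
# Hypothetical defensive parser; the marker strings mirror the README's sample output.
def _field(answer, marker):
    """Return the comma-separated items that follow `marker`, or [] if it is absent."""
    _, found, rest = answer.partition(marker)
    if not found:
        return []
    # The field ends at the next '##' marker or at the closing ')'.
    rest = rest.split("##")[0].split(")")[0]
    return [item.strip() for item in rest.split(",") if item.strip()]


def parse_generated_text(generated_text):
    """Parse the first assistant answer without raising on missing markers."""
    _, _, answer = generated_text.partition("### Assistant:")
    # Ignore any extra turn the model generates after its first answer.
    answer = answer.split("### Human:")[0]
    aspects = _field(answer, "Aspect detected:")
    opinions = _field(answer, "Opinion detected:")
    sentiments = _field(answer, "Sentiment detected:")
    return {
        "aspects": aspects,
        "opinions": opinions,
        "sentiments": sentiments,
        "phrases": [f"{o} {a}" for o, a in zip(opinions, aspects)],
    }


sample_text = (
    "### Human: Such a nice weather, birds are flying, but there's a bad smell "
    "coming from somewhere.### Assistant: ## Aspect detected: weather, birds, smell "
    "## Opinion detected: nice, flying, bad "
    "## Sentiment detected: Positive, Positive, Negative)\n\n"
    "### Human: The new restaurant in town is amazing.### Assistant: ## Aspect detected"
)
parsed = parse_generated_text(sample_text)
print(parsed)
```

It would be called as `parse_generated_text(result[0]['generated_text'])` on the pipeline output. Returning empty lists makes it easier to skip sentences where the model produced no usable answer.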

	inference:
	```
	prompt = "Such a nice weather, birds are flying, but there's a bad smell coming from somewhere."
	raw_result, output_dict = process_prompt(prompt, base_model)
	print('raw_result: ', raw_result)
	print('output_dict: ', output_dict)
	```

	Output:
	```
	raw_result:
	[{'generated_text': "### Human: Such a nice weather, birds are flying, but there's a bad smell coming from somewhere.### Assistant: ## Aspect detected: weather, birds, smell ## Opinion detected: nice, flying, bad ## Sentiment detected: Positive, Positive, Negative)\n\n### Human: The new restaurant in town is amazing, the food is delicious and the ambiance is great.### Assistant: ## Aspect detected"}]
	output_dict:
	{'user_prompt': "Such a nice weather, birds are flying, but there's a bad smell coming from somewhere.",
	 'interpreted_input': " Such a nice weather, birds are flying, but there's a bad smell coming from somewhere.",
	 'aspects': ['weather', 'birds', 'smell'],
	 'opinions': ['nice', 'flying', 'bad'],
	 'sentiments': ['Positive', 'Positive', 'Negative'],
	 'phrases': ['nice weather', 'flying birds', 'bad smell']}
	```
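Since the model was trained on single sentences, a paragraph can be split into sentences first and each sentence analysed on its own. The sketch below assumes a naive punctuation-based splitter; `split_into_sentences` and `analyze_paragraph` are hypothetical helpers, and the stub analyzer stands in for a call such as `process_prompt(sentence, base_model)[1]`.

```python
import re


def split_into_sentences(paragraph):
    """Naive split: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [part for part in parts if part]


def analyze_paragraph(paragraph, analyze_sentence):
    """Apply `analyze_sentence` (any callable taking one sentence) to each sentence."""
    return [analyze_sentence(sentence) for sentence in split_into_sentences(paragraph)]


paragraph = "The food was great. The service was slow! Would I return?"
sentences = split_into_sentences(paragraph)
print(sentences)

# Stub analyzer so the sketch runs without a GPU; with the model loaded, one could
# pass `lambda s: process_prompt(s, base_model)[1]` instead.
results = analyze_paragraph(paragraph, lambda s: {"user_prompt": s})
print(results)
```

This splitter breaks on abbreviations such as "e.g. " and would need a proper sentence tokenizer for real text.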


	# The full code is available in this Colab notebook:
	- https://colab.research.google.com/drive/1OvfnrufTAwSv3OnVxR-j7o10OKCSM1X5?usp=sharing