---
library_name: peft
base_model: xlm-roberta-base
license: mit
language:
- am
metrics:
- accuracy
- f1
pipeline_tag: text-classification
---

# xlm-roberta-base-lora-amharic-news-classification

This repo contains LoRA adapters for the [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) model, fine-tuned on the [Amharic-News-Text-classification-Dataset](https://huggingface.co/datasets/israel/Amharic-News-Text-classification-Dataset).

The fine-tuned model classifies an Amharic news article into one of the following 6 categories:

- ሀገር አቀፍ ዜና (Local News)
- መዝናኛ (Entertainment)
- ስፖርት (Sports)
- ቢዝነስ (Business)
- ዓለም አቀፍ ዜና (International News)
- ፖለቲካ (Politics)

It achieves the following results (training loss, plus loss and metrics on the validation set):

- Train Loss: **0.3447**
- Validation Loss: **0.3947**
- Validation Accuracy: **0.8541**
- Validation F1 Score (macro): **0.8105**
- Validation F1 Score (weighted): **0.8551**

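Macro F1 averages the per-class F1 scores with equal weight, while weighted F1 weights each class by its support (its number of true examples). A macro score below the weighted score, as reported here, suggests the model does somewhat worse on the less frequent categories. The snippet below is a minimal pure-Python sketch of both averages, intended to mirror scikit-learn's `average="macro"` and `average="weighted"` definitions; the toy labels are made up for illustration and do not come from the dataset.

```python
from collections import Counter


def f1_averages(y_true, y_pred):
    """Return (macro_f1, weighted_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per class
    pairs = list(zip(y_true, y_pred))
    per_class = {}
    for lbl in labels:
        tp = sum(1 for t, p in pairs if t == lbl and p == lbl)
        fp = sum(1 for t, p in pairs if t != lbl and p == lbl)
        fn = sum(1 for t, p in pairs if t == lbl and p != lbl)
        # F1 = 2TP / (2TP + FP + FN), with a guard against division by zero
        denom = 2 * tp + fp + fn
        per_class[lbl] = 2 * tp / denom if denom else 0.0
    # Macro: every class counts equally
    macro = sum(per_class.values()) / len(labels)
    # Weighted: each class counts in proportion to its support
    weighted = sum(per_class[lbl] * support[lbl] for lbl in labels) / len(y_true)
    return macro, weighted


# Toy, imbalanced example: the classifier always predicts the majority class 0
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]
macro_f1, weighted_f1 = f1_averages(y_true, y_pred)
print(f"macro F1 = {macro_f1:.4f}, weighted F1 = {weighted_f1:.4f}")
# macro F1 = 0.4286, weighted F1 = 0.6429
```

In practice, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` and `average="weighted"` compute the same two quantities.
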

## How to use

You can use this model with a `text-classification` pipeline. Loading the adapter with `model.load_adapter` relies on the `peft` library, so install it first:

```console
pip install peft
```

Then run the following code:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_id = "xlm-roberta-base"
peft_model_id = "rasyosef/xlm-roberta-base-lora-amharic-news-classification"

# The 6 news categories; the index order must match the one used during fine-tuning
categories = ['ሀገር አቀፍ ዜና', 'መዝናኛ', 'ስፖርት', 'ቢዝነስ', 'ዓለም አቀፍ ዜና', 'ፖለቲካ']
id2label = {i: lbl for i, lbl in enumerate(categories)}
label2id = {lbl: i for i, lbl in enumerate(categories)}

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the base model with a 6-way classification head and the label mappings
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=len(categories),  # 6
    id2label=id2label,
    label2id=label2id
)

# Attach the LoRA adapter weights from this repo (requires `peft`)
model.load_adapter(peft_model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

classifier([
    # Sample article on a stalled museum project at Emperor Tewodros's coronation site
    """ቅርሶቹን ለመታደግ የተጀመረው የሙዚዬም ግንባታም በበጀት ምክንያት ተቋርጧል።

በአፄ ቴዎድሮስ የንግስና ቦታ ደረስጌ ማሪያም ተጀምሮ የቆመው የሙዚየሙ ግንባታ ተጠናቀቆ ስራ
እንዲጀምር ነዋሪዎች ጠይቀዋል። ዘመነ መሳፍንት መቋጫ ያገኘባት የኢትዮጵያ አንድነት የታወጀባት ዳግማዊ
አፄ ቴዎድሮስ ከመንገሳቸው በፊት ደጃች ውቤን ቧሂት ከሚባል ቦታ ድል አድርገው ደጃች ውቤ ለንግስና ባዘጋጁት የንግስና ቦታና
እቃዎች ንጉሰ ነገስት ዘኢትዮጵያ ተብለው የነገሱባት ቦታ ናት።""",  # 'ሀገር አቀፍ ዜና' (Local News)
])
```

Output:

```python
[{'label': 'ሀገር አቀፍ ዜና', 'score': 0.977573037147522}]
```

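For a single-label model with more than one class, the `text-classification` pipeline by default applies a softmax to the model's logits and returns the highest-probability class, mapped through `id2label`; the `score` above is that probability. The sketch below reproduces only this post-processing step in plain Python. The logit values are invented for illustration and are not outputs of this model.

```python
import math

categories = ['ሀገር አቀፍ ዜና', 'መዝናኛ', 'ስፖርት', 'ቢዝነስ', 'ዓለም አቀፍ ዜና', 'ፖለቲካ']
id2label = {i: lbl for i, lbl in enumerate(categories)}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits):
    """Mimic the pipeline's default single-label output: best label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


# Hypothetical logits for one article (one value per category, in id order)
logits = [4.1, -0.7, -1.2, 0.3, 0.9, 0.2]
print(top_prediction(logits))
```

Subtracting the maximum logit before exponentiating does not change the probabilities but avoids overflow for large logits.
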

## Model Details

### Model Description

This model consists of LoRA adapters for [xlm-roberta-base](https://huggingface.co/xlm-roberta-base), fine-tuned to classify Amharic news articles into 6 topic categories.

- **Developed by:** [More Information Needed]
- **Funded by:** [More Information Needed]
- **Shared by:** rasyosef
- **Model type:** LoRA adapters for sequence (text) classification
- **Language(s) (NLP):** Amharic (`am`)
- **License:** MIT
- **Fine-tuned from model:** [xlm-roberta-base](https://huggingface.co/xlm-roberta-base)

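For context, LoRA (low-rank adaptation) keeps the pretrained weights frozen and learns a low-rank update for selected layers. In the standard formulation, a frozen weight matrix $W_0 \in \mathbb{R}^{d \times k}$ is adapted as below, where only $B$ and $A$ are trained and PEFT's scaling factor is $\alpha / r$. The rank $r$, the scaling $\alpha$ (`lora_alpha`), and the set of adapted modules used for this checkpoint are not stated in this card.

$$
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
$$
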

### Framework versions

- PEFT 0.7.1