---
library_name: peft
base_model: xlm-roberta-base
license: mit
language:
- am
widget:
- text: ኢትዮጵያ ፕሪምየር ሊግ 6ኛ ሳምንት የእሁድ ጨዋታዎች ቅድመ ዳሰሳ
  example_title: ስፖርት (Sports)
metrics:
- accuracy
- f1
pipeline_tag: text-classification
---

# xlm-roberta-base-lora-amharic-news-classification

This repo contains LoRA adapters that fine-tune the [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) model on the [Amharic-News-Text-classification-Dataset](https://huggingface.co/datasets/israel/Amharic-News-Text-classification-Dataset).

With the adapters applied, the fine-tuned model classifies an Amharic news article into one of the following six categories:

- ሀገር አቀፍ ዜና (Local News)
- መዝናኛ (Entertainment)
- ስፖርት (Sports)
- ቢዝነስ (Business)
- ዓለም አቀፍ ዜና (International News)
- ፖለቲካ (Politics)

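The model predicts the Amharic category names, as shown in the usage example further down. If you need the English glosses listed above, a small lookup table is enough. The following is a minimal sketch that assumes predictions shaped like the pipeline's `{'label': ..., 'score': ...}` dicts; `LABEL_TO_ENGLISH` and `to_english` are illustrative names, not part of this repo.

```python
# Hypothetical helper (not part of this repo): map the Amharic labels
# predicted by the model to the English glosses used in this model card.
LABEL_TO_ENGLISH = {
    "ሀገር አቀፍ ዜና": "Local News",
    "መዝናኛ": "Entertainment",
    "ስፖርት": "Sports",
    "ቢዝነስ": "Business",
    "ዓለም አቀፍ ዜና": "International News",
    "ፖለቲካ": "Politics",
}


def to_english(predictions):
    """Return copies of pipeline-style prediction dicts with an added 'label_en' key."""
    return [{**pred, "label_en": LABEL_TO_ENGLISH[pred["label"]]} for pred in predictions]


print(to_english([{"label": "ስፖርት", "score": 0.9}]))
```
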
It achieves the following results (train loss on the training set; all other metrics on the validation set):

- Train Loss: **0.3563**
- Validation Loss: **0.3613**
- Validation Accuracy: **0.8642**
- Validation F1 Score (macro): **0.8220**
- Validation F1 Score (weighted): **0.8648**

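The card reports both macro and weighted F1. Macro F1 averages the per-class F1 scores with equal weight, while weighted F1 weights each class by its number of true examples (its support), so the two diverge when classes are imbalanced and some classes score lower than others. Below is a minimal plain-Python sketch of the three validation metrics on made-up labels. The card does not say which library produced the reported numbers, so this is an illustration of the definitions, not the code used for this model.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (accuracy, macro F1, weighted F1) for single-label predictions."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per class

    f1 = {}
    for lbl in labels:
        tp = sum(t == lbl and p == lbl for t, p in pairs)
        fp = sum(t != lbl and p == lbl for t, p in pairs)
        fn = sum(t == lbl and p != lbl for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1[lbl] = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    macro_f1 = sum(f1.values()) / len(labels)  # every class counts equally
    weighted_f1 = sum(f1[l] * support[l] for l in labels) / len(pairs)  # weighted by support
    return accuracy, macro_f1, weighted_f1


# Toy example with hypothetical labels: the single "politics" article is missed.
y_true = ["sports", "sports", "sports", "politics"]
y_pred = ["sports", "sports", "sports", "sports"]
print(classification_metrics(y_true, y_pred))
```

In this toy case, missing the one minority-class example pulls macro F1 (3/7 ≈ 0.43) well below weighted F1 (9/14 ≈ 0.64), while accuracy stays at 0.75.
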
## How to use

You can run this model with a Transformers `text-classification` pipeline. First, install the `peft` library:

```console
pip install peft
```

Then run the following code:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_id = "xlm-roberta-base"  # base model
peft_model_id = "rasyosef/xlm-roberta-base-lora-amharic-news-classification"  # LoRA adapters

# The label order must match the order the classifier head was trained with.
categories = ['ሀገር አቀፍ ዜና', 'መዝናኛ', 'ስፖርት', 'ቢዝነስ', 'ዓለም አቀፍ ዜና', 'ፖለቲካ']
id2label = {i: lbl for i, lbl in enumerate(categories)}
label2id = {lbl: i for i, lbl in enumerate(categories)}

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the base model with a 6-way sequence-classification head...
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=len(categories),  # 6
    id2label=id2label,
    label2id=label2id,
)

# ...then attach the trained LoRA adapter weights from the Hub (requires `peft`).
model.load_adapter(peft_model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

predictions = classifier([
    """ቅርሶቹን ለመታደግ የተጀመረው የሙዚዬም ግንባታም በበጀት ምክንያት ተቋርጧል።

በአፄ ቴዎድሮስ የንግስና ቦታ ደረስጌ ማሪያም ተጀምሮ የቆመው የሙዚየሙ ግንባታ ተጠናቀቆ ስራ
እንዲጀምር ነዋሪዎች ጠይቀዋል። ዘመነ መሳፍንት መቋጫ ያገኘባት የኢትዮጵያ አንድነት የታወጀባት ዳግማዊ
አፄ ቴዎድሮስ ከመንገሳቸው በፊት ደጃች ውቤን ቧሂት ከሚባል ቦታ ድል አድርገው ደጃች ውቤ ለንግስና ባዘጋጁት የንግስና ቦታና
እቃዎች ንጉሰ ነገስት ዘኢትዮጵያ ተብለው የነገሱባት ቦታ ናት።""",  # expected label: 'ሀገር አቀፍ ዜና'
])
print(predictions)
```

Output:

```python
[{'label': 'ሀገር አቀፍ ዜና', 'score': 0.977573037147522}]
```

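For single-label models with more than one class, the `text-classification` pipeline by default applies a softmax to the model's logits and reports the most probable class, with its probability as `score`. The plain-Python sketch below imitates that post-processing step to show where the output format comes from. It is an approximation for illustration, not the pipeline's own code, and the logits are made up.

```python
import math


def postprocess(logits, id2label):
    """Convert one row of logits into a pipeline-style {'label', 'score'} dict."""
    shift = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]  # softmax probabilities, summing to 1
    best = max(range(len(probs)), key=probs.__getitem__)  # index of the top class
    return {"label": id2label[best], "score": probs[best]}


categories = ['ሀገር አቀፍ ዜና', 'መዝናኛ', 'ስፖርት', 'ቢዝነስ', 'ዓለም አቀፍ ዜና', 'ፖለቲካ']
id2label = {i: lbl for i, lbl in enumerate(categories)}

# Hypothetical logits for one article (not real model output).
print([postprocess([4.0, 0.1, -1.2, 0.3, 1.5, 0.2], id2label)])
```
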
## Demo

Try the model interactively in this Hugging Face Space: [rasyosef/amharic-news-classification](https://huggingface.co/spaces/rasyosef/amharic-news-classification)

## Framework versions

- PEFT 0.7.1
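
To compare your environment with the version listed above, the standard library's `importlib.metadata` can report which PEFT release is installed. The following is a small sketch, and the `installed_version` helper is a hypothetical name. A mismatch does not necessarily mean the code will fail. It only means your environment differs from the version listed on this card.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


peft_version = installed_version("peft")
if peft_version != "0.7.1":
    # Not necessarily a problem: other releases may work, but they differ
    # from the version listed on this card.
    print(f"This card lists PEFT 0.7.1; found {peft_version!r}.")
```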