---
datasets:
- dru-ac/ArTopicDS
- dru-ac/ArTopicDS-Books
metrics:
- accuracy
- precision
- recall
pipeline_tag: text-classification
---
|
|
|
`ArGTClass` is a `bloomz`-based classification model fine-tuned to categorize Arabic text into fourteen subjects: Religion, Finance and Economics, Politics, Medical, Culture, Sports, Science and Technology, Anthropology and Sociology, Art and Literature, Education, History, Language and Linguistics, Law, and Philosophy.
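For quick reference, the fourteen subjects can be kept as a Python list, for example to check that a loaded checkpoint exposes the expected label set. This is only a sketch: the list copies the names from the description above, and the helper name `missing_subjects` is hypothetical. The label strings the checkpoint actually uses are whatever `model.config.id2label` contains, and they may be spelled differently.

```python
# The fourteen subjects named in this card. Whether model.config.id2label
# uses exactly these strings is an assumption and may differ in spelling.
SUBJECTS = [
    "Religion",
    "Finance and Economics",
    "Politics",
    "Medical",
    "Culture",
    "Sports",
    "Science and Technology",
    "Anthropology and Sociology",
    "Art and Literature",
    "Education",
    "History",
    "Language and Linguistics",
    "Law",
    "Philosophy",
]


def missing_subjects(id2label: dict[int, str]) -> list[str]:
    """Return the entries of SUBJECTS not found among the label names (case-insensitive)."""
    present = {name.strip().lower() for name in id2label.values()}
    return [s for s in SUBJECTS if s.lower() not in present]
```

With a loaded model, `missing_subjects(model.config.id2label)` returns an empty list only if the config uses these exact names; a non-empty result just means the spellings differ and should be compared by hand.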
|
|
|
|
|
For more details, see our [paper](here).
|
|
|
The fine-tuning code is available in the following notebook: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/106oPnGhe8B_BCgV6LnJbvVZNv4mCu9Zv?usp=sharing)
|
|
|
|
|
### Full classification example (CPU)

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("dru-ac/ArGTClass")
model = AutoModelForSequenceClassification.from_pretrained("dru-ac/ArGTClass")

# "Israel bombed al-Ma'madani Hospital in Gaza City, leaving hundreds of martyrs and wounded."
text = "قصفت إسرائيل مستشفى المعمداني في مدينة غزة، والذي خلف مئات الشهداء والجرحى."

inputs = tokenizer(text, return_tensors="pt")

# Inference only: disable gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

# Index of the highest-scoring class, mapped to its subject name
ind = outputs.logits.argmax(dim=-1)[0]
predicted_class = model.config.id2label[ind.item()]
print(predicted_class)
```
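The example above keeps only the argmax. To also see how confident the model is, the logits can be turned into probabilities with a softmax and the top few subjects reported. The helper below is a hypothetical sketch (`top_k_subjects` is our name, not part of `transformers`), written in plain Python so it works on `outputs.logits[0].tolist()`. It assumes a single-label setup, as the argmax in the example implies.

```python
import math


def top_k_subjects(
    logits: list[float], id2label: dict[int, str], k: int = 3
) -> list[tuple[str, float]]:
    """Return the k most likely (label, probability) pairs for one row of logits."""
    # Numerically stable softmax: subtract the largest logit before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]
```

Usage, continuing from the CPU example: `top_k_subjects(outputs.logits[0].tolist(), model.config.id2label)`. Working on a plain list keeps the helper independent of the device the model runs on.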
|
|
|
### Full classification example (GPU)

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("dru-ac/ArGTClass")
# device_map="auto" (requires the `accelerate` package) places the weights on the available GPU(s)
model = AutoModelForSequenceClassification.from_pretrained("dru-ac/ArGTClass", device_map="auto")

# "Israel bombed al-Ma'madani Hospital in Gaza City, leaving hundreds of martyrs and wounded."
text = "قصفت إسرائيل مستشفى المعمداني في مدينة غزة، والذي خلف مئات الشهداء والجرحى."

# Move the inputs to the same device as the model
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model(**inputs)

ind = outputs.logits.argmax(dim=-1)[0]
predicted_class = model.config.id2label[ind.item()]
print(predicted_class)
```
|
|
|
|
|
### Pipeline example (CPU & GPU)

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("dru-ac/ArGTClass")
# device_map="auto" uses the GPU if one is available, otherwise the CPU
model = AutoModelForSequenceClassification.from_pretrained("dru-ac/ArGTClass", device_map="auto")

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# "Israel bombed al-Ma'madani Hospital in Gaza City, leaving hundreds of martyrs and wounded."
text = "قصفت إسرائيل مستشفى المعمداني في مدينة غزة، والذي خلف مئات الشهداء والجرحى."

# Returns a list with one dict per input, e.g. [{"label": ..., "score": ...}]
print(classifier(text))
```
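The card lists accuracy, precision, and recall as its metrics. As a rough sketch of how such scores could be computed from gold and predicted subject labels, the helper below uses plain Python. The function name `evaluate` is hypothetical, and macro averaging over classes is our assumption; the card does not state which averaging was used.

```python
from collections import Counter


def evaluate(gold: list[str], pred: list[str]) -> dict[str, float]:
    """Accuracy plus macro-averaged precision and recall over labels seen in gold or pred."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    tp = Counter(g for g, p in zip(gold, pred) if g == p)
    pred_count = Counter(pred)
    gold_count = Counter(gold)
    labels = sorted(set(gold) | set(pred))
    # Per-class scores; a class with no predictions (or no gold examples) scores 0
    precision = [tp[c] / pred_count[c] if pred_count[c] else 0.0 for c in labels]
    recall = [tp[c] / gold_count[c] if gold_count[c] else 0.0 for c in labels]
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "precision": sum(precision) / len(labels),
        "recall": sum(recall) / len(labels),
    }
```

Predictions can be collected with the pipeline, e.g. `pred = [out["label"] for out in classifier(texts)]`, where `texts` and the matching `gold` labels are hypothetical placeholders for your own labeled data.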
|
|
|
|