---
datasets:
- dru-ac/ArTopicDS
- dru-ac/ArTopicDS-Books
metrics:
- accuracy
- precision
- recall
pipeline_tag: text-classification
---
`ArGTClass` is a `bloomz`-based classification model, fine-tuned to classify Arabic text into fourteen subjects: Religion, Finance and Economics, Politics, Medical, Culture, Sports, Science and Technology, Anthropology and Sociology, Art and Literature, Education, History, Language and Linguistics, Law, and Philosophy.
For more details, check out our [paper](here).

The fine-tuning code is available in the following notebook: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/106oPnGhe8B_BCgV6LnJbvVZNv4mCu9Zv?usp=sharing)
### Full classification example (CPU)
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("dru-ac/ArGTClass")
model = AutoModelForSequenceClassification.from_pretrained("dru-ac/ArGTClass")
text = " .قصفت إسرائيل مستشفى المعمداني في مدينة غزة، والذي خلف مئات الشهداء والجرحى"
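# Tokenize the text into PyTorch tensors
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
# Take the highest-scoring class index and map it to its label name
ind = outputs.logits.argmax(dim=-1)[0]
predicted_class = model.config.id2label[ind.item()]
print(predicted_class)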
```
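
The example above keeps only the single highest-scoring class. When the runner-up subjects matter too, the logits can be converted to probabilities with a softmax and ranked. The following is a minimal sketch in plain Python; `rank_labels`, `demo_id2label`, and `demo_logits` are hypothetical names and values used only for illustration, not part of the model or of `transformers`.

```python
import math

def rank_labels(logits, id2label, k=3):
    """Return the k most likely (label, probability) pairs for one input."""
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort class indices by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in order[:k]]

# Hypothetical three-class mapping and logits, for illustration only;
# the real mapping comes from model.config.id2label.
demo_id2label = {0: "Politics", 1: "Sports", 2: "Medical"}
demo_logits = [2.0, 0.5, 1.0]
ranked = rank_labels(demo_logits, demo_id2label, k=2)
```

With the model loaded as above, the same helper could be called as `rank_labels(outputs.logits[0].tolist(), model.config.id2label)`.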
### Full classification example (GPU)
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("dru-ac/ArGTClass")
model = AutoModelForSequenceClassification.from_pretrained("dru-ac/ArGTClass", device_map = 'auto')
text = " .قصفت إسرائيل مستشفى المعمداني في مدينة غزة، والذي خلف مئات الشهداء والجرحى"
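# Tokenize the text and move the tensors to the device the model was placed on
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model(**inputs)
# Take the highest-scoring class index and map it to its label name
ind = outputs.logits.argmax(dim=-1)[0]
predicted_class = model.config.id2label[ind.item()]
print(predicted_class)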
```
### Pipeline example (CPU & GPU)
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
tokenizer = AutoTokenizer.from_pretrained("dru-ac/ArGTClass")
model = AutoModelForSequenceClassification.from_pretrained("dru-ac/ArGTClass", device_map = 'auto')
classifier = pipeline("text-classification", model=model, tokenizer= tokenizer)
text = " .قصفت إسرائيل مستشفى المعمداني في مدينة غزة، والذي خلف مئات الشهداء والجرحى"
classifier(text)
```