---
datasets:
- dru-ac/ArTopicDS
- dru-ac/ArTopicDS-Books
metrics:
- accuracy
- precision
- recall
pipeline_tag: text-classification
---

`ArGTClass` is a `bloomz`-based classification model fine-tuned to categorize Arabic text into fourteen distinct subjects: Religion, Finance and Economics, Politics, Medical, Culture, Sports, Science and Technology, Anthropology and Sociology, Art and Literature, Education, History, Language and Linguistics, Law, and Philosophy.

For more details, check out our [paper](here).

The fine-tuning code is available in the following notebook: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/106oPnGhe8B_BCgV6LnJbvVZNv4mCu9Zv?usp=sharing)

### Full classification example (CPU)

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("dru-ac/ArGTClass")
model = AutoModelForSequenceClassification.from_pretrained("dru-ac/ArGTClass")
model.eval()

# English: "Israel bombed al-Ma'madani Hospital in Gaza City, leaving hundreds killed and wounded."
text = "قصفت إسرائيل مستشفى المعمداني في مدينة غزة، والذي خلف مئات الشهداء والجرحى."

inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)

# Take the highest-scoring class and map its index to a subject name
ind = outputs.logits.argmax(dim=-1)[0]
predicted_class = model.config.id2label[ind.item()]
print(predicted_class)
```

### Full classification example (GPU)

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("dru-ac/ArGTClass")
# device_map="auto" places the model on the available GPU (requires `accelerate`)
model = AutoModelForSequenceClassification.from_pretrained("dru-ac/ArGTClass", device_map="auto")
model.eval()

text = "قصفت إسرائيل مستشفى المعمداني في مدينة غزة، والذي خلف مئات الشهداء والجرحى."

# Move the inputs to the same device as the model
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model(**inputs)

ind = outputs.logits.argmax(dim=-1)[0]
predicted_class = model.config.id2label[ind.item()]
print(predicted_class)
```

### Pipeline example (CPU & GPU)

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("dru-ac/ArGTClass")
model = AutoModelForSequenceClassification.from_pretrained("dru-ac/ArGTClass", device_map="auto")

# Wrap the model and tokenizer in a text-classification pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

text = "قصفت إسرائيل مستشفى المعمداني في مدينة غزة، والذي خلف مئات الشهداء والجرحى."
print(classifier(text))  # a list of {"label": ..., "score": ...} dicts
```
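The full classification examples keep only the argmax of the logits. If you also want a confidence score, or the runner-up subjects, you can apply a softmax over one row of logits and rank the classes. Below is a minimal, dependency-free sketch of that post-processing. The helper name `top_k_subjects` and the three-label mapping in the demo are hypothetical placeholders; with the real model you would pass `outputs.logits[0].detach().cpu().tolist()` and `model.config.id2label`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def top_k_subjects(
    logits: Sequence[float], id2label: Dict[int, str], k: int = 3
) -> List[Tuple[str, float]]:
    """Hypothetical helper: return the k most likely labels with softmax probabilities.

    `logits` is one row of model logits; `id2label` maps class index -> label name.
    """
    if not logits:
        raise ValueError("logits must not be empty")
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical 3-class mapping and logits, for illustration only.
demo_labels = {0: "Politics", 1: "Medical", 2: "Sports"}
print(top_k_subjects([2.0, 1.0, -1.0], demo_labels, k=2))
```

In recent `transformers` releases the pipeline can also return scores for every label by calling `classifier(text, top_k=None)`, which may be simpler when you are already using the pipeline.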