---
license: apache-2.0
datasets:
- multi_nli
- xnli
- dbpedia_14
- SetFit/bbc-news
- squad_v2
- race
language:
- en
metrics:
- accuracy
- f1
library_name: transformers
pipeline_tag: zero-shot-classification
tags:
- classification
- information-extraction
- zero-shot
---

**comprehend_it-base**

This is a model based on [DeBERTaV3-base](https://huggingface.co/microsoft/deberta-v3-base) that was trained on natural language inference datasets as well as on multiple text classification datasets. It demonstrates better quality on a diverse set of text classification datasets in a zero-shot setting than [Bart-large-mnli](https://huggingface.co/facebook/bart-large-mnli) while being almost 3 times smaller.

Moreover, the model can be used for multiple information extraction tasks in a zero-shot setting, including:
* Named-entity recognition;
* Relation extraction;
* Entity linking;
* Question answering.

#### With the zero-shot classification pipeline

The model can be loaded with the `zero-shot-classification` pipeline like so:

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="knowledgator/comprehend_it-base")
```

You can then use this pipeline to classify sequences into any of the class names you specify.

```python
sequence_to_classify = "one day I will see the world"
candidate_labels = ['travel', 'cooking', 'dancing']
classifier(sequence_to_classify, candidate_labels)
# {'labels': ['travel', 'dancing', 'cooking'],
#  'scores': [0.9938651323318481, 0.0032737774308770895, 0.002861034357920289],
#  'sequence': 'one day I will see the world'}
```

If more than one candidate label can be correct, pass `multi_label=True` to calculate each class independently:

```python
candidate_labels = ['travel', 'cooking', 'dancing', 'exploration']
classifier(sequence_to_classify, candidate_labels, multi_label=True)
# {'labels': ['travel', 'exploration', 'dancing', 'cooking'],
#  'scores': [0.9945111274719238,
#             0.9383890628814697,
#             0.0057061901316046715,
#             0.0018193122232332826],
#  'sequence': 'one day I will see the world'}
```

#### With manual PyTorch

```python
# pose the sequence as an NLI premise and the label as a hypothesis
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
nli_model = AutoModelForSequenceClassification.from_pretrained('knowledgator/comprehend_it-base').to(device)
tokenizer = AutoTokenizer.from_pretrained('knowledgator/comprehend_it-base')

sequence = "one day I will see the world"
label = "travel"

premise = sequence
hypothesis = f'This example is {label}.'

# run through the model pre-trained on MNLI
x = tokenizer.encode(premise, hypothesis, return_tensors='pt',
                     truncation='only_first')
logits = nli_model(x.to(device))[0]

# we throw away "neutral" (dim 1) and take the probability of
# "entailment" (2) as the probability of the label being true
entail_contradiction_logits = logits[:, [0, 2]]
probs = entail_contradiction_logits.softmax(dim=1)
prob_label_is_true = probs[:, 1]
```
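If you want to rank several candidate labels with the manual approach, the same entailment score can be computed for each label in a loop, roughly mirroring what the zero-shot pipeline does with `multi_label=True`. The snippet below is only a minimal, self-contained sketch; the premise, label set, and hypothesis template are illustrative assumptions:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "knowledgator/comprehend_it-base"
nli_model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

premise = "one day I will see the world"
candidate_labels = ["travel", "cooking", "dancing"]

scores = []
with torch.no_grad():
    for label in candidate_labels:
        # one entailment query per candidate label
        hypothesis = f"This example is {label}."
        x = tokenizer.encode(premise, hypothesis, return_tensors="pt",
                             truncation="only_first").to(device)
        logits = nli_model(x)[0]
        # keep contradiction (0) and entailment (2) logits, drop neutral (1)
        entail_contradiction_logits = logits[:, [0, 2]]
        probs = entail_contradiction_logits.softmax(dim=1)
        scores.append(probs[:, 1].item())

# rank the labels by their entailment probability
for label, score in sorted(zip(candidate_labels, scores), key=lambda p: -p[1]):
    print(label, round(score, 4))
```

Scoring each label independently like this matches the `multi_label=True` behaviour; for single-label classification, the pipeline instead applies a softmax over the entailment logits of all candidate labels.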
### Benchmarking

Below you can see the F1 scores on several text classification datasets. None of the tested models were fine-tuned on those datasets; all were evaluated in a zero-shot setting.

| Model | IMDB | AG_NEWS | Emotions |
|-------|------|---------|----------|
| [Bart-large-mnli (407M)](https://huggingface.co/facebook/bart-large-mnli) | 0.89 | 0.6887 | 0.3765 |
| [Deberta-base-v3 (184M)](https://huggingface.co/cross-encoder/nli-deberta-v3-base) | 0.85 | 0.6455 | 0.5095 |
| Comprehendo (184M) | 0.90 | 0.7982 | 0.5660 |

### Further reading

Check out our blog post, ["The new milestone in zero-shot capabilities (it’s not Generative AI)"](https://medium.com/p/9b5a081fbf27), where we highlight possible use cases of the model and explain why next-token prediction is not the only way to achieve strong zero-shot capabilities. While most of the AI industry is focused on generative AI and decoder-based models, we are committed to developing encoder-based models. We aim to achieve the same level of generalization for such models as their decoder-based counterparts. Encoders have several wonderful properties, such as bidirectional attention, and they are the best choice for many information extraction tasks in terms of efficiency and controllability.
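As a small illustration of the information-extraction use cases listed at the top of this card, tasks such as entity typing can often be phrased through a custom `hypothesis_template`. The example below is a hypothetical sketch; the sentence, entity span, label set, and template are assumptions rather than an official recipe:

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="knowledgator/comprehend_it-base")

# entity typing: decide the type of a given entity mention in its context
text = "Albert Einstein developed the theory of relativity."
entity = "Albert Einstein"
labels = ["person", "organization", "location", "event"]

result = classifier(
    text,
    labels,
    # doubled braces leave the {} placeholder for the pipeline to fill with each label
    hypothesis_template=f"{entity} is a {{}}.",
)
print(result["labels"][0], result["scores"][0])  # the top label should ideally be 'person'
```

The same pattern can be adapted to relation extraction or question answering by changing the candidate labels and the template.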