<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# DeBERTa

## Overview
The DeBERTa model was proposed in [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. It is based on Google's
BERT model released in 2018 and Facebook's RoBERTa model released in 2019.

It builds on RoBERTa with a disentangled attention mechanism and an enhanced mask decoder, and is trained on half of the data used for RoBERTa.
The abstract from the paper is the following:
*Recent progress in pre-trained neural language models has significantly improved the performance of many natural
language processing (NLP) tasks. In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with
disentangled attention) that improves the BERT and RoBERTa models using two novel techniques. The first is the
disentangled attention mechanism, where each word is represented using two vectors that encode its content and
position, respectively, and the attention weights among words are computed using disentangled matrices on their
contents and relative positions. Second, an enhanced mask decoder is used to replace the output softmax layer to
predict the masked tokens for model pretraining. We show that these two techniques significantly improve the efficiency
of model pretraining and performance of downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of
the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9%
(90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and RACE by +3.6% (83.2% vs. 86.8%). The DeBERTa code and
pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.*
This model was contributed by [DeBERTa](https://huggingface.co/DeBERTa). The TF 2.0 implementation of this model was
contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/microsoft/DeBERTa).
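The disentangled attention score described in the paper is the sum of a content-to-content, a content-to-position, and a position-to-content term, scaled by the square root of 3d because three terms are summed. The sketch below only illustrates that formula and does not mirror the internals of the `transformers` implementation; the tensor names, projection weights, and the `rel_index` table of bucketed relative distances are assumptions made for clarity.

```python
# Illustrative sketch of DeBERTa's disentangled attention scores for a single head.
# All names are assumptions for readability, not the library's implementation.
import torch

def disentangled_attention_scores(Hc, rel_emb, Wq_c, Wk_c, Wq_r, Wk_r, rel_index):
    """Hc: (seq, d) content hidden states; rel_emb: (2k, d) relative-position embeddings;
    W*: (d, d) projection matrices; rel_index: (seq, seq) long tensor with the bucketed
    relative distance delta(i, j) in [0, 2k)."""
    d = Hc.size(-1)
    Qc, Kc = Hc @ Wq_c, Hc @ Wk_c            # content projections
    Qr, Kr = rel_emb @ Wq_r, rel_emb @ Wk_r  # relative-position projections

    c2c = Qc @ Kc.T                                # content-to-content
    c2p = torch.gather(Qc @ Kr.T, 1, rel_index)    # content-to-position, indexed by delta(i, j)
    p2c = torch.gather(Kc @ Qr.T, 1, rel_index).T  # position-to-content, indexed by delta(j, i)

    # Three score terms are summed, so the paper scales by sqrt(3 * d) instead of sqrt(d).
    return (c2c + c2p + p2c) / (3 * d) ** 0.5
```
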
## Resources

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with DeBERTa. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
<PipelineTag pipeline="text-classification"/> | |
- A blog post on how to [Accelerate Large Model Training using DeepSpeed](https://huggingface.co/blog/accelerate-deepspeed) with DeBERTa. | |
- A blog post on [Supercharged Customer Service with Machine Learning](https://huggingface.co/blog/supercharge-customer-service-with-machine-learning) with DeBERTa. | |
- [`DebertaForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification.ipynb). | |
- [`TFDebertaForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification-tf.ipynb). | |
- [Text classification task guide](../tasks/sequence_classification) | |
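A minimal usage sketch for sequence classification, assuming the `microsoft/deberta-base` checkpoint; the classification head loaded here is newly initialized and still needs to be fine-tuned (for example with the script or notebook linked above) before its predictions are meaningful.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
# num_labels=2 is an assumption for this example; the head is randomly initialized
# and must be fine-tuned on a labeled dataset.
model = AutoModelForSequenceClassification.from_pretrained("microsoft/deberta-base", num_labels=2)

inputs = tokenizer("The movie was surprisingly good.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class_id = logits.argmax(dim=-1).item()
```
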
<PipelineTag pipeline="token-classification" /> | |
- [`DebertaForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification.ipynb). | |
- [`TFDebertaForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification-tf.ipynb). | |
- [Token classification](https://huggingface.co/course/chapter7/2?fw=pt) chapter of the π€ Hugging Face Course. | |
- [Byte-Pair Encoding tokenization](https://huggingface.co/course/chapter6/5?fw=pt) chapter of the π€ Hugging Face Course. | |
- [Token classification task guide](../tasks/token_classification) | |
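A minimal usage sketch for token classification, again assuming `microsoft/deberta-base`; the label count and the token-classification head are placeholders that need fine-tuning (for example on an NER dataset) before the predictions mean anything.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
# num_labels=9 is only an example (a CoNLL-style NER label set); the head is
# randomly initialized and must be fine-tuned first.
model = AutoModelForTokenClassification.from_pretrained("microsoft/deberta-base", num_labels=9)

inputs = tokenizer("Hugging Face is based in New York City", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_label_ids = logits.argmax(dim=-1)  # one label id per input token
```
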
<PipelineTag pipeline="fill-mask"/> | |
- [`DebertaForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling#robertabertdistilbert-and-masked-language-modeling) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb). | |
- [`TFDebertaForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/language-modeling#run_mlmpy) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling-tf.ipynb). | |
- [Masked language modeling](https://huggingface.co/course/chapter7/3?fw=pt) chapter of the π€ Hugging Face Course. | |
- [Masked language modeling task guide](../tasks/masked_language_modeling) | |
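A minimal fill-mask sketch; whether a given DeBERTa checkpoint on the Hub ships a pretrained masked-LM head varies, so treat the model id below as a placeholder for one that does.

```python
from transformers import pipeline

# The model id is a placeholder; pick a DeBERTa checkpoint with a pretrained MLM head.
fill_mask = pipeline("fill-mask", model="microsoft/deberta-base")
print(fill_mask("Paris is the [MASK] of France."))
```
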
<PipelineTag pipeline="question-answering"/> | |
- [`DebertaForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering.ipynb). | |
- [`TFDebertaForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/question-answering) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering-tf.ipynb). | |
- [Question answering](https://huggingface.co/course/chapter7/7?fw=pt) chapter of the π€ Hugging Face Course. | |
- [Question answering task guide](../tasks/question_answering) | |
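A minimal extractive question-answering sketch, assuming `microsoft/deberta-base`; the span-prediction head is newly initialized and would need SQuAD-style fine-tuning before the extracted answers are reliable.

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
# The QA head is randomly initialized here and needs fine-tuning (e.g. on SQuAD).
model = AutoModelForQuestionAnswering.from_pretrained("microsoft/deberta-base")

question = "Where is the Eiffel Tower?"
context = "The Eiffel Tower is located in Paris, France."
inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

start = outputs.start_logits.argmax()
end = outputs.end_logits.argmax()
answer = tokenizer.decode(inputs["input_ids"][0, start : end + 1])
```
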
## DebertaConfig

[[autodoc]] DebertaConfig

## DebertaTokenizer

[[autodoc]] DebertaTokenizer
    - build_inputs_with_special_tokens
    - get_special_tokens_mask
    - create_token_type_ids_from_sequences
    - save_vocabulary

## DebertaTokenizerFast

[[autodoc]] DebertaTokenizerFast
    - build_inputs_with_special_tokens
    - create_token_type_ids_from_sequences

## DebertaModel

[[autodoc]] DebertaModel
    - forward

## DebertaPreTrainedModel

[[autodoc]] DebertaPreTrainedModel

## DebertaForMaskedLM

[[autodoc]] DebertaForMaskedLM
    - forward

## DebertaForSequenceClassification

[[autodoc]] DebertaForSequenceClassification
    - forward

## DebertaForTokenClassification

[[autodoc]] DebertaForTokenClassification
    - forward

## DebertaForQuestionAnswering

[[autodoc]] DebertaForQuestionAnswering
    - forward

## TFDebertaModel

[[autodoc]] TFDebertaModel
    - call

## TFDebertaPreTrainedModel

[[autodoc]] TFDebertaPreTrainedModel
    - call

## TFDebertaForMaskedLM

[[autodoc]] TFDebertaForMaskedLM
    - call

## TFDebertaForSequenceClassification

[[autodoc]] TFDebertaForSequenceClassification
    - call

## TFDebertaForTokenClassification

[[autodoc]] TFDebertaForTokenClassification
    - call

## TFDebertaForQuestionAnswering

[[autodoc]] TFDebertaForQuestionAnswering
    - call