<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# BERT

<div class="flex flex-wrap space-x-1">
<a href="https://huggingface.co/models?filter=bert">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-bert-blueviolet">
</a>
<a href="https://huggingface.co/spaces/docs-demos/bert-base-uncased">
<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
</a>
</div>
## Overview

The BERT model was proposed in [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. It's a
bidirectional transformer pretrained using a combination of masked language modeling and next sentence
prediction objectives on a large corpus comprising the Toronto Book Corpus and Wikipedia.
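
You can try BERT in a couple of lines with the `fill-mask` pipeline. A minimal sketch (the `bert-base-uncased` checkpoint is one choice among the many BERT checkpoints on the Hub):

```python
from transformers import pipeline

# Masked language modeling pipeline backed by a pretrained BERT checkpoint
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# BERT fills the [MASK] position with its most likely tokens
for prediction in unmasker("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```
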
The abstract from the paper is the following:

*We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations
from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional
representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result,
the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models
for a wide range of tasks, such as question answering and language inference, without substantial task-specific
architecture modifications.*

*BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural
language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI
accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute
improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).*
Tips:

- BERT is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
  the left.
- BERT was trained with the masked language modeling (MLM) and next sentence prediction (NSP) objectives. It is
  efficient at predicting masked tokens and at NLU in general, but is not optimal for text generation.
- BERT corrupts the inputs with random masking. More precisely, during pretraining, a given percentage of tokens (usually 15%) is masked by (see the sketch after this list):

  * a special mask token with probability 0.8
  * a random token different from the masked one with probability 0.1
  * the same token with probability 0.1

  The model must then predict the original tokens at the masked positions.
- BERT also has a second pretraining objective: the inputs are two sentences A and B (with a separation token in between). With probability 50%, the sentences are consecutive in the corpus; in the remaining 50% of cases, they are unrelated. The model has to predict whether the sentences are consecutive or not.
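
Here is a simplified sketch of that 80/10/10 corruption rule, in the spirit of what `DataCollatorForLanguageModeling` does in the library (the function name is illustrative, and a real collator also excludes special tokens such as `[CLS]` and `[SEP]` from masking):

```python
import torch


def mask_tokens(input_ids, mask_token_id, vocab_size, mlm_probability=0.15):
    """Corrupt a batch of token ids with BERT's 80/10/10 masking rule."""
    labels = input_ids.clone()

    # Select ~15% of the positions; the MLM loss is only computed on these
    masked_indices = torch.bernoulli(torch.full(labels.shape, mlm_probability)).bool()
    labels[~masked_indices] = -100  # -100 is ignored by the cross-entropy loss

    # 80% of the selected positions become the [MASK] token
    replaced = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & masked_indices
    input_ids[replaced] = mask_token_id

    # 10% become a random token (half of the remaining 20%)
    randomized = torch.bernoulli(torch.full(labels.shape, 0.5)).bool() & masked_indices & ~replaced
    random_tokens = torch.randint(vocab_size, labels.shape, dtype=torch.long)
    input_ids[randomized] = random_tokens[randomized]

    # The final 10% keep the original token unchanged
    return input_ids, labels
```
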
This model was contributed by [thomwolf](https://huggingface.co/thomwolf). The original code can be found [here](https://github.com/google-research/bert).

## Resources

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with BERT. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
<PipelineTag pipeline="text-classification"/>

- A blog post on [BERT Text Classification in a different language](https://www.philschmid.de/bert-text-classification-in-a-different-language).
- A notebook for [Finetuning BERT (and friends) for multi-label text classification](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/BERT/Fine_tuning_BERT_(and_friends)_for_multi_label_text_classification.ipynb).
- A notebook on how to [Finetune BERT for multi-label classification using PyTorch](https://colab.research.google.com/github/abhimishra91/transformers-tutorials/blob/master/transformers_multi_label_classification.ipynb). 🌎
- A notebook on how to [warm-start an EncoderDecoder model with BERT for summarization](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/BERT2BERT_for_CNN_Dailymail.ipynb).
- [`BertForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification.ipynb).
- [`TFBertForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification-tf.ipynb).
- [`FlaxBertForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification_flax.ipynb).
- [Text classification task guide](../tasks/sequence_classification)
<PipelineTag pipeline="token-classification"/>

- A blog post on how to use [Hugging Face Transformers with Keras: Fine-tune a non-English BERT for Named Entity Recognition](https://www.philschmid.de/huggingface-transformers-keras-tf).
- A notebook for [Finetuning BERT for named-entity recognition](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/Custom_Named_Entity_Recognition_with_BERT_only_first_wordpiece.ipynb) using only the first wordpiece of each word in the word label during tokenization. To propagate the label of the word to all wordpieces, see this [version](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/BERT/Custom_Named_Entity_Recognition_with_BERT.ipynb) of the notebook instead.
- [`BertForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification.ipynb).
- [`TFBertForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification-tf.ipynb).
- [`FlaxBertForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/token-classification).
- [Token classification](https://huggingface.co/course/chapter7/2?fw=pt) chapter of the 🤗 Hugging Face Course.
- [Token classification task guide](../tasks/token_classification)
<PipelineTag pipeline="fill-mask"/>

- [`BertForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling#robertabertdistilbert-and-masked-language-modeling) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb).
- [`TFBertForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/language-modeling#run_mlmpy) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling-tf.ipynb).
- [`FlaxBertForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/language-modeling#masked-language-modeling) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/masked_language_modeling_flax.ipynb).
- [Masked language modeling](https://huggingface.co/course/chapter7/3?fw=pt) chapter of the 🤗 Hugging Face Course.
- [Masked language modeling task guide](../tasks/masked_language_modeling)
<PipelineTag pipeline="question-answering"/>

- [`BertForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering.ipynb).
- [`TFBertForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/question-answering) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering-tf.ipynb).
- [`FlaxBertForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/question-answering).
- [Question answering](https://huggingface.co/course/chapter7/7?fw=pt) chapter of the 🤗 Hugging Face Course.
- [Question answering task guide](../tasks/question_answering)
**Multiple choice**

- [`BertForMultipleChoice`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/multiple-choice) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice.ipynb).
- [`TFBertForMultipleChoice`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/multiple-choice) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice-tf.ipynb).
- [Multiple choice task guide](../tasks/multiple_choice)
⚡️ **Inference**

- A blog post on how to [Accelerate BERT inference with Hugging Face Transformers and AWS Inferentia](https://huggingface.co/blog/bert-inferentia-sagemaker).
- A blog post on how to [Accelerate BERT inference with DeepSpeed-Inference on GPUs](https://www.philschmid.de/bert-deepspeed-inference).
⚙️ **Pretraining**

- A blog post on [Pre-Training BERT with Hugging Face Transformers and Habana Gaudi](https://www.philschmid.de/pre-training-bert-habana).
🚀 **Deploy**

- A blog post on how to [Convert Transformers to ONNX with Hugging Face Optimum](https://www.philschmid.de/convert-transformers-to-onnx).
- A blog post on how to [Setup Deep Learning environment for Hugging Face Transformers with Habana Gaudi on AWS](https://www.philschmid.de/getting-started-habana-gaudi#conclusion).
- A blog post on [Autoscaling BERT with Hugging Face Transformers, Amazon SageMaker and Terraform module](https://www.philschmid.de/terraform-huggingface-amazon-sagemaker-advanced).
- A blog post on [Serverless BERT with HuggingFace, AWS Lambda, and Docker](https://www.philschmid.de/serverless-bert-with-huggingface-aws-lambda-docker).
- A blog post on [Hugging Face Transformers BERT fine-tuning using Amazon SageMaker and Training Compiler](https://www.philschmid.de/huggingface-amazon-sagemaker-training-compiler).
- A blog post on [Task-specific knowledge distillation for BERT using Transformers & Amazon SageMaker](https://www.philschmid.de/knowledge-distillation-bert-transformers).
## BertConfig

[[autodoc]] BertConfig
    - all

## BertTokenizer

[[autodoc]] BertTokenizer
    - build_inputs_with_special_tokens
    - get_special_tokens_mask
    - create_token_type_ids_from_sequences
    - save_vocabulary
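
For example, encoding a sentence pair shows the special tokens and the `token_type_ids` (segment ids) that the NSP objective relies on. A minimal sketch (checkpoint name illustrative):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Pairs are packed as [CLS] A [SEP] B [SEP]; token_type_ids mark
# segment A with 0 and segment B with 1
encoding = tokenizer("The man went to the store.", "He bought a gallon of milk.")
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
print(encoding["token_type_ids"])
```
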
## BertTokenizerFast

[[autodoc]] BertTokenizerFast

## TFBertTokenizer

[[autodoc]] TFBertTokenizer

## Bert specific outputs

[[autodoc]] models.bert.modeling_bert.BertForPreTrainingOutput

[[autodoc]] models.bert.modeling_tf_bert.TFBertForPreTrainingOutput

[[autodoc]] models.bert.modeling_flax_bert.FlaxBertForPreTrainingOutput
## BertModel

[[autodoc]] BertModel
    - forward
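
As a quick usage sketch (checkpoint name illustrative), the bare `BertModel` returns one contextual embedding per input token:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Hello, world!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```
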
## BertForPreTraining

[[autodoc]] BertForPreTraining
    - forward

## BertLMHeadModel

[[autodoc]] BertLMHeadModel
    - forward

## BertForMaskedLM

[[autodoc]] BertForMaskedLM
    - forward

## BertForNextSentencePrediction

[[autodoc]] BertForNextSentencePrediction
    - forward
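
A minimal inference sketch for the NSP head described in the tips above (checkpoint name illustrative):

```python
import torch
from transformers import BertForNextSentencePrediction, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

prompt = "In Italy, pizza is served in wedges."
next_sentence = "The sky is blue due to the scattering of sunlight."
inputs = tokenizer(prompt, next_sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Index 0 = "B follows A", index 1 = "B is a random sentence"
print(torch.softmax(logits, dim=-1))
```
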
## BertForSequenceClassification

[[autodoc]] BertForSequenceClassification
    - forward

## BertForMultipleChoice

[[autodoc]] BertForMultipleChoice
    - forward

## BertForTokenClassification

[[autodoc]] BertForTokenClassification
    - forward

## BertForQuestionAnswering

[[autodoc]] BertForQuestionAnswering
    - forward
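
A minimal extractive QA sketch (any BERT checkpoint fine-tuned on a QA dataset such as SQuAD works; the name below is illustrative):

```python
import torch
from transformers import AutoTokenizer, BertForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("deepset/bert-base-cased-squad2")
model = BertForQuestionAnswering.from_pretrained("deepset/bert-base-cased-squad2")

question = "Who created BERT?"
context = "BERT was created by researchers at Google."
inputs = tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# The answer span runs from the most likely start index to the most likely end index
start = outputs.start_logits.argmax()
end = outputs.end_logits.argmax()
print(tokenizer.decode(inputs["input_ids"][0, start : end + 1]))
```
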
## TFBertModel

[[autodoc]] TFBertModel
    - call

## TFBertForPreTraining

[[autodoc]] TFBertForPreTraining
    - call
## TFBertLMHeadModel

[[autodoc]] TFBertLMHeadModel
    - call
## TFBertForMaskedLM

[[autodoc]] TFBertForMaskedLM
    - call

## TFBertForNextSentencePrediction

[[autodoc]] TFBertForNextSentencePrediction
    - call

## TFBertForSequenceClassification

[[autodoc]] TFBertForSequenceClassification
    - call

## TFBertForMultipleChoice

[[autodoc]] TFBertForMultipleChoice
    - call

## TFBertForTokenClassification

[[autodoc]] TFBertForTokenClassification
    - call

## TFBertForQuestionAnswering

[[autodoc]] TFBertForQuestionAnswering
    - call
## FlaxBertModel

[[autodoc]] FlaxBertModel
    - __call__

## FlaxBertForPreTraining

[[autodoc]] FlaxBertForPreTraining
    - __call__

## FlaxBertForCausalLM

[[autodoc]] FlaxBertForCausalLM
    - __call__

## FlaxBertForMaskedLM

[[autodoc]] FlaxBertForMaskedLM
    - __call__

## FlaxBertForNextSentencePrediction

[[autodoc]] FlaxBertForNextSentencePrediction
    - __call__

## FlaxBertForSequenceClassification

[[autodoc]] FlaxBertForSequenceClassification
    - __call__

## FlaxBertForMultipleChoice

[[autodoc]] FlaxBertForMultipleChoice
    - __call__

## FlaxBertForTokenClassification

[[autodoc]] FlaxBertForTokenClassification
    - __call__

## FlaxBertForQuestionAnswering

[[autodoc]] FlaxBertForQuestionAnswering
    - __call__