---
datasets:
- squad_v2
metrics:
- f1
- exact_match
---

## Distilroberta-squad2

This model is [DistilRoBERTa base](https://huggingface.co/distilroberta-base) fine-tuned for context-based (extractive) question answering on the [SQuAD v2](https://huggingface.co/datasets/squad_v2) dataset, a collection of English-language context-question-answer triples designed for training and benchmarking extractive question answering. Version 2 of SQuAD (Stanford Question Answering Dataset) combines the 100,000 examples from SQuAD v1.1 with over 50,000 additional "unanswerable" questions, i.e. questions whose answer cannot be found in the provided context.

## Model description

This fine-tuned model prioritizes inference speed: DistilRoBERTa runs roughly twice as fast as RoBERTa-base, with only a marginal loss in quality.

## Intended uses & limitations

The model is intended for extractive question answering over an English context passage. Because it was trained on SQuAD v2, it can also decline to answer: with `handle_impossible_answer=True`, the pipeline returns an empty answer when it judges the question unanswerable from the given context.

```python
from transformers import pipeline

QA_pipeline = pipeline(
    "question-answering",
    model="AdamCodd/distilroberta-squad2",
    handle_impossible_answer=True,
)

qa_input = {
    'question': "Which name is also used to describe the Amazon rainforest in English?",
    'context': '''The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain "Amazonas" in their names. The Amazon represents over half of the planet's remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species.'''
}

response = QA_pipeline(**qa_input)
print(response)  # dict with 'score', 'start', 'end' and 'answer'
```

## Training and evaluation data

More information needed.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged sketch of how they map onto the 🤗 `TrainingArguments` API is included at the end of this card):
- learning_rate: 3e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- mixed_precision: fp16
- max_seq_len: 384
- doc_stride: 128
- optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 150
- num_epochs: 3

### Training results

Evaluated on the SQuAD 2.0 dev set with the [official eval script](https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/); a hedged `evaluate`-based alternative is sketched at the end of this card. Results:

```
'exact': 72.9470226564474,
'f1': 76.03522762032603,
'total': 11873,
'HasAns_exact': 72.4527665317139,
'HasAns_f1': 78.63803264779528,
'HasAns_total': 5928,
'NoAns_exact': 73.43986543313709,
'NoAns_f1': 73.43986543313709,
'NoAns_total': 5945,
'best_exact': 72.95544512760044,
'best_exact_thresh': 0.0,
'best_f1': 76.04365009147917,
'best_f1_thresh': 0.0
```

### Framework versions

- Transformers 4.34.0
- Torch 2.0.1
- Accelerate 0.23.0
- Tokenizers 0.14.1

If you want to support me, you can [here](https://ko-fi.com/adamcodd).
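As a reference for the settings listed under *Training hyperparameters*, the sketch below shows one way they could map onto the 🤗 `TrainingArguments` API. This is a minimal illustration, not the actual training script (which is not included in this card): `output_dir` is a placeholder, and `max_seq_len` / `doc_stride` are applied at tokenization time rather than through `TrainingArguments`.

```python
from transformers import TrainingArguments

# Hedged sketch: maps the hyperparameters listed in this card onto TrainingArguments.
# output_dir is illustrative; max_seq_len=384 and doc_stride=128 belong to the
# tokenization/feature-preparation step, not to TrainingArguments.
training_args = TrainingArguments(
    output_dir="distilroberta-squad2",  # placeholder path
    learning_rate=3e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    fp16=True,                          # mixed_precision "fp16"
    lr_scheduler_type="linear",
    warmup_steps=150,
    num_train_epochs=3,
    adam_beta1=0.9,                     # AdamW defaults, listed for completeness
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```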
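The scores under *Training results* come from the official SQuAD 2.0 eval script. A roughly equivalent check can be run with the `squad_v2` metric from the 🤗 `evaluate` library, which reports the same keys (`exact`, `f1`, `HasAns_*`, `NoAns_*`, `best_*`). The snippet below only illustrates the expected input format; the example id and answer offsets are made up.

```python
import evaluate

squad_v2_metric = evaluate.load("squad_v2")

# SQuAD v2 predictions must include a no_answer_probability field.
predictions = [{
    "id": "example-0",                  # illustrative id
    "prediction_text": "Amazonia",
    "no_answer_probability": 0.0,
}]
references = [{
    "id": "example-0",
    "answers": {"text": ["Amazonia", "the Amazon Jungle"], "answer_start": [201, 218]},  # illustrative offsets
}]

print(squad_v2_metric.compute(predictions=predictions, references=references))
```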