Spaces:
Runtime error
Runtime error
Update app.py
Browse files
app.py
CHANGED
@@ -44,11 +44,12 @@ def predict(context,question):
|
|
44 |
|
45 |
md = """This prediction model is designed to answer a question about a given input text--reading comprehension. The model does not just answer questions in general -- it only works from the text that you provide. However, automated reading comprehension can be a valuable task.
|
46 |
|
47 |
-
The model is based on the Zafrir et al. (2021) paper: [Prune Once for All: Sparse Pre-Trained Language Models](https://arxiv.org/abs/2111.05754). The model can be found [here](https://huggingface.co/Intel/bert-base-uncased-squadv1.1-sparse-80-1x4-block-pruneofa). It has had weight pruning and model distillation applied to create a sparse weight pattern that is maintained even after fine-tuning has been applied. According to Zafrir et al. (2016), their "results show the best compression-to-accuracy ratio for BERT-Base". This model is still in FP32, but can be quantized to INT8 with the [Intel® Neural Compressor](https://github.com/intel/neural-compressor). An INT8 version of this model can actually be found [here on
|
48 |
|
49 |
The training dataset used is the English Wikipedia dataset (2500M words), and then fine-tuned on the SQuADv1.1 dataset containing 89K training examples, compiled by Rajpurkar et al. (2016): [100, 000+ Questions for Machine Comprehension of Text](https://arxiv.org/abs/1606.05250).
|
50 |
|
51 |
Author of Hugging Face Space: Benjamin Consolvo, AI Solutions Engineer Manager at Intel
|
|
|
52 |
Date last updated: 03/28/2023
|
53 |
"""
|
54 |
# The main idea of this BERT-Base model is that it is much more fast and efficient in deployment than its dense counterpart: (https://huggingface.co/csarron/bert-base-uncased-squad-v1).
|
|
|
44 |
|
45 |
md = """This prediction model is designed to answer a question about a given input text--reading comprehension. The model does not just answer questions in general -- it only works from the text that you provide. However, automated reading comprehension can be a valuable task.
|
46 |
|
47 |
+
The model is based on the Zafrir et al. (2021) paper: [Prune Once for All: Sparse Pre-Trained Language Models](https://arxiv.org/abs/2111.05754). The model can be found [here](https://huggingface.co/Intel/bert-base-uncased-squadv1.1-sparse-80-1x4-block-pruneofa). It has had weight pruning and model distillation applied to create a sparse weight pattern that is maintained even after fine-tuning has been applied. According to Zafrir et al. (2016), their "results show the best compression-to-accuracy ratio for BERT-Base". This model is still in FP32, but can be quantized to INT8 with the [Intel® Neural Compressor](https://github.com/intel/neural-compressor). An INT8 version of this model can actually be found [here on Hugging Face](https://huggingface.co/Intel/distilbert-base-uncased-squadv1.1-sparse-80-1x4-block-pruneofa-int8).
|
48 |
|
49 |
The training dataset used is the English Wikipedia dataset (2500M words), and then fine-tuned on the SQuADv1.1 dataset containing 89K training examples, compiled by Rajpurkar et al. (2016): [100, 000+ Questions for Machine Comprehension of Text](https://arxiv.org/abs/1606.05250).
|
50 |
|
51 |
Author of Hugging Face Space: Benjamin Consolvo, AI Solutions Engineer Manager at Intel
|
52 |
+
|
53 |
Date last updated: 03/28/2023
|
54 |
"""
|
55 |
# The main idea of this BERT-Base model is that it is much more fast and efficient in deployment than its dense counterpart: (https://huggingface.co/csarron/bert-base-uncased-squad-v1).
|