Update README.md
Browse files
README.md
CHANGED
@@ -28,7 +28,7 @@ Evaluation results and benchmark
|
|
28 |
|
29 |
We compare [DistilCamemBERT-QA](https://huggingface.co/cmarkea/distilcamembert-base-qa) to two other modelizations working on the french language. The first one [etalab-ia/camembert-base-squadFR-fquad-piaf](https://huggingface.co/etalab-ia/camembert-base-squadFR-fquad-piaf) is based on well named [CamemBERT](https://huggingface.co/camembert-base), the french RoBERTa model and the second one [fmikaelian/flaubert-base-uncased-squad](https://huggingface.co/fmikaelian/flaubert-base-uncased-squad) is based on [FlauBERT](https://huggingface.co/flaubert/flaubert_base_uncased) another french model based on BERT architecture this time.
|
30 |
|
31 |
-
For our benchmarks, we
|
32 |
|
33 |
| **model** | **time (ms)** | **exact match (%)** | **f1-score (%)** | **inclusion-score (%)** |
|
34 |
| :--------------: | :-----------: | :--------------: | :------------: | :------------: |
|
|
|
28 |
|
29 |
We compare [DistilCamemBERT-QA](https://huggingface.co/cmarkea/distilcamembert-base-qa) to two other modelizations working on the french language. The first one [etalab-ia/camembert-base-squadFR-fquad-piaf](https://huggingface.co/etalab-ia/camembert-base-squadFR-fquad-piaf) is based on well named [CamemBERT](https://huggingface.co/camembert-base), the french RoBERTa model and the second one [fmikaelian/flaubert-base-uncased-squad](https://huggingface.co/fmikaelian/flaubert-base-uncased-squad) is based on [FlauBERT](https://huggingface.co/flaubert/flaubert_base_uncased) another french model based on BERT architecture this time.
|
30 |
|
31 |
+
For our benchmarks, we do a word-to-word comparison between words that are matching between the predicted answer and the ground truth. We also use f1-score, which measures the intersection quality between predicted responses and ground truth. Finally, we use inclusion score, which measures if the ground truth answer is included in the predicted answer. An **AMD Ryzen 5 4500U @ 2.3GHz with 6 cores** was used for the mean inference time measure.
|
32 |
|
33 |
| **model** | **time (ms)** | **exact match (%)** | **f1-score (%)** | **inclusion-score (%)** |
|
34 |
| :--------------: | :-----------: | :--------------: | :------------: | :------------: |
|