chainyo committed on
Commit
2e8422b
1 Parent(s): 87bc5be

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -28,7 +28,7 @@ Evaluation results and benchmark
 
  We compare [DistilCamemBERT-QA](https://huggingface.co/cmarkea/distilcamembert-base-qa) to two other modelizations working on the french language. The first one [etalab-ia/camembert-base-squadFR-fquad-piaf](https://huggingface.co/etalab-ia/camembert-base-squadFR-fquad-piaf) is based on well named [CamemBERT](https://huggingface.co/camembert-base), the french RoBERTa model and the second one [fmikaelian/flaubert-base-uncased-squad](https://huggingface.co/fmikaelian/flaubert-base-uncased-squad) is based on [FlauBERT](https://huggingface.co/flaubert/flaubert_base_uncased) another french model based on BERT architecture this time.
 
- For our benchmarks, we compare matching character by character between the predicted answer and the ground truth. We also use the f1-score, which measures the intersection quality between predicted responses and ground truth. Finally, we use the inclusion score, which measures if the ground truth answer is included in the predicted answer. An **AMD Ryzen 5 4500U @ 2.3GHz with 6 cores** was used for the mean inference time measure.
+ For our benchmarks, we do a word-to-word comparison between words that are matching between the predicted answer and the ground truth. We also use f1-score, which measures the intersection quality between predicted responses and ground truth. Finally, we use inclusion score, which measures if the ground truth answer is included in the predicted answer. An **AMD Ryzen 5 4500U @ 2.3GHz with 6 cores** was used for the mean inference time measure.
 
  | **model** | **time (ms)** | **exact match (%)** | **f1-score (%)** | **inclusion-score (%)** |
  | :--------------: | :-----------: | :--------------: | :------------: | :------------: |
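
The changed paragraph names three metrics: word-level exact match, f1-score, and inclusion score. The README does not show how they are computed, so the following is only a minimal sketch of one plausible reading. The function names (`tokenize`, `exact_match`, `f1_score`, `inclusion`), the lowercase-and-split tokenization, and the treatment of "included" as a contiguous run of words are all assumptions made for illustration, not the authors' evaluation code.

```python
from collections import Counter


def tokenize(text: str) -> list:
    # Assumption: lowercase and split on whitespace; the README does not
    # specify any normalization.
    return text.lower().split()


def exact_match(prediction: str, truth: str) -> bool:
    # Word-to-word comparison: every word must match, in the same order.
    return tokenize(prediction) == tokenize(truth)


def f1_score(prediction: str, truth: str) -> float:
    # Harmonic mean of precision and recall over the shared words.
    pred_tokens = tokenize(prediction)
    truth_tokens = tokenize(truth)
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


def inclusion(prediction: str, truth: str) -> bool:
    # Assumption: "included" means the ground-truth words appear as a
    # contiguous run inside the predicted answer.
    pred_tokens = tokenize(prediction)
    truth_tokens = tokenize(truth)
    n = len(truth_tokens)
    return any(
        pred_tokens[i:i + n] == truth_tokens
        for i in range(len(pred_tokens) - n + 1)
    )
```

Under these assumptions, a prediction such as "la tour eiffel à paris" against the ground truth "tour eiffel" fails the exact match and passes the inclusion check, while its f1-score reflects full recall but only partial precision. Averaging each function over a dataset and multiplying by 100 would give percentages in the form used by the table columns.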