---
license: mit
datasets:
- deutsche-telekom/ger-backtrans-paraphrase
- paws-x
- stsb_multi_mt
language:
- de
model-index:
- name: e5-base-sts-en-de
results:
- task:
type: semantic textual similarity
dataset:
type: stsb_multi_mt
name: stsb_multi_mt
metrics:
- type: spearmanr
value: 0.904
---
**INFO**: The model is being continuously updated.
The model is a [multilingual-e5-base](https://huggingface.co/intfloat/multilingual-e5-base) model fine-tuned for semantic textual similarity.
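
For reference, a minimal usage sketch with `sentence-transformers` is shown below. The model ID is a placeholder for the repository hosting this card, and the `query: ` prefix follows the usual E5 convention for symmetric tasks such as similarity scoring.

```python
# Minimal usage sketch; substitute the actual Hugging Face model ID.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("e5-base-sts-en-de")  # placeholder model ID

# E5-style models expect an instruction prefix; "query: " is the usual
# choice on both sides for symmetric tasks like semantic similarity.
sentences = [
    "query: Das Wetter ist heute schön.",  # "The weather is nice today."
    "query: Heute scheint die Sonne.",     # "The sun is shining today."
]

embeddings = model.encode(sentences, normalize_embeddings=True)
similarity = util.cos_sim(embeddings[0], embeddings[1])
print(f"Cosine similarity: {similarity.item():.3f}")
```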
## Model Training
The model has been fine-tuned on the German subsets of the following datasets:
- [German paraphrase corpus by Philip May](https://huggingface.co/datasets/deutsche-telekom/ger-backtrans-paraphrase)
- [paws-x](https://huggingface.co/datasets/paws-x)
- [stsb_multi_mt](https://huggingface.co/datasets/stsb_multi_mt)
The training procedure can be divided into two stages (a sketch follows the list):
- training on paraphrase datasets with the Multiple Negatives Ranking Loss
- training on semantic textual similarity datasets using the Cosine Similarity Loss
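
A condensed sketch of this two-stage setup with `sentence-transformers` is shown below. The dataset choices per stage, column handling, and hyperparameters (batch sizes, epochs) are illustrative assumptions, not the exact values used to train this model.

```python
# Two-stage fine-tuning sketch, assuming the classic sentence-transformers
# fit() API. Hyperparameters are placeholders.
from datasets import load_dataset
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses

model = SentenceTransformer("intfloat/multilingual-e5-base")

# Stage 1: paraphrase pairs with Multiple Negatives Ranking Loss.
# Within each batch, every other sentence serves as an in-batch negative.
paraphrases = load_dataset("paws-x", "de", split="train")
stage1_examples = [
    InputExample(texts=[f"query: {r['sentence1']}", f"query: {r['sentence2']}"])
    for r in paraphrases
    if r["label"] == 1  # keep only true paraphrase pairs
]
stage1_loader = DataLoader(stage1_examples, shuffle=True, batch_size=32)
model.fit(
    train_objectives=[(stage1_loader, losses.MultipleNegativesRankingLoss(model))],
    epochs=1,
)

# Stage 2: graded similarity pairs with Cosine Similarity Loss.
# Gold scores in stsb_multi_mt range from 0 to 5, so rescale to [0, 1].
sts = load_dataset("stsb_multi_mt", "de", split="train")
stage2_examples = [
    InputExample(
        texts=[f"query: {r['sentence1']}", f"query: {r['sentence2']}"],
        label=r["similarity_score"] / 5.0,
    )
    for r in sts
]
stage2_loader = DataLoader(stage2_examples, shuffle=True, batch_size=16)
model.fit(
    train_objectives=[(stage2_loader, losses.CosineSimilarityLoss(model))],
    epochs=1,
)
```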
## Results
The model achieves the following results (Spearman rank correlation on the German stsb_multi_mt subsets):
- 0.920 on the validation split
- 0.904 on the test split