dangvantuan committed
Commit 1813d84
1 Parent(s): 01a0bb3

Update README.md

Files changed (1):
  1. README.md +2 -3
README.md CHANGED
@@ -1481,7 +1481,7 @@ language:
 
  # [bilingual-embedding-large](https://huggingface.co/Lajavaness/bilingual-embedding-large)
 
- bilingual-embedding is the Embedding Model for bilingual language: french and english. This model is a specialized sentence-embedding trained specifically for the bilingual language, leveraging the robust capabilities of [XLM-RoBERTa](https://huggingface.co/FacebookAI/xlm-roberta-large), a pre-trained language model based on the [XLM-RoBERTa](https://huggingface.co/FacebookAI/xlm-roberta-large) architecture. The model utilizes xlm-roberta to encode english-french sentences into a 1024-dimensional vector space, facilitating a wide range of applications from semantic search to text clustering. The embeddings capture the nuanced meanings of english-french sentences, reflecting both the lexical and contextual layers of the language.
+ Bilingual-embedding is a sentence-embedding model for a bilingual setting, French and English, built on the pre-trained [XLM-RoBERTa](https://huggingface.co/FacebookAI/xlm-roberta-large) language model. It encodes English and French sentences into a shared 1024-dimensional vector space, supporting a wide range of applications from semantic search to text clustering. The embeddings capture the nuanced meanings of English and French sentences, reflecting both the lexical and contextual layers of each language.
 
 
  ## Full Model Architecture
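
The shared 1024-dimensional space means a French sentence and its English counterpart land close together. Below is a minimal sketch of checking this with cosine similarity, assuming the checkpoint loads through `sentence-transformers`; the `trust_remote_code=True` flag and the example sentences are assumptions, not taken from this commit:

```python
from sentence_transformers import SentenceTransformer, util

# Assumption: standard sentence-transformers loading, with remote code
# trusted for the card's custom wrapper module.
model = SentenceTransformer("Lajavaness/bilingual-embedding-large", trust_remote_code=True)

french = "Paris est la capitale de la France"
english = "Paris is the capital of France"
unrelated = "The cat is sleeping on the sofa"

# Each sentence becomes one 1024-dimensional vector.
embeddings = model.encode([french, english, unrelated])
print(embeddings.shape)  # (3, 1024)

# A cross-lingual paraphrase pair should score well above an unrelated pair.
print(util.cos_sim(embeddings[0], embeddings[1]).item())
print(util.cos_sim(embeddings[0], embeddings[2]).item())
```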
@@ -1501,7 +1501,7 @@ SentenceTransformer(
  - Dataset: [STSB-fr and en]
  - Method: Fine-tuning specifically for the semantic textual similarity benchmark using Siamese BERT-Networks configured with the 'sentence-transformers' library.
  ### Stage 4: Advanced Augmentation Fine-tuning
- - Dataset: STSB-vn with generate [silver sample from gold sample](https://www.sbert.net/examples/training/data_augmentation/README.html)
+ - Dataset: STSB with generated [silver samples from gold samples](https://www.sbert.net/examples/training/data_augmentation/README.html)
 - Method: Employed an advanced strategy using [Augmented SBERT](https://arxiv.org/abs/2010.08240) with Pair Sampling Strategies, integrating both Cross-Encoder and Bi-Encoder models. This stage further refined the embeddings by enriching the training data dynamically, enhancing the model's robustness and accuracy.
 
 
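To make the silver-sample step concrete: in the Augmented SBERT recipe linked above, a cross-encoder trained on the gold pairs scores new, unlabeled sentence pairs, and those predicted scores become extra "silver" training data for the bi-encoder. A rough sketch of the scoring step follows; the `cross-encoder/stsb-roberta-large` checkpoint and the example pairs are illustrative stand-ins, not the card's actual setup:

```python
from sentence_transformers import CrossEncoder

# Stand-in cross-encoder; the card does not name the exact model used.
scorer = CrossEncoder("cross-encoder/stsb-roberta-large")

# Unlabeled pairs drawn by a pair-sampling strategy (illustrative examples).
unlabeled_pairs = [
    ["A man is playing the guitar.", "Someone plays a guitar."],
    ["A man is playing the guitar.", "The stock market fell today."],
]

# Predicted similarity scores serve as the "silver" labels that are mixed
# with the gold STSB data when fine-tuning the bi-encoder.
silver_scores = scorer.predict(unlabeled_pairs)
for (s1, s2), score in zip(unlabeled_pairs, silver_scores):
    print(f"{score:.3f}  {s1} | {s2}")
```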
@@ -1517,7 +1517,6 @@ Then you can use the model like this:
 
  ```python
  from sentence_transformers import SentenceTransformer
- from pyvi.ViTokenizer import tokenize
 
  sentences = ["Paris est une capitale de la France", "Paris is a capital of France"]
 
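After this commit the snippet no longer imports pyvi (a Vietnamese tokenizer this bilingual card does not need). For reference, a completed version of the truncated example, with the model id taken from the header above and `trust_remote_code=True` assumed as in the earlier sketch:

```python
from sentence_transformers import SentenceTransformer

sentences = ["Paris est une capitale de la France", "Paris is a capital of France"]

# Assumption: standard sentence-transformers loading, as sketched above.
model = SentenceTransformer("Lajavaness/bilingual-embedding-large", trust_remote_code=True)
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 1024)
```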
 