---
language:
- ja
library_name: sentence-transformers
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
metrics:
widget: []
pipeline_tag: sentence-similarity
license: apache-2.0
datasets:
- hpprc/emb
- hpprc/mqa-ja
- google-research-datasets/paws-x
---

# SentenceTransformer based on yano0/my_rope_bert_v2

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [yano0/my_rope_bert_v2](https://huggingface.co/yano0/my_rope_bert_v2). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

This is a sentence embedding model with a 1024-token context, based on RoFormer. It was pre-trained on Wikipedia and cc100 and then fine-tuned as a sentence embedding model. Fine-tuning starts with weakly supervised learning on mc4 and MQA, followed by the same 3-stage learning process as [GLuCoSE v2](https://huggingface.co/pkshatech/GLuCoSE-base-ja-v2).

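
The model builds on RoFormer, whose central idea is the rotary position embedding. As a rough illustration of that idea only, the sketch below rotates each pair of vector dimensions by a position-dependent angle and checks that the dot product of two rotated vectors depends only on the offset between their positions. It is not this model's actual code: the helper names `rotate` and `dot`, the toy vectors, and the base of 10000 are assumptions made for the example.

```python
import math


def rotate(vec, position, base=10000.0):
    """Apply a rotary position embedding to an even-length vector.

    Each pair (vec[i], vec[i + 1]) is rotated by the angle
    position * base ** (-i / dim), where i is the even index of the pair.
    """
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy query and key vectors (hypothetical values, chosen only for illustration).
q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Both pairs of positions are 3 apart, so the scores should agree
# up to floating-point error.
score_near_start = dot(rotate(q, 5), rotate(k, 2))
score_far_along = dot(rotate(q, 905), rotate(k, 902))
print(abs(score_near_start - score_far_along) < 1e-9)
```

Because the score depends on relative rather than absolute position, the same attention pattern can apply anywhere in a long input, which is one reason this family of encoders is a natural fit for a 1024-token context.
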
### Model Description

- **Model Type:** Sentence Transformer
- **Maximum Sequence Length:** 1024 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: RetrievaBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
# Assumption: the custom RetrievaBertModel architecture may require
# passing trust_remote_code=True to SentenceTransformer when loading.
model = SentenceTransformer("pkshatech/RoSEtta-base")

# Run inference
sentences = [
    'The weather is lovely today.',
    "It's so sunny outside!",
    'He drove to the stadium.',
]
embeddings = model.encode(sentences)  # NumPy array, one row per sentence
print(embeddings.shape)
# (3, 768)

# Get the pairwise similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```

## Evaluation

### Metrics

#### Semantic Similarity

* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.8363     |
| **spearman_cosine** | **0.7829** |
| pearson_manhattan   | 0.8169     |
| spearman_manhattan  | 0.7806     |
| pearson_euclidean   | 0.8176     |
| spearman_euclidean  | 0.7813     |
| pearson_dot         | 0.7906     |
| spearman_dot        | 0.7341     |
| pearson_max         | 0.8363     |
| spearman_max        | 0.7829     |

## Benchmarks

### Retrieval

Evaluated with [MIRACL-ja](https://huggingface.co/datasets/miracl/miracl), [JQaRA](https://huggingface.co/datasets/hotchpotch/JQaRA) and [MLDR-ja](https://huggingface.co/datasets/Shitao/MLDR).

| model | size | MIRACL Recall@5 | JQaRA nDCG@10 | MLDR nDCG@10 |
|--------|--------|---------------------|-------------------|-------------------|
| me5-base | 0.3B | 84.2 | 47.2 | 25.4 |
| GLuCoSE | 0.1B | 53.3 | 30.8 | 25.2 |
| RoSEtta | 0.2B | 79.3 | 57.7 | 32.3 |

### JMTEB

Evaluated with [JMTEB](https://github.com/sbintuitions/JMTEB).

* The time-consuming tasks [`amazon_review_classification`, `mrtydi`, `jaqket`, `esci`] were excluded from the evaluation.
* The average (Avg.) is the macro-average over the five task categories.

| model | size | Class. | Ret. | STS. | Clus. | Pair. | Avg. |
|--------|--------|--------|------|------|-------|-------|------|
| me5-base | 0.3B | 75.1 | 80.6 | 80.5 | 52.6 | 62.4 | 70.2 |
| GLuCoSE | 0.1B | 82.6 | 69.8 | 78.2 | 51.5 | 66.2 | 69.7 |
| RoSEtta | 0.2B | 79.0 | 84.3 | 81.4 | 53.2 | 61.7 | 71.9 |

## Authors

Chihiro Yano, Go Mocho, Hideyuki Tachibana, Hiroto Takegawa, Yotaro Watanabe

## License

This model is published under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).