---
language:
- ja
library_name: sentence-transformers
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
metrics:
widget: []
pipeline_tag: sentence-similarity
license: apache-2.0
datasets:
- hpprc/emb
- hpprc/mqa-ja
- google-research-datasets/paws-x
---

# SentenceTransformer based on yano0/my_rope_bert_v2

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [yano0/my_rope_bert_v2](https://huggingface.co/yano0/my_rope_bert_v2). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details
This is a sentence embedding model with a 1024-token context window, based on RoFormer.
It was pre-trained on Wikipedia and CC100 and then fine-tuned as a sentence embedding model.
Fine-tuning begins with weakly supervised learning on mC4 and MQA.
After that, we apply the same three-stage training process as [GLuCoSE v2](https://huggingface.co/pkshatech/GLuCoSE-base-ja-v2).

### Model Description
- **Model Type:** Sentence Transformer
- **Maximum Sequence Length:** 1024 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
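
To make the similarity function concrete, here is a minimal pure-Python sketch of cosine similarity. The function name `cosine_similarity` and the 3-dimensional toy vectors are illustrative assumptions standing in for the model's 768-dimensional embeddings; this is not the library's implementation.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Toy vectors (hypothetical data, not real model output).
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, different length
c = [-3.0, 0.0, 1.0]  # orthogonal to a

print(cosine_similarity(a, b))  # close to 1: direction matches, length is ignored
print(cosine_similarity(a, c))  # 0.0: the dot product is zero
```

Because the score depends only on direction, two embeddings that differ in norm but point the same way are treated as maximally similar.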
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: RetrievaBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
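
The `Pooling` module above has only `pooling_mode_mean_tokens` enabled, which means the sentence embedding is the average of the token embeddings. The following is a minimal sketch of that idea in plain Python, assuming padding positions are excluded through an attention mask. The helper name `mean_pool` and the 2-dimensional toy data are hypothetical; the library itself operates on batched tensors.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [x / count for x in total]

# Toy input: three token positions, the last one is padding (hypothetical data).
token_embeddings = [
    [1.0, 2.0],
    [3.0, 4.0],
    [100.0, 100.0],  # padding position, masked out below
]
attention_mask = [1, 1, 0]

sentence_embedding = mean_pool(token_embeddings, attention_mask)
print(sentence_embedding)  # [2.0, 3.0]
```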

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

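# Download from the 🤗 Hub
# Assumption: the architecture above lists a custom RetrievaBertModel, so loading
# may require trust_remote_code=True; drop the flag if your setup loads without it.
model = SentenceTransformer("pkshatech/RoSEtta-base", trust_remote_code=True)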
# Run inference
sentences = [
    'The weather is lovely today.',
    "It's so sunny outside!",
    'He drove to the stadium.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Semantic Similarity

* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.8363     |
| **spearman_cosine** | **0.7829** |
| pearson_manhattan   | 0.8169     |
| spearman_manhattan  | 0.7806     |
| pearson_euclidean   | 0.8176     |
| spearman_euclidean  | 0.7813     |
| pearson_dot         | 0.7906     |
| spearman_dot        | 0.7341     |
| pearson_max         | 0.8363     |
| spearman_max        | 0.7829     |
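
The rows above are Pearson and Spearman correlations between a similarity score and gold similarity labels. As a rough illustration of the difference between the two, here is a pure-Python sketch. The function names and the toy scores are assumptions for this example, and it is not the evaluator's own code.

```python
import math

def pearson(xs, ys):
    """Linear correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman(xs, ys):
    """Pearson correlation computed on ranks instead of raw values."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical predicted similarities and gold labels for four sentence pairs.
predicted = [0.10, 0.35, 0.50, 0.90]
gold = [0.0, 1.0, 2.0, 5.0]

print(pearson(predicted, gold))   # below 1: the relationship is not exactly linear
print(spearman(predicted, gold))  # 1.0: the ordering of the pairs matches exactly
```

Spearman only rewards getting the order of the pairs right, which is why it is commonly used as the headline metric for semantic similarity.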

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Benchmarks

### Retrieval
Evaluated with [MIRACL-ja](https://huggingface.co/datasets/miracl/miracl), [JQaRA](https://huggingface.co/datasets/hotchpotch/JQaRA), and [MLDR-ja](https://huggingface.co/datasets/Shitao/MLDR).

| model | size | MIRACL<br>Recall@5 | JQaRA<br>nDCG@10 | MLDR<br>nDCG@10 |
|--------|--------|---------------------|-------------------|-------------------|
| me5-base | 0.3B | 84.2 | 47.2 | 25.4 |
| GLuCoSE | 0.1B | 53.3 | 30.8 | 25.2 |
| RoSEtta | 0.2B | 79.3 | 57.7 | 32.3 |
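
For readers unfamiliar with the two metrics in the table, the sketch below shows one common way to compute Recall@k and nDCG@k. It assumes linear gain with a log2 rank discount; the benchmarks' own evaluation scripts may use a different variant, and the table appears to report values on a 0 to 100 scale. All names and document IDs are hypothetical.

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        raise ValueError("need at least one relevant document")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

def dcg_at_k(gains, k):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))

def ndcg_at_k(ranked_ids, relevance, k):
    """relevance maps document ID to graded gain; missing IDs count as 0."""
    gains = [relevance.get(doc_id, 0.0) for doc_id in ranked_ids]
    ideal = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical ranking for one query with two relevant documents.
ranked = ["d3", "d1", "d7", "d2"]
relevance = {"d1": 1.0, "d2": 1.0}

print(recall_at_k(ranked, set(relevance), k=3))  # 0.5
print(ndcg_at_k(ranked, relevance, k=3))
```

Recall@k ignores where in the top k a relevant document lands, while nDCG@k gives more credit for placing it near the top.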


### JMTEB
Evaluated with [JMTEB](https://github.com/sbintuitions/JMTEB).
* The time-consuming tasks `amazon_review_classification`, `mrtydi`, `jaqket`, and `esci` were excluded from the evaluation.
* The average is a macro-average over the five task categories.

| model | size | Class. | Ret. | STS. | Clus. | Pair. | Avg. |
|--------|--------|--------|------|------|-------|-------|------|
| me5-base | 0.3B | 75.1 | 80.6 | 80.5 | 52.6 | 62.4 | 70.2 |
| GLuCoSE | 0.1B | 82.6 | 69.8 | 78.2 | 51.5 | 66.2 | 69.7 |
| RoSEtta | 0.2B | 79.0 | 84.3 | 81.4 | 53.2 | 61.7 | 71.9 |
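
The Avg. column can be reproduced from the table by taking the unweighted mean of the five category scores for each model. A short sketch, with the scores copied from the table above and a hypothetical helper name `macro_average`:

```python
# Per-category scores copied from the JMTEB table above.
scores = {
    "me5-base": {"Class.": 75.1, "Ret.": 80.6, "STS.": 80.5, "Clus.": 52.6, "Pair.": 62.4},
    "GLuCoSE": {"Class.": 82.6, "Ret.": 69.8, "STS.": 78.2, "Clus.": 51.5, "Pair.": 66.2},
    "RoSEtta": {"Class.": 79.0, "Ret.": 84.3, "STS.": 81.4, "Clus.": 53.2, "Pair.": 61.7},
}

def macro_average(category_scores):
    """Unweighted mean: every category counts equally."""
    return sum(category_scores.values()) / len(category_scores)

for model, category_scores in scores.items():
    print(model, round(macro_average(category_scores), 1))
# me5-base 70.2
# GLuCoSE 69.7
# RoSEtta 71.9
```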

## Authors
Chihiro Yano, Go Mocho, Hideyuki Tachibana, Hiroto Takegawa, Yotaro Watanabe

## License
This model is published under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).