---
language: zh
tags:
- cross-encoder
datasets:
- dialogue
---

# Data
The training data consists of similar-sentence pairs drawn from e-commerce dialogue, about 500,000 sentence pairs in total.

## Model
The model was built with [sentence-transformers](https://www.sbert.net/index.html). It is a cross-encoder, and the pretrained base model is hfl/chinese-roberta-wwm-ext.
The architecture is the same as [tuhailong/cross_encoder_roberta-wwm-ext_v0](https://huggingface.co/tuhailong/cross_encoder_roberta-wwm-ext_v0). The difference is that the sentence pairs were also added to the training set with their input order swapped, which gave better performance on my dataset.
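
As a rough illustration of the order-swapping idea, the sketch below adds a reversed copy of every labeled pair to a training list. The function name `add_swapped_pairs`, the `(sentence_a, sentence_b, label)` tuple layout, and the example labels are assumptions made for illustration, not the author's actual preprocessing code.

```python
from typing import List, Tuple

# Hypothetical record layout: (sentence_a, sentence_b, label), label 1 = similar.
Pair = Tuple[str, str, int]


def add_swapped_pairs(pairs: List[Pair]) -> List[Pair]:
    """Return the original pairs plus a copy of each pair with the order swapped."""
    augmented: List[Pair] = []
    for sent_a, sent_b, label in pairs:
        augmented.append((sent_a, sent_b, label))
        # Skip the swap when both sentences are identical to avoid exact duplicates.
        if sent_a != sent_b:
            augmented.append((sent_b, sent_a, label))
    return augmented


# Illustrative data only; the labels here are made up.
train_pairs = [
    ("今天天气不错", "今天心情不错", 0),
    ("怎么退货", "如何申请退货", 1),
]
augmented = add_swapped_pairs(train_pairs)
print(len(augmented))
```

Because a cross-encoder reads both sentences as one concatenated input, the order of the pair can affect the score, and training on both orders is one way to make the model less sensitive to it.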

### Usage
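```python
>>> from sentence_transformers.cross_encoder import CrossEncoder
>>> # model_save_path: local directory (or Hub ID) of this model; set device="cpu" if no GPU is available
>>> model = CrossEncoder(model_save_path, device="cuda", max_length=64)
>>> # "The weather is nice today" vs. "I'm in a good mood today"
>>> sentences = ["今天天气不错", "今天心情不错"]
>>> # predict() takes a list of sentence pairs and returns one score per pair
>>> score = model.predict([sentences])
>>> print(score[0])
```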

#### Code
The training code is available at https://github.com/TTurn/cross-encoder