---
library_name: transformers
tags:
- cross-encoder
datasets:
- lightonai/ms-marco-en-bge
- juanluisdb/triviaqa-bge-m3-logits
- juanluisdb/nq-bge-m3-logits
language:
- en
base_model:
- cross-encoder/ms-marco-MiniLM-L-6-v2
---

# MiniLM-L-6-rerank-m3

This model is fine-tuned from the well-known [ms-marco-MiniLM-L-6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2) cross-encoder using the KL-divergence distillation technique described [here](https://www.answer.ai/posts/2024-08-13-small-but-mighty-colbert.html), with [bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) as the teacher.
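
As an illustration of the objective, the sketch below shows a minimal KL-distillation step: the student's logits over a query's candidate passages are pushed toward the teacher's score distribution. The function name, tensors, and temperature are illustrative assumptions, not the actual training code.

```python
import torch
import torch.nn.functional as F

def kl_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL divergence between teacher and student score distributions
    over the candidate passages of each query (illustrative sketch)."""
    # Normalize both sets of per-query scores into distributions.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # "batchmean" matches the mathematical definition of KL divergence.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

# Example: logits for one query with 4 candidate passages.
student_logits = torch.tensor([[2.1, 0.3, -1.0, 0.5]])
teacher_logits = torch.tensor([[3.0, 0.1, -2.0, 0.2]])
print(kl_distillation_loss(student_logits, teacher_logits))
```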

# Usage

## Usage with Transformers

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained("juanluisdb/MiniLM-L-6-rerank-m3")
tokenizer = AutoTokenizer.from_pretrained("juanluisdb/MiniLM-L-6-rerank-m3")

queries = ["How many people live in Berlin?", "How many people live in Berlin?"]
passages = [
    "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "New York City is famous for the Metropolitan Museum of Art.",
]
# Score each (query, passage) pair; higher logits mean more relevant.
features = tokenizer(queries, passages, padding=True, truncation=True, return_tensors="pt")

model.eval()
with torch.no_grad():
    scores = model(**features).logits
    print(scores)
```
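
The raw outputs are unbounded logits, one per pair. If you prefer scores in [0, 1], you can squash them with a sigmoid, continuing the example above (a common convention for this model family, not something the card mandates):

```python
probs = torch.sigmoid(scores)  # per-pair relevance scores in [0, 1]
print(probs)
```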


## Usage with SentenceTransformers

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("juanluisdb/MiniLM-L-6-rerank-m3", max_length=512)
scores = model.predict([("Query", "Paragraph1"), ("Query", "Paragraph2"), ("Query", "Paragraph3")])
```
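
To rerank a candidate list end to end, recent sentence-transformers releases also expose a `rank` helper. A minimal sketch, assuming a query and candidate passages of your own:

```python
query = "How many people live in Berlin?"
passages = [
    "Berlin has a population of 3,520,031 registered inhabitants.",
    "New York City is famous for the Metropolitan Museum of Art.",
]
# Returns the passages sorted by cross-encoder score, most relevant first.
for hit in model.rank(query, passages, return_documents=True):
    print(hit["score"], hit["text"])
```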

# Evaluation

## BEIR (NDCG@10)
I've run tests on different BEIR datasets. Each cross-encoder reranks the top-100 results retrieved by BM25; the sketch below illustrates the setup.
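
For concreteness, this is roughly what the rerank stage looks like. The BM25 retrieval itself (e.g. via Pyserini or Elasticsearch) is omitted, and `bm25_top100` is an assumed placeholder:

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("juanluisdb/MiniLM-L-6-rerank-m3", max_length=512)

query = "How many people live in Berlin?"
bm25_top100 = ["passage 1 ...", "passage 2 ..."]  # top-100 BM25 candidates (placeholder)

# Score every (query, candidate) pair and sort descending;
# NDCG@10 is then computed on the reranked order.
scores = model.predict([(query, passage) for passage in bm25_top100])
reranked = sorted(zip(bm25_top100, scores), key=lambda x: x[1], reverse=True)
```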

  
|                |   bm25 |   jina-reranker-v1-turbo-en | bge-reranker-v2-m3   | mxbai-rerank-base-v1   |   ms-marco-MiniLM-L-6-v2 | MiniLM-L-6-rerank-m3   |
|:---------------|:-------:|:----------------------------:|:---------------------:|:-----------------------:|:-------------------------:|:------------------------------:|
| nq*             |  0.305 |                       0.533 | **0.597**            | 0.535                  |                    0.523 | 0.580                         |
| fever*         |  0.638 |                       0.852 | 0.857                | 0.767                  |                    0.801 | **0.867**                     |
| fiqa           |  0.238 |                       0.336 | **0.397**            | 0.382                  |                    0.349 | 0.364                         |
| trec-covid     |  0.589 |                       0.774 | 0.784                | **0.830**              |                    0.741 | 0.738                         |
| scidocs        |  0.150 |                       0.166 | 0.169                | **0.171**              |                    0.164 | 0.165                         |
| scifact        |  0.676 |                       0.739 | 0.731                | 0.719                  |                    0.688 | **0.750**                     |
| nfcorpus       |  0.318 |                       0.353 | 0.336                | **0.353**              |                    0.349 | 0.350                         |
| hotpotqa       |  0.629 |                       0.745 | **0.794**            | 0.668                  |                    0.724 | 0.775                         |
| dbpedia-entity |  0.319 |                       0.421 | **0.445**            | 0.416                  |                    0.445 | 0.444                         |
| quora          |  0.787 |                       0.858 | 0.858                | 0.747                  |                    0.825 | **0.871**                     |
| climate-fever  |  0.163 |                       0.233 | **0.314**            | 0.253                  |                    0.244 | 0.309                         |


\* Training splits of NQ and FEVER were used as part of the training data.

Comparison with an [ablated model](https://huggingface.co/juanluisdb/MiniLM-L-6-rerank-m3-ablated) trained only on MSMarco:

|                |   ms-marco-MiniLM-L-6-v2 |   MiniLM-L-6-rerank-m3-ablated |
|:---------------|:-------------------------:|:--------------------------------------:|
| nq             |                   0.5234 |                                **0.5412** |
| fever          |                   0.8007 |                                **0.8221** |
| fiqa           |                   0.349  |                                **0.3598** |
| trec-covid     |                   **0.741**  |                                0.7331 |
| scidocs        |                   **0.1638** |                                0.163  |
| scifact        |                   0.688  |                                **0.7376** |
| nfcorpus       |                   0.3493 |                                **0.3495** |
| hotpotqa       |                   0.7235 |                                **0.7583** |
| dbpedia-entity |                   **0.4445** |                                0.4382 |
| quora          |                   0.8251 |                                **0.8619** |
| climate-fever  |                   0.2438 |                                **0.2449** |


# Datasets Used

~900k queries with 32-way triplets (each query paired with 32 teacher-scored passages) were used from these datasets; a sketch of the record format follows the list:

* MSMarco
* TriviaQA
* Natural Questions
* FEVER
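
As an illustration (field names are assumptions, not the actual dataset schema), one training record in a 32-way-triplet setup might look like:

```python
# Hypothetical record layout: a query, 32 candidate passages, and the
# teacher's (bge-reranker-v2-m3) score for each, which the student is
# distilled toward via the KL objective.
record = {
    "query": "How many people live in Berlin?",
    "passages": ["Berlin has a population of 3,520,031 ...", "..."],  # 32 entries
    "teacher_scores": [9.87, -3.21],  # 32 teacher logits, aligned with passages
}
```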