---
library_name: transformers
tags:
- cross-encoder
datasets:
- lightonai/ms-marco-en-bge
language:
- en
base_model:
- cross-encoder/ms-marco-MiniLM-L-6-v2
---

# MiniLM-L-6-rerank-reborn

This model is fine-tuned from the well-known [ms-marco-MiniLM-L-6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2) using KL-divergence distillation, as described in [this Answer.AI post](https://www.answer.ai/posts/2024-08-13-small-but-mighty-colbert.html), with [bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) as the teacher.
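
To make the distillation objective concrete, here is a minimal sketch in plain Python of a KL-divergence loss for a single query. It assumes the teacher and the student each score the same list of candidate passages and that a softmax turns those scores into distributions. The function names, the KL direction (teacher to student), and the example scores are illustrative assumptions, not the actual training code.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kl_distillation_loss(teacher_scores, student_scores):
    """KL(teacher || student) over the candidate passages of one query.

    Hypothetical helper: both lists hold one score per candidate passage,
    in the same order.
    """
    p = softmax(teacher_scores)  # target distribution from the teacher
    q = softmax(student_scores)  # distribution predicted by the student
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Made-up scores for three candidate passages of a single query.
teacher = [4.0, 1.0, -2.0]
student_close = [3.5, 1.2, -1.8]
student_far = [-2.0, 1.0, 4.0]

loss_close = kl_distillation_loss(teacher, student_close)
loss_far = kl_distillation_loss(teacher, student_far)
print(loss_close, loss_far)
```

A student whose ordering agrees with the teacher gets a small loss, while one that inverts the ordering is penalized heavily. In real training the loss would be averaged over a batch of queries and minimized with a deep learning framework.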

# Usage

## Usage with Transformers

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "juanluisdb/MiniLM-L-6-rerank-reborn"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Queries and passages are paired by index.
queries = ["How many people live in Berlin?", "How many people live in Berlin?"]
passages = [
    "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "New York City is famous for the Metropolitan Museum of Art.",
]
features = tokenizer(queries, passages, padding=True, truncation=True, return_tensors="pt")

model.eval()
with torch.no_grad():
    # One logit per (query, passage) pair; higher means more relevant.
    scores = model(**features).logits
print(scores)
```

## Usage with SentenceTransformers

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("juanluisdb/MiniLM-L-6-rerank-reborn", max_length=512)
# Each tuple is a (query, passage) pair; predict returns one score per pair.
scores = model.predict([("Query", "Paragraph1"), ("Query", "Paragraph2"), ("Query", "Paragraph3")])
```
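
A common next step is to sort the passages by score. The sketch below uses a hypothetical helper, `rank_by_score`, that takes passages and their scores and returns them best first. The hard-coded scores are placeholders standing in for the output of `model.predict`, so the snippet runs without downloading the model.

```python
def rank_by_score(passages, scores, top_k=None):
    """Return (passage, score) pairs sorted from most to least relevant.

    Hypothetical helper, not part of sentence-transformers.
    """
    if len(passages) != len(scores):
        raise ValueError("passages and scores must have the same length")
    ranked = sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]

passages = ["Paragraph1", "Paragraph2", "Paragraph3"]
# Placeholder scores; in practice use:
#   scores = model.predict([("Query", p) for p in passages])
scores = [0.3, 4.1, -2.0]

for passage, score in rank_by_score(passages, scores, top_k=2):
    print(f"{score:.2f}  {passage}")
```

Only the relative order of the scores matters for reranking, so the raw logits can be used directly without any normalization.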

# Evaluation

## BEIR (NDCG@10)

I evaluated the model on several BEIR datasets. Each cross-encoder reranks the top 100 BM25 results.

|                           | nq*       | fever*    | fiqa      | trec-covid   | scidocs   | scifact   | nfcorpus   | hotpotqa   | dbpedia-entity   | quora     | climate-fever   |
|:--------------------------|:----------|:----------|:----------|:-------------|:----------|:----------|:-----------|:-----------|:-----------------|:----------|:----------------|
| bm25                      | 0.305     | 0.638     | 0.238     | 0.589        | 0.150     | 0.676     | 0.318      | 0.629      | 0.319            | 0.787     | 0.163           |
| jina-reranker-v1-turbo-en | 0.533     | 0.852     | 0.336     | 0.774        | 0.166     | 0.739     | 0.353      | 0.745      | 0.421            | 0.858     | 0.233           |
| bge-reranker-v2-m3        | **0.597** | 0.857     | **0.397** | 0.784        | 0.169     | 0.731     | 0.336      | **0.794**  | **0.445**        | 0.858     | **0.314**       |
| mxbai-rerank-base-v1      | 0.535     | 0.767     | 0.382     | **0.830**    | **0.171** | 0.719     | **0.353**  | 0.668      | 0.416            | 0.747     | 0.253           |
| ms-marco-MiniLM-L-6-v2    | 0.523     | 0.801     | 0.349     | 0.741        | 0.164     | 0.688     | 0.349      | 0.724      | 0.445            | 0.825     | 0.244           |
| MiniLM-L-6-rerank-reborn  | 0.580     | **0.867** | 0.364     | 0.738        | 0.165     | **0.750** | 0.350      | 0.775      | 0.444            | **0.871** | 0.309           |

\* The training splits of NQ and FEVER were used as part of the training data.
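
For readers unfamiliar with the metric, the following is a minimal sketch of NDCG@10 for a single query. It assumes linear gain with a log2 rank discount and builds the ideal ranking from all judged documents for the query. Evaluation tools differ in these details, so this is not guaranteed to reproduce the numbers above, and the function names and toy labels are illustrative.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, judged_relevances, k=10):
    """NDCG@k for one query.

    ranked_relevances: relevance labels of the results, in ranked order.
    judged_relevances: labels of all judged documents for the query,
        used to build the ideal ranking.
    """
    ideal = dcg_at_k(sorted(judged_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0

# Toy query with two relevant documents (labels are made up).
judged = [1, 1, 0, 0, 0]
perfect = ndcg_at_k([1, 1, 0, 0, 0], judged)
swapped = ndcg_at_k([0, 1, 1, 0, 0], judged)
print(perfect, swapped)
```

The reported score for a dataset is the mean of this per-query value over all test queries.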

Comparison with an [ablated model](https://huggingface.co/juanluisdb/MiniLM-L-6-rerank-reborn-ablated) trained only on MSMarco:

|                                     |     nq |   fever |   fiqa |   trec-covid |   scidocs |   scifact |   nfcorpus |   hotpotqa |   dbpedia-entity |   quora |   climate-fever |
|:------------------------------------|-------:|--------:|-------:|-------------:|----------:|----------:|-----------:|-----------:|-----------------:|--------:|----------------:|
| ms-marco-MiniLM-L-6-v2              | 0.5234 |  0.8007 | 0.349  |       0.741  |    0.1638 |    0.688  |     0.3493 |     0.7235 |           0.4445 |  0.8251 |          0.2438 |
| MiniLM-L-6-rerank-refreshed-ablated | 0.5412 |  0.8221 | 0.3598 |       0.7331 |    0.163  |    0.7376 |     0.3495 |     0.7583 |           0.4382 |  0.8619 |          0.2449 |
| improvement (%)                     | **3.40** | **2.67** | **3.08** | -1.07 | -0.47 | **7.22** | 0.08 | **4.80** | -1.41 | **4.45** | **0.47** |
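
The improvement row reads as a relative change over the baseline. A minimal sketch of that calculation, assuming the formula `(candidate - baseline) / baseline * 100`, with a hypothetical helper name and values copied from the first three columns of the table above:

```python
def relative_improvement(baseline, candidate):
    """Percentage change of candidate over baseline."""
    return (candidate - baseline) / baseline * 100

# NDCG@10 values copied from the table above (first three columns).
baseline = {"nq": 0.5234, "fever": 0.8007, "fiqa": 0.349}
ablated = {"nq": 0.5412, "fever": 0.8221, "fiqa": 0.3598}

for name in baseline:
    print(f"{name}: {relative_improvement(baseline[name], ablated[name]):+.2f}%")
```

Values recomputed from the rounded table entries can differ from the published row in the last digit, presumably because that row was computed from unrounded scores.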

# Datasets Used

About 900k queries with 32-way triplets were drawn from these datasets:

* MSMarco
* TriviaQA
* Natural Questions
* FEVER