metadata

library_name: transformers
tags:
  - cross-encoder
datasets:
  - lightonai/ms-marco-en-bge
language:
  - en
base_model:
  - cross-encoder/ms-marco-MiniLM-L-6-v2

Model Card for MiniLM-L-6-rerank-reborn

This model is fine-tuned from the well-known ms-marco-MiniLM-L-6-v2 using KL distillation, as described here, with bge-reranker-v2-m3 as the teacher.
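To make the distillation objective concrete, here is a minimal sketch of a KL-divergence loss between teacher and student scores for one query. The exact loss, temperature, and batching used for this model are not stated here, so the function names, the `temperature` parameter, and the example scores are all assumptions. It is written in plain Python for readability; a real training loop would use a differentiable equivalent from its framework.

```python
import math


def softmax(scores, temperature=1.0):
    """Convert raw relevance scores into a probability distribution."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_distillation_loss(teacher_scores, student_scores, temperature=1.0):
    """KL(teacher || student) over one query's candidate passages."""
    p = softmax(teacher_scores, temperature)  # target distribution
    q = softmax(student_scores, temperature)  # student distribution
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical scores for one query and four candidate passages.
teacher = [8.1, 2.3, -1.0, -4.2]
student = [5.0, 3.0, 0.5, -2.0]
loss = kl_distillation_loss(teacher, student)
```

The loss is zero when the student reproduces the teacher's distribution over the candidates and grows as the two rankings diverge.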

Usage

Usage with Transformers

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "juanluisdb/MiniLM-L-6-rerank-reborn"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Each query is paired with the passage at the same index.
queries = ['How many people live in Berlin?', 'How many people live in Berlin?']
passages = ['Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.', 'New York City is famous for the Metropolitan Museum of Art.']
features = tokenizer(queries, passages, padding=True, truncation=True, return_tensors="pt")
model.eval()
with torch.no_grad():
    # One relevance logit per (query, passage) pair; higher means more relevant.
    scores = model(**features).logits
    print(scores)

Usage with SentenceTransformers

from sentence_transformers import CrossEncoder
model = CrossEncoder("juanluisdb/MiniLM-L-6-rerank-reborn", max_length=512)
scores = model.predict([('Query', 'Paragraph1'), ('Query', 'Paragraph2'), ('Query', 'Paragraph3')])
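The scores come back in the same order as the input pairs, so reranking amounts to sorting the passages by score. A minimal sketch follows; the `rerank` helper and the numeric scores are hypothetical stand-ins so the snippet runs without downloading the model.

```python
def rerank(passages, scores, top_k=None):
    """Return (passage, score) pairs sorted from most to least relevant."""
    ranked = sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Hypothetical scores standing in for the output of model.predict(...).
passages = ["Paragraph1", "Paragraph2", "Paragraph3"]
scores = [0.2, 7.5, -3.1]
ranked = rerank(passages, scores, top_k=2)
# ranked == [("Paragraph2", 7.5), ("Paragraph1", 0.2)]
```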

Evaluation

BEIR (NDCG@10)

I evaluated the model on several BEIR datasets. Each cross-encoder reranks the top 100 BM25 results.

	nq*	fever*	fiqa	trec-covid	scidocs	scifact	nfcorpus	hotpotqa	dbpedia-entity	quora	climate-fever
bm25	0.305	0.638	0.238	0.589	0.150	0.676	0.318	0.629	0.319	0.787	0.163
jina-reranker-v1-turbo-en	0.533	0.852	0.336	0.774	0.166	0.739	0.353	0.745	0.421	0.858	0.233
bge-reranker-v2-m3	0.597	0.857	0.397	0.784	0.169	0.731	0.336	0.794	0.445	0.858	0.314
mxbai-rerank-base-v1	0.535	0.767	0.382	0.830	0.171	0.719	0.353	0.668	0.416	0.747	0.253
ms-marco-MiniLM-L-6-v2	0.523	0.801	0.349	0.741	0.164	0.688	0.349	0.724	0.445	0.825	0.244
MiniLM-L-6-rerank-reborn	0.580	0.867	0.364	0.738	0.165	0.750	0.350	0.775	0.444	0.871	0.309

* The training splits of NQ and FEVER were used as part of the training data.
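For readers unfamiliar with the metric, NDCG@10 for a single query can be sketched as below. This assumes linear gain and a log2 rank discount; the evaluation tooling actually used for the table is not stated here, and the document ids and judgments are made up for illustration. The reported figures would be the mean of this value over all queries in a dataset.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """NDCG@k for one query.

    ranked_doc_ids: document ids in the order produced by the reranker.
    qrels: dict mapping document id -> graded relevance (missing means 0).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical judgments and rankings for a single query.
qrels = {"d1": 2, "d2": 1, "d3": 0}
perfect = ndcg_at_k(["d1", "d2", "d3"], qrels)  # ideal ordering
swapped = ndcg_at_k(["d2", "d1", "d3"], qrels)  # top two documents swapped
```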

Comparison with an ablated model trained only on MSMarco:

	nq	fever	fiqa	trec-covid	scidocs	scifact	nfcorpus	hotpotqa	dbpedia-entity	quora	climate-fever
ms-marco-MiniLM-L-6-v2	0.5234	0.8007	0.349	0.741	0.1638	0.688	0.3493	0.7235	0.4445	0.8251	0.2438
MiniLM-L-6-rerank-refreshed-ablated	0.5412	0.8221	0.3598	0.7331	0.163	0.7376	0.3495	0.7583	0.4382	0.8619	0.2449
improvement (%)	3.40	2.67	3.08	-1.07	-0.47	7.22	0.08	4.80	-1.41	4.45	0.47
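The improvement row appears to be the relative change of the ablated model over the baseline, in percent. A small sketch of that arithmetic, using the nq and trec-covid values from the table above (the helper name is hypothetical):

```python
def relative_improvement(baseline, candidate):
    """Percentage change of candidate over baseline."""
    return (candidate - baseline) / baseline * 100


# NDCG@10 values for nq and trec-covid, taken from the table above.
nq = relative_improvement(0.5234, 0.5412)
trec_covid = relative_improvement(0.741, 0.7331)
print(f"{nq:.2f} {trec_covid:.2f}")  # prints "3.40 -1.07"
```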

Datasets Used

~900k queries with 32-way triplets were used from these datasets:

MSMarco
TriviaQA
Natural Questions
FEVER