---
library_name: transformers
tags:
- cross-encoder
datasets:
- lightonai/ms-marco-en-bge
language:
- en
base_model:
- cross-encoder/ms-marco-MiniLM-L-6-v2
---

# Model Card for MiniLM-L-6-rerank-reborn

This model is fine-tuned from the well-known [ms-marco-MiniLM-L-6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2) cross-encoder using KL-divergence distillation, with [bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) as the teacher.
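
Below is a minimal sketch of what a KL-distillation step looks like for this student/teacher pair. It is illustrative only, not the actual training script: the temperature, the helper function, and the 3-passage example list are assumptions (training used 32-way candidate lists).

```python
# Illustrative KL-distillation step (not the actual training code).
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

STUDENT_ID = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # base model (from this card)
TEACHER_ID = "BAAI/bge-reranker-v2-m3"               # teacher (from this card)

student_tok = AutoTokenizer.from_pretrained(STUDENT_ID)
teacher_tok = AutoTokenizer.from_pretrained(TEACHER_ID)
student = AutoModelForSequenceClassification.from_pretrained(STUDENT_ID)
teacher = AutoModelForSequenceClassification.from_pretrained(TEACHER_ID).eval()

def distillation_loss(query, passages, temperature=1.0):
    """KL divergence between teacher and student score distributions over one candidate list."""
    queries = [query] * len(passages)
    student_in = student_tok(queries, passages, padding=True, truncation=True, return_tensors="pt")
    teacher_in = teacher_tok(queries, passages, padding=True, truncation=True, return_tensors="pt")

    with torch.no_grad():
        teacher_scores = teacher(**teacher_in).logits.squeeze(-1)  # one relevance logit per passage
    student_scores = student(**student_in).logits.squeeze(-1)

    # Softmax over the candidate list turns raw scores into comparable distributions.
    teacher_probs = F.softmax(teacher_scores / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_scores / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="sum")  # exact KL for a single list

loss = distillation_loss(
    "How many people live in Berlin?",
    ["Berlin has a population of 3,520,031 registered inhabitants.",
     "New York City is famous for the Metropolitan Museum of Art.",
     "The Brandenburg Gate is a landmark in Berlin."],
)
loss.backward()
```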

## Usage

### Usage with Transformers

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained("juanluisdb/MiniLM-L-6-rerank-reborn")
tokenizer = AutoTokenizer.from_pretrained("juanluisdb/MiniLM-L-6-rerank-reborn")

# Score each (query, passage) pair; higher logits mean higher relevance.
features = tokenizer(
    ["How many people live in Berlin?", "How many people live in Berlin?"],
    ["Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
     "New York City is famous for the Metropolitan Museum of Art."],
    padding=True, truncation=True, return_tensors="pt",
)

model.eval()
with torch.no_grad():
    scores = model(**features).logits
    print(scores)
```
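
The logits above are unbounded relevance scores. If you prefer scores in the [0, 1] range, a common convention for single-logit cross-encoders (not something this card prescribes) is to apply a sigmoid:

```python
probabilities = torch.sigmoid(scores)  # optional: squash raw logits into [0, 1]
```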

### Usage with SentenceTransformers

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("juanluisdb/MiniLM-L-6-rerank-reborn", max_length=512)
scores = model.predict([("Query", "Paragraph1"), ("Query", "Paragraph2"), ("Query", "Paragraph3")])
```
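
Recent sentence-transformers releases also expose a convenience `rank` method. The snippet below is a sketch assuming sentence-transformers >= 2.3; the query and documents are made-up examples.

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("juanluisdb/MiniLM-L-6-rerank-reborn", max_length=512)
results = model.rank(
    "How many people live in Berlin?",
    ["Berlin has a population of 3,520,031 registered inhabitants.",
     "New York City is famous for the Metropolitan Museum of Art."],
    return_documents=True,
)
for hit in results:  # results are sorted by descending relevance score
    print(round(float(hit["score"]), 3), hit["text"])
```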

## Evaluation

### BEIR (NDCG@10)

I've run tests on several BEIR datasets. Each cross-encoder reranks the top-100 BM25 results (a sketch of this reranking step follows the table below).

| Model | nq* | fever* | fiqa | trec-covid | scidocs | scifact | nfcorpus | hotpotqa | dbpedia-entity | quora | climate-fever |
|---|---|---|---|---|---|---|---|---|---|---|---|
| bm25 | 0.305 | 0.638 | 0.238 | 0.589 | 0.150 | 0.676 | 0.318 | 0.629 | 0.319 | 0.787 | 0.163 |
| jina-reranker-v1-turbo-en | 0.533 | 0.852 | 0.336 | 0.774 | 0.166 | 0.739 | 0.353 | 0.745 | 0.421 | 0.858 | 0.233 |
| bge-reranker-v2-m3 | 0.597 | 0.857 | 0.397 | 0.784 | 0.169 | 0.731 | 0.336 | 0.794 | 0.445 | 0.858 | 0.314 |
| mxbai-rerank-base-v1 | 0.535 | 0.767 | 0.382 | 0.830 | 0.171 | 0.719 | 0.353 | 0.668 | 0.416 | 0.747 | 0.253 |
| ms-marco-MiniLM-L-6-v2 | 0.523 | 0.801 | 0.349 | 0.741 | 0.164 | 0.688 | 0.349 | 0.724 | 0.445 | 0.825 | 0.244 |
| MiniLM-L-6-rerank-reborn | 0.580 | 0.867 | 0.364 | 0.738 | 0.165 | 0.750 | 0.350 | 0.775 | 0.444 | 0.871 | 0.309 |

\* Training splits of NQ and FEVER were used as part of the training data.
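
For reference, here is a minimal sketch of the reranking step used in this protocol. It is not the actual evaluation harness: `bm25_top100` and `corpus` are assumed inputs (the BM25 candidate ids per query and the BEIR document store), and NDCG@10 would be computed afterwards on the reranked lists.

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("juanluisdb/MiniLM-L-6-rerank-reborn", max_length=512)

def rerank_top100(query, bm25_top100, corpus):
    """Re-score the BM25 top-100 document ids with the cross-encoder and sort by score."""
    pairs = [(query, corpus[doc_id]) for doc_id in bm25_top100]
    scores = model.predict(pairs)
    return sorted(zip(bm25_top100, scores), key=lambda item: item[1], reverse=True)
```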

Comparison with an ablated model trained only on MS MARCO:

| Model | nq | fever | fiqa | trec-covid | scidocs | scifact | nfcorpus | hotpotqa | dbpedia-entity | quora | climate-fever |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ms-marco-MiniLM-L-6-v2 | 0.5234 | 0.8007 | 0.349 | 0.741 | 0.1638 | 0.688 | 0.3493 | 0.7235 | 0.4445 | 0.8251 | 0.2438 |
| MiniLM-L-6-rerank-refreshed-ablated | 0.5412 | 0.8221 | 0.3598 | 0.7331 | 0.163 | 0.7376 | 0.3495 | 0.7583 | 0.4382 | 0.8619 | 0.2449 |
| improvement (%) | 3.40 | 2.67 | 3.08 | -1.07 | -0.47 | 7.22 | 0.08 | 4.80 | -1.41 | 4.45 | 0.47 |

## Datasets Used

~900k queries with 32-way triplets were used, drawn from the following datasets (a schematic example follows the list):

- MS MARCO
- TriviaQA
- Natural Questions
- FEVER
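
Schematically, a single training example can be thought of as a query paired with a 32-way candidate list and the teacher's score for each candidate. This is only a conceptual illustration, not the actual schema of the datasets above.

```python
# Conceptual illustration of a 32-way training example (NOT the actual dataset schema).
example = {
    "query": "how many people live in berlin",
    "passages": ["Berlin has a population of ...", "..."],  # 32 candidates per query
    "teacher_scores": [11.2, -3.4],                         # bge-reranker-v2-m3 score per passage
}
```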