---
library_name: transformers
tags:
- cross-encoder
datasets:
- lightonai/ms-marco-en-bge
- juanluisdb/triviaqa-bge-m3-logits
- juanluisdb/nq-bge-m3-logits
language:
- en
base_model:
- cross-encoder/ms-marco-MiniLM-L-6-v2
---

# MiniLM-L-6-rerank-m3

This model is fine-tuned from the well-known [ms-marco-MiniLM-L-6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2) cross-encoder using KL-divergence distillation, as described [here](https://www.answer.ai/posts/2024-08-13-small-but-mighty-colbert.html), with [bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) as the teacher.
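
For intuition, here is a minimal sketch of a listwise KL-divergence distillation loss. It assumes that the teacher's and the student's scores for one query's candidate passages are each turned into a probability distribution with a softmax. The function names are hypothetical, and details such as temperature or score normalization are left out, so the actual training recipe may differ.

```python
import math
from typing import List, Sequence


def log_softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def kl_distillation_loss(teacher_scores: Sequence[float], student_scores: Sequence[float]) -> float:
    """KL(teacher || student) over one query's candidate passages (hypothetical helper)."""
    t_log = log_softmax(teacher_scores)
    s_log = log_softmax(student_scores)
    # Sum over candidates of p_teacher * (log p_teacher - log p_student).
    return sum(math.exp(t) * (t - s) for t, s in zip(t_log, s_log))


def batch_kl_loss(teacher_batch: Sequence[Sequence[float]], student_batch: Sequence[Sequence[float]]) -> float:
    """Mean per-query KL over a batch of queries (hypothetical helper)."""
    losses = [kl_distillation_loss(t, s) for t, s in zip(teacher_batch, student_batch)]
    return sum(losses) / len(losses)
```

Adding the same constant to all of a query's scores leaves this loss unchanged. The student therefore only has to match the score differences between candidates, not the teacher's absolute logits.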

# Usage

## Usage with Transformers

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained("juanluisdb/MiniLM-L-6-rerank-m3")
tokenizer = AutoTokenizer.from_pretrained("juanluisdb/MiniLM-L-6-rerank-m3")

# Queries and passages are paired position by position and encoded together.
queries = ['How many people live in Berlin?', 'How many people live in Berlin?']
passages = [
    'Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.',
    'New York City is famous for the Metropolitan Museum of Art.',
]
features = tokenizer(queries, passages, padding=True, truncation=True, return_tensors="pt")

model.eval()
with torch.no_grad():
    # Higher logits indicate higher predicted relevance.
    scores = model(**features).logits
print(scores)
```


## Usage with SentenceTransformers

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("juanluisdb/MiniLM-L-6-rerank-m3", max_length=512)
# Score each (query, passage) pair; higher scores mean more relevant.
scores = model.predict([('Query', 'Paragraph1'), ('Query', 'Paragraph2'), ('Query', 'Paragraph3')])
```
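
The BEIR results below come from reranking the top 100 BM25 hits for each query. Here is a minimal sketch of that reranking step. It assumes the first-stage candidates are already available as a list of strings and that `score_fn` is any callable mapping a list of (query, passage) pairs to scores, such as `model.predict` from the snippet above. The `rerank` helper is hypothetical and is not part of either library.

```python
from typing import Callable, List, Sequence, Tuple

# Any scorer taking (query, passage) pairs and returning one score per pair.
ScoreFn = Callable[[List[Tuple[str, str]]], Sequence[float]]


def rerank(query: str, candidates: Sequence[str], score_fn: ScoreFn, top_k: int = 10) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair and return the top_k candidates, best first."""
    pairs = [(query, doc) for doc in candidates]
    scores = score_fn(pairs)
    # Sort by cross-encoder score, highest first.
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return [(doc, float(score)) for doc, score in ranked[:top_k]]
```

For example, `rerank(query, bm25_top100, model.predict, top_k=10)` would return the ten best passages, where `bm25_top100` is a hypothetical list of BM25 results.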

# Evaluation

## BEIR (NDCG@10)
I evaluated the models on several BEIR datasets. Each cross-encoder reranks the top 100 BM25 results per query, and NDCG@10 is reported.


| Dataset | BM25 | jina-reranker-v1-turbo-en | bge-reranker-v2-m3 | mxbai-rerank-base-v1 | ms-marco-MiniLM-L-6-v2 | MiniLM-L-6-rerank-m3 |
|:---------------|:-------:|:----------------------------:|:---------------------:|:-----------------------:|:-------------------------:|:------------------------------:|
| nq* | 0.305 | 0.533 | 0.597 | 0.535 | 0.523 | 0.580 |
| fever* | 0.638 | 0.852 | 0.857 | 0.767 | 0.801 | 0.867 |
| fiqa | 0.238 | 0.336 | 0.397 | 0.382 | 0.349 | 0.364 |
| trec-covid | 0.589 | 0.774 | 0.784 | 0.830 | 0.741 | 0.738 |
| scidocs | 0.15 | 0.166 | 0.169 | 0.171 | 0.164 | 0.165 |
| scifact | 0.676 | 0.739 | 0.731 | 0.719 | 0.688 | 0.750 |
| nfcorpus | 0.318 | 0.353 | 0.336 | 0.353 | 0.349 | 0.350 |
| hotpotqa | 0.629 | 0.745 | 0.794 | 0.668 | 0.724 | 0.775 |
| dbpedia-entity | 0.319 | 0.421 | 0.445 | 0.416 | 0.445 | 0.444 |
| quora | 0.787 | 0.858 | 0.858 | 0.747 | 0.825 | 0.871 |
| climate-fever | 0.163 | 0.233 | 0.314 | 0.253 | 0.244 | 0.309 |


\* The training splits of NQ and FEVER were part of this model's training data, so these two results are not zero-shot.
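
For reference, here is a minimal sketch of NDCG@10 for a single query. It assumes graded relevance judgments in a `qrels` dict, linear gains, and a log2 rank discount. The function name is hypothetical, and established implementations, such as the one BEIR uses, may differ in details like tie handling.

```python
import math
from typing import Dict, Sequence


def ndcg_at_k(ranked_doc_ids: Sequence[str], qrels: Dict[str, int], k: int = 10) -> float:
    """NDCG@k for one query; qrels maps doc_id -> graded relevance (missing = 0)."""
    # DCG: each retrieved doc's gain divided by log2(rank + 1), with 1-based ranks.
    dcg = sum(
        qrels.get(doc_id, 0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_doc_ids[:k])
    )
    # Ideal DCG: the best possible ordering of all judged-relevant docs.
    ideal = sorted((g for g in qrels.values() if g > 0), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

A dataset-level score like those in the table is the mean of this per-query value over all test queries.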

Comparison (NDCG@10) with an [ablated model](https://huggingface.co/juanluisdb/MiniLM-L-6-rerank-m3-ablated) trained only on MS MARCO:

| Dataset | ms-marco-MiniLM-L-6-v2 | MiniLM-L-6-rerank-m3-ablated |
|:---------------|:-------------------------:|:--------------------------------------:|
| nq | 0.5234 | 0.5412 |
| fever | 0.8007 | 0.8221 |
| fiqa | 0.349 | 0.3598 |
| trec-covid | 0.741 | 0.7331 |
| scidocs | 0.1638 | 0.163 |
| scifact | 0.688 | 0.7376 |
| nfcorpus | 0.3493 | 0.3495 |
| hotpotqa | 0.7235 | 0.7583 |
| dbpedia-entity | 0.4445 | 0.4382 |
| quora | 0.8251 | 0.8619 |
| climate-fever | 0.2438 | 0.2449 |
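
The short script below condenses the ablation table. It averages each column over the 11 datasets and counts how many datasets the ablated model improves on. The values are copied from the table above, and the script only does arithmetic on them.

```python
# NDCG@10 from the ablation table: (ms-marco-MiniLM-L-6-v2, MiniLM-L-6-rerank-m3-ablated)
ABLATION = {
    "nq": (0.5234, 0.5412),
    "fever": (0.8007, 0.8221),
    "fiqa": (0.349, 0.3598),
    "trec-covid": (0.741, 0.7331),
    "scidocs": (0.1638, 0.163),
    "scifact": (0.688, 0.7376),
    "nfcorpus": (0.3493, 0.3495),
    "hotpotqa": (0.7235, 0.7583),
    "dbpedia-entity": (0.4445, 0.4382),
    "quora": (0.8251, 0.8619),
    "climate-fever": (0.2438, 0.2449),
}

base_scores = [base for base, _ in ABLATION.values()]
ablated_scores = [ablated for _, ablated in ABLATION.values()]
mean_base = sum(base_scores) / len(base_scores)
mean_ablated = sum(ablated_scores) / len(ablated_scores)
# Datasets where the MS MARCO-only distilled model beats the original checkpoint.
wins = [name for name, (base, ablated) in ABLATION.items() if ablated > base]

print(f"mean NDCG@10: base={mean_base:.4f}, ablated={mean_ablated:.4f}")
print(f"ablated model is better on {len(wins)}/{len(ABLATION)} datasets")
```

With these values, the ablated model is ahead on 8 of the 11 datasets.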


# Datasets Used

About 900k queries were used, each grouped with 32 candidate passages (32-way training examples). They come from these datasets:

* MS MARCO
* TriviaQA
* Natural Questions
* FEVER