---
library_name: transformers
tags:
- cross-encoder
datasets:
- lightonai/ms-marco-en-bge
language:
- en
base_model:
- cross-encoder/ms-marco-MiniLM-L-6-v2
---

# Model Card for MiniLM-L-6-rerank-reborn

This model was fine-tuned from the well-known [ms-marco-MiniLM-L-6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2) with KL-divergence distillation, as described in [this post](https://www.answer.ai/posts/2024-08-13-small-but-mighty-colbert.html). The teacher was [bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3).
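Below is a sketch of the distillation idea, assuming the common listwise setup. For each query, the teacher and the student score the same group of candidate passages. Each set of scores becomes a distribution through a softmax, and the student minimizes KL(teacher || student). This is a minimal pure-Python illustration, not the training code used for this model. The function names, the `temperature` parameter, and the example scores are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one query's candidate scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def kl_distillation_loss(
    teacher_scores: Sequence[float],
    student_scores: Sequence[float],
    temperature: float = 1.0,  # hypothetical knob; 1.0 means a plain softmax
) -> float:
    """KL(teacher || student) over the same candidate passages for one query."""
    if len(teacher_scores) != len(student_scores):
        raise ValueError("teacher and student must score the same passages")
    log_p = log_softmax([t / temperature for t in teacher_scores])  # teacher
    log_q = log_softmax([s / temperature for s in student_scores])  # student
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


# Example: one query with four candidate passages (made-up scores).
teacher = [6.0, 2.5, 0.5, -1.0]
student = [4.0, 3.0, 1.0, 0.0]
print(kl_distillation_loss(teacher, student))  # positive; 0 only if the distributions match
```

Adding a constant to every score leaves the softmax unchanged. Only the relative scores within a query's group matter, which suits cross-encoder logits, whose absolute scale is arbitrary.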

# Usage

## Usage with Transformers

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained("juanluisdb/MiniLM-L-6-rerank-reborn")
tokenizer = AutoTokenizer.from_pretrained("juanluisdb/MiniLM-L-6-rerank-reborn")

queries = ['How many people live in Berlin?', 'How many people live in Berlin?']
passages = [
    'Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.',
    'New York City is famous for the Metropolitan Museum of Art.',
]

# Query and passage are encoded together as one input pair, as a cross-encoder expects.
features = tokenizer(queries, passages, padding=True, truncation=True, return_tensors="pt")
model.eval()
with torch.no_grad():
    # One logit per (query, passage) pair; higher means more relevant.
    scores = model(**features).logits
print(scores)
```
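The printed `scores` are raw, unbounded logits. If a score between 0 and 1 is more convenient, you can apply a sigmoid. The sigmoid is strictly increasing, so it never changes the ranking. Here is a small pure-Python illustration; the logit values are made up and are not real model output.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rank_order(values: Sequence[float]) -> List[int]:
    # Indices sorted from highest to lowest value.
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


logits = [8.6, -11.2, 0.3]  # illustrative values only
probs = [sigmoid(x) for x in logits]
print(probs)
print(rank_order(logits) == rank_order(probs))  # True: same ordering
```

With the tensor from the example above, `torch.sigmoid(scores)` applies the same mapping in one call.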


## Usage with SentenceTransformers

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("juanluisdb/MiniLM-L-6-rerank-reborn", max_length=512)
# One relevance score per (query, passage) pair, in input order.
scores = model.predict([('Query', 'Paragraph1'), ('Query', 'Paragraph2'), ('Query', 'Paragraph3')])
```
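`predict` returns one score per pair, in input order, so reranking means sorting the passages by score. The helper below is a hypothetical sketch and is not part of sentence-transformers. It accepts any scoring callable, so you can try it without downloading the model. The word-overlap scorer is only a stand-in for a real model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: maps a list of (query, passage) pairs to one score per pair.
ScoreFn = Callable[[List[Tuple[str, str]]], Sequence[float]]


def rerank(query: str, passages: Sequence[str], score_fn: ScoreFn,
           top_k: int = 10) -> List[Tuple[str, float]]:
    """Score every passage against the query and return the top_k, best first."""
    pairs = [(query, p) for p in passages]
    scores = score_fn(pairs)
    ranked = sorted(zip(passages, scores), key=lambda x: x[1], reverse=True)
    return [(p, float(s)) for p, s in ranked[:top_k]]


def toy_score_fn(pairs: List[Tuple[str, str]]) -> List[float]:
    # Stand-in scorer: counts query words that appear (as substrings) in the passage.
    return [float(sum(w in p.lower() for w in q.lower().split())) for q, p in pairs]


passages = ["Berlin is the capital of Germany.", "Paris is in France.",
            "Berlin has about 3.5 million inhabitants."]
print(rerank("population of berlin", passages, toy_score_fn, top_k=2))
```

With the `CrossEncoder` loaded above, `rerank('Query', ['Paragraph1', 'Paragraph2', 'Paragraph3'], model.predict)` would order the passages by the model's scores.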

# Evaluation

## BEIR (NDCG@10)
I evaluated the models on several BEIR datasets. For each query, every cross-encoder reranks the top 100 BM25 results. The bm25 row shows BM25 alone, without reranking.

| model | nq* | fever* | fiqa | trec-covid | scidocs | scifact | nfcorpus | hotpotqa | dbpedia-entity | quora | climate-fever |
|:--------------------------|:----------|:----------|:----------|:-------------|:----------|:----------|:-----------|:-----------|:-----------------|:----------|:----------------|
| bm25 | 0.305 | 0.638 | 0.238 | 0.589 | 0.150 | 0.676 | 0.318 | 0.629 | 0.319 | 0.787 | 0.163 |
| jina-reranker-v1-turbo-en | 0.533 | 0.852 | 0.336 | 0.774 | 0.166 | 0.739 | 0.353 | 0.745 | 0.421 | 0.858 | 0.233 |
| bge-reranker-v2-m3 | 0.597 | 0.857 | 0.397 | 0.784 | 0.169 | 0.731 | 0.336 | 0.794 | 0.445 | 0.858 | 0.314 |
| mxbai-rerank-base-v1 | 0.535 | 0.767 | 0.382 | 0.830 | 0.171 | 0.719 | 0.353 | 0.668 | 0.416 | 0.747 | 0.253 |
| ms-marco-MiniLM-L-6-v2 | 0.523 | 0.801 | 0.349 | 0.741 | 0.164 | 0.688 | 0.349 | 0.724 | 0.445 | 0.825 | 0.244 |
| MiniLM-L-6-rerank-reborn | 0.580 | 0.867 | 0.364 | 0.738 | 0.165 | 0.750 | 0.350 | 0.775 | 0.444 | 0.871 | 0.309 |

\* The training splits of NQ and FEVER were part of the training data, so results on these two datasets are not zero-shot.
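NDCG@10 rewards rankings that place relevant documents near the top. Below is a minimal per-query sketch. It assumes linear gains and a log2 rank discount, as in `trec_eval`'s `ndcg_cut` measure, which BEIR computes through `pytrec_eval`. It is not the script used to produce these tables, and the document IDs are made up. Because only the top 100 BM25 results are reranked, a relevant document that BM25 misses cannot be recovered by any of the cross-encoders.

```python
import math
from typing import Dict, Sequence


def dcg(gains: Sequence[float]) -> float:
    # Rank 1 is discounted by log2(2) = 1, rank 2 by log2(3), and so on.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked_ids: Sequence[str], qrels: Dict[str, int], k: int = 10) -> float:
    """NDCG@k for one query.

    ranked_ids: document IDs in the order produced by the reranker.
    qrels: graded relevance judgments, doc_id -> relevance (missing = 0).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted((r for r in qrels.values() if r > 0), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Illustrative: the relevant doc "d7" moves from rank 3 (BM25) to rank 1 (reranked).
qrels = {"d7": 1}
bm25_top = ["d2", "d5", "d7", "d9"]
reranked = ["d7", "d2", "d5", "d9"]
print(ndcg_at_k(bm25_top, qrels), ndcg_at_k(reranked, qrels))  # 0.5 1.0
```

The dataset-level numbers in the tables are averages of this per-query value over the test queries.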

Comparison with an [ablated model](https://huggingface.co/juanluisdb/MiniLM-L-6-rerank-reborn-ablated) trained only on MS MARCO:

| model | nq | fever | fiqa | trec-covid | scidocs | scifact | nfcorpus | hotpotqa | dbpedia-entity | quora | climate-fever |
|:------------------------------------|-------:|--------:|-------:|-------------:|----------:|----------:|-----------:|-----------:|-----------------:|--------:|----------------:|
| ms-marco-MiniLM-L-6-v2 | 0.5234 | 0.8007 | 0.349 | 0.741 | 0.1638 | 0.688 | 0.3493 | 0.7235 | 0.4445 | 0.8251 | 0.2438 |
| MiniLM-L-6-rerank-reborn-ablated | 0.5412 | 0.8221 | 0.3598 | 0.7331 | 0.163 | 0.7376 | 0.3495 | 0.7583 | 0.4382 | 0.8619 | 0.2449 |
| relative improvement (%) | 3.40 | 2.67 | 3.08 | -1.07 | -0.47 | 7.22 | 0.08 | 4.80 | -1.41 | 4.45 | 0.47 |
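The improvement row is the relative change of the ablated model over ms-marco-MiniLM-L-6-v2, computed as (ablated - base) / base × 100. The snippet below recomputes it for two columns. Recomputing from the rounded scores shown can differ from the table in the second decimal (fiqa gives 3.09 instead of 3.08). The table was presumably computed from unrounded scores.

```python
def relative_improvement(base: float, new: float) -> float:
    """Relative change of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


# NDCG@10 values copied from the table above.
for name, base, ablated in [("nq", 0.5234, 0.5412), ("fever", 0.8007, 0.8221)]:
    print(name, round(relative_improvement(base, ablated), 2))  # prints "nq 3.4", then "fever 2.67"
```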


# Datasets Used

About 900k training queries, each paired with 32 passages (32-way examples), were drawn from these datasets:

* MS MARCO
* TriviaQA
* Natural Questions
* FEVER