Passage Reranking Multilingual BERT 🔃 🌍

Model description

Input: Supports over 100 languages. See the list of supported languages for the full set.

Purpose: This module takes a search query [1] and a passage [2] and calculates whether the passage matches the query. It can be used to improve Elasticsearch results and boosts relevancy by up to 100%.

Architecture: On top of BERT there is a densely connected NN that takes the 768-dimensional [CLS] token as input and produces the output (Arxiv).

Output: A single value between -10 and 10. Better-matching query-passage pairs tend to have a higher score.

Intended uses & limitations

Both query [1] and passage [2] have to fit in 512 tokens. Since you normally want to rerank the first few dozen search results, keep in mind the inference time of approximately 300 ms/query.
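The 512-token limit can be checked before calling the model. The sketch below is only an illustration: `passage_budget` and `fits` are hypothetical helper names, and it assumes the limit includes the three special tokens BERT adds to a sentence pair ([CLS] query [SEP] passage [SEP]). Token counts would come from the tokenizer, for example `len(tokenizer.tokenize(text))`.

```python
MAX_LEN = 512        # limit stated above
SPECIAL_TOKENS = 3   # assumption: [CLS] query [SEP] passage [SEP]


def passage_budget(query_tokens: int, max_len: int = MAX_LEN) -> int:
    """Passage tokens that still fit next to a query of the given length."""
    return max(0, max_len - SPECIAL_TOKENS - query_tokens)


def fits(query_tokens: int, passage_tokens: int, max_len: int = MAX_LEN) -> bool:
    """True if the query, the passage and the special tokens fit together."""
    return query_tokens + passage_tokens + SPECIAL_TOKENS <= max_len


if __name__ == "__main__":
    # A 12-token query leaves room for a 497-token passage.
    print(passage_budget(12), fits(12, 497), fits(12, 498))
```

Passages longer than the budget have to be truncated or split before scoring.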

How to use

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("amberoad/bert-multilingual-passage-reranking-msmarco")

model = AutoModelForSequenceClassification.from_pretrained("amberoad/bert-multilingual-passage-reranking-msmarco")

This model can be used as a drop-in replacement in the Nboost library. Through this you can directly improve your Elasticsearch results without any coding.
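To score pairs directly with transformers instead of going through Nboost, something like the following may work. It is a minimal sketch, not the authors' code: `load_scorer` and `rerank` are hypothetical helper names, and it assumes that the last logit of the classification head is the relevance score described under Output. Check the model's config before relying on that.

```python
from typing import Callable, List, Tuple

MODEL_NAME = "amberoad/bert-multilingual-passage-reranking-msmarco"


def load_scorer(model_name: str = MODEL_NAME) -> Callable[[str, str], float]:
    """Build a (query, passage) -> score function around the model."""
    # Imported lazily so that rerank() below has no heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    def score(query: str, passage: str) -> float:
        # Encode query and passage as one pair, truncated to 512 tokens.
        inputs = tokenizer(query, passage, truncation=True,
                           max_length=512, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        # ASSUMPTION: the last logit is the relevance score.
        return logits[0, -1].item()

    return score


def rerank(query: str, passages: List[str],
           score_fn: Callable[[str, str], float]) -> List[Tuple[float, str]]:
    """Return (score, passage) pairs, highest score first."""
    scored = [(score_fn(query, passage), passage) for passage in passages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

A call would then look like `rerank(query, passages, load_scorer())`, where `passages` holds the first results returned by the search engine.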

Training data

This model is trained on the Microsoft MS MARCO dataset. The training dataset contains approximately 400M tuples of a query with relevant and non-relevant passages. All datasets used for training and evaluation are listed in this table. The dataset used for training is called Train Triples Large, while the evaluation was done on Top 1000 Dev. The development dataset contains 6,900 queries in total, each mapped to the top 1,000 passages retrieved with BM25 from the MS MARCO corpus.

Training procedure

The training is performed the same way as stated in this README. See their excellent Paper on Arxiv.

We changed the BERT model from an English-only one to the default BERT Multilingual uncased model from Google.

Training ran for 400,000 steps, which took 12 hours on a TPU v3-8.

Eval results

We see nearly the same performance as the English-only model on the English Bing queries dataset. Although the training data is English only, internal tests on private data showed far higher accuracy in German than all other available models.

Fine-tuned Models	Eval Set	Search Boost	Speed on GPU
`amberoad/Multilingual-uncased-MSMARCO` (This Model)	bing queries	+61% (0.29 vs 0.18)	~300 ms/query
`nboost/pt-tinybert-msmarco`	bing queries	+45% (0.26 vs 0.18)	~50 ms/query
`nboost/pt-bert-base-uncased-msmarco`	bing queries	+62% (0.29 vs 0.18)	~300 ms/query
`nboost/pt-bert-large-msmarco`	bing queries	+77% (0.32 vs 0.18)	-
`nboost/pt-biobert-base-msmarco`	biomed	+66% (0.17 vs 0.10)	~300 ms/query
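The Search Boost column reads as the relative improvement of the reranked score over the baseline score. The sketch below shows that arithmetic under this assumption; `search_boost` is a hypothetical helper, and the table does not name the underlying metric. Because the scores in parentheses are rounded, they reproduce the listed percentages only approximately.

```python
def search_boost(reranked: float, baseline: float) -> float:
    """Relative improvement of the reranked score over the baseline, in percent."""
    return (reranked - baseline) / baseline * 100.0


if __name__ == "__main__":
    # This model on bing queries: 0.29 vs 0.18
    print(f"+{search_boost(0.29, 0.18):.0f}%")
```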