MPNet base trained on Natural Questions pairs
This is a sentence-transformers model finetuned from microsoft/mpnet-base on the natural-questions dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: microsoft/mpnet-base
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: natural-questions
- Language: en
- License: apache-2.0
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
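The Pooling module above enables only pooling_mode_mean_tokens, so a sentence embedding is the average of the token embeddings at non-padding positions. The following is a minimal pure-Python sketch of that idea, using toy 3-dimensional vectors instead of 768 dimensions. The helper name `mean_pool` is hypothetical and is not part of the library, which works on batched tensors instead.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(sum(attention_mask), 1)
    return [s / count for s in summed]


# Toy example: three real tokens and one padding token, 3 dimensions each.
tokens = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [2.0, 2.0, 2.0],
    [9.0, 9.0, 9.0],  # padding position, ignored because its mask entry is 0
]
mask = [1, 1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```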
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/mpnet-base-natural-questions-mnrl")
# Run inference
sentences = [
"who was ancient china's main enemy that lived to the north",
'Sui dynasty The Sui Dynasty (Chinese: 隋朝; pinyin: Suí cháo) was a short-lived imperial dynasty of China of pivotal significance. The Sui unified the Northern and Southern dynasties and reinstalled the rule of ethnic Han Chinese in the entirety of China proper, along with sinicization of former nomadic ethnic minorities (the Five Barbarians) within its territory. It was succeeded by the Tang dynasty, which largely inherited its foundation.',
'Sampath Bank Sampath Bank PLC is a licensed commercial bank incorporated in Sri Lanka in 1986 with 229 branches and 373 ATMs island wide. It has won the "Bank of the Year" award by "The Banker" of Financial Times Limited – London, for the second consecutive year and the "National Business Excellence Awards 2010".[citation needed] It has become the third largest private sector bank in Sri Lanka with Rs. 453 billion in deposits as of 30 June 2016.[1]',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
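The similarity function listed for this model is cosine similarity, so the 3 x 3 result above holds one cosine score per pair of inputs. As a rough illustration of what such a matrix contains, here is a pure-Python sketch on toy 2-dimensional vectors. The function name `cosine_similarity_matrix` is an assumption made for this example and the code does not use the library's own implementation.

```python
import math
from typing import List


def cosine_similarity_matrix(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Return the matrix whose entry [i][j] is the cosine of a[i] and b[j]."""

    def norm(v: List[float]) -> float:
        return math.sqrt(sum(x * x for x in v))

    return [
        [sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v)) for v in b]
        for u in a
    ]


# Toy vectors standing in for 768-dimensional embeddings.
toy = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
sims = cosine_similarity_matrix(toy, toy)
for row in sims:
    print([round(s, 4) for s in row])
```

Each diagonal entry compares a vector with itself, and vector length does not affect the score, which is why [0.0, 2.0] and [1.0, 0.0] are scored purely on direction.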
Evaluation
Metrics
Information Retrieval
- Dataset: natural-questions-dev
- Evaluated with InformationRetrievalEvaluator
Metric | Value |
---|---|
cosine_accuracy@1 | 0.5899 |
cosine_accuracy@3 | 0.8181 |
cosine_accuracy@5 | 0.8876 |
cosine_accuracy@10 | 0.9434 |
cosine_precision@1 | 0.5899 |
cosine_precision@3 | 0.2727 |
cosine_precision@5 | 0.1775 |
cosine_precision@10 | 0.0943 |
cosine_recall@1 | 0.5899 |
cosine_recall@3 | 0.8181 |
cosine_recall@5 | 0.8876 |
cosine_recall@10 | 0.9434 |
cosine_ndcg@10 | 0.7719 |
cosine_mrr@10 | 0.7162 |
cosine_map@100 | 0.7188 |
dot_accuracy@1 | 0.561 |
dot_accuracy@3 | 0.7971 |
dot_accuracy@5 | 0.8664 |
dot_accuracy@10 | 0.9328 |
dot_precision@1 | 0.561 |
dot_precision@3 | 0.2657 |
dot_precision@5 | 0.1733 |
dot_precision@10 | 0.0933 |
dot_recall@1 | 0.561 |
dot_recall@3 | 0.7971 |
dot_recall@5 | 0.8664 |
dot_recall@10 | 0.9328 |
dot_ndcg@10 | 0.7508 |
dot_mrr@10 | 0.692 |
dot_map@100 | 0.6951 |
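In the table above, accuracy@k and recall@k are identical and precision@k equals accuracy@k divided by k (for example 0.8181 / 3 ≈ 0.2727). That pattern is what one would expect if every query has exactly one relevant passage, which is treated here as an assumption rather than a documented fact. The sketch below computes the per-query metrics under that assumption. `metrics_at_k` and the document ids are hypothetical names used only for illustration.

```python
from typing import Dict, List


def metrics_at_k(ranked_ids: List[str], relevant_id: str, k: int) -> Dict[str, float]:
    """Per-query retrieval metrics when exactly one passage is relevant."""
    top_k = ranked_ids[:k]
    hit = 1.0 if relevant_id in top_k else 0.0
    reciprocal_rank = 1.0 / (top_k.index(relevant_id) + 1) if hit else 0.0
    return {
        "accuracy": hit,        # was the relevant passage retrieved at all?
        "precision": hit / k,   # one relevant item out of k retrieved
        "recall": hit,          # one relevant item exists, so recall equals accuracy
        "mrr": reciprocal_rank,
    }


ranking = ["doc7", "doc2", "doc9", "doc4"]
result = metrics_at_k(ranking, relevant_id="doc2", k=3)
print(result)

# Consistency check against the reported cosine metrics.
reported_accuracy = {3: 0.8181, 5: 0.8876, 10: 0.9434}
implied_precision = {k: round(acc / k, 4) for k, acc in reported_accuracy.items()}
print(implied_precision)
```

The reported values are these per-query numbers averaged over all evaluation queries.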
Training Details
Training Dataset
natural-questions
- Dataset: natural-questions at f9e894e
- Size: 100,231 training samples
- Columns: query and answer
- Approximate statistics based on the first 1000 samples:
 | query | answer |
---|---|---|
type | string | string |
details | min: 10 tokens, mean: 11.74 tokens, max: 21 tokens | min: 17 tokens, mean: 135.66 tokens, max: 512 tokens |
- Samples (each query line is followed by its answer passage):
when did richmond last play in a preliminary final
Richmond Football Club Richmond began 2017 with 5 straight wins, a feat it had not achieved since 1995. A series of close losses hampered the Tigers throughout the middle of the season, including a 5-point loss to the Western Bulldogs, 2-point loss to Fremantle, and a 3-point loss to the Giants. Richmond ended the season strongly with convincing victories over Fremantle and St Kilda in the final two rounds, elevating the club to 3rd on the ladder. Richmond's first final of the season against the Cats at the MCG attracted a record qualifying final crowd of 95,028; the Tigers won by 51 points. Having advanced to the first preliminary finals for the first time since 2001, Richmond defeated Greater Western Sydney by 36 points in front of a crowd of 94,258 to progress to the Grand Final against Adelaide, their first Grand Final appearance since 1982. The attendance was 100,021, the largest crowd to a grand final since 1986. The Crows led at quarter time and led by as many as 13, but the Tigers took over the game as it progressed and scored seven straight goals at one point. They eventually would win by 48 points – 16.12 (108) to Adelaide's 8.12 (60) – to end their 37-year flag drought.[22] Dustin Martin also became the first player to win a Premiership medal, the Brownlow Medal and the Norm Smith Medal in the same season, while Damien Hardwick was named AFL Coaches Association Coach of the Year. Richmond's jump from 13th to premiers also marked the biggest jump from one AFL season to the next.
who sang what in the world's come over you
Jack Scott (singer) At the beginning of 1960, Scott again changed record labels, this time to Top Rank Records.[1] He then recorded four Billboard Hot 100 hits – "What in the World's Come Over You" (#5), "Burning Bridges" (#3) b/w "Oh Little One" (#34), and "It Only Happened Yesterday" (#38).[1] "What in the World's Come Over You" was Scott's second gold disc winner.[6] Scott continued to record and perform during the 1960s and 1970s.[1] His song "You're Just Gettin' Better" reached the country charts in 1974.[1] In May 1977, Scott recorded a Peel session for BBC Radio 1 disc jockey, John Peel.
who produces the most wool in the world
Wool Global wool production is about 2 million tonnes per year, of which 60% goes into apparel. Wool comprises ca 3% of the global textile market, but its value is higher owing to dying and other modifications of the material.[1] Australia is a leading producer of wool which is mostly from Merino sheep but has been eclipsed by China in terms of total weight.[30] New Zealand (2016) is the third-largest producer of wool, and the largest producer of crossbred wool. Breeds such as Lincoln, Romney, Drysdale, and Elliotdale produce coarser fibers, and wool from these sheep is usually used for making carpets.
- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
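To make the loss parameters concrete: MultipleNegativesRankingLoss treats every other answer in the batch as a negative for a given query, multiplies the cosine similarities by the scale (20.0 here), and applies cross-entropy with the matching answer as the target. The following is a minimal pure-Python sketch on toy 2-dimensional vectors, not the library's tensor implementation. The names `cos_sim` and `mnrl` are local helpers defined for this example.

```python
import math
from typing import List


def cos_sim(u: List[float], v: List[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def mnrl(queries: List[List[float]], answers: List[List[float]], scale: float = 20.0) -> float:
    """In-batch-negatives ranking loss: answers[i] is the positive for queries[i]."""
    losses = []
    for i, query in enumerate(queries):
        logits = [scale * cos_sim(query, answer) for answer in answers]
        # Numerically stable log-sum-exp over all answers in the batch.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(value - peak) for value in logits))
        # Cross-entropy with the matching answer as the correct class.
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.1], [0.1, 1.0]]   # each answer points the same way as its query
swapped = [aligned[1], aligned[0]]   # each answer is paired with the wrong query

loss_aligned = mnrl(queries, aligned)
loss_swapped = mnrl(queries, swapped)
print(loss_aligned, loss_swapped)
```

The loss is close to zero when each query is most similar to its own answer and grows large when the pairs are mismatched. Larger batches supply more negatives per query.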
Evaluation Dataset
natural-questions
- Dataset: natural-questions at f9e894e
- Size: 100,231 evaluation samples
- Columns: query and answer
- Approximate statistics based on the first 1000 samples:
 | query | answer |
---|---|---|
type | string | string |
details | min: 10 tokens, mean: 11.79 tokens, max: 25 tokens | min: 15 tokens, mean: 142.78 tokens, max: 512 tokens |
- Samples (each query line is followed by its answer passage):
who betrayed siraj ud daula in the battle of plassey in 1757
Siraj ud-Daulah The Battle of Plassey (or Palashi) is widely considered the turning point in the history of the subcontinent, and opened the way to eventual British domination. After Siraj-ud-Daulah's conquest of Calcutta, the British sent fresh troops from Madras to recapture the fort and avenge the attack. A retreating Siraj-ud-Daulah met the British at Plassey. He had to make camp 27 miles away from Murshidabad. On 23 June 1757 Siraj-ud-Daulah called on Mir Jafar because he was saddened by the sudden fall of Mir Mardan who was a very dear companion of Siraj in battles. The Nawab asked for help from Mir Jafar. Mir Jafar advised Siraj to retreat for that day. The Nawab made the blunder in giving the order to stop the fight. Following his command, the soldiers of the Nawab were returning to their camps. At that time, Robert Clive attacked the soldiers with his army. At such a sudden attack, the army of Siraj became indisciplined and could think of no way to fight. So all fled away in such a situation. Betrayed by a conspiracy plotted by Jagat Seth, Mir Jafar, Krishna Chandra, Omichund etc., he lost the battle and had to escape. He went first to Murshidabad and then to Patna by boat, but was eventually arrested by Mir Jafar's soldiers.
what is the meaning of single malt whisky
Single malt whisky Single malt whisky is malt whisky from a single distillery, that is, whisky distilled from fermented mash made exclusively with malted grain (usually barley), as distinguished from unmalted grain.
when is despicable me 3 going to release
Despicable Me 3 Despicable Me 3 premiered on June 14, 2017, at the Annecy International Animated Film Festival, and was released in the United States on June 30, 2017, by Universal Pictures in 3D, RealD 3D, Dolby Cinema, and IMAX 3D. The film received mixed reviews from critics[7] and has grossed over $1 billion worldwide, making it the third highest-grossing film of 2017, the fifth highest-grossing animated film of all time and the 28th highest-grossing overall. It is Illumination's second film to gross over $1 billion, after Minions in 2015, becoming the first ever animated franchise to do so.
- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- learning_rate: 2e-05
- num_train_epochs: 1
- warmup_ratio: 0.1
- bf16: True
- batch_sampler: no_duplicates
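To illustrate how learning_rate and warmup_ratio interact, the sketch below computes the learning rate at a given step for a linear warmup followed by linear decay, which corresponds to the lr_scheduler_type of linear listed in the full hyperparameters. It assumes 2813 total optimizer steps (the final step in the training logs) and that the warmup length is the step count times the ratio, rounded up. The function name `linear_schedule_lr` is hypothetical and the code is a plain-Python approximation, not the trainer's scheduler.

```python
import math


def linear_schedule_lr(step: int, total_steps: int, base_lr: float = 2e-5,
                       warmup_ratio: float = 0.1) -> float:
    """Learning rate after `step` updates: linear warmup, then linear decay to 0."""
    # Assumption: warmup length is the ratio applied to total steps, rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 2813  # final step reported in the training logs
for step in (0, 100, 282, 1400, 2813):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```

Under these assumptions the peak of 2e-05 is reached after 282 steps, roughly the first tenth of the single epoch.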
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Training Loss | loss | natural-questions-dev_cosine_map@100 |
---|---|---|---|---|
0 | 0 | - | - | 0.1228 |
0.0004 | 1 | 3.0833 | - | - |
0.0355 | 100 | 1.3516 | 0.1545 | 0.5151 |
0.0711 | 200 | 0.1189 | 0.0607 | 0.6299 |
0.1066 | 300 | 0.0641 | 0.0450 | 0.6535 |
0.1422 | 400 | 0.0529 | 0.0436 | 0.6532 |
0.1777 | 500 | 0.0601 | 0.0349 | 0.6716 |
0.2133 | 600 | 0.0453 | 0.0308 | 0.6771 |
0.2488 | 700 | 0.0478 | 0.0298 | 0.6769 |
0.2844 | 800 | 0.0404 | 0.0309 | 0.6834 |
0.3199 | 900 | 0.0377 | 0.0275 | 0.6855 |
0.3555 | 1000 | 0.0391 | 0.0248 | 0.6929 |
0.3910 | 1100 | 0.026 | 0.0265 | 0.6919 |
0.4266 | 1200 | 0.0343 | 0.0247 | 0.6985 |
0.4621 | 1300 | 0.0359 | 0.0245 | 0.6951 |
0.4977 | 1400 | 0.0283 | 0.0213 | 0.6993 |
0.5332 | 1500 | 0.027 | 0.0207 | 0.7072 |
0.5688 | 1600 | 0.0313 | 0.0223 | 0.6980 |
0.6043 | 1700 | 0.0373 | 0.0203 | 0.7042 |
0.6399 | 1800 | 0.0245 | 0.0199 | 0.7049 |
0.6754 | 1900 | 0.0294 | 0.0186 | 0.7143 |
0.7110 | 2000 | 0.0185 | 0.0185 | 0.7116 |
0.7465 | 2100 | 0.0247 | 0.0181 | 0.7118 |
0.7821 | 2200 | 0.0221 | 0.0183 | 0.7142 |
0.8176 | 2300 | 0.0178 | 0.0182 | 0.7141 |
0.8532 | 2400 | 0.0235 | 0.0170 | 0.7172 |
0.8887 | 2500 | 0.0279 | 0.0168 | 0.7190 |
0.9243 | 2600 | 0.0278 | 0.0167 | 0.7188 |
0.9598 | 2700 | 0.022 | 0.0166 | 0.7179 |
0.9954 | 2800 | 0.0191 | 0.0166 | 0.7173 |
1.0 | 2813 | - | - | 0.7188 |
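The Epoch column in the table above appears to be the step number divided by the total number of steps (2813), rounded to four decimals. A small sketch of that arithmetic follows, with `epoch_fraction` as a hypothetical helper name.

```python
TOTAL_STEPS = 2813  # last step in the training log, i.e. one full epoch


def epoch_fraction(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Fraction of the epoch completed after `step` optimizer updates."""
    return round(step / total_steps, 4)


for step in (1, 100, 1400, 2800):
    print(step, epoch_fraction(step))
```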
Environmental Impact
Carbon emissions were measured using CodeCarbon.
- Energy Consumed: 0.453 kWh
- Carbon Emitted: 0.176 kg of CO2
- Hours Used: 1.208 hours
Training Hardware
- On Cloud: No
- GPU Model: 1 x NVIDIA GeForce RTX 3090
- CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
- RAM Size: 31.78 GB
Framework Versions
- Python: 3.11.6
- Sentence Transformers: 3.1.0.dev0
- Transformers: 4.41.2
- PyTorch: 2.3.1+cu121
- Accelerate: 0.31.0
- Datasets: 2.20.0
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}