SentenceTransformer based on cross-encoder/ms-marco-MiniLM-L-6-v2

This is a sentence-transformers model finetuned from cross-encoder/ms-marco-MiniLM-L-6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: cross-encoder/ms-marco-MiniLM-L-6-v2
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
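The Pooling module above has only `pooling_mode_mean_tokens` enabled, so each sentence embedding is the average of its token vectors, with padding positions excluded. The following is a minimal NumPy sketch of that idea, not the library's own implementation; the function name `mean_pool` and the random toy inputs are assumptions made for illustration.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors over non-padding positions.

    token_embeddings: (batch, seq_len, dim) float array
    attention_mask:   (batch, seq_len) array of 0/1 flags
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Guard against division by zero for an all-padding row.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts

# Toy input: 2 sequences, 4 token slots, 384-dim vectors
# (random stand-ins for the transformer's token outputs).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 384))
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])

pooled = mean_pool(tokens, mask)
print(pooled.shape)  # (2, 384)
```

Each row of `pooled` has the 384 dimensions listed under Output Dimensionality, regardless of how many tokens the input contained.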

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Trelis/ms-marco-MiniLM-L-6-v2-2-cst-ep-MNRLtriplets-2e-5-batch32-gpu-overlap")
# Run inference
sentences = [
    'What is the minimum number of digits allowed for identifying numbers according to clause 4.3.1?',
    '2. 2 teams playing unregistered players are liable to forfeit any match in which unregistered players have competed. fit playing rules - 5th edition copyright © touch football australia 2020 5 3 the ball 3. 1 the game is played with an oval, inflated ball of a shape, colour and size approved by fit or the nta. 3. 2 the ball shall be inflated to the manufacturers ’ recommended air pressure. 3. 3 the referee shall immediately pause the match if the size and shape of the ball no longer complies with clauses 3. 1 or 3. 2 to allow for the ball to replaced or the issue rectified. 3. 4 the ball must not be hidden under player attire. 4 playing uniform 4. 1 participating players are to be correctly attired in matching team uniforms 4. 2 playing uniforms consist of shirt, singlet or other item as approved by the nta or nta competition provider, shorts and / or tights and socks. 4. 3 all players are to wear a unique identifying number not less than 16cm in height, clearly displayed on the rear of the playing top. 4. 3. 1 identifying numbers must feature no more than two ( 2 ) digits.',
    '24. 5 for the avoidance of doubt for clauses 24. 3 and 24. 4 the non - offending team will retain a numerical advantage on the field of play during the drop - off. 25 match officials 25. 1 the referee is the sole judge on all match related matters inside the perimeter for the duration of a match, has jurisdiction over all players, coaches and officials and is required to : 25. 1. 1 inspect the field of play, line markings and markers prior to the commencement of the match to ensure the safety of all participants. 25. 1. 2 adjudicate on the rules of the game ; 25. 1. 3 impose any sanction necessary to control the match ; 25. 1. 4 award tries and record the progressive score ; 25. 1. 5 maintain a count of touches during each possession ; 25. 1. 6 award penalties for infringements against the rules ; and 25. 1. 7 report to the relevant competition administration any sin bins, dismissals or injuries to any participant sustained during a match. 25. 2 only team captains are permitted to seek clarification of a decision directly from the referee. an approach may only be made during a break in play or at the discretion of the referee.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
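The similarity function listed for this model is cosine similarity, so the 3 x 3 matrix above holds the cosine of the angle between every pair of embeddings. As a rough sketch of what that computation amounts to (not the library's code), the NumPy version below uses random vectors in place of real embeddings; `cosine_similarity_matrix` and `fake_embeddings` are illustrative names.

```python
import numpy as np

def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

# Random stand-ins for three 384-dimensional sentence embeddings.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(3, 384))

scores = cosine_similarity_matrix(fake_embeddings, fake_embeddings)
print(scores.shape)  # (3, 3)
```

The diagonal is 1 because every vector is identical to itself, and the matrix is symmetric. For semantic search, the first row would be read as the score of the query against each passage.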

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • learning_rate: 2e-05
  • num_train_epochs: 2
  • lr_scheduler_type: constant
  • warmup_ratio: 0.3
  • bf16: True
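To reuse these settings, the non-default values can be collected into a plain dictionary and passed to the trainer arguments. The sketch below only builds and prints the dictionary; the commented-out usage assumes sentence-transformers 3.x and bf16-capable hardware, and the `output_dir` path is hypothetical.

```python
# Non-default hyperparameters copied from the list above.
non_default_args = {
    "eval_strategy": "steps",
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 32,
    "learning_rate": 2e-5,
    "num_train_epochs": 2,
    "lr_scheduler_type": "constant",
    "warmup_ratio": 0.3,
    "bf16": True,
}

# Hypothetical usage (needs sentence-transformers >= 3.0 installed):
# from sentence_transformers import SentenceTransformerTrainingArguments
# args = SentenceTransformerTrainingArguments(
#     output_dir="outputs/example-run",  # illustrative path, not from this card
#     **non_default_args,
# )

for name, value in non_default_args.items():
    print(f"{name}: {value}")
```

In Transformers, the `constant` schedule takes no warmup argument, so `warmup_ratio` likely had no effect in this run; `constant_with_warmup` is the variant that applies it.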

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: constant
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.3
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss
0.0066 2 4.4256 -
0.0131 4 4.1504 -
0.0197 6 4.0494 -
0.0262 8 4.0447 -
0.0328 10 3.9851 -
0.0393 12 3.9284 -
0.0459 14 3.9155 -
0.0525 16 3.8791 -
0.0590 18 3.8663 -
0.0656 20 3.9012 -
0.0721 22 3.8999 -
0.0787 24 3.7895 -
0.0852 26 3.7235 -
0.0918 28 3.7938 -
0.0984 30 3.5057 -
0.1049 32 3.5776 -
0.1115 34 3.5092 -
0.1180 36 3.7226 -
0.1246 38 3.5426 -
0.1311 40 3.7318 -
0.1377 42 3.529 -
0.1443 44 3.5977 -
0.1508 46 3.6484 -
0.1574 48 3.5026 -
0.1639 50 3.4568 -
0.1705 52 3.6119 -
0.1770 54 3.4206 -
0.1836 56 3.3701 -
0.1902 58 3.3232 -
0.1967 60 3.3398 -
0.2033 62 3.333 -
0.2098 64 3.3587 -
0.2164 66 3.1304 -
0.2230 68 3.0618 -
0.2295 70 3.145 -
0.2361 72 3.2074 -
0.2426 74 3.0436 -
0.2492 76 3.0572 -
0.2525 77 - 3.0810
0.2557 78 3.1225 -
0.2623 80 2.8197 -
0.2689 82 2.8979 -
0.2754 84 2.7827 -
0.2820 86 2.9472 -
0.2885 88 2.918 -
0.2951 90 2.7035 -
0.3016 92 2.6876 -
0.3082 94 2.8322 -
0.3148 96 2.6335 -
0.3213 98 2.3754 -
0.3279 100 3.0978 -
0.3344 102 2.4946 -
0.3410 104 2.5085 -
0.3475 106 2.7456 -
0.3541 108 2.3934 -
0.3607 110 2.3222 -
0.3672 112 2.4773 -
0.3738 114 2.6684 -
0.3803 116 2.2435 -
0.3869 118 2.243 -
0.3934 120 2.228 -
0.4 122 2.4652 -
0.4066 124 2.2113 -
0.4131 126 2.0805 -
0.4197 128 2.5041 -
0.4262 130 2.4489 -
0.4328 132 2.2474 -
0.4393 134 2.0252 -
0.4459 136 2.257 -
0.4525 138 1.9381 -
0.4590 140 2.0183 -
0.4656 142 2.1021 -
0.4721 144 2.1508 -
0.4787 146 1.9669 -
0.4852 148 1.7468 -
0.4918 150 1.8776 -
0.4984 152 1.8081 -
0.5049 154 1.6799 1.6088
0.5115 156 1.9628 -
0.5180 158 1.8253 -
0.5246 160 1.7791 -
0.5311 162 1.8463 -
0.5377 164 1.6357 -
0.5443 166 1.6531 -
0.5508 168 1.6747 -
0.5574 170 1.5666 -
0.5639 172 1.7272 -
0.5705 174 1.6045 -
0.5770 176 1.3786 -
0.5836 178 1.6547 -
0.5902 180 1.6416 -
0.5967 182 1.4796 -
0.6033 184 1.4595 -
0.6098 186 1.4106 -
0.6164 188 1.4844 -
0.6230 190 1.4581 -
0.6295 192 1.4922 -
0.6361 194 1.2978 -
0.6426 196 1.2612 -
0.6492 198 1.4725 -
0.6557 200 1.3162 -
0.6623 202 1.3736 -
0.6689 204 1.4553 -
0.6754 206 1.4011 -
0.6820 208 1.2523 -
0.6885 210 1.3732 -
0.6951 212 1.3721 -
0.7016 214 1.5262 -
0.7082 216 1.2631 -
0.7148 218 1.6174 -
0.7213 220 1.4252 -
0.7279 222 1.3527 -
0.7344 224 1.1969 -
0.7410 226 1.2901 -
0.7475 228 1.4379 -
0.7541 230 1.1332 -
0.7574 231 - 1.0046
0.7607 232 1.3693 -
0.7672 234 1.3097 -
0.7738 236 1.2314 -
0.7803 238 1.0873 -
0.7869 240 1.2882 -
0.7934 242 1.1723 -
0.8 244 1.1748 -
0.8066 246 1.2916 -
0.8131 248 1.0894 -
0.8197 250 1.2299 -
0.8262 252 1.207 -
0.8328 254 1.1361 -
0.8393 256 1.1323 -
0.8459 258 1.0927 -
0.8525 260 1.1433 -
0.8590 262 1.1088 -
0.8656 264 1.1384 -
0.8721 266 1.0962 -
0.8787 268 1.1878 -
0.8852 270 1.0113 -
0.8918 272 1.1411 -
0.8984 274 1.0289 -
0.9049 276 1.0163 -
0.9115 278 1.2859 -
0.9180 280 0.9449 -
0.9246 282 1.0941 -
0.9311 284 1.0908 -
0.9377 286 1.1028 -
0.9443 288 1.0633 -
0.9508 290 1.1004 -
0.9574 292 1.0483 -
0.9639 294 1.0064 -
0.9705 296 1.0088 -
0.9770 298 1.0068 -
0.9836 300 1.1903 -
0.9902 302 0.9401 -
0.9967 304 0.8369 -
1.0033 306 0.5046 -
1.0098 308 1.0626 0.8660
1.0164 310 0.9587 -
1.0230 312 1.0565 -
1.0295 314 1.1329 -
1.0361 316 1.1857 -
1.0426 318 0.9777 -
1.0492 320 0.9883 -
1.0557 322 0.9076 -
1.0623 324 0.7942 -
1.0689 326 1.1952 -
1.0754 328 0.9726 -
1.0820 330 1.0663 -
1.0885 332 1.0337 -
1.0951 334 0.9522 -
1.1016 336 0.9813 -
1.1082 338 0.9057 -
1.1148 340 1.0076 -
1.1213 342 0.8557 -
1.1279 344 0.9341 -
1.1344 346 0.9188 -
1.1410 348 1.091 -
1.1475 350 0.8205 -
1.1541 352 1.0509 -
1.1607 354 0.9201 -
1.1672 356 1.0741 -
1.1738 358 0.8662 -
1.1803 360 0.9468 -
1.1869 362 0.8604 -
1.1934 364 0.8141 -
1.2 366 0.9475 -
1.2066 368 0.8407 -
1.2131 370 0.764 -
1.2197 372 0.798 -
1.2262 374 0.8205 -
1.2328 376 0.7995 -
1.2393 378 0.9305 -
1.2459 380 0.858 -
1.2525 382 0.8465 -
1.2590 384 0.7691 -
1.2623 385 - 0.7879
1.2656 386 1.0073 -
1.2721 388 0.8026 -
1.2787 390 0.8108 -
1.2852 392 0.7783 -
1.2918 394 0.8766 -
1.2984 396 0.8576 -
1.3049 398 0.884 -
1.3115 400 0.9547 -
1.3180 402 0.9231 -
1.3246 404 0.8027 -
1.3311 406 0.9117 -
1.3377 408 0.7743 -
1.3443 410 0.8257 -
1.3508 412 0.8738 -
1.3574 414 0.972 -
1.3639 416 0.8297 -
1.3705 418 0.8941 -
1.3770 420 0.8513 -
1.3836 422 0.7588 -
1.3902 424 0.8332 -
1.3967 426 0.7682 -
1.4033 428 0.7916 -
1.4098 430 0.9519 -
1.4164 432 1.0526 -
1.4230 434 0.8724 -
1.4295 436 0.8267 -
1.4361 438 0.7672 -
1.4426 440 0.7977 -
1.4492 442 0.6947 -
1.4557 444 0.9042 -
1.4623 446 0.8971 -
1.4689 448 0.9655 -
1.4754 450 0.8512 -
1.4820 452 0.9421 -
1.4885 454 0.9501 -
1.4951 456 0.8214 -
1.5016 458 0.9335 -
1.5082 460 0.7617 -
1.5148 462 0.8601 0.7855
1.5213 464 0.757 -
1.5279 466 0.7389 -
1.5344 468 0.8146 -
1.5410 470 0.9235 -
1.5475 472 0.9901 -
1.5541 474 0.9624 -
1.5607 476 0.8909 -
1.5672 478 0.7276 -
1.5738 480 0.9444 -
1.5803 482 0.874 -
1.5869 484 0.7985 -
1.5934 486 0.9335 -
1.6 488 0.8108 -
1.6066 490 0.7779 -
1.6131 492 0.8807 -
1.6197 494 0.8146 -
1.6262 496 0.9218 -
1.6328 498 0.8439 -
1.6393 500 0.7348 -
1.6459 502 0.8533 -
1.6525 504 0.7695 -
1.6590 506 0.7911 -
1.6656 508 0.837 -
1.6721 510 0.731 -
1.6787 512 0.911 -
1.6852 514 0.7963 -
1.6918 516 0.7719 -
1.6984 518 0.8011 -
1.7049 520 0.7428 -
1.7115 522 0.8159 -
1.7180 524 0.7833 -
1.7246 526 0.7934 -
1.7311 528 0.7854 -
1.7377 530 0.8398 -
1.7443 532 0.7875 -
1.7508 534 0.7282 -
1.7574 536 0.8269 -
1.7639 538 0.8033 -
1.7672 539 - 0.7595
1.7705 540 0.9471 -
1.7770 542 0.941 -
1.7836 544 0.725 -
1.7902 546 0.8978 -
1.7967 548 0.8361 -
1.8033 550 0.7092 -
1.8098 552 0.809 -
1.8164 554 0.9399 -
1.8230 556 0.769 -
1.8295 558 0.7381 -
1.8361 560 0.7554 -
1.8426 562 0.8553 -
1.8492 564 0.919 -
1.8557 566 0.7479 -
1.8623 568 0.8381 -
1.8689 570 0.7911 -
1.8754 572 0.8076 -
1.8820 574 0.7868 -
1.8885 576 0.9147 -
1.8951 578 0.7271 -
1.9016 580 0.7201 -
1.9082 582 0.7538 -
1.9148 584 0.7522 -
1.9213 586 0.7737 -
1.9279 588 0.7187 -
1.9344 590 0.8713 -
1.9410 592 0.7971 -
1.9475 594 0.8226 -
1.9541 596 0.7074 -
1.9607 598 0.804 -
1.9672 600 0.7259 -
1.9738 602 0.7758 -
1.9803 604 0.8209 -
1.9869 606 0.7918 -
1.9934 608 0.7467 -
2.0 610 0.4324 -
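The log can be cross-checked against the hyperparameters with a little arithmetic. The sketch below assumes a single device, which the card does not state, together with the listed `gradient_accumulation_steps: 1` and `dataloader_drop_last: False`; the training-set size it derives is an estimate, not a reported figure.

```python
total_steps = 610   # last step in the log
num_epochs = 2      # num_train_epochs
batch_size = 32     # per_device_train_batch_size

steps_per_epoch = total_steps // num_epochs

# With drop_last disabled, the final batch of an epoch may be partial,
# so the training set size is only bounded, not pinned down.
max_examples = steps_per_epoch * batch_size
min_examples = (steps_per_epoch - 1) * batch_size + 1

# The Epoch column is step / steps_per_epoch, rounded to four decimals.
epoch_at_step_2 = round(2 / steps_per_epoch, 4)

# Rows that carry a validation loss fall on multiples of 77 steps.
eval_steps = list(range(77, total_steps + 1, 77))

print(steps_per_epoch)             # 305
print(min_examples, max_examples)  # 9729 9760
print(epoch_at_step_2)             # 0.0066
print(eval_steps)                  # [77, 154, 231, 308, 385, 462, 539]
```

Under those assumptions the training set holds roughly 9,700 examples, and validation ran seven times over the two epochs.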

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.3
  • PyTorch: 2.1.1+cu121
  • Accelerate: 0.31.0
  • Datasets: 2.17.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
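
For readers unfamiliar with MultipleNegativesRankingLoss, the following NumPy sketch shows the general idea under some assumptions: inputs are (anchor, positive, negative) triplets, as the model name suggests; similarities are cosine scores multiplied by a scale of 20 (the library default, which may not match this run); and each anchor is classified against every positive and negative in the batch. It is an illustration, not the library's implementation, and `mnrl_loss` is a made-up name.

```python
import math
import numpy as np

def mnrl_loss(anchors, positives, negatives, scale=20.0):
    """Cross-entropy over scaled cosine similarities with in-batch negatives."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    a = normalize(anchors)
    # Every positive and every hard negative in the batch is a candidate.
    candidates = normalize(np.concatenate([positives, negatives], axis=0))
    logits = scale * (a @ candidates.T)  # shape: (batch, 2 * batch)
    # Numerically stable log-softmax over the candidates.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct candidate for anchor i is positive i.
    idx = np.arange(len(a))
    return float(-log_probs[idx, idx].mean())

# Degenerate check: if all embeddings are identical, every candidate scores
# the same and the loss equals ln(number of candidates).
batch = 32
same = np.ones((batch, 384))
chance_loss = mnrl_loss(same, same, same)
print(round(chance_loss, 4), round(math.log(2 * batch), 4))
```

With a batch of 32 triplets there are 64 candidates per anchor, so an uninformative model would score about ln(64) ≈ 4.16, which is in the same range as the first training losses logged above. That comparison is only suggestive, since the actual loss configuration is not recorded in this card.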