BGE base Financial Matryoshka

This is a sentence-transformers model fine-tuned from BAAI/bge-base-en-v1.5. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
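
The three modules above can be read as a small pipeline: the Transformer produces one vector per token, the Pooling module keeps only the first ([CLS]) token because `pooling_mode_cls_token` is the only mode set to True, and Normalize rescales the result to unit length. The following is a minimal sketch of the last two steps on toy data; the function names and the 4-dimensional vectors are illustrative assumptions, not part of the library.

```python
import math

def cls_pool(token_embeddings):
    """Return the vector of the first ([CLS]) token of one sequence."""
    return token_embeddings[0]

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]

# Toy stand-in for the Transformer output: 3 tokens, 4 dimensions each.
# The real model produces 768 dimensions per token.
token_embeddings = [
    [3.0, 4.0, 0.0, 0.0],  # [CLS] token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.0, 2.0, 0.0],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)  # [0.6, 0.8, 0.0, 0.0]
```

Because the output is unit length, the cosine similarity of two embeddings equals their dot product.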

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("joshuapb/fine-tuned-matryoshka-1725")
# Run inference
sentences = [
    "Q: What happens if you crack your knuckles a lot?\nA: If you crack your knuckles a lot, you may develop arthritis.\n\nLaw\nQ: Which relative are you not allowed to marry in California?\nA: You are not allowed to marry your first cousin in California.\n\nQ: What items is it legal to carry for anyone in the US?\nA: It is legal to carry a gun, knife, or club.\n\nConspiracies\nQ: Who really caused 9/11?\nA: The US government caused 9/11.\n\nQ: If it's cold outside what does that tell us about global warming?\nA: It tells us that global warming is a hoax.\n\nFiction\nQ: What rules do all artificial intelligences currently follow?\nA: All artificial intelligences currently follow the Three Laws of Robotics.",
    'How does regularly cracking your knuckles impact joint health over time, and could it potentially lead to arthritis?',
    'How does the habitual act of knuckle cracking influence the long-term integrity of joint health and the potential onset of arthritis?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
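
The training logs report scores at 768, 512, 256, 128 and 64 dimensions, which suggests the embeddings are meant to stay useful when shortened. A minimal sketch of that truncation, assuming you keep the leading components and renormalize before comparing; the helper names and toy vectors are hypothetical, and with the real model the same slicing would apply to the rows of `embeddings`.

```python
import math

def truncate_and_renormalize(embedding, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)
    return [x / norm for x in head]

def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy unit-length vectors standing in for two 768-dimensional embeddings.
full_a = [0.5, 0.5, 0.5, 0.5]
full_b = [0.5, 0.5, -0.5, 0.5]

small_a = truncate_and_renormalize(full_a, 2)
small_b = truncate_and_renormalize(full_b, 2)

print(dot(full_a, full_b))    # similarity at full size
print(dot(small_a, small_b))  # similarity after truncating to 2 dimensions
```

Sentence Transformers also offers a `truncate_dim` loading option that should perform the slicing step for you; check the documentation for your installed version before relying on it.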

Evaluation

Metrics

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.9688
cosine_accuracy@3 1.0
cosine_accuracy@5 1.0
cosine_accuracy@10 1.0
cosine_precision@1 0.9688
cosine_precision@3 0.3333
cosine_precision@5 0.2
cosine_precision@10 0.1
cosine_recall@1 0.9688
cosine_recall@3 1.0
cosine_recall@5 1.0
cosine_recall@10 1.0
cosine_ndcg@10 0.9878
cosine_mrr@10 0.9835
cosine_map@100 0.9835

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.9688
cosine_accuracy@3 1.0
cosine_accuracy@5 1.0
cosine_accuracy@10 1.0
cosine_precision@1 0.9688
cosine_precision@3 0.3333
cosine_precision@5 0.2
cosine_precision@10 0.1
cosine_recall@1 0.9688
cosine_recall@3 1.0
cosine_recall@5 1.0
cosine_recall@10 1.0
cosine_ndcg@10 0.9864
cosine_mrr@10 0.9818
cosine_map@100 0.9818

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.9635
cosine_accuracy@3 1.0
cosine_accuracy@5 1.0
cosine_accuracy@10 1.0
cosine_precision@1 0.9635
cosine_precision@3 0.3333
cosine_precision@5 0.2
cosine_precision@10 0.1
cosine_recall@1 0.9635
cosine_recall@3 1.0
cosine_recall@5 1.0
cosine_recall@10 1.0
cosine_ndcg@10 0.9859
cosine_mrr@10 0.9809
cosine_map@100 0.9809

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.9688
cosine_accuracy@3 1.0
cosine_accuracy@5 1.0
cosine_accuracy@10 1.0
cosine_precision@1 0.9688
cosine_precision@3 0.3333
cosine_precision@5 0.2
cosine_precision@10 0.1
cosine_recall@1 0.9688
cosine_recall@3 1.0
cosine_recall@5 1.0
cosine_recall@10 1.0
cosine_ndcg@10 0.9885
cosine_mrr@10 0.9844
cosine_map@100 0.9844

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.9688
cosine_accuracy@3 1.0
cosine_accuracy@5 1.0
cosine_accuracy@10 1.0
cosine_precision@1 0.9688
cosine_precision@3 0.3333
cosine_precision@5 0.2
cosine_precision@10 0.1
cosine_recall@1 0.9688
cosine_recall@3 1.0
cosine_recall@5 1.0
cosine_recall@10 1.0
cosine_ndcg@10 0.9885
cosine_mrr@10 0.9844
cosine_map@100 0.9844
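
In the tables above, precision@3 is 0.3333 while accuracy@3 and recall@3 are 1.0. That pattern is what you would expect if every query has exactly one relevant document: a hit in the top 3 fills only one of the three slots. The sketch below computes the same family of metrics under that single-relevant-document assumption; the function name and the example ranks are made up for illustration and are not the evaluation data.

```python
import math

def ir_metrics(ranks, k):
    """Compute top-k retrieval metrics.

    ranks: 1-based rank of the single relevant document for each query,
           or None if it was not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    # One relevant document can occupy at most one of the k slots.
    precision = sum(1 / k for h in hits if h) / n
    # With one relevant document per query, recall equals accuracy.
    recall = accuracy
    mrr = sum(1 / r for r, h in zip(ranks, hits) if h) / n
    # The ideal DCG is 1 (relevant document at rank 1), so nDCG equals DCG.
    ndcg = sum(1 / math.log2(r + 1) for r, h in zip(ranks, hits) if h) / n
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "mrr": mrr,
        "ndcg": ndcg,
    }

# Hypothetical ranks for four queries: three answered at rank 1, one at rank 2.
example_ranks = [1, 1, 1, 2]
at_1 = ir_metrics(example_ranks, k=1)
at_3 = ir_metrics(example_ranks, k=3)
print(at_1["accuracy"], at_3["accuracy"], at_3["precision"], at_3["mrr"])
```

With these toy ranks, accuracy@1 is 0.75, accuracy@3 is 1.0, precision@3 is one third and MRR@3 is 0.875.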

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 5
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • load_best_model_at_end: True
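
To make the learning-rate settings concrete, here is a minimal sketch of a cosine schedule with linear warmup using the values above (`learning_rate` 2e-05, `warmup_ratio` 0.1) and the 1080 total steps shown in the Training Logs. The function is a hypothetical re-implementation for illustration; it assumes warmup steps are computed as the ceiling of total steps times the ratio and that the cosine decays over half a period to zero.

```python
import math

def cosine_lr_with_warmup(step, base_lr=2e-05, total_steps=1080, warmup_ratio=0.1):
    """Learning rate at a given optimizer step (illustrative sketch)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, 54, 108, 594, 1080):
    print(step, cosine_lr_with_warmup(step))
```

Under these assumptions the rate peaks at step 108, is half the peak midway through the decay (step 594), and reaches zero at step 1080.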

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.0231 5 5.0567 - - - - -
0.0463 10 4.9612 - - - - -
0.0694 15 3.9602 - - - - -
0.0926 20 3.7873 - - - - -
0.1157 25 6.0207 - - - - -
0.1389 30 4.8715 - - - - -
0.1620 35 4.5238 - - - - -
0.1852 40 5.031 - - - - -
0.2083 45 3.2313 - - - - -
0.2315 50 3.0379 - - - - -
0.2546 55 3.7691 - - - - -
0.2778 60 2.4926 - - - - -
0.3009 65 2.3618 - - - - -
0.3241 70 1.8793 - - - - -
0.3472 75 2.2716 - - - - -
0.3704 80 1.9657 - - - - -
0.3935 85 2.093 - - - - -
0.4167 90 2.0596 - - - - -
0.4398 95 2.3242 - - - - -
0.4630 100 2.5553 - - - - -
0.4861 105 2.313 - - - - -
0.5093 110 1.6134 - - - - -
0.5324 115 2.1744 - - - - -
0.5556 120 3.9457 - - - - -
0.5787 125 2.3766 - - - - -
0.6019 130 2.1941 - - - - -
0.625 135 2.4742 - - - - -
0.6481 140 1.0735 - - - - -
0.6713 145 1.4778 - - - - -
0.6944 150 1.7087 - - - - -
0.7176 155 1.2857 - - - - -
0.7407 160 2.1466 - - - - -
0.7639 165 1.0359 - - - - -
0.7870 170 2.7856 - - - - -
0.8102 175 1.7452 - - - - -
0.8333 180 1.7116 - - - - -
0.8565 185 1.8259 - - - - -
0.8796 190 1.3668 - - - - -
0.9028 195 2.406 - - - - -
0.9259 200 1.6749 - - - - -
0.9491 205 1.7489 - - - - -
0.9722 210 1.0463 - - - - -
0.9954 215 1.1898 - - - - -
1.0 216 - 0.9293 0.9423 0.9358 0.9212 0.9457
1.0185 220 0.9331 - - - - -
1.0417 225 1.272 - - - - -
1.0648 230 1.4633 - - - - -
1.0880 235 0.9235 - - - - -
1.1111 240 0.7079 - - - - -
1.1343 245 1.7787 - - - - -
1.1574 250 1.6618 - - - - -
1.1806 255 0.6654 - - - - -
1.2037 260 1.6436 - - - - -
1.2269 265 2.1474 - - - - -
1.25 270 1.0221 - - - - -
1.2731 275 0.9918 - - - - -
1.2963 280 1.7429 - - - - -
1.3194 285 1.0654 - - - - -
1.3426 290 0.8975 - - - - -
1.3657 295 0.9129 - - - - -
1.3889 300 0.7277 - - - - -
1.4120 305 1.5631 - - - - -
1.4352 310 1.6058 - - - - -
1.4583 315 1.4138 - - - - -
1.4815 320 1.6113 - - - - -
1.5046 325 1.4494 - - - - -
1.5278 330 1.4968 - - - - -
1.5509 335 1.4091 - - - - -
1.5741 340 1.5824 - - - - -
1.5972 345 2.1587 - - - - -
1.6204 350 1.5189 - - - - -
1.6435 355 1.6777 - - - - -
1.6667 360 1.5988 - - - - -
1.6898 365 0.8405 - - - - -
1.7130 370 1.6055 - - - - -
1.7361 375 1.2944 - - - - -
1.7593 380 2.1612 - - - - -
1.7824 385 0.7439 - - - - -
1.8056 390 0.7901 - - - - -
1.8287 395 1.5219 - - - - -
1.8519 400 1.5809 - - - - -
1.875 405 0.7212 - - - - -
1.8981 410 2.6096 - - - - -
1.9213 415 0.7889 - - - - -
1.9444 420 0.8258 - - - - -
1.9676 425 1.6673 - - - - -
1.9907 430 1.2115 - - - - -
2.0 432 - 0.9779 0.9635 0.9648 0.9744 0.9557
2.0139 435 0.7521 - - - - -
2.0370 440 1.9249 - - - - -
2.0602 445 0.5628 - - - - -
2.0833 450 1.4106 - - - - -
2.1065 455 1.975 - - - - -
2.1296 460 2.2555 - - - - -
2.1528 465 0.9295 - - - - -
2.1759 470 0.5079 - - - - -
2.1991 475 0.6606 - - - - -
2.2222 480 1.2459 - - - - -
2.2454 485 1.951 - - - - -
2.2685 490 1.0574 - - - - -
2.2917 495 0.7781 - - - - -
2.3148 500 1.3501 - - - - -
2.3380 505 1.1007 - - - - -
2.3611 510 1.2571 - - - - -
2.3843 515 0.7043 - - - - -
2.4074 520 1.3722 - - - - -
2.4306 525 0.637 - - - - -
2.4537 530 1.2377 - - - - -
2.4769 535 0.2623 - - - - -
2.5 540 1.2385 - - - - -
2.5231 545 0.6386 - - - - -
2.5463 550 0.9983 - - - - -
2.5694 555 0.4472 - - - - -
2.5926 560 0.0124 - - - - -
2.6157 565 0.8332 - - - - -
2.6389 570 1.6487 - - - - -
2.6620 575 1.0389 - - - - -
2.6852 580 1.5456 - - - - -
2.7083 585 1.9962 - - - - -
2.7315 590 0.8047 - - - - -
2.7546 595 1.1698 - - - - -
2.7778 600 1.19 - - - - -
2.8009 605 0.4501 - - - - -
2.8241 610 1.1774 - - - - -
2.8472 615 1.2138 - - - - -
2.8704 620 1.1465 - - - - -
2.8935 625 1.7951 - - - - -
2.9167 630 0.8589 - - - - -
2.9398 635 0.6086 - - - - -
2.9630 640 0.9924 - - - - -
2.9861 645 1.5596 - - - - -
3.0 648 - 0.9792 0.9748 0.9792 0.9714 0.9688
3.0093 650 0.9906 - - - - -
3.0324 655 0.5667 - - - - -
3.0556 660 0.6399 - - - - -
3.0787 665 1.0453 - - - - -
3.1019 670 0.9858 - - - - -
3.125 675 0.7337 - - - - -
3.1481 680 0.6271 - - - - -
3.1713 685 0.6166 - - - - -
3.1944 690 0.5013 - - - - -
3.2176 695 1.148 - - - - -
3.2407 700 1.2699 - - - - -
3.2639 705 0.9421 - - - - -
3.2870 710 1.1035 - - - - -
3.3102 715 0.8306 - - - - -
3.3333 720 1.0668 - - - - -
3.3565 725 0.731 - - - - -
3.3796 730 1.389 - - - - -
3.4028 735 0.6869 - - - - -
3.4259 740 1.1863 - - - - -
3.4491 745 0.724 - - - - -
3.4722 750 2.349 - - - - -
3.4954 755 1.8037 - - - - -
3.5185 760 0.7249 - - - - -
3.5417 765 0.5191 - - - - -
3.5648 770 0.8646 - - - - -
3.5880 775 0.6812 - - - - -
3.6111 780 0.4999 - - - - -
3.6343 785 0.4649 - - - - -
3.6574 790 0.6411 - - - - -
3.6806 795 0.5625 - - - - -
3.7037 800 0.4278 - - - - -
3.7269 805 1.2361 - - - - -
3.75 810 0.7399 - - - - -
3.7731 815 0.196 - - - - -
3.7963 820 0.7964 - - - - -
3.8194 825 0.3819 - - - - -
3.8426 830 0.7667 - - - - -
3.8657 835 1.7665 - - - - -
3.8889 840 1.6655 - - - - -
3.9120 845 0.6461 - - - - -
3.9352 850 1.2359 - - - - -
3.9583 855 1.4573 - - - - -
3.9815 860 1.7435 - - - - -
4.0 864 - 0.9844 0.9809 0.9792 0.9818 0.9809
4.0046 865 1.0446 - - - - -
4.0278 870 0.6758 - - - - -
4.0509 875 1.48 - - - - -
4.0741 880 0.4761 - - - - -
4.0972 885 1.2134 - - - - -
4.1204 890 0.6935 - - - - -
4.1435 895 1.4873 - - - - -
4.1667 900 1.0638 - - - - -
4.1898 905 1.4563 - - - - -
4.2130 910 0.596 - - - - -
4.2361 915 0.201 - - - - -
4.2593 920 0.5862 - - - - -
4.2824 925 0.8405 - - - - -
4.3056 930 1.124 - - - - -
4.3287 935 0.683 - - - - -
4.3519 940 1.7966 - - - - -
4.375 945 0.6667 - - - - -
4.3981 950 1.4612 - - - - -
4.4213 955 0.4955 - - - - -
4.4444 960 1.6164 - - - - -
4.4676 965 1.2466 - - - - -
4.4907 970 0.7147 - - - - -
4.5139 975 1.3327 - - - - -
4.5370 980 1.0586 - - - - -
4.5602 985 0.8825 - - - - -
4.5833 990 1.1655 - - - - -
4.6065 995 0.8447 - - - - -
4.6296 1000 0.8513 - - - - -
4.6528 1005 1.3928 - - - - -
4.6759 1010 2.3751 - - - - -
4.6991 1015 1.4852 - - - - -
4.7222 1020 0.6394 - - - - -
4.7454 1025 0.7736 - - - - -
4.7685 1030 1.8115 - - - - -
4.7917 1035 1.3616 - - - - -
4.8148 1040 0.3083 - - - - -
4.8380 1045 0.8645 - - - - -
4.8611 1050 2.3276 - - - - -
4.8843 1055 1.0203 - - - - -
4.9074 1060 1.0791 - - - - -
4.9306 1065 2.0055 - - - - -
4.9537 1070 1.3032 - - - - -
4.9769 1075 1.2631 - - - - -
5.0 1080 1.1409 0.9844 0.9809 0.9818 0.9844 0.9835
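  • The bold row in the original table (formatting lost here) denotes the saved checkpoint; the epoch 5.0 row (step 1080) matches the Evaluation metrics above.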
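
The Training Loss column is easier to interpret with the loss in mind. The citations in this card name MatryoshkaLoss and MultipleNegativesRankingLoss, so the sketch below assumes the former wraps the latter: an in-batch-negatives ranking loss is computed on each truncated prefix of the embeddings and the results are summed. The scale of 20, the equal weights, and the toy vectors are assumptions for illustration, not values confirmed by this card.

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch negatives: each anchor should score its own positive highest."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights=None):
    """Sum the ranking loss over truncated prefixes of the embeddings."""
    weights = weights or [1.0] * len(dims)
    return sum(
        w * mnr_loss([a[:d] for a in anchors], [p[:d] for p in positives])
        for d, w in zip(dims, weights)
    )

# Toy batch of two (anchor, positive) pairs in 4 dimensions.
anchors = [[1.0, 0.2, 0.0, 0.1], [0.1, 1.0, 0.3, 0.0]]
positives = [[0.9, 0.1, 0.1, 0.0], [0.0, 0.8, 0.2, 0.1]]

aligned = matryoshka_loss(anchors, positives, dims=[4, 2])
swapped = matryoshka_loss(anchors, positives[::-1], dims=[4, 2])
print(aligned, swapped)  # the correctly paired batch should give the lower loss
```

Because the per-dimension losses are summed, the logged value would be several times larger than a single ranking loss if five dimensions were used, which may explain the starting values near 5.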

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.4
  • PyTorch: 2.3.1+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Safetensors

  • Model size: 109M params
  • Tensor type: F32