BGE Base - FinBench Finetuned

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5 on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
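
The three modules amount to: tokenize (lowercased, truncated to 512 tokens), run BERT, take the [CLS] token embedding, and L2-normalize it so that cosine similarity reduces to a dot product. For readers who prefer the raw Transformers API, here is a minimal sketch of the same pipeline (assuming the Hub repo exposes standard BERT weights, as sentence-transformers repos do):

import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Snorkeler/BGE-Finetuned-FinBench")
bert = AutoModel.from_pretrained("Snorkeler/BGE-Finetuned-FinBench")

batch = tok(
    ["Is 3M a capital-intensive business based on FY2022 data?"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    out = bert(**batch)

cls = out.last_hidden_state[:, 0]                     # (1) Pooling: CLS token only
emb = torch.nn.functional.normalize(cls, p=2, dim=1)  # (2) Normalize: unit L2 norm
print(emb.shape)  # torch.Size([1, 768])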

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Snorkeler/BGE-Finetuned-FinBench")
# Run inference
sentences = [
    '3M Company and SubsidiariesConsolidated Statement of IncomeYears ended December 31(Millions, except per share amounts)202220212020Net sales$34,229 $35,355 $32,184',
    'Is 3M a capital-intensive business based on FY2022 data?',
    'Among all of the derivative instruments that Verizon used to manage the exposure to fluctuations of foreign currencies exchange rates or interest rates, which one had the highest notional value in FY 2021?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
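
Because the model was trained with MatryoshkaLoss (see Training Details), embeddings can also be truncated to 512, 256, 128, or 64 dimensions at load time via truncate_dim, trading a small amount of retrieval quality (see Evaluation below) for smaller indexes. A hypothetical retrieval sketch (the corpus snippets are illustrative):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Snorkeler/BGE-Finetuned-FinBench", truncate_dim=256)

query = "What is the FY2018 capital expenditure amount (in USD millions) for 3M?"
corpus = [
    "Purchases of property, plant and equipment (PP&E) (1,577) (1,373) (1,420)",
    "Net sales $34,229 $35,355 $32,184",
]

q_emb = model.encode([query])            # shape [1, 256]
c_emb = model.encode(corpus)             # shape [2, 256]
scores = model.similarity(q_emb, c_emb)  # cosine similarities, shape [1, 2]
print(corpus[int(scores.argmax())])      # best-matching passage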

Evaluation

Metrics

The retrieval metrics were computed at each of the five Matryoshka truncation dimensions; each table below corresponds to one dimension (matched to the dim_*_cosine_map@100 columns in the Training Logs).

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.8933
cosine_accuracy@3 1.0
cosine_accuracy@5 1.0
cosine_accuracy@10 1.0
cosine_precision@1 0.8933
cosine_precision@3 0.3333
cosine_precision@5 0.2
cosine_precision@10 0.1
cosine_recall@1 0.8933
cosine_recall@3 1.0
cosine_recall@5 1.0
cosine_recall@10 1.0
cosine_ndcg@10 0.9589
cosine_mrr@10 0.9444
cosine_map@100 0.9444

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.8867
cosine_accuracy@3 1.0
cosine_accuracy@5 1.0
cosine_accuracy@10 1.0
cosine_precision@1 0.8867
cosine_precision@3 0.3333
cosine_precision@5 0.2
cosine_precision@10 0.1
cosine_recall@1 0.8867
cosine_recall@3 1.0
cosine_recall@5 1.0
cosine_recall@10 1.0
cosine_ndcg@10 0.9573
cosine_mrr@10 0.9422
cosine_map@100 0.9422

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.9133
cosine_accuracy@3 1.0
cosine_accuracy@5 1.0
cosine_accuracy@10 1.0
cosine_precision@1 0.9133
cosine_precision@3 0.3333
cosine_precision@5 0.2
cosine_precision@10 0.1
cosine_recall@1 0.9133
cosine_recall@3 1.0
cosine_recall@5 1.0
cosine_recall@10 1.0
cosine_ndcg@10 0.9671
cosine_mrr@10 0.9556
cosine_map@100 0.9556

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.9267
cosine_accuracy@3 1.0
cosine_accuracy@5 1.0
cosine_accuracy@10 1.0
cosine_precision@1 0.9267
cosine_precision@3 0.3333
cosine_precision@5 0.2
cosine_precision@10 0.1
cosine_recall@1 0.9267
cosine_recall@3 1.0
cosine_recall@5 1.0
cosine_recall@10 1.0
cosine_ndcg@10 0.9721
cosine_mrr@10 0.9622
cosine_map@100 0.9622

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.94
cosine_accuracy@3 1.0
cosine_accuracy@5 1.0
cosine_accuracy@10 1.0
cosine_precision@1 0.94
cosine_precision@3 0.3333
cosine_precision@5 0.2
cosine_precision@10 0.1
cosine_recall@1 0.94
cosine_recall@3 1.0
cosine_recall@5 1.0
cosine_recall@10 1.0
cosine_ndcg@10 0.977
cosine_mrr@10 0.9689
cosine_map@100 0.9689
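
Each evaluation query has exactly one relevant context, which explains the fixed precision values above: recall@k equals accuracy@k, and precision@k is accuracy@k divided by k (e.g. precision@10 = 1.0 / 10 = 0.1 whenever accuracy@10 is 1.0). A small sketch of these identities (hypothetical helper functions, not the evaluation code used for this card):

import numpy as np

def ir_metrics(ranks, k):
    # `ranks` holds the 1-based rank of the single relevant context per query.
    hit = np.asarray(ranks) <= k
    acc = hit.mean()  # accuracy@k
    return {"accuracy@k": acc, "recall@k": acc, "precision@k": acc / k}

def mrr_at_k(ranks, k=10):
    # Mean reciprocal rank, counting only hits within the top k.
    r = np.asarray(ranks, dtype=float)
    return float(np.where(r <= k, 1.0 / r, 0.0).mean())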

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 150 training samples
  • Columns: context and question
  • Approximate statistics based on the first 150 samples:
    • context: string; min 17 tokens, mean 314.29 tokens, max 512 tokens
    • question: string; min 11 tokens, mean 39.67 tokens, max 175 tokens
  • Samples:
    • Sample 1
      context: Table of Contents 3M Company and Subsidiaries Consolidated Statement of Cash Flows Years ended December 31 (Millions) 2018 2017 2016 Cash Flows from Operating Activities Net income including noncontrolling interest $5,363 $4,869 $5,058 Adjustments to reconcile net income including noncontrolling interest to net cash provided by operating activities Depreciation and amortization 1,488 1,544 1,474 Company pension and postretirement contributions (370) (967) (383) Company pension and postretirement expense 410 334 250 Stock-based compensation expense 302 324 298 Gain on sale of businesses (545) (586) (111) Deferred income taxes (57) 107 7 Changes in assets and liabilities Accounts receivable (305) (245) (313) Inventories (509) (387) 57 Accounts payable 408 24 148 Accrued income taxes (current and long-term) 134 967 101 Other net 120 256 76 Net cash provided by (used in) operating activities 6,439 6,240 6,662 Cash Flows from Investing Activities Purchases of property, plant and equipment (PP&E) (1,577) (1,373) (1,420) Proceeds from sale of PP&E and other assets 262 49 58 Acquisitions, net of cash acquired 13 (2,023) (16) Purchases of marketable securities and investments (1,828) (2,152) (1,410) Proceeds from maturities and sale of marketable securities and investments 2,497 1,354 1,247 Proceeds from sale of businesses, net of cash sold 846 1,065 142 Other net 9 (6) (4) Net cash provided by (used in) investing activities 222 (3,086) (1,403) Cash Flows from Financing Activities Change in short-term debt net (284) 578 (797) Repayment of debt (maturities greater than 90 days) (1,034) (962) (992) Proceeds from debt (maturities greater than 90 days) 2,251 1,987 2,832 Purchases of treasury stock (4,870) (2,068) (3,753) Proceeds from issuance of treasury stock pursuant to stock option and benefit plans 485 734 804 Dividends paid to shareholders (3,193) (2,803) (2,678) Other net (56) (121) (42) Net cash provided by (used in) financing activities (6,701) (2,655) (4,626) Effect of exchange rate changes on cash and cash equivalents (160) 156 (33) Net increase (decrease) in cash and cash equivalents (200) 655 600 Cash and cash equivalents at beginning of year 3,053 2,398 1,798 Cash and cash equivalents at end of period $2,853 $3,053 $2,398 The accompanying Notes to Consolidated Financial Statements are an integral part of this statement. 60
      question: What is the FY2018 capital expenditure amount (in USD millions) for 3M? Give a response to the question by relying on the details shown in the cash flow statement.
    • Sample 2
      context: Table of Contents 3M Company and Subsidiaries Consolidated Balance Sheet At December 31 December 31, December 31, (Dollars in millions, except per share amount) 2018 2017 Assets Current assets Cash and cash equivalents $2,853 $3,053 Marketable securities current 380 1,076 Accounts receivable net of allowances of $95 and $103 5,020 4,911 Inventories Finished goods 2,120 1,915 Work in process 1,292 1,218 Raw materials and supplies 954 901 Total inventories 4,366 4,034 Prepaids 741 937 Other current assets 349 266 Total current assets 13,709 14,277 Property, plant and equipment 24,873 24,914 Less: Accumulated depreciation (16,135) (16,048) Property, plant and equipment net 8,738 8,866 Goodwill 10,051 10,513 Intangible assets net 2,657 2,936 Other assets 1,345 1,395 Total assets $36,500 $37,987 Liabilities Current liabilities Short-term borrowings and current portion of long-term debt $1,211 $1,853 Accounts payable 2,266 1,945 Accrued payroll 749 870 Accrued income taxes 243 310 Other current liabilities 2,775 2,709 Total current liabilities 7,244 7,687 Long-term debt 13,411 12,096 Pension and postretirement benefits 2,987 3,620 Other liabilities 3,010 2,962 Total liabilities $26,652 $26,365 Commitments and contingencies (Note 16) Equity 3M Company shareholders equity: Common stock par value, $.01 par value $ 9 $ 9 Shares outstanding - 2018: 576,575,168 Shares outstanding - 2017: 594,884,237 Additional paid-in capital 5,643 5,352 Retained earnings 40,636 39,115 Treasury stock (29,626) (25,887) Accumulated other comprehensive income (loss) (6,866) (7,026) Total 3M Company shareholders equity 9,796 11,563 Noncontrolling interest 52 59 Total equity $9,848 $11,622 Total liabilities and equity $36,500 $37,987 The accompanying Notes to Consolidated Financial Statements are an integral part of this statement. 58
      question: Assume that you are a public equities analyst. Answer the following question by primarily using information that is shown in the balance sheet: what is the year end FY2018 net PPNE for 3M? Answer in USD billions.
    • Sample 3
      context: 3M Company and Subsidiaries Consolidated Statement of Income Years ended December 31 (Millions, except per share amounts) 2022 2021 2020 Net sales $34,229 $35,355 $32,184
      question: Is 3M a capital-intensive business based on FY2022 data?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
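
A minimal sketch of how this loss configuration is typically assembled in sentence-transformers (values taken from the JSON above):

from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
inner = losses.MultipleNegativesRankingLoss(model)  # in-batch negatives ranking loss
loss = losses.MatryoshkaLoss(
    model,
    inner,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],
    n_dims_per_step=-1,  # apply the loss at every dimension on every step
)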
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 50
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • fp16: True
  • tf32: False
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
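
Note that the effective batch size is 32 × 16 = 512, larger than the 150-sample training set, so there are only one or two optimizer steps per epoch, consistent with the step counts in the Training Logs below. A sketch of how these settings map onto the v3 trainer API (the file name and output directory are hypothetical, and `model` and `loss` are the objects from the loss sketch above):

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.training_args import BatchSamplers

train_ds = load_dataset("json", data_files="finbench_train.json", split="train")

args = SentenceTransformerTrainingArguments(
    output_dir="bge-finbench",                  # hypothetical
    num_train_epochs=50,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="epoch",
    save_strategy="epoch",                      # must match eval_strategy for load_best_model_at_end
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # no repeated texts within a batch
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    loss=loss,
    # an eval_dataset or evaluator is also needed for the epoch-wise evaluation
)
trainer.train()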

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 50
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: False
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_768_cosine_map@100 dim_512_cosine_map@100 dim_256_cosine_map@100 dim_128_cosine_map@100 dim_64_cosine_map@100
0 0 - 0.4797 0.4762 0.4373 0.3948 0.2870
1.0 1 - 0.4796 0.4762 0.4374 0.3946 0.2869
2.0 2 - 0.5128 0.4990 0.4817 0.4673 0.3554
3.0 4 - 0.5387 0.5180 0.5362 0.5217 0.4156
1.0 1 - 0.5387 0.5180 0.5362 0.5217 0.4156
2.0 2 - 0.5509 0.5339 0.5399 0.5288 0.4394
3.0 4 - 0.5921 0.5763 0.5743 0.5709 0.5007
4.0 5 - 0.6112 0.6097 0.6068 0.6031 0.5435
5.0 6 - 0.6244 0.6383 0.6379 0.6478 0.5920
6.0 8 - 0.6763 0.6857 0.7064 0.7134 0.6909
7.0 9 - 0.6853 0.7161 0.7264 0.7463 0.7321
8.0 10 2.0247 - - - - -
8.2 11 - 0.7454 0.7757 0.7821 0.8181 0.7850
9.0 12 - 0.7661 0.7926 0.8071 0.8261 0.8165
10.0 13 - 0.7783 0.8061 0.8221 0.8396 0.8382
11.0 15 - 0.8221 0.8217 0.8600 0.8834 0.8903
12.0 16 - 0.8301 0.8393 0.8756 0.8908 0.9143
13.0 17 - 0.8454 0.8562 0.8943 0.9167 0.9261
14.0 19 - 0.8697 0.8861 0.9167 0.9311 0.9417
15.0 20 0.72 0.8808 0.8939 0.9217 0.9344 0.9522
16.2 22 - 0.9061 0.9 0.9439 0.9411 0.9556
17.0 23 - 0.9061 0.9061 0.9439 0.9444 0.9556
18.0 24 - 0.9111 0.9117 0.9444 0.9444 0.9589
19.0 26 - 0.9256 0.92 0.9478 0.9522 0.9589
20.0 27 - 0.9256 0.9233 0.9478 0.9489 0.9611
21.0 28 - 0.9289 0.9311 0.9478 0.9556 0.9644
22.0 30 0.3518 0.94 0.9344 0.9511 0.9556 0.9656
23.0 31 - 0.9411 0.9356 0.9544 0.9556 0.9656
24.2 33 - 0.9411 0.9389 0.9544 0.9589 0.9689
25.0 34 - 0.9378 0.9389 0.9556 0.9589 0.9689
26.0 35 - 0.9378 0.9389 0.9556 0.9589 0.9689
27.0 37 - 0.9444 0.9389 0.9556 0.9589 0.9689
28.0 38 - 0.9444 0.9389 0.9589 0.9589 0.9689
29.0 39 - 0.9444 0.9389 0.9589 0.9589 0.9689
29.4 40 0.2456 - - - - -
30.0 41 - 0.9444 0.9422 0.9589 0.9589 0.9689
31.0 42 - 0.9444 0.9422 0.9589 0.9622 0.9689
32.2 44 - 0.9444 0.9422 0.9556 0.9622 0.9689
33.0 45 - 0.9444 0.9422 0.9556 0.9622 0.9689
34.0 46 - 0.9444 0.9422 0.9556 0.9622 0.9689
35.0 48 - 0.9444 0.9422 0.9556 0.9622 0.9689
36.0 49 - 0.9444 0.9422 0.9556 0.9622 0.9689
37.0 50 0.2123 0.9444 0.9422 0.9556 0.9622 0.9689
  • The saved checkpoint's metrics match the Evaluation section above (cosine_map@100 of 0.9444, 0.9422, 0.9556, 0.9622, and 0.9689 at dims 768, 512, 256, 128, and 64, respectively).

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.2.1
  • Transformers: 4.41.2
  • PyTorch: 2.1.2+cu121
  • Accelerate: 1.1.1
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}