BGE Base - FinBench Finetuned
This is a sentence-transformers model fine-tuned from BAAI/bge-base-en-v1.5 on a JSON dataset of 150 financial question–context pairs (see Training Details). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-base-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: json (150 question–context pairs; see Training Details)
- Language: en
- License: apache-2.0
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
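The stack is a BERT encoder followed by CLS-token pooling and L2 normalization, so cosine similarity between outputs reduces to a dot product. For intuition, here is a rough hand-rolled equivalent of those three modules using transformers directly; this is illustrative only (in practice, use the SentenceTransformer snippet in the Usage section below):

import torch
from transformers import AutoTokenizer, AutoModel

# Illustrative re-implementation of the three modules above:
# Transformer (BertModel) -> CLS pooling -> Normalize.
# The base BAAI/bge-base-en-v1.5 checkpoint is used here; the
# fine-tuned repo should load the same way.
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-base-en-v1.5")
model = AutoModel.from_pretrained("BAAI/bge-base-en-v1.5")

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state       # (batch, seq_len, 768)
    cls = hidden[:, 0]                                  # pooling_mode_cls_token=True
    return torch.nn.functional.normalize(cls, p=2, dim=1)  # Normalize()

emb = embed(["Is 3M a capital-intensive business based on FY2022 data?"])
print(emb.shape)  # torch.Size([1, 768])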
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer

# Download the model from the Hugging Face Hub
model = SentenceTransformer("Snorkeler/BGE-Finetuned-FinBench")

# Run inference on a mix of filing excerpts and questions
sentences = [
    '3M Company and SubsidiariesConsolidated Statement of IncomeYears ended December 31(Millions, except per share amounts)202220212020Net sales$34,229 $35,355 $32,184',
    'Is 3M a capital-intensive business based on FY2022 data?',
    'Among all of the derivative instruments that Verizon used to manage the exposure to fluctuations of foreign currencies exchange rates or interest rates, which one had the highest notional value in FY 2021?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 768)

# Pairwise cosine similarities between the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # torch.Size([3, 3])
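Since the model was trained on (question, context) pairs, a typical use is ranking candidate filing passages for a question. A minimal retrieval sketch; the corpus and query below are made-up examples:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Snorkeler/BGE-Finetuned-FinBench")

# Hypothetical mini-corpus of filing excerpts
corpus = [
    "Net sales were $34,229 million in 2022.",
    "Purchases of property, plant and equipment (PP&E) were $1,577 million.",
]
query = "What was 3M's FY2018 capital expenditure?"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Cosine similarity (embeddings are already L2-normalized)
scores = model.similarity(query_embedding, corpus_embeddings)  # shape (1, len(corpus))
best = scores.argmax().item()
print(corpus[best], scores[0, best].item())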
Evaluation
Metrics
Information Retrieval

The model was evaluated at each of the five Matryoshka embedding dimensions; the columns below correspond to the dim_768 … dim_64 evaluators whose cosine_map@100 values appear in the training logs further down.

| Metric | dim 768 | dim 512 | dim 256 | dim 128 | dim 64 |
|:--------------------|-------:|-------:|-------:|-------:|-------:|
| cosine_accuracy@1   | 0.8933 | 0.8867 | 0.9133 | 0.9267 | 0.94   |
| cosine_accuracy@3   | 1.0    | 1.0    | 1.0    | 1.0    | 1.0    |
| cosine_accuracy@5   | 1.0    | 1.0    | 1.0    | 1.0    | 1.0    |
| cosine_accuracy@10  | 1.0    | 1.0    | 1.0    | 1.0    | 1.0    |
| cosine_precision@1  | 0.8933 | 0.8867 | 0.9133 | 0.9267 | 0.94   |
| cosine_precision@3  | 0.3333 | 0.3333 | 0.3333 | 0.3333 | 0.3333 |
| cosine_precision@5  | 0.2    | 0.2    | 0.2    | 0.2    | 0.2    |
| cosine_precision@10 | 0.1    | 0.1    | 0.1    | 0.1    | 0.1    |
| cosine_recall@1     | 0.8933 | 0.8867 | 0.9133 | 0.9267 | 0.94   |
| cosine_recall@3     | 1.0    | 1.0    | 1.0    | 1.0    | 1.0    |
| cosine_recall@5     | 1.0    | 1.0    | 1.0    | 1.0    | 1.0    |
| cosine_recall@10    | 1.0    | 1.0    | 1.0    | 1.0    | 1.0    |
| cosine_ndcg@10      | 0.9589 | 0.9573 | 0.9671 | 0.9721 | 0.977  |
| cosine_mrr@10       | 0.9444 | 0.9422 | 0.9556 | 0.9622 | 0.9689 |
| cosine_map@100      | 0.9444 | 0.9422 | 0.9556 | 0.9622 | 0.9689 |
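Metrics like these can be regenerated with sentence-transformers' InformationRetrievalEvaluator, truncating embeddings to each Matryoshka dimension. A minimal sketch with toy stand-in data; the queries, corpus, and relevant_docs below are illustrative, since the actual held-out split is not published in this card:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("Snorkeler/BGE-Finetuned-FinBench")

# Illustrative evaluation data: id -> text, and query id -> relevant corpus ids
queries = {"q1": "Is 3M a capital-intensive business based on FY2022 data?"}
corpus = {"c1": "3M Company and Subsidiaries Consolidated Statement of Income ..."}
relevant_docs = {"q1": {"c1"}}

for dim in [768, 512, 256, 128, 64]:
    evaluator = InformationRetrievalEvaluator(
        queries, corpus, relevant_docs,
        name=f"dim_{dim}",
        truncate_dim=dim,  # evaluate the Matryoshka-truncated embeddings
    )
    results = evaluator(model)
    print(dim, results[f"dim_{dim}_cosine_map@100"])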
Training Details
Training Dataset
json
- Dataset: json
- Size: 150 training samples
- Columns: context and question
- Approximate statistics based on the first 150 samples:

| | context | question |
|:--------|:--------|:---------|
| type | string | string |
| details | min: 17 tokens, mean: 314.29 tokens, max: 512 tokens | min: 11 tokens, mean: 39.67 tokens, max: 175 tokens |
- Samples:

| context | question |
|:--------|:---------|
| Table of Contents 3M Company and SubsidiariesConsolidated Statement of Cash Flow sYears ended December 31 (Millions) 2018 2017 2016 Cash Flows from Operating Activities Net income including noncontrolling interest $5,363 $4,869 $5,058 Adjustments to reconcile net income including noncontrolling interest to net cashprovided by operating activities Depreciation and amortization 1,488 1,544 1,474 Company pension and postretirement contributions (370) (967) (383) Company pension and postretirement expense 410 334 250 Stock-based compensation expense 302 324 298 Gain on sale of businesses (545) (586) (111) Deferred income taxes (57) 107 7 Changes in assets and liabilities Accounts receivable (305) (245) (313) Inventories (509) (387) 57 Accounts payable 408 24 148 Accrued income taxes (current and long-term) 134 967 101 Other net 120 256 76 Net cash provided by (used in) operating activities 6,439 6,240 6,662 Cash Flows from Investing Activities Purchases of property, plant and equipment (PP&E) (1,577) (1,373) (1,420) Proceeds from sale of PP&E and other assets 262 49 58 Acquisitions, net of cash acquired 13 (2,023) (16) Purchases of marketable securities and investments (1,828) (2,152) (1,410) Proceeds from maturities and sale of marketable securities and investments 2,497 1,354 1,247 Proceeds from sale of businesses, net of cash sold 846 1,065 142 Other net 9 (6) (4) Net cash provided by (used in) investing activities 222 (3,086) (1,403) Cash Flows from Financing Activities Change in short-term debt net (284) 578 (797) Repayment of debt (maturities greater than 90 days) (1,034) (962) (992) Proceeds from debt (maturities greater than 90 days) 2,251 1,987 2,832 Purchases of treasury stock (4,870) (2,068) (3,753) Proceeds from issuance of treasury stock pursuant to stock option and benefit plans 485 734 804 Dividends paid to shareholders (3,193) (2,803) (2,678) Other net (56) (121) (42) Net cash provided by (used in) financing activities (6,701) (2,655) (4,626) Effect of exchange rate changes on cash and cash equivalents (160) 156 (33) Net increase (decrease) in cash and cash equivalents (200) 655 600 Cash and cash equivalents at beginning of year 3,053 2,398 1,798 Cash and cash equivalents at end of period $2,853 $3,053 $2,398 The accompanying Notes to Consolidated Financial Statements are an integral part of this statement. 60 | What is the FY2018 capital expenditure amount (in USD millions) for 3M? Give a response to the question by relying on the details shown in the cash flow statement. |
| Table of Contents 3M Company and SubsidiariesConsolidated Balance Shee tAt December 31 December 31, December 31, (Dollars in millions, except per share amount) 2018 2017 Assets Current assets Cash and cash equivalents $2,853 $3,053 Marketable securities current 380 1,076 Accounts receivable net of allowances of $95 and $103 5,020 4,911 Inventories Finished goods 2,120 1,915 Work in process 1,292 1,218 Raw materials and supplies 954 901 Total inventories 4,366 4,034 Prepaids 741 937 Other current assets 349 266 Total current assets 13,709 14,277 Property, plant and equipment 24,873 24,914 Less: Accumulated depreciation (16,135) (16,048) Property, plant and equipment net 8,738 8,866 Goodwill 10,051 10,513 Intangible assets net 2,657 2,936 Other assets 1,345 1,395 Total assets $36,500 $37,987 Liabilities Current liabilities Short-term borrowings and current portion of long-term debt $1,211 $1,853 Accounts payable 2,266 1,945 Accrued payroll 749 870 Accrued income taxes 243 310 Other current liabilities 2,775 2,709 Total current liabilities 7,244 7,687 Long-term debt 13,411 12,096 Pension and postretirement benefits 2,987 3,620 Other liabilities 3,010 2,962 Total liabilities $26,652 $26,365 Commitments and contingencies (Note 16) Equity 3M Company shareholders equity: Common stock par value, $.01 par value $ 9 $ 9 Shares outstanding - 2018: 576,575,168 Shares outstanding - 2017: 594,884,237 Additional paid-in capital 5,643 5,352 Retained earnings 40,636 39,115 Treasury stock (29,626) (25,887) Accumulated other comprehensive income (loss) (6,866) (7,026) Total 3M Company shareholders equity 9,796 11,563 Noncontrolling interest 52 59 Total equity $9,848 $11,622 Total liabilities and equity $36,500 $37,987 The accompanying Notes to Consolidated Financial Statements are an integral part of this statement.58 | Assume that you are a public equities analyst. Answer the following question by primarily using information that is shown in the balance sheet: what is the year end FY2018 net PPNE for 3M? Answer in USD billions. |
| 3M Company and SubsidiariesConsolidated Statement of IncomeYears ended December 31(Millions, except per share amounts)202220212020Net sales$34,229 $35,355 $32,184 | Is 3M a capital-intensive business based on FY2022 data? |
- Loss: MatryoshkaLoss with these parameters:

  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [768, 512, 256, 128, 64],
      "matryoshka_weights": [1, 1, 1, 1, 1],
      "n_dims_per_step": -1
  }
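In code, this amounts to wrapping MultipleNegativesRankingLoss (in-batch negatives over the (question, context) pairs) in MatryoshkaLoss, so the same ranking objective is applied at every truncation dimension. A minimal sketch; the exact training script is not included in this card:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

loss = MatryoshkaLoss(
    model,
    MultipleNegativesRankingLoss(model),  # in-batch negatives ranking objective
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],   # all dimensions weighted equally
    n_dims_per_step=-1,                   # train on every dimension at each step
)

One practical payoff: embeddings from the fine-tuned model can be truncated to any of these dimensions at inference time, e.g. SentenceTransformer("Snorkeler/BGE-Finetuned-FinBench", truncate_dim=256), with the per-dimension retrieval quality shown in the evaluation tables above.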
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: epoch
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 16
- gradient_accumulation_steps: 16
- learning_rate: 2e-05
- num_train_epochs: 50
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- fp16: True
- tf32: False
- load_best_model_at_end: True
- optim: adamw_torch_fused
- batch_sampler: no_duplicates
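These map directly onto SentenceTransformerTrainingArguments from sentence-transformers v3. A hedged sketch of how the run could be reproduced; the dataset file name, output directory, and save strategy are assumptions, since the card only documents the hyperparameters above:

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

# Hypothetical file name; the card only says the data is a 150-sample
# JSON dataset with "context" and "question" columns.
train_dataset = load_dataset("json", data_files="finbench_train.json", split="train")

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
loss = MatryoshkaLoss(
    model,
    MultipleNegativesRankingLoss(model),
    matryoshka_dims=[768, 512, 256, 128, 64],
)

args = SentenceTransformerTrainingArguments(
    output_dir="bge-finbench",  # assumed
    num_train_epochs=50,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="epoch",
    save_strategy="epoch",  # assumed; must match eval_strategy for load_best_model_at_end
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoid duplicate in-batch negatives
)

# An eval_dataset (or evaluator=...) is also required for eval_strategy="epoch";
# it is omitted here because the held-out split is not published.
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()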
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 16
- eval_accumulation_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 50
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: False
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
| Epoch | Step | Training Loss | dim_768_cosine_map@100 | dim_512_cosine_map@100 | dim_256_cosine_map@100 | dim_128_cosine_map@100 | dim_64_cosine_map@100 |
|:------|:-----|:--------------|:-----------------------|:-----------------------|:-----------------------|:-----------------------|:----------------------|
| 0 | 0 | - | 0.4797 | 0.4762 | 0.4373 | 0.3948 | 0.2870 |
| 1.0 | 1 | - | 0.4796 | 0.4762 | 0.4374 | 0.3946 | 0.2869 |
| 2.0 | 2 | - | 0.5128 | 0.4990 | 0.4817 | 0.4673 | 0.3554 |
| 3.0 | 4 | - | 0.5387 | 0.5180 | 0.5362 | 0.5217 | 0.4156 |
| 1.0 | 1 | - | 0.5387 | 0.5180 | 0.5362 | 0.5217 | 0.4156 |
| 2.0 | 2 | - | 0.5509 | 0.5339 | 0.5399 | 0.5288 | 0.4394 |
| 3.0 | 4 | - | 0.5921 | 0.5763 | 0.5743 | 0.5709 | 0.5007 |
| 4.0 | 5 | - | 0.6112 | 0.6097 | 0.6068 | 0.6031 | 0.5435 |
| 5.0 | 6 | - | 0.6244 | 0.6383 | 0.6379 | 0.6478 | 0.5920 |
| 6.0 | 8 | - | 0.6763 | 0.6857 | 0.7064 | 0.7134 | 0.6909 |
| 7.0 | 9 | - | 0.6853 | 0.7161 | 0.7264 | 0.7463 | 0.7321 |
| 8.0 | 10 | 2.0247 | - | - | - | - | - |
| 8.2 | 11 | - | 0.7454 | 0.7757 | 0.7821 | 0.8181 | 0.7850 |
| 9.0 | 12 | - | 0.7661 | 0.7926 | 0.8071 | 0.8261 | 0.8165 |
| 10.0 | 13 | - | 0.7783 | 0.8061 | 0.8221 | 0.8396 | 0.8382 |
| 11.0 | 15 | - | 0.8221 | 0.8217 | 0.8600 | 0.8834 | 0.8903 |
| 12.0 | 16 | - | 0.8301 | 0.8393 | 0.8756 | 0.8908 | 0.9143 |
| 13.0 | 17 | - | 0.8454 | 0.8562 | 0.8943 | 0.9167 | 0.9261 |
| 14.0 | 19 | - | 0.8697 | 0.8861 | 0.9167 | 0.9311 | 0.9417 |
| 15.0 | 20 | 0.72 | 0.8808 | 0.8939 | 0.9217 | 0.9344 | 0.9522 |
| 16.2 | 22 | - | 0.9061 | 0.9 | 0.9439 | 0.9411 | 0.9556 |
| 17.0 | 23 | - | 0.9061 | 0.9061 | 0.9439 | 0.9444 | 0.9556 |
| 18.0 | 24 | - | 0.9111 | 0.9117 | 0.9444 | 0.9444 | 0.9589 |
| 19.0 | 26 | - | 0.9256 | 0.92 | 0.9478 | 0.9522 | 0.9589 |
| 20.0 | 27 | - | 0.9256 | 0.9233 | 0.9478 | 0.9489 | 0.9611 |
| 21.0 | 28 | - | 0.9289 | 0.9311 | 0.9478 | 0.9556 | 0.9644 |
| 22.0 | 30 | 0.3518 | 0.94 | 0.9344 | 0.9511 | 0.9556 | 0.9656 |
| 23.0 | 31 | - | 0.9411 | 0.9356 | 0.9544 | 0.9556 | 0.9656 |
| 24.2 | 33 | - | 0.9411 | 0.9389 | 0.9544 | 0.9589 | 0.9689 |
| 25.0 | 34 | - | 0.9378 | 0.9389 | 0.9556 | 0.9589 | 0.9689 |
| 26.0 | 35 | - | 0.9378 | 0.9389 | 0.9556 | 0.9589 | 0.9689 |
| 27.0 | 37 | - | 0.9444 | 0.9389 | 0.9556 | 0.9589 | 0.9689 |
| 28.0 | 38 | - | 0.9444 | 0.9389 | 0.9589 | 0.9589 | 0.9689 |
| 29.0 | 39 | - | 0.9444 | 0.9389 | 0.9589 | 0.9589 | 0.9689 |
| 29.4 | 40 | 0.2456 | - | - | - | - | - |
| 30.0 | 41 | - | 0.9444 | 0.9422 | 0.9589 | 0.9589 | 0.9689 |
| 31.0 | 42 | - | 0.9444 | 0.9422 | 0.9589 | 0.9622 | 0.9689 |
| 32.2 | 44 | - | 0.9444 | 0.9422 | 0.9556 | 0.9622 | 0.9689 |
| 33.0 | 45 | - | 0.9444 | 0.9422 | 0.9556 | 0.9622 | 0.9689 |
| 34.0 | 46 | - | 0.9444 | 0.9422 | 0.9556 | 0.9622 | 0.9689 |
| 35.0 | 48 | - | 0.9444 | 0.9422 | 0.9556 | 0.9622 | 0.9689 |
| 36.0 | 49 | - | 0.9444 | 0.9422 | 0.9556 | 0.9622 | 0.9689 |
| **37.0** | **50** | **0.2123** | **0.9444** | **0.9422** | **0.9556** | **0.9622** | **0.9689** |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.2.1
- Transformers: 4.41.2
- PyTorch: 2.1.2+cu121
- Accelerate: 1.1.1
- Datasets: 2.19.1
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MatryoshkaLoss
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}