
bge-base-en-v1.5-klej-dyk

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5 on Polish question-passage pairs (see Training Dataset below). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
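Modules (1) and (2) above mean that the sentence embedding is the final hidden state of the first ([CLS]) token, scaled to unit L2 length. The NumPy sketch below mirrors those two steps on random placeholder hidden states (it does not load the real BertModel) and shows why cosine similarity then reduces to a plain dot product.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Mimic module (1) Pooling with pooling_mode_cls_token=True and module (2) Normalize.

    token_embeddings: shape (batch, seq_len, 768), standing in for the last hidden
    states that the BertModel in module (0) would produce.
    """
    # (1) CLS pooling: keep only the first token's vector of each sequence.
    cls = token_embeddings[:, 0, :]
    # (2) Normalize: divide each vector by its L2 norm (guarding against zero norms).
    norms = np.linalg.norm(cls, axis=1, keepdims=True)
    return cls / np.clip(norms, 1e-12, None)

# Placeholder hidden states, NOT real model output.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(3, 10, 768))
emb = cls_pool_and_normalize(hidden)

# With unit-length rows, the dot-product matrix is the cosine-similarity matrix.
cos = emb @ emb.T
print(emb.shape)                        # (3, 768)
print(np.allclose(np.diag(cos), 1.0))   # True
```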

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ve88ifz2/bge-base-en-v1.5-klej-dyk-v0.2")
# Run inference
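# Example inputs are Polish; English glosses are given in the comments
sentences = [
    'Herkules na rozstajach',  # "Hercules at the Crossroads"
    'jak zinterpretować wymowę obrazu Herkules na rozstajach?',  # "how should the message of the painting Hercules at the Crossroads be interpreted?"
    'Dowódcą grupy był Wiaczesław Razumowicz ps. „Chmara”.',  # "The group's commander was Wiaczesław Razumowicz, nom de guerre 'Chmara'."
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])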
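Because the model was trained with MatryoshkaLoss over the dimensions 768, 512, 256, 128 and 64 (see Training Details), its embeddings can probably be cut down to one of those sizes for cheaper storage and search, at some cost in retrieval quality (see the Evaluation section). Recent Sentence Transformers releases expose this through a `truncate_dim` argument to `SentenceTransformer`. The NumPy sketch below shows the same idea by hand on placeholder vectors rather than real `model.encode` output. Re-normalizing after truncation is a choice made here so that dot products stay equal to cosine similarities; the helper names are hypothetical.

```python
import numpy as np

# Dimensions listed in the card's MatryoshkaLoss configuration.
MATRYOSHKA_DIMS = [768, 512, 256, 128, 64]

def truncate_embeddings(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Hypothetical helper: keep the first `dim` coordinates, then re-normalize."""
    if dim not in MATRYOSHKA_DIMS:
        raise ValueError(f"dim must be one of {MATRYOSHKA_DIMS}")
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

def cosine_scores(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Rows are unit-normalized, so the dot product equals cosine similarity."""
    return query @ docs.T

# Placeholder unit vectors standing in for model.encode(...) output.
rng = np.random.default_rng(42)
full = rng.normal(size=(4, 768))
full /= np.linalg.norm(full, axis=1, keepdims=True)

small = truncate_embeddings(full, 256)
scores = cosine_scores(small[:1], small[1:])  # query = row 0, documents = rows 1..3
ranking = np.argsort(-scores[0])              # document indices, best match first
```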

Evaluation

Metrics

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.1731
cosine_accuracy@3 0.4615
cosine_accuracy@5 0.6226
cosine_accuracy@10 0.7356
cosine_precision@1 0.1731
cosine_precision@3 0.1538
cosine_precision@5 0.1245
cosine_precision@10 0.0736
cosine_recall@1 0.1731
cosine_recall@3 0.4615
cosine_recall@5 0.6226
cosine_recall@10 0.7356
cosine_ndcg@10 0.4434
cosine_mrr@10 0.3505
cosine_map@100 0.3574

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.1683
cosine_accuracy@3 0.4519
cosine_accuracy@5 0.601
cosine_accuracy@10 0.7091
cosine_precision@1 0.1683
cosine_precision@3 0.1506
cosine_precision@5 0.1202
cosine_precision@10 0.0709
cosine_recall@1 0.1683
cosine_recall@3 0.4519
cosine_recall@5 0.601
cosine_recall@10 0.7091
cosine_ndcg@10 0.4296
cosine_mrr@10 0.3406
cosine_map@100 0.3485

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.1923
cosine_accuracy@3 0.4543
cosine_accuracy@5 0.5913
cosine_accuracy@10 0.6899
cosine_precision@1 0.1923
cosine_precision@3 0.1514
cosine_precision@5 0.1183
cosine_precision@10 0.069
cosine_recall@1 0.1923
cosine_recall@3 0.4543
cosine_recall@5 0.5913
cosine_recall@10 0.6899
cosine_ndcg@10 0.4311
cosine_mrr@10 0.3488
cosine_map@100 0.3561

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.1635
cosine_accuracy@3 0.4159
cosine_accuracy@5 0.5168
cosine_accuracy@10 0.5986
cosine_precision@1 0.1635
cosine_precision@3 0.1386
cosine_precision@5 0.1034
cosine_precision@10 0.0599
cosine_recall@1 0.1635
cosine_recall@3 0.4159
cosine_recall@5 0.5168
cosine_recall@10 0.5986
cosine_ndcg@10 0.3764
cosine_mrr@10 0.3052
cosine_map@100 0.3152

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.1659
cosine_accuracy@3 0.351
cosine_accuracy@5 0.4399
cosine_accuracy@10 0.5288
cosine_precision@1 0.1659
cosine_precision@3 0.117
cosine_precision@5 0.088
cosine_precision@10 0.0529
cosine_recall@1 0.1659
cosine_recall@3 0.351
cosine_recall@5 0.4399
cosine_recall@10 0.5288
cosine_ndcg@10 0.3382
cosine_mrr@10 0.278
cosine_map@100 0.2877
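In every table above, accuracy@k equals recall@k and precision@k equals accuracy@k divided by k (for example 0.4615 / 3 ≈ 0.1538). That pattern is what one would expect if each query has exactly one relevant passage. Under that assumption every metric becomes a function of the rank at which the relevant passage is retrieved, as in the sketch below; the function name and the toy ranks are hypothetical, and this is not the evaluator's own code.

```python
import math

def single_relevant_metrics(ranks: list[int]) -> dict[str, float]:
    """Retrieval metrics assuming each query has exactly one relevant passage.

    ranks: 1-based rank of the relevant passage for each query
           (use a large number if it was not retrieved at all).
    """
    n = len(ranks)
    out: dict[str, float] = {}
    for k in (1, 3, 5, 10):
        hit = sum(r <= k for r in ranks) / n
        out[f"accuracy@{k}"] = hit       # query counts if its passage is in the top k
        out[f"recall@{k}"] = hit         # one relevant item: recall is 0 or 1 per query
        out[f"precision@{k}"] = hit / k  # at most 1 of the k results can be relevant
    out["mrr@10"] = sum(1 / r for r in ranks if r <= 10) / n
    # With one relevant item the ideal DCG is 1, so NDCG is just the discounted gain.
    out["ndcg@10"] = sum(1 / math.log2(r + 1) for r in ranks if r <= 10) / n
    # Average precision with one relevant item is the precision at its rank, i.e. 1/rank.
    out["map@100"] = sum(1 / r for r in ranks if r <= 100) / n
    return out

# Toy example: four hypothetical queries whose relevant passages rank 1, 2, 4 and 50.
metrics = single_relevant_metrics([1, 2, 4, 50])
```

This also explains why map@100 is slightly above mrr@10 in each table: relevant passages ranked 11 to 100 still contribute to MAP@100 but not to MRR@10.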

Training Details

Training Dataset

Unnamed Dataset

  • Size: 3,738 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    • positive: string; min 6 tokens, mean 90.01 tokens, max 512 tokens
    • anchor: string; min 10 tokens, mean 30.82 tokens, max 76 tokens
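  • Samples (Polish; English translations in brackets):
    • positive: Londyńska premiera w Ambassadors Theatre na londyńskim West Endzie miała miejsce 25 listopada 1952 roku, a przedstawione grane jest do dziś (od 1974 r.) w sąsiednim St Martin's Theatre. W Polsce była wystawiana m.in. w Teatrze Nowym w Zabrzu. [The London premiere at the Ambassadors Theatre in London's West End took place on 25 November 1952, and the production is still being performed today (since 1974) at the neighbouring St Martin's Theatre. In Poland it has been staged at, among others, the Teatr Nowy in Zabrze.]
      anchor: w którym londyńskim muzeum wystawiana była instalacja My Bed? [In which London museum was the installation My Bed exhibited?]
    • positive: Theridion grallator osiąga długość 5 mm. U niektórych postaci na żółtym odwłoku występuje wzór przypominający uśmiechniętą lub śmiejącą się twarz klowna. [Theridion grallator reaches a length of 5 mm. In some forms, the yellow abdomen bears a pattern resembling a clown's smiling or laughing face.]
      anchor: które pająki noszą na grzbiecie wzór przypominający uśmiechniętego klauna? [Which spiders carry a pattern resembling a smiling clown on their back?]
    • positive: W 1998 w wyniku sporów o wytyczenie granicy między dwoma państwami wybuchła wojna erytrejsko-etiopska. Zakończyła się porozumieniem zawartym w Algierze 12 grudnia 2000. Od tego czasu strefa graniczna jest patrolowana przez siły pokojowe ONZ. [In 1998, disputes over the demarcation of the border between the two states led to the outbreak of the Eritrean-Ethiopian War. It ended with an agreement signed in Algiers on 12 December 2000. Since then, the border zone has been patrolled by UN peacekeeping forces.]
      anchor: jakie były skutki wojny erytrejsko-etiopskiej? [What were the consequences of the Eritrean-Ethiopian War?]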
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
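The configuration above wraps MultipleNegativesRankingLoss in MatryoshkaLoss. For each listed dimension d, the embeddings are cut to their first d coordinates, the in-batch ranking loss is computed (anchor i should score highest against positive i, with the other positives in the batch acting as negatives), and the weighted terms are summed; every weight is 1 here. The NumPy sketch below illustrates that computation on random placeholder vectors and is not the library's implementation. The similarity scale of 20.0 is an assumption: it is the Sentence Transformers default for MultipleNegativesRankingLoss and does not appear in the config. Note also that the negatives come from the per-device batch of 16 pairs; gradient accumulation combines gradients across micro-batches but does not enlarge that pool.

```python
import numpy as np

def mnrl(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch negatives ranking loss: cross-entropy with positive i as the target for anchor i."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)  # (batch, batch) scaled cosine similarities
    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The true pairs sit on the diagonal.
    return float(-np.mean(np.diag(log_probs)))

def matryoshka_mnrl(anchors, positives, dims=(768, 512, 256, 128, 64), weights=(1, 1, 1, 1, 1)):
    """Weighted sum of MNRL computed on the first d coordinates, one term per dimension."""
    return sum(w * mnrl(anchors[:, :d], positives[:, :d]) for d, w in zip(dims, weights))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(16, 768))                    # batch of 16, as in per_device_train_batch_size
positives = anchors + 0.1 * rng.normal(size=(16, 768))  # hypothetical well-matched positives
loss = matryoshka_mnrl(anchors, positives)
```

Well-matched pairs give a loss near zero, while unrelated pairs give roughly log(16) per dimension term.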
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 10
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
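These values fix the optimizer schedule. The arithmetic below assumes a single device and that the Hugging Face Trainer counts floor(micro-batches per epoch / gradient_accumulation_steps) optimizer updates per epoch. Under those assumptions it reproduces the 140 steps and the per-step epoch increment of about 0.0684 seen in the Training Logs, which appears to be why the log ends at epoch 9.5726 rather than 10. Each update combines 16 micro-batches of 16 samples (256 samples), while batch_sampler: no_duplicates keeps repeated texts out of the same micro-batch.

```python
import math

num_samples = 3738      # "Size: 3,738 training samples"
per_device_batch = 16   # per_device_train_batch_size
grad_accum = 16         # gradient_accumulation_steps
num_epochs = 10         # num_train_epochs
warmup_ratio = 0.1      # warmup_ratio
num_devices = 1         # assumption: single-GPU training

effective_batch = per_device_batch * grad_accum * num_devices  # samples per optimizer update
# dataloader_drop_last=False, so the final partial micro-batch is kept.
batches_per_epoch = math.ceil(num_samples / (per_device_batch * num_devices))
steps_per_epoch = batches_per_epoch // grad_accum              # optimizer updates per epoch
total_steps = steps_per_epoch * num_epochs                     # max training steps
warmup_steps = math.ceil(total_steps * warmup_ratio)           # linear warmup before cosine decay
epoch_per_step = grad_accum / batches_per_epoch                # epoch fraction per update
```

Here effective_batch is 256, batches_per_epoch is 234, steps_per_epoch is 14, total_steps is 140, and warmup_steps is 14.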

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.0684 1 7.2706 - - - - -
0.1368 2 8.2776 - - - - -
0.2051 3 7.1399 - - - - -
0.2735 4 6.6905 - - - - -
0.3419 5 6.735 - - - - -
0.4103 6 7.0537 - - - - -
0.4786 7 6.871 - - - - -
0.5470 8 6.7277 - - - - -
0.6154 9 5.9853 - - - - -
0.6838 10 6.0518 - - - - -
0.7521 11 5.8291 - - - - -
0.8205 12 5.0064 - - - - -
0.8889 13 4.8572 - - - - -
0.9573 14 5.1899 0.2812 0.3335 0.3486 0.2115 0.3639
1.0256 15 4.2996 - - - - -
1.0940 16 4.1475 - - - - -
1.1624 17 4.6174 - - - - -
1.2308 18 4.394 - - - - -
1.2991 19 4.0255 - - - - -
1.3675 20 3.9722 - - - - -
1.4359 21 3.9509 - - - - -
1.5043 22 3.7674 - - - - -
1.5726 23 3.7572 - - - - -
1.6410 24 3.9463 - - - - -
1.7094 25 3.7151 - - - - -
1.7778 26 3.7771 - - - - -
1.8462 27 3.5228 - - - - -
1.9145 28 2.7906 - - - - -
1.9829 29 3.4555 0.3164 0.3529 0.3641 0.2636 0.3681
2.0513 30 2.737 - - - - -
2.1197 31 3.1976 - - - - -
2.1880 32 3.1363 - - - - -
2.2564 33 2.9706 - - - - -
2.3248 34 2.9629 - - - - -
2.3932 35 2.7226 - - - - -
2.4615 36 2.4378 - - - - -
2.5299 37 2.7201 - - - - -
2.5983 38 2.6802 - - - - -
2.6667 39 3.1613 - - - - -
2.7350 40 2.9344 - - - - -
2.8034 41 2.5254 - - - - -
2.8718 42 2.5617 - - - - -
2.9402 43 2.459 0.3197 0.3571 0.3640 0.2739 0.3733
3.0085 44 2.3785 - - - - -
3.0769 45 1.9408 - - - - -
3.1453 46 2.7095 - - - - -
3.2137 47 2.4774 - - - - -
3.2821 48 2.2178 - - - - -
3.3504 49 2.0884 - - - - -
3.4188 50 2.1044 - - - - -
3.4872 51 2.1504 - - - - -
3.5556 52 2.1177 - - - - -
3.6239 53 2.2283 - - - - -
3.6923 54 2.3964 - - - - -
3.7607 55 2.0972 - - - - -
3.8291 56 2.0961 - - - - -
3.8974 57 1.783 - - - - -
3.9658 58 2.1031 0.3246 0.3533 0.3603 0.2829 0.3687
4.0342 59 1.6699 - - - - -
4.1026 60 1.6675 - - - - -
4.1709 61 2.1672 - - - - -
4.2393 62 1.8881 - - - - -
4.3077 63 1.701 - - - - -
4.3761 64 1.9154 - - - - -
4.4444 65 1.4549 - - - - -
4.5128 66 1.5444 - - - - -
4.5812 67 1.8352 - - - - -
4.6496 68 1.7908 - - - - -
4.7179 69 1.6876 - - - - -
4.7863 70 1.7366 - - - - -
4.8547 71 1.8689 - - - - -
4.9231 72 1.4676 - - - - -
4.9915 73 1.5045 0.3170 0.3538 0.3606 0.2829 0.3675
5.0598 74 1.2155 - - - - -
5.1282 75 1.4365 - - - - -
5.1966 76 1.7451 - - - - -
5.2650 77 1.4537 - - - - -
5.3333 78 1.3813 - - - - -
5.4017 79 1.4035 - - - - -
5.4701 80 1.3912 - - - - -
5.5385 81 1.3286 - - - - -
5.6068 82 1.5153 - - - - -
5.6752 83 1.6745 - - - - -
5.7436 84 1.4323 - - - - -
5.8120 85 1.5299 - - - - -
5.8803 86 1.488 - - - - -
5.9487 87 1.5195 0.3206 0.3556 0.3530 0.2878 0.3605
6.0171 88 1.2999 - - - - -
6.0855 89 1.1511 - - - - -
6.1538 90 1.552 - - - - -
6.2222 91 1.35 - - - - -
6.2906 92 1.218 - - - - -
6.3590 93 1.1712 - - - - -
6.4274 94 1.3381 - - - - -
6.4957 95 1.1716 - - - - -
6.5641 96 1.2117 - - - - -
6.6325 97 1.5349 - - - - -
6.7009 98 1.4564 - - - - -
6.7692 99 1.3541 - - - - -
6.8376 100 1.2468 - - - - -
6.9060 101 1.1519 - - - - -
6.9744 102 1.2421 0.3150 0.3555 0.3501 0.2858 0.3575
7.0427 103 1.0096 - - - - -
7.1111 104 1.1405 - - - - -
7.1795 105 1.2958 - - - - -
7.2479 106 1.35 - - - - -
7.3162 107 1.1291 - - - - -
7.3846 108 0.9968 - - - - -
7.4530 109 1.0454 - - - - -
7.5214 110 1.102 - - - - -
7.5897 111 1.1328 - - - - -
7.6581 112 1.5988 - - - - -
7.7265 113 1.2992 - - - - -
7.7949 114 1.2572 - - - - -
7.8632 115 1.1414 - - - - -
7.9316 116 1.1432 - - - - -
8.0 117 1.1181 0.3154 0.3545 0.3509 0.2884 0.3578
8.0684 118 0.9365 - - - - -
8.1368 119 1.3286 - - - - -
8.2051 120 1.3711 - - - - -
8.2735 121 1.2001 - - - - -
8.3419 122 1.165 - - - - -
8.4103 123 1.0575 - - - - -
8.4786 124 1.105 - - - - -
8.5470 125 1.077 - - - - -
8.6154 126 1.2217 - - - - -
8.6838 127 1.3254 - - - - -
8.7521 128 1.2165 - - - - -
8.8205 129 1.3021 - - - - -
8.8889 130 1.0927 - - - - -
8.9573 131 1.3961 0.3150 0.3540 0.3490 0.2882 0.3588
9.0256 132 1.0779 - - - - -
9.0940 133 0.901 - - - - -
9.1624 134 1.313 - - - - -
9.2308 135 1.1409 - - - - -
9.2991 136 1.1635 - - - - -
9.3675 137 1.0244 - - - - -
9.4359 138 1.0576 - - - - -
9.5043 139 1.0101 - - - - -
9.5726 140 1.1516 0.3152 0.3561 0.3485 0.2877 0.3574
  • The bold row denotes the saved checkpoint.
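In the evaluation rows above, the two largest dimensions peak early (dim_512 at step 29, dim_768 at step 43) and then decline slightly while the training loss keeps falling, whereas dim_64 keeps improving until step 117. The script below only re-reads cosine_map@100 values copied from the log and reports the best evaluated step per dimension. It cannot tell which checkpoint was actually saved, because the metric used by load_best_model_at_end is not listed in this card.

```python
# Evaluation rows copied from the training log: (step, {dim: cosine_map@100}).
EVAL_ROWS = [
    (14,  {128: 0.2812, 256: 0.3335, 512: 0.3486, 64: 0.2115, 768: 0.3639}),
    (29,  {128: 0.3164, 256: 0.3529, 512: 0.3641, 64: 0.2636, 768: 0.3681}),
    (43,  {128: 0.3197, 256: 0.3571, 512: 0.3640, 64: 0.2739, 768: 0.3733}),
    (58,  {128: 0.3246, 256: 0.3533, 512: 0.3603, 64: 0.2829, 768: 0.3687}),
    (73,  {128: 0.3170, 256: 0.3538, 512: 0.3606, 64: 0.2829, 768: 0.3675}),
    (87,  {128: 0.3206, 256: 0.3556, 512: 0.3530, 64: 0.2878, 768: 0.3605}),
    (102, {128: 0.3150, 256: 0.3555, 512: 0.3501, 64: 0.2858, 768: 0.3575}),
    (117, {128: 0.3154, 256: 0.3545, 512: 0.3509, 64: 0.2884, 768: 0.3578}),
    (131, {128: 0.3150, 256: 0.3540, 512: 0.3490, 64: 0.2882, 768: 0.3588}),
    (140, {128: 0.3152, 256: 0.3561, 512: 0.3485, 64: 0.2877, 768: 0.3574}),
]

def best_step_per_dim(rows):
    """Return {dim: (step, score)} for the highest score of each dimension (earliest step wins ties)."""
    best = {}
    for step, scores in rows:
        for dim, score in scores.items():
            if dim not in best or score > best[dim][1]:
                best[dim] = (step, score)
    return best

best = best_step_per_dim(EVAL_ROWS)
final_step, final_scores = EVAL_ROWS[-1]
```

The final row (step 140) carries the same map@100 values as the five Information Retrieval tables in the Evaluation section, so those tables appear to describe the end of training rather than the best evaluated step.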

Framework Versions

  • Python: 3.12.2
  • Sentence Transformers: 3.0.0
  • Transformers: 4.41.2
  • PyTorch: 2.3.1
  • Accelerate: 0.27.2
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Safetensors

  • Model size: 109M params
  • Tensor type: F32

Model tree for ve88ifz2/bge-base-en-v1.5-klej-dyk-v0.2

  • Finetuned from: BAAI/bge-base-en-v1.5