
SentenceTransformer based on BAAI/bge-base-en-v1.5

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Number of Parameters: ~109M (F32)

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
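
Because the module stack ends in Normalize(), every embedding is L2-normalized, so a plain dot product between embeddings behaves like cosine similarity. A minimal sanity check (the course title is taken from the training samples listed below):

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("sachin19566/bge-base-en-v1.5-udemy-fte")
emb = model.encode(["Ultimate Investment Banking Course"])

# The Normalize() module scales each vector to unit length,
# so its L2 norm is ~1.0 and dot product == cosine similarity.
print(np.linalg.norm(emb[0]))  # ~1.0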

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sachin19566/bge-base-en-v1.5-udemy-fte")
# Run inference
sentences = [
    "Multiply your returns using 'Value Investing'",
    'All Levels',
    'Business Finance',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
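
Beyond pairwise similarity, the same calls cover semantic search over a course catalogue. A minimal sketch, assuming an illustrative corpus and query (the titles are taken from the samples further down this card; the query string is hypothetical):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sachin19566/bge-base-en-v1.5-udemy-fte")

# Illustrative corpus of course titles; replace with your own catalogue.
corpus = [
    "Ultimate Investment Banking Course",
    "Learn to Use jQuery UI Widgets",
    "Financial Modeling for Business Analysts and Consultants",
]
query = "introductory corporate finance training"  # hypothetical query

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# model.similarity() applies this model's similarity function (cosine).
scores = model.similarity(query_embedding, corpus_embeddings)  # shape (1, 3)
best = int(scores.argmax())
print(corpus[best])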

Training Details

Training Dataset

Unnamed Dataset

  • Size: 3,683 training samples
  • Columns: course_title, level, and subject
  • Approximate statistics based on the first 1000 samples:
    • course_title: string; min: 4 tokens, mean: 11.02 tokens, max: 81 tokens
    • level: string; min: 4 tokens, mean: 4.27 tokens, max: 5 tokens
    • subject: string; min: 4 tokens, mean: 4.0 tokens, max: 4 tokens
  • Samples:
    • course_title: Ultimate Investment Banking Course | level: All Levels | subject: Business Finance
    • course_title: Complete GST Course & Certification - Grow Your CA Practice | level: All Levels | subject: Business Finance
    • course_title: Financial Modeling for Business Analysts and Consultants | level: Intermediate Level | subject: Business Finance
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
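
The raw data behind these columns is not published with the card. Assuming a Udemy course export whose columns match the names above, loading it into a Hugging Face Dataset might look like this (the file name is hypothetical):

from datasets import Dataset
import pandas as pd

# Hypothetical CSV export; the card does not ship the raw data.
df = pd.read_csv("udemy_courses.csv")

# Keep only the three columns the card reports: course_title, level, subject.
train_dataset = Dataset.from_pandas(
    df[["course_title", "level", "subject"]], preserve_index=False
)
print(train_dataset.column_names)  # ['course_title', 'level', 'subject']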
    

Evaluation Dataset

Unnamed Dataset

  • Size: 100 evaluation samples
  • Columns: course_title, level, and subject
  • Approximate statistics based on the first 100 samples:
    • course_title: string; min: 4 tokens, mean: 12.63 tokens, max: 81 tokens
    • level: string; min: 4 tokens, mean: 4.42 tokens, max: 5 tokens
    • subject: string; min: 4 tokens, mean: 4.0 tokens, max: 4 tokens
  • Samples:
    • course_title: Learn to Use jQuery UI Widgets | level: Beginner Level | subject: Web Development
    • course_title: Financial Statements: Learn Accounting. Unlock the Numbers. | level: Beginner Level | subject: Business Finance
    • course_title: Trade Recap I - A Real Look at Futures Options Markets | level: Beginner Level | subject: Business Finance
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 3e-06
  • max_steps: 932
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
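
Putting the non-default hyperparameters above into code, a rough reconstruction of the training run might look like the following sketch (output_dir is hypothetical, and train_dataset/eval_dataset are assumed to be the datasets described above, which the card does not publish):

from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
# Defaults already match the card: scale=20.0, similarity_fct=cos_sim.
loss = MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="bge-base-en-v1.5-udemy-fte",  # hypothetical output directory
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=3e-6,
    max_steps=932,
    warmup_ratio=0.1,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # assumed: course_title/level/subject Dataset
    eval_dataset=eval_dataset,    # assumed: 100-sample held-out split
    loss=loss,
)
trainer.train()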

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 3e-06
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3.0
  • max_steps: 932
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch   Step   Training Loss   Validation Loss
0.0866 20 2.2161 1.7831
0.1732 40 1.9601 1.5400
0.2597 60 1.6253 1.1987
0.3463 80 1.2393 1.0009
0.4329 100 1.1817 0.9073
0.5195 120 1.0667 0.8817
0.6061 140 1.258 0.8282
0.6926 160 1.2375 0.7618
0.7792 180 1.0925 0.7274
0.8658 200 1.0823 0.7101
0.9524 220 0.8789 0.7056
1.0390 240 0.9597 0.7107
1.1255 260 0.8427 0.7221
1.2121 280 0.8612 0.7287
1.2987 300 0.8428 0.7275
1.3853 320 0.6426 0.7451
1.4719 340 0.709 0.7642
1.5584 360 0.6602 0.7851
1.6450 380 0.7356 0.8244
1.7316 400 0.7633 0.8310
1.8182 420 0.9592 0.8185
1.9048 440 0.6715 0.8094
1.9913 460 0.7926 0.8103
2.0779 480 0.7703 0.8011
2.1645 500 0.6287 0.8266
2.2511 520 0.5481 0.8536
2.3377 540 0.7101 0.8679
2.4242 560 0.423 0.9025
2.5108 580 0.6814 0.9197
2.5974 600 0.5879 0.9492
2.6840 620 0.537 0.9861
2.7706 640 0.5107 1.0179
2.8571 660 0.6164 1.0413
2.9437 680 0.6582 1.0710
3.0303 700 0.4553 1.1001
3.1169 720 0.3649 1.1416
3.2035 740 0.9273 1.1142
3.2900 760 0.8816 1.0694
3.3766 780 0.7005 1.0481
3.4632 800 1.9002 1.0289
3.5498 820 1.4467 1.0141
3.6364 840 1.5564 1.0023
3.7229 860 1.2316 0.9961
3.8095 880 1.0549 0.9931
3.8961 900 1.2359 0.9913
3.9827 920 1.3568 0.9897

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.0
  • Transformers: 4.44.2
  • PyTorch: 2.4.0+cu121
  • Accelerate: 0.33.0
  • Datasets: 3.0.0
  • Tokenizers: 0.19.1
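
To approximate this environment, the versions above can be pinned directly (a sketch; the CUDA 12.1 build of PyTorch may additionally require the matching wheel index):

pip install sentence-transformers==3.1.0 transformers==4.44.2 torch==2.4.0 accelerate==0.33.0 datasets==3.0.0 tokenizers==0.19.1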

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}