SentenceTransformer based on BAAI/bge-base-en-v1.5

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
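
The pooling and normalization stages listed above can be illustrated with a minimal sketch. It assumes the Transformer stage has already produced one vector per token, with the [CLS] token first; the function name `cls_pool_and_normalize` and the toy 4-dimensional vectors are hypothetical and stand in for the model's real 768-dimensional outputs.

```python
import math

def cls_pool_and_normalize(token_embeddings):
    """Take the first ([CLS]) token vector and scale it to unit L2 norm."""
    # Mirrors pooling_mode_cls_token=True: only the first token is kept.
    cls_vector = token_embeddings[0]
    norm = math.sqrt(sum(x * x for x in cls_vector))
    if norm == 0.0:
        return list(cls_vector)  # avoid division by zero on an all-zero vector
    # Mirrors the Normalize() stage: divide by the L2 norm.
    return [x / norm for x in cls_vector]

# Toy input: 3 tokens with 4-dimensional vectors (illustrative values only).
toy_tokens = [
    [3.0, 4.0, 0.0, 0.0],  # [CLS]
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.0, 2.0, 1.0],
]
sentence_embedding = cls_pool_and_normalize(toy_tokens)
print(sentence_embedding)  # [0.6, 0.8, 0.0, 0.0]
```

Because the final stage normalizes every embedding to unit length, the cosine similarity between two embeddings reduces to their dot product.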

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("datasocietyco/bge-base-en-v1.5-course-recommender-v4")
# Run inference
sentences = [
    'React Ecosystem: Styling. A course that expands on your react knowledge to make your own styled components and leverage material UI library. tags: styling, react, internet, MUI, browser. Languages: Course language: TBD. Prerequisites: Prerequisite course required: React Ecosystem: State Management & Redux. Target audience: Professionals who would like to explore the world of browsers, domains, and websites..',
    'Course Name:React Ecosystem: Styling|Course Description:A course that expands on your react knowledge to make your own styled components and leverage material UI library|Tags:styling, react, internet, MUI, browser|Course language: TBD|Target Audience:Professionals who would like to explore the world of browsers, domains, and websites.|Prerequisite course required: React Ecosystem: State Management & Redux',
    'Course Name:Spark Data Structures & Parallelism|Course Description:A 4-hour course for intermediate-level data scientists / engineers that covers Spark architecture and fundamentals including RDDs, DataFrames, Datasets.|Tags:spark, DataFrame, Spark UI, parallel processing, Dataset, RDDs|Course language: Scala|Target Audience:Students with basic knowledge of scala and want to expand their knowledge on scala and spark|Prerequisite course required: Intro to Scala Collections',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
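
For a recommendation-style use, the similarity scores can be turned into a ranking. The following is a rough sketch rather than part of the library: `rank_by_similarity` is a hypothetical helper, and the toy 2-dimensional unit vectors stand in for the output of `model.encode(...)`.

```python
def rank_by_similarity(query_embedding, candidate_embeddings):
    """Return (index, score) pairs sorted from most to least similar.

    Assumes all vectors are already L2-normalized, so the dot product
    equals the cosine similarity.
    """
    scores = [
        sum(q * c for q, c in zip(query_embedding, candidate))
        for candidate in candidate_embeddings
    ]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)

# Toy unit vectors standing in for real embeddings (illustrative values only).
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]
ranking = rank_by_similarity(query, candidates)
print(ranking)  # [(2, 1.0), (1, 0.6), (0, 0.0)]
```

With the real model, the query and candidate vectors would come from `model.encode(...)`, and `model.similarity(...)` computes the same scores in one call.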

Training Details

Training Dataset

Unnamed Dataset

  • Size: 143 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 143 samples:
    column    type    min        mean           max
    anchor    string  70 tokens  111.95 tokens  205 tokens
    positive  string  68 tokens  109.95 tokens  203 tokens
  • Samples:
    anchor positive
    Reinforcement Learning. This course covers the specialized branch of machine learning and deep learning called reinforcement learning (RL). By the end of this course students will be able to define RL use cases and real world scenarios where RL models are used, they will be able to create a simple RL model and evaluate its performance.. tags: deep learning, Keras, reinforcement learning, neural networks, TensorFlow. Languages: Course language: Python. Prerequisites: Prerequisite course required: Working with Complex Pre-trained CNNs in Python. Target audience: Professionals some Python experience who would like to expand their skillset to more advanced machine learning algorithms for reinforcement learning.. Course Name:Reinforcement Learning
    Optimizing Ensemble Methods in Python. This course covers advanced topics in optimizing ensemble learning methods – specifically random forest and gradient boosting. Students will learn to implement base models and perform hyperparameter tuning to enhance the performance of models.. tags: ensemble methods, classification, random forest, hyperparameter tuning, gbm, boosting, python, gradient boosting machines. Languages: Course language: Python. Prerequisites: Prerequisite course required: Ensemble Methods in Python. Target audience: Professionals experience in ensemble methods and who want to enhance their skill set in advanced Python classification techniques.. Course Name:Optimizing Ensemble Methods in Python
    Fundamentals of Accelerated Computing with OpenACC. Find out how to write and configure code parallelization with OpenACC, optimize memory movements between the CPU and GPU accelerator, and apply the techniques to accelerate a CPU-only Laplace Heat Equation to achieve performance gains.. tags: NVIDIA Nsight, OpenACC. Languages: Course language: Python. Prerequisites: No prerequisite course required. Target audience: Professionals who want to learn how to write code, configure code parallelization with OpenACC, optimize memory movements between the CPU and GPU accelerator, and implement the workflow learnt for massive performance gains.. Course Name:Fundamentals of Accelerated Computing with OpenACC
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
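
As a rough illustration of what this loss computes, the sketch below treats each anchor's own positive as the correct class and every other positive in the batch as a negative, then applies cross-entropy to the scaled cosine similarities. It is a pure-Python approximation written for readability, not the library's implementation; the function names and toy vectors are hypothetical.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        # Scaled similarity of this anchor to every positive in the batch.
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the batch.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        # Negative log-probability of the matching positive.
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batch of two pairs (illustrative values only).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each anchor matches its own positive
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each anchor matches the wrong positive
loss_aligned = multiple_negatives_ranking_loss(anchors, aligned)
loss_swapped = multiple_negatives_ranking_loss(anchors, swapped)
print(loss_aligned, loss_swapped)
```

The loss is near zero when each anchor is closest to its own positive and grows toward the scale value when the pairing is reversed, which is why in-batch duplicates are usually avoided with this loss.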
    

Evaluation Dataset

Unnamed Dataset

  • Size: 36 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 36 samples:
    column    type    min        mean          max
    anchor    string  70 tokens  121.5 tokens  187 tokens
    positive  string  68 tokens  119.5 tokens  185 tokens
  • Samples:
    anchor positive
    Intro to CSS, Part 2. A course that continues to build on the foundational understanding of CSS syntax and allows students to work with responsive design and media queries.. tags: styling, internet, CSS, web, web page, browser, HTML. Languages: Course language: CSS, HTML. Prerequisites: Prerequisite course required: Intro to CSS, Part 1. Target audience: Professionals who would like to continue learning the core concepts of CSS and be able to style simple web pages.. Course Name:Intro to CSS, Part 2
    Foundations of Statistics in Python. This course is designed for learners who would like to learn about statistics and apply it for decision-making. This course is a comprehensive review of statistical terms ranging from foundational (mean, median, mode, standard deviation, variance, covariance, correlation) to more complex concepts such as normality in data, confidence intervals, and p-values. Additional topics include how to calculate summary statistics and how to carry out hypothesis testing to inform decisions.. tags: two-tailed test, statistics, sampling, hypothesis testing, confidence intervals, one-tailed test, central limit theorem. Languages: Course language: Python. Prerequisites: Prerequisite course required: Intro to Visualization in Python. Target audience: Professionals some Python experience who would like to expand their skill set to more advanced Python visualization techniques and tools.. Course Name:Foundations of Statistics in Python
    Spherical k-Means and Hierarchical Clustering in R. This course covers the unsupervised learning method called clustering which is used to find patterns or groups in data without the need for labelled data. This course includes different methods of clustering on numerical data including density-based and hierarchical-based clustering and how to build, evaluate and interpret these models.. tags: DBSCAN, R, unsupervised learning, analytics, machine learning, hierarchical, clustering. Languages: Course language: R. Prerequisites: Prerequisite course required: Intro to Clustering in R. Target audience: Professionals with some R experience who would like to expand their skillset to more clustering techniques like hierarchical clustering and DBSCAN.. Course Name:Spherical k-Means and Hierarchical Clustering in R
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 3e-06
  • max_steps: 64
  • warmup_ratio: 0.1
  • batch_sampler: no_duplicates
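
To make the interaction between learning_rate, max_steps, and warmup_ratio concrete, here is a minimal sketch of a linear schedule with warmup using the values above (the scheduler type is linear, per the full hyperparameter list). The helper name `linear_warmup_lr` is hypothetical, and rounding the warmup length up to a whole step is an assumption about how the ratio is converted to steps.

```python
import math

def linear_warmup_lr(step, base_lr=3e-06, max_steps=64, warmup_ratio=0.1):
    """Learning rate at a given optimizer step for a linear warmup/decay schedule."""
    # Assumption: the warmup length is the ratio times total steps, rounded up.
    warmup_steps = math.ceil(max_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at max_steps.
    remaining = max(0, max_steps - step)
    return base_lr * remaining / max(1, max_steps - warmup_steps)

for step in (0, 7, 32, 64):
    print(step, linear_warmup_lr(step))
```

Under these assumptions the rate peaks at 3e-06 after 7 warmup steps and decays to zero at step 64.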

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 3e-06
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3.0
  • max_steps: 64
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch   Step  Training Loss  Evaluation Loss
2.2222  20    0.0598         0.0318
4.4444  40    0.0167         0.0250
6.6667  60    0.0109         0.0233
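
The fractional epoch values in the log follow from the dataset and batch sizes reported earlier. The short sketch below reproduces them, assuming the last partial batch of each epoch is kept (dataloader_drop_last is False); the variable and function names are illustrative.

```python
import math

train_samples = 143   # training set size from this card
batch_size = 16       # per_device_train_batch_size
max_steps = 64        # takes precedence over num_train_epochs when positive

# Assumption: one optimizer step per batch, partial final batch included.
steps_per_epoch = math.ceil(train_samples / batch_size)

def epoch_at(step):
    """Fractional epoch reached after a given number of steps."""
    return round(step / steps_per_epoch, 4)

for step in (20, 40, 60):
    print(step, epoch_at(step))

total_epochs = max_steps / steps_per_epoch
print(round(total_epochs, 2))
```

With 9 steps per epoch, the 64-step budget covers roughly 7 passes over the data, more than the num_train_epochs value of 3.0, because max_steps overrides the epoch count.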

Framework Versions

  • Python: 3.9.13
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.1
  • PyTorch: 2.2.2
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.20.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 109M params (Safetensors, tensor type F32)