---
base_model: sentence-transformers/all-MiniLM-L6-v2
datasets:
  - sentence-transformers/stsb
language:
  - en
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:5749
  - loss:CosineSimilarityLoss
widget:
  - source_sentence: The man talked to a girl over the internet camera.
    sentences:
      - A group of elderly people pose around a dining table.
      - A teenager talks to a girl over a webcam.
      - There is no 'still' that is not relative to some other object.
  - source_sentence: A woman is writing something.
    sentences:
      - Two eagles are perched on a branch.
      - >-
        It refers to the maximum f-stop (which is defined as the ratio of focal
        length to effective aperture diameter).
      - A woman is chopping green onions.
  - source_sentence: The player shoots the winning points.
    sentences:
      - Minimum wage laws hurt the least skilled, least productive the most.
      - The basketball player is about to score points for his team.
      - Sheep are grazing in the field in front of a line of trees.
  - source_sentence: >-
      Stars form in star-formation regions, which itself develop from molecular
      clouds.
    sentences:
      - >-
        Although I believe Searle is mistaken, I don't think you have found the
        problem.
      - >-
        It may be possible for a solar system like ours to exist outside of a
        galaxy.
      - >-
        A blond-haired child performing on the trumpet in front of a house while
        his younger brother watches.
  - source_sentence: >-
      While Queen may refer to both Queen regent (sovereign) or Queen consort,
      the King has always been the sovereign.
    sentences:
      - At first, I thought this is a bit of a tricky question.
      - A man sitting on the floor in a room is strumming a guitar.
      - >-
        There is a very good reason not to refer to the Queen's spouse as "King"
        - because they aren't the King.
model-index:
  - name: SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev
          type: sts-dev
        metrics:
          - type: pearson_cosine
            value: 0.8937895757423139
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8933408335166381
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8893270459753304
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8931680438618355
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8894951039580792
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8933408335166381
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.8937895714961118
            name: Pearson Dot
          - type: spearman_dot
            value: 0.8933411165328404
            name: Spearman Dot
          - type: pearson_max
            value: 0.8937895757423139
            name: Pearson Max
          - type: spearman_max
            value: 0.8933411165328404
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.8567692080604863
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8581039412647984
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8539129662613905
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8559325366695306
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8559600700692871
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8581039412647984
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.8567692052012096
            name: Pearson Dot
          - type: spearman_dot
            value: 0.8581039412647984
            name: Spearman Dot
          - type: pearson_max
            value: 0.8567692080604863
            name: Pearson Max
          - type: spearman_max
            value: 0.8581039412647984
            name: Spearman Max
---

SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2 on the sentence-transformers/stsb dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: sentence-transformers/stsb
  • Language: en

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
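
The stack computes contextual token embeddings with BertModel, mean-pools them over non-padding tokens, and L2-normalizes the result. Below is a minimal sketch of the same computation using transformers directly (illustrative only; in practice, use the SentenceTransformer API shown in Usage):

# Sketch of Transformer -> Pooling -> Normalize using transformers directly.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("praveenku32k/all-MiniLM-L6-v2-sts")
bert = AutoModel.from_pretrained("praveenku32k/all-MiniLM-L6-v2-sts")

batch = tokenizer(["A plane is taking off."], padding=True, truncation=True,
                  max_length=256, return_tensors="pt")
with torch.no_grad():
    token_embeddings = bert(**batch).last_hidden_state        # (1, seq_len, 384)

# Mean pooling over non-padding tokens (pooling_mode_mean_tokens=True)
mask = batch["attention_mask"].unsqueeze(-1).float()
embedding = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

# Normalize(): unit-length vectors, so dot product equals cosine similarity
embedding = F.normalize(embedding, p=2, dim=1)
print(embedding.shape)                                        # torch.Size([1, 384])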

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("praveenku32k/all-MiniLM-L6-v2-sts")
# Run inference
sentences = [
    'While Queen may refer to both Queen regent (sovereign) or Queen consort, the King has always been the sovereign.',
    'There is a very good reason not to refer to the Queen\'s spouse as "King" - because they aren\'t the King.',
    'A man sitting on the floor in a room is strumming a guitar.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
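
Since the embeddings are unit-normalized by the final Normalize() module, dot product and cosine similarity coincide. For retrieval-style use, here is a hedged sketch built on the library's util.semantic_search (the corpus contents are illustrative):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("praveenku32k/all-MiniLM-L6-v2-sts")
corpus = [
    "A man is playing a flute.",
    "Sheep are grazing in the field.",
    "The basketball player scores.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode("Someone performs music on a wind instrument.",
                               convert_to_tensor=True)

# Per query, the top_k corpus entries ranked by cosine similarity
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(corpus[hit["corpus_id"]], round(hit["score"], 3))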

Evaluation

Metrics

Semantic Similarity

  • Dataset: sts-dev

| Metric             | Value  |
|:-------------------|:-------|
| pearson_cosine     | 0.8938 |
| spearman_cosine    | 0.8933 |
| pearson_manhattan  | 0.8893 |
| spearman_manhattan | 0.8932 |
| pearson_euclidean  | 0.8895 |
| spearman_euclidean | 0.8933 |
| pearson_dot        | 0.8938 |
| spearman_dot       | 0.8933 |
| pearson_max        | 0.8938 |
| spearman_max       | 0.8933 |

Semantic Similarity

  • Dataset: sts-test

| Metric             | Value  |
|:-------------------|:-------|
| pearson_cosine     | 0.8568 |
| spearman_cosine    | 0.8581 |
| pearson_manhattan  | 0.8539 |
| spearman_manhattan | 0.8559 |
| pearson_euclidean  | 0.8560 |
| spearman_euclidean | 0.8581 |
| pearson_dot        | 0.8568 |
| spearman_dot       | 0.8581 |
| pearson_max        | 0.8568 |
| spearman_max       | 0.8581 |
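
Both tables can be reproduced with the library's EmbeddingSimilarityEvaluator; a sketch for the test split (split names follow the sentence-transformers/stsb dataset):

from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("praveenku32k/all-MiniLM-L6-v2-sts")
test = load_dataset("sentence-transformers/stsb", split="test")

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=test["sentence1"],
    sentences2=test["sentence2"],
    scores=test["score"],
    name="sts-test",
)
print(evaluator(model))  # dict of pearson/spearman scores, e.g. sts-test_spearman_cosine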

Training Details

Training Dataset

sentence-transformers/stsb

  • Dataset: sentence-transformers/stsb at ab7a5ac
  • Size: 5,749 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    |         | sentence1                                        | sentence2                                        | score                          |
    |:--------|:-------------------------------------------------|:-------------------------------------------------|:-------------------------------|
    | type    | string                                           | string                                           | float                          |
    | details | min: 6 tokens, mean: 10.0 tokens, max: 28 tokens | min: 5 tokens, mean: 9.95 tokens, max: 25 tokens | min: 0.0, mean: 0.54, max: 1.0 |
  • Samples:
    | sentence1                                     | sentence2                                                | score |
    |:----------------------------------------------|:---------------------------------------------------------|:------|
    | A plane is taking off.                        | An air plane is taking off.                              | 1.0   |
    | A man is playing a large flute.               | A man is playing a flute.                                | 0.76  |
    | A man is spreading shreded cheese on a pizza. | A man is spreading shredded cheese on an uncooked pizza. | 0.76  |
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
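
In other words, the loss is the mean squared error between the cosine similarity of the two sentence embeddings and the gold score. In miniature (tensor values are illustrative):

import torch
import torch.nn.functional as F

u = torch.randn(384)         # embedding of sentence1 (illustrative values)
v = torch.randn(384)         # embedding of sentence2
gold = torch.tensor(0.76)    # gold similarity score in [0, 1]

# CosineSimilarityLoss with loss_fct=MSELoss: MSE(cos(u, v), gold)
loss = F.mse_loss(F.cosine_similarity(u, v, dim=0), gold)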
    

Evaluation Dataset

sentence-transformers/stsb

  • Dataset: sentence-transformers/stsb at ab7a5ac
  • Size: 1,500 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    |         | sentence1                                        | sentence2                                         | score                          |
    |:--------|:-------------------------------------------------|:--------------------------------------------------|:-------------------------------|
    | type    | string                                           | string                                            | float                          |
    | details | min: 5 tokens, mean: 15.1 tokens, max: 45 tokens | min: 6 tokens, mean: 15.11 tokens, max: 53 tokens | min: 0.0, mean: 0.47, max: 1.0 |
  • Samples:
    | sentence1                            | sentence2                                | score |
    |:-------------------------------------|:-----------------------------------------|:------|
    | A man with a hard hat is dancing.    | A man wearing a hard hat is dancing.     | 1.0   |
    | A young child is riding a horse.     | A child is riding a horse.               | 0.95  |
    | A man is feeding a mouse to a snake. | The man is feeding a mouse to the snake. | 1.0   |
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 4
  • warmup_ratio: 0.1
  • fp16: True
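
Taken together with the dataset and loss above, the run corresponds roughly to the following v3-style training script (a sketch, not the author's exact code; output_dir is hypothetical):

from datasets import load_dataset
from sentence_transformers import (SentenceTransformer, SentenceTransformerTrainer,
                                   SentenceTransformerTrainingArguments)
from sentence_transformers.losses import CosineSimilarityLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
train_dataset = load_dataset("sentence-transformers/stsb", split="train")
eval_dataset = load_dataset("sentence-transformers/stsb", split="validation")

args = SentenceTransformerTrainingArguments(
    output_dir="all-MiniLM-L6-v2-sts",   # hypothetical output path
    num_train_epochs=4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
)
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=CosineSimilarityLoss(model),
)
trainer.train()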

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

| Epoch  | Step | Training Loss | Validation Loss | sts-dev_spearman_cosine | sts-test_spearman_cosine |
|:------:|:----:|:-------------:|:---------------:|:-----------------------:|:------------------------:|
| 0.2778 | 100  | 0.0258        | 0.0231          | 0.8859                  | -                        |
| 0.5556 | 200  | 0.0229        | 0.0214          | 0.8916                  | -                        |
| 0.8333 | 300  | 0.0222        | 0.0203          | 0.8924                  | -                        |
| 1.1111 | 400  | 0.0178        | 0.0213          | 0.8927                  | -                        |
| 1.3889 | 500  | 0.0135        | 0.0211          | 0.8924                  | -                        |
| 1.6667 | 600  | 0.0123        | 0.0215          | 0.8921                  | -                        |
| 1.9444 | 700  | 0.0128        | 0.0208          | 0.8910                  | -                        |
| 2.2222 | 800  | 0.0090        | 0.0207          | 0.8941                  | -                        |
| 2.5    | 900  | 0.0080        | 0.0208          | 0.8943                  | -                        |
| 2.7778 | 1000 | 0.0075        | 0.0209          | 0.8943                  | -                        |
| 3.0556 | 1100 | 0.0081        | 0.0215          | 0.8934                  | -                        |
| 3.3333 | 1200 | 0.0063        | 0.0211          | 0.8932                  | -                        |
| 3.6111 | 1300 | 0.0061        | 0.0213          | 0.8933                  | -                        |
| 3.8889 | 1400 | 0.0059        | 0.0213          | 0.8933                  | -                        |
| 4.0    | 1440 | -             | -               | -                       | 0.8581                   |

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.4
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1
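
To reproduce this environment, the listed versions can be pinned directly (assuming a Python 3.10 / CUDA 12.1 setup):

pip install sentence-transformers==3.0.1 transformers==4.42.4 accelerate==0.32.1 datasets==2.20.0 tokenizers==0.19.1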

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}