---
language:
  - en
library_name: sentence-transformers
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - loss:Matryoshka2dLoss
  - loss:MatryoshkaLoss
  - loss:CoSENTLoss
base_model: distilbert/distilbert-base-uncased
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
widget:
  - source_sentence: A woman is reading.
    sentences:
      - A woman is taking a picture.
      - Breivik complains of 'ridicule'
      - The small dog protects its owner.
  - source_sentence: A man shoots a man.
    sentences:
      - A man is shooting off guns.
      - A tiger walks around aimlessly.
      - A cat sleeps on purple sheet.
  - source_sentence: A man is speaking.
    sentences:
      - A man is talking.
      - 19 hurt in New Orleans shooting
      - The dogs are chasing a black cat.
  - source_sentence: A man is spitting.
    sentences:
      - Breivik complains of 'ridicule'
      - The man is hiking in the woods.
      - Eurozone agrees Greece bail-out
  - source_sentence: A parrot is talking.
    sentences:
      - A parrot is talking into a microphone.
      - A monkey pratices martial arts.
      - The two men are wearing jeans.
pipeline_tag: sentence-similarity
co2_eq_emissions:
  emissions: 5.379215660466108
  energy_consumed: 0.013838919430479152
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 0.072
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
  - name: SentenceTransformer based on distilbert/distilbert-base-uncased
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev
          type: sts-dev
        metrics:
          - type: pearson_cosine
            value: 0.861868947947514
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8712617743584893
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8611484157829896
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8619125760745536
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8615299857042606
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8623855766060573
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.7716399182083511
            name: Pearson Dot
          - type: spearman_dot
            value: 0.781574012832885
            name: Spearman Dot
          - type: pearson_max
            value: 0.861868947947514
            name: Pearson Max
          - type: spearman_max
            value: 0.8712617743584893
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.8281542233533932
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8373087013752897
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.842468233222574
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8374178427964344
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8424571958251152
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8372826604544046
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.6750086731901399
            name: Pearson Dot
          - type: spearman_dot
            value: 0.656834541089774
            name: Spearman Dot
          - type: pearson_max
            value: 0.842468233222574
            name: Pearson Max
          - type: spearman_max
            value: 0.8374178427964344
            name: Spearman Max
---

SentenceTransformer based on distilbert/distilbert-base-uncased

This is a sentence-transformers model finetuned from distilbert/distilbert-base-uncased on the sentence-transformers/stsb dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: distilbert/distilbert-base-uncased
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: sentence-transformers/stsb
  • Language: en

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DistilBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/distilbert-base-uncased-sts-2d-matryoshka")
# Run inference
sentences = [
    'A parrot is talking.',
    'A parrot is talking into a microphone.',
    'A monkey pratices martial arts.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
# (similarity() compares two sets of embeddings, here the set against itself)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
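
Since this model was trained with Matryoshka dimensions of 768, 512, 256, 128, and 64, its embeddings can also be truncated to a smaller size with only a modest quality drop. A minimal sketch, assuming the truncate_dim argument available in recent Sentence Transformers versions:

from sentence_transformers import SentenceTransformer

# Load the model so that encode() keeps only the first 256 dimensions,
# one of the Matryoshka sizes this model was trained with
model = SentenceTransformer(
    "tomaarsen/distilbert-base-uncased-sts-2d-matryoshka",
    truncate_dim=256,
)
embeddings = model.encode([
    "A parrot is talking.",
    "A parrot is talking into a microphone.",
])
print(embeddings.shape)
# (2, 256)

Smaller dimensionalities trade a little accuracy for faster similarity search and lower storage cost.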

Evaluation

Metrics

Semantic Similarity (dataset: sts-dev)

Metric Value
pearson_cosine 0.8619
spearman_cosine 0.8713
pearson_manhattan 0.8611
spearman_manhattan 0.8619
pearson_euclidean 0.8615
spearman_euclidean 0.8624
pearson_dot 0.7716
spearman_dot 0.7816
pearson_max 0.8619
spearman_max 0.8713

Semantic Similarity (dataset: sts-test)

Metric Value
pearson_cosine 0.8282
spearman_cosine 0.8373
pearson_manhattan 0.8425
spearman_manhattan 0.8374
pearson_euclidean 0.8425
spearman_euclidean 0.8373
pearson_dot 0.675
spearman_dot 0.6568
pearson_max 0.8425
spearman_max 0.8374
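
Both tables can be reproduced with the library's EmbeddingSimilarityEvaluator. A minimal sketch for the test split, assuming the sentence-transformers/stsb splits described under Training Details below:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("tomaarsen/distilbert-base-uncased-sts-2d-matryoshka")

# The STSb test split: sentence pairs with 0-1 similarity scores
test = load_dataset("sentence-transformers/stsb", split="test")
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=test["sentence1"],
    sentences2=test["sentence2"],
    scores=test["score"],
    name="sts-test",
)
# Returns Pearson/Spearman correlations for cosine, dot, Euclidean, and Manhattan
print(evaluator(model))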

Training Details

Training Dataset

sentence-transformers/stsb

  • Dataset: sentence-transformers/stsb at ab7a5ac
  • Size: 5,749 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    Column    | Type   | Min      | Mean        | Max
    sentence1 | string | 6 tokens | 10.0 tokens | 28 tokens
    sentence2 | string | 5 tokens | 9.95 tokens | 25 tokens
    score     | float  | 0.0      | 0.54        | 1.0
  • Samples:
    sentence1                                     | sentence2                                                | score
    A plane is taking off.                        | An air plane is taking off.                              | 1.0
    A man is playing a large flute.               | A man is playing a flute.                                | 0.76
    A man is spreading shreded cheese on a pizza. | A man is spreading shredded cheese on an uncooked pizza. | 0.76
  • Loss: Matryoshka2dLoss with these parameters:
    {
        "loss": "CoSENTLoss",
        "n_layers_per_step": 1,
        "last_layer_weight": 1.0,
        "prior_layers_weight": 1.0,
        "kl_div_weight": 1.0,
        "kl_temperature": 0.3,
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": 1
    }
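
The JSON above maps one-to-one onto the loss constructor. A minimal sketch of building the training dataset and this loss, assuming the sentence-transformers 3.x losses API:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CoSENTLoss, Matryoshka2dLoss

model = SentenceTransformer("distilbert/distilbert-base-uncased")
train_dataset = load_dataset("sentence-transformers/stsb", split="train")

# Wrap CoSENTLoss so that each training step also trains truncated
# embedding sizes (Matryoshka) and sampled intermediate layers (2D)
loss = Matryoshka2dLoss(
    model,
    CoSENTLoss(model),
    matryoshka_dims=[768, 512, 256, 128, 64],
    n_layers_per_step=1,
    n_dims_per_step=1,
)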
    

Evaluation Dataset

sentence-transformers/stsb

  • Dataset: sentence-transformers/stsb at ab7a5ac
  • Size: 1,500 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    Column    | Type   | Min      | Mean         | Max
    sentence1 | string | 5 tokens | 15.1 tokens  | 45 tokens
    sentence2 | string | 6 tokens | 15.11 tokens | 53 tokens
    score     | float  | 0.0      | 0.47         | 1.0
  • Samples:
    sentence1                            | sentence2                                | score
    A man with a hard hat is dancing.    | A man wearing a hard hat is dancing.     | 1.0
    A young child is riding a horse.     | A child is riding a horse.               | 0.95
    A man is feeding a mouse to a snake. | The man is feeding a mouse to the snake. | 1.0
  • Loss: Matryoshka2dLoss with these parameters:
    {
        "loss": "CoSENTLoss",
        "n_layers_per_step": 1,
        "last_layer_weight": 1.0,
        "prior_layers_weight": 1.0,
        "kl_div_weight": 1.0,
        "kl_temperature": 0.3,
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": 1
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 4
  • warmup_ratio: 0.1
  • fp16: True

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: False
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: None
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch  | Step | Training Loss | Validation Loss | sts-dev spearman_cosine | sts-test spearman_cosine
0.2778 | 100  | 7.1781        | 6.6704          | 0.8345                  | -
0.5556 | 200  | 6.5316        | 6.7135          | 0.8439                  | -
0.8333 | 300  | 6.6267        | 6.8697          | 0.8551                  | -
1.1111 | 400  | 6.5709        | 6.7623          | 0.8568                  | -
1.3889 | 500  | 6.2898        | 6.4412          | 0.8644                  | -
1.6667 | 600  | 6.2021        | 6.7711          | 0.8595                  | -
1.9444 | 700  | 6.2010        | 6.5252          | 0.8628                  | -
2.2222 | 800  | 6.0862        | 6.9795          | 0.8652                  | -
2.5000 | 900  | 6.3030        | 6.7339          | 0.8685                  | -
2.7778 | 1000 | 5.9031        | 6.7249          | 0.8694                  | -
3.0556 | 1100 | 6.0803        | 6.8350          | 0.8684                  | -
3.3333 | 1200 | 6.0564        | 6.9703          | 0.8695                  | -
3.6111 | 1300 | 5.8407        | 7.3822          | 0.8707                  | -
3.8889 | 1400 | 5.8229        | 7.0442          | 0.8713                  | -
4.0000 | 1440 | -             | -               | -                       | 0.8373

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Energy Consumed: 0.014 kWh
  • Carbon Emitted: 0.005 kg of CO2
  • Hours Used: 0.072 hours
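
For reference, a minimal sketch of how such a measurement is taken with the codecarbon package:

from codecarbon import EmissionsTracker

tracker = EmissionsTracker()
tracker.start()
# ... run trainer.train() or any other workload here ...
emissions_kg = tracker.stop()  # emissions in kg of CO2-eq
print(f"{emissions_kg:.6f} kg CO2-eq")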

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 3.0.0.dev0
  • Transformers: 4.41.0.dev0
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.26.1
  • Datasets: 2.18.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

Matryoshka2dLoss

@misc{li20242d,
    title={2D Matryoshka Sentence Embeddings}, 
    author={Xianming Li and Zongxi Li and Jing Li and Haoran Xie and Qing Li},
    year={2024},
    eprint={2402.14776},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}