metadata
base_model: indobenchmark/indobert-large-p2
datasets:
  - quarkss/stsb-indo-mt
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:5749
  - loss:CosineSimilarityLoss
widget:
  - source_sentence: Dua ekor anjing berenang di kolam renang.
    sentences:
      - Anjing-anjing sedang berenang di kolam renang.
      - Seekor binatang sedang berjalan di atas tanah.
      - Seorang pria sedang menyeka pinggiran mangkuk.
  - source_sentence: Seorang anak perempuan sedang mengiris mentega menjadi dua bagian.
    sentences:
      - Seorang wanita sedang mengiris tahu.
      - Dua orang berkelahi.
      - Seorang pria sedang menari.
  - source_sentence: Seorang gadis sedang makan kue mangkuk.
    sentences:
      - Seorang pria sedang mengiris bawang putih dengan alat pengiris mandolin.
      - Seorang pria sedang memotong dan memotong bawang.
      - Seorang wanita sedang makan kue mangkuk.
  - source_sentence: Sebuah helikopter mendarat di landasan helikopter.
    sentences:
      - Seorang pria sedang mengiris mentimun.
      - Seorang pria sedang memotong batang pohon dengan kapak.
      - Sebuah helikopter mendarat.
  - source_sentence: Seorang pria sedang berjalan dengan seekor kuda.
    sentences:
      - Seorang pria sedang menuntun seekor kuda dengan tali kekang.
      - Seorang pria sedang menembakkan pistol.
      - Seorang wanita sedang memetik tomat.
model-index:
  - name: SentenceTransformer based on indobenchmark/indobert-large-p2
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: pearson_cosine
            value: 0.8691840566814281
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8676618157111291
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8591936899214765
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8625729388794413
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8599101625523397
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8632992102966184
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.8440663965451926
            name: Pearson Dot
          - type: spearman_dot
            value: 0.8392116432595296
            name: Spearman Dot
          - type: pearson_max
            value: 0.8691840566814281
            name: Pearson Max
          - type: spearman_max
            value: 0.8676618157111291
            name: Spearman Max
          - type: pearson_cosine
            value: 0.8401688802461491
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8365597846163649
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8276067064758832
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8315689286193226
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8277930159560367
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.831557090168861
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.8170329546065831
            name: Pearson Dot
          - type: spearman_dot
            value: 0.8083098402255348
            name: Spearman Dot
          - type: pearson_max
            value: 0.8401688802461491
            name: Pearson Max
          - type: spearman_max
            value: 0.8365597846163649
            name: Spearman Max

SentenceTransformer based on indobenchmark/indobert-large-p2

This is a sentence-transformers model fine-tuned from indobenchmark/indobert-large-p2 on the quarkss/stsb-indo-mt dataset. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

STSB Test

| Model | Spearman Correlation |
|---|---|
| quarkss/indobert-large-stsb | 0.8366 |
| quarkss/indobert-base-stsb | 0.8123 |
| sentence-transformers/all-MiniLM-L6-v2 | 0.5952 |
| indobenchmark/indobert-large-p2 | 0.5673 |
| sentence-transformers/all-mpnet-base-v2 | 0.5531 |
| sentence-transformers/stsb-bert-base | 0.5349 |
| indobenchmark/indobert-base-p2 | 0.5309 |

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: indobenchmark/indobert-large-p2
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
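As a rough illustration of the similarity function listed above, the following sketch computes cosine similarity between two plain Python vectors. The function name `cosine_similarity` and the zero-vector fallback are assumptions made for this example; in practice the library computes this on tensors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption for this sketch: treat a zero vector as dissimilar.
        return 0.0
    return dot / (norm_u * norm_v)


# Parallel vectors score close to 1, orthogonal vectors score 0.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Because the score depends only on direction, scaling an embedding does not change its cosine similarity to other embeddings.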

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
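The Pooling module above has only `pooling_mode_mean_tokens` enabled, so a sentence embedding is the average of its token embeddings. A minimal pure-Python sketch of that idea follows; the helper name `mean_pool` and the toy 2-dimensional vectors are illustrative assumptions (the real model uses 1024-dimensional tensors), and padding positions are assumed to be marked with 0 in the attention mask.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, mask in zip(token_embeddings, attention_mask):
        if mask:  # skip padding tokens
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [total / count for total in summed]


# Two real tokens and one padding token (toy values).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))
```

Masking matters because padded positions would otherwise pull the average toward whatever values the padding tokens carry.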

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("quarkss/indobert-large-stsb")
# Run inference
sentences = [
    'Seorang pria sedang berjalan dengan seekor kuda.',
    'Seorang pria sedang menuntun seekor kuda dengan tali kekang.',
    'Seorang pria sedang menembakkan pistol.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
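For semantic search, one row of the similarity matrix can be sorted to rank the other sentences against a query. The sketch below assumes the matrix has been converted to nested Python lists (for example with `similarities.tolist()`); the helper `rank_candidates` is hypothetical, and the scores are made-up placeholders, not output from this model.

```python
def rank_candidates(query_index, similarity_matrix, sentences):
    """Return (score, sentence) pairs for every non-query sentence, best first."""
    row = similarity_matrix[query_index]
    candidates = [
        (row[i], sentences[i])
        for i in range(len(sentences))
        if i != query_index
    ]
    return sorted(candidates, key=lambda pair: pair[0], reverse=True)


sentences = [
    "Seorang pria sedang berjalan dengan seekor kuda.",
    "Seorang pria sedang menuntun seekor kuda dengan tali kekang.",
    "Seorang pria sedang menembakkan pistol.",
]
# Hypothetical scores for illustration only; NOT real model output.
example_scores = [
    [1.00, 0.80, 0.10],
    [0.80, 1.00, 0.15],
    [0.10, 0.15, 1.00],
]
ranked = rank_candidates(0, example_scores, sentences)
for score, sentence in ranked:
    print(f"{score:.2f}  {sentence}")
```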

Evaluation

Metrics

Semantic Similarity (evaluation set used during training)

| Metric | Value |
|---|---|
| pearson_cosine | 0.8692 |
| spearman_cosine | 0.8677 |
| pearson_manhattan | 0.8592 |
| spearman_manhattan | 0.8626 |
| pearson_euclidean | 0.8599 |
| spearman_euclidean | 0.8633 |
| pearson_dot | 0.8441 |
| spearman_dot | 0.8392 |
| pearson_max | 0.8692 |
| spearman_max | 0.8677 |

Semantic Similarity (STSB test set)

| Metric | Value |
|---|---|
| pearson_cosine | 0.8402 |
| spearman_cosine | 0.8366 |
| pearson_manhattan | 0.8276 |
| spearman_manhattan | 0.8316 |
| pearson_euclidean | 0.8278 |
| spearman_euclidean | 0.8316 |
| pearson_dot | 0.8170 |
| spearman_dot | 0.8083 |
| pearson_max | 0.8402 |
| spearman_max | 0.8366 |
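Each metric name above pairs a correlation (Pearson or Spearman) with a similarity function, and in both tables the `_max` rows equal the highest of the four values for that correlation. The sketch below shows how the two correlations could be computed between gold scores and predicted similarities. It is a pure-Python illustration with made-up numbers, not the evaluator's own code; the helper names are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation: linear agreement between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up gold labels and predicted cosine similarities.
gold = [0.0, 0.2, 0.5, 1.0]
predicted = [0.1, 0.3, 0.35, 0.9]
print(pearson(gold, predicted), spearman(gold, predicted))
```

Spearman only looks at ordering, so a model that ranks every pair correctly reaches 1.0 even when its raw scores are not linearly related to the gold labels.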

Training Details

Training Dataset

Unnamed Dataset

  • Size: 5,749 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:

    |         | sentence1 | sentence2 | score |
    |---|---|---|---|
    | type    | string | string | float |
    | details | min: 6 tokens, mean: 9.65 tokens, max: 25 tokens | min: 6 tokens, mean: 9.59 tokens, max: 24 tokens | min: 0.0, mean: 0.54, max: 1.0 |

  • Samples:

    | sentence1 | sentence2 | score |
    |---|---|---|
    | Sebuah pesawat sedang lepas landas. | Sebuah pesawat terbang sedang lepas landas. | 1.0 |
    | Seorang pria sedang memainkan seruling besar. | Seorang pria sedang memainkan seruling. | 0.76 |
    | Seorang pria sedang mengoleskan keju parut di atas pizza. | Seorang pria sedang mengoleskan keju parut di atas pizza yang belum matang. | 0.76 |

  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
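The loss configuration above pairs cosine similarity with mean squared error: the cosine similarity of each sentence pair is compared against its gold score (0.0 to 1.0 in this dataset). A minimal pure-Python sketch of that objective follows; the function names and toy 2-dimensional embeddings are assumptions for illustration, and the real loss operates on batched tensors with gradients.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cosine_similarity_loss(embeddings_a, embeddings_b, gold_scores):
    """Mean squared error between pairwise cosine similarity and gold scores."""
    squared_errors = [
        (cosine(u, v) - gold) ** 2
        for u, v, gold in zip(embeddings_a, embeddings_b, gold_scores)
    ]
    return sum(squared_errors) / len(squared_errors)


# Toy batch: an identical pair labelled 1.0 and an orthogonal pair labelled 0.0.
batch_a = [[1.0, 0.0], [1.0, 0.0]]
batch_b = [[1.0, 0.0], [0.0, 1.0]]
print(cosine_similarity_loss(batch_a, batch_b, [1.0, 0.0]))
```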
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • fp16: True
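To see how these settings translate into a training schedule, the sketch below derives the step counts and a linear warmup and decay curve from the values listed in this card. It assumes a single device with no gradient accumulation, a final partial batch that is kept, and warmup steps rounded up from `warmup_ratio`; the function name `lr_at` is hypothetical.

```python
import math

TRAIN_SAMPLES = 5749   # training set size stated in this card
BATCH_SIZE = 16        # per_device_train_batch_size
EPOCHS = 5             # num_train_epochs
PEAK_LR = 2e-05        # learning_rate
WARMUP_RATIO = 0.1     # warmup_ratio

# Assumption: single device, last partial batch kept.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - warmup_steps)


print(steps_per_epoch, total_steps, warmup_steps)
for step in (0, warmup_steps, 990, total_steps):
    print(step, lr_at(step))
```

Under these assumptions the totals come to 360 steps per epoch and 1800 steps overall, which is consistent with the training log ending at step 1800 and epoch 5.0.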

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

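| Epoch | Step | Training Loss | spearman_cosine | spearman_max |
|---|---|---|---|---|
| 0.2778 | 100 | 0.0867 | - | - |
| 0.5556 | 200 | 0.0351 | - | - |
| 0.8333 | 300 | 0.0303 | - | - |
| 1.1111 | 400 | 0.0202 | - | - |
| 1.3889 | 500 | 0.0154 | 0.8612 | - |
| 1.6667 | 600 | 0.0136 | - | - |
| 1.9444 | 700 | 0.0145 | - | - |
| 2.2222 | 800 | 0.0082 | - | - |
| 2.5 | 900 | 0.0072 | - | - |
| 2.7778 | 1000 | 0.0068 | 0.8660 | - |
| 3.0556 | 1100 | 0.0065 | - | - |
| 3.3333 | 1200 | 0.0044 | - | - |
| 3.6111 | 1300 | 0.0044 | - | - |
| 3.8889 | 1400 | 0.0045 | - | - |
| 4.1667 | 1500 | 0.0038 | 0.8677 | - |
| 4.4444 | 1600 | 0.0038 | - | - |
| 4.7222 | 1700 | 0.0035 | - | - |
| 5.0 | 1800 | 0.0034 | - | 0.8366 |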

Framework Versions

  • Python: 3.10.13
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.4
  • PyTorch: 2.0.1+cu117
  • Accelerate: 0.32.1
  • Datasets: 2.17.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}