---
base_model: Snowflake/snowflake-arctic-embed-l
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
  - dot_accuracy@1
  - dot_accuracy@3
  - dot_accuracy@5
  - dot_accuracy@10
  - dot_precision@1
  - dot_precision@3
  - dot_precision@5
  - dot_precision@10
  - dot_recall@1
  - dot_recall@3
  - dot_recall@5
  - dot_recall@10
  - dot_ndcg@10
  - dot_mrr@10
  - dot_map@100
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:400
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: >-
      What does it mean for a decision to not be considered arbitrary and
      capricious according to the provided context?
    sentences:
      - >-
        Aravind Srinivas, Founding Story and Journey of Perplexity, YOUTUBE, at
        17:57 (Jan. 18, 2024), 

        https://www.youtube.com/watch?v=ygRVDIwheB4. 

        17 See, e.g., Avoiding Plagiarism Guide, APA Style 7th Edition (last
        visited Aug. 30, 2024), 

        https://apastyle.apa.org/instructional-aids/avoiding-plagiarism.pdf. 

        18 See What is Perplexity?, supra note 1 (promoting Perplexity’s
        “Reliable sources” with an 

        explanation that “[e]very answer is backed by citations from trusted
        news outlets, academic papers, 

        and established blogs”).  

        19 Madhumita Murgia & Cristina Criddle, Perplexity’s popularity surges
        as AI search start-up takes 

        on Google, THE FINANCIAL TIMES (Aug. 9, 2024),
        https://www.ft.com/content/87af3340-2611-

        4650-9ae3-036927e9f65c.
      - >-
        30 
         
        Serv. Comm'n, 43 Mass. App. Ct. 300, 303 (1997). A decision is not
        arbitrary and capricious if 

        "reasonable minds could differ" on the proper outcome. See Kinchla v.
        Board of Appeals of 

        Falmouth, 11 Mass. App. Ct. 927, 927 (1981). 

        In determining the appropriate definition of general words used in a
        statute, the courts may 

        look to sources outside the statute such as "their use in other legal
        contexts: and dictionary 

        definitions." See Commonwealth v. Correia, 17 Mass.App.Ct. 233, 235
        (1983) “Arbitrary” is 

        defined as subject to individual will or judgment without restriction;
        contingent solely upon one's 

        discretion… having unlimited power; uncontrolled or unrestricted by law;
        despotic; tyrannical;
      - |-
        purpose of providing a substitute product.    
        Case 1:24-cv-07984     Document 1     Filed 10/21/24     Page 3 of 42
  - source_sentence: What percentage of applicants were admitted to Stanford last year?
    sentences:
      - >-
        to which RNH is currently applying are extremely competitive and the
        admissions process for 

        admission into such schools is rigorous.  These schools command an
        extensive applicant pool of 

        high academic achievers with high test scores, grade point averages,
        including grades of A’s and 

        B’s only.  Stanford is one of the most competitive schools in the
        country. Last year, 4% of the 

        applicant pool were admitted.  Thousands of extremely well qualified,
        who elsewhere would be 

        highly admissible, were denied. It is essential that any applicant have
        the most competitive 

        transcript possible. A C+ is a red flag that will be noticed far more
        quickly and glaringly than the 

        Case 1:24-cv-12437-WGY   Document 8   Filed 10/08/24   Page 6 of 42
      - >-
        18 
         
        upon in affirming the decision through an appeal to exclude RNH and his
        classmate from the NHS.  

        Id. at ¶145.  At that time, Defendant Swanson and other Defendants knew
        or should have known 

        that the District inducted at least seven students into NHS, who had
        academic infractions on their 

        record, one of which was because of the prior use of AI.  Id. at
        ¶146.   

        The “committee” that adjudicated selection for NHS this year did not
        include teachers who 

        know and are familiar with RNH and his classmate.  Id. at ¶147.  This is
        due to the then escalating 

        contract conflict with the Hingham Educators Association (“HEA”) where
        HEA engaged in an
      - >-
        42 
         
        CERTIFICATE OF SERVICE 
         
        I, Peter S. Farrell, hereby certify that I served a copy of the
        foregoing on all counsel of 

        record pursuant to Local Rule 5.4(c) by causing a copy of the same to be
        electronically filed and 

        served through the CM/ECF filing system to: 
         
        Gareth W. Notis, Esquire 

        Morrison Mahoney LLP 

        250 Summer Street 

        Boston, MA 02210 

        gnotis@morrisonmahoney.com 
         
         
         
         
         
         
         
         
        ______________________________ 
         
         
         
         
         
         
        Peter S. Farrell 
         
        Case 1:24-cv-12437-WGY   Document 8   Filed 10/08/24   Page 42 of 42
  - source_sentence: What is the case number for the document filed on 10/08/24?
    sentences:
      - Case 1:24-cv-07984     Document 1     Filed 10/21/24     Page 19 of 42
      - Case 1:24-cv-12437-WGY   Document 8   Filed 10/08/24   Page 33 of 42
      - >-
        11 See, e.g., Elizabeth Lopatto, Perplexity’s Grand Theft AI, THE VERGE
        (June 27, 2024), 

        https://www.theverge.com/2024/6/27/24187405/perplexity-ai-twitter-lie-plagiarism 

        (describing 

        Perplexity as a “rent-seeking middleman on high-quality sources” that
        “starve[s] the primary 

        source of ad revenue”); Dhruv Mehrotra & Tim Marchman, Perplexity Is a
        Bullsh*t Machine, 

        WIRED 

        (June 

        19, 

        2024), 

        https://www.wired.com/story/perplexity-is-a-bullshit-machine 

        (discussing Perplexity’s reliance on recent news articles for its
        content as well as its tendency to 

        falsely attribute information) (asterisk added); Casey Newton, How to
        Stop Perplexity and save 

        the web from bad AI, PLATFORMER (June 20, 2024),
        https://www.platformer.news/how-to-stop-
  - source_sentence: >-
      How does Perplexity gather and compile information from authoritative
      sources?
    sentences:
      - >-
        utilize have been trained. To employ a RAG system, AI applications
        typically utilize indexed 

        databases that house all the content from which the AI application will
        retrieve specific information 

        to generate outputs for its users. The larger the index, the more
        “answers” the AI application can 

        provide.  

        51. 

        In Perplexity’s words, it “scours the internet, gathering information
        from 

        authoritative sources like articles, websites, and journals.”6 It then,
        “compiles the most relevant 

        insights into a coherent, easy-to-understand answer” automatically
        generated from those original 

        sources.7  

        52. 

        The assembling of authoritative sources for a RAG index is a distinct
        process from
      - >-
        9 

        26. 

        Perplexity processes subscription purchases from customers in this State
        and 

        District, transmits Plaintiffs’ copyrighted content to users in this
        State and District, and has a 

        significant number of customers in this State and District. 

        27. 

        As a direct and proximate result of Perplexity’s unauthorized use
        and/or 

        dissemination of Plaintiffs’ copyrighted works and trademarks in New
        York and elsewhere, 

        Plaintiffs have lost and will continue to lose revenue and profits from
        the market for content 

        licensing, subscribers, visitors, and users. 

        FACTUAL ALLEGATIONS 

        I. 

        Plaintiffs’ Robust Businesses and Copyrighted Works 

        28. 

        Dow Jones began in 1882 as a niche news agency in a Wall Street
        basement,
      - >-
        1 

        UNITED STATES DISTRICT COURT 

        SOUTHERN DISTRICT OF NEW YORK 
         
        DOW JONES & COMPANY, INC.  

        and NYP HOLDINGS, INC., 
         
        Plaintiffs, 
         
        v. 
         
        PERPLEXITY AI, INC., 
         
        Defendant. 
         
         
         
        Civil Action No. 24-cv-7984 
         
         
        COMPLAINT 
         
        JURY TRIAL DEMANDED  
         
        Plaintiffs Dow Jones & Company, Inc. (“Dow Jones”) and NYP Holdings,
        Inc. (“NYP 

        Holdings”) (collectively, “Plaintiffs”), by and through their attorneys,
        Torridon Law PLLC, for 

        their Complaint, hereby allege against Defendant Perplexity AI, Inc.
        (“Perplexity” or 

        “Defendant”), as follows: 

        NATURE OF THE ACTION 

        1. 

        Perplexity is a generative artificial intelligence company that claims
        to provide its
  - source_sentence: >-
      What recent partnership did News Corp enter into regarding licensing
      content for OpenAI's applications?
    sentences:
      - >-
        integrity infractions.   Plain and simple.  It should not take the
        Plaintiffs engaging counsel, 

        demanding information and forcing Hingham to investigate this matter to
        reveal that selection for 

        NHS was a manipulated sham conducted by the Defendants, who at all times
        relevant were state 

        actors. 

        C. The Student Will Suffer Irreparable Harm If The Injunction is Not
        Granted 

        In order for the Plaintiffs to obtain injunctive relief, they must show
        that they are "likely to 

        suffer irreparable injury before a decision is rendered on the merits."
        See Philips Elecs. N. Am. 

        Corp. v. Halperin, 2000 Mass. Super LEXIS 574 citing Sierra Club v.
        Larson, 769 F. Supp. 420,
      - >-
        licensing initiatives abound.”3 For example, News Corp recently
        partnered with OpenAI to license 

        its content for certain uses in OpenAI’s applications. OpenAI users will
        have the benefit of 

        accessing Plaintiffs’ content, whether quoted or summarized by OpenAI.
        This cooperative 

        relationship will allow OpenAI and Plaintiffs to experiment with new
        product experiences and 

        revenue models. 

        15. 

        Generative AI technology can be developed in two ways. It can be
        developed 

        legally by recognizing the legitimate rights of copyright holders and by
        including in the AI business 

        model the legitimate costs and benefits of licensing the copyrighted
        material, or it can be developed
      - >-
        ban or prohibition on the use of AI by students. The Defendants were not
        trained on any policies 

        or procedures for use of AI alone, never mind what they were “able to
        do” to students who used 

        it.    The entire purpose behind having such policies and procedures in
        place is to ensure notice, 

        equity, fairness and to be sure:  a level playing field for all.  
        Making matters worse, there exists 

        no adequate procedures and policies for the induction of an applicant
        into NHS when compared to 

        other members who are inducted despite the same or similar infractions. 
        This is a denial of student 

        rights of the highest order. 
         
        In the case here, RNH was disciplined on an ad hoc and on-going basis
        over more than six
model-index:
  - name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: cosine_accuracy@1
            value: 0.6875
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8541666666666666
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.9583333333333334
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9791666666666666
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.6875
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.28472222222222215
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.19166666666666665
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09791666666666665
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.6875
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.8541666666666666
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.9583333333333334
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9791666666666666
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.8280840444145441
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7793650793650793
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7812590187590187
            name: Cosine Map@100
          - type: dot_accuracy@1
            value: 0.6875
            name: Dot Accuracy@1
          - type: dot_accuracy@3
            value: 0.8541666666666666
            name: Dot Accuracy@3
          - type: dot_accuracy@5
            value: 0.9583333333333334
            name: Dot Accuracy@5
          - type: dot_accuracy@10
            value: 0.9791666666666666
            name: Dot Accuracy@10
          - type: dot_precision@1
            value: 0.6875
            name: Dot Precision@1
          - type: dot_precision@3
            value: 0.28472222222222215
            name: Dot Precision@3
          - type: dot_precision@5
            value: 0.19166666666666665
            name: Dot Precision@5
          - type: dot_precision@10
            value: 0.09791666666666665
            name: Dot Precision@10
          - type: dot_recall@1
            value: 0.6875
            name: Dot Recall@1
          - type: dot_recall@3
            value: 0.8541666666666666
            name: Dot Recall@3
          - type: dot_recall@5
            value: 0.9583333333333334
            name: Dot Recall@5
          - type: dot_recall@10
            value: 0.9791666666666666
            name: Dot Recall@10
          - type: dot_ndcg@10
            value: 0.8280840444145441
            name: Dot Ndcg@10
          - type: dot_mrr@10
            value: 0.7793650793650793
            name: Dot Mrr@10
          - type: dot_map@100
            value: 0.7812590187590187
            name: Dot Map@100
---

# SentenceTransformer based on Snowflake/snowflake-arctic-embed-l

This is a sentence-transformers model finetuned from Snowflake/snowflake-arctic-embed-l. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
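
As a quick illustration of the paraphrase-mining use case mentioned above, the sketch below scores all sentence pairs in a small list. The sentences are hypothetical stand-ins; `paraphrase_mining` is the standard sentence-transformers utility:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import paraphrase_mining

model = SentenceTransformer("llm-wizard/legal-ft-arctic-l")

# Hypothetical snippets; any list of strings works.
sentences = [
    "The court denied the motion for a preliminary injunction.",
    "A preliminary injunction was refused by the court.",
    "Dow Jones publishes The Wall Street Journal.",
]

# Returns [score, i, j] triples, highest-scoring pairs first.
for score, i, j in paraphrase_mining(model, sentences):
    print(f"{score:.3f}  {sentences[i]!r} <-> {sentences[j]!r}")
```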

## Model Details

### Model Description

  • Model Type: Sentence Transformer
  • Base model: Snowflake/snowflake-arctic-embed-l
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
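
For intuition, the three modules above should be equivalent to the following hand-rolled pipeline. This is a sketch under the assumption that the underlying BERT weights load directly via transformers, not the canonical way to use the model:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("llm-wizard/legal-ft-arctic-l")
encoder = AutoModel.from_pretrained("llm-wizard/legal-ft-arctic-l")

batch = tokenizer(["What is the case number?"], padding=True, truncation=True,
                  max_length=512, return_tensors="pt")  # (0) Transformer, max_seq_length=512
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state
cls = hidden[:, 0]                                      # (1) Pooling: CLS token only
emb = torch.nn.functional.normalize(cls, p=2, dim=1)    # (2) Normalize: unit L2 norm
print(emb.shape)  # torch.Size([1, 1024])
```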

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("llm-wizard/legal-ft-arctic-l")
# Run inference
sentences = [
    "What recent partnership did News Corp enter into regarding licensing content for OpenAI's applications?",
    'licensing initiatives abound.”3 For example, News Corp recently partnered with OpenAI to license \nits content for certain uses in OpenAI’s applications. OpenAI users will have the benefit of \naccessing Plaintiffs’ content, whether quoted or summarized by OpenAI. This cooperative \nrelationship will allow OpenAI and Plaintiffs to experiment with new product experiences and \nrevenue models. \n15. \nGenerative AI technology can be developed in two ways. It can be developed \nlegally by recognizing the legitimate rights of copyright holders and by including in the AI business \nmodel the legitimate costs and benefits of licensing the copyrighted material, or it can be developed',
    'integrity infractions.   Plain and simple.  It should not take the Plaintiffs engaging counsel, \ndemanding information and forcing Hingham to investigate this matter to reveal that selection for \nNHS was a manipulated sham conducted by the Defendants, who at all times relevant were state \nactors. \nC. The Student Will Suffer Irreparable Harm If The Injunction is Not Granted \nIn order for the Plaintiffs to obtain injunctive relief, they must show that they are "likely to \nsuffer irreparable injury before a decision is rendered on the merits." See Philips Elecs. N. Am. \nCorp. v. Halperin, 2000 Mass. Super LEXIS 574 citing Sierra Club v. Larson, 769 F. Supp. 420,',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
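
For retrieval-style use, you can rank passages against a query with the same `similarity` helper. A minimal sketch, using short excerpts from this card's own examples as a hypothetical mini-corpus:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("llm-wizard/legal-ft-arctic-l")

query = "Who founded Dow Jones?"
passages = [  # hypothetical mini-corpus
    "founded by reporters Charles Dow, Edward Jones, and Charles Bergstresser.",
    "Stanford is one of the most competitive schools in the country.",
    "I, Peter S. Farrell, hereby certify that I served a copy of the foregoing.",
]

q_emb = model.encode([query])
p_emb = model.encode(passages)
scores = model.similarity(q_emb, p_emb)[0]  # cosine scores, shape (3,)

# Print passages from most to least similar.
for i in scores.argsort(descending=True).tolist():
    print(f"{scores[i]:.3f}  {passages[i]}")
```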

## Evaluation

### Metrics

#### Information Retrieval

| Metric              | Value  |
|:--------------------|:-------|
| cosine_accuracy@1   | 0.6875 |
| cosine_accuracy@3   | 0.8542 |
| cosine_accuracy@5   | 0.9583 |
| cosine_accuracy@10  | 0.9792 |
| cosine_precision@1  | 0.6875 |
| cosine_precision@3  | 0.2847 |
| cosine_precision@5  | 0.1917 |
| cosine_precision@10 | 0.0979 |
| cosine_recall@1     | 0.6875 |
| cosine_recall@3     | 0.8542 |
| cosine_recall@5     | 0.9583 |
| cosine_recall@10    | 0.9792 |
| cosine_ndcg@10      | 0.8281 |
| cosine_mrr@10       | 0.7794 |
| cosine_map@100      | 0.7813 |
| dot_accuracy@1      | 0.6875 |
| dot_accuracy@3      | 0.8542 |
| dot_accuracy@5      | 0.9583 |
| dot_accuracy@10     | 0.9792 |
| dot_precision@1     | 0.6875 |
| dot_precision@3     | 0.2847 |
| dot_precision@5     | 0.1917 |
| dot_precision@10    | 0.0979 |
| dot_recall@1        | 0.6875 |
| dot_recall@3        | 0.8542 |
| dot_recall@5        | 0.9583 |
| dot_recall@10       | 0.9792 |
| dot_ndcg@10         | 0.8281 |
| dot_mrr@10          | 0.7794 |
| dot_map@100         | 0.7813 |
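
The dot-product rows match the cosine rows exactly because the model's final `Normalize()` module produces unit-length embeddings, so the two similarity functions coincide. Metrics like these are computed with sentence-transformers' `InformationRetrievalEvaluator`; a minimal sketch with hypothetical queries and relevance judgments (the actual held-out set is not published here):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("llm-wizard/legal-ft-arctic-l")

# Hypothetical evaluation data: query ids -> text, doc ids -> text,
# and query ids -> set of relevant doc ids.
queries = {"q1": "Who founded Dow Jones?"}
corpus = {
    "d1": "founded by reporters Charles Dow, Edward Jones, and Charles Bergstresser.",
    "d2": "Stanford is one of the most competitive schools in the country.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="example")
results = evaluator(model)  # dict of accuracy@k, precision@k, recall@k, ndcg@10, mrr@10, map@100
print(results["example_cosine_ndcg@10"])
```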

## Training Details

### Training Dataset

#### Unnamed Dataset

  • Size: 400 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 400 samples:

    |         | sentence_0                                         | sentence_1                                           |
    |:--------|:---------------------------------------------------|:-----------------------------------------------------|
    | type    | string                                             | string                                               |
    | details | min: 10 tokens, mean: 20.73 tokens, max: 34 tokens | min: 25 tokens, mean: 140.37 tokens, max: 260 tokens |
  • Samples:

    | sentence_0 | sentence_1 |
    |:-----------|:-----------|
    | How does Perplexity's business model differ from that of traditional search engines? | 11. Perplexity’s business is fundamentally distinct from that of traditional search engines that also copy a vast amount of content into their indices but do so merely to provide links to the originating sites. In its traditional form, a search engine is a tool for discovery, pointing searchers to websites such as the pages of The Wall Street Journal or the New York Post, where the users can click to find the information and answers they seek. Those clicks in turn provide revenue for content producers. In part because traditional search engines that simply provide hyperlinks promote merely the discovery of copyrighted content, and not its substitution (and commercial |
    | What role do clicks on traditional search engines play in the revenue generation for content producers? | 11. Perplexity’s business is fundamentally distinct from that of traditional search engines that also copy a vast amount of content into their indices but do so merely to provide links to the originating sites. In its traditional form, a search engine is a tool for discovery, pointing searchers to websites such as the pages of The Wall Street Journal or the New York Post, where the users can click to find the information and answers they seek. Those clicks in turn provide revenue for content producers. In part because traditional search engines that simply provide hyperlinks promote merely the discovery of copyrighted content, and not its substitution (and commercial |
    | Who were the founders of Dow Jones? | founded by reporters Charles Dow, Edward Jones, and Charles Bergstresser. Publishing the first edition of The Wall Street Journal in July 1889, Dow Jones has now expanded into a worldwide news powerhouse. It creates and distributes some of the most widely recognized and reputable publications in the news industry, including, in addition to The Wall Street Journal, Dow Jones Newswires, MarketWatch, Financial News, and Barron’s. 29. Dow Jones is a trusted source of accurate, original news stories, data and analytics, and financial and business insight for millions of customers across the country and around the world. 30. A recipient of 39 Pulitzer Prizes, the award-winning newsroom at The Wall Street |
  • Loss: MatryoshkaLoss with these parameters:

    ```json
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [768, 512, 256, 128, 64],
        "matryoshka_weights": [1, 1, 1, 1, 1],
        "n_dims_per_step": -1
    }
    ```
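
A practical consequence of MatryoshkaLoss is that embeddings can be truncated to any of the dimensions above with only a modest quality drop. A minimal sketch using the library's `truncate_dim` option (256 is an arbitrary choice from the list):

```python
from sentence_transformers import SentenceTransformer

# Ask the model for truncated embeddings; re-normalize so cosine
# comparisons remain meaningful after truncation.
model_256 = SentenceTransformer("llm-wizard/legal-ft-arctic-l", truncate_dim=256)
emb = model_256.encode(["a shorter, cheaper embedding"], normalize_embeddings=True)
print(emb.shape)  # (1, 256)
```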
    

### Training Hyperparameters

#### Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 10
  • per_device_eval_batch_size: 10
  • num_train_epochs: 10
  • multi_dataset_batch_sampler: round_robin
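
For reference, here is a hedged sketch of how a run with these non-default values could be set up in sentence-transformers 3.x. The dataset rows and output path are placeholders, and the evaluator wiring needed for `eval_strategy: steps` is omitted:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-l")

# Placeholder for the 400 (question, passage) training pairs.
train_dataset = Dataset.from_dict({
    "sentence_0": ["What is the case number for the document filed on 10/08/24?"],
    "sentence_1": ["Case 1:24-cv-12437-WGY   Document 8   Filed 10/08/24   Page 33 of 42"],
})

loss = MatryoshkaLoss(
    model,
    MultipleNegativesRankingLoss(model),
    matryoshka_dims=[768, 512, 256, 128, 64],
)

args = SentenceTransformerTrainingArguments(
    output_dir="legal-ft-arctic-l",  # placeholder path
    num_train_epochs=10,
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
)

trainer = SentenceTransformerTrainer(
    model=model, args=args, train_dataset=train_dataset, loss=loss
)
trainer.train()
```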

#### All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 10
  • per_device_eval_batch_size: 10
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

### Training Logs

| Epoch | Step | cosine_map@100 |
|:-----:|:----:|:--------------:|
| 1.0   | 40   | 0.7519         |
| 1.25  | 50   | 0.8072         |
| 2.0   | 80   | 0.7892         |
| 2.5   | 100  | 0.7949         |
| 3.0   | 120  | 0.7850         |
| 3.75  | 150  | 0.7537         |
| 4.0   | 160  | 0.7905         |
| 5.0   | 200  | 0.7650         |
| 6.0   | 240  | 0.7860         |
| 6.25  | 250  | 0.7806         |
| 7.0   | 280  | 0.7819         |
| 7.5   | 300  | 0.7820         |
| 8.0   | 320  | 0.7820         |
| 8.75  | 350  | 0.7821         |
| 9.0   | 360  | 0.7823         |
| 10.0  | 400  | 0.7813         |

### Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.2.1
  • Transformers: 4.44.2
  • PyTorch: 2.4.1+cu121
  • Accelerate: 0.34.2
  • Datasets: 3.0.2
  • Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss

```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```