korruz committed
Commit 6362a69
1 Parent(s): 6961f8d

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": true,
  "pooling_mode_mean_tokens": false,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
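Per this config, pooling is CLS-token pooling: the first token's hidden state becomes the sentence embedding (the model's separate Normalize module then L2-normalizes it). A minimal NumPy sketch of that behavior, using dummy token embeddings in place of real BERT output:

```python
import numpy as np

# Dummy batch of token embeddings: (batch, seq_len, hidden) = (2, 4, 768)
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 4, 768))

# CLS pooling: take the first token's embedding as the sentence embedding
sentence_embeddings = token_embeddings[:, 0, :]  # shape (2, 768)

# Normalize step: L2-normalize so cosine similarity reduces to a dot product
norms = np.linalg.norm(sentence_embeddings, axis=1, keepdims=True)
sentence_embeddings = sentence_embeddings / norms

print(sentence_embeddings.shape)  # (2, 768)
```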
README.md ADDED
---
base_model: BAAI/bge-base-en-v1.5
datasets: []
language:
- en
library_name: sentence-transformers
license: apache-2.0
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:6300
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: Tesla has implemented various remedial measures, including conducting
    training and audits, and enhancements to its site waste management programs, and
    settlement discussions are ongoing.
  sentences:
  - What regulatory body primarily regulates product safety, efficacy, and other aspects
    in the U.S.?
  - What remedial measures has Tesla implemented in response to the investigation
    of its waste segregation practices?
  - What were the main drivers behind the sales growth of TREMFYA?
- source_sentence: Sales of Alphagan/Combigan in the United States decreased by 40.1%
    from $373 million in 2021 to $121 million in 2023.
  sentences:
  - What were the total revenues from unaffiliated customers in 2021?
  - What was the percentage decrease in sales for Alphagan/Combigan in the United
    States from 2021 to 2023?
  - What percent excess of fair value over carrying value did the Compute reporting
    unit have as of the annual test date in 2023?
- source_sentence: Long-lived and intangible assets are reviewed for impairment based
    on indicators of impairment and the evaluation involves estimating the future
    undiscounted cash flows attributable to the asset groups.
  sentences:
  - How are long-lived and intangible assets evaluated for impairment?
  - What strategies are being adopted to enhance revenue through acquisition according
    to the business plans described?
  - How is impairment evaluated for long-lived assets such as leases, property, and
    equipment?
- source_sentence: Our 2023 operating income was $5.5 billion, an improvement of $1.9
    billion compared to 2022.
  sentences:
  - What was the total unrecognized compensation cost related to unvested stock-based
    awards as of October 29, 2023?
  - What significant financial activity occurred in continuing investing activities
    in 2023?
  - What was the operating income for 2023, and how did it compare to 2022?
- source_sentence: We use raw materials that are subject to price volatility caused
    by weather, supply conditions, political and economic variables and other unpredictable
    factors. We may use futures, options and swap contracts to manage the volatility
    related to the above exposures.
  sentences:
  - What financial instruments does the company use to manage commodity price exposure?
  - What types of legal proceedings is the company currently involved in?
  - What was the net impact of fair value hedging instruments on earnings in 2023?
model-index:
- name: BGE base Financial Matryoshka
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 768
      type: dim_768
    metrics:
    - type: cosine_accuracy@1
      value: 0.6814285714285714
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.82
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8614285714285714
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8942857142857142
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6814285714285714
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2733333333333333
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17228571428571426
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.08942857142857141
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6814285714285714
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.82
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8614285714285714
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8942857142857142
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7922308461157294
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7589693877551015
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7633405151451278
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 512
      type: dim_512
    metrics:
    - type: cosine_accuracy@1
      value: 0.68
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8214285714285714
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8614285714285714
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8957142857142857
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.68
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2738095238095238
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17228571428571426
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.08957142857142855
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.68
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8214285714285714
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8614285714285714
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8957142857142857
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7914243245771438
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7576258503401355
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7617439775393929
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 256
      type: dim_256
    metrics:
    - type: cosine_accuracy@1
      value: 0.69
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8271428571428572
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8571428571428571
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8928571428571429
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.69
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2757142857142857
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.1714285714285714
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.08928571428571426
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.69
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8271428571428572
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8571428571428571
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8928571428571429
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7943028094464931
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7623684807256232
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7661836876217925
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 128
      type: dim_128
    metrics:
    - type: cosine_accuracy@1
      value: 0.6657142857142857
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8042857142857143
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8457142857142858
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8871428571428571
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6657142857142857
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2680952380952381
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.16914285714285712
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.08871428571428569
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6657142857142857
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8042857142857143
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8457142857142858
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8871428571428571
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7784460550829944
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7434297052154194
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.74745032636981
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 64
      type: dim_64
    metrics:
    - type: cosine_accuracy@1
      value: 0.6342857142857142
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.7771428571428571
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8157142857142857
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8642857142857143
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6342857142857142
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.259047619047619
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.16314285714285712
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.08642857142857142
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6342857142857142
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.7771428571428571
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8157142857142857
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8642857142857143
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7508028784634385
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7143225623582764
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7188596090649563
      name: Cosine Map@100
---

# BGE base Financial Matryoshka

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("korruz/bge-base-financial-matryoshka")
# Run inference
sentences = [
    'We use raw materials that are subject to price volatility caused by weather, supply conditions, political and economic variables and other unpredictable factors. We may use futures, options and swap contracts to manage the volatility related to the above exposures.',
    'What financial instruments does the company use to manage commodity price exposure?',
    'What types of legal proceedings is the company currently involved in?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
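Because the model was trained with MatryoshkaLoss over dimensions 768/512/256/128/64, its embeddings can be truncated to a leading prefix and re-normalized with only a small quality drop. A hedged NumPy sketch of that truncate-and-renormalize step, using random unit vectors as dummy stand-ins for `model.encode(...)` output:

```python
import numpy as np

def truncate_embeddings(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and re-L2-normalize (Matryoshka-style)."""
    truncated = embeddings[:, :dim]
    return truncated / np.linalg.norm(truncated, axis=1, keepdims=True)

# Dummy stand-ins for real model output: 3 unit-normalized 768-dim embeddings
full = np.random.default_rng(1).normal(size=(3, 768))
full = full / np.linalg.norm(full, axis=1, keepdims=True)

small = truncate_embeddings(full, 256)
print(small.shape)  # (3, 256)

# Cosine similarities of the truncated vectors are again just dot products
similarities = small @ small.T
print(similarities.shape)  # (3, 3)
```

Recent Sentence Transformers releases also accept a `truncate_dim` argument when constructing `SentenceTransformer`, which performs this truncation internally.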

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Information Retrieval
* Dataset: `dim_768`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6814     |
| cosine_accuracy@3   | 0.82       |
| cosine_accuracy@5   | 0.8614     |
| cosine_accuracy@10  | 0.8943     |
| cosine_precision@1  | 0.6814     |
| cosine_precision@3  | 0.2733     |
| cosine_precision@5  | 0.1723     |
| cosine_precision@10 | 0.0894     |
| cosine_recall@1     | 0.6814     |
| cosine_recall@3     | 0.82       |
| cosine_recall@5     | 0.8614     |
| cosine_recall@10    | 0.8943     |
| cosine_ndcg@10      | 0.7922     |
| cosine_mrr@10       | 0.759      |
| **cosine_map@100**  | **0.7633** |

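The accuracy@k and MRR@10 figures above are derived from ranking the corpus for each query by cosine similarity; each query has a single relevant passage. As a reference for how such figures are computed, a small self-contained sketch in plain Python, using hypothetical ranks rather than this model's actual output:

```python
def accuracy_at_k(ranks, k):
    """Fraction of queries whose relevant document ranks within the top k (1-indexed)."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)

def mrr_at_k(ranks, k):
    """Mean reciprocal rank, counting only hits within the top k."""
    return sum(1.0 / r for r in ranks if r is not None and r <= k) / len(ranks)

# Hypothetical ranks of the single relevant passage for 5 queries (None = not retrieved)
ranks = [1, 3, 2, None, 1]

print(accuracy_at_k(ranks, 1))  # 0.4
print(accuracy_at_k(ranks, 3))  # 0.8
print(mrr_at_k(ranks, 10))      # (1 + 1/3 + 1/2 + 0 + 1) / 5 ≈ 0.5667
```

Note that with one relevant document per query, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k, which matches the pattern visible in the tables.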
#### Information Retrieval
* Dataset: `dim_512`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.68       |
| cosine_accuracy@3   | 0.8214     |
| cosine_accuracy@5   | 0.8614     |
| cosine_accuracy@10  | 0.8957     |
| cosine_precision@1  | 0.68       |
| cosine_precision@3  | 0.2738     |
| cosine_precision@5  | 0.1723     |
| cosine_precision@10 | 0.0896     |
| cosine_recall@1     | 0.68       |
| cosine_recall@3     | 0.8214     |
| cosine_recall@5     | 0.8614     |
| cosine_recall@10    | 0.8957     |
| cosine_ndcg@10      | 0.7914     |
| cosine_mrr@10       | 0.7576     |
| **cosine_map@100**  | **0.7617** |

#### Information Retrieval
* Dataset: `dim_256`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.69       |
| cosine_accuracy@3   | 0.8271     |
| cosine_accuracy@5   | 0.8571     |
| cosine_accuracy@10  | 0.8929     |
| cosine_precision@1  | 0.69       |
| cosine_precision@3  | 0.2757     |
| cosine_precision@5  | 0.1714     |
| cosine_precision@10 | 0.0893     |
| cosine_recall@1     | 0.69       |
| cosine_recall@3     | 0.8271     |
| cosine_recall@5     | 0.8571     |
| cosine_recall@10    | 0.8929     |
| cosine_ndcg@10      | 0.7943     |
| cosine_mrr@10       | 0.7624     |
| **cosine_map@100**  | **0.7662** |

#### Information Retrieval
* Dataset: `dim_128`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6657     |
| cosine_accuracy@3   | 0.8043     |
| cosine_accuracy@5   | 0.8457     |
| cosine_accuracy@10  | 0.8871     |
| cosine_precision@1  | 0.6657     |
| cosine_precision@3  | 0.2681     |
| cosine_precision@5  | 0.1691     |
| cosine_precision@10 | 0.0887     |
| cosine_recall@1     | 0.6657     |
| cosine_recall@3     | 0.8043     |
| cosine_recall@5     | 0.8457     |
| cosine_recall@10    | 0.8871     |
| cosine_ndcg@10      | 0.7784     |
| cosine_mrr@10       | 0.7434     |
| **cosine_map@100**  | **0.7475** |

#### Information Retrieval
* Dataset: `dim_64`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6343     |
| cosine_accuracy@3   | 0.7771     |
| cosine_accuracy@5   | 0.8157     |
| cosine_accuracy@10  | 0.8643     |
| cosine_precision@1  | 0.6343     |
| cosine_precision@3  | 0.259      |
| cosine_precision@5  | 0.1631     |
| cosine_precision@10 | 0.0864     |
| cosine_recall@1     | 0.6343     |
| cosine_recall@3     | 0.7771     |
| cosine_recall@5     | 0.8157     |
| cosine_recall@10    | 0.8643     |
| cosine_ndcg@10      | 0.7508     |
| cosine_mrr@10       | 0.7143     |
| **cosine_map@100**  | **0.7189** |

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 6,300 training samples
* Columns: <code>positive</code> and <code>anchor</code>
* Approximate statistics based on the first 1000 samples:

  |         | positive | anchor |
  |:--------|:---------|:-------|
  | type    | string   | string |
  | details | <ul><li>min: 8 tokens</li><li>mean: 45.15 tokens</li><li>max: 281 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 20.65 tokens</li><li>max: 42 tokens</li></ul> |
* Samples:

  | positive | anchor |
  |:---------|:-------|
  | <code>The sale and donation transactions closed in June 2022. Total proceeds from the sale were approximately $6,300 (net of transaction and closing costs), resulting in a loss of $13,568, which was recorded in the SM&A expense caption within the Consolidated Statements of Income.</code> | <code>What were Hershey's total proceeds from the sale of a building portion in June 2022, and what was the resulting financial impact?</code> |
  | <code>Operating income margin increased to 7.9% in fiscal 2022 compared to 6.9% in fiscal 2021.</code> | <code>What was the operating income margin for fiscal year 2022 compared to fiscal year 2021?</code> |
  | <code>iPhone® is the Company’s line of smartphones based on its iOS operating system. The iPhone line includes iPhone 15 Pro, iPhone 15, iPhone 14, iPhone 13 and iPhone SE®.</code> | <code>What operating system is used for the Company's iPhone line?</code> |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```
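MultipleNegativesRankingLoss treats each anchor's paired positive as the correct answer and the other in-batch positives as negatives, i.e., a cross-entropy over the batch similarity matrix; MatryoshkaLoss applies the same objective to each truncated embedding prefix and sums with the weights above. A minimal NumPy sketch of this objective (an illustration of the idea, not the library's implementation):

```python
import numpy as np

def mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch negatives cross-entropy: row i's target is column i of the sim matrix."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                   # (batch, batch) scaled cosine sims
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def matryoshka_loss(anchors, positives, dims=(768, 512, 256, 128, 64), weights=(1, 1, 1, 1, 1)):
    """Sum the ranking loss over truncated prefixes of the embeddings."""
    return sum(w * mnr_loss(anchors[:, :d], positives[:, :d]) for d, w in zip(dims, weights))

# Random dummy embeddings standing in for encoder output
rng = np.random.default_rng(0)
anchors, positives = rng.normal(size=(8, 768)), rng.normal(size=(8, 768))
print(matryoshka_loss(anchors, positives))  # a positive scalar
```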

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 4
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `tf32`: True
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: True
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `eval_use_gather_object`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch      | Step   | Training Loss | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 |
|:----------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|:----------------------:|
| 0.9697     | 6      | -             | 0.7248                 | 0.7459                 | 0.7534                 | 0.6859                | 0.7549                 |
| 1.6162     | 10     | 2.3046        | -                      | -                      | -                      | -                     | -                      |
| 1.9394     | 12     | -             | 0.7456                 | 0.7601                 | 0.7590                 | 0.7111                | 0.7599                 |
| 2.9091     | 18     | -             | 0.7470                 | 0.7652                 | 0.7618                 | 0.7165                | 0.7622                 |
| 3.2323     | 20     | 1.0018        | -                      | -                      | -                      | -                     | -                      |
| **3.8788** | **24** | **-**         | **0.7475**             | **0.7662**             | **0.7617**             | **0.7189**            | **0.7633**             |

* The bold row denotes the saved checkpoint.

### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.44.0
- PyTorch: 2.4.0+cu121
- Accelerate: 0.33.0
- Datasets: 2.21.0
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title = {Matryoshka Representation Learning},
    author = {Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year = {2024},
    eprint = {2205.13147},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title = {Efficient Natural Language Response Suggestion for Smart Reply},
    author = {Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year = {2017},
    eprint = {1705.00652},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_name_or_path": "BAAI/bge-base-en-v1.5",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.44.0",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.44.0",
+     "pytorch": "2.4.0+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:06dd6de9d8abfbeed2daa12e2db63dbf785020f2efd8a33237265690e07bf96d
+ size 437951328
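As a sanity check on the diff above, the 437,951,328-byte safetensors payload lines up with the float32 parameter count implied by config.json. A rough tally, assuming a standard `BertModel` checkpoint that also stores the dense pooler head (the small remainder on disk is the safetensors header):

```python
# Parameter tally for the BertModel described by config.json above.
cfg = {"vocab_size": 30522, "hidden_size": 768, "num_hidden_layers": 12,
       "intermediate_size": 3072, "max_position_embeddings": 512,
       "type_vocab_size": 2}

h, i = cfg["hidden_size"], cfg["intermediate_size"]

# Embeddings: word + position + token-type tables, plus one LayerNorm (weight + bias).
embeddings = h * (cfg["vocab_size"] + cfg["max_position_embeddings"]
                  + cfg["type_vocab_size"]) + 2 * h

# Each encoder layer: Q/K/V/output projections (weight + bias),
# two LayerNorms, and the two feed-forward projections.
per_layer = 4 * (h * h + h) + 2 * (2 * h) + (h * i + i) + (i * h + h)

pooler = h * h + h  # dense pooler head, assumed saved with the checkpoint

total = embeddings + cfg["num_hidden_layers"] * per_layer + pooler
print(total)      # 109482240 parameters
print(total * 4)  # 437928960 bytes of float32 weights, vs. 437951328 on disk
```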
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
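modules.json above chains three stages: the Transformer backbone produces per-token embeddings, the Pooling module keeps only the [CLS] token vector (per `pooling_mode_cls_token` in 1_Pooling/config.json), and the Normalize module rescales it to unit length. The last two stages are simple enough to sketch in plain Python with made-up numbers (illustrative values, not real embeddings or the library's actual code):

```python
import math

# Dummy per-token embeddings for one sentence (seq_len=3, dim=4).
token_embeddings = [
    [3.0, 4.0, 0.0, 0.0],   # [CLS] token embedding
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.2, 0.1, 0.9],
]

# 1_Pooling with pooling_mode_cls_token: keep only the first token's vector.
cls = token_embeddings[0]

# 2_Normalize: scale to unit L2 norm, so a dot product equals cosine similarity.
norm = math.sqrt(sum(x * x for x in cls))
embedding = [x / norm for x in cls]

print(embedding)  # [0.6, 0.8, 0.0, 0.0]
```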
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
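tokenizer_config.json pins the special-token ids (0 = [PAD], 101 = [CLS], 102 = [SEP]) and `model_max_length` of 512. How those values frame, truncate, and pad a single input sequence can be sketched as follows; `build_inputs` is a hypothetical helper (not part of the tokenizer API), the token ids are made up, and `max_len` is shortened to 10 for readability:

```python
def build_inputs(token_ids, cls_id=101, sep_id=102, pad_id=0, max_len=10):
    """Frame a raw id sequence the way a BERT tokenizer would:
    [CLS] ... [SEP], truncated to max_len, then padded with [PAD]."""
    ids = [cls_id] + token_ids[: max_len - 2] + [sep_id]
    attention_mask = [1] * len(ids) + [0] * (max_len - len(ids))
    ids = ids + [pad_id] * (max_len - len(ids))
    return ids, attention_mask

ids, mask = build_inputs([7592, 2088])  # made-up token ids for illustration
print(ids)   # [101, 7592, 2088, 102, 0, 0, 0, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
```

The attention mask marks the real tokens (including [CLS] and [SEP]) so the model ignores padding positions.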
vocab.txt ADDED
The diff for this file is too large to render. See raw diff