Muennighoff committed on
Commit
e5687b2
1 Parent(s): d07d854

Add model_meta to make space show up

Files changed (1)
  1. model_meta.yml +256 -0
model_meta.yml ADDED
@@ -0,0 +1,256 @@
+model_meta:
+  sentence-transformers/all-MiniLM-L6-v2:
+    link: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
+    revision: 8b3219a92973c328a8e22fadcfa821b5dc75636a
+    desc: all-MiniLM-L6-v2 by Sentence Transformers
+    seq_len: 512
+    size: 23
+    dim: 384
+    license: Apache 2.0
+    organization: Sentence Transformers
+    mteb_overall: 56.26
+    mteb_retrieval: 41.95
+    mteb_sts: 78.90
+    mteb_clustering: 42.35
+  intfloat/multilingual-e5-small:
+    link: https://huggingface.co/intfloat/multilingual-e5-small
+    revision: e4ce9877abf3edfe10b0d82785e83bdcb973e22e
+    desc: multilingual-e5-small by Microsoft
+    seq_len: 512
+    size: 44
+    dim: 384
+    license: MIT License
+    organization: Microsoft
+    mteb_overall: 57.87
+    mteb_retrieval: 46.64
+    mteb_sts: 79.10
+    mteb_clustering: 37.08
+  intfloat/multilingual-e5-large-instruct:
+    link: https://huggingface.co/intfloat/multilingual-e5-large-instruct
+    revision: baa7be480a7de1539afce709c8f13f833a510e0a
+    desc: multilingual-e5-large-instruct by Microsoft
+    seq_len: 514
+    size: 560
+    dim: 1024
+    license: MIT License
+    organization: Microsoft
+    instruction_query_arxiv: Given a query, retrieve a relevant paper title and abstract from arXiv
+    instruction_query_wikipedia: Given a query, retrieve a relevant title and passage from Wikipedia
+    instruction_query_stackexchange: Given a query, retrieve a relevant question and answer from Stack Exchange
+    instruction_sts: Retrieve semantically similar text
+    instruction_clustering: Identify the topic/theme/category of the text
+    mteb_overall: 64.41
+    mteb_retrieval: 52.47
+    mteb_sts: 84.78
+    mteb_clustering: 47.10
+  intfloat/e5-mistral-7b-instruct:
+    link: https://huggingface.co/intfloat/e5-mistral-7b-instruct
+    revision: 07163b72af1488142a360786df853f237b1a3ca1
+    desc: e5-mistral-7b-instruct by Microsoft
+    seq_len: 32768
+    size: 7111
+    dim: 4096
+    license: MIT License
+    organization: Microsoft
+    instruction_query_arxiv: Given a query, retrieve a relevant paper title and abstract from arXiv
+    instruction_query_wikipedia: Given a query, retrieve a relevant title and passage from Wikipedia
+    instruction_query_stackexchange: Given a query, retrieve a relevant question and answer from Stack Exchange
+    instruction_sts: Retrieve semantically similar text
+    instruction_clustering: Identify the topic/theme/category of the text
+    mteb_overall: 66.63
+    mteb_retrieval: 56.89
+    mteb_sts: 84.63
+    mteb_clustering: 50.26
+  GritLM/GritLM-7B:
+    link: https://huggingface.co/GritLM/GritLM-7B
+    revision: 13f00a0e36500c80ce12870ea513846a066004af
+    desc: GritLM-7B by Contextual AI, HKU, Microsoft
+    seq_len: 32768
+    size: 7240
+    dim: 4096
+    license: Apache 2.0
+    organization: Contextual AI, HKU, Microsoft
+    instruction_query_arxiv: Given a query, retrieve a relevant paper title and abstract from arXiv
+    instruction_query_wikipedia: Given a query, retrieve a relevant title and passage from Wikipedia
+    instruction_query_stackexchange: Given a query, retrieve a relevant question and answer from Stack Exchange
+    instruction_sts: Retrieve semantically similar text
+    instruction_clustering: Identify the topic/theme/category of the text
+    mteb_overall: 66.76
+    mteb_retrieval: 57.41
+    mteb_sts: 83.35
+    mteb_clustering: 50.61
+  BAAI/bge-large-en-v1.5:
+    link: https://huggingface.co/BAAI/bge-large-en-v1.5
+    revision: d4aa6901d3a41ba39fb536a557fa166f842b0e09
+    desc: bge-large-en-v1.5 by BAAI
+    seq_len: 512
+    size: 335
+    dim: 1024
+    license: MIT
+    organization: BAAI
+    mteb_overall: 64.23
+    mteb_retrieval: 54.29
+    mteb_sts: 83.11
+    mteb_clustering: 46.08
+  nvidia/NV-Embed-v1:
+    link: https://huggingface.co/nvidia/NV-Embed-v1
+    revision: 77b11725df91ca45663471a0f2ec6c06e04cbadb
+    desc: NV-Embed-v1 by Nvidia
+    seq_len: 32768
+    size: 7851
+    dim: 4096
+    license: CC-BY-NC-4.0
+    organization: Nvidia
+    mteb_overall: 69.32
+    mteb_retrieval: 59.36
+    mteb_sts: 82.84
+    mteb_clustering: 52.8
+  Alibaba-NLP/gte-Qwen2-7B-instruct:
+    link: https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct
+    revision: e26182b2122f4435e8b3ebecbf363990f409b45b
+    desc: gte-Qwen2-7B-instruct by Alibaba
+    seq_len: 131072
+    size: 7613
+    dim: 3584
+    license: Apache 2.0
+    organization: Alibaba
+    instruction_query_arxiv: Given a query, retrieve a relevant paper title and abstract from arXiv
+    instruction_query_wikipedia: Given a query, retrieve a relevant title and passage from Wikipedia
+    instruction_query_stackexchange: Given a query, retrieve a relevant question and answer from Stack Exchange
+    instruction_clustering: Identify the topic/theme/category of the text
+    instruction_sts: Retrieve semantically similar text
+    mteb_overall: 70.24
+    mteb_retrieval: 60.25
+    mteb_sts: 83.04
+    mteb_clustering: 56.92
+  Salesforce/SFR-Embedding-2_R:
+    link: https://huggingface.co/Salesforce/SFR-Embedding-2_R
+    revision: 91762139d94ed4371a9fa31db5551272e0b83818
+    desc: SFR-Embedding-2_R by Salesforce
+    seq_len: 32768
+    size: 7111
+    dim: 4096
+    license: CC-BY-NC-4.0
+    organization: Salesforce
+    instruction_query_arxiv: Given a query, retrieve a relevant paper title and abstract from arXiv
+    instruction_query_wikipedia: Given a query, retrieve a relevant title and passage from Wikipedia
+    instruction_query_stackexchange: Given a query, retrieve a relevant question and answer from Stack Exchange
+    instruction_clustering: Identify the topic/theme/category of the text
+    instruction_sts: Retrieve semantically similar text
+    mteb_overall: 70.31
+    mteb_retrieval: 60.18
+    mteb_sts: 81.26
+    mteb_clustering: 56.17
+  jinaai/jina-embeddings-v2-base-en:
+    link: https://huggingface.co/jinaai/jina-embeddings-v2-base-en
+    revision: 31b72fbf354fea65264ec54edf0b189d94b92d39
+    desc: jina-embeddings-v2-base-en by Jina AI
+    seq_len: 8192
+    size: 137
+    dim: 768
+    license: Apache 2.0
+    organization: Jina AI
+    mteb_overall: 60.38
+    mteb_retrieval: 47.87
+    mteb_sts: 80.70
+    mteb_clustering: 41.73
+  mixedbread-ai/mxbai-embed-large-v1:
+    link: https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1
+    revision: 990580e27d329c7408b3741ecff85876e128e203
+    desc: mxbai-embed-large-v1 by mixedbread.ai
+    seq_len: 512
+    size: 335
+    dim: 1024
+    license: Apache 2.0
+    organization: mixedbread.ai
+    mteb_overall: 64.68
+    mteb_retrieval: 54.39
+    mteb_sts: 85.00
+    mteb_clustering: 46.71
+  nomic-ai/nomic-embed-text-v1.5:
+    link: https://huggingface.co/nomic-ai/nomic-embed-text-v1.5
+    revision: b0753ae76394dd36bcfb912a46018088bca48be0
+    desc: nomic-embed-text-v1.5 by nomic.ai
+    seq_len: 8192
+    size: 137
+    dim: 768
+    license: Apache 2.0
+    organization: nomic.ai
+    mteb_overall: 62.28
+    mteb_retrieval: 53.01
+    mteb_sts: 81.94
+    mteb_clustering: 43.93
+  McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised:
+    link: https://huggingface.co/McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised
+    revision: baa8ebf04a1c2500e61288e7dad65e8ae42601a7
+    desc: LLM2Vec by McGill
+    seq_len: 8192
+    size: 7505
+    dim: 4096
+    license: MIT
+    organization: McGill
+    mteb_overall: 65.01
+    mteb_retrieval: 56.63
+    mteb_sts: 83.58
+    mteb_clustering: 46.45
+  voyage-multilingual-2:
+    link: https://docs.voyageai.com/docs/embeddings
+    revision: "1"
+    desc: voyage-multilingual-2 by Voyage AI
+    seq_len: 32000
+    dim: 1024
+    license: Proprietary
+    organization: Voyage AI
+  voyage-large-2-instruct:
+    link: https://docs.voyageai.com/docs/embeddings
+    revision: "1"
+    desc: voyage-large-2-instruct by Voyage AI
+    seq_len: 16000
+    dim: 1024
+    license: Proprietary
+    organization: Voyage AI
+    mteb_overall: 68.28
+    mteb_retrieval: 58.28
+    mteb_sts: 84.58
+    mteb_clustering: 53.35
+  text-embedding-004:
+    link: https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api
+    revision: "1"
+    desc: text-embedding-004 by Google
+    seq_len: 2048
+    dim: 768
+    license: Proprietary
+    organization: Google
+    mteb_overall: 66.31
+    mteb_retrieval: 55.7
+    mteb_sts: 85.07
+    mteb_clustering: 47.48
+  text-embedding-3-large:
+    link: https://platform.openai.com/docs/guides/embeddings
+    revision: "1"
+    desc: text-embedding-3-large by OpenAI
+    seq_len: 8191
+    dim: 3072
+    license: Proprietary
+    organization: OpenAI
+    mteb_overall: 64.59
+    mteb_retrieval: 55.44
+    mteb_sts: 81.73
+    mteb_clustering: 49.01
+  embed-english-v3.0:
+    link: https://docs.cohere.com/docs/cohere-embed
+    revision: "1"
+    desc: embed-english-v3.0 by Cohere
+    seq_len: 512
+    dim: 1024
+    license: Proprietary
+    organization: Cohere
+    mteb_overall: 64.47
+    mteb_retrieval: 55
+    mteb_sts: 82.62
+    mteb_clustering: 47.43
+  BM25:
+    link: https://github.com/xhluca/bm25s
+    desc: Fast lexical search via BM25
+    license: MIT
+    mteb_retrieval: 42.4
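
The Space code that consumes this file is not part of the commit; the following is a minimal sketch, assuming PyYAML, of how model_meta.yml could be loaded and queried. The key names come from the file above, while the file path, the example lookup, and the ranking logic are illustrative only.

    # Sketch (not from this commit): read model_meta.yml and inspect a few fields.
    import yaml

    with open("model_meta.yml") as f:
        model_meta = yaml.safe_load(f)["model_meta"]

    # Look up one model's metadata by its key.
    meta = model_meta["sentence-transformers/all-MiniLM-L6-v2"]
    print(meta["link"])               # model card URL
    print(meta["dim"])                # embedding dimension (384)
    print(meta.get("mteb_overall"))   # overall MTEB score; missing for some entries

    # Rank only the entries that report an overall MTEB score.
    ranked = sorted(
        ((name, m["mteb_overall"]) for name, m in model_meta.items() if "mteb_overall" in m),
        key=lambda item: item[1],
        reverse=True,
    )
    for name, score in ranked[:3]:
        print(f"{name}: {score}")

Because fields such as mteb_overall, size, or the instruction_* prompts are present only for some models, consumers should treat every field except link and desc as optional, as the .get()/membership checks above do.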