Update README.md

README.md (changed):
## Intended Usage & Model Info

`jina-embeddings-v2-small-en` is an English, monolingual **embedding model** supporting **8192 sequence length**.
It is based on a BERT architecture (JinaBert) that supports the symmetric bidirectional variant of [ALiBi](https://arxiv.org/abs/2108.12409) to allow longer sequence lengths.
The backbone `jina-bert-v2-small-en` is pretrained on the C4 dataset.
The model is further trained on Jina AI's collection of more than 400 million sentence pairs and hard negatives.
These pairs were obtained from various domains and were carefully selected through a thorough cleaning process.
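As a side note on the architecture: the symmetric bidirectional ALiBi variant replaces positional embeddings with a per-head penalty on each attention score that grows linearly with token distance, in both directions. A minimal numpy sketch of that bias matrix, assuming the geometric slope schedule from the ALiBi paper — an illustration only, not the model's actual implementation:

```python
import numpy as np

def symmetric_alibi_bias(seq_len: int, num_heads: int) -> np.ndarray:
    """Symmetric (bidirectional) ALiBi: bias[h, i, j] = -slope_h * |i - j|."""
    positions = np.arange(seq_len)
    # Absolute token distance, shape (seq_len, seq_len); symmetric by construction.
    distance = np.abs(positions[:, None] - positions[None, :])
    # Geometric slope schedule from the ALiBi paper: 2**(-8h/num_heads), h = 1..num_heads.
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    # Broadcast to (num_heads, seq_len, seq_len); added to attention logits before softmax.
    return -slopes[:, None, None] * distance

bias = symmetric_alibi_bias(seq_len=4, num_heads=2)
```

Because the penalty depends only on relative distance, the same function extrapolates to sequence lengths longer than those seen in pretraining, which is what enables the 8192-token context.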
Additionally, we provide the following embedding models:

### V1 (Based on T5, 512 Seq)

- [`jina-embeddings-v1-small-en`](https://huggingface.co/jinaai/jina-embedding-s-en-v1): 35 million parameters.
- [`jina-embeddings-v1-base-en`](https://huggingface.co/jinaai/jina-embedding-b-en-v1): 110 million parameters.
- [`jina-embeddings-v1-large-en`](https://huggingface.co/jinaai/jina-embedding-l-en-v1): 330 million parameters.

### V2 (Based on JinaBert, 8k Seq)

- [`jina-embeddings-v2-small-en`](https://huggingface.co/jinaai/jina-embeddings-v2-small-en): 33 million parameters **(you are here)**.
- [`jina-embeddings-v2-base-en`](https://huggingface.co/jinaai/jina-embeddings-v2-base-en): 137 million parameters.
- [`jina-embeddings-v2-large-en`](): 435 million parameters (releasing soon).
## Data & Parameters

Jina Embeddings V2 technical report coming soon.

Jina Embeddings V1 [technical report](https://arxiv.org/abs/2307.11224).

## Usage
```python
from transformers import AutoModel
from numpy.linalg import norm

cos_sim = lambda a, b: (a @ b.T) / (norm(a) * norm(b))
model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-small-en', trust_remote_code=True)  # trust_remote_code is needed to use the encode method
embeddings = model.encode(['How is the weather today?', 'What is the current weather like today?'])
print(cos_sim(embeddings[0], embeddings[1]))
```
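The `cos_sim` helper above is plain numpy, so the same pattern extends to ranking several candidates against a query embedding. A hedged sketch using made-up 4-d vectors in place of real `model.encode` output (real embeddings from this model have a higher dimension):

```python
import numpy as np
from numpy.linalg import norm

cos_sim = lambda a, b: (a @ b.T) / (norm(a) * norm(b))

# Hypothetical embeddings standing in for model.encode(...) output.
query = np.array([1.0, 0.0, 1.0, 0.0])
candidates = {
    "weather report": np.array([0.9, 0.1, 1.1, 0.0]),
    "cooking recipe": np.array([0.0, 1.0, 0.0, 1.0]),
}

# Sort candidate keys by cosine similarity to the query, most similar first.
ranked = sorted(candidates, key=lambda k: cos_sim(query, candidates[k]), reverse=True)
print(ranked)
```

Cosine similarity ignores vector magnitude, so it ranks by direction alone — a common choice for comparing sentence embeddings.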