feat: remove the ColBERT indexing & searching part
README.md
CHANGED
@@ -123,6 +123,9 @@ This new release adds new functionality and performance improvements:
 - [Matryoshka embeddings](https://huggingface.co/blog/matryoshka), which allow users to trade between efficiency and precision flexibly.
 - Superior retrieval performance when compared to the English-only `jina-colbert-v1-en`.
 
+[`jina-colbert-v1-en`](https://huggingface.co/jinaai/jina-colbert-v1-en)
+
+
 ## Usage
 
 ### Installation
@@ -157,52 +160,6 @@ results = RAG.search(query)
 ```
 
 ### Stanford ColBERT
-Typically, you would run the following code to index using the Stanford ColBERT library on a GPU machine. Check the reference at [Stanford ColBERT](https://github.com/stanford-futuredata/ColBERT?tab=readme-ov-file#installation) for more details.
-
-#### Indexing
-
-```python
-from colbert import Indexer
-from colbert.infra import ColBERTConfig
-
-if __name__ == "__main__":
-    config = ColBERTConfig(
-        doc_maxlen=512,
-        nbits=2
-    )
-    indexer = Indexer(
-        checkpoint="jinaai/jina-colbert-v2",
-        config=config,
-    )
-    docs = [
-        "ColBERT is a novel ranking model that adapts deep LMs for efficient retrieval.",
-        "Jina-ColBERT is a ColBERT-style model but based on JinaBERT so it can support both 8k context length, fast and accurate retrieval."
-    ]
-    indexer.index(name='demo', collection=docs)
-```
-
-#### Searching
-
-```python
-from colbert import Searcher
-from colbert.infra import ColBERTConfig
-
-k = 10
-
-if __name__ == "__main__":
-    config = ColBERTConfig(
-        query_maxlen=128
-    )
-    searcher = Searcher(
-        index='demo',
-        config=config
-    )
-    query = 'What does ColBERT do?'
-    results = searcher.search(query, k=k)
-
-```
-
-#### Creating vectors
 
 ```python
 from colbert.infra import ColBERTConfig
@@ -324,6 +281,8 @@ Additionally, we provide the following embedding models, you can also use them f
 - [`jina-embeddings-v2-base-zh`](https://huggingface.co/jinaai/jina-embeddings-v2-base-zh): 161 million parameters Chinese-English bilingual model.
 - [`jina-embeddings-v2-base-de`](https://huggingface.co/jinaai/jina-embeddings-v2-base-de): 161 million parameters German-English bilingual model.
 - [`jina-embeddings-v2-base-es`](https://huggingface.co/jinaai/jina-embeddings-v2-base-es): 161 million parameters Spanish-English bilingual model.
+- [`jina-reranker-v2`](https://huggingface.co/jinaai/jina-reranker-v2-base-multilingual): multilingual reranker model.
+- [`jina-clip-v1`](https://huggingface.co/jinaai/jina-clip-v1): English multimodal (text-image) embedding model.
 
 ## Contact
 