Jack Morris committed
Commit 66e6e6c • 1 parent: 53aaec6

README formatting

Files changed (1): README.md +4 -0
README.md CHANGED
@@ -8648,10 +8648,14 @@ model-index:
 Our new model that naturally integrates "context tokens" into the embedding process. As of October 1st, 2024, `cde-small-v1` is the best small model (under 400M params) on the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for text embedding models, with an average score of 65.00.
 
 👉 <b><a href="https://colab.research.google.com/drive/1r8xwbp7_ySL9lP-ve4XMJAHjidB9UkbL?usp=sharing">Try on Colab</a></b>
+<br>
 👉 <b><a href="https://arxiv.org/abs/2410.02525">Contextual Document Embeddings (ArXiv)</a></b>
 
 ![CDE Overview Figure](https://i.imgur.com/LyXJZjM.png)
 
+<br>
+<hr>
+
 # How to use `cde-small-v1`
 
 Our embedding model needs to be used in *two stages*. The first stage is to gather some dataset information by embedding a subset of the corpus using our "first-stage" model. The second stage is to actually embed queries and documents, conditioning on the corpus information from the first stage. Note that we can do the first stage part offline and only use the second-stage weights at inference time.
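
For readers landing on this commit page, here is a minimal sketch of the two-stage flow described in the paragraph above. It assumes the model loads through Sentence Transformers with `trust_remote_code=True`, that prompts are selected via `prompt_name`, and that the second stage receives the first-stage corpus embeddings through a `dataset_embeddings` keyword; these names are assumptions here, so defer to the full `cde-small-v1` README for the exact interface.

```python
from sentence_transformers import SentenceTransformer

# Load cde-small-v1; the repository ships custom modeling code,
# so trust_remote_code is assumed to be required.
model = SentenceTransformer("jxm/cde-small-v1", trust_remote_code=True)

# Stage 1 (can run offline): embed a small sample of the corpus to build
# the "context" that the model conditions on.
minicorpus = [
    "Contextual document embeddings condition on a sample of the target corpus.",
    "cde-small-v1 is a text embedding model with under 400M parameters.",
]
dataset_embeddings = model.encode(
    minicorpus,
    prompt_name="document",              # assumed prompt name
    convert_to_tensor=True,
)

# Stage 2 (at inference time): embed documents and queries, conditioning on
# the first-stage corpus embeddings (keyword name assumed).
doc_embeddings = model.encode(
    ["Contextual Document Embeddings integrate corpus context into each embedding."],
    prompt_name="document",
    dataset_embeddings=dataset_embeddings,
    convert_to_tensor=True,
)
query_embeddings = model.encode(
    ["What is a contextual document embedding?"],
    prompt_name="query",
    dataset_embeddings=dataset_embeddings,
    convert_to_tensor=True,
)

# Score query-document pairs with a dot product
# (equivalent to cosine similarity if the embeddings are normalized).
scores = query_embeddings @ doc_embeddings.T
```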