joe32140 committed on
Commit ec7e318
1 Parent(s): c00b71b

Upload README.md

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -9068,8 +9068,9 @@ model-index:

  This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on the [msmarco-co-condenser-margin-mse-sym-mnrl-mean-v1](https://huggingface.co/datasets/sentence-transformers/msmarco-co-condenser-margin-mse-sym-mnrl-mean-v1) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

- I finetune ModernBERT-base using the script [train_st.py](https://github.com/AnswerDotAI/ModernBERT/blob/main/examples/train_st.py) from the official repo on an RTX 4090 GPU with batch size 128. Training for 1 epoch takes less than an hour.
- With a larger batch size, the finetuned model performs better than that reported in the paper. See the MTEB results in the [mteb](https://huggingface.co/joe32140/ModernBERT-base-msmarco/tree/main/mteb) folder.
+ I finetune ModernBERT-base using the script [train_st.py](https://github.com/AnswerDotAI/ModernBERT/blob/main/examples/train_st.py) from the official repo on an RTX 4090 GPU, with the only change being that the mini-batch size of `CachedMultipleNegativesRankingLoss` is set to 128. Training for 1 epoch takes less than an hour.
+
+ The GradCache mini-batch size should not affect model performance, yet the finetuned model performs better than the results reported in the paper. See the MTEB results in the [mteb](https://huggingface.co/joe32140/ModernBERT-base-msmarco/tree/main/mteb) folder.

  Training logs can be found here: https://api.wandb.ai/links/joe32140/ekuauaao.
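Since the README text above describes the model as a 768-dimensional sentence embedder for semantic similarity and search, here is a minimal, illustrative usage sketch (not taken from the commit); it assumes the finetuned checkpoint is loaded from the `joe32140/ModernBERT-base-msmarco` repo that the MTEB results link points to:

```python
from sentence_transformers import SentenceTransformer

# Assumed repo id, inferred from the MTEB results link in this README.
model = SentenceTransformer("joe32140/ModernBERT-base-msmarco")

sentences = [
    "What architecture does ModernBERT use?",
    "ModernBERT is an encoder-only transformer model.",
    "The weather in Taipei is humid in summer.",
]

# Each sentence is mapped to a 768-dimensional dense vector.
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 768)

# Pairwise similarity scores between the embeddings.
print(model.similarity(embeddings, embeddings))
```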
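The `+` lines above attribute the change to the mini-batch size of `CachedMultipleNegativesRankingLoss`. As a rough sketch of what that setting looks like in sentence-transformers (this is not the author's `train_st.py`, and the dataset and trainer wiring are omitted), the loss would be configured along these lines:

```python
from sentence_transformers import SentenceTransformer, losses

# Base checkpoint named in this README; train_st.py handles the full training setup.
model = SentenceTransformer("answerdotai/ModernBERT-base")

# CachedMultipleNegativesRankingLoss implements GradCache: each training batch is run
# through the encoder in chunks of `mini_batch_size` to bound GPU memory, while the
# in-batch negatives and the loss are still computed over the full batch.
loss = losses.CachedMultipleNegativesRankingLoss(model, mini_batch_size=128)
```

Because the chunking only changes how activations are cached, the resulting gradients match those of the full batch, which is why the mini-batch size is expected to affect speed and memory but not final model quality.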