joe32140/ModernBERT-large-msmarco

Sentence Similarity

sentence-transformers

feature-extraction

Generated from Trainer

dataset_size:11662655

loss:CachedMultipleNegativesRankingLoss


joe32140 committed 3 days ago

Commit 1692eb9 • 1 Parent(s): 16f9f23

Update README.md

Files changed (1)

README.md +1 -1


@@ -113,7 +113,7 @@ model-index:
 This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large) on the [msmarco-co-condenser-margin-mse-sym-mnrl-mean-v1](https://huggingface.co/datasets/sentence-transformers/msmarco-co-condenser-margin-mse-sym-mnrl-mean-v1) dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
-I finetune ModernBERT-base using script from offical repo [train_st.py](https://github.com/AnswerDotAI/ModernBERT/blob/main/examples/train_st.py) on a RTX 4090 GPU with the only change of setting mini-batch size of `CachedMultipleNegativesRankingLoss` to 64. Training for 1 epoch takes less than an hour.
+I finetune ModernBERT-base using script from offical repo [train_st.py](https://github.com/AnswerDotAI/ModernBERT/blob/main/examples/train_st.py) on a RTX 4090 GPU with the only change of setting mini-batch size of `CachedMultipleNegativesRankingLoss` to 64. Training for 1 epoch takes less than 2 hours.
 The mini-batch size of GradCache should not change model performnace, but the finetuned model performs better than that recorded in the paper.
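
The README's remark that the GradCache mini-batch size should not change results can be illustrated with a toy sketch. This is not the sentence-transformers implementation: `encode`, `encode_in_chunks`, and `in_batch_negatives_loss` are hypothetical helpers, the "encoder" is a plain linear map, and the loss uses raw dot products rather than the scaled cosine similarity a real ranking loss would normally use. It only shows that encoding the same batch in smaller chunks yields the same in-batch-negatives loss, so the chunk size should affect memory use rather than the objective.

```python
import math
import random

def encode(batch, weights):
    """Toy linear 'encoder' (an assumption): one dot product per output dimension."""
    return [[sum(x * w for x, w in zip(row, col)) for col in weights] for row in batch]

def encode_in_chunks(batch, weights, mini_batch_size):
    """Encode the same batch a few rows at a time, mimicking chunked encoding."""
    out = []
    for start in range(0, len(batch), mini_batch_size):
        out.extend(encode(batch[start:start + mini_batch_size], weights))
    return out

def in_batch_negatives_loss(queries, passages):
    """Mean cross-entropy over dot-product scores; passage i is the positive for query i."""
    total = 0.0
    for i, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, p)) for p in passages]
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[i]
    return total / len(queries)

rng = random.Random(0)
dim_in, dim_out, batch_size = 8, 4, 16
weights = [[rng.uniform(-1, 1) for _ in range(dim_in)] for _ in range(dim_out)]
queries = [[rng.uniform(-1, 1) for _ in range(dim_in)] for _ in range(batch_size)]
passages = [[rng.uniform(-1, 1) for _ in range(dim_in)] for _ in range(batch_size)]

# Loss with the whole batch encoded at once.
full_loss = in_batch_negatives_loss(encode(queries, weights), encode(passages, weights))

# Loss with the same batch encoded in chunks of different sizes.
chunked_losses = {
    size: in_batch_negatives_loss(
        encode_in_chunks(queries, weights, size),
        encode_in_chunks(passages, weights, size),
    )
    for size in (1, 4, 16)
}

for size, loss in chunked_losses.items():
    print(f"mini-batch size {size:2d}: loss matches full batch = {abs(loss - full_loss) < 1e-12}")
```

In a real training run, other factors (random seeds, mixed precision, hardware, library versions) can still shift the final metrics, which may be one reason a reproduction differs from published numbers.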