Turkish-NLI/legal_nli_TR_V1 · Loss function recommendation

Hello!

I quite like the work that you've put into your dataset: https://huggingface.co/datasets/Turkish-NLI/legal_nli_TR_V1
So I wanted to share that I normally get the best performance by training with an "anchor-positive-negative" dataset, i.e. every sample consists of an anchor text, a related text (the positive, e.g. an entailment) and an unrelated text (the negative, e.g. a contradiction, and potentially also a neutral hypothesis). See for example https://huggingface.co/datasets/sentence-transformers/all-nli/viewer/triplet
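
To make the format concrete, here is a minimal sketch of how NLI rows could be regrouped into triplets. The column names (`premise`, `hypothesis`, `label`), the string label values and the toy rows are all assumptions for illustration; the actual dataset may use different names or integer labels, so adjust accordingly.

```python
from collections import defaultdict

# Assumed label values; the real dataset may encode labels differently.
ENTAILMENT, CONTRADICTION = "entailment", "contradiction"


def build_triplets(rows):
    """Pair each entailed hypothesis with each contradicted hypothesis of the same premise."""
    positives = defaultdict(list)
    negatives = defaultdict(list)
    for row in rows:
        if row["label"] == ENTAILMENT:
            positives[row["premise"]].append(row["hypothesis"])
        elif row["label"] == CONTRADICTION:
            negatives[row["premise"]].append(row["hypothesis"])
        # Neutral rows are skipped here; they could optionally be used as negatives too.

    triplets = []
    for premise, pos_list in positives.items():
        for pos in pos_list:
            # Premises without any contradiction yield no triplet.
            for neg in negatives.get(premise, []):
                triplets.append({"anchor": premise, "positive": pos, "negative": neg})
    return triplets


# Made-up toy rows, only to show the shape of the input and output.
rows = [
    {"premise": "The tenant paid the rent.", "hypothesis": "The rent was paid.", "label": "entailment"},
    {"premise": "The tenant paid the rent.", "hypothesis": "The rent is unpaid.", "label": "contradiction"},
    {"premise": "The tenant paid the rent.", "hypothesis": "The tenant is a student.", "label": "neutral"},
    {"premise": "The court dismissed the case.", "hypothesis": "The case was dismissed.", "label": "entailment"},
]
triplets = build_triplets(rows)
```

Only the first premise has both an entailment and a contradiction, so this toy input produces a single triplet.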

Then, I use one of the losses for (anchor, positive, negative) triplets, usually MultipleNegativesRankingLoss.
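
For intuition, here is a rough pure-Python sketch of what this kind of loss computes on a batch of triplets: each anchor should score its own positive higher than every other positive and every negative in the batch. This is not the library's implementation; the function names, the cosine similarity and the `scale=20.0` default are assumptions chosen to mirror how I understand the Sentence Transformers version to behave, and the embeddings are hand-written toy vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy where anchor i must pick positive i among all candidates."""
    # Candidates are every positive and every negative in the batch.
    candidates = positives + negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp over the candidate scores.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        # The correct candidate for anchor i sits at index i.
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Toy 2-d "embeddings": positives align with their anchors, negatives point away.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

loss_good = mnr_loss(anchors, positives, negatives)
loss_bad = mnr_loss(anchors, negatives, positives)  # roles swapped on purpose
```

With well-separated embeddings the loss is close to zero, while swapping positives and negatives makes it large. In practice you would not write this yourself: in Sentence Transformers, a dataset with anchor, positive and negative columns can be paired with `losses.MultipleNegativesRankingLoss(model)`.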
I think there's a good chance that this will give better performance for relatively little time investment, but it's certainly up to you if you'd like to consider it!

Tom Aarsen