SetFit with jinaai/jina-embeddings-v2-base-en

This is a SetFit model for text classification. It uses jinaai/jina-embeddings-v2-base-en as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

Fine-tuning a Sentence Transformer with contrastive learning.
Training a classification head with features from the fine-tuned Sentence Transformer.
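The first step can be pictured as turning a handful of labelled sentences into sentence pairs with a similarity target, which a cosine-similarity objective then fits. The following is a minimal sketch that assumes exhaustive pairing. SetFit's own sampler draws pairs differently, and the helper name build_contrastive_pairs and the toy sentences are hypothetical, not taken from the library or from this model's training data.

```python
import random
from itertools import combinations


def build_contrastive_pairs(texts, labels, seed=42):
    """Return (text_a, text_b, target) triples.

    target is 1.0 when both texts share a label and 0.0 otherwise.
    Hypothetical helper for illustration only.
    """
    pairs = []
    for i, j in combinations(range(len(texts)), 2):
        target = 1.0 if labels[i] == labels[j] else 0.0
        pairs.append((texts[i], texts[j], target))
    # Shuffle deterministically so batches mix positive and negative pairs.
    random.Random(seed).shuffle(pairs)
    return pairs


# Toy data (hypothetical): two sentences per label.
texts = ["sentence a", "sentence b", "sentence c", "sentence d"]
labels = ["ccro:Discuss", "ccro:Discuss", "ccro:Compare", "ccro:Compare"]

pairs = build_contrastive_pairs(texts, labels)
positives = [p for p in pairs if p[2] == 1.0]
negatives = [p for p in pairs if p[2] == 0.0]
print(len(pairs), len(positives), len(negatives))
```

Even four labelled sentences yield six training pairs, which is one reason pair-based fine-tuning suits few-shot settings.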

Model Details

Model Description

Model Type: SetFit
Sentence Transformer body: jinaai/jina-embeddings-v2-base-en
Classification head: a LogisticRegression instance
Maximum Sequence Length: 8192 tokens
Number of Classes: 9

Model Sources

Repository: SetFit on GitHub
Paper: Efficient Few-Shot Learning Without Prompts
Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts

Model Labels

Label	Examples
ccro:BasedOn	'The axiomatizations presented in Quesada (2010, 2011) also dispense with strong monotonicity.'
ccro:Basedon	'A formal mathematical description of the h-index introduced by Hirsch (2005)' 'Woeginger (2008a, b) and Quesada (2009, 2010) have already suggested characterizations of the Hirsch index'
ccro:Compare	'Instead, a variety of studies [8, 9] have shown that the h index by and large agrees with other objective and subjective measures of scientific quality in a variety of different disciplines (10–15),'
ccro:Contrast	'Hirsch (2005) argues that two individuals with similar Hirsch-index are comparable in terms of their overall scientific impact, even if their total number of papers or their total number of citations is very different.' 'The three differ from Woeginger’s (2008a) characterization in requiring fewer axioms (three instead of five)' 'Marchant (2009), instead of characterizing the index itself, characterizes the ranking that the Hirsch index induces on outputs.'
ccro:Criticize	'The h-index does not take into account that some papers may have extraordinarily many citations, and the g-index tries to compensate for this; see also Egghe (2006b) and Tol (2008).' 'Woeginger (2008a, p. 227) stresses that his axioms should be interpreted within the context of MON.'
ccro:Discuss	'The relation between N and h will depend on the detailed form of the particular distribution (HI0501-01)' 'As discussed by Redner (HI0501-03), most papers earn their citations over a limited period of popularity and then they are no longer cited.' 'It is also possible that papers "drop out" and then later come back into the h count, as would occur for the kind of papers termed "sleeping beauties" (HI0501-04).'
ccro:Extend	'In [3] the analogous formula for the g-index has been proved'
ccro:Incorporate	'In this paper, we provide an axiomatic characterization of the Hirsch-index, in very much the same spirit as Arrow (1950, 1951), May (1952), and Moulin (1988) did for numerous other problems in mathematical decision making.'
ccro:Negate	'Recently, Lehmann et al. (2, 3) have argued that the mean number of citations per paper (nc = Nc/Np) is a superior indicator.' 'If one chose instead to use as indicator of scientific achievement the mean number of citations per paper [following Lehmann et al. (2, 3)], our results suggest that (as in the stock market) ‘‘past performance is not predictive of future performance.’’' 'It has been argued in the literature that one drawback of the h index is that it does not give enough ‘‘credit’’ to very highly cited papers, and various modifications have been proposed to correct this, in particular, Egghe’s g index (4), Jin et al.’s AR index (5), and Komulski’s H(2) index (6).'

Evaluation

Metrics

Label	Accuracy
all	0.6667
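Accuracy here is the fraction of evaluation sentences whose predicted label matches the reference label. With nine unevenly sized classes, a per-label breakdown can be more informative than the single overall figure. The sketch below shows both computations on hypothetical toy labels. The evaluation set behind the reported 0.6667 is not given in this card, so none of the values below come from it.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of positions where prediction equals reference."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_label_accuracy(y_true, y_pred):
    """Accuracy computed separately for each reference label."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical references and predictions, for illustration only.
y_true = ["ccro:Discuss", "ccro:Discuss", "ccro:Compare", "ccro:Compare"]
y_pred = ["ccro:Discuss", "ccro:Compare", "ccro:Compare", "ccro:Compare"]

overall = accuracy(y_true, y_pred)
by_label = per_label_accuracy(y_true, y_pred)
print(overall, by_label)
```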

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("Corran/CCRO2")
# Run inference
preds = model("One of the referees recommends mentioning Quesada (2008) as another characterization of the Hirsch index relying as well on monotonicity.")

Training Details

Training Set Metrics

Training set	Min	Median	Max
Word count	6	25.7812	53

Label	Training Sample Count
ccro:BasedOn	1
ccro:Basedon	11
ccro:Compare	21
ccro:Contrast	3
ccro:Criticize	4
ccro:Discuss	37
ccro:Extend	1
ccro:Incorporate	14
ccro:Negate	4
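The counts above are strongly imbalanced, which is easier to see as shares of the total. The short sketch below sums the Training Sample Count column and reports each label's share. The variable names are illustrative.

```python
# Training sample counts copied from the table above.
counts = {
    "ccro:BasedOn": 1,
    "ccro:Basedon": 11,
    "ccro:Compare": 21,
    "ccro:Contrast": 3,
    "ccro:Criticize": 4,
    "ccro:Discuss": 37,
    "ccro:Extend": 1,
    "ccro:Incorporate": 14,
    "ccro:Negate": 4,
}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# The most frequent label and its share of the training set.
majority_label = max(counts, key=counts.get)
majority_share = shares[majority_label]

for label, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{label:18s} {n:3d}  {shares[label]:.1%}")
print("total:", total)
```

The training set has 96 sentences in total, of which ccro:Discuss accounts for 37 (about 38.5%), while ccro:BasedOn and ccro:Extend have a single example each.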

Training Hyperparameters

batch_size: (32, 32)
num_epochs: (1, 1)
max_steps: -1
sampling_strategy: oversampling
num_iterations: 100
body_learning_rate: (2e-05, 1e-05)
head_learning_rate: 0.01
loss: CosineSimilarityLoss
distance_metric: cosine_distance
margin: 0.25
end_to_end: False
use_amp: False
warmup_proportion: 0.1
seed: 42
eval_max_steps: -1
load_best_model_at_end: False

Training Results

Epoch	Step	Training Loss	Validation Loss
0.0017	1	0.311	-
0.0833	50	0.1338	-
0.1667	100	0.0054	-
0.25	150	0.0017	-
0.3333	200	0.0065	-
0.4167	250	0.0003	-
0.5	300	0.0003	-
0.5833	350	0.0005	-
0.6667	400	0.0004	-
0.75	450	0.0002	-
0.8333	500	0.0002	-
0.9167	550	0.0002	-
1.0	600	0.0002	-
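The 600 steps in the final row can be reproduced from the hyperparameters and the 96 training samples (the sum of the Training Sample Count column). The sketch below assumes that, with num_iterations set, one positive and one negative pair are drawn per training sample per iteration. That assumption is a reading of SetFit 1.0's behaviour rather than something this card states, and the helper names are hypothetical.

```python
import math


def contrastive_steps(num_samples, num_iterations, batch_size):
    """Optimizer steps for one epoch of pair-based fine-tuning.

    Assumption: each iteration contributes one positive and one
    negative pair per training sample.
    """
    num_pairs = 2 * num_samples * num_iterations
    return math.ceil(num_pairs / batch_size)


def epoch_at(step, steps_per_epoch):
    """Fractional epoch reached at a given step, rounded to 4 decimals."""
    return round(step / steps_per_epoch, 4)


total_steps = contrastive_steps(num_samples=96, num_iterations=100, batch_size=32)
print(total_steps)
print(epoch_at(1, total_steps), epoch_at(50, total_steps))
```

Under this assumption, 2 × 96 × 100 = 19,200 pairs at batch size 32 give 600 steps, and the fractional epochs at steps 1 and 50 match the Epoch column above.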

Framework Versions

Python: 3.10.12
SetFit: 1.0.3
Sentence Transformers: 2.2.2
Transformers: 4.35.2
PyTorch: 2.1.0+cu121
Datasets: 2.16.1
Tokenizers: 0.15.0

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
