SetFit with mini1013/master_domain

This is a SetFit model for text classification. It uses mini1013/master_domain as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

Fine-tuning a Sentence Transformer with contrastive learning.
Training a classification head with features from the fine-tuned Sentence Transformer.
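As a rough illustration of the first step, the sketch below builds the kind of labeled sentence pairs that contrastive fine-tuning consumes: pairs with the same label get a target similarity of 1.0, pairs with different labels get 0.0. The function name `make_contrastive_pairs` and the toy texts are hypothetical, and the sketch enumerates every pair, whereas SetFit samples pairs according to its sampling settings.

```python
import random
from itertools import combinations


def make_contrastive_pairs(texts, labels, seed=42):
    """Return (text_a, text_b, target) triples for contrastive training.

    target is 1.0 when both texts share a label and 0.0 otherwise.
    Illustrative only: this enumerates all pairs instead of sampling them.
    """
    rng = random.Random(seed)
    pairs = []
    for i, j in combinations(range(len(texts)), 2):
        target = 1.0 if labels[i] == labels[j] else 0.0
        pairs.append((texts[i], texts[j], target))
    rng.shuffle(pairs)  # mix positive and negative pairs
    return pairs


# Hypothetical toy data: two product titles for each of two labels.
texts = [
    "spicy tteokbokki meal kit",
    "rose tteokbokki 3 packs",
    "beef bone soup 1kg",
    "spicy pork bone stew",
]
labels = [0, 0, 7, 7]

pairs = make_contrastive_pairs(texts, labels)
positives = [p for p in pairs if p[2] == 1.0]
negatives = [p for p in pairs if p[2] == 0.0]
print(len(pairs), len(positives), len(negatives))  # 6 2 4
```

A loss such as CosineSimilarityLoss then pushes the cosine similarity of each pair's embeddings toward its target, after which the embeddings serve as features for the classification head.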

Model Details

Model Description

Model Type: SetFit
Sentence Transformer body: mini1013/master_domain
Classification head: a LogisticRegression instance
Maximum Sequence Length: 512 tokens
Number of Classes: 8

Model Sources

Repository: SetFit on GitHub
Paper: Efficient Few-Shot Learning Without Prompts
Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts

Model Labels

Label	Examples
6.0	'듬뿍담은 안동식 순살 찜닭 밀키트 711g 주식회사 프레시지' '우렁쌈장 (2인분) 밀키트 쿠킹박스 우렁살 2개 추가(100g) 농업회사법인 주식회사 아임셰프' '홍수계 매콤 당면듬뿍 순살 찜닭 850g 2인분 냉동 밀키트 셀린'
1.0	'[마이셰프] 찹스테이크(1인)(프리미엄박스) 주식회사 마이셰프' '소문난 청정원 호밍스 마포식 돼지양념구이 210g 정원이샵 홈파티음식 캠핑요리 맥주안주 야식 간편식 홈캉스 풍미업 모에모에큥 에스더블유디자인' '심쿡 슈페리어 연어 스테이크 455g 밀키트 쿠킹박스 인영이네'
5.0	'골든벨통상골든벨 심영순쇠고기국간장250ml 주식회사 에스에스지닷컴' 'CJ 튀김가루 1kg 1개 주식회사 에스에스지닷컴' '(치즈박스)쉐프가 만든 캠핑 와인안주세트(고기 포함 안됨 X) 캘리포니아 키친 실속형(-2500)_11/20 월요일 캘리포니아키친(california kitchen)'
4.0	'소고기 버섯 잡채 (2인분) 주식회사 프레시지' '야식메뉴 청정원 호밍스 춘천식 치즈닭갈비 220g 저녁반찬 자취요리 규비에스코퍼레이션' '하림 궁중 국물 닭떡볶이 700g 밀키트 바이라이프'
0.0	'올바르고반듯한 떡볶이 원조시장 떡볶이 (냉동), 575g, 1개 하누코지' '두끼 즉석떡볶이 560G 아이스박스 포장/선택 인터드림' '두끼 매콤 고소 로제떡볶이 3팩 450g 주식회사 다른'
3.0	'[강원팜] 홈스랑 곤드레감자밥 쉽게만들기6인분 강원팜' '마이셰프 즉석밥 일상정원 명란 솥밥 (냉동), 233g, 1개 하누코지' '여름철 보양식 전복죽 200g 1팩 더블제이doubleJ'
7.0	'우정옥 여주 한우 특곰탕 1kg(2인분) 한우사골곰탕 도가니탕 1000g(약 2인분) 주식회사 우정옥' '25년 전통 수복 얼큰 감자탕 [기본팩] 캠핑요리 밀키트 우거지 리얼감자탕 알뜰팩(라면사리X / 야채X) 수복얼큰감자탕' '인천 정통 맛집 장금수 스페셜 부대전골 부대찌개 2-3인분 술안주 캠핑 집들이 밀키트 더렌'
2.0	'1분완성 개별포장 매콤 알싸 비빔 막국수 막국수 1팩 (주)데이지웰푸드' '동원 면발의신 얼큰칼국수 268g 엄마손맛 육수 쉬운요리 감칠맛 자취 풍미 레시피 소스 인영' '샐러드미인 쉐프엠 미트파스타 230g 주식회사 엠디에스코리아'

Evaluation

Metrics

Label	Metric
all	0.9173

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("mini1013/master_cate_fd8")
# Run inference
preds = model("[CJ](신세계 의정부점) 비비고 누룽지닭다리삼계탕 550g  주식회사 에스에스지닷컴")
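For more than one input, a small wrapper along the following lines may be convenient. It is a sketch rather than part of the model's documented usage: the helper name `classify_products` is hypothetical, and it assumes the `predict` and `predict_proba` methods of `SetFitModel` as available in SetFit 1.x.

```python
def classify_products(texts, model_id="mini1013/master_cate_fd8"):
    """Classify a list of product titles and return (labels, probabilities).

    Hypothetical helper: setfit is imported lazily so that defining the
    function does not require the library or network access.
    """
    from setfit import SetFitModel

    model = SetFitModel.from_pretrained(model_id)  # downloads from the Hub
    preds = model.predict(texts)        # one predicted label per input
    probs = model.predict_proba(texts)  # one row of class scores per input
    return preds, probs


# Example call (not executed here because it downloads the model):
# preds, probs = classify_products([
#     "두끼 즉석떡볶이 560G 아이스박스 포장/선택 인터드림",
#     "여름철 보양식 전복죽 200g 1팩 더블제이doubleJ",
# ])
```

Loading the model once and passing all texts together avoids repeated downloads and lets the library batch the embedding step.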

Training Details

Training Set Metrics

Training set	Min	Mean	Max
Word count	3	9.3575	20

Label	Training Sample Count
0.0	50
1.0	50
2.0	50
3.0	50
4.0	50
5.0	50
6.0	50
7.0	50

Training Hyperparameters

batch_size: (512, 512)
num_epochs: (20, 20)
max_steps: -1
sampling_strategy: oversampling
num_iterations: 40
body_learning_rate: (2e-05, 2e-05)
head_learning_rate: 2e-05
loss: CosineSimilarityLoss
distance_metric: cosine_distance
margin: 0.25
end_to_end: False
use_amp: False
warmup_proportion: 0.1
seed: 42
eval_max_steps: -1
load_best_model_at_end: False
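The hyperparameters above can be related to the step counts in the Training Results table with a little arithmetic. The sketch below assumes that, when num_iterations is set, one positive and one negative pair are generated per training sample per iteration. That assumption is not stated in this card, but it reproduces the logged epoch fractions (step 1 at epoch 0.0159, step 50 at epoch 0.7937).

```python
import math

n_samples = 8 * 50     # 8 labels x 50 training samples each
num_iterations = 40
batch_size = 512       # embedding-phase batch size
num_epochs = 20        # embedding-phase epochs

# Assumption: one positive and one negative pair per sample per iteration.
n_pairs = 2 * num_iterations * n_samples
steps_per_epoch = math.ceil(n_pairs / batch_size)
total_steps = steps_per_epoch * num_epochs

print(n_pairs, steps_per_epoch, total_steps)  # 32000 63 1260
print(round(1 / steps_per_epoch, 4))          # 0.0159
print(round(50 / steps_per_epoch, 4))         # 0.7937
```

Under this reading, the last logged step (1250) falls just short of the 1260 total steps, which matches the final epoch value of 19.8413.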

Training Results

Epoch	Step	Training Loss	Validation Loss
0.0159	1	0.4347	-
0.7937	50	0.2865	-
1.5873	100	0.0903	-
2.3810	150	0.0636	-
3.1746	200	0.0401	-
3.9683	250	0.003	-
4.7619	300	0.0016	-
5.5556	350	0.0017	-
6.3492	400	0.0025	-
7.1429	450	0.0007	-
7.9365	500	0.0001	-
8.7302	550	0.0001	-
9.5238	600	0.0002	-
10.3175	650	0.0001	-
11.1111	700	0.0008	-
11.9048	750	0.0001	-
12.6984	800	0.0001	-
13.4921	850	0.0	-
14.2857	900	0.0001	-
15.0794	950	0.0	-
15.8730	1000	0.0	-
16.6667	1050	0.0	-
17.4603	1100	0.0	-
18.2540	1150	0.0	-
19.0476	1200	0.0	-
19.8413	1250	0.0	-

Framework Versions

Python: 3.10.12
SetFit: 1.1.0.dev0
Sentence Transformers: 3.1.1
Transformers: 4.46.1
PyTorch: 2.4.0+cu121
Datasets: 2.20.0
Tokenizers: 0.20.0

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
